Wednesday, March 21, 2012

Index Performance

Hi,

I am trying to improve the performance of some statements I run against a 15M-row table.

I have batches that insert about 6M rows and some that insert 500K to 1M rows...

Statistically, I have a match of 1 to 3.

I am currently disabling all indexes on the destination table and running the insert batches.

After the statements finish, I rebuild all the indexes...

This takes up to 4 hours to run... (3 of those hours are spent just rebuilding the indexes.)

My question is: am I using the best approach? Or is it better to leave the indexes enabled and run the insert batches with them in place?

I hope I was clear enough. :)

Best Regards,

Yes, that is most often the best approach. It is much faster to add data to a heap and then reorganize the data just once. Generally speaking, a large quantity of inserts would likely fragment the indexes anyway, so they would need to be rebuilt regardless.

|||

No, you do NOT want to leave the indexes 'enabled' during the imports.

You may wish to try the following to determine if performance can be improved.

If your situation allows:

switch the database RECOVERY model to SIMPLE,

drop the indexes,

make the imports in smaller batches (perhaps 100K rows), looping until complete,

rebuild the indexes,

switch back to FULL recovery when finished,

and then make a FULL BACKUP.

Be sure that you build any clustered indexes first!
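The steps above can be sketched in T-SQL roughly as follows. All names (database, tables, indexes, backup path) are hypothetical, and the batch size is just a starting point to tune:

```sql
-- Assumes: database MyDb, staging table dbo.Staging, target dbo.Target
-- with a nonclustered index IX_Target_Col1 and a clustered index.
ALTER DATABASE MyDb SET RECOVERY SIMPLE;

-- Disable (or drop) the nonclustered indexes. Note: disabling the
-- CLUSTERED index would make the whole table inaccessible, so leave it.
ALTER INDEX IX_Target_Col1 ON dbo.Target DISABLE;

-- Load in batches of ~100K rows, looping until the staging table is drained.
-- (OUTPUT ... INTO requires the target to have no triggers or FK references.)
DECLARE @batch int = 100000;
WHILE 1 = 1
BEGIN
    DELETE TOP (@batch) s
    OUTPUT deleted.* INTO dbo.Target
    FROM dbo.Staging AS s;

    IF @@ROWCOUNT = 0 BREAK;
END;

-- Rebuild everything; REBUILD also re-enables the disabled indexes.
ALTER INDEX ALL ON dbo.Target REBUILD;

ALTER DATABASE MyDb SET RECOVERY FULL;
BACKUP DATABASE MyDb TO DISK = N'C:\Backups\MyDb.bak';
```

The batched loop keeps each delete/insert transaction small, which keeps the log from ballooning even in SIMPLE recovery.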

|||

That's exactly what I am doing...


The DBs are permanently in simple mode... because we don't need transaction log backups.

Indexes are not dropped but instead are disabled...

I execute the batch (I don't use loops because performance is OK; I only use loops for deletes).

I execute the rebuild with the ALL ON Table_name option.
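For reference, that statement (with a hypothetical table name) is:

```sql
-- Rebuilds every index on the table, including the clustered index,
-- and re-enables any disabled indexes in the process.
ALTER INDEX ALL ON dbo.Table_name REBUILD;
```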

So I guess there's no way around it... I have to wait 3 hours for the indexes to get rebuilt every time... even if I only insert 1 row... :(

Also, the post above speaks of reorganizing indexes, but I guess he meant rebuilding, no? Changing that number of rows would definitely need a rebuild, right?


Regards,

|||

You can also try, if possible, importing into a dummy table, leaving the original table intact, and then later inserting into the real table.

In this case, you have 1 hour to import + 1 hour to INSERT = 2 hours

You've just saved 2 hours of waiting for the indexes to be re-enabled.

Just my twist on it,

Adamus

|||

Sorry?


But my batch is already inserting from a temporary table into a final one... So would you do it with the indexes up?

Regards

|||

So you are bulk INSERTING, not bulk IMPORTING.

Correct?

Adamus

|||

Yes, sorry... my English was right in the first post and wrong in the second, when I said import instead of insert. :)

To remove any doubt: I am bulk inserting from one temporary table into a final one...

You would be right if I were importing, anyway. :)

Best Regards,

|||

Are you overwriting or appending to the existing records in the final table?

What I'm getting at is: instead of touching the records for the sake of reports and processing, why not rename the temp table to the name of the final table, if SQL allows it in your case, and create a new temp table?

If this is possible, add the indexes to the new table.

If you are overwriting:

Drop final table -> Rename temp table to final table -> Create new temp table -> Apply indexes to new final table
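That flow could look something like this in T-SQL. The table and index names are hypothetical, and note that sp_rename takes a brief schema-modification lock on the table:

```sql
-- Swap the freshly loaded staging table in as the new final table.
DROP TABLE dbo.Final;                   -- drop the old final table
EXEC sp_rename 'dbo.Staging', 'Final';  -- staging becomes the final table

-- Recreate an empty staging table with the same columns for the next load
-- (SELECT INTO with a false predicate copies structure, not data or indexes).
SELECT * INTO dbo.Staging FROM dbo.Final WHERE 1 = 0;

-- Apply the indexes to the new final table, clustered index first.
CREATE CLUSTERED INDEX CIX_Final_Id ON dbo.Final (Id);
CREATE NONCLUSTERED INDEX IX_Final_Col1 ON dbo.Final (Col1);
```

The win is that the expensive indexing happens on the new table while the old one stays fully usable until the swap.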

Adamus

|||

>>Also, the post above speaks of reorganizing indexes, but I guess he meant rebuilding, no? Changing that number of rows would definitely need a rebuild, right?<<

No. If you are just adding rows to a table, you can let SQL Server reorganize the pages of the index instead of doing a full rebuild, and it can be done while others are using the table. In 2005 it is ALTER INDEX ... REORGANIZE; in 2000 it is a DBCC command (DBCC INDEXDEFRAG).
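On 2005 a reorganize looks like this (table and index names are hypothetical); it runs online and only shuffles leaf pages that are out of order:

```sql
-- Reorganize a single index...
ALTER INDEX IX_Target_Col1 ON dbo.Target REORGANIZE;

-- ...or all indexes on the table at once.
ALTER INDEX ALL ON dbo.Target REORGANIZE;
```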

>>So I guess there's no way around it... I have to wait 3 hours for the indexes to get rebuilt every time... even if I only insert 1 row...<<

Now here is a tricky question. You would be silly to drop all of the indexes and rebuild the table for a single-row insert (unless your only job is to do this and you can read books on database design while you wait; then it would be a good thing :)).

Seriously, all the answers we gave you were based on a BULK load of the table, meaning you were the only user at the time, pumping in a very high percentage of the rows. Your original question stated:

>>I am trying to improve performance in some statements i run against a 15M rows table.

I have batchs that inserts about 6M rows and some that insert 500K to 1M rows...<<

You are significantly changing the shape of the table if you are adding, or even changing the keys of, a tenth of the table. So the fastest thing to do in this case is likely to employ one of the techniques mentioned: use a BULK operation and rebuild the indexes.

We haven't even talked about what you are clustering on, or how many indexes you have; those are all important things to discuss if you are only adding (or changing) far fewer rows. Testing is the key. I would suggest you write at least two or three scenarios and test.

If rowcount < N, then just insert directly.

If rowcount >= N and < M, disable these indexes, insert your rows, then add them back.

If rowcount >= M, disable/drop all indexes (including clustered) and SSIS the rows in.

This is especially worthwhile if you are running short of time for your (presumably) off-hours processing.
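That three-way decision could be scripted roughly like this. The thresholds @n and @m are placeholders you would tune by testing, and the table and index names are hypothetical:

```sql
DECLARE @rowcount bigint = (SELECT COUNT(*) FROM dbo.Staging);
DECLARE @n bigint = 100000,    -- below this: plain insert
        @m bigint = 1000000;   -- above this: full drop-and-bulk-load

IF @rowcount < @n
BEGIN
    -- Small load: insert directly, indexes stay enabled.
    INSERT INTO dbo.Target SELECT * FROM dbo.Staging;
END
ELSE IF @rowcount < @m
BEGIN
    -- Medium load: disable nonclustered indexes, insert, rebuild them.
    ALTER INDEX IX_Target_Col1 ON dbo.Target DISABLE;
    INSERT INTO dbo.Target SELECT * FROM dbo.Staging;
    ALTER INDEX IX_Target_Col1 ON dbo.Target REBUILD;
END
ELSE
BEGIN
    -- Large load: drop the clustered index (disabling it would make the
    -- table inaccessible), disable the nonclustered ones, bulk load the
    -- heap (e.g. via SSIS), then rebuild: clustered first, then the rest.
    DROP INDEX CIX_Target_Id ON dbo.Target;
    ALTER INDEX IX_Target_Col1 ON dbo.Target DISABLE;
    -- ... bulk load into the heap here ...
    CREATE CLUSTERED INDEX CIX_Target_Id ON dbo.Target (Id);
    ALTER INDEX IX_Target_Col1 ON dbo.Target REBUILD;
END;
```

Rebuilding the clustered index before the nonclustered ones matters because every nonclustered index points at the clustered key.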

|||

Hi Arnie,

I can see the benefits of switching to the BULK_LOGGED recovery model, but switching to SIMPLE will break the sequence of transaction log backups.

Switching from the Full or Bulk-Logged Recovery Model

http://msdn2.microsoft.com/en-us/library/ms190203.aspx

AMB

|||

Alejandro,

I didn't indicate it, but assumed (and yes, I know that one shouldn't assume) that a full backup would be made before changing the recovery model.

I should have been more clear on that point.

Your point about the chain of backups is very important. Thanks for adding that to clarify the conversation.

|||

I guess you are using SQL Server 2005. Do a test without disabling the indexes, because SQL Server can sometimes decide to use a strategy called index-at-a-time, where instead of updating each index for each inserted row, it gathers a group of rows, sorts them once per index, and merges them in. The advantage is that it will not access the same index page more than once.

You can read about this strategy in the book:

Inside Microsoft SQL Server 2005: The Storage Engine

by Kalen Delaney

Chapter 7 - Index Internals and Management

Table-Level vs. Index-Level Data Modification

Tame Bulky Inserts

By: Kalen Delaney

http://www.sqlmag.com/Articles/Index.cfm?ArticleID=25521&DisplayTab=Article

AMB

|||

Well, my table undergoes both additions and removals of records... so there is a lot of change happening here, a complete transformation of the final result.

So I think a rebuild is definitely a must. :/ I just didn't think it would take that amount of time!

Regards
