Friday, February 24, 2012

Index Creation Before Bulk Insert

This weekend, I tried to run some large bulk insert statements followed by create index statements on the tables I inserted the data into. For some reason, it didn't finish the largest insert statement. However, the index for that table started creating. Then my log file grew to a half a terabyte. Now the cancel statement is taking forever. What am I doing wrong? Here is the part of the code.

..........

BULK INSERT dbo.bigtable FROM 'data' WITH (TABLOCK);

GO

USE [Database]

GO

/****** Object: Index [ix_ID] Script Date: 04/27/2007 14:34:41 ******/

CREATE CLUSTERED INDEX [ix_ID] ON [dbo].[bigtable]

(

[ID] ASC

)WITH (PAD_INDEX = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PS_Year]([DatafileYear])

GO

On a huge table, such as you seem to be describing, I think there is a lot of effort required to order the table data to conform to a CLUSTERED INDEX.

You may want to try a non-clustered index and see what sort of performance penalty you incur.

Just a suggestion...

Dan

|||

Are there any reason why create clustered index after bulk insert ?

If create clustered index after bulk insert, SQL Server need a lot of data and log space for sort/move data.

|||

To try to assess the performance gains of having a CLUSTERED index versus a Non-CLUSTERED index on the same columns, I made some changes today to some of my data tables, on which I did not earlier have a CLUSTERED index.

I used the same columns, same column order, in the CLUSTERED index as in a Non-CLUSTERED index that I deleted.

A typical table size for the 6 tables I changed is 100MB and 1,000,000 rows.

I then ran a process that has maybe 50-100 independent SQL queries that perform various SUMs and other aggregations of the data in these tables, subject to different WHERE clauses, etc. The GROUP BY aspects of these SUMs use the columns specified in the CLUSTERED index. (I have run this process many, many times, so I have reasonable statistics on its typical duration.)

The CLUSTERED indexes seemed to give a performance gain of approximately 10%.

So there seems to be some benefit, for tables of this size, to have a CLUSTERED index that is reasonably well designed. But the benefit seems to be modest, i.e., around 10%, instead of the substantial benefit gained by having ANY appropriate index, whether CLUSTERED or Non-CLUSTERED.

FYI.

Dan

No comments:

Post a Comment