Wednesday, March 7, 2012

Index Discussion

Hi Folks,

Got a topic open for debate.

We currently have an archive table - DDL

CREATE TABLE [dbo].[Audit] (
[id] [int] identity (1,1) NOT NULL ,
[col1] [char] (10) NOT NULL ,
[col2] [char] (15) NOT NULL ,
[col3] [int] NOT NULL ,
[col4] [varchar] (50) NOT NULL ,
[col5] [datetime] NOT NULL ,
[col6] [varchar] (4000) NULL ,
[col7] [char] (3) NULL
)
GO

This table grows to about 40 million rows during the course of the month. The table has a clustered index on the id field and a non clustered index on the col2 and col3. The id column is not used in queries. At the moment we run weekly dbcc reindexes on all the indexes. We are running into a space issue on the reindex of the clustered index (copying the whole table out , ordering etc) and are considering dropping the index or changing to a non clustered index. (The DBCC utility that we have built will only rebuilt all the indexes or none at all.)

I feel this is not a good idea and know my reasons. I would like some input as to why this might prove a bad idea.

Will it increase page splitting? Will the table performance be impacted even if the queries are not specifically using the clustered index?

What are the reasons for and against?

Thanks FolksWhy do yuo have an [id] column if you don't use it?

I am IDENTITY, there for I am....

Drop the Index

The when people start screaming, the create a non unique index...|||The when people start screaming, the create a non unique index...:D :D ... they don't scream... simply blame SQL server with no cause.:mad:|||I never designed the schema. Bag of pish if you ask me.

Nevertheless before I drop the index :

Will the table start to split pages if we lose the clustered ?

The table is inserted into the by date order. There is no index on the datatime field but the id field clustered index maintains the date order of the table. Would go if the clustered index was dropped?

Do the other non-clustered indexes not use the clustered as a backbone? Will the non-clustered grow if the clustered was dropped?

Am basing my concerns over dropping this index mainly from the advice on the link below.

http://www.sql-server-performance.com/clustered_indexes.asp

More conjecture please.|||OK, first, the order of data in a database has no meaning...

Second, (and I should have said this earlier), do NO alteration in a prod environment until you tested ANY approach in DEV

Me telling you to (and off the cuff) to just drop the index was so bad, I had to drink many margaritas to forget it...

Well, ok, I'm always looking for an excuse...

And no, indexes are not dependant on each other...

And keeping an IDENTITY to make sure the dates are in the right order (did I read that right) doesn't make sense to me...

The big question is...

CREATE TABLE with clustered index

Data is meant to be stored in that order...however data will be put on pages where it finds room...how much free space?

When the table is REORG'ed it will order the data by it...

Now the question...you drop a cluster, and then reorg...what happens?

Don't know, I'll have to test it...

However, I think you have bigger issues...|||You know what? I wanted to do the following, and then reorg the data pages...(it's a db2 term I guess), and realized I don't know how, except to unload and load (OK, another db2 term, bcp out and bcp in)...

Anyone?

I've DBCC REINDEX, but it doesn't mention anything about the pages...

got to be a way...

USE Northwind
GO

CREATE TABLE myTable99 (
Col1 int NOT NULL
, Col2 char(1) NOT NULL
)
GO

CREATE UNIQUE CLUSTERED INDEX myTable99_IX1 ON myTable99(Col1)
CREATE INDEX myTable99_IX2 ON myTable99(Col2)
GO

sp_help myTable99
GO

INSERT INTO myTable99(Col1, Col2)
SELECT 1, 'A' UNION ALL
SELECT 2, 'B' UNION ALL
SELECT 3, 'C' UNION ALL
SELECT 4, 'D'
GO

SELECT * FROM myTable99
GO

DROP INDEX myTable99.myTable99_IX1
GO

sp_help myTable99
GO

INSERT INTO myTable99(Col1, Col2)
SELECT 5, 'E' UNION ALL
SELECT 6, 'F' UNION ALL
SELECT 7, 'G' UNION ALL
SELECT 8, 'H'
GO

SELECT * FROM myTable99
GO|||Thanks for the feedback Brett,

Don't worry - Would never drop an index on a table in prod without fully testing and understanding the implications before doing so

Hence this thread...

Going to some testing and get back to you.

Is there a way in T-SQL you can check the size of a specific index?

Don't trust EM ....|||Originally posted by aldo_2003
Don't trust EM ....

Good...

Of a specific index?

Anyone...

sp_spaceused myTable99
GO

Will tell you the size of all...

I'll keep looking...

EDIT: IF this was DB2 I'd have an answer...|||sysindexes tells you how many pages where used

can we simply multiply this by 8kb to get the answer?|||ANytime to get accurate sizes better to DBCC UPDATEUSAGE or use @.UPDATEUSAGE='TRUE' in SP_SPACEUSED statements.|||Brett, Satya ,

Have done a bit of testing

Inserted 6 million rows into this table with all the indexes
i.e 1 clustered and 2 non clustered

Did sp_spaceused with DBCC UPDATEUSAGE
index size = 233278kb

Then i dropped the clustered index

Did sp_spaceused with DBCC UPDATEUSAGE

index size = 284104kb

Why has the total index size gone up when I have dropped the an index?

What is going on ??

"And no, indexes are not dependant on each other..."?

Anybody ?|||Have you performed DBCC DBREINDEX before and after CLustered Index drop?

I've bit doubt in this regard after this it should return correct sizes.|||Have done this and tested

Still get the same result -

i.e increse in overall index size when I drop the clustered index

has anyone else noticed this behaviour or am I the only one.

shame SQL Server has no T-SQL to check out the size of specific indexes|||Thats for sure there is no direct deal to get the result.
http://www.sql-server-performance.com/q&a13.asp - review for information.

HTH|||Thanks Satya,

Still don't know what is happening with my indexes though|||Index internals. I still wish I had a chance to get to that lecture when it was around last, but here is what I do know.

A clustered index is an index that has the data pages as its leaf pages. In otherwords, this is the order in which the data is supposed to be stored on disk. With extent switches, and pages from other indexes peppered in, the data may not be contiguous, but the theory is there. The beauty of a clustered index is that if you are expecting ranges of data to be scanned, the data is all in a nice row on the disk to be scooped up. The bad side is that if you are planting data in the table in a random order, you end up with all sorts of page splits. This is why clustered indexes on Identity columns became all the rage.

You may remember that Microsoft suggests that you make the clustered key as small as possible, as well. With just the above reasons, there is no justification for this, so there is a second reason. Any non-clustered index will use the clustered index key in place of the rowid, if a clustered index exists. This means that a generic record for a non-clustered index looks like this:

indexed column1, indexed column2..., indexed columnn:clustered index column1, clustered index column2,...clustered index columnn

When you have an integer as the sole clustered index key, the second part of the non-clustered index row is quite small (4 bytes), but if you substitute a rowid (I think a rowid is fileid:pageid:slot number), then you have increased the size of the individual records in the index.

Clear as mud?

Now, the question becomes, are you running queries on the archive table?|||Excellent Answer MCrowley

So non clustered indexes do use clustered indexes to assist there own structure.

We do run queries against the archive table but performance is not a key factor. Non OLTP type enviroment.

What is an issue is the space and especially the space when a rebuild of the clustered index occurs.

I'm going to take some of the queries run by our users into our dev enviroment and make sure that the server does not freak out when I run the same queries after dropping the clustered index.

Thanks to all whom have helped me out on this.
Learned quite a bit about indexes this week - time for a bevy ...!|||One last thing I forgot to mention. I believe that when you run dbcc dbreindex against a clustered index, the whole table (i.e. the leaf nodes of the clustered index) is copied to a new location. So effectively you need to have as much free space in the database as the table takes up, in order to be successful. DBCC DBREINDEX against a clustered index (or just run against the table name) also has the unfortunate side effect of rebuilding all of the non-clustered indexes, too, so you have to add that space on, too. DBCC INDEXDEFRAG is not as effective as DBREINDEX, but it is nicer to the system.|||Thanks again

Lets hope future versions of SQL make easier for DBA's to size all objects (i.e specific indexes) in the databases with the ability to attribute the overall size of the database to the sum of the objects within it.

All the best.|||Testing at my end proves to be working in terms of sizes what you're looking for.

For instance with clustered index presence database free space was 2.4gigs and after removal it was 2.8gigs.

I will explain more about this on Monday.:cool:|||From BOL:

Nonclustered indexes can be defined on a table with a clustered index, a heap, or an indexed view. In Microsoft SQL Server 2000, the row locators in nonclustered index rows have two forms:

If the table is a heap (does not have a clustered index), the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The entire pointer is known as a Row ID.

If the table does have a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 2000 makes duplicate keys unique by adding an internally generated value. This value is not visible to users; it is used to make the key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.
Because nonclustered indexes store clustered index keys as their row locators, it is important to keep clustered index keys as small as possible. Do not choose large columns as the keys to clustered indexes if a table also has nonclustered indexes.|||I would also experiment with having col5, col2, and col3 as clustered index, and id as unique constraint against existing queries.|||The whole point is to save some space on server, so not in terms of performance perspective. By dropping the existing clustered index it can save 500megs atleast and by adding this composite clustered index it will addup more space.|||Originally posted by Satya
The whole point is to save some space on server, so not in terms of performance perspective. By dropping the existing clustered index it can save 500megs atleast and by adding this composite clustered index it will addup more space.

First, I said "experiment", second, - I was trying to combine the need for performance to be retained while trying to eliminate the need for reindexing on a weekly basis by structuring the clustered index in such a way that reindexing will not be needed. And I think this approach will work better than dropping the index while still starving for space maybe in a couple of weeks due to increase in data (my 2 cents)|||Originally posted by rdjabarov
First, I said "experiment", second, - I was trying to combine the need for performance to be retained while trying to eliminate the need for reindexing on a weekly basis by structuring the clustered index in such a way that reindexing will not be needed. And I think this approach will work better than dropping the index while still starving for space maybe in a couple of weeks due to increase in data (my 2 cents)
No worries mate, just hurl thru.

No comments:

Post a Comment