Friday, March 30, 2012

Index Tuning

Hi All!
I have this table:
CREATE TABLE [dbo].[Constructions] (
[ConstructionID] [int] IDENTITY (1, 1) NOT NULL , --PK
[TypeID] [int] NOT NULL ,
[TerritoryID] [int] NOT NULL,
[BeginDate] [datetime] NOT NULL ,
[EndDate] [datetime] NOT NULL ,
[IsCancelled] [bit] NOT NULL
) ON [PRIMARY]
With over 1.000.000 records and daily inserts of 10.000 records.
The fields that are updated by users from time to time are BeginDate,
EndDate and IsCancelled.
This is a table of an online database and the indexes could by rebuild once
a day or so.
Here is the question:
What index would be the appropriate for this type of query (highly used):
SELECT
ConstructionID, TypeID, TerritoryID, BeginDate, EndDate
FROM
Constructions
WHERE
TerritoryID = @.TerritoryID
AND
EndDate < GETDATE()
AND
IsCancelled = 0I guess
1) Create an index on any very selective column. If something is very
selective then that's the only index you need!
2) Create in index on any combination of selective criteria. You don't need
everything from the WHERE clause but try and supply the combination of
columns that will yield a selective set!
3) If nothing is selective - even when combined - then cover the query!
"Jorgebg" <Jorgebg@.discussions.microsoft.com> wrote in message
news:C72F8FBF-5E27-4271-A5DA-BD2DD8A3AD3B@.microsoft.com...
> Hi All!
> I have this table:
> CREATE TABLE [dbo].[Constructions] (
> [ConstructionID] [int] IDENTITY (1, 1) NOT NULL , --PK
> [TypeID] [int] NOT NULL ,
> [TerritoryID] [int] NOT NULL,
> [BeginDate] [datetime] NOT NULL ,
> [EndDate] [datetime] NOT NULL ,
> [IsCancelled] [bit] NOT NULL
> ) ON [PRIMARY]
> With over 1.000.000 records and daily inserts of 10.000 records.
> The fields that are updated by users from time to time are BeginDate,
> EndDate and IsCancelled.
> This is a table of an online database and the indexes could by rebuild
> once
> a day or so.
> Here is the question:
> What index would be the appropriate for this type of query (highly used):
> SELECT
> ConstructionID, TypeID, TerritoryID, BeginDate, EndDate
> FROM
> Constructions
> WHERE
> TerritoryID = @.TerritoryID
> AND
> EndDate < GETDATE()
> AND
> IsCancelled = 0|||It is pure guesswork without being able to analyze the actual data in
the table, but I suspect that an index on (TerritoryID, IsCancelled,
EndDate) would be a good place to start.
If I had access to the table I would be trying to get a feel for the
data by running queries along the lines of:
select IsCancelled, count(*) as rows
from Constructions
group by IsCancelled
select count(distinct TerritoryID) from Constructions
select TerritoryID, count(*) as rows,
sum(case when IsCancelled = 1 then 1 else 0 end) as Cancelled
from Constructions
group by TerritoryID
order by 2 desc
The idea behind queries like this is to start to understand the data.
This includes how selective each column is and how evenly distributed
the data is. A table with 100 values for TerritoryID is one thing,
but if 1 TerritoryID out of the hundred has 80% of all rows that is
something else. Likewise knowing if 1%, or 50%, or 99% of the rows
are cancelled makes rather a large difference.
Roy Harvey
Beacon Falls, CT
On Tue, 6 Jun 2006 02:21:01 -0700, Jorgebg
<Jorgebg@.discussions.microsoft.com> wrote:

>Hi All!
>I have this table:
>CREATE TABLE [dbo].[Constructions] (
> [ConstructionID] [int] IDENTITY (1, 1) NOT NULL , --PK
> [TypeID] [int] NOT NULL ,
> [TerritoryID] [int] NOT NULL,
> [BeginDate] [datetime] NOT NULL ,
> [EndDate] [datetime] NOT NULL ,
> [IsCancelled] [bit] NOT NULL
> ) ON [PRIMARY]
>With over 1.000.000 records and daily inserts of 10.000 records.
>The fields that are updated by users from time to time are BeginDate,
>EndDate and IsCancelled.
>This is a table of an online database and the indexes could by rebuild once
>a day or so.
>Here is the question:
>What index would be the appropriate for this type of query (highly used):
>SELECT
> ConstructionID, TypeID, TerritoryID, BeginDate, EndDate
>FROM
> Constructions
>WHERE
> TerritoryID = @.TerritoryID
> AND
> EndDate < GETDATE()
> AND
> IsCancelled = 0|||These are good suggestions. Alternatively, you could try using the Database
Tuning Advisor (or the Indexing Tuning Wizard in SQL Server 2000) to tune
your indexes for your whole workload.
SQL Server 2005 also includes the "Missing Indexes" feature which is
suitable for the task. Check the SQL Server 2005 Books Online for details on
the above options.
Regards,
Leo
"Roy Harvey" <roy_harvey@.snet.net> wrote in message
news:4joa825gpkmpqe6ojmm2ig00fs338873f9@.
4ax.com...
> It is pure guesswork without being able to analyze the actual data in
> the table, but I suspect that an index on (TerritoryID, IsCancelled,
> EndDate) would be a good place to start.
> If I had access to the table I would be trying to get a feel for the
> data by running queries along the lines of:
> select IsCancelled, count(*) as rows
> from Constructions
> group by IsCancelled
> select count(distinct TerritoryID) from Constructions
> select TerritoryID, count(*) as rows,
> sum(case when IsCancelled = 1 then 1 else 0 end) as Cancelled
> from Constructions
> group by TerritoryID
> order by 2 desc
> The idea behind queries like this is to start to understand the data.
> This includes how selective each column is and how evenly distributed
> the data is. A table with 100 values for TerritoryID is one thing,
> but if 1 TerritoryID out of the hundred has 80% of all rows that is
> something else. Likewise knowing if 1%, or 50%, or 99% of the rows
> are cancelled makes rather a large difference.
> Roy Harvey
> Beacon Falls, CT
>
> On Tue, 6 Jun 2006 02:21:01 -0700, Jorgebg
> <Jorgebg@.discussions.microsoft.com> wrote:
>|||On Tue, 6 Jun 2006 02:21:01 -0700, Jorgebg wrote:

>Hi All!
>I have this table:
>CREATE TABLE [dbo].[Constructions] (
> [ConstructionID] [int] IDENTITY (1, 1) NOT NULL , --PK
> [TypeID] [int] NOT NULL ,
> [TerritoryID] [int] NOT NULL,
> [BeginDate] [datetime] NOT NULL ,
> [EndDate] [datetime] NOT NULL ,
> [IsCancelled] [bit] NOT NULL
> ) ON [PRIMARY]
Hi Jorgebg,
The "--PK" comment suggests that the IDENTITY column is the primary key,
but you didn't declare it as such. SQL Server will not automatically
define a PRIMARY KEY constsaint for IDENTITY columns.
In addition to the identity surrogate key, your table should also have a
business key (and you should declare a UNIQUE constraint for it).
(snip)
>Here is the question:
>What index would be the appropriate for this type of query (highly used):
>SELECT
> ConstructionID, TypeID, TerritoryID, BeginDate, EndDate
>FROM
> Constructions
>WHERE
> TerritoryID = @.TerritoryID
> AND
> EndDate < GETDATE()
> AND
> IsCancelled = 0
Suggestion 1:
CREATE NONCLUSTERED INDEX ix1
ON Constructions (TerritoryID, IsCancelled, EndDate)
Suggestion 2 - assuming you don't have a CLUSUTERED index, or that you
can change it:
CREATE CLUSTERED INDEX ix2
ON Constructions (TerritoryID, IsCancelled, EndDate)
Suggestion 3.1 - assuming you have a CLUSTERED index on ConstructionID
and don't want to change it, AND assuming you're using SQL Server 2000:
CREATE NONCLUSTERED INDEX ix3_1
ON Constructions (TerritoryID, IsCancelled, EndDate, TypeID,
BeginDate)
Suggestion 3.2 assuming you have a CLUSTERED index on ConstructionID and
don't want to change it, AND assuming you're using SQL Server 2005
CREATE NONCLUSTERED INDEX ix3_1
ON Constructions (TerritoryID, IsCancelled, EndDate)
INCLUDE (TypeID, BeginDate)
Hugo Kornelis, SQL Server MVP|||I may be mistaken, but since isCancelled is a bit column, it is a poor
choice for an index. In fact, I don't think SQL Server will even allow it
(at least it does not on 2000). Otherwise, the advice you have gotten so
far should get you the performance that you need.
"Hugo Kornelis" <hugo@.perFact.REMOVETHIS.info.INVALID> wrote in message
news:ti9d82d667kunlbek0mrfu6midipucq7g7@.
4ax.com...
> On Tue, 6 Jun 2006 02:21:01 -0700, Jorgebg wrote:
>
> Hi Jorgebg,
> The "--PK" comment suggests that the IDENTITY column is the primary key,
> but you didn't declare it as such. SQL Server will not automatically
> define a PRIMARY KEY constsaint for IDENTITY columns.
> In addition to the identity surrogate key, your table should also have a
> business key (and you should declare a UNIQUE constraint for it).
> (snip)
> Suggestion 1:
> CREATE NONCLUSTERED INDEX ix1
> ON Constructions (TerritoryID, IsCancelled, EndDate)
> Suggestion 2 - assuming you don't have a CLUSUTERED index, or that you
> can change it:
> CREATE CLUSTERED INDEX ix2
> ON Constructions (TerritoryID, IsCancelled, EndDate)
> Suggestion 3.1 - assuming you have a CLUSTERED index on ConstructionID
> and don't want to change it, AND assuming you're using SQL Server 2000:
> CREATE NONCLUSTERED INDEX ix3_1
> ON Constructions (TerritoryID, IsCancelled, EndDate, TypeID,
> BeginDate)
> Suggestion 3.2 assuming you have a CLUSTERED index on ConstructionID and
> don't want to change it, AND assuming you're using SQL Server 2005
> CREATE NONCLUSTERED INDEX ix3_1
> ON Constructions (TerritoryID, IsCancelled, EndDate)
> INCLUDE (TypeID, BeginDate)
>
> --
> Hugo Kornelis, SQL Server MVP|||On Wed, 7 Jun 2006 10:44:48 -0400, Jim Underwood wrote:

>I may be mistaken, but since isCancelled is a bit column, it is a poor
>choice for an index. In fact, I don't think SQL Server will even allow it
>(at least it does not on 2000).
Hi Jim,
Sorry, but you are wrong. Here's a script to prove it. If you execute
just the two SELECT statements with the option to show execution plan
on, you'll see that the index is not only create but also used.
CREATE TABLE BitTest
(PKCol int NOT NULL IDENTITY PRIMARY KEY,
BitCol bit NOT NULL)
go
CREATE INDEX x_bit ON BitTest (BitCol)
go
INSERT INTO BitTest (BitCol) VALUES (0)
INSERT INTO BitTest (BitCol) VALUES (0)
INSERT INTO BitTest (BitCol) VALUES (1)
INSERT INTO BitTest (BitCol) VALUES (1)
INSERT INTO BitTest (BitCol) VALUES (0)
INSERT INTO BitTest (BitCol) VALUES (1)
INSERT INTO BitTest (BitCol) VALUES (0)
go
-- Searched query - uses index s on x_bit
SELECT * FROM BitTest WHERE BitCol = CAST(1 AS bit)
-- Query w/o WHERE - uses index scan on x_bit
SELECT * FROM BitTest
go
DROP TABLE BitTest
go
Here's the version of SQL Server I tested this on:
SELECT @.@.Version
go
Microsoft SQL Server 2000 - 8.00.2187 (Intel X86)
Mar 9 2006 11:38:51
Copyright (c) 1988-2003 Microsoft Corporation
Developer Edition on Windows NT 5.1 (Build 2600: Service Pack 2)
Hugo Kornelis, SQL Server MVP|||Hugo,
Thanks for that example. The results were identical on my installation.
I was misled by this quote from BOL.
bit
Integer data type 1, 0, or NULL.
Remarks
Columns of type bit cannot have indexes on them.
That will teach me to take BOL for granted without testing first.
That said, would it be preferable to put the bit col as the last column in
the index, after EndDate? Where EndDate will be more selective than
isCancelled? Unless, of course, isCancelled has very few values of 0.
"Hugo Kornelis" <hugo@.perFact.REMOVETHIS.info.INVALID> wrote in message
news:tqfe82p289pqt02752cshltfhaoroubl4n@.
4ax.com...
> On Wed, 7 Jun 2006 10:44:48 -0400, Jim Underwood wrote:
>
it
> Hi Jim,
> Sorry, but you are wrong. Here's a script to prove it. If you execute
> just the two SELECT statements with the option to show execution plan
> on, you'll see that the index is not only create but also used.
> CREATE TABLE BitTest
> (PKCol int NOT NULL IDENTITY PRIMARY KEY,
> BitCol bit NOT NULL)
> go
> CREATE INDEX x_bit ON BitTest (BitCol)
> go
> INSERT INTO BitTest (BitCol) VALUES (0)
> INSERT INTO BitTest (BitCol) VALUES (0)
> INSERT INTO BitTest (BitCol) VALUES (1)
> INSERT INTO BitTest (BitCol) VALUES (1)
> INSERT INTO BitTest (BitCol) VALUES (0)
> INSERT INTO BitTest (BitCol) VALUES (1)
> INSERT INTO BitTest (BitCol) VALUES (0)
> go
> -- Searched query - uses index s on x_bit
> SELECT * FROM BitTest WHERE BitCol = CAST(1 AS bit)
> -- Query w/o WHERE - uses index scan on x_bit
> SELECT * FROM BitTest
> go
> DROP TABLE BitTest
> go
> Here's the version of SQL Server I tested this on:
> SELECT @.@.Version
> go
> Microsoft SQL Server 2000 - 8.00.2187 (Intel X86)
> Mar 9 2006 11:38:51
> Copyright (c) 1988-2003 Microsoft Corporation
> Developer Edition on Windows NT 5.1 (Build 2600: Service Pack 2)
> --
> Hugo Kornelis, SQL Server MVP|||> That said, would it be preferable to put the bit col as the last column in
> the index, after EndDate? Where EndDate will be more selective than
> isCancelled?
Possibly. Remember that statistics is only kept for the first column in the
index. So if you have:
WHERE bitcol = 1
AND othercol = 2786
Then SQL Server doesn't know the selectivity for the "othercol = 2768" condi
tion, and because of
that the cost estimation can be off. Having othercol as the first column in
the index mean
statistics is available for that column and selectivity for othercol = 2786
can be determined.
However, if the index is defined over (othercol, bitcol) and you have anothe
r query:
WHERE bitcol = 1
Then SQL Server cannot use that index, as bitcol isn't the first column in t
hat index.
Tradeoffs...
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
"Jim Underwood" <james.underwoodATfallonclinic.com> wrote in message
news:ecQcasviGHA.4504@.TK2MSFTNGP03.phx.gbl...
> Hugo,
> Thanks for that example. The results were identical on my installation.
> I was misled by this quote from BOL.
> bit
> Integer data type 1, 0, or NULL.
> Remarks
> Columns of type bit cannot have indexes on them.
> That will teach me to take BOL for granted without testing first.
> That said, would it be preferable to put the bit col as the last column in
> the index, after EndDate? Where EndDate will be more selective than
> isCancelled? Unless, of course, isCancelled has very few values of 0.
>
> "Hugo Kornelis" <hugo@.perFact.REMOVETHIS.info.INVALID> wrote in message
> news:tqfe82p289pqt02752cshltfhaoroubl4n@.
4ax.com...
> it
>|||Actually, my thought process was a little different, let me try to explain.
I may be way off here, but this is how I thought multicolumn indexes
worked...
We query for (yes, I realize this is different from the OP) :
TerritoryID = @.TerritoryID
AND EndDate = @.EndDate
AND IsCancelled = 0
Lets say we have 1,000,000 rows of data, and we have an index on
(TerritoryID, IsCancelled, EndDate)
@.TerritoryID narrows the search to 100 records
IsCancelled = 0 cuts that in half to 50 records
@.EndDate narrows those 50 records to 5 records
Now, lets say we have an index on (TerritoryID, EndDate, IsCancelled)
@.TerritoryID narrows the search to 100 records
@.EndDate narrows those 100 records to 10 records
IsCancelled = 0 cuts that in half to 5 records
In the case above wouldn't it be more efficient to have the date as the
second column in the index? Essentially, put the most selective column
before the others in the index. Of course, this assumes that SQL Server's
algorithms use the physical order of the columns to filter data, which may
not be the case.
Upon reviewing the query in the OP, it occurs to me that EndDate <= GetDate
may actually be less selective than IsCancelled = 0, and that for that
specific query having the bit column precede the date may be more efficient
after all.
"Tibor Karaszi" <tibor_please.no.email_karaszi@.hotmail.nomail.com> wrote in
message news:OS58CwviGHA.1600@.TK2MSFTNGP04.phx.gbl...
in
> Possibly. Remember that statistics is only kept for the first column in
the index. So if you have:
> WHERE bitcol = 1
> AND othercol = 2786
> Then SQL Server doesn't know the selectivity for the "othercol = 2768"
condition, and because of
> that the cost estimation can be off. Having othercol as the first column
in the index mean
> statistics is available for that column and selectivity for othercol =
2786 can be determined.
> However, if the index is defined over (othercol, bitcol) and you have
another query:
> WHERE bitcol = 1
> Then SQL Server cannot use that index, as bitcol isn't the first column in
that index.
> Tradeoffs...
> --
> Tibor Karaszi, SQL Server MVP
> http://www.karaszi.com/sqlserver/default.asp
> http://www.solidqualitylearning.com/
>
> "Jim Underwood" <james.underwoodATfallonclinic.com> wrote in message
> news:ecQcasviGHA.4504@.TK2MSFTNGP03.phx.gbl...
in
allow
>

No comments:

Post a Comment