Sunday, February 19, 2012

index bloat?

Hi,

I have a table with about 305 million rows, and a composite primary key
that consists of an ascending int and an ascending varchar(18), which
is typically of length 13. Even if all the keys used the full 18
characters of the varchar, it seems to me each key should be 22 bytes,
so the index should be roughly 6.4GB. However, the size of the index as
shown in EM is about 24GB, and this is slowing everything down
considerably. Does anyone else think this index size is a little
excessive, or know why it should be so large?

Thanks,
Sethsql_server_2000_user (sethpurcell@.comcast.net) writes:
> I have a table with about 305 million rows, and a composite primary key
> that consists of an ascending int and an ascending varchar(18), which
> is typically of length 13. Even if all the keys used the full 18
> characters of the varchar, it seems to me each key should be 22 bytes,
> so the index should be roughly 6.4GB. However, the size of the index as
> shown in EM is about 24GB, and this is slowing everything down
> considerably. Does anyone else think this index size is a little
> excessive, or know why it should be so large?

Is that a clustered index or a non-clustered iodex?

A clustered index has the data pages in the leafs of the index node, so
size of index is basically size of data.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Just a guess, but is your primary key a clustered index? The clustered
index represents the actual contents of each row in your database, so it
would make sense that it would be larger than only the indexed fields.

If this is the case, you might want to change that index to not be
clustered and choose a more appropriate field to make a clustered index.

Thanks,
Tony

"sql_server_2000_user" <sethpurcell@.comcast.net> wrote in message
news:1123161597.120447.165590@.g49g2000cwa.googlegr oups.com:

> Hi,
> I have a table with about 305 million rows, and a composite primary key
> that consists of an ascending int and an ascending varchar(18), which
> is typically of length 13. Even if all the keys used the full 18
> characters of the varchar, it seems to me each key should be 22 bytes,
> so the index should be roughly 6.4GB. However, the size of the index as
> shown in EM is about 24GB, and this is slowing everything down
> considerably. Does anyone else think this index size is a little
> excessive, or know why it should be so large?
> Thanks,
> Seth|||Yes! It is intentionally a clustered index. But if the db includes the
row data pages when it reports the size of a clustered index, why is
the size of the index not just equal to the size of the table (105GB?)

Thanks,
Seth|||I think I would take a look at the FILL FACTOR of the index (the % of
the index that is created empty), and possibly do a DBCC SHOWCONTIG to
show the current fragmentation and DBCC REINDEX on the table during a
time of little or no activity if I thought that the tables contents
might be fragmented.

You can look up any of the all-caps terms in Books Online if you want
more info about the commands.

Good luck,
Tony Sebion

"sql_server_2000_user" <sethpurcell@.comcast.net> wrote in message
news:1123168931.364988.273310@.g47g2000cwa.googlegr oups.com:

> Yes! It is intentionally a clustered index. But if the db includes the
> row data pages when it reports the size of a clustered index, why is
> the size of the index not just equal to the size of the table (105GB?)
> Thanks,
> Seth|||thanks for the tips. the fill factor is zero on the index; i'll
definitely look for fragmentation. what i don't understand is, the
query was fast yesterday (4 secs), and today it's slow (10 mins), and
the only change i made to the table was that i added an index on an
unrelated field. (the new index is 12GB)|||Now that you mention it, you will want to make sure you REINDEX all the
other indexes for that table after you're done reindexing the clustered
one. I'd start there and see what things look like when that is
complete.

Tony

"sql_server_2000_user" <sethpurcell@.comcast.net> wrote in message
news:1123174187.544688.200820@.g44g2000cwa.googlegr oups.com:

> thanks for the tips. the fill factor is zero on the index; i'll
> definitely look for fragmentation. what i don't understand is, the
> query was fast yesterday (4 secs), and today it's slow (10 mins), and
> the only change i made to the table was that i added an index on an
> unrelated field. (the new index is 12GB)|||sql_server_2000_user (sethpurcell@.comcast.net) writes:
> Yes! It is intentionally a clustered index. But if the db includes the
> row data pages when it reports the size of a clustered index, why is
> the size of the index not just equal to the size of the table (105GB?)

From where did you get that? Enterprise Manager? EM is not known to report
sizes very well. I tried to have a look at it, but was not able to dig
out any sizes at all from EM. (Yeah, that's right, I don't use EM that
often.)

Anyway, your real problem appears to be:

> what i don't understand is, the query was fast yesterday (4 secs), and
> today it's slow (10 mins), and the only change i made to the table was
> that i added an index on an unrelated field. (the new index is 12GB)

It appears that some how this cause a shake-up for the query plan. When
you create a new index, statistics are updated with fullscan. Since the PK
is the clustered index, the PK columns are included in the non-clustered
index as well, as the index keys for the clustered index work as row
locator. I don't know if this causes statistics on the PK to be updated
as well. It could also be that SQL Server makes an incorrect estimate
and thinks that the new index is good for the query.

It would help if you posted the query, the CREATE TABLE statement for
the table, as well as the CREATE INDEX statements.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Wow, you guys are great. Yeah, I know EM returns wack object sizes, but
I thought they were kind of close; maybe not. Anyway, here's a test I
find interesting:

query 1:

select * from loan_history where time_period=196 and exloan_id in (
select top 3 exloan_id from loan where deal_no='ML4W1')

query 2:

select * from loan_history where time_period=196 and exloan_id in
('CTSMYX0010515778', 'CTSMYX0010525710', 'CTSMYX0010527475')

Query 1 takes a long time - i don't know how long because i cancel it
after a few minutes. Query 2 is <1 sec. the odd thing is, the subquery
of query 1 is also <1 sec! ?

The query that used to be fast and is now slow, requiring a huge amount
of reads, is simply:

select * from loan
inner join loan_history lh
on loan.exloan_id = lh.exloan_id
where loan.deal_no='ML4W1'
and time_period='196'

Here's the output of DBCC SHOWCONTIG on loan_history (305mm rows):

DBCC SHOWCONTIG scanning 'loan_history' table...
Table: 'loan_history' (949578421); index ID: 1, database ID: 7
TABLE level scan performed.
- Pages Scanned........................: 7497085
- Extents Scanned.......................: 941533
- Extent Switches.......................: 1489125
- Avg. Pages per Extent..................: 8.0
- Scan Density [Best Count:Actual Count]......: 62.93%
[937136:1489126]
- Logical Scan Fragmentation ..............: 93.40%
- Extent Scan Fragmentation ...............: 2.39%
- Avg. Bytes Free per Page................: 535.3
- Avg. Page Density (full)................: 93.39%
DBCC execution completed. If DBCC printed error messages, contact your
system administrator.

And for loan (9.8mm rows):

DBCC SHOWCONTIG scanning 'loan' table...
Table: 'loan' (1413580074); index ID: 1, database ID: 7
TABLE level scan performed.
- Pages Scanned........................: 462476
- Extents Scanned.......................: 58046
- Extent Switches.......................: 77279
- Avg. Pages per Extent..................: 8.0
- Scan Density [Best Count:Actual Count]......: 74.81% [57810:77280]
- Logical Scan Fragmentation ..............: 96.45%
- Extent Scan Fragmentation ...............: 0.54%
- Avg. Bytes Free per Page................: 329.5
- Avg. Page Density (full)................: 95.93%
DBCC execution completed. If DBCC printed error messages, contact your
system administrator.

Sorry for the long post, but here's the create table statement for
loan:

CREATE TABLE [loan] (
[SERIES_NO] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL ,
[POOL_ID] [char] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[DEAL_NO] [char] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[GROUP_NO] [char] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[LOAN_ID] [char] (6) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[EXLOAN_ID] [varchar] (18) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL ,
[LOAN_NO] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
,
[ADD_DATE] [int] NULL ,
[PROP_ZIP] [char] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[STATE] [char] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PROP_TYPE] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[UNITS_NO] [int] NULL ,
[OCCUPANCY] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ORIG_DATE] [int] NULL ,
[MATURITY] [int] NULL ,
[FIRST_PMT] [int] NULL ,
[ORIG_AMT] [money] NULL ,
[CLOSE_BAL] [money] NULL ,
[CLOSE_INT] [float] NULL ,
[SALE_PRICE] [money] NULL ,
[APP_VALUE] [money] NULL ,
[PROD_TYPE] [char] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[TERM] [int] NULL ,
[INIT_RATE] [float] NULL ,
[UNDER_RAT1] [float] NULL ,
[UNDER_RAT2] [float] NULL ,
[LOAN_TYPE] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PURPOSE] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PMT_FREQ] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[LOAN_SRC] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[BUYDOWN] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DOCUMENT] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PMI_CODE] [char] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[CONVERT] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[POOL_INS] [bit] NULL ,
[RECOURSE] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[LTV] [float] NULL ,
[SERV_FEE_R] [float] NULL ,
[NEGAM] [bit] NULL ,
[NEG_LIMIT] [float] NULL ,
[INDEX_ID] [char] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[MARGIN] [float] NULL ,
[P_RATE_CAP] [float] NULL ,
[P_RATE_FLO] [float] NULL ,
[P_PAY_CAP] [float] NULL ,
[P_PAY_FLO] [float] NULL ,
[L_RATE_CAP] [float] NULL ,
[L_RATE_FLO] [float] NULL ,
[RATE_RESET] [int] NULL ,
[PAY_RESET] [int] NULL ,
[FIRST_RATE] [int] NULL ,
[FIRST_PAY] [int] NULL ,
[AMORT_TERM] [int] NULL ,
[DOC_RAW] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[FICO] [int] NULL ,
[LIEN] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[GRADE_RAW] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[GRADE_MIC] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PP_PEN] [bit] NULL ,
[PP_TERM] [int] NULL ,
[F_RATE_CAP] [float] NULL ,
[PMI_LEVEL] [float] NULL ,
[PLEDGE_AMT] [money] NULL ,
[EFF_LTV] [float] NULL ,
[FIRST_LTV] [float] NULL ,
[SECOND_LTV] [float] NULL ,
[COMB_LTV] [float] NULL ,
[SERVICER] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ORIGINATOR] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[SILENT_SECOND] [bit] NOT NULL ,
CONSTRAINT [PK_loan] PRIMARY KEY CLUSTERED
(
[EXLOAN_ID]
) ON [PRIMARY]
) ON [PRIMARY]
GO

And for loan_history:

CREATE TABLE [loan_history] (
[TIME_PERIOD] [int] NOT NULL ,
[POOL_ID] [char] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[DEAL_NO] [char] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[GROUP_NO] [char] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[SERVICER] [char] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[LOAN_ID] [char] (6) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[EXLOAN_ID] [varchar] (18) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL ,
[LAST_INT_P] [int] NULL ,
[BALANCE] [money] NULL ,
[INT_RATE] [float] NULL ,
[TOTPMT_DUE] [money] NULL ,
[SCH_PRINC] [money] NULL ,
[SCH_MNTH_P] [money] NULL ,
[MBA_STAT] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[OTS_STAT] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DELIQ_HIST] [char] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[EXCEPTION] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[FC_START_D] [int] NULL ,
[FC_END_D] [int] NULL ,
[FC_END_TYP] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PAYOFF_D] [int] NULL ,
[PAYOFF_R] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[REO_DATE] [int] NULL ,
[INV_BAL] [money] NULL ,
[NEXT_INT_R] [float] NULL ,
[LOSS_AMT] [money] NULL ,
[INET_RATE] [float] NULL ,
[YYYYMM] AS ((floor(([TIME_PERIOD] / 12)) + 1989) * 100 +
[TIME_PERIOD] % 12 + 1) ,
CONSTRAINT [PK_loan_history] PRIMARY KEY CLUSTERED
(
[TIME_PERIOD],
[EXLOAN_ID]
) ON [PRIMARY]
) ON [PRIMARY]
GO

Thanks for all the help, I really appreciate it.
Seth|||sql_server_2000_user (sethpurcell@.comcast.net) writes:
> query 1:
> select * from loan_history where time_period=196 and exloan_id in (
> select top 3 exloan_id from loan where deal_no='ML4W1')
> query 2:
> select * from loan_history where time_period=196 and exloan_id in
> ('CTSMYX0010515778', 'CTSMYX0010525710', 'CTSMYX0010527475')
> Query 1 takes a long time - i don't know how long because i cancel it
> after a few minutes. Query 2 is <1 sec. the odd thing is, the subquery
> of query 1 is also <1 sec! ?

Well, it's not the case that SQL Server first runs the subquery and given
that result optimizes the outer query. It optimizes everything in one
go.

Then again, it's difficult to say why it would fail here, since it knows
that it will get (at most) three rows from loan, and thus only have to
read three rows from loan_history.

You forgot to include indexes for the tables, but I assume that here is
a non-clustered index on loan.deal_no?

What query plan do you have for the slow query here? Run the query
embedded in SET SHOWPLAN_TEXT ON /OFF. (In separate batches.)

> The query that used to be fast and is now slow, requiring a huge amount
> of reads, is simply:
> select * from loan
> inner join loan_history lh
> on loan.exloan_id = lh.exloan_id
> where loan.deal_no='ML4W1'
> and time_period='196'

For this query it would probably be better if the index on
loan.deal_no was clustered, possibly then it should be (deal_no, exloan_id).
But this could have consequences for other queries.

Again, it would be interesting to see the query plan.

By the way, using SELECT * in production code is not good practice.
Better to list all columns you reallyneed.

> Here's the output of DBCC SHOWCONTIG on loan_history (305mm rows):

Thanks. The tables appears to be in decent shape. Not perfect, but
certainly not alarming.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Yes, there's a non-clustered index on loan.deal_no.

I assume you said that having the table clustered by loan.deal_no would
be advantageous for this query because it would minimize the number of
page reads?

The tables don't look bad?? I thought the logical fragmentation of both
tables was horrendous, given that books online says 0-10% is tolerable
and they're at 93% and 96%.

Here's the plan for query1:

|--Nested Loops(Left Semi Join, OUTER
REFERENCES:([loan_history].[EXLOAN_ID]))
|--Compute
Scalar(DEFINE:([loan_history].[YYYYMM]=(floor([loan_history].[TIME_PERIOD]/12)+1989)*100+[loan_history].[TIME_PERIOD]%12+1))
| |--Clustered Index
Seek(OBJECT:([LoanPerformance].[dbo].[loan_history].[PK_loan_history]),
SEEK:([loan_history].[TIME_PERIOD]=196) ORDERED FORWARD)
|--Filter(WHERE:([loan_history].[EXLOAN_ID]=[loan].[EXLOAN_ID]))
|--Top(3)
|--Index
Seek(OBJECT:([LoanPerformance].[dbo].[loan].[IX_loan_deal_no]),
SEEK:([loan].[DEAL_NO]='ML4W1') ORDERED FORWARD)

Here's the plan for query2:

|--Compute
Scalar(DEFINE:([loan_history].[YYYYMM]=(floor([loan_history].[TIME_PERIOD]/12)+1989)*100+[loan_history].[TIME_PERIOD]%12+1))
|--Clustered Index
Seek(OBJECT:([LoanPerformance].[dbo].[loan_history].[PK_loan_history]),
SEEK:([loan_history].[TIME_PERIOD]=196 AND
[loan_history].[EXLOAN_ID]='CTSMYX0010515778' OR
[loan_history].[TIME_PERIOD]=196 AND [loan_history].[EXLOAN_ID]='C

(the line was truncated in the QA window)

Here are the indices on loan:

CREATE INDEX [IX_loan_deal_no] ON [dbo].[loan]([DEAL_NO]) ON
[PRIMARY]
CREATE INDEX [IX_loan_fico] ON [dbo].[loan]([FICO]) ON [PRIMARY]
CREATE INDEX [IX_loan_state] ON [dbo].[loan]([STATE]) ON [PRIMARY]
CREATE INDEX [IX_loan_orig_date] ON [dbo].[loan]([ORIG_DATE]) ON
[PRIMARY]
CREATE INDEX [IX_loan_grade_mic] ON [dbo].[loan]([GRADE_MIC]) ON
[PRIMARY]
CREATE INDEX [IX_loan_loan_no] ON [dbo].[loan]([LOAN_NO]) ON
[PRIMARY]
CREATE INDEX [IX_loan_series_no] ON [dbo].[loan]([SERIES_NO]) ON
[PRIMARY]

and on loan_history:

CREATE INDEX [IX_loan_history_deal_no] ON
[dbo].[loan_history]([DEAL_NO]) ON [PRIMARY]
CREATE INDEX [IX_loan_history_balance] ON
[dbo].[loan_history]([BALANCE]) ON [PRIMARY]

The last index was the one I added recently.

What I don't understand is, I had an SP that uses this simple join
query that ran in 4 secondss yesterday and today it takes 2 hours, and
I'm at a loss to understand what changed (besides the index I added)
that may have caused this. I can accept that it is supposed to take two
hours, but then why did it used to take four seconds?? When I hover the
cursor over a stage in the execution plan in QA, it gives me
ridiculously low CPU and I/O costs for operations - nothing like the 40
minutes of wall-clock time I'm seeing. The execution plan for the query
run by the SP are all index seeks, nested loop joins - no bookmark
lookups, I don't get it. And that little 'top 3' subquery, I just can't
fathom - I mean, it's only three rows, what on earth can it be doing
for 10 minutes??

Thanks again,
Seth|||Actually, I think know what it's doing - reading off disk. If I check
sysprocesses, the process shows lastwaittype=PAGEIOLATCH_SH and the
waitresource is just spinning through hundreds and hundreds of pages...
just for three rows.|||sql_server_2000_user (sethpurcell@.comcast.net) writes:
> I assume you said that having the table clustered by loan.deal_no would
> be advantageous for this query because it would minimize the number of
> page reads?

The idea is that SQL Server then could start with finding all rows
with that deal_no, and the look up in loan history. I am the assuming
that the number of rows per deal_no is moderate.

> The tables don't look bad?? I thought the logical fragmentation of both
> tables was horrendous, given that books online says 0-10% is tolerable
> and they're at 93% and 96%.

You are right. I mainly looked at Scan Density (which is decent) and
Extent Scan Fragmentation (which is excellent).

> Here's the plan for query1:

So what it does is to do a clustered index seek on loan_history - but
only on time_period. So in practice it scans all rows for time_period = 196
and for each row looks it up if exloan_id matches the subquery. Assuming
that there are a couple of million rows for one time_period, that is
going to take a long time!

Had the the clustered index on loan been on (deal_no, exloan_id), it
could at least have done a merge join. But that would be slow too.
I would rather expect some thing like first running the subquery, and
then to a nested loop join to loan_history and using both columns of
the clustered index.

> Here's the plan for query2:

But was the quick one. What about the join:

select * from loan
inner join loan_history lh
on loan.exloan_id = lh.exloan_id
where loan.deal_no='ML4W1'
and time_period='196'

I can't see that you included the plan for that one.

> (the line was truncated in the QA window)

Tools->Options->Results, here you can change "Max characters per column"
to rectify this.

> When I hover the cursor over a stage in the execution plan in QA, it
> gives me ridiculously low CPU and I/O costs for operations - nothing
> like the 40 minutes of wall-clock time I'm seeing.

Of course! I mean it's not picking a bad plan, because it's evil, but
because the estimates are wrong.

> The execution plan for the query run by the SP are all index seeks,
> nested loop joins - no bookmark lookups, I don't get it.

As we saw above, a nested loop join in the wrong place can be quite
bad. If you include the plan, I might be able to say more.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Here's the plan for the actual join:

StmtText
|--Nested Loops(Inner Join, OUTER REFERENCES:([lh].[EXLOAN_ID]))
|--Compute
Scalar(DEFINE:([lh].[YYYYMM]=(floor([lh].[TIME_PERIOD]/12)+1989)*100+[lh].[TIME_PERIOD]%12+1))
| |--Clustered Index
Seek(OBJECT:([LoanPerformance].[dbo].[loan_history].[PK_loan_history]
AS [lh]), SEEK:([lh].[TIME_PERIOD]=196) ORDERED FORWARD)
|--Clustered Index
Seek(OBJECT:([LoanPerformance].[dbo].[loan].[PK_loan]),
SEEK:([loan].[EXLOAN_ID]=[lh].[EXLOAN_ID]),
WHERE:([loan].[DEAL_NO]='ML4W1') ORDERED FORWARD)

I got the CPU and I/O numbers from the execution plan that came up
after running the query, not from the estimated execution plan you can
get without running the query, so I thought they were the actual
measurements, not estimates.

So you're saying the problem is the nested loop join is in the wrong
direction? I guess what I expect the optimizer to do is: get the
exloan_id from the loan table for the few thousand records with the
specified deal_no, and then do an index seek in the primary key of the
loan_history table (since I specify the time period and the subquery
provides the other part of the composite key), thus doing just a few
thousand index seeks on a clustered index to get the row data. If this
is right, then why doesn't the optimizer do this? Can I get it to do it
this way? Do I have to give hints to the query optimizer, or do I need
to do something to the tables? Should I rebuild them with a >0 fill
factor to defrag anyway? And is it somehow possible that the optimizer
was generating the correct execution plan, and that by adding an index,
it broke? Because I swear, the queries were taking like 5 seconds
before.

Thanks a lot,
Seth|||sql_server_2000_user (sethpurcell@.comcast.net) writes:
> I got the CPU and I/O numbers from the execution plan that came up
> after running the query, not from the estimated execution plan you can
> get without running the query, so I thought they were the actual
> measurements, not estimates.

No, they are still estimates.

> So you're saying the problem is the nested loop join is in the wrong
> direction?

Yes, it's the very same problem with this query: it starts off in
loah_history and scanning everything with time_period = 196.

> I guess what I expect the optimizer to do is: get the exloan_id from the
> loan table for the few thousand records with the specified deal_no, and
> then do an index seek in the primary key of the loan_history table
> (since I specify the time period and the subquery provides the other
> part of the composite key), thus doing just a few thousand index seeks
> on a clustered index to get the row data.

Yeah, that sounds like a bad plan. Maybe you can get a job as query
optimizer. :-)

> If this is right, then why doesn't the optimizer do this?

Good question. More good questions?

> Can I get it to do it this way? Do I have to give hints to the query
> optimizer,

Yes, an "OPTION (FORCE ORDER)" at the end of the query should do the
trick, but don't go there yet.

> or do I need to do something to the tables? Should I rebuild them with a
> >0 fill factor to defrag anyway? And is it somehow possible that the
> optimizer was generating the correct execution plan, and that by adding
> an index, it broke? Because I swear, the queries were taking like 5
> seconds before.

Well, the one thing I can think of is that by adding an index, SQL
Server updated the statistics for the PK, and this new information
left the optimizer in a maze. But given the distribution, I don't
really see how it could into this mess.

Let's try this: run "UPDATE STATISTICS loan WITH FULLSCAN" and try the
query. (It could be that with new statistics for loan_history, but
antiquated statistics for loan, you get the bad plan.) If the plan is
equally rotten try "UPDATE STATISTICS loan_history WITH FULLSCAN".

If the plan is still bad, at this point reindexing only helps to make
run a little less slow. That is, reindexing as such does not affect
the query plan. But since it will include a complete update of
statistics, it could have an effect of query plan directly.

Anyway, if the plan is still bad at this point, try the query hint.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||I updated statistics on the loan table and it had no effect.

Was that a typo and you meant to say it sounds like a good plan?
Because I forced this plan with the following query:

select * from (
select * from loan
where loan.deal_no='ML4W1') as x
inner join loan_history as lh
on x.exloan_id=lh.exloan_id
where time_period=196
option (force order, loop join)

and it is so fast it makes me cry - 1 second if cached, 5 seconds if
not cached and a large deal. Here's the plan:

StmtText
|--Nested Loops(Inner Join, OUTER REFERENCES:([loan].[EXLOAN_ID])
WITH PREFETCH)
|--Bookmark Lookup(BOOKMARK:([Bmk1000]),
OBJECT:([LoanPerformance].[dbo].[loan]))
| |--Index
Seek(OBJECT:([LoanPerformance].[dbo].[loan].[IX_loan_deal_no]),
SEEK:([loan].[DEAL_NO]='ML4W1') ORDERED FORWARD)
|--Compute
Scalar(DEFINE:([lh].[YYYYMM]=(floor([lh].[TIME_PERIOD]/12)+1989)*100+[lh].[TIME_PERIOD]%12+1))
|--Clustered Index
Seek(OBJECT:([LoanPerformance].[dbo].[loan_history].[PK_loan_history]
AS [lh]), SEEK:([lh].[TIME_PERIOD]=196 AND
[lh].[EXLOAN_ID]=[loan].[EXLOAN_ID]) ORDERED FORWARD)

I tried this query:

select * from loan
inner join loan_history lh
on loan.exloan_id = lh.exloan_id
where loan.deal_no='ML4W1'
and time_period='196'
option (force order)

and it takes a couple minutes if not cached, 5 seconds if cached -
definitely better than a couple hours, but nowhere near my
corrected-nested-loop. Here's the plan:

StmtText
|--Merge Join(Inner Join,
MERGE:([loan].[EXLOAN_ID])=([lh].[EXLOAN_ID]),
RESIDUAL:([lh].[EXLOAN_ID]=[loan].[EXLOAN_ID]))
|--Bookmark Lookup(BOOKMARK:([Bmk1000]),
OBJECT:([LoanPerformance].[dbo].[loan]))
| |--Index
Seek(OBJECT:([LoanPerformance].[dbo].[loan].[IX_loan_deal_no]),
SEEK:([loan].[DEAL_NO]='ML4W1') ORDERED FORWARD)
|--Compute
Scalar(DEFINE:([lh].[YYYYMM]=(floor([lh].[TIME_PERIOD]/12)+1989)*100+[lh].[TIME_PERIOD]%12+1))
|--Clustered Index
Seek(OBJECT:([LoanPerformance].[dbo].[loan_history].[PK_loan_history]
AS [lh]), SEEK:([lh].[TIME_PERIOD]=196) ORDERED FORWARD)

And now for the obvious/hard question: now that I know what the right
execution plan is, how can I get the query optimizer to generate it -
*without the hints*? Because some users of the DB are using report
designing software that will just generate the join query with no
hints, and I don't want them sitting there for hours when they don't
have to. Does the optimizer not choose the merge join because it
requires a bookmark lookup? Does it not choose my nested loop join
because the statistics are wrong somewhere?

Thanks again,
Seth|||sql_server_2000_user (sethpurcell@.comcast.net) writes:
> I updated statistics on the loan table and it had no effect.

And you used FULLSCAN?

Did you also update statistics (WITH FULLSCAN) on loan_history?

> Was that a typo and you meant to say it sounds like a good plan?

Yes. Don't believe everything I say. :-)

> Because I forced this plan with the following query:
> select * from (
> select * from loan
> where loan.deal_no='ML4W1') as x
> inner join loan_history as lh
> on x.exloan_id=lh.exloan_id
> where time_period=196
> option (force order, loop join)
> and it is so fast it makes me cry - 1 second if cached, 5 seconds if
> not cached and a large deal. Here's the plan:

And that is indeed the plan we are looking for!

> and it takes a couple minutes if not cached, 5 seconds if cached -
> definitely better than a couple hours, but nowhere near my
> corrected-nested-loop. Here's the plan:

Certainly better, but still scanning the entire time_period 196.

> And now for the obvious/hard question: now that I know what the right
> execution plan is, how can I get the query optimizer to generate it -
> *without the hints*? Because some users of the DB are using report
> designing software that will just generate the join query with no
> hints, and I don't want them sitting there for hours when they don't
> have to. Does the optimizer not choose the merge join because it
> requires a bookmark lookup? Does it not choose my nested loop join
> because the statistics are wrong somewhere?

I have no idea. I have posted a question to our internal MVP forum
to get some suggestions. So, OK, I have a vague guess: time_period
is a bit too unselective to be the first column in the index. The
statistics for the index, has a distribution histogramme for this
column only.

Thus, here is an idea: Make the PK on loan_history unclustered,
and then add a clustered index on (exloan_id, time_period), that is
the reverse key. But this is a very costly operation to do on a
305 million row table, and there are other queries that could take
a tool. (Hm, if you add a non-clustered index on the reverse only?
Could that help?)

There is also the idea that I've mentioned before, make the
index on loan.deal_no the clustered on that table. That would at
least be faster to implement. But that is also likely to have
ramifications on other queries.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||
sql_server_2000_user wrote:
> I updated statistics on the loan table and it had no effect.
>
> And now for the obvious/hard question: now that I know what the right
> execution plan is, how can I get the query optimizer to generate it -
> *without the hints*? Because some users of the DB are using report
> designing software that will just generate the join query with no
> hints, and I don't want them sitting there for hours when they don't
> have to. Does the optimizer not choose the merge join because it
> requires a bookmark lookup? Does it not choose my nested loop join
> because the statistics are wrong somewhere?

This can be hard, but one place to start that I don't
see done in this thread is to find the bad estimate, if
there is one causing the problem. Usually this will be
a bad rowcount estimate (rowcount estimates cannot
always be improved with better statistics, so this can
happen even after FULLSCAN) - sometimes it is a row size estimate.

First, though, just one thought - You noted that the index is
very fragmented, though the density is good. This could be
a problem costing the seek on time period 196. The optimizer might
expect all the qualifying rows to be physically contiguous, but
they might be scattered all over the place. If the optimizer
expects 40-50 rows per page, and perhaps 1,000,000 qualifying
rows, the I/O for the scan will be ~20,000 pages, or ~150MB.
If instead, the index is so fragmented that to follow the
logical order requires 10 times as many pages, you could move
from an "all in memory" query to a disk-grinder.

I'm not sure whether the optimizer considers fragmentation at
all in planning - I assume not, but could be wrong.

Can you find out how many pages contain rows with time period
196? You could clear the buffers and run a simple select like
select <some columns only in the clustered index>
from loan_history
where time_period = 196

It would be good here also to see if the estimated and actual
row count for that value are close in this simple query.

Then look at estimates in the actual slow query and see if any
are bad. If you run one of the too-slow queries (the slowest
you can stand to run) after typing CTRL-K for "show execution plan",
you will see the estimated execution plan with a couple of
extra actual values in addition to most of the estimates you can
get with CTRL-L. In particular, you will see the actual number
of rows (Row Count) and number of executions (Number of Executes)
of each operator.

Compare (Estimated Rowcount)*(Estimated number of executes) with
(Row Count) for the key operators. You will have to get the
estimated number of executes from CTRL-L, because it is not repeated
in the actual plan. Note that the actual rowcount already accounts
for the number of executes, so don't multiply it in.

Also for the key operators, look at Estimated Row Size.
It should be the average row size of an index row, and if it is
very far off, it can cause the wrong plan to be chosen. I
doubt this is your problem, but it can't hurt to look.0

If you see bad estimates, let us know.

If this is the problem, and it can't be fixed with
statistics, you might consider trying to change the estimate
with extra predicates that don't affect the query result.
Changing the estimate to a smaller number can be done by
repeating or adding predicates that have no real effect but
which the optimizer expects will restrict the number of rows.
Making estimates larger in the hope of leading the optimizer
away from an index that is not helping might be possible
by adding superfluous OR conditions, or with a non-SARGable
predicate that cannot use statistics and will use a generic
estimate.

Perhaps some of this will help, in addition to what Erland's
good advice is doing.

Steve Kass
Drew University

> Thanks again,
> Seth|||Yes, I updated statistics with a full scan, and no, I didn't update the
loan_history statistics - this would be a really large job, and they
haven't changed since before the behavior changed, so I didn't think
that could be a problem.

As an experiment, I built a table identical to loan_history but with
the primary key reversed (still clustered). I populated the table with
10% of the original loan_history table, and here is the simple join and
its execution plan:

from loan
inner join loan_history lh
on loan.exloan_id=lh.exloan_id
where loan.deal_no='EQ604'
and time_period=81

StmtText
|--Merge Join(Inner Join,
MERGE:([loan].[EXLOAN_ID])=([lh].[EXLOAN_ID]),
RESIDUAL:([lh].[EXLOAN_ID]=[loan].[EXLOAN_ID]))
|--Bookmark Lookup(BOOKMARK:([Bmk1000]),
OBJECT:([LoanPerformance].[dbo].[loan]))
| |--Index
Seek(OBJECT:([LoanPerformance].[dbo].[loan].[IX_loan_deal_no]),
SEEK:([loan].[DEAL_NO]='EQ604') ORDERED FORWARD)
|--Compute
Scalar(DEFINE:([lh].[YYYYMM]=(floor([lh].[TIME_PERIOD]/12)+1989)*100+[lh].[TIME_PERIOD]%12+1))
|--Clustered Index
Seek(OBJECT:([LoanPerformance].[dbo].[loan_history].[PK_loan_history]
AS [lh]), SEEK:([lh].[TIME_PERIOD]=81) ORDERED FORWARD)

As you can see, it's identical to the plan generated on the original
table when I use the option (force order) hint, which is good but still
not as good as the nested loop join I came up with. Why can't it find
the nested loop plan? What do I know about the table that the optimizer
doesn't? Could this be due to the different scales of the tables?
Perhaps somewhere between 10% and 100% it becomes more efficient to use
my nested loop join as opposed to this merge join, and if the tables
were the same size, I would see this?

I am working on the rowcount/rowsize comparisons.

Thanks for all the great help,
Seth|||Wow, this is really interesting:

I made a spreadsheet with one row per time period, showing the join
method, row estimates and actual row counts, execution time, etc., and
discovered some interesting things. A key aspect of this query is how
the number of rows in loan_history increases as the time_period
increases, and how this causes the execution plan to change. This
should give you some idea of how the number of records increases:

time_period record count
40 596
50 6178
60 12998
70 37802
80 65614
90 108234
100 227347
110 410063
120 801426
130 1404053
140 1957924
150 2487100
160 3473813
170 4435225
180 5622040
190 7571203

Pretty much exponential. So now that you know that, here's what QA does
with this query:

select *
from loan
inner join loan_history lh
on loan.exloan_id=lh.exloan_id
where loan.deal_no=@.deal_no
and time_period=@.time_period

time_period join method
39-40 loop; loan_history on top
41-42 merge; loan on top
43-44 merge; loan_history on top
45-95 merge; loan on top
96-194 loop; loan on top
195-197 loop; loan_history on top

It all makes a lot of sense and is very very fast *until* you hit time
period 195, when it all goes to hell. Here are the row count estimates
and actual values for various time periods in the above ranges (there
are 596 rows in loan for the deal I used in my test queries):

time_period top est. bottom est. top actual bottom actual
40 647 1 596 596
42 1089 1450 596 597
44 2785 1089 596 597
95 1089 145187 596 64429
194 1089 2265 2265 1
197 1 1 6,570,403 2265

I'm sure you see the problem: the estimates for time_period 197 are
nuts.

Some of the other comparisons between estimates and actual values look
a little funny to me, but there's one thing I noticed that left me
completely perplexed: I thought the optimizer was basing its plan
purely on the number of rows, but time period 197 currently has *fewer
rows* than time period 194 (6,570,403 vs 8,260,954), yet time period
197 uses the same join method as 195 & 196, which have 8,261,776 and
8,273,743 rows, respectively. So how is it picking the broken plan, and
why? Is there any way I can tell if the stats are screwed up besides
rebuilding them and seeing if the problem goes away? It seems like
stats were built that essentially say time periods 195, 196 and 197
have almost no rows, since the plan is the same as that used in the
earliest two time periods - and the rows for these time periods were
added after the initial load of the table, when I was doing things to
the table that almost certainly caused stats to be updated. I think I'm
going to look into stats for loan_history, and see if I can get these
plans back on track. But how can I avoid this in the future? Do I have
to update stats every time I add a new time period? Is this normal?

Thanks everybody for all your help!
Seth|||Ok, well, I'm a dunce: dbcc show_statistics (loan_history,
pk_loan_history) shows lovely stats for all values of time_period up to
194, then nothing. I checked the stats a while ago, when I was under
the impression that the optimizer was generating the wrong plan for all
time periods, and I didn't notice this then.

UpdatedRowsRows SampledStepsDensityAverage key length
Jul 1 2005 10:40AM 2810359682810359761560.022.0

I find it strange that the number of rows sampled is greater than the
number of rows in the table.

But why hasn't it updated stats? The index is clustered, so I can't
even set statistics norecompute.

Thanks,
Seth|||Ok, I found what I was looking for: autostats is triggered after 20% of
the rows in a table are updated, and this is why there are simply no
stats for the last few time periods. About 56 million updates need to
happen before autostats runs, and the rowcount is currently at 24
million, so time to update statistics manually.

Thanks again,
Seth|||sql_server_2000_user (sethpurcell@.comcast.net) writes:
> As an experiment, I built a table identical to loan_history but with
> the primary key reversed (still clustered). I populated the table with
> 10% of the original loan_history table, and here is the simple join and
> its execution plan:

Judging from that plan, you somehow failed to reverse the key. I mean:

> Seek(OBJECT:([LoanPerformance].[dbo].[loan_history].[PK_loan_history]
> AS [lh]), SEEK:([lh].[TIME_PERIOD]=81) ORDERED FORWARD)

would not be possible with a reverse key.

Anyway, it seems that you have performed one hell of a job to track
this down yourself, and found the answer. I must say that I'm full of
admiration for your feat. It's very unusual to see someone put so
much work into his problem - and also report back to the newsgroup.
Big thanks for doing this!

> It seems like stats were built that essentially say time periods 195,
> 196 and 197 have almost no rows, since the plan is the same as that used
> in the earliest two time periods - and the rows for these time periods
> were added after the initial load of the table, when I was doing things
> to the table that almost certainly caused stats to be updated. I think
> I'm going to look into stats for loan_history, and see if I can get
> these plans back on track. But how can I avoid this in the future? Do I
> have to update stats every time I add a new time period? Is this
> normal?

Yes, as you have found this is kind of normal.

There is a problem with contiguously growing keys. In our system
a colleague of mine has set up a job that reindexes tables. But to
avoid that this job takes too long time to run, he uses SHOWCONTIG,
and if fragmentation is moderate, he skips the table.

Some time ago, one of our customers reported that one certain function
was slow, and I tracked it down to stored procedure. The customer
had just gotten 8.10 of our system, and the procedure was indeed changed
in that version. But, the strange thing was that the query that was
slow had not changed. And as it was a plain join of four tables, I
didn't really feel like tweaking it.

I found with the stats_date() function that statistics for one of tables
was a tad old, and an UPDATE STATISTICS WITH FULLSCAN on this table and
another resovled the issue. These tables both have monotonically growing
(or almost monotonically) primary keys that are clustered. The good
thing with that is that you don't get fragmentation. The bad thing is
that statistics may not be updated, if you skip tables when you run
you maintenance job. So my colleauge is now finding a strategy for how
run UPDATE STATISTICS on the tables he does not reindex.

But why had this happened with the new version? Well, since the procedure
had changed, the query plan was flushed from the cache, new balls - and,
oops bad plan.

And now I know why your new index opened Pandora's box: that caused
the query plans for the query to be flushed. So until you created the
index, you had the old statistics - and an old query to go with it.

Thus, it seems that if you just don't do anything, and don't rock the
boat nothing will happen. But since the most trivial thing to flush a
plan from the cache is a server reboot, it's a fragile strategy.

So you should schedule UPDATE STATISTICS with some frequency on the
table. It does not have to be WITH FULLSCAN, but you may need a
higher sample percent than the default. Since you know the presumptions,
you should be able to monitor this, and find out what works and what
does not.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||I'm glad to hear you worked this out. Thanks
very much for posting so much detail, and the
resolution. That's nice for those of us reading
now and will also help anyone with a similar
problem who finds this thread in the future.

SK

sql_server_2000_user wrote:
> Ok, I found what I was looking for: autostats is triggered after 20% of
> the rows in a table are updated, and this is why there are simply no
> stats for the last few time periods. About 56 million updates need to
> happen before autostats runs, and the rowcount is currently at 24
> million, so time to update statistics manually.
> Thanks again,
> Seth|||Looks like you caught another one of my mistakes - when I cut and
pasted my query I forgot to change the table name to loan_history2. I
investigated this (with the correct query this time) and the join
method the optimizer picks is the nested loops with two index seeks,
independent of the chosen time period. This makes sense, because the
number of rows for any given value of the first column of the index
(now exloan_id) is now basically constant, not varying by four orders
of magnitude like the time-period based index is.

Ah, the query plan cache - this is another thing I poked around in
early on and then decided it wasn't the problem, hahaha. Thanks for
explaining this to me, it makes perfect sense now. It would have been
awful to have had this happen for the first time after some maintenance
reboot of the server - I'm glad it happened beforehand and I'm on top
of it now. I will consider where to go from here, as far as reindexing
to defragment, setting up a plan to update statistics after loading new
data, etc.

The statistics are updating as I write this - and have been for the
past few hours.

Thanks for everything,
Seth|||sql_server_2000_user (sethpurcell@.comcast.net) writes:
> Ah, the query plan cache - this is another thing I poked around in
> early on and then decided it wasn't the problem, hahaha.

I also feel kind of stupid that I did not think of this earlier. That
would have led us faster to the outdated statistics.

Oh well, while it's a whole lot of work, exercises like this one
are good lessons for the future. (What I didn't say in my war story
was that a second customer ran into the same problem when they later
got 8.10. That time I could just tell our helpdesk folks "try this".

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

No comments:

Post a Comment