Sunday, February 19, 2012

Index / Join / Where clause very slow

Hello,

first of all, some facts of the case:


Table Master Table Dimension
ID Code Price ID Name
1 A44333 5000 1 "Scanner"
2 D442 3000 2 "Notebook"
3 D6644 4000 3 "Banana"

I join both tables on ID and search one time for ID and another time for Name. Looks like

(a)
SELECT AVG(Price) From Master JOIN Dimension ON Master.id = Dimension.id
WHERE master.id=1
AND Code like 'A44'
(b)
SELECT AVG(Price) From Master JOIN Dimension ON Master.id = Dimension.id
WHERE Name = 'Scanner'
AND Code like 'A44'

Why does query (b) take longer than query (a)? Dimension has 12 Rows and
Master has about 24M Rows.

For index I did
Create Index IX_Master_ID on Master(ID)
Create Index IX_Master_Code on Master(Code)
Create Index IX_Dimension_ID on Dimension(ID)
Create Index IX_Dimension_Name on Dimension(Name)

I noticed, that when i leave the Code like 'A44' clause, query (a) and (b) do take same time. I'm really confused. Can someone please help me out?

Thank you

Silaslike 'A44' should really be = 'A44'.

Have you looked at the query plans or run it with set statistics io on?

The efficiency will improve also if the optimiser knows which indexes are unique. I suspect (but don't know) that the below two are unique:
Create Index IX_Dimension_ID on Dimension(ID)
Create Index IX_Dimension_Name on Dimension(Name)
I also suspect a composite index on master would be unique:

Create Index IX_Master_ID_Code on Master(ID, Code)|||like 'A44' should really be = 'A44'.
like 'A44%' :-)

Both ID's are not unique. Of course there is a unique field but the ID in this example is just kind of a foreign key. Moreover, there are a lot of Dimensions and much much more data fields. I think it was 14 foreign keys and about 40 data fields. But I wanted to keep things simple, so I did not mention.

EDIT: Oh, my fault. In Dimension table, ID and NAME are unique!

I did take a look at the query plan, but I can't really make sense of this.
I see, that both plans are not equal, when I keep the like clause.

But i found out something very interesting. If I create a composite index , as you suggested, and delete IX_Master_ID and IX_Master_Code, so that the server only uses the composite index, then both queries have the same execution time.

Finally, this is a big problem. To grant best execution times, I will have to create a composite index, that includes nearly all columns that can be affected by a user query. Is this usual?|||Here's my 2p;
Should this not really be a LEFT JOIN?

In query B you need it to read Dimension.Name (could be an ambiguous column name - certainly sounds it!)
EDIT: 'name' is also a reserved word - you should avoid using this!

As mentioned before - your LIKE clause is wrong.
Using a LIKE comparison means that your index on the column are ignored.|||What is the Count(*) value associated with the averages in your query? Your sample data doesn't help me get a handle on the query scale.

It would help me a bunch if you could execute:SET SHOWPLAN_TEXT ON
GO
meta_your_SQL_goes_here
GO
SET SHOWPLAN_TEXT OFF
GO...and post the results so we could see just what your server is doing.

My first guess is bad statistics, but that's only a guess at this point.

Oh, and by the way George:Using a LIKE comparison means that your index on the column are ignored. is not always true. If the LIKE is unambiguous on the left (meaning there are no leading wildcards), then a LIKE can ride an index.

-PatP|||Now I'm totally confused. I did another index on Price and what happens, query (b) now is 3 times faster that (a) :confused:

Indexes are
PK_objects_year_quarter_obid
IX_objects_price
IX_objects_eid
IX_objects_zip

IX_dim_estate_estate
PK_dim_estate_eid

PLAN A

|--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [globalagg1008]=(0) THEN NULL ELSE [globalagg1010]/CONVERT_IMPLICIT(decimal(19,0),[globalagg1008],0) END))
|--Stream Aggregate(DEFINE:([globalagg1008]=SUM([partialagg1007]), [globalagg1010]=SUM([partialagg1009])))
|--Nested Loops(Inner Join)
|--Clustered Index Seek(OBJECT:([testdb].[dbo].[dim_estate].[PK_dim_Estate_eid]), SEEK:([testdb].[dbo].[dim_estate].[EID]=(3.)) ORDERED FORWARD)
|--Stream Aggregate(DEFINE:([partialagg1007]=COUNT_BIG([testdb].[dbo].[Objects].[PRICE]), [partialagg1009]=SUM([testdb].[dbo].[Objects].[PRICE])))
|--Nested Loops(Inner Join, OUTER REFERENCES:([testdb].[dbo].[Objects].[YEAR], [testdb].[dbo].[Objects].[QUARTER], [testdb].[dbo].[Objects].[OBID], [Expr1012]) OPTIMIZED WITH UNORDERED PREFETCH)
|--Merge Join(Inner Join, MERGE:([testdb].[dbo].[Objects].[YEAR], [testdb].[dbo].[Objects].[QUARTER], [testdb].[dbo].[Objects].[OBID])=([testdb].[dbo].[Objects].[YEAR], [testdb].[dbo].[Objects].[QUARTER], [testdb].[dbo].[Objects].[OBID]), RESIDUAL:([testdb].[dbo].[Objects].[YEAR] = [testdb].[dbo].[Objects].[YEAR] AND [testdb].[dbo].[Objects].[QUARTER] = [testdb].[dbo].[Objects].[QUARTER] AND [testdb].[dbo].[Objects].[OBID] = [testdb].[dbo].[Objects].[OBID]))
| |--Sort(ORDER BY:([testdb].[dbo].[Objects].[YEAR] ASC, [testdb].[dbo].[Objects].[QUARTER] ASC, [testdb].[dbo].[Objects].[OBID] ASC))
| | |--Index Seek(OBJECT:([testdb].[dbo].[Objects].[IX_objects_zip]), SEEK:([testdb].[dbo].[Objects].[ZIP] >= N'44' AND [testdb].[dbo].[Objects].[ZIP] < N'45'), WHERE:([testdb].[dbo].[Objects].[ZIP] like N'44%') ORDERED FORWARD)
| |--Index Seek(OBJECT:([testdb].[dbo].[Objects].[IX_objects_eid]), SEEK:([testdb].[dbo].[Objects].[EID]=(3.)) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([testdb].[dbo].[Objects].[PK_Objects_year_quarter_obid]), SEEK:([testdb].[dbo].[Objects].[YEAR]=[testdb].[dbo].[Objects].[YEAR] AND [testdb].[dbo].[Objects].[QUARTER]=[testdb].[dbo].[Objects].[QUARTER] AND [testdb].[dbo].[Objects].[OBID]=[testdb].[dbo].[Objects].[OBID]) LOOKUP ORDERED FORWARD)

PLAN B

|--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [globalagg1008]=(0) THEN NULL ELSE [globalagg1010]/CONVERT_IMPLICIT(decimal(19,0),[globalagg1008],0) END))
|--Stream Aggregate(DEFINE:([globalagg1008]=SUM([partialagg1007]), [globalagg1010]=SUM([partialagg1009])))
|--Nested Loops(Inner Join, OUTER REFERENCES:([testdb].[dbo].[Objects].[EID]))
|--Hash Match(Aggregate, HASH:([testdb].[dbo].[Objects].[EID]), RESIDUAL:([testdb].[dbo].[Objects].[EID] = [testdb].[dbo].[Objects].[EID]) DEFINE:([partialagg1007]=COUNT_BIG([testdb].[dbo].[Objects].[PRICE]), [partialagg1009]=SUM([testdb].[dbo].[Objects].[PRICE])))
| |--Hash Match(Inner Join, HASH:([testdb].[dbo].[Objects].[YEAR], [testdb].[dbo].[Objects].[QUARTER], [testdb].[dbo].[Objects].[OBID])=([testdb].[dbo].[Objects].[YEAR], [testdb].[dbo].[Objects].[QUARTER], [testdb].[dbo].[Objects].[OBID]), RESIDUAL:([testdb].[dbo].[Objects].[YEAR] = [testdb].[dbo].[Objects].[YEAR] AND [testdb].[dbo].[Objects].[QUARTER] = [testdb].[dbo].[Objects].[QUARTER] AND [testdb].[dbo].[Objects].[OBID] = [testdb].[dbo].[Objects].[OBID]))
| |--Hash Match(Inner Join, HASH:([testdb].[dbo].[Objects].[YEAR], [testdb].[dbo].[Objects].[QUARTER], [testdb].[dbo].[Objects].[OBID])=([testdb].[dbo].[Objects].[YEAR], [testdb].[dbo].[Objects].[QUARTER], [testdb].[dbo].[Objects].[OBID]), RESIDUAL:([testdb].[dbo].[Objects].[YEAR] = [testdb].[dbo].[Objects].[YEAR] AND [testdb].[dbo].[Objects].[QUARTER] = [testdb].[dbo].[Objects].[QUARTER] AND [testdb].[dbo].[Objects].[OBID] = [testdb].[dbo].[Objects].[OBID]))
| | |--Index Seek(OBJECT:([testdb].[dbo].[Objects].[IX_objects_zip]), SEEK:([testdb].[dbo].[Objects].[ZIP] >= N'44' AND [testdb].[dbo].[Objects].[ZIP] < N'45'), WHERE:([testdb].[dbo].[Objects].[ZIP] like N'44%') ORDERED FORWARD)
| | |--Index Scan(OBJECT:([testdb].[dbo].[Objects].[IX_objects_eid]))
| |--Index Scan(OBJECT:([testdb].[dbo].[Objects].[IX_objects_price]))
|--Clustered Index Seek(OBJECT:([testdb].[dbo].[dim_estate].[PK_dim_Estate_eid]), SEEK:([testdb].[dbo].[dim_estate].[EID]=[testdb].[dbo].[Objects].[EID]), WHERE:([testdb].[dbo].[dim_estate].[Estate]='Villa') ORDERED FORWARD)

Hope you can read this language ;-)|||Your second query is much more explicit, it queries a much smaller number of rows on the estate dimension.

Once both queries are executed a few times so that the relevant data is loaded into cache, I would expect the queries to both run quickly, but the second one ought to run faster than the first one no matter how much RAM buffer you've got.

-PatP|||Thanks Pat, but isn't this strange? Look at this

(A)
Select AVG(Price) from dbo.objects
where eid=3 and zip like '44%'
(34 seconds)

is slower than

(B)
Select AVG(Price)
from dbo.objects join dbo.dim_estate on dbo.objects.eid=dbo.dim_estate.eid
where estate = 'Villa' and zip like '44%'
(13 seconds)

This doesn't make sense to me. Is it because of Microsoft SQL Server?
Ok, finally, the fastest query is the more userfriendly query (user does not have to know the right eid) but it still leaves kind of a bad taste. Just as you said (B) is better than (A).
Another question, is there something like Oracles index only table in sql server?|||There are apparently many things about your schema that are not intuitively obvious, based on the showplan output versus the code snippets that you've posted. My guess is that you've "simplified" the code snippets in some way, but posted the showplan output as it is actually generated by the SQL engine. Without a lot more "inside knowledge" to help me understand the differences, I can't offer a useful opinion.

The MS-SQL and Oracle database engines are radically different in the way that they do things. Each has its strong and weak points, neither is intrinsically "better" or "worse" than the other, they are just different. Because of the differences in implementation, I can't think of anything quite like the Oracle index only table in the Microsoft SQL environment for this example.

-PatP|||I only snipped Plan A. Showplan (b) belongs to Query (b). Nothing changed there. The Code for A originaly was

Select AVG(Price) from
dbo.objects join dbo.dim_estate on dbo.objects.eid=dbo.dim_estate.eid
where dbo.objects.eid=3 and zip like '44%'

Obviously, I do not really need the join here, that's why i left it. The execution time in both cases is the same for (a).

Anyway, here is the showplan for snipped (a) (without the fat text)

|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011]/CONVERT_IMPLICIT(decimal(19,0),[Expr1010],0) END))
|--Stream Aggregate(DEFINE:([Expr1010]=COUNT_BIG([testdb].[dbo].[Objects].[PRICE]), [Expr1011]=SUM([testdb].[dbo].[Objects].[PRICE])))
|--Nested Loops(Inner Join, OUTER REFERENCES:([testdb].[dbo].[Objects].[YEAR], [testdb].[dbo].[Objects].[QUARTER], [testdb].[dbo].[Objects].[OBID], [Expr1009]) WITH UNORDERED PREFETCH)
|--Merge Join(Inner Join, MERGE:([testdb].[dbo].[Objects].[YEAR], [testdb].[dbo].[Objects].[QUARTER], [testdb].[dbo].[Objects].[OBID])=([testdb].[dbo].[Objects].[YEAR], [testdb].[dbo].[Objects].[QUARTER], [testdb].[dbo].[Objects].[OBID]), RESIDUAL:([testdb].[dbo].[Objects].[YEAR] = [testdb].[dbo].[Objects].[YEAR] AND [testdb].[dbo].[Objects].[QUARTER] = [testdb].[dbo].[Objects].[QUARTER] AND [testdb].[dbo].[Objects].[OBID] = [testdb].[dbo].[Objects].[OBID]))
| |--Sort(ORDER BY:([testdb].[dbo].[Objects].[YEAR] ASC, [testdb].[dbo].[Objects].[QUARTER] ASC, [testdb].[dbo].[Objects].[OBID] ASC))
| | |--Index Seek(OBJECT:([testdb].[dbo].[Objects].[IX_objects_zip]), SEEK:([testdb].[dbo].[Objects].[ZIP] >= N'44' AND [testdb].[dbo].[Objects].[ZIP] < N'45'), WHERE:([testdb].[dbo].[Objects].[ZIP] like N'44%') ORDERED FORWARD)
| |--Index Seek(OBJECT:([testdb].[dbo].[Objects].[IX_objects_eid]), SEEK:([testdb].[dbo].[Objects].[EID]=(3.)) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([testdb].[dbo].[Objects].[PK_Objects_year_quarter_obid]), SEEK:([testdb].[dbo].[Objects].[YEAR]=[testdb].[dbo].[Objects].[YEAR] AND [testdb].[dbo].[Objects].[QUARTER]=[testdb].[dbo].[Objects].[QUARTER] AND [testdb].[dbo].[Objects].[OBID]=[testdb].[dbo].[Objects].[OBID]) LOOKUP ORDERED FORWARD)

No comments:

Post a Comment