Friday, February 24, 2012

Index corruption on table without key: what could be the reasons ?

Dear group,

we are running a SQL-Server Database which is about 30 GB large. The
purpose of this database is to contain periodic data from automatic
devices which insert values into some tables.

Unfortunately most of these tables don't have a key (and a key can only
be introduced when the application programmers have changed their
software). Tables have this structure

deviceno timestamp data

where we expect for every device and timestamp one row of data.

In the ongoing operation it happens that the index of this large table
gets corrupted and a select from this table yields 2 rows for some
devices.

In fact a select "SELECT DEVICENO, TIMESTAMP, COUNT(*) FROM TABLE GROUP
BY DEVICENO, TIMESTAMP HAVING COUNT(*) > 1" returns lots of data.

After rebuild of the indexes the table is "clean" again.

What could cause the index corruption ?

Missing key?
Faulty application program ?
a combination of both ?

How can i prevent this from happening again, as long as there is no
updated database / application ?

I'd be grateful for any useful comment

Regards

UliJust to clarify a common misconception: key constraints are used to
enfore data integrity, and indexes are used to enhance performance.
While primary keys also introduce an index upon creation, they are not
the same thing; you can create indexes without having primary keys at
all (this is a technique that can be used in data warehouses, where the
normalization rules are a bit "looser"). Index fragmentation is
unrelated to the presence or absence of keys; it can be, however,
related to the underlying clustered index that is (by default)
associated with the primary key.

A couple of questions: do you have a clustered index on this table?
On what column is it located?

Do you know how to use the DBCC command SHOWCONTIG? It will help you
identify it is fragmentation, and where it is occurring.

http://www.sql-server-performance.c..._showcontig.asp

HTH,
Stu|||Hi

You do not say what the index is on, and whether it is a unique index.

You may want to read:
http://www.aspfaq.com/show.asp?id=2081
http://www.aspfaq.com/etiquette.asp?id=5006

John

<uli2003wien@.lycos.at> wrote in message
news:1126693834.052381.263790@.g44g2000cwa.googlegr oups.com...
> Dear group,
> we are running a SQL-Server Database which is about 30 GB large. The
> purpose of this database is to contain periodic data from automatic
> devices which insert values into some tables.
> Unfortunately most of these tables don't have a key (and a key can only
> be introduced when the application programmers have changed their
> software). Tables have this structure
> deviceno timestamp data
> where we expect for every device and timestamp one row of data.
> In the ongoing operation it happens that the index of this large table
> gets corrupted and a select from this table yields 2 rows for some
> devices.
> In fact a select "SELECT DEVICENO, TIMESTAMP, COUNT(*) FROM TABLE GROUP
> BY DEVICENO, TIMESTAMP HAVING COUNT(*) > 1" returns lots of data.
> After rebuild of the indexes the table is "clean" again.
> What could cause the index corruption ?
> Missing key?
> Faulty application program ?
> a combination of both ?
> How can i prevent this from happening again, as long as there is no
> updated database / application ?
> I'd be grateful for any useful comment
> Regards
> Uli|||(uli2003wien@.lycos.at) writes:
> we are running a SQL-Server Database which is about 30 GB large. The
> purpose of this database is to contain periodic data from automatic
> devices which insert values into some tables.
> Unfortunately most of these tables don't have a key (and a key can only
> be introduced when the application programmers have changed their
> software). Tables have this structure
> deviceno timestamp data
> where we expect for every device and timestamp one row of data.
> In the ongoing operation it happens that the index of this large table
> gets corrupted and a select from this table yields 2 rows for some
> devices.
> In fact a select "SELECT DEVICENO, TIMESTAMP, COUNT(*) FROM TABLE GROUP
> BY DEVICENO, TIMESTAMP HAVING COUNT(*) > 1" returns lots of data.
> After rebuild of the indexes the table is "clean" again.
> What could cause the index corruption ?
> Missing key?
> Faulty application program ?
> a combination of both ?

If the duplicates disappear after a DBCC DBREINDEX (or DROP + CREATE, then
it is the index that is corrupted.

I seem to recall that there is an issue with heap tables that could cause
this. (A heap table is a table that does not have a clustered index.)
Can you define the index as clustered? Even better if you can add UNIQUE to
enforce uniqueness. Then again, it sounds as if the application is able
to insert duplicates?

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||dont ever make a single table in your whole life that doesn't have a
single-column identity primary key|||Please tell me you're joking.

Stu|||<uli2003wien@.lycos.at> wrote:

> Dear group,
> we are running a SQL-Server Database which is about 30 GB large. The
> purpose of this database is to contain periodic data from automatic
> devices which insert values into some tables.
> Unfortunately most of these tables don't have a key (and a key can only
> be introduced when the application programmers have changed their
> software). Tables have this structure
> deviceno timestamp data
> where we expect for every device and timestamp one row of data.
> In the ongoing operation it happens that the index of this large table
> gets corrupted and a select from this table yields 2 rows for some
> devices.
> In fact a select "SELECT DEVICENO, TIMESTAMP, COUNT(*) FROM TABLE GROUP
> BY DEVICENO, TIMESTAMP HAVING COUNT(*) > 1" returns lots of data.
> After rebuild of the indexes the table is "clean" again.
> What could cause the index corruption ?
> Missing key?
> Faulty application program ?
> a combination of both ?
> How can i prevent this from happening again, as long as there is no
> updated database / application ?
> I'd be grateful for any useful comment
> Regards
> Uli

Uli,

Depending on how often your devices are firing inserts and how the timestamp
column is defined, I would expect to find multiple rows for a single
deviceno and timestamp combination (especially if you're using
smalldatetime).

But if I understand you correctly, you drop and re-create an index (or use
DBCC REINDEX) and the SELECT you posted returns different results (and this
happens consistently). If that's the case, you need to call MS support:
that's not supposed to happen.

Craig|||<dbahooker@.hotmail.com> wrote in message
news:1127919145.122247.19780@.g49g2000cwa.googlegro ups.com...
> dont ever make a single table in your whole life that doesn't have a
> single-column identity primary key

This is amazingly poor advice.
|||why?

aren't all indexes slower if you have your clustered index on a
multi-column varchar field?

i mean.. INT IDENTITY EVERYWHERE|||no greg im serious

why is it bad advice?

i just dont believe in tables that dont have a single numeric primary
key.

I'm just too friggin tired to use 'A345FD' as a primary key anymore..
use INTs or BIGINTs if you want. i would rather use a bigint than a
varchar any day of the week

but.. i mean.. no keys?

are you drunk?|||(dbahooker@.hotmail.com) writes:
> no greg im serious
> why is it bad advice?
> i just dont believe in tables that dont have a single numeric primary
> key.
> I'm just too friggin tired to use 'A345FD' as a primary key anymore..
> use INTs or BIGINTs if you want. i would rather use a bigint than a
> varchar any day of the week
> but.. i mean.. no keys?
> are you drunk?

Greg is usually not drunk when he posts. In fact, I have never felt
any smell of liquor from his posts.

Having a one-column key in all tables is actually a really poor idea.
Typical example: an order has a one-column key, which is an orderid.
But a detail row on a order has a two column key: an orderid and a
row number (or a product id).

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||<dbahooker@.hotmail.com> wrote in message
news:1129052916.884917.318460@.o13g2000cwo.googlegr oups.com...
> no greg im serious
> why is it bad advice?

Because it doesn't necessarily model reality.

> i just dont believe in tables that dont have a single numeric primary
> key.

So if you were designing a database of cars you'd create your own single
numeric primary key rather than VIN?

Or if you were designing a database for Books you'd eschew the ISBN and use
an Identity column?

> I'm just too friggin tired to use 'A345FD' as a primary key anymore..

Then I suggest you get some more sleep.

> use INTs or BIGINTs if you want. i would rather use a bigint than a
> varchar any day of the week
> but.. i mean.. no keys?

Who said anything about no keys. Certainly not I. It's not a table w/o
keys.

> are you drunk?

Not hardly.
|||Erland Sommarskog wrote:
> Having a one-column key in all tables is actually a really poor idea.
> Typical example: an order has a one-column key, which is an orderid.
> But a detail row on a order has a two column key: an orderid and a
> row number (or a product id).

A problem arises when you have more than a simple system. You have:

Project (ProjectNo)
Order (ProjectNo, OrderNo)
Items (ProjectNo, OrderNo, ItemNo)
Expediting Split components [1] (ProjectNo, OrderNo, ItemNo, CompSplit)
Expediting Split Deliveries (ProjectNo, OrderNo, ItemNo, CompSplit,
DelSplit)
In real world, expeditor has no idea that freight forwarder can't fit
1,000,000 bricks on one truck (don't laff, I've seen it happen) so MMTs
(Material Movement Ticket) are split (ProjectNo, OrderNo, ItemNo,
CompSplit, DelSplit, MMTNo)
The MMT may go onto a container, then later on is split into trucks
(that move at different speeds), or some items get held longer in
customs than others resulting in many MRRs (Material Received Report)
for one MMT, MMR Item has a key that looks like a complete table before
any meaningful data gets there.

Now, given that having 7 columns in a PK is stupid, do you:

a) Give a surrogate key to some tables and a natural key to others and
thereby be inconsistent or

b) Be consistent and use surrogate keys?

Given that no-one in their right mind would use just a surrogate key as
the unique constraint but also have a candidate key so no real
duplicates can arise. Also that these surrogate keys (Identity columns)
are *never* presented to the user, only the computer knows they exist.

[1] e.g. you order 1 x lifeboat but supplier later advises split
delivery of 1 x dingy, 6 x oars, 6 x life jackets, 1 x motor, etc.|||First, your advice indicates that you don't understand the difference
between an index and a key; since you seem to have a background in
Access programming, that's understandable because Access goes out of
its way to hide the distinction. However, in SQL Server, indexes are
very different than keys, with different purposes. A key is used to
establish and maintain data integrity; an index is used to expedite
data retrieval.

Second, (as others have pointed out) a surrogate key has problems. It
doesn't reflect reality and it cannot be used for data validation.
However, I will grant you that using a simple surrogate key (like an
integer) can have benefits in terms of performance, scalability, and
data maintenance; however, those benefits come with a cost.

Which leads to my third point: by specifying an absolute, you have
pigeonholed your designs. It's a bit like a mechanic insisting on
using a #2 Phillips head bit on a #3 hole; all you get is a stripped
out, screwed up design. Again, using surrogate keys with a clustered
index is NOT a bad idea in the right situation; but in the wrong
situation, it's a dumb idea.

Stu|||Trevor Best (nospam@.localhost.invalid) writes:
> Erland Sommarskog wrote:
>> Having a one-column key in all tables is actually a really poor idea.
>> Typical example: an order has a one-column key, which is an orderid.
>> But a detail row on a order has a two column key: an orderid and a
>> row number (or a product id).
> A problem arises when you have more than a simple system. You have:

Dunno. With something like 500 tables in the data model I maintain, I
guess that it qualifies as "more than a simple system". And, no, not
all of them have one-column keys.

> Project (ProjectNo)
> Order (ProjectNo, OrderNo)
> Items (ProjectNo, OrderNo, ItemNo)
> Expediting Split components [1] (ProjectNo, OrderNo, ItemNo, CompSplit)
> Expediting Split Deliveries (ProjectNo, OrderNo, ItemNo, CompSplit,
> DelSplit)
> In real world, expeditor has no idea that freight forwarder can't fit
> 1,000,000 bricks on one truck (don't laff, I've seen it happen) so MMTs
> (Material Movement Ticket) are split (ProjectNo, OrderNo, ItemNo,
> CompSplit, DelSplit, MMTNo)
> The MMT may go onto a container, then later on is split into trucks
> (that move at different speeds), or some items get held longer in
> customs than others resulting in many MRRs (Material Received Report)
> for one MMT, MMR Item has a key that looks like a complete table before
> any meaningful data gets there.
> Now, given that having 7 columns in a PK is stupid, do you:

I don't know the business domain well enough, so I prefer to not comment
on the details.

But I was once in the situation where I had a table with a four-column
key, and a need arose for a subtable with two more keys. So, I added
a surrogate to the existing table.

Much later I had reason to write code to maintain data in those tables,
and the UPDATE/INSERT for the subtable were complex. Because of the
surrogate key.

That surrogate key is no more by the way. (But for other reasons, two
other key levels have also gone away.)

On the top of my head I know we have a table with a five-column key. But
I would shudder for a seven-colunm key if it was the natural one.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

No comments:

Post a Comment