Thursday, March 22, 2012

Changing the clustered index on a table - tips?

Hi,
SQL Server 2000
I've got a table that doesn't change during the day but has inserts
overnight. currently 4 million rows in the table and I would say no
more than 5000 rows being added nightly. Db server is totally
overloaded and sometimes queries on this table take a long time.
Queries usually filter on a client_id and a input_date range. Input
date range is not normally very selective. There can be up to about
100k rows per client_id. There are normally additional group by
clauses or further filters but they are not applied every time. Also
frequently used is a bit flag called billed. billed = 0 is very
selective (0-5%), billed = 1 is not (>95%). The clustering is
currently on the identity PK so is completely wasted - only benefit is
that the fill factor can be set to 90 as allrows are currently added
to the end of the table.
I want to change the clustering index to help improve performance of
queries. My initial thought is:
(client_id, input_date). Does this seem remotely sensible? Should it
be the other way round?
Given that I would always expect client_id to be used in a query,
would putting a non-clustered index on (billed) achieve anything? Or
on (client_id, billed)?
The suggested clustered index seems to work quite well and although
not making much difference to CPU time seems to cut logical reads by
about 40x. I think it will have more effect on Live which is much
busier and much less powerful than the dev server (yes I know that's
daft!)
I think this will increase fragmentation on the table. Should I be
worried about this - I intend to rebuild all indexes with a fillfactor
of 70? My plan would be to follow the following:
DROP all existing indexes and constraints
Create new clustered index with fillfactor of 70
recreate nonclustered indexes and constraints with fillfactor of 70
Is it okay to build a clustered index on a populated table or is it
better to create a new table, add clustered index and then populate,
then rename?
Many thanks.
Cheers,
James
Jim,
I have done similar(composite nonunique CI's for ranges) with good results
but it sounds like this is a big deal so I would test as much as possible. I
would also consider your NCI's during testing. Does adding a column or two
to an NCI cover more queries? 70% on the fillfactor sounds low to me if you
are just adding 5k rows\day. How often are you reindexing?
Jason Massie
www: http://statisticsio.com
rss: http://feeds.feedburner.com/statisticsio
"JimLad" <jamesdbirch@.yahoo.co.uk> wrote in message
news:c894d0b7-bd3e-4ab0-a890-33b3fdf094e7@.p73g2000hsd.googlegroups.com...
> Hi,
> SQL Server 2000
> I've got a table that doesn't change during the day but has inserts
> overnight. currently 4 million rows in the table and I would say no
> more than 5000 rows being added nightly. Db server is totally
> overloaded and sometimes queries on this table take a long time.
> Queries usually filter on a client_id and a input_date range. Input
> date range is not normally very selective. There can be up to about
> 100k rows per client_id. There are normally additional group by
> clauses or further filters but they are not applied every time. Also
> frequently used is a bit flag called billed. billed = 0 is very
> selective (0-5%), billed = 1 is not (>95%). The clustering is
> currently on the identity PK so is completely wasted - only benefit is
> that the fill factor can be set to 90 as allrows are currently added
> to the end of the table.
> I want to change the clustering index to help improve performance of
> queries. My initial thought is:
> (client_id, input_date). Does this seem remotely sensible? Should it
> be the other way round?
> Given that I would always expect client_id to be used in a query,
> would putting a non-clustered index on (billed) achieve anything? Or
> on (client_id, billed)?
> The suggested clustered index seems to work quite well and although
> not making much difference to CPU time seems to cut logical reads by
> about 40x. I think it will have more effect on Live which is much
> busier and much less powerful than the dev server (yes I know that's
> daft!)
> I think this will increase fragmentation on the table. Should I be
> worried about this - I intend to rebuild all indexes with a fillfactor
> of 70? My plan would be to follow the following:
> DROP all existing indexes and constraints
> Create new clustered index with fillfactor of 70
> recreate nonclustered indexes and constraints with fillfactor of 70
> Is it okay to build a clustered index on a populated table or is it
> better to create a new table, add clustered index and then populate,
> then rename?
> Many thanks.
> Cheers,
> James
|||If the database is in Simple or Bulk Logged mode the Creation and rebuilding
of the indexes can be minimally logged and faster as a result. Just be sure
to switch back to Full and take a full backup when done.
Andrew J. Kelly SQL MVP
Solid Quality Mentors
"JimLad" <jamesdbirch@.yahoo.co.uk> wrote in message
news:64479e9a-191e-4445-b52d-4a9a89ef8a13@.i29g2000prf.googlegroups.com...
On Feb 27, 2:03 am, "jason" <jason-r3m...@.statisticsio.com> wrote:
> Jim,
> I have done similar(composite nonunique CI's for ranges) with good results
> but it sounds like this is a big deal so I would test as much as possible.
> I
> would also consider your NCI's during testing. Does adding a column or two
> to an NCI cover more queries? 70% on the fillfactor sounds low to me if
> you
> are just adding 5k rows\day. How often are you reindexing?
> --
> Jason Massie
> www:http://statisticsio.com
> rss:http://feeds.feedburner.com/statisticsio
> "JimLad" <jamesdbi...@.yahoo.co.uk> wrote in message
> news:c894d0b7-bd3e-4ab0-a890-33b3fdf094e7@.p73g2000hsd.googlegroups.com...
>
>
>
>
>
>
>
> - Show quoted text -
Hi Jason,
I checked the maintenance plans and they get rebuilt every Sunday with
a fixed free space percentage of 10% - so fillfactor is irrelevant.
Anyway following on from your comments I'll specify fillfactor as 90.
I've had a play with the indexes and the suggested clustered index
works extremely well. The nonclustered one I suggested on billed or
client_id, billed is never actually used as it always favours the
clustered index seek with a WHERE clause on the billed column.
Thanks for your help.
Just one more question - the database is fully logged. Does it make
any difference to speed/transaction log size whether you recluster
the index in situ or in a new replacement table?
James

No comments:

Post a Comment