Thursday, March 22, 2012

Changing the clustered index on a table - tips?

Hi,
SQL Server 2000
I've got a table that doesn't change during the day but has inserts
overnight. currently 4 million rows in the table and I would say no
more than 5000 rows being added nightly. Db server is totally
overloaded and sometimes queries on this table take a long time.
Queries usually filter on a client_id and a input_date range. Input
date range is not normally very selective. There can be up to about
100k rows per client_id. There are normally additional group by
clauses or further filters but they are not applied every time. Also
frequently used is a bit flag called billed. billed = 0 is very
selective (0-5%), billed = 1 is not (>95%). The clustering is
currently on the identity PK so is completely wasted - only benefit is
that the fill factor can be set to 90 as allrows are currently added
to the end of the table.
I want to change the clustering index to help improve performance of
queries. My initial thought is:
(client_id, input_date). Does this seem remotely sensible? Should it
be the other way round?
Given that I would always expect client_id to be used in a query,
would putting a non-clustered index on (billed) achieve anything? Or
on (client_id, billed)?
The suggested clustered index seems to work quite well and although
not making much difference to CPU time seems to cut logical reads by
about 40x. I think it will have more effect on Live which is much
busier and much less powerful than the dev server (yes I know that's
daft!)
I think this will increase fragmentation on the table. Should I be
worried about this - I intend to rebuild all indexes with a fillfactor
of 70? My plan would be to follow the following:
DROP all existing indexes and constraints
Create new clustered index with fillfactor of 70
recreate nonclustered indexes and constraints with fillfactor of 70
Is it okay to build a clustered index on a populated table or is it
better to create a new table, add clustered index and then populate,
then rename?
Many thanks.
Cheers,
JamesJim,
I have done similar(composite nonunique CI's for ranges) with good results
but it sounds like this is a big deal so I would test as much as possible. I
would also consider your NCI's during testing. Does adding a column or two
to an NCI cover more queries? 70% on the fillfactor sounds low to me if you
are just adding 5k rows\day. How often are you reindexing?
--
Jason Massie
www: http://statisticsio.com
rss: http://feeds.feedburner.com/statisticsio
"JimLad" <jamesdbirch@.yahoo.co.uk> wrote in message
news:c894d0b7-bd3e-4ab0-a890-33b3fdf094e7@.p73g2000hsd.googlegroups.com...
> Hi,
> SQL Server 2000
> I've got a table that doesn't change during the day but has inserts
> overnight. currently 4 million rows in the table and I would say no
> more than 5000 rows being added nightly. Db server is totally
> overloaded and sometimes queries on this table take a long time.
> Queries usually filter on a client_id and a input_date range. Input
> date range is not normally very selective. There can be up to about
> 100k rows per client_id. There are normally additional group by
> clauses or further filters but they are not applied every time. Also
> frequently used is a bit flag called billed. billed = 0 is very
> selective (0-5%), billed = 1 is not (>95%). The clustering is
> currently on the identity PK so is completely wasted - only benefit is
> that the fill factor can be set to 90 as allrows are currently added
> to the end of the table.
> I want to change the clustering index to help improve performance of
> queries. My initial thought is:
> (client_id, input_date). Does this seem remotely sensible? Should it
> be the other way round?
> Given that I would always expect client_id to be used in a query,
> would putting a non-clustered index on (billed) achieve anything? Or
> on (client_id, billed)?
> The suggested clustered index seems to work quite well and although
> not making much difference to CPU time seems to cut logical reads by
> about 40x. I think it will have more effect on Live which is much
> busier and much less powerful than the dev server (yes I know that's
> daft!)
> I think this will increase fragmentation on the table. Should I be
> worried about this - I intend to rebuild all indexes with a fillfactor
> of 70? My plan would be to follow the following:
> DROP all existing indexes and constraints
> Create new clustered index with fillfactor of 70
> recreate nonclustered indexes and constraints with fillfactor of 70
> Is it okay to build a clustered index on a populated table or is it
> better to create a new table, add clustered index and then populate,
> then rename?
> Many thanks.
> Cheers,
> James|||On Feb 27, 2:03=A0am, "jason" <jason-r3m...@.statisticsio.com> wrote:
> Jim,
> I have done similar(composite nonunique CI's for ranges) with good results=
> but it sounds like this is a big deal so I would test as much as possible.= I
> would also consider your NCI's during testing. Does adding a column or two=
> to an NCI cover more queries? 70% on the fillfactor sounds low to me if yo=u
> are just adding 5k rows\day. How often are you reindexing?
> --
> Jason Massie
> www:http://statisticsio.com
> rss:http://feeds.feedburner.com/statisticsio
> "JimLad" <jamesdbi...@.yahoo.co.uk> wrote in message
> news:c894d0b7-bd3e-4ab0-a890-33b3fdf094e7@.p73g2000hsd.googlegroups.com...
>
> > Hi,
> > SQL Server 2000
> > I've got a table that doesn't change during the day but has inserts
> > overnight. currently 4 million rows in the table and I would say no
> > more than 5000 rows being added nightly. Db server is totally
> > overloaded and sometimes queries on this table take a long time.
> > Queries usually filter on a client_id and a input_date range. Input
> > date range is not normally very selective. There can be up to about
> > 100k rows per client_id. There are normally additional group by
> > clauses or further filters but they are not applied every time. Also
> > frequently used is a bit flag =A0called billed. billed =3D 0 is very
> > selective (0-5%), billed =3D 1 is not (>95%). The clustering is
> > currently on the identity PK so is completely wasted - only benefit is
> > that the fill factor can be set to 90 as allrows are currently added
> > to the end of the table.
> > I want to change the clustering index to help improve performance of
> > queries. My initial thought is:
> > (client_id, input_date). Does this seem remotely sensible? Should it
> > be the other way round?
> > Given that I would always expect client_id to be used in a query,
> > would putting a non-clustered index on (billed) achieve anything? Or
> > on (client_id, billed)?
> > The suggested clustered index seems to work quite well and although
> > not making much difference to CPU time seems to cut logical reads by
> > about 40x. I think it will have more effect on Live which is much
> > busier and much less powerful than the dev server (yes I know that's
> > daft!)
> > I think this will increase fragmentation on the table. Should I be
> > worried about this - I intend to rebuild all indexes with a fillfactor
> > of 70? My plan would be to follow the following:
> > DROP all existing indexes and constraints
> > Create new clustered index with fillfactor of 70
> > recreate nonclustered indexes and constraints with fillfactor of 70
> > Is it okay to build a clustered index on a populated table or is it
> > better to create a new table, add clustered index and then populate,
> > then rename?
> > Many thanks.
> > Cheers,
> > James- Hide quoted text -
> - Show quoted text -
Hi Jason,
I checked the maintenance plans and they get rebuilt every Sunday with
a fixed free space percentage of 10% - so fillfactor is irrelevant.
Anyway following on from your comments I'll specify fillfactor as 90.
I've had a play with the indexes and the suggested clustered index
works extremely well. The nonclustered one I suggested on billed or
client_id, billed is never actually used as it always favours the
clustered index seek with a WHERE clause on the billed column.
Thanks for your help.
Just one more question - the database is fully logged. Does it make
any difference to speed/transaction log size whether you recluster
the index in situ or in a new replacement table?
James|||If the database is in Simple or Bulk Logged mode the Creation and rebuilding
of the indexes can be minimally logged and faster as a result. Just be sure
to switch back to Full and take a full backup when done.
--
Andrew J. Kelly SQL MVP
Solid Quality Mentors
"JimLad" <jamesdbirch@.yahoo.co.uk> wrote in message
news:64479e9a-191e-4445-b52d-4a9a89ef8a13@.i29g2000prf.googlegroups.com...
On Feb 27, 2:03 am, "jason" <jason-r3m...@.statisticsio.com> wrote:
> Jim,
> I have done similar(composite nonunique CI's for ranges) with good results
> but it sounds like this is a big deal so I would test as much as possible.
> I
> would also consider your NCI's during testing. Does adding a column or two
> to an NCI cover more queries? 70% on the fillfactor sounds low to me if
> you
> are just adding 5k rows\day. How often are you reindexing?
> --
> Jason Massie
> www:http://statisticsio.com
> rss:http://feeds.feedburner.com/statisticsio
> "JimLad" <jamesdbi...@.yahoo.co.uk> wrote in message
> news:c894d0b7-bd3e-4ab0-a890-33b3fdf094e7@.p73g2000hsd.googlegroups.com...
>
> > Hi,
> > SQL Server 2000
> > I've got a table that doesn't change during the day but has inserts
> > overnight. currently 4 million rows in the table and I would say no
> > more than 5000 rows being added nightly. Db server is totally
> > overloaded and sometimes queries on this table take a long time.
> > Queries usually filter on a client_id and a input_date range. Input
> > date range is not normally very selective. There can be up to about
> > 100k rows per client_id. There are normally additional group by
> > clauses or further filters but they are not applied every time. Also
> > frequently used is a bit flag called billed. billed = 0 is very
> > selective (0-5%), billed = 1 is not (>95%). The clustering is
> > currently on the identity PK so is completely wasted - only benefit is
> > that the fill factor can be set to 90 as allrows are currently added
> > to the end of the table.
> > I want to change the clustering index to help improve performance of
> > queries. My initial thought is:
> > (client_id, input_date). Does this seem remotely sensible? Should it
> > be the other way round?
> > Given that I would always expect client_id to be used in a query,
> > would putting a non-clustered index on (billed) achieve anything? Or
> > on (client_id, billed)?
> > The suggested clustered index seems to work quite well and although
> > not making much difference to CPU time seems to cut logical reads by
> > about 40x. I think it will have more effect on Live which is much
> > busier and much less powerful than the dev server (yes I know that's
> > daft!)
> > I think this will increase fragmentation on the table. Should I be
> > worried about this - I intend to rebuild all indexes with a fillfactor
> > of 70? My plan would be to follow the following:
> > DROP all existing indexes and constraints
> > Create new clustered index with fillfactor of 70
> > recreate nonclustered indexes and constraints with fillfactor of 70
> > Is it okay to build a clustered index on a populated table or is it
> > better to create a new table, add clustered index and then populate,
> > then rename?
> > Many thanks.
> > Cheers,
> > James- Hide quoted text -
> - Show quoted text -
Hi Jason,
I checked the maintenance plans and they get rebuilt every Sunday with
a fixed free space percentage of 10% - so fillfactor is irrelevant.
Anyway following on from your comments I'll specify fillfactor as 90.
I've had a play with the indexes and the suggested clustered index
works extremely well. The nonclustered one I suggested on billed or
client_id, billed is never actually used as it always favours the
clustered index seek with a WHERE clause on the billed column.
Thanks for your help.
Just one more question - the database is fully logged. Does it make
any difference to speed/transaction log size whether you recluster
the index in situ or in a new replacement table?
James|||On Feb 27, 1:48=A0pm, "Andrew J. Kelly" <sqlmvpnooos...@.shadhawk.com>
wrote:
> If the database is in Simple or Bulk Logged mode the Creation and rebuildi=ng
> of the indexes can be minimally logged and faster as a result. Just be sur=e
> to switch back to Full and take a full backup when done.
> --
> Andrew J. Kelly =A0 =A0SQL MVP
> Solid Quality Mentors
> "JimLad" <jamesdbi...@.yahoo.co.uk> wrote in message
> news:64479e9a-191e-4445-b52d-4a9a89ef8a13@.i29g2000prf.googlegroups.com...
> On Feb 27, 2:03 am, "jason" <jason-r3m...@.statisticsio.com> wrote:
>
>
> > Jim,
> > I have done similar(composite nonunique CI's for ranges) with good resul=ts
> > but it sounds like this is a big deal so I would test as much as possibl=e.
> > I
> > would also consider your NCI's during testing. Does adding a column or t=wo
> > to an NCI cover more queries? 70% on the fillfactor sounds low to me if
> > you
> > are just adding 5k rows\day. How often are you reindexing?
> > --
> > Jason Massie
> > www:http://statisticsio.com
> > rss:http://feeds.feedburner.com/statisticsio
> > "JimLad" <jamesdbi...@.yahoo.co.uk> wrote in message
> >news:c894d0b7-bd3e-4ab0-a890-33b3fdf094e7@.p73g2000hsd.googlegroups.com...=
> > > Hi,
> > > SQL Server 2000
> > > I've got a table that doesn't change during the day but has inserts
> > > overnight. currently 4 million rows in the table and I would say no
> > > more than 5000 rows being added nightly. Db server is totally
> > > overloaded and sometimes queries on this table take a long time.
> > > Queries usually filter on a client_id and a input_date range. Input
> > > date range is not normally very selective. There can be up to about
> > > 100k rows per client_id. There are normally additional group by
> > > clauses or further filters but they are not applied every time. Also
> > > frequently used is a bit flag called billed. billed =3D 0 is very
> > > selective (0-5%), billed =3D 1 is not (>95%). The clustering is
> > > currently on the identity PK so is completely wasted - only benefit is=
> > > that the fill factor can be set to 90 as allrows are currently added
> > > to the end of the table.
> > > I want to change the clustering index to help improve performance of
> > > queries. My initial thought is:
> > > (client_id, input_date). Does this seem remotely sensible? Should it
> > > be the other way round?
> > > Given that I would always expect client_id to be used in a query,
> > > would putting a non-clustered index on (billed) achieve anything? Or
> > > on (client_id, billed)?
> > > The suggested clustered index seems to work quite well and although
> > > not making much difference to CPU time seems to cut logical reads by
> > > about 40x. I think it will have more effect on Live which is much
> > > busier and much less powerful than the dev server (yes I know that's
> > > daft!)
> > > I think this will increase fragmentation on the table. Should I be
> > > worried about this - I intend to rebuild all indexes with a fillfactor=
> > > of 70? My plan would be to follow the following:
> > > DROP all existing indexes and constraints
> > > Create new clustered index with fillfactor of 70
> > > recreate nonclustered indexes and constraints with fillfactor of 70
> > > Is it okay to build a clustered index on a populated table or is it
> > > better to create a new table, add clustered index and then populate,
> > > then rename?
> > > Many thanks.
> > > Cheers,
> > > James- Hide quoted text -
> > - Show quoted text -
> Hi Jason,
> I checked the maintenance plans and they get rebuilt every Sunday with
> a fixed free space percentage of 10% - so fillfactor is irrelevant.
> Anyway following on from your comments I'll specify fillfactor as 90.
> I've had a play with the indexes and the suggested clustered index
> works extremely well. The nonclustered one I suggested on billed =A0or
> client_id, billed is never actually used as it always favours the
> clustered index seek with a WHERE clause on the billed column.
> Thanks for your help.
> Just one more question - the database is fully logged. Does it make
> any difference =A0to speed/transaction log size whether you recluster
> the index in situ or in a new replacement table?
> James- Hide quoted text -
> - Show quoted text -
Thanks. I'll do that.
What happens if a transaction log backup runs before the next full
backup? Will it fall over or just be corrupt? If it falls over, does
it bring the db down?
In other words, can I just leave the maintenance plans running knowing
that a couple of log backups will fail or do I need to either disable
them or do an immediate manual full backup? We're set up for half
hourly log backups. I'll probably make the change just after 6pm with
users off the system and the nightly full backup is at 9pm. Noone will
be using the db in the meantime.
James|||<<What happens if a transaction log backup runs before the next full
backup?>>
It depends on version and details. If the db is in simple mode when you execute the BACKUP LOG
command, you get an error message; regardless of version.
If db was in full and you did log backups and you then put it to simple and then backup to full and
you now do a log backup, then:
In 2000, the log backup is produced, with a lame message (not error) that it isn't usable.
In 2005, you get an error message.
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://sqlblog.com/blogs/tibor_karaszi
"JimLad" <jamesdbirch@.yahoo.co.uk> wrote in message
news:19208e61-4e35-450e-8da2-d36208ad7cbe@.n75g2000hsh.googlegroups.com...
On Feb 27, 1:48 pm, "Andrew J. Kelly" <sqlmvpnooos...@.shadhawk.com>
wrote:
> If the database is in Simple or Bulk Logged mode the Creation and rebuilding
> of the indexes can be minimally logged and faster as a result. Just be sure
> to switch back to Full and take a full backup when done.
> --
> Andrew J. Kelly SQL MVP
> Solid Quality Mentors
> "JimLad" <jamesdbi...@.yahoo.co.uk> wrote in message
> news:64479e9a-191e-4445-b52d-4a9a89ef8a13@.i29g2000prf.googlegroups.com...
> On Feb 27, 2:03 am, "jason" <jason-r3m...@.statisticsio.com> wrote:
>
>
> > Jim,
> > I have done similar(composite nonunique CI's for ranges) with good results
> > but it sounds like this is a big deal so I would test as much as possible.
> > I
> > would also consider your NCI's during testing. Does adding a column or two
> > to an NCI cover more queries? 70% on the fillfactor sounds low to me if
> > you
> > are just adding 5k rows\day. How often are you reindexing?
> > --
> > Jason Massie
> > www:http://statisticsio.com
> > rss:http://feeds.feedburner.com/statisticsio
> > "JimLad" <jamesdbi...@.yahoo.co.uk> wrote in message
> >news:c894d0b7-bd3e-4ab0-a890-33b3fdf094e7@.p73g2000hsd.googlegroups.com...
> > > Hi,
> > > SQL Server 2000
> > > I've got a table that doesn't change during the day but has inserts
> > > overnight. currently 4 million rows in the table and I would say no
> > > more than 5000 rows being added nightly. Db server is totally
> > > overloaded and sometimes queries on this table take a long time.
> > > Queries usually filter on a client_id and a input_date range. Input
> > > date range is not normally very selective. There can be up to about
> > > 100k rows per client_id. There are normally additional group by
> > > clauses or further filters but they are not applied every time. Also
> > > frequently used is a bit flag called billed. billed = 0 is very
> > > selective (0-5%), billed = 1 is not (>95%). The clustering is
> > > currently on the identity PK so is completely wasted - only benefit is
> > > that the fill factor can be set to 90 as allrows are currently added
> > > to the end of the table.
> > > I want to change the clustering index to help improve performance of
> > > queries. My initial thought is:
> > > (client_id, input_date). Does this seem remotely sensible? Should it
> > > be the other way round?
> > > Given that I would always expect client_id to be used in a query,
> > > would putting a non-clustered index on (billed) achieve anything? Or
> > > on (client_id, billed)?
> > > The suggested clustered index seems to work quite well and although
> > > not making much difference to CPU time seems to cut logical reads by
> > > about 40x. I think it will have more effect on Live which is much
> > > busier and much less powerful than the dev server (yes I know that's
> > > daft!)
> > > I think this will increase fragmentation on the table. Should I be
> > > worried about this - I intend to rebuild all indexes with a fillfactor
> > > of 70? My plan would be to follow the following:
> > > DROP all existing indexes and constraints
> > > Create new clustered index with fillfactor of 70
> > > recreate nonclustered indexes and constraints with fillfactor of 70
> > > Is it okay to build a clustered index on a populated table or is it
> > > better to create a new table, add clustered index and then populate,
> > > then rename?
> > > Many thanks.
> > > Cheers,
> > > James- Hide quoted text -
> > - Show quoted text -
> Hi Jason,
> I checked the maintenance plans and they get rebuilt every Sunday with
> a fixed free space percentage of 10% - so fillfactor is irrelevant.
> Anyway following on from your comments I'll specify fillfactor as 90.
> I've had a play with the indexes and the suggested clustered index
> works extremely well. The nonclustered one I suggested on billed or
> client_id, billed is never actually used as it always favours the
> clustered index seek with a WHERE clause on the billed column.
> Thanks for your help.
> Just one more question - the database is fully logged. Does it make
> any difference to speed/transaction log size whether you recluster
> the index in situ or in a new replacement table?
> James- Hide quoted text -
> - Show quoted text -
Thanks. I'll do that.
What happens if a transaction log backup runs before the next full
backup? Will it fall over or just be corrupt? If it falls over, does
it bring the db down?
In other words, can I just leave the maintenance plans running knowing
that a couple of log backups will fail or do I need to either disable
them or do an immediate manual full backup? We're set up for half
hourly log backups. I'll probably make the change just after 6pm with
users off the system and the nightly full backup is at 9pm. Noone will
be using the db in the meantime.
James

No comments:

Post a Comment