Hi,
I keep coming across a performance problem in SQL that I feel must have
an elegant solution; I just can't find it.
The problem is quite simple to state:
I have a query that performs a search based on some search parameters,
each of which defaults to null in the query.
I have a hierarchical structure of tables which might be used in the
filtering,
e.g.
SELECT *
FROM TableA
INNER JOIN TableB on FKa = FKb
INNER JOIN TableC on FKb = FKc
INNER JOIN TableD on FKc = FKd
I filter the result by adding conditions to these joins, something like:
SELECT *
FROM TableA
INNER JOIN TableB on FKa = FKb and (@filterCol1 is null OR TableB.col1 = @filterCol1)
etc...
The problem comes when there are a few hundred thousand rows in each
table, and my search wants to filter on a parameter which affects
TableD. It appears from the execution plan (and indeed only seems
logical) that SQL has to join the entirety of tables A, B and C on their
foreign keys before joining to D in order to filter. This of course
makes the query take ages.
I've got ideas of how I could get around this, but they are at best
"clunky".
for example:
1) I could have several different "selects" each joining in a
different order based on the parameters that are set.
2) I could build up a "filter" temporary table based on the parameters
passed, then only do the select once, starting with this filter table
first
3) I could come up with a nice elegant solution which currently eludes
me.
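Option 2 might look something like the sketch below. The parameter @filterColD and the column names here are placeholders made up for illustration; the real query would use whichever parameter actually targets TableD.

```sql
-- Sketch of option 2: seed a narrow "filter" temp table from the
-- parameter that is set, then drive the main select from that table.
CREATE TABLE #filter (FKd int PRIMARY KEY);

INSERT INTO #filter (FKd)
SELECT FKd
FROM TableD
WHERE @filterColD IS NULL OR TableD.colD = @filterColD;

SELECT *
FROM #filter
INNER JOIN TableD on TableD.FKd = #filter.FKd
INNER JOIN TableC on FKc = FKd
INNER JOIN TableB on FKb = FKc
INNER JOIN TableA on FKa = FKb;
```

The idea is that when the TableD parameter is set, the temp table is small and every subsequent join is bounded by it.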
Any suggestions?
Will
> SELECT *
> FROM TableA
> INNER JOIN TableB on FKa = FKb and (@filterCol1 is null OR TableB.col1 = @filterCol1)
SELECT *
FROM TableA
INNER JOIN TableB on FKa = FKb OR TableB.col1 = COALESCE(@filterCol1, TableB.col1)
WHERE @filterCol1 IS NULL

Uri,
Thanks for replying.
I must admit I'm having trouble fully understanding what the aim of the
change is. As I see it, if I pass something not null as @filterCol1 then
I won't get any results back. This isn't what I wanted. Essentially the
point of @filterCol1 was to indicate that I'm using optional
parameters to filter: sometimes I want to filter on @filterCol1,
sometimes the user might not select any filter for that, in which case
I'll want to filter on another variable @filterCol2 (sorry, I should
have put a fuller example in; I was abbreviating it and only using one
variable). This behaviour in itself isn't causing me any problems. My
issue is that if the user only selects a parameter which filters on
TableD, I need this to be used to optimise my query, e.g.:
let's say the user selects @filterCol1 = null and @filterCol2 = 27
SELECT *
FROM TableA
INNER JOIN TableB on FKa = FKb and (@filterCol1 is null or @filterCol1 = TableB.Col1)
INNER JOIN TableC on FKb = FKc
INNER JOIN TableD on FKc = FKd and (@filterCol2 is null or @filterCol2 = TableD.Col2)
I need my SQL query to realise that TableD is going to be the bounding
result set, evaluate this one first, then use that to restrict the
scope of the joins on the other three tables. However, if the user were
to put a value into @filterCol1 and null for @filterCol2, then I would
need it to evaluate TableB first, as this will be the smallest result set.
I don't think there's going to be a perfect answer, but this problem
must be encountered a lot.
Will

Will
Well, if I understood correctly, you cannot dictate to the optimizer what
order of joins is to be chosen. Yes, there are some JOIN hints, but not in
this case, I think. Have you considered (I don't know your business
requirements) separating this SELECT statement, I mean JOINing two
tables/three tables based on the parameters that you've got?
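A sketch of that suggestion, reusing the table and parameter names from the earlier example (placeholders only): one SELECT per parameter combination, so each branch starts from the table being filtered and carries a plain predicate the optimizer can use.

```sql
-- Sketch: branch on which optional parameter is set, so each SELECT
-- can begin at the filtered table.
IF @filterCol2 IS NOT NULL
    SELECT *
    FROM TableD
    INNER JOIN TableC on FKc = FKd
    INNER JOIN TableB on FKb = FKc
    INNER JOIN TableA on FKa = FKb
    WHERE TableD.Col2 = @filterCol2;
ELSE IF @filterCol1 IS NOT NULL
    SELECT *
    FROM TableB
    INNER JOIN TableA on FKa = FKb
    INNER JOIN TableC on FKb = FKc
    INNER JOIN TableD on FKc = FKd
    WHERE TableB.Col1 = @filterCol1;
ELSE
    SELECT *
    FROM TableA
    INNER JOIN TableB on FKa = FKb
    INNER JOIN TableC on FKb = FKc
    INNER JOIN TableD on FKc = FKd;
```

The cost is combinatorial growth: with N optional parameters there are up to 2^N branches to write and maintain.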

Uri,
Yes, splitting the select out into different ones dependent on
parameters seems to be the only way to go (but unfortunately that's not
an option, as I won't be able to justify a potentially breaking change
such as that for performance increases). I was hoping that there would
be some nice SQL trick that allows you to always hit the smaller tables
first.
Thanks for your help
Will

If not specified otherwise (through the use of join hints), the Query
Optimizer will always choose the best order according to indexes,
statistics, etc.
ML
http://milambda.blogspot.com/

I know that the optimiser will try, and that I can override with join
hints, but in this case the optimiser can't help because the logic of
the join forces it to be evaluated in that order. What I was hoping for
was some kind of new way of laying out the query such that I could
"enter" my data structure from different directions depending on which
one has parameters defined to filter it. I suspect it's not possible,
but doesn't anyone else find this a problem? Is there some better way
to either lay out the tables or to have filter procs with multiple
optional parameters?
Cheers
Will

One way to filter the rows before the join is to use inline table
functions that accept filtering values as parameters. I'm not sure,
however, if this will actually improve the performance. It's an option
to consider and test.
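A sketch of the inline table function idea, again with placeholder names (note that SQL Server expands inline table-valued functions into the calling query, so the optimizer may still produce the same plan):

```sql
-- Sketch: an inline TVF that applies the optional filter to TableD
-- before the rest of the joins see the rows.
CREATE FUNCTION dbo.FilteredTableD (@filterCol2 int)
RETURNS TABLE
AS
RETURN
    SELECT *
    FROM TableD
    WHERE @filterCol2 IS NULL OR TableD.Col2 = @filterCol2;
GO

SELECT *
FROM TableA
INNER JOIN TableB on FKa = FKb
INNER JOIN TableC on FKb = FKc
INNER JOIN dbo.FilteredTableD(@filterCol2) AS d on FKc = d.FKd;
```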
ML
http://milambda.blogspot.com/

The problem is that SQL Server will pick one query plan and reuse it for all
parameter sets. If this is in a stored procedure, you could try to mark it
WITH RECOMPILE, or use dynamic SQL.
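A sketch of the dynamic SQL approach with sp_executesql, using the parameter names from the earlier example (types are assumed to be int for illustration). Because only the predicates that are actually needed end up in the statement text, each distinct parameter combination gets its own cached plan:

```sql
-- Sketch: build the WHERE clause from the parameters that are set,
-- then execute it parameterised via sp_executesql.
DECLARE @sql nvarchar(max);
SET @sql = N'SELECT *
FROM TableA
INNER JOIN TableB on FKa = FKb
INNER JOIN TableC on FKb = FKc
INNER JOIN TableD on FKc = FKd
WHERE 1 = 1';

IF @filterCol1 IS NOT NULL
    SET @sql = @sql + N' AND TableB.Col1 = @filterCol1';
IF @filterCol2 IS NOT NULL
    SET @sql = @sql + N' AND TableD.Col2 = @filterCol2';

EXEC sp_executesql @sql,
    N'@filterCol1 int, @filterCol2 int',
    @filterCol1, @filterCol2;
```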
David

You have hit one of the problems with SQL Server. It was built for a
"department level" database on a small machine. It keeps one execution
plan for each stored procedure and uses that plan when the proc is
invoked. They added parameter sniffing, but that can actually hurt.
Better products, meant for enterprise-level applications and VLDB, hold
multiple plans (is DB2 keeping 16 or 32 now? I do not remember). Thus
they know that when the query looks like this (I will use the @ for
parameters even though that is not Standard SQL):
SELECT x, y, z
FROM Foobar AS F1
WHERE F1.a = COALESCE(@p1, F1.a)
  AND F1.b = COALESCE(@p2, F1.b)
  AND F1.c = COALESCE(@p3, F1.c)
SQL needs to generate 8 execution plans, one for each combination of
NULLs. The best one will be picked at run time, when we know (@p1, @p2,
@p3) and where the NULLs are that will turn a search condition into a
constant TRUE, FALSE or UNKNOWN.
This is your solution #1 -- fake a good optimizer and RDBMS engine in
T-SQL by hand. You can do it with IF-THEN-ELSE control flow in T-SQL.
It is a XXXXX to maintain, but easy to write with cut and paste in a
text editor. It can run much faster than what you have now, however.