Sometimes, when writing SELECTs, we come across situations in which we we need to write a join containing a condition
of some sort, so that the join processed will vary for each row
depending on the data. For example, some rows join to table A, others
to table B. You might think to do this with either a CASE expression or
with some OR boolean logic in
your JOIN expression. Often, you might encounter syntax errors,
performance problems, or results not being returned the way you expect
when trying this. There's a much better way to approach the problem.
Instead of trying to alter a single INNER JOIN's relation from row to row, instead use multiple LEFT OUTER JOINS in your SELECT, one for each table or condition that you need to consider. With each possibility JOIN'ed separately, you can now use a CASE expression in your SELECT clause to return data from the join that you need based on conditions or the join results.
For example, if you need to join Employees to either Stores or Offices depending on where they work, many SQL beginners think that a CASE will somehow let them get the results they need:
Instead, you simply LEFT OUTER JOIN to both tables, and in your SELECT clause, return data from the one that matches:
We did, however, change the fact that two left outer joins doesn't filter the employees table the way an INNER JOIN would. If you want to return only employees that have a location (i.e., you want to inner join to either of these two tables) you would add that criteria to your WHERE clause:
You might think to write something like this:
Well, there is, and it follows the same basic technique I described earlier. Instead, let's do two separate LEFT OUTER JOINs to the Billing Rates table, giving each copy of the BillingRates table a nice alias to indicate what it is returning:
Instead of trying to alter a single INNER JOIN's relation from row to row, instead use multiple LEFT OUTER JOINS in your SELECT, one for each table or condition that you need to consider. With each possibility JOIN'ed separately, you can now use a CASE expression in your SELECT clause to return data from the join that you need based on conditions or the join results.
For example, if you need to join Employees to either Stores or Offices depending on where they work, many SQL beginners think that a CASE will somehow let them get the results they need:
select
E.EmployeeName from
Employee E inner join
case when ... ???? .. then Offices else Stores end on ... ??? ....
Even if you did manage to come up with a creative new syntax for doing
"CASE JOINS" in SQL, it would not compile or execute in SQL since it is
invalid. It's also the wrong approach.E.EmployeeName from
Employee E inner join
case when ... ???? .. then Offices else Stores end on ... ??? ....
Instead, you simply LEFT OUTER JOIN to both tables, and in your SELECT clause, return data from the one that matches:
select
E.EmployeeName, coalesce(s.store,o.office) as Location from
Employees E left outer join
Stores S on ... left outer join
Offices O on ...
In the above, we use coalesce() to return whichever value did not came
back as null; if for some reason we got matching rows in both tables, it
is easy to write a CASE to handle that situation as well. The key here
is that we kept it simple and clear, and we get the correct results
using valid SQL.E.EmployeeName, coalesce(s.store,o.office) as Location from
Employees E left outer join
Stores S on ... left outer join
Offices O on ...
We did, however, change the fact that two left outer joins doesn't filter the employees table the way an INNER JOIN would. If you want to return only employees that have a location (i.e., you want to inner join to either of these two tables) you would add that criteria to your WHERE clause:
select
E.EmployeeName, coalesce(s.store,o.office) as Location from
Employees E left outer join
Stores S on ... left outer join
Offices O on ... where
O.Office is not null OR S.Store is not null
Similarly, sometimes you might need to join to a single table on different expressions depending on a condition. For example, if you have a table of Billing Rates, set up like this:E.EmployeeName, coalesce(s.store,o.office) as Location from
Employees E left outer join
Stores S on ... left outer join
Offices O on ... where
O.Office is not null OR S.Store is not null
EmpType Dept Rate
------ --------- ----
A DEFAULT $50
A 4 $55
B 1 $25
B 2 $45
B 3 $55
B DEFAULT $40
C DEFAULT $70
C 1 $60
In the above table, notice that billing rates can be set by a
combination of employee type and department, but if a rate doesn't exist
for that combo, we use the "default" rate for that Employee Type. (Note
that this is not really a good database design, but I've seen it often
enough unfortunately. )You might think to write something like this:
select
E.EmployeeName, B.Rate as BillingRate from
Employee E inner join
BillRates B on ((B.EmpType = E.EmpType and B.Department = E.Department) OR
(B.EmpType = E.EmpType and B.Department = 'DEFAULT'))
In this case, that is valid SQL and it will compile and execute.
However, it is not only inefficient, but it doesn't work. For employee
rows that match a specific EmpType/Dept billing rate, it will still also return another row for the default rate since the second part of the OR expression still holds. This means that you will get duplicated data
with different rates in your result set. You could try to aggregate
those results to return 1 row per Employee, and figure out how to pull
just the billing rate that you need with a MAX(CASE ...) expression or
something similar, but now you are really over-complicating things.
There must be an easier way!E.EmployeeName, B.Rate as BillingRate from
Employee E inner join
BillRates B on ((B.EmpType = E.EmpType and B.Department = E.Department) OR
(B.EmpType = E.EmpType and B.Department = 'DEFAULT'))
Well, there is, and it follows the same basic technique I described earlier. Instead, let's do two separate LEFT OUTER JOINs to the Billing Rates table, giving each copy of the BillingRates table a nice alias to indicate what it is returning:
select
E.EmployeeName, coalesce(DefaultRates.Rate, DeptRates.Rate) as BillingRate from
Employee E left outer join
BillRates DefaultRates on E.EmpType = DefaultRates.EmpType and
DefaultRates.Dept='DEFAULT' left outer join
BillRates DeptRates on E.EmpType = DeptRates.EmpType and
E.Dept = DeptRates.Dept
Now, we will only get back one row per Employee, and it is clear how we
are joining to the Billing Rates table in two different ways. We can
add criteria to the WHERE clause to ensure that only employees with a
valid, matching billing rate is returned, much in the same way as the
previous example:E.EmployeeName, coalesce(DefaultRates.Rate, DeptRates.Rate) as BillingRate from
Employee E left outer join
BillRates DefaultRates on E.EmpType = DefaultRates.EmpType and
DefaultRates.Dept='DEFAULT' left outer join
BillRates DeptRates on E.EmpType = DeptRates.EmpType and
E.Dept = DeptRates.Dept
WHERE DefaultRates.Rate is not null OR DeptRates.Rate is not null
So, the next time you think you need an OR condition or a CASE expression
in your JOIN clause, consider instead doing multiple LEFT OUTER JOINs.
You'll find that your code is easier to write, returns correct results,
and is also more efficient.
No comments:
Post a Comment