Thursday, 13 June 2013

Conditional Joins in SQL Server

Sometimes, when writing SELECTs, we come across situations in which we we need to write a join containing a condition of some sort, so that the join processed will vary for each row depending on the data.   For example, some rows join to table A, others to table B.  You might think to do this with either a CASE expression or with some OR boolean logic in your JOIN expression.  Often, you might encounter syntax errors, performance problems, or results not being returned the way you expect when trying this.   There's a much better way to approach the problem.
Instead of trying to alter a single INNER JOIN's relation from row to row, instead use multiple LEFT OUTER JOINS in your SELECT, one for each table or condition that you need to consider.  With each possibility JOIN'ed separately, you can now use a CASE expression in your SELECT clause to return data from the join that you need based on conditions or the join results. 
For example, if you need to join Employees to either Stores or Offices depending on where they work, many SQL beginners think that a CASE will somehow let them get the results they need:

select
  E.EmployeeName
from
  Employee E
inner join
 case when ... ???? .. then Offices else Stores end on ... ??? ....
Even if you did manage to come up with a creative new syntax for doing "CASE JOINS" in SQL, it would not compile or execute in SQL since it is invalid.  It's also the wrong approach.
Instead, you simply LEFT OUTER JOIN to both tables, and in your SELECT clause, return data from the one that matches:

select
  E.EmployeeName, coalesce(s.store,o.office) as Location
from
  Employees E
left outer join
  Stores S on ...
left outer join
  Offices O on ...
In the above, we use coalesce() to return whichever value did not came back as null; if for some reason we got matching rows in both tables, it is easy to write a CASE to handle that situation as well.  The key here is that we kept it simple and clear, and we get the correct results using valid SQL.
We did, however, change the fact that two left outer joins doesn't filter the employees table the way an INNER JOIN would.  If you want to return only employees that have a location (i.e., you want to inner join to either of these two tables) you would add that criteria to your WHERE clause:

select
  E.EmployeeName, coalesce(s.store,o.office) as Location
from
  Employees E
left outer join
  Stores S on ...
left outer join
  Offices O on ...
where
  O.Office is not null OR S.Store is not null
Similarly, sometimes you might need to join to a single table on different expressions depending on a condition.  For example, if you have a table of Billing Rates, set up like this:

EmpType  Dept      Rate ------   --------- ---- A        DEFAULT   $50 A        4         $55 B        1         $25 B        2         $45 B        3         $55 B        DEFAULT   $40 C        DEFAULT   $70 C        1         $60
In the above table, notice that billing rates can be set by a combination of employee type and department, but if a rate doesn't exist for that combo, we use the "default" rate for that Employee Type. (Note that this is not really a good database design, but I've seen it often enough unfortunately.  )
You might think to write something like this:

select
  E.EmployeeName, B.Rate as BillingRate
from
  Employee E
inner join
  BillRates B on ((B.EmpType = E.EmpType and B.Department = E.Department) OR
                  (B.EmpType = E.EmpType and B.Department = 'DEFAULT'))
In this case, that is valid SQL and it will compile and execute.  However, it is not only inefficient, but it doesn't work.  For employee rows that match a specific EmpType/Dept billing rate, it will still also return another row for the default rate since the second part of the OR expression still holds.  This means that you will get duplicated data with different rates in your result set.  You could try to aggregate those results to return 1 row per Employee, and figure out how to pull just the billing rate that you need with a MAX(CASE ...) expression or something similar, but now you are really over-complicating things.  There must be an easier way!
Well, there is, and it follows the same basic technique I described earlier.  Instead, let's do two separate LEFT OUTER JOINs to the Billing Rates table, giving each copy of the BillingRates table a nice alias to indicate what it is returning:

select
  E.EmployeeName, coalesce(DefaultRates.Rate, DeptRates.Rate) as BillingRate
from
  Employee E
left outer join
  BillRates DefaultRates on E.EmpType = DefaultRates.EmpType and
                            DefaultRates.Dept='DEFAULT'
left outer join
  BillRates DeptRates on E.EmpType = DeptRates.EmpType and
                         E.Dept = DeptRates.Dept
Now, we will only get back one row per Employee, and it is clear how we are joining to the Billing Rates table in two different ways.  We can add criteria to the WHERE clause to ensure that only employees with a valid, matching billing rate is returned, much in the same way as the previous example:

WHERE DefaultRates.Rate is not null OR DeptRates.Rate is not null
So, the next time you think you need an OR condition or a CASE expression in your JOIN clause, consider instead doing multiple LEFT OUTER JOINs.  You'll find that your code is easier to write, returns correct results, and is also more efficient.

No comments:

Post a Comment

C# LINQ Joins With SQL

There are  Different Types of SQL Joins  which are used to query data from more than one database tables. In this article, you will learn a...