(Mis)using the WITH clause in Oracle

The WITH clause (or subquery factoring clause) is a very useful construct. It allows us to materialize the results of a SQL statement to use multiple times, without having to re-execute that statement. Here is an academic example of that in practice.

With demo As  (Select *     From Employees e   Where e.job_id = 'IT_PROG') Select * From demo d, demo d_mgr  Where d_mgr.employee_id = d.manager_id;  

It’s very unlikely that we’d produce the above requirement as-is, however for the purposes of this demo it’ll suffice.


 

------------------------------------------------------------------------------------------------------------------- | Id  | Operation                             | Name                      | Rows  | Bytes | Cost (%CPU)| Time     | ------------------------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT                      |                           |     1 |   266 |     6   (0)| 00:00:01 | |   1 |  TEMP TABLE TRANSFORMATION            |                           |       |       |            |          | |   2 |   LOAD AS SELECT                      | SYS_TEMP_0FD9D6606_20F045 |       |       |            |          | |   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES                 |     5 |   345 |     2   (0)| 00:00:01 | |*  4 |     INDEX RANGE SCAN                  | EMP_JOB_IX                |     5 |       |     1   (0)| 00:00:01 | |*  5 |   HASH JOIN                           |                           |     1 |   266 |     4   (0)| 00:00:01 | |   6 |    VIEW                               |                           |     5 |   665 |     2   (0)| 00:00:01 | |   7 |     TABLE ACCESS FULL                 | SYS_TEMP_0FD9D6606_20F045 |     5 |   345 |     2   (0)| 00:00:01 | |   8 |    VIEW                               |                           |     5 |   665 |     2   (0)| 00:00:01 | |   9 |     TABLE ACCESS FULL                 | SYS_TEMP_0FD9D6606_20F045 |     5 |   345 |     2   (0)| 00:00:01 | -------------------------------------------------------------------------------------------------------------------   Predicate Information (identified by operation id): ---------------------------------------------------      4 - access("E"."JOB_ID"='IT_PROG')    5 - access("D_MGR"."EMPLOYEE_ID"="D"."MANAGER_ID") 

So we can see we first load the results of the WITH block “demo” using the index range scan to select those employees with a of IT_PROG. This is loaded into a temporary table, not to be confused with a Global Temporary Table (although the internal implementation might be similar) SYS_TEMP_0FD9D6606_20F045, and then that is used as the source for the two scan we require before they are hash-joined together in step 5.
All good and what we’d expect from our understanding of the WITH clause.

What I quite often see though is (often complex) statements defined as a WITH block and then only actually queried from once in the subsequent select. We don’t need a complex example though, just something simple like this:

With demo As  (Select e.first_name, e.last_name, e.manager_id, j.min_salary    From Employees e, Jobs j   Where j.job_id = e.job_id      And j.job_title = 'Accountant') Select d.first_name, d.last_name, d.min_salary, e.phone_number From demo d, Employees e  Where e.employee_id = d.manager_id; 

Many people will believe that in the above example, the demo block will be first materialized into a temporary table and then joined back to employees to get the manager. So let’s take a look:

-------------------------------------------------------------------------------------------- | Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     | -------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT              |            |     6 |   468 |     7   (0)| 00:00:01 | |*  1 |  HASH JOIN                    |            |     6 |   468 |     7   (0)| 00:00:01 | |   2 |   NESTED LOOPS                |            |     6 |   354 |     4   (0)| 00:00:01 | |   3 |    NESTED LOOPS               |            |     6 |   354 |     4   (0)| 00:00:01 | |*  4 |     TABLE ACCESS FULL         | JOBS       |     1 |    31 |     3   (0)| 00:00:01 | |*  5 |     INDEX RANGE SCAN          | EMP_JOB_IX |     6 |       |     0   (0)| 00:00:01 | |   6 |    TABLE ACCESS BY INDEX ROWID| EMPLOYEES  |     6 |   168 |     1   (0)| 00:00:01 | |   7 |   TABLE ACCESS FULL           | EMPLOYEES  |   107 |  2033 |     3   (0)| 00:00:01 | --------------------------------------------------------------------------------------------   Predicate Information (identified by operation id): ---------------------------------------------------      1 - access("E"."EMPLOYEE_ID"="E"."MANAGER_ID")    4 - filter("J"."JOB_TITLE"='Accountant')    5 - access("J"."JOB_ID"="E"."JOB_ID") 

So we can see that is kind of what happens, but note that there is no temporary table in sight. The key point here is that the optimizer can choose whether or not to materialize a query, and if you are only selecting from that statement once then it’s highly unlikely it will choose to do so. The query has been re-written to effectively be:

Select e.first_name, e.last_name, j.min_salary, m.phone_number   From Employees e, Jobs j, Employees m   Where m.employee_id = e.manager_id   And j.job_id = e.job_id   And j.job_title = 'Accountant';   

We can force the materialization with the materialize hint, however that isn’t recommended. It’s likely to be more inefficient anyway because of the additional step of materialization required.

With demo As  (Select /*+materialize*/ e.first_name, e.last_name, e.manager_id, j.min_salary    From Employees e, Jobs j   Where j.job_id = e.job_id      And j.job_title = 'Accountant') Select d.first_name, d.last_name, d.min_salary, e.phone_number From demo d, Employees e  Where e.employee_id = d.manager_id;    Plan hash value: 365502358   ------------------------------------------------------------------------------------------------------------ | Id  | Operation                      | Name                      | Rows  | Bytes | Cost (%CPU)| Time     | ------------------------------------------------------------------------------------------------------------ |   0 | SELECT STATEMENT               |                           |     5 |   355 |     9   (0)| 00:00:01 | |   1 |  TEMP TABLE TRANSFORMATION     |                           |       |       |            |          | |   2 |   LOAD AS SELECT               | SYS_TEMP_0FD9D660B_20F045 |       |       |            |          | |   3 |    NESTED LOOPS                |                           |     6 |   354 |     4   (0)| 00:00:01 | |   4 |     NESTED LOOPS               |                           |     6 |   354 |     4   (0)| 00:00:01 | |*  5 |      TABLE ACCESS FULL         | JOBS                      |     1 |    31 |     3   (0)| 00:00:01 | |*  6 |      INDEX RANGE SCAN          | EMP_JOB_IX                |     6 |       |     0   (0)| 00:00:01 | |   7 |     TABLE ACCESS BY INDEX ROWID| EMPLOYEES                 |     6 |   168 |     1   (0)| 00:00:01 | |*  8 |   HASH JOIN                    |                           |     5 |   355 |     5   (0)| 00:00:01 | |   9 |    VIEW                        |                           |     6 |   312 |     2   (0)| 00:00:01 | |  10 |     TABLE ACCESS FULL          | SYS_TEMP_0FD9D660B_20F045 |     6 |   138 |     2   (0)| 00:00:01 | |  11 |    TABLE ACCESS FULL           | EMPLOYEES                 |   107 |  2033 |     3   (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------   Predicate Information (identified by operation id): ---------------------------------------------------      5 - filter("J"."JOB_TITLE"='Accountant')    6 - access("J"."JOB_ID"="E"."JOB_ID")    8 - access("E"."EMPLOYEE_ID"="D"."MANAGER_ID") 

Some developers use the WITH clause simply to separate out blocks of code to make it more readable (fine), however in some cases I think there are misconceptions as to the effect it is having on a query. A developer thinks they are helping out the efficiency of a plan by WITH’ing up a piece of code, when in fact it’s very likely that the query is just being re-written anyway and it’s having no effect whatsoever. Personally I only use the clause when I do actually intend to materialize something – otherwise I find it harder to read and understand what’s going on. I see it entering queries more and more often but actually doing nothing. If you do use WITH and your plan doesn’t contain a source of SYS_TEMP_… then you probably aren’t actually using WITH. 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *

seven + 14 =

This site uses Akismet to reduce spam. Learn how your comment data is processed.