I’m quite a fan of row generators. If I know I need a row for every day of the year for example, I generally avoid holding (and maintaining) a table of these dates, and instead will generate them. This is by no means a universal rule, so take each case on its merits.
select trunc(sysdate,'YYYY') + level -1 dt from dual connect by level <= add_months(trunc(sysdate,'YYYY'),12) - trunc(sysdate,'YYYY');
You can see a few methods (of many) of generating rows on one of my previous posts. The issue with generating rows in this manner is that the optimizer estimates cardinalities based on the number of rows in the table. Which for dual is… 1.
So when we look at the execution plan for such a query, we see this.
SQL_ID 16r2my83pj187, child number 0 ------------------------------------- select /*+gather_plan_statistics*/ trunc(sysdate,'YYYY') + level -1 dt from dual connect by level <= add_months(trunc(sysdate,'YYYY'),12) - trunc(sysdate,'YYYY') Plan hash value: 1236776825 ---------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | OMem | 1Mem | Used-Mem | ---------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 365 |00:00:00.01 | | | | | 1 | CONNECT BY WITHOUT FILTERING| | 1 | | 365 |00:00:00.01 | 2048 | 2048 | 2048 (0)| | 2 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | | | | ----------------------------------------------------------------------------------------------------------------
We thought we would get just 1 row back – but we in fact got 365. Once you start joining out to other tables, the magnitude of that soon becomes more prominent. Let’s say I want to generate a row for each employee who is a manager, for each day of the year.
select /*+gather_plan_statistics*/ e.first_name, e.last_name, j.job_title, cal.dt from employees e, jobs j, (select trunc(sysdate,'YYYY') + level -1 dt from dual connect by level <= add_months(trunc(sysdate,'YYYY'),12) - trunc(sysdate,'YYYY')) cal where j.job_id = e.job_id and j.job_title like '%Manager%';
SQL_ID 0smma673xxzb2, child number 0 ------------------------------------- select /*+gather_plan_statistics*/ e.first_name, e.last_name, j.job_title, cal.dt from employees e, jobs j, (select trunc(sysdate,'YYYY') + level -1 dt from dual connect by level <= add_months(trunc(sysdate,'YYYY'),12) - trunc(sysdate,'YYYY')) cal where j.job_id = e.job_id and j.job_title like '%Manager%' Plan hash value: 706006494 ------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 5110 |00:00:00.01 | 31 | | | | | 1 | NESTED LOOPS | | 1 | 5 | 5110 |00:00:00.01 | 31 | | | | | 2 | NESTED LOOPS | | 1 | 6 | 5110 |00:00:00.01 | 22 | | | | | 3 | MERGE JOIN CARTESIAN | | 1 | 1 | 2190 |00:00:00.01 | 9 | | | | |* 4 | TABLE ACCESS FULL | JOBS | 1 | 1 | 6 |00:00:00.01 | 9 | | | | | 5 | BUFFER SORT | | 6 | 1 | 2190 |00:00:00.01 | 0 | 18432 | 18432 |16384 (0)| | 6 | VIEW | | 1 | 1 | 365 |00:00:00.01 | 0 | | | | | 7 | CONNECT BY WITHOUT FILTERING| | 1 | | 365 |00:00:00.01 | 0 | 2048 | 2048 | 2048 (0)| | 8 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | | | | |* 9 | INDEX RANGE SCAN | EMP_JOB_IX | 2190 | 6 | 5110 |00:00:00.01 | 13 | | | | | 10 | TABLE ACCESS BY INDEX ROWID | EMPLOYEES | 5110 | 6 | 5110 |00:00:00.01 | 9 | | | | ------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - filter("J"."JOB_TITLE" LIKE '%Manager%') 9 - access("J"."JOB_ID"="E"."JOB_ID")
Due to the cardinality mis-estimate on our calendar generation (plan line 6) the optimzer has chosen an index access into employees which has resulted in the index being scanned 2190 times, and the employee table 5110 times based on the rowid’s fetch from the previous index scan.
Had the optimizer known the calendar would return 365 rows instead of 1, it is unlikely to have chosen an index/nested loops access approach. But how can we provide that information? Well, we know that we are generating a row per day of the year, and we know that in general there are 365 days in the year. So we can use the cardinality
hint in our calendar:
(select /*+cardinality(365)*/ trunc(sysdate,'YYYY') + level -1 dt from dual connect by level <= add_months(trunc(sysdate,'YYYY'),12) - trunc(sysdate,'YYYY')) cal
which then yields a somewhat different plan where we don’t :
SQL_ID 7vd1csauqf761, child number 0 ------------------------------------- select /*+gather_plan_statistics*/ e.first_name, e.last_name, j.job_title, cal.dt from employees e, jobs j, (select /*+cardinality(365)*/ trunc(sysdate,'YYYY') + level -1 dt from dual connect by level <= add_months(trunc(sysdate,'YYYY'),12) - trunc(sysdate,'YYYY')) cal where j.job_id = e.job_id and j.job_title like '%Manager%' Plan hash value: 3120087941 --------------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem | --------------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 5110 |00:00:00.02 | 17 | 2 | | | | |* 1 | HASH JOIN | | 1 | 1953 | 5110 |00:00:00.02 | 17 | 2 | 1376K| 1376K| 1668K (0)| | 2 | VIEW | index$_join$_001 | 1 | 107 | 107 |00:00:00.01 | 8 | 2 | | | | |* 3 | HASH JOIN | | 1 | | 107 |00:00:00.01 | 8 | 2 | 1316K| 1316K| 1690K (0)| | 4 | INDEX FAST FULL SCAN | EMP_NAME_IX | 1 | 107 | 107 |00:00:00.01 | 4 | 1 | | | | | 5 | INDEX FAST FULL SCAN | EMP_JOB_IX | 1 | 107 | 107 |00:00:00.01 | 4 | 1 | | | | | 6 | MERGE JOIN CARTESIAN | | 1 | 347 | 2190 |00:00:00.01 | 9 | 0 | | | | |* 7 | TABLE ACCESS FULL | JOBS | 1 | 1 | 6 |00:00:00.01 | 9 | 0 | | | | | 8 | BUFFER SORT | | 6 | 365 | 2190 |00:00:00.01 | 0 | 0 | 18432 | 18432 |16384 (0)| | 9 | VIEW | | 1 | 365 | 365 |00:00:00.01 | 0 | 0 | | | | | 10 | CONNECT BY WITHOUT FILTERING| | 1 | | 365 |00:00:00.01 | 0 | 0 | 2048 | 2048 | 2048 (0)| | 11 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | 0 | | | | --------------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("J"."JOB_ID"="E"."JOB_ID") 3 - access(ROWID=ROWID) 7 - filter("J"."JOB_TITLE" LIKE '%Manager%')
There are still some differences on the “MERGE JOIN CARTESIAN” step, however we are a lot closer than we were previously.
This of course is a small demonstration with very low rowcounts, however I witnessed a query at a client site this week which was generating rows for three years in a similar fashion to above, and the execution plan ended up doing a nested loops operation which was being executed over a billion times due to this exact issue… because the optimizer estimated it was going to get back 1 row instead of nearer 1100.
So whilst the cardinality hint is always discouraged from production code, I’d say when you are geneating a known number of rows in a row generator then it’s pretty much essential.