SQL and PL/SQL Mistakes, Misunderstandings and Misconceptions – Part 2

If you missed part 1 then you can find it here. In this post I’ll be focussing on a somewhat common theme – indexes. Whilst this may be more database design than coding concepts, the principles of course underpin any SQL statement that you write and such concepts are arguably one of the most important aspects of writing any piece of code.

The following highlights a few common mistakes or misunderstandings around the usage of indexes.

 

1) Indexes not being used

This is quite a common situation I find myself in. A developer asks us to review a piece of code that is performing poorly. “It’s doing full table scans – we’re trying to hint it to use indexes but it’s still going slow“. The first point here is should it be using indexes? Indexes are not always faster. In fact they can be far slower! Do you understand what happens when a row is accessed via an index? If not, then my previous blog post discusses this briefly. Consider the following example where we join two tables together of similar size.

Now look at the plan the optimizer has chosen.

Is that bad? Should it not be using that index we have created? Of course not – why would it?

 

2) Unnecessary columns

Most people are aware that SELECT * FROM … is generally bad coding practice due to the unforeseen consequences that can occur should that table be modified. Equally important however (particularly when columns are indexed) is choosing only the columns you need in a query.

Typically when you select data from a table using an index, the index is access and then the corresponding blocks in the table are read (as per (1) above). However; if all the columns you select are part of the index then there is no need to go and read the table block – you can read just from the index. This can be seen using the following example.

CREATE TABLE demo (   id        number,   full_name varchar2(20),   addr      varchar2(20),   PRIMARY KEY (id) );  CREATE INDEX demo_idx ON demo(full_name);  INSERT INTO demo SELECT level, 'Person ' || level, 'Address ' || level FROM DUAL  CONNECT BY LEVEL < 1000;  COMMIT; 

Now look what happens when we execute the following two statements:

SQL> SELECT full_name FROM demo WHERE full_name='Person 1';  FULL_NAME -------------------- Person 1  SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR());  PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- SQL_ID  fbz5s7yrh94mb, child number 0 ------------------------------------- SELECT full_name FROM demo WHERE full_name='Person 1'  Plan hash value: 3229557269  ----------------------------------------------------------------------------- | Id  | Operation        | Name     | Rows  | Bytes | Cost (%CPU)| Time     | ----------------------------------------------------------------------------- |   0 | SELECT STATEMENT |          |       |       |     1 (100)|          | |*  1 |  INDEX RANGE SCAN| DEMO_IDX |     1 |    12 |     1   (0)| 00:00:01 | -----------------------------------------------------------------------------  Predicate Information (identified by operation id): ---------------------------------------------------     1 - access("FULL_NAME"='Person 1')  Note -----    - dynamic sampling used for this statement (level=2)   22 rows selected.  SQL> SELECT full_name, addr FROM demo WHERE full_name='Person 1';  FULL_NAME            ADDR -------------------- -------------------- Person 1             Address 1  SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR());  PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- SQL_ID  06bgq6b3uj8sv, child number 0 ------------------------------------- SELECT full_name, addr FROM demo WHERE full_name='Person 1'  Plan hash value: 1165830477  ---------------------------------------------------------------------------------------- | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     | ---------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT            |          |       |       |     1 (100)|          | |   1 |  TABLE ACCESS BY INDEX ROWID| DEMO     |     1 |    24 |     1   (0)| 00:00:01 | |*  2 |   INDEX RANGE SCAN          | DEMO_IDX |     4 |       |     1   (0)| 00:00:01 | ----------------------------------------------------------------------------------------  Predicate Information (identified by operation id): ---------------------------------------------------     2 - access("FULL_NAME"='Person 1')  Note -----    - dynamic sampling used for this statement (level=2)   23 rows selected.  SQL>  

See in the second statement, the index scan is done but then there is an extra step to access the table using the ROWID retrieved from the index. For this particular example (accessing a single row) the difference is negligible, however if used in say a nested loops with range scans being executed thousands of times it becomes something to consider. Note that dynamic sampling as used on the above table as we didn’t gather statistics after populating.

 

3) Index Organized Tables

It is common practice to have a small lookup table where an index covers all or most of the columns, particularly with tables implementing a many-many relationship. Consider what happens when we create this table with an index using standard notation.

CREATE TABLE demo_link (   id1 NUMBER,   id2 NUMBER,   PRIMARY KEY (id1,id2) );

First the data for the table is held in the TABLE’s tablespace, then an index containing pretty much a copy of the data is held in the INDEX tablespace. Why should be store two (almost) identical copies of the same data? Well… we shouldn’t. Index Organized Tables (IOT’s) are our friend. An IOT doesn’t store the table data and then the index, it represents the table as the index (generally within the index tablespace), completely doing away with the table. This eliminates the step of retrieving the data block. 

select table_name, tablespace_name from user_tables where table_name='DEMO_LINK';  TABLE_NAME                     TABLESPACE_NAME ------------------------------ ------------------------------ DEMO_LINK                      SYSTEM  

Now consider the IOT version:

CREATE TABLE demo_link (   id1 NUMBER,   id2 NUMBER,   PRIMARY KEY (id1,id2) ) ORGANIZATION INDEX;  select table_name, tablespace_name from user_tables where table_name='DEMO_LINK';  TABLE_NAME                     TABLESPACE_NAME ------------------------------ ------------------------------ DEMO_LINK  select index_name, index_type from user_indexes where table_name='DEMO_LINK';  INDEX_NAME                     INDEX_TYPE ------------------------------ --------------------------- SYS_IOT_TOP_23976              IOT - TOP  

So in the second example we can see that we’re no longer using space in the table tablespace (ignore the fact it’s using SYSTEM – that was just setup that I didn’t change for this demo instance). No big deal for an empty table but maybe more so if we’re storing hundreds of millions of records. Plus think about the point we discussed in (2) – it’s quite likely that the table would never be hit anyway (apart from if a full scan is deemed more cost effective) because the index alone would be used.

 

4) Indexing NULL values

Up until 11g it was necessary to employ some subtle “tricks” in order to use indexes on null columns. However with 11g came a method to index NULL values.

CREATE TABLE null_demo (  val number );  create index null_demo_idx on null_demo(val);  insert into null_demo select case when level > 99000 then null else level end  from dual connect by level < 100000;  select * from null_demo where val is null;  SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR());  PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- SQL_ID  76zz3r72awxyy, child number 0 ------------------------------------- select * from null_demo where val is null  Plan hash value: 3693720157  ------------------------------------------------------------------------------- | Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     | ------------------------------------------------------------------------------- |   0 | SELECT STATEMENT  |           |       |       |    87 (100)|          | |*  1 |  TABLE ACCESS FULL| NULL_DEMO |  3195 | 41535 |    87   (4)| 00:00:02 | -------------------------------------------------------------------------------  Predicate Information (identified by operation id): ---------------------------------------------------     1 - filter("VAL" IS NULL)  Note -----    - dynamic sampling used for this statement (level=2)   22 rows selected.  

Now let’s drop that index and create it with NULL columns enabled:

create index null_demo_idx on null_demo(val,1);

select * from null_demo where val is null; SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR()); PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- SQL_ID 76zz3r72awxyy, child number 0 ------------------------------------- select * from null_demo where val is null Plan hash value: 1760960562 ---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 15 (100)| | |* 1 | INDEX RANGE SCAN| NULL_DEMO_IDX | 3 | 39 | 15 (0)| 00:00:01 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("VAL" IS NULL) Note ----- - dynamic sampling used for this statement (level=2) 22 rows selected.

The addition of the “1” means the composite index will always have a value. Gathering stats on the table, we can now see the cardinality estimates are spot on! (You may need to flush the shared pool to replicate the below)

exec dbms_stats.gather_table_stats(ownname=>'DEMO',tabname=>'NULL_DEMO');  SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR());  PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------- SQL_ID  76zz3r72awxyy, child number 0 ------------------------------------- select * from null_demo where val is null  Plan hash value: 1760960562  ---------------------------------------------------------------------------------- | Id  | Operation        | Name          | Rows  | Bytes | Cost (%CPU)| Time     | ---------------------------------------------------------------------------------- |   0 | SELECT STATEMENT |               |       |       |     4 (100)|          | |*  1 |  INDEX RANGE SCAN| NULL_DEMO_IDX |   999 |  4995 |     4   (0)| 00:00:01 | ----------------------------------------------------------------------------------  Predicate Information (identified by operation id): ---------------------------------------------------     1 - access("VAL" IS NULL)   18 rows selected. 

 

Leave a Reply

Your email address will not be published. Required fields are marked *

three + eighteen =

This site uses Akismet to reduce spam. Learn how your comment data is processed.