Star Transformations with Bitmap Indexes

One area of data warehousing that often confuses people (typically those who have come from an OLTP development background) is bitmap indexes… why do we use them from a design point of view? How does the optimizer use them? What benefits do they have?

Whilst this topic alone can (and does) cover entire books, I’m hoping to give a brief overview of the usefulness of bitmap indexes in the context of star transformations and show what actually happens when you write a query that can be transformed.

First, a very quick introduction to how a bitmap index differs from “normal” (B-Tree) indexes. In a B-Tree index, the index structure is arranged in a tree, and searching is done by walking the tree structure until the node is found which matches the predicate. This node points to the location of the row within the table and that row is then retrieved (assuming the data you require isn’t all contained within the index).

B-Tree Scan simple

Bitmap indexes differ in that for a single row, the index is as wide as the number of distinct values in that column, with a “bit” set whether the column equals that value. For example, consider the following 5 row table.

ID Name Location
1 Alice UK
2 Bob UK
3 Carol FR
4 Dave DK
5 Eve UK

A bitmap index on the Location column would typically look like this (we use the ID column simply to give a meaningful row pointer)

ID      
  UK FR DK
1 1 0 0
2 1 0 0
3 0 1 0
4 0 0 1
5 1 0 0

Therefore if we want to retrieve all the rows where location = UK then doing so if very quick. Where multiple columns have bitmap indexes defined upon them and a query is issued with predicates against more than one of these, the indexes can be combined as an AND operation; it’s just boolean algebra!

This structure makes bitmap indexes very fast to query, but also quite expensive (cost-wise) to maintain. As you will also notice, the bitmap is as wide as the number of distinct values in your table, which is why such indexes are traditionally used on columns with a low level of selectivity. The maintenance cost overhead means they tend not to be used on transactional systems. Furthermore, on an OLAP system, bitmap indexes are generally dropped before data is loaded and then re-created as this is more efficient than attempting to maintain the index for each value inserted.

So – that aside, how can we make use of them on a data warehouse? Introducing the Star Transformation. The star transformation is a process undertaken by the Cost Based Optimizer whereby it can rewrite a query (automatically) to give a more optimal execution plan. Take an extremely simple star schema.

Star Schema

Let’s populate a those structures with some very basic sample data.

create table sales ( customer_wid number not null, location_wid number not null, amount number not null );  create table customer (   customer_wid number not null,   customer_name varchar2(20) not null );  create table location (   location_wid number not null,   location_name varchar2(20) not null );  insert into customer(customer_wid, customer_name) select level, 'Customer - ' || level from dual connect by level <= 5; insert into location(location_wid, location_name) select level, 'Location - ' || level from dual connect by level <=10; insert into sales(customer_wid, location_wid, amount) select mod(level,5), mod(level*level,10), dbms_random.value(1,10000) from dual connect by level <= 100000;  create bitmap index b1 on sales(customer_wid); create bitmap index b2 on sales(location_wid); create index c1 on customer(customer_name); create index l1 on location(location_name);  exec dbms_stats.gather_schema_stats('DEMO');

Now let’s consider a very simple query.

select sum(s.amount) from sales s, customer c, location l where l.location_wid=s.location_wid and c.customer_wid=s.customer_wid and c.customer_name='Customer - 1' and l.location_name = 'Location - 1';

The data volumes we are dealing with in this example are quite light, however in an OLTP system without the bitmap indexes, we would expect the plan that is produced to be something along the lines of index access on the dimensions and then a hash or nested looks to the fact. However; with the addition of bitmap indexes and star transformation enabled, the optimizer can actually re-write the above query to this:

select sum(s.amount) from sales s, customer c, location l where l.location_wid=s.location_wid and c.customer_wid=s.customer_wid and s.location_wid in (select c.customer_wid from customer c where c.customer_name='Customer - 1') and s.customer_wid in (select l.location_wid from location l where l.location_name='Location - 1');

The fact table is materialized (although this can be disabled), the dimensions are accessed to return the appropriate ID’s, which are then used as access paths into the fact using the bitmap indexes. Given that the fact is likely to be the largest table of the query (possibly hundreds of millions of records), utilizing this fast method of pruning the data is very desirable. The transformation can be seen in the execution plan:

SQL_ID  44yasr7duv2n6, child number 0 ------------------------------------- SELECT--+star_transformation gather_plan_statistics SUM (s.amount)    FROM sales s, customer c, location l  WHERE     l.location_wid =  s.location_wid        AND c.customer_wid = s.customer_wid        AND  c.customer_name = 'Customer - 1'        AND l.location_name = 'Location - 1'   Plan hash value: 2104526426  ------------------------------------------------------------------------------------------------------------------------------------------- | Id  | Operation                           | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem | ------------------------------------------------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT                    |                |      1 |        |      1 |00:00:00.04 |     126 |       |       |          | |   1 |  SORT AGGREGATE                     |                |      1 |      1 |      1 |00:00:00.04 |     126 |       |       |          | |*  2 |   HASH JOIN                         |                |      1 |   5556 |  10000 |00:00:00.04 |     126 |  1128K|  1128K|  716K (0)| |   3 |    MERGE JOIN CARTESIAN             |                |      1 |      1 |      1 |00:00:00.01 |       4 |       |       |          | |   4 |     TABLE ACCESS BY INDEX ROWID     | CUSTOMER       |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          | |*  5 |      INDEX RANGE SCAN               | C1             |      1 |      1 |      1 |00:00:00.01 |       1 |       |       |          | |   6 |     BUFFER SORT                     |                |      1 |      1 |      1 |00:00:00.01 |       2 |  2048 |  2048 | 2048  (0)| |   7 |      TABLE ACCESS BY INDEX ROWID    | LOCATION       |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          | |*  8 |       INDEX RANGE SCAN              | L1             |      1 |      1 |      1 |00:00:00.01 |       1 |       |       |          | |   9 |    VIEW                             | VW_ST_34C376F1 |      1 |   5556 |  10000 |00:00:00.03 |     122 |       |       |          | |  10 |     NESTED LOOPS                    |                |      1 |   5556 |  10000 |00:00:00.03 |     122 |       |       |          | |  11 |      BITMAP CONVERSION TO ROWIDS    |                |      1 |   5555 |  10000 |00:00:00.01 |      10 |       |       |          | |  12 |       BITMAP AND                    |                |      1 |        |      1 |00:00:00.01 |      10 |       |       |          | |  13 |        BITMAP MERGE                 |                |      1 |        |      1 |00:00:00.01 |       5 |  1024K|   512K|15360  (0)| |  14 |         BITMAP KEY ITERATION        |                |      1 |        |      1 |00:00:00.01 |       5 |       |       |          | |  15 |          TABLE ACCESS BY INDEX ROWID| LOCATION       |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          | |* 16 |           INDEX RANGE SCAN          | L1             |      1 |      1 |      1 |00:00:00.01 |       1 |       |       |          | |* 17 |          BITMAP INDEX RANGE SCAN    | B2             |      1 |        |      1 |00:00:00.01 |       3 |       |       |          | |  18 |        BITMAP MERGE                 |                |      1 |        |      1 |00:00:00.01 |       5 |  1024K|   512K|15360  (0)| |  19 |         BITMAP KEY ITERATION        |                |      1 |        |      1 |00:00:00.01 |       5 |       |       |          | |  20 |          TABLE ACCESS BY INDEX ROWID| CUSTOMER       |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          | |* 21 |           INDEX RANGE SCAN          | C1             |      1 |      1 |      1 |00:00:00.01 |       1 |       |       |          | |* 22 |          BITMAP INDEX RANGE SCAN    | B1             |      1 |        |      1 |00:00:00.01 |       3 |       |       |          | |  23 |      TABLE ACCESS BY USER ROWID     | SALES          |  10000 |      1 |  10000 |00:00:00.01 |     112 |       |       |          | -------------------------------------------------------------------------------------------------------------------------------------------   Predicate Information (identified by operation id): ---------------------------------------------------      2 - access("L"."LOCATION_WID"="ITEM_2" AND "C"."CUSTOMER_WID"="ITEM_1")    5 - access("C"."CUSTOMER_NAME"='Customer - 1')    8 - access("L"."LOCATION_NAME"='Location - 1')   16 - access("L"."LOCATION_NAME"='Location - 1')   17 - access("S"."LOCATION_WID"="L"."LOCATION_WID")   21 - access("C"."CUSTOMER_NAME"='Customer - 1')   22 - access("S"."CUSTOMER_WID"="C"."CUSTOMER_WID")   Note -----   - star transformation used for this statement

I used the star_transformation hint to force the transformation just for the demo, and gather_plan_statistics so we can see the cardinality estimates. The graphical view of the plans from the Toad tool illustrate quite well the differences in the two plans:

No Star TransformationWith Star Transformation
With Star With Star

The above is a very simple example, however using this technique can give great performance benefits in a warehouse environment. It is also why the technical design process is particularly important – ensuring the correct index types are used (generally a bitmap index on foreign key fact table columns – dimensions attributes should have consideration made as to the selectivity of data before choosing a Bitmap or B-Tree).

Leave a Reply

Your email address will not be published. Required fields are marked *

7 + twelve =

This site uses Akismet to reduce spam. Learn how your comment data is processed.