Prune Days on an E-Business Suite Source Environment

Prune Days on an E-Business Suite Source Environment

Prune days… often misunderstood, often just left “as default”. Possibly not helped by the fact that entering the search term into Google yields the following as the top hit:

The LAST_UPD column in Siebel transactional database tables is used for incremental change capture. This timestamp reflects the actual event time. It is therefore possible for a data row to be committed to the transactional database with a LAST_UPD date that is older than the date on which the last refresh was executed. This will cause the data row to be missed in the subsequent extract (if based purely on LAST_UPD date). However, the LAST_UPD date column still provides an opportunity to improve the change capture process by overlapping the extraction date window by the number of days set in this parameter. The records extracted in the overlapped window are filtered by comparing this information with information in the Image table. The Prune Days setting ensures that the rows that had values in LAST_UPD older than values in LAST_REFRESH_DATE are not missed. This is a parameter that can be set based on experience with processes, such as remote sync, that potentially can cause records to be missed. This parameter cannot be less than 1. For example: Assume the table W_PERSON_D was refreshed on January 15th by querying the table S_CONTACT. And, the Prune Days setting was set to 5. The next time S_CONTACT is queried to load W_PERSON_D, the records that have a LAST_UPD value since January 10 are compared with the ROW_ID of the Image table to cover for any missing records between January 15 and January 10 (the overlap period).

Source: http://docs.oracle.com/cd/E12102_01/books/AnyInstAdm/AnyInstAdmWinInstall43.html

First it’s probably worth explaining what this parameter actually does. PRUNE_DAYS is a number of days which is subtracted from the LAST_EXTRACT_DATE parameter before it is passed by the DAC to Informatica. So if we have a LAST_EXTRACT_DATE of 02-JAN-2015 21:00:00 and a PRUNE_DAYS of 1, the value actually used will be 01-JAN-2015 21:00:00.

So do you need to use this on an E-Business System? The answer is… yes. But why?
Consider a scenario where you have an incremental ETL process scheduled to begin at 9pm each evening to extract records from table X. At 20:55 an automated process starts (or even a user manually inputting records) which inserts a batch of records into that table. The process takes 20 minutes to complete and (correctly) commits at the end of processing. It uses a value of SYSDATE for the last_update_date field, which is calculated at the point of insert.
In the mean time however, the ETL process has completed its load of data by 21:10, and has reset the LAST_EXTRACT_DATE parameter to be 21:00 on the current evening in preparation for the next incremental ETL process. However it has no visibility of the records inserted by the batch process as these were not committed at the point of select.

Without prune days some or all of the data inserted by your batch process will be lost as the last_update_date field is prior to the current LAST_EXTRACT_DATE value for those records. What the PRUNE_DAYS parameter does is reduce the window by a day (or whatever value you have set) to pick up such events.

Prune Days

There is of course a trade off with using a higher value for prune days in that the incremental ETL process will be reconsidering more data that it has previously already processed (and will therefore disregard).

Leave a Reply

Your email address will not be published. Required fields are marked *

one + four =

This site uses Akismet to reduce spam. Learn how your comment data is processed.