Prune days… often misunderstood, often just left “as default”. Possibly not helped by the fact that entering the search term into Google yields the following as the top hit:
First it’s probably worth explaining what this parameter actually does. PRUNE_DAYS is a number of days which is subtracted from the LAST_EXTRACT_DATE parameter before it is passed by the DAC to Informatica. So if we have a LAST_EXTRACT_DATE of 02-JAN-2015 21:00:00 and a PRUNE_DAYS of 1, the value actually used will be 01-JAN-2015 21:00:00.
So do you need to use this on an E-Business System? The answer is… yes. But why?
Consider a scenario where you have an incremental ETL process scheduled to begin at 9pm each evening to extract records from table X. At 20:55 an automated process starts (or even a user manually inputting records) which inserts a batch of records into that table. The process takes 20 minutes to complete and (correctly) commits at the end of processing. It uses a value of SYSDATE for the last_update_date field, which is calculated at the point of insert.
In the mean time however, the ETL process has completed its load of data by 21:10, and has reset the LAST_EXTRACT_DATE parameter to be 21:00 on the current evening in preparation for the next incremental ETL process. However it has no visibility of the records inserted by the batch process as these were not committed at the point of select.
Without prune days some or all of the data inserted by your batch process will be lost as the last_update_date field is prior to the current LAST_EXTRACT_DATE value for those records. What the PRUNE_DAYS parameter does is reduce the window by a day (or whatever value you have set) to pick up such events.
There is of course a trade off with using a higher value for prune days in that the incremental ETL process will be reconsidering more data that it has previously already processed (and will therefore disregard).