I wasn’t too sure of what title to use for this blog post. I wanted to share some things that I see quite regularly at various sites in SQL or PL/SQL code where the developer has either not fully understood what they have written, or have done so under assumptions or risk. These are not necessarily always “mistakes” as such (and they may similarly be neither a misunderstanding nor a misconception), but they are things that a developer should consider, be aware of, and prepare for.
Hopefully I’ll carry on with this “series”, hence the Part 1 suffix.
In no particular order…
1) Explicit cursor to return a single row.
This is something I see quite regularly…
DECLARE CURSOR c_mycursor (p_id a_table.id%TYPE) IS SELECT a.something FROM a_table a WHERE a.id = p_id; l_something a_table.something%TYPE; BEGIN OPEN c_mycursor(12345); FETCH c_mycursor INTO l_something; CLOSE c_mycursor; -- Rest of process END;
DECLARE l_something a_table.something%TYPE; BEGIN SELECT a.something INTO l_something FROM a_table a WHERE a.id = 12345; -- Rest of process END;
The usual reason given for such code is “I know it’s going to only return one row because the cursor queries on the primary key and SELECT INTO has the overhead of an extra call to check for TOO_MANY_ROWS“.
And that is true of course. And the code will work just fine. However… let’s assume this isn’t the only piece of custom code you have to maintain. Let’s say you have a very modest 100 custom modules, all based on tables from your purchased ERP System.
Suddenly one day the ERP vendor releases a patch/set which has some “additional functionality” which your DBA fails to notice. What they have done is change the table a_table to be date-tracked, meaning the primary key is no longer id but (id, effective_start_date).
Your original code throws no errors (great!) and seems to be working ok (great again!) However – it’s now a ticking timebomb that will be a nightmare to debug! There might be 10 records for a given ID now – your code could chose any of those. It could even be a different one each time the piece of code runs – it all depends on which row gets spat out of the database first.
The second example catches the error – it throws an exception. The development team get notified because the code no longer works. A lot of people seem scared of exceptions being thrown (and I’ve seen the horrific “EXCEPTION WHEN OTHERS THEN NULL” in production code before now!), but that is exactly their purpose – to alert you when there’s a problem. I can assure you from experience that it’s a lot easier to fix a piece of code upfront than it is to dig around trying to retro-fix records (and having to apply the same fix anyway). There might be some stern words from managers when the system can’t process orders for a short time, but they are likely to be a lot sterner when you have to bring the system down for an extended maintenance period to clean up the mess.
So do use “SELECT INTO” implicit cursors for “single row” fetches. If you are really determined to use explicit cursors (why?) then you should always do a second fetch and check whether there are more rows, and if so then do something (throw an exception! 🙂 ). Why would you do all that extra code though when you get it all for free?
2) EXCEPTION WHEN OTHERS THEN RAISE
Whilst we’re talking exceptions, here’s a little bugbear of mine. The developer has created a procedure, has recognised that there may be some error conditions and trapped the exception (all ok so far) and then thinks something along the lines of “I’m unsure what other things might go wrong here, so I’ll just raise anything else“.
DECLARE l_n NUMBER; BEGIN SELECT id INTO l_n FROM my_table; -- Some other processing EXCEPTION WHEN NO_DATA_FOUND THEN --- Do something EXCEPTION WHEN OTHERS THEN RAISE; END;
Unless you are doing something else in the OTHERS handler, there is absolutely no need to do this. Whilst surplus code is only a minor problem, it has the side effect of masking the line number on which the error actually occurred, making debugging more difficult and time consuming for yourself. Help yourself (and others) out a little.
3) Functions on columns
SQL now, and here is something I’ve seen numerous times…
SELECT id FROM my_table WHERE TRUNC(record_date) = TRUNC(SYSDATE);
The developer wants to get all records created today… and the above query does just that. Applying a function on a column however has a number of underlying problems.
- Any index (unless it’s a function based index) on record_date will not be used.
- Column statistics for record_date become useless.
If you apply a function to a column in a predicate then (function based indexes non-withstanding), the optimizer doesn’t know about the spread of data within that column, the number of distinct values, etc. So your cardinalities can be way off, and hence you end up with a bad execution plan. Always aim to use functions on constants within predicates… this is a far better option:
SELECT id FROM my_table WHERE record_date >= TRUNC(SYSDATE) AND record_date < TRUNC(SYSDATE)+1;
The optimizer knows information about the data in that column and it has the option of using an index range scan if it wishes.
If you find the data model is such that most of your queries on this table end up looking like this then you can consider adding a function-based index on TRUNC(record_date). Remember though – don’t just go adding indexes for the sake of it!
4) Distinct Unique
Again something I see now and again are these two, used for a variety of different reasons. Whilst there are completely legitimate cases for such keywords, the vast majority of time I’ve seen them written is due to poor code. If you find yourself using DISTINCT or UNIQUE then stop and have a think – there is a good chance that it’s not actually needed and that your code is deficient in a predicate or join. Remember what a sort actually does – it can be quite resource intensive, especially with larger datasets. If you can eliminate duplicate rows elsewhere then aim to do so.
Similarly used are aggregate functions in scalar subqueries…
SELECT col, col2, (SELECT MAX(mycol3) FROM tab2 t2 WHERE t2.mycol1=t1.col3) col3 FROM tab1 t1;
Sometimes the developer does actually want the MAX() value, but other times it is just to ensure a single row is returned (i.e. the data model is such that all rows in the data set have the same value). What the aggregate function does is mean all rows in the dataset need to be considered, when you are in fact just interested in one. Use ROWNUM=1 if you absolutely have to, at least then when the database begins churning out rows the optimizer has the option to stop once it gets a row.
5) UNION instead of UNION ALL
This is usually a mistake made by rookie developers and so this point isn’t aimed at experienced coders… but having seen it numerous times (in production code too), I think it’s worth mentioning… and that is developers using UNION where UNION ALL would be perfectly acceptable. Sure UNION may work perfectly (and most people use it for this reason – because it’s always worked for them), however if UNION ALL can be used then always do so. Not doing so forces a sort on the database to remove duplicate rows.
That said, I’m always suspicious when I see a UNION/ALL anyway – they have their uses, but are often mis-used.