2014-03-14

A Little DB2 SQL Adventure

I don't usually blog about this sort of technical stuff, though I deal with it daily. That probably explains right there why I avoid it as if it is tainted by the plague. But this particular problem was interesting enough that I thought it might be worth conveying it.

The challenge for my associate came in the form of a bit of SQL that needed to pull apart a string (unknown until runtime) before it was inserted into a data store. A stored procedure wasn't possible, and the parsing requirement was unavoidable. It also had to be done with inline SQL.

It was known that this particular string could have several forms:
  1. A string containing alphanumeric elements separated by a dash;
  2. A string containing alphanumeric elements without a dash; and
  3. A zero-length string containing nothing (that could not be converted to a NULL on the incoming side for reasons not relevant here, but such that COALESCE was suddenly not a functional solution).
The SQL causing them grief had two variations in the INSERT, but to isolate the data elements I'll stay with simple SELECT statements from the system dummy table. And we'll call the input containing the string {Landmark} for the sake of abstraction.

The first SELECT would look thus:

SELECT SUBSTR('{Landmark}',1,LOCATE('-','{Landmark}')-1) AS DATA_ELEMENT FROM SYSIBM.SYSDUMMY1;
In essence the data element expected for return would see a dash in the Landmark string and return everything left of that dash. (For those wondering why not use LEFT in this context, it ultimately creates the same basic problem in as much as it will fail to return correctly if the Landmark value has no dash. So, it would have been a different approach with the same sticking point.)

The second SELECT would look thus:

SELECT SUBSTR('{Landmark}',1,LOCATE('-','{Landmark}')) FROM SYSIBM.SYSDUMMY1;
A simpler view of the same issue, which is easily illustrated by plugging in a few actual values that landmark might pass in.

The First Form

To focus now we can strip the SELECT and FROM SYSIBM.SYSDUMMY1 parts of the exemplar. It leaves us with the inline SQL reading:

SUBSTR('{Landmark}',1,LOCATE('-','{Landmark}')-1)

And the SQL works perfectly if the data coming in contains a dash and is not zero-length. In other words, '12345-12345-666' parses perfectly:

SUBSTR('12345-12345-666',1,LOCATE('-','12345-12345-666' )-1)

...returns a LOCATE value of 6, and when you subtract 1 it works out to return the substring starting at the first character and returning the first 5 characters.

The problem is that if the incoming string contains no dashes the LOCATE returns 0, and when you subtract the 1 from 0 the SUBSTR function rightfully chokes on the parameter. For example:

SUBSTR('1234512345666',1,LOCATE('-','1234512345666' )-1)

...is really:

SUBSTR('1234512345666',1,-1)

And SUBSTR doesn't quite know hat to do with that.

An even more complex problem arises when the string coming in is zero-length, because the resultant SUBSTR not only faults on -1 for is third parameter, it chokes on the first one having no length.

Using a stored procedure here would be glorious because it could check the length and do all that work in an orderly way, but my associate was stumped by how to get it to work inline and handle the three possibilities of proper dashes, no dashes and a zero-length condition.

What does work though is the inline use of a dual CASE that can handle the three possibilities (yes, it is not the only possibility, but it turned out to be the easiest to convey, even if it is ugly as sin). The statement then would have looked thus:

SELECT (
CASE WHEN LENGTH('{Landmark}')=0 THEN ''
ELSE SUBSTR('{Landmark}',1,LOCATE('-','{Landmark}')
- (CASE WHEN LOCATE('-','{Landmark}') > 0 THEN 1 ELSE 0 END))
END
) AS DATA_ELEMENT
FROM SYSIBM.SYSDUMMY1;

It parses by checking Landmark length and returning a zero-length safely if it is thus; and otherwise it runs the SUBSTR with an inner CASE block that tests the LOCATE return value and returns 1 if it is >0 and 0 otherwise. (This return is subtracted from the LOCATE value, so returning 1 subtracts 1, else we subtract 0.) With an example in place:

SELECT (
CASE WHEN LENGTH('ABC-DEF-GHI')=0 THEN ''
ELSE SUBSTR('ABC-DEF-GHI',1,LOCATE('-','ABC-DEF-GHI')
- (CASE WHEN LOCATE('-','ABC-DEF-GHI') > 0 THEN 1 ELSE 0 END))
END
) AS DATA_ELEMENT
FROM SYSIBM.SYSDUMMY1;

The benefit is that this protects you from the three possible data conditions.

The Second Form

For the stripped form of the second:

SUBSTR('{Landmark}',1,LOCATE('-','{Landmark}'))
...we can modify it with a simpler inline CASE conditional. We end up with:

SUBSTR((CASE WHEN LENGTH('{Landmark}')= 0 THEN ' ' ELSE '{Landmark}' END),1,LOCATE('-','{Landmark}'))

Or in the actual SELECT form:

SELECT SUBSTR((CASE WHEN LENGTH('{Landmark}')= 0 THEN ' ' ELSE '{Landmark}' END),1,LOCATE('-','{Landmark}')) FROM SYSIBM.SYSDUMMY1;

This works because SUBSTR accepts a 0 as its third parameter, and all we need to do is ensure that Landmark (first parameter) is never a zero-length string. The specific CASE part is:

CASE WHEN LENGTH('{Landmark}')= 0 THEN ' '
ELSE '{Landmark}'
END

In other words, return a single space if the LENGTH of the actual field data is zero.

Final Thoughts

No one would ever claim that SQL was pretty to look at, but it illustrates how powerful SQL actually can be, and how it is possible to make the SQL layer of a solution (even without aid of a stored procedure) impose some order on incoming data elements.







No comments:

Post a Comment

Note: Only a member of this blog may post a comment.