Friday, August 01, 2008

Lesson #20080801

[WARNING: Geeky post]

My colleague ran into this problem while I wasn't around and tried calling me for help, so I've decided to write it down as a lesson, for myself or anyone else.

This lesson applies mostly to programming or similar high-volume processing work. He hit an error while loading an interface file (from one computer system into another), and the cause was in one of the records. The file had over eleven thousand records, so finding the offending one was no mean feat. This was especially true because the error message (owing to the relatively old software) didn't say which row the error had occurred in.

Anyway, the solution is straightforward and should apply to most similar situations. I call it the "Half-life" solution. Simply split the file to be loaded in half, bearing in mind that the header row, if there is one, has to be copied into both halves.
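For anyone who would rather script the split than do it by hand, here's a minimal Python sketch. It assumes the interface file is a CSV with at most one header row; the function name `split_in_half` and its parameters are my own invention, not part of any loader. Reading through the `csv` module (rather than splitting raw lines) keeps quoted fields that contain commas or line breaks intact.

```python
import csv

def split_in_half(src, out1, out2, has_header=True):
    """Split a CSV file into two halves, copying the header row (if any) into both.

    Returns the number of data records written to each half.
    """
    with open(src, newline="") as f:
        rows = list(csv.reader(f))
    # Take the header off the data rows so it can be written to both halves.
    header = [rows.pop(0)] if has_header and rows else []
    mid = len(rows) // 2
    for path, part in ((out1, rows[:mid]), (out2, rows[mid:])):
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(header + part)
    return mid, len(rows) - mid
```

Usage would look like `split_in_half("A.csv", "A-1.csv", "A-2.csv")`, where "A.csv" is a hypothetical name for the original interface file.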

Let's assume the two halves are named "A-1.csv" and "A-2.csv", and only "A-2.csv" loads successfully. This means the error is in "A-1.csv". Split the records in "A-1.csv" in half again, say into "B-1.csv" and "B-2.csv", and attempt to load both files in sequence. The file that fails to load contains the problematic record. Rinse and repeat until you can zoom in on the error.
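The "rinse and repeat" loop is essentially a binary search. Below is a sketch of it that assumes the failure is caused by a single bad record. `try_load` is a hypothetical stand-in for running the real load on a chunk of records and reporting whether it succeeded; in practice that might mean writing the chunk to a file and feeding it to the interface program.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def find_bad_record(records: Sequence[T],
                    try_load: Callable[[Sequence[T]], bool]) -> int:
    """Return the index of a record that makes try_load fail, by repeated halving."""
    lo, hi = 0, len(records)  # invariant: records[lo:hi] fails to load
    if try_load(records[lo:hi]):
        raise ValueError("the whole file loads fine; nothing to find")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Load both halves in sequence, as in the manual procedure.
        first_ok = try_load(records[lo:mid])
        second_ok = try_load(records[mid:hi])
        if not first_ok:
            hi = mid          # zoom in on the first half
        elif not second_ok:
            lo = mid          # zoom in on the second half
        else:
            # Both halves load on their own: the error depends on more than one record.
            raise RuntimeError(f"records[{lo}:{hi}] fail together but each half loads")
    return lo
```

Because each round halves the suspect chunk, eleven thousand-odd records need at most 14 rounds (2^14 = 16,384), which is fewer than 30 loads in total.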

The other option would have been to run the load in a debugger and step through the program. But in most situations debugging takes longer than expected, and if the program can be left to load a portion of the records on its own while you work on other stuff, why not?

Of course, it's not foolproof, but it's the best I can think of in such "dire" situations.
