How to error check a dragon

That is to say…

How do I go about checking every bit of data that comes into a template, knowing that:

  • I’m too lazy to write a rule for every piece
  • A given data file may not require the same field filled in for every record
  • Extensible is better

Now, this should be running headless, which means I’ll probably be filtering out bad records and dumping them separately. My current thought is to have a stack of rules to see if a given piece of data meets a criterion. Is it blank? Does it have a single double-quote in it (yeah, that’s a thing)? Is a date field a date? Does a number fall in a range? Whatever. Once I check all of this, or as little of it as possible, I turn around and change the field to be something like DATA_EMPTY_FIELD or DATA_BAD_FIELD. The trick is that I then generate all the letters, even the bad ones.
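Here is a rough sketch of what that stack of rules might look like in Python. Nothing here is the real implementation: the field names (`due_date`, `amount`), the date format, and the number range are made up for illustration, and only the two sentinel names come from the plan above.

```python
from datetime import datetime

DATA_EMPTY_FIELD = "DATA_EMPTY_FIELD"
DATA_BAD_FIELD = "DATA_BAD_FIELD"


def is_blank(value):
    # True for None, empty strings, and whitespace-only strings.
    return value is None or value.strip() == ""


def has_lone_double_quote(value):
    # Exactly one double-quote usually means a mangled export.
    return value.count('"') == 1


def not_a_date(value, fmt="%Y-%m-%d"):
    # Assumed date format; swap in whatever the data file really uses.
    try:
        datetime.strptime(value, fmt)
        return False
    except ValueError:
        return True


def out_of_range(low, high):
    # Builds a rule that fails non-numbers and numbers outside [low, high].
    def check(value):
        try:
            return not (low <= float(value) <= high)
        except ValueError:
            return True
    return check


# Rules applied to every field, in order. First failure wins.
GENERIC_RULES = [
    (is_blank, DATA_EMPTY_FIELD),
    (has_lone_double_quote, DATA_BAD_FIELD),
]

# Extra rules for specific (hypothetical) fields.
FIELD_RULES = {
    "due_date": [(not_a_date, DATA_BAD_FIELD)],
    "amount": [(out_of_range(0, 10000), DATA_BAD_FIELD)],
}


def check_record(record):
    # Returns a copy of the record with failing fields replaced by a sentinel.
    cleaned = {}
    for field, value in record.items():
        cleaned[field] = value
        for rule, sentinel in GENERIC_RULES + FIELD_RULES.get(field, []):
            if rule(value):
                cleaned[field] = sentinel
                break
    return cleaned
```

Adding a new check is just appending another `(rule, sentinel)` pair, which covers the "extensible" wish. Nothing is rejected at this stage; a blank field only matters later if a letter actually prints it.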

You’re probably saying to yourself, dear reader, “Self, doesn’t that defeat the point of the entire process?” And that’s where the magic happens.

So, I could filter the data with a complex rule saying, for a given letter and a given field, what it has to be. This would be a terrible amount of filtering, prone to human error, and made up entirely of one-off rules. After everything, each rule would then need to gather all the different filters’ outputs back together to give either a single good file or a single bad file and… No, I’m not doing this. I started working on a single basic rule for one letter version and only found problems.

What I can do though, and this is where we get tricky, is parse PDFs. So I take the data, combine it with the text, and make a single PDF. I split that PDF on a constant, one that I put on the first page of every letter. The resulting letters are then filtered into two branches based on just the words DATA_BAD or DATA_EMPTY. The PDFs in each branch are then recombined and go either to a folder for review (the bad) or to print (the good).
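The split-and-route logic can be sketched without committing to any particular PDF tool. This assumes the text of each page has already been extracted into a list of strings by whatever PDF library or composition tool is in use, and `LETTER_START` is a stand-in for the real first-page constant, which I haven't picked yet.

```python
LETTER_START = "LETTER_START"  # hypothetical marker printed on every first page
SENTINELS = ("DATA_BAD", "DATA_EMPTY")


def split_letters(pages):
    # Groups page texts into letters; a new letter starts at each marker.
    letters = []
    for text in pages:
        if LETTER_START in text or not letters:
            letters.append([])
        letters[-1].append(text)
    return letters


def route_letters(letters):
    # A letter goes to review if any of its pages contains a sentinel word.
    good, review = [], []
    for letter in letters:
        flagged = any(s in page for page in letter for s in SENTINELS)
        (review if flagged else good).append(letter)
    return good, review


pages = [
    "LETTER_START Dear Ann, you owe 50.",
    "Page two of Ann's letter.",
    "LETTER_START Dear DATA_EMPTY_FIELD, you owe 20.",
    "LETTER_START Dear Bob, you owe DATA_BAD_FIELD.",
]
good, review = route_letters(split_letters(pages))
print(len(good), "to print,", len(review), "for review")
```

Because the check only looks at what was actually rendered, a blank field that a given letter never uses does not flag that letter, which is the whole point of generating everything first.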

The best part is that most of this could turn into subprocesses that could be reused for any letter.

I have to wonder how much of a performance hit this will be and how much work I’m doing to solve a problem that shouldn’t even exist.