If you are talking about really large, really not quite properly formatted data sets, you want to look up the PADS project at
http://www.padsproj.org

It's a product from AT&T Labs (which is a Bell Labs 'baby'), and they apparently used it on their billing data.

If you are looking at a few megabytes, any of our parser tools will do, perhaps starting with 'parser-tools/'.

-- Matthias

On May 9, 2013, at 3:47 PM, David Vanderson <david.vander...@gmail.com> wrote:

> I've got character-based invoices from old systems that look roughly like
> (but much bigger):
>
> DATE DESC CREDIT DEBIT
> 01/01/2013 SERVICES $1234.50
> 01/01/2013 PAYMENT $1000.00
>
> BALANCE $234.50
>
>
> I don't know exactly how they're formatted, so I'm working from examples. My
> initial plan was to hand-code a dumb parser with regular expressions, but I
> suspect there's a better way. In particular, it'd be nice to have some
> leeway as to exact positions of data, and hopefully some nice error reporting
> and recovery abilities.
>
> Can anyone point me towards a parsing technique that would lend itself to
> this problem?
>
> Thanks,
> Dave
> ____________________
> Racket Users list:
> http://lists.racket-lang.org/users