If you are talking about really large, not quite properly formatted data sets, 
you want to look up the PADS project at 

 http://www.padsproj.org

It's a product from AT&T Labs (which is a Bell Labs 'baby'), and they apparently 
used it on their billing data. 

If you are looking at a few megabytes, any of our parser tools will do, perhaps 
starting with 'parser-tools/'. 
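
For the few-megabytes case, here is a rough sketch of the general idea: match each line by pattern rather than by fixed column, use the header row only as a loose guide for credit vs. debit, and collect the lines that do not fit instead of stopping. It is written in Python for illustration and is not the parser-tools API. The names parse_invoice, ROW and BALANCE are hypothetical, and it assumes a header line starting with DATE that names CREDIT and DEBIT columns, and amounts of the form $1234.50.

```python
import re
from decimal import Decimal

DATE = r"(?P<date>\d{2}/\d{2}/\d{4})"
AMOUNT = r"\$(?P<amount>\d+\.\d{2})"
# A data row: date, free-form description, one amount. Spacing is flexible.
ROW = re.compile(rf"^\s*{DATE}\s+(?P<desc>.+?)\s+{AMOUNT}\s*$")
BALANCE = re.compile(rf"^\s*BALANCE\s+{AMOUNT}\s*$")


def parse_invoice(text):
    """Return (rows, balance, errors) for a column-formatted invoice.

    rows   -- list of (date, description, "CREDIT" or "DEBIT", Decimal)
    errors -- list of (line number, raw line) for lines that did not parse
    """
    rows, balance, errors = [], None, []
    columns = {}  # header word -> starting offset, taken from the header line
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        if line.split()[0] == "DATE":
            # Remember where each header word starts.
            columns = {m.group(): m.start() for m in re.finditer(r"\S+", line)}
            continue
        m = BALANCE.match(line)
        if m:
            balance = Decimal(m.group("amount"))
            continue
        m = ROW.match(line)
        if m is None or not {"CREDIT", "DEBIT"} <= columns.keys():
            # Record the problem and keep going (simple error recovery).
            errors.append((lineno, line))
            continue
        # Assign the amount to whichever header starts nearest to it, so the
        # amount does not have to sit at an exact column.
        pos = m.start("amount")
        kind = min(("CREDIT", "DEBIT"), key=lambda c: abs(columns[c] - pos))
        rows.append((m.group("date"), m.group("desc"), kind,
                     Decimal(m.group("amount"))))
    return rows, balance, errors
```

Calling parse_invoice on the text of one invoice gives back the parsed rows, the balance if one was found, and a list of (line number, line) pairs to inspect by hand. A lexer/parser pair gives better structure once the formats are understood; this only shows the pattern-plus-nearest-column idea.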

-- Matthias

On May 9, 2013, at 3:47 PM, David Vanderson <david.vander...@gmail.com> wrote:

> I've got character-based invoices from old systems that look roughly like 
> (but much bigger):
> 
> DATE        DESC               CREDIT   DEBIT
> 01/01/2013  SERVICES         $1234.50
> 01/01/2013  PAYMENT                     $1000.00
> 
> BALANCE                  $234.50
> 
> 
> I don't know exactly how they're formatted, so I'm working from examples.  My 
> initial plan was to hand-code a dumb parser with regular expressions, but I 
> suspect there's a better way.  In particular, it'd be nice to have some 
> leeway as to exact positions of data, and hopefully some nice error reporting 
> and recovery abilities.
> 
> Can anyone point me towards a parsing technique that would lend itself to 
> this problem?
> 
> Thanks,
> Dave
> ____________________
> Racket Users list:
> http://lists.racket-lang.org/users

