Paul Rubin wrote:
Terry Reedy <tjre...@udel.edu> writes:
I do not see any connection, really, between what you describe above
and your desire for static type-checking expressed elsewhere.  When I
was regularly doing analysis of empirical data files, I learned
(sometimes the hard way, as you describe above) to **ALWAYS** run
preliminary checks of all fields through the entire file

Right.  And if the file is large enough that even parsing all the
records to check the fields takes hours, well, that's where I'm at.

So what is the problem? Let it run overnight. If you want more speed, try numpy or Cython or C functions (wrapped with SWIG or ctypes) or...

To keep a single bad line from crashing the whole run, make your top level something like

for line in open('humongous.dat', 'r'):
    try:
        ...  # parse line
        ...  # check fields and update summary
    except Exception as e:
        print(line, e)

(How trivial in Python!)
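One hedged way to fill in the placeholders above -- assuming comma-separated numeric fields and a Counter as the running summary (both are assumptions for illustration, not anything from the original post):

```python
from collections import Counter

summary = Counter()

def check_line(line, summary):
    """Parse one comma-separated line of floats; raise on bad data."""
    values = [float(f) for f in line.split(',')]   # ValueError on junk
    summary['lines'] += 1
    summary['fields'] += len(values)

# Stand-in for iterating over the real file:
for line in ['1,2,3\n', '4,oops,6\n']:
    try:
        check_line(line, summary)
    except Exception as e:
        summary['errors'] += 1
        print(line, e)
```

Bad lines are reported and tallied under 'errors' while the good ones keep feeding the summary.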

If your total system (including the electricity supply and the busy fingers of other people) is unstable, or you think you might want to interrupt the run, keep track of lines or bytes read and write a report every gigabyte or so with enough info to restart from the last checkpoint -- just like reloading your last save in a video game when your avatar dies.
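A minimal sketch of that checkpoint-and-restart idea. The file name, the two-field comma format, and the gigabyte interval are all assumptions made up for the example:

```python
# Checkpointed scan of a huge file; all names and formats hypothetical.
DATA = 'humongous.dat'
CKPT = 'humongous.ckpt'        # holds the byte offset of the last checkpoint
CHECKPOINT_EVERY = 10**9       # checkpoint roughly every gigabyte

def load_offset(ckpt):
    """Byte offset saved by the previous run, or 0 if starting fresh."""
    try:
        with open(ckpt) as f:
            return int(f.read())
    except (FileNotFoundError, ValueError):
        return 0

def run(data=DATA, ckpt=CKPT, every=CHECKPOINT_EVERY):
    pos = load_offset(ckpt)
    last_ckpt = pos
    n_ok = 0
    with open(data, 'rb') as f:          # binary mode: offsets are exact bytes
        f.seek(pos)
        for line in f:
            try:
                a, b = line.split(b',')  # hypothetical two-field format
                int(a), int(b)           # raises ValueError on junk fields
                n_ok += 1
            except Exception as e:
                print(line, e)
            # Track offsets ourselves; f.tell() is unreliable while
            # iterating over a buffered file.
            pos += len(line)
            if pos - last_ckpt >= every:
                with open(ckpt, 'w') as c:
                    c.write(str(pos))    # enough info to seek() and resume
                last_ckpt = pos
    return n_ok
```

On restart, load_offset() lets the scan seek past everything already summarized, so a crash or interruption costs at most one checkpoint interval of work.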

Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
