Re: Trying to fix Invalid CSV File

Larry Bates Mon, 04 Aug 2008 14:01:36 -0700

Ryan Rosario wrote:

On Aug 4, 8:30 am, Emile van Sebille <[EMAIL PROTECTED]> wrote:

John Machin wrote:

On Aug 4, 6:15 pm, Ryan Rosario <[EMAIL PROTECTED]> wrote:

On Aug 4, 1:01 am, John Machin <[EMAIL PROTECTED]> wrote:

On Aug 4, 5:49 pm, Ryan Rosario <[EMAIL PROTECTED]> wrote:

Thanks Emile! Works almost perfectly, but is there some way I can
adapt this to quote fields that contain a comma in them?

<snip>

Emile's snippet is pushing it through the csv reading process, to
demonstrate that his series of replaces works (on your *sole* example,
at least).
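On the question of quoting fields that contain a comma: once a record has been split into its fields, the standard-library csv.writer does this on its own. With its default QUOTE_MINIMAL quoting, it quotes any field that contains a comma or a quote and doubles the embedded quotes. Here is a minimal sketch, assuming the sample record's three fields have already been split out correctly:

```python
import csv
import io

# Assumed: the three fields of the sample record, already split correctly.
fields = [
    "123",
    'Here is some, text "and some quoted text" where the quotes should have been doubled',
    "321",
]

buf = io.StringIO()
# QUOTE_MINIMAL (the default) quotes only fields that need it, such as ones
# containing the delimiter or the quote character, and doubles embedded quotes.
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
writer.writerow(fields)
line = buf.getvalue()
print(repr(line))
```

By default the writer ends each row with '\r\n'. Pass lineterminator='\n' if you want Unix line endings.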

Exactly -- just print out the results of the passed argument:

 >>> rec.replace(',"',",'''").replace('",',"''',").replace('"','""').replace("'''",'"')
'123,"Here is some, text ""and some quoted text"" where the quotes should have been doubled",321'
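To push the repaired string through the csv reader, as John describes, you can wrap the chain of replaces in a function and feed its output to csv.reader. This is a minimal sketch. repair_line is a hypothetical helper name, and the input is the single sample record from this thread. The sketch also assumes the data never contains ''' on its own and that the first and last fields are unquoted, as in the sample:

```python
import csv

def repair_line(rec):
    """Apply Emile's chain of replaces (hypothetical helper name).

    1. Mark quotes that open a field (right after a comma) with '''.
    2. Mark quotes that close a field (right before a comma) with '''.
    3. Double every remaining, i.e. embedded, quote.
    4. Turn the ''' markers back into single field quotes.
    """
    return (rec.replace(',"', ",'''")
               .replace('",', "''',")
               .replace('"', '""')
               .replace("'''", '"'))

bad = '123,"Here is some, text "and some quoted text" where the quotes should have been doubled",321'
fixed = repair_line(bad)
row = next(csv.reader([fixed]))
print(len(row))  # 3
print(row[1])
```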

It won't work if any of the quotes embedded in a field sit next to a
comma.
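The following sketch shows that limitation on a made-up record (not from the thread). Its intended second field is say "hi", then leave, so one embedded quote sits directly before a comma. The same chain, again wrapped in a hypothetical repair_line helper, treats that quote as the end of the field and splits the field in two:

```python
import csv

def repair_line(rec):
    # Same chain of replaces as in Emile's one-liner (hypothetical helper name).
    return (rec.replace(',"', ",'''")
               .replace('",', "''',")
               .replace('"', '""')
               .replace("'''", '"'))

# Hypothetical record; intended fields: 1 | say "hi", then leave | 2
bad = '1,"say "hi", then leave",2'
row = next(csv.reader([repair_line(bad)]))
print(row)  # ['1', 'say "hi', ' then leave"', '2']
```

The result has four fields instead of three, which is what makes such records detectable.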

I'd run it against the whole file.  Presumably you expect the same number
of fields in every record, so any repaired record with a different field
count is suspect and flags a record this approach can't fix.
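The check Emile suggests could look something like the minimal sketch below. find_suspect_records and repair_line are hypothetical names. The sketch assumes the expected field count is known in advance and that no record spans more than one physical line. The two sample lines stand in for a real file:

```python
import csv

def repair_line(rec):
    # Emile's chain of replaces, applied to one raw line (hypothetical helper name).
    return (rec.replace(',"', ",'''")
               .replace('",', "''',")
               .replace('"', '""')
               .replace("'''", '"'))

def find_suspect_records(lines, expected_fields):
    """Return (line_number, fields) for each repaired record with the wrong field count."""
    suspects = []
    for lineno, raw in enumerate(lines, start=1):
        fields = next(csv.reader([repair_line(raw.rstrip("\r\n"))]))
        if len(fields) != expected_fields:
            suspects.append((lineno, fields))
    return suspects

sample = [
    '123,"Here is some, text "and some quoted text" where the quotes should have been doubled",321\n',
    '1,"say "hi", then leave",2\n',
]
for lineno, fields in find_suspect_records(sample, expected_fields=3):
    print(lineno, len(fields))  # 2 4
```

For a real file, pass an open file object instead, e.g. `with open(path, newline="") as f: find_suspect_records(f, 3)`. Blank lines come back as empty field lists, so they are reported as suspect too.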

There are probably better ways, but sometimes it's fun to create
executable line noise.  :)

Emile


Thanks for your responses. I think John may be right that I am reading
it a second time. I will take a look at the CSV reader documentation
and see if that helps. Then once I run it I can see if I need to worry
about the comma-next-to-quote issue.

This is a perfect demonstration of why tab-delimited files are so much better than comma-and-quote-delimited ones. Virtually all software can handle tab-delimited data as well as comma-and-quote-delimited data, but you would have had none of these problems if you had used tab delimiting. The chances of tabs being embedded in most data are virtually nil.
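Here is a sketch of Larry's point. As long as no field contains a tab or newline, a tab-delimited record needs no quoting at all, so commas and quotes inside a field come through untouched. to_tab_line is a hypothetical helper that refuses fields this simple format can't represent. Reading uses csv.reader with delimiter="\t" and quoting=csv.QUOTE_NONE, so '"' is treated as an ordinary character:

```python
import csv
import io

def to_tab_line(fields):
    """Join fields with tabs (hypothetical helper); reject fields tabs can't represent."""
    for f in fields:
        if "\t" in f or "\n" in f or "\r" in f:
            raise ValueError(f"field contains a tab or newline: {f!r}")
    return "\t".join(fields) + "\n"

fields = [
    "123",
    'Here is some, text "and some quoted text" where the quotes should have been doubled',
    "321",
]
line = to_tab_line(fields)

# QUOTE_NONE: the reader does no special processing of '"'; only tabs split fields.
reader = csv.reader(io.StringIO(line), delimiter="\t", quoting=csv.QUOTE_NONE)
print(next(reader) == fields)  # True
```

"Virtually nil" is not zero, which is why the helper checks for tabs and newlines rather than assuming they never occur.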


-Larry
--
http://mail.python.org/mailman/listinfo/python-list
