On 06/12/2012 07:21, Oltmans wrote:
Hi guys,

I have to deal with CSV files that look like the following:

CSV (one header and 3 legit rows, each with 3 columns)
----
Some info
Date: 12/6/2012
Author: Some guy
Total records: 100

header1, header2, header3
one, two, three
one, "Python is great, so are other languages, isn't ?", three
one, two, 'some languages, are realyl beautiful\r\n, I really cannot deny \n 
this \t\t\t fact. \t\t\t\tthis fact alone is amazing'
----

So this CSV will always contain bad lines like the top 4 (they could appear 
at the beginning, in the middle, or even at the end). The sample above has 
3 legit lines and a header. I want to read those three lines, and here is a 
regex I came up with (which clearly isn't working):

     import re

     # for each line read from the file:
     #print line
     pattern = r"([^\t]+\t|,+)"
     matches = re.match(pattern, line)

Do you have any better ideas, guys? I would really appreciate any help.


I'd simply use the csv module from the standard library to read your files, discarding anything that you regard as bad. I'd certainly not use a regex for this.
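A minimal sketch of that approach, assuming Python 3, that every legit row (header included) has exactly three comma-separated fields, and that quoted fields use double quotes. The helper name `read_legit_rows` and the last row of the sample are hypothetical. `skipinitialspace=True` makes the reader ignore the space after each comma, so the double-quoted field is recognised as quoted; the reader also keeps a quoted field that spans several physical lines together, which a line-by-line regex cannot do.

```python
import csv
import io

EXPECTED_COLUMNS = 3  # assumption: legit rows and the header have 3 fields


def read_legit_rows(lines, expected_columns=EXPECTED_COLUMNS):
    """Yield parsed rows with exactly expected_columns fields; skip the rest."""
    # skipinitialspace=True drops the blank after each comma, so ' "..."'
    # is parsed as a quoted field instead of a literal string.
    reader = csv.reader(lines, skipinitialspace=True)
    for row in reader:
        # Junk lines ("Some info", "Date: ...") parse to 1 field and blank
        # lines to 0 fields, so a simple length check discards them.
        if len(row) == expected_columns:
            yield row


# Sample input; the last row is a hypothetical double-quoted variant whose
# quoted field contains a comma and a real newline.
sample = '''Some info
Date: 12/6/2012
Author: Some guy
Total records: 100

header1, header2, header3
one, two, three
one, "Python is great, so are other languages, isn't ?", three
one, two, "a hypothetical field, with a comma
that spans two lines"
'''

rows = list(read_legit_rows(io.StringIO(sample)))
header, data = rows[0], rows[1:]
print(header)
for row in data:
    print(row)
```

For a real file, pass `open(path, newline='')` instead of the `io.StringIO` object. The column-count filter is only a heuristic: a junk line that happens to contain exactly two commas would slip through. The single-quoted row from the original sample would not be parsed as intended under the default `quotechar='"'`, and a single reader dialect cannot honour both quote characters at once.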

--
Cheers.

Mark Lawrence.

--
http://mail.python.org/mailman/listinfo/python-list
