Hi guys, I have to deal with CSVs that look like the following:
CSV (with one header and 3 legit rows, where each legit row has 3 columns)
----
Some info
Date: 12/6/2012
Author: Some guy
Total records: 100
header1, header2, header3
one, two, three
one, "Python is great, so are other languages, isn't ?", three
one, two, 'some languages, are realyl beautiful\r\n, I really cannot deny \n this \t\t\t fact. \t\t\t\tthis fact alone is amazing'
----

Inside such a CSV there will always be bad lines like the top 4 (they can appear at the beginning, in the middle, or even at the end). So the sample above has 3 legit lines and a header. I want to read those three lines, and here is a regex that I came up with (which clearly isn't working):

    #print line
    pattern = r"([^\t]+\t|,+)"
    matches = re.match(pattern, line)

Do you have any better ideas? I would really appreciate any help.
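One direction that might work instead of a regex, as a rough sketch only: parse each line with the standard csv module and keep a line only when it yields exactly 3 fields. The names EXPECTED_COLUMNS, parse_line and legit_rows are made up for illustration. The sketch assumes every record sits on one physical line (the \r\n and \t sequences in the sample are taken to be literal backslash text, not real line breaks) and that a field is quoted with either double or single quotes. A junk line that happens to contain exactly two commas would still slip through.

```python
import csv

EXPECTED_COLUMNS = 3  # assumption: every legit row has exactly 3 fields


def parse_line(line, quotechars=('"', "'")):
    """Return the fields of `line` if it splits into EXPECTED_COLUMNS fields, else None."""
    for quotechar in quotechars:
        try:
            # skipinitialspace lets a quote that follows ", " open a quoted field
            row = next(csv.reader([line], quotechar=quotechar,
                                  skipinitialspace=True))
        except (csv.Error, StopIteration):
            continue  # this quote character did not work, try the next one
        if len(row) == EXPECTED_COLUMNS:
            return [field.strip() for field in row]
    return None


def legit_rows(lines):
    """Yield the header and data rows, skipping lines without 3 fields."""
    for line in lines:
        row = parse_line(line.rstrip("\r\n"))
        if row is not None:
            yield row
```

Any iterable of lines can be passed to legit_rows, for example an open file object. The first row it yields for the sample above would be the header, followed by the three data rows.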