Re: CSV reader ignore brackets

Cameron Simpson Tue, 24 Sep 2019 16:12:56 -0700

On 24Sep2019 15:55, Mihir Kothari <mihir.koth...@gmail.com> wrote:

I am using python 3.4. I have a CSV file as below:


ABC,PQR,(TEST1,TEST2)
FQW,RTE,MDE

Really? No quotes around the (TEST1,TEST2) column value? I would have said this is invalid data, but that does not help you.

Basically comma-separated rows, where some rows have data in a column which
is array-like, i.e. in brackets.
So I need to read the file and treat such columns as one, i.e. do not
split on a comma if it is inside the brackets.

In short I need to read a CSV file where separator inside the brackets
needs to be ignored.

Output:
Column:   1       2                3
Row1:    ABC  PQR  (TEST1,TEST2)
Row2:    FQW  RTE  MDE

Can you please help with the snippet?

I would be reaching for a regular expression. If you partition your values into 2 types: those starting and ending in a bracket, and those not, you could write a regular expression for the former:


   \([^)]*\)

which matches a string like (.....) (with, importantly, no embedded brackets, only those at the beginning and end).


And you can write a regular expression like:

   [^,]*

for a value containing no commas i.e. all the other values.

Test the bracketed one first, because the second one always matches something.
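To see why the order matters, here is a small sketch that tries both expressions at the start of the bracketed column in your first sample row. The variable names are only illustrative:

```python
import re

bracketed_re = re.compile(r'\([^)]*\)')   # "(" ... ")" with no ")" inside
no_commas_re = re.compile(r'[^,]*')       # any run of non-comma characters

line = 'ABC,PQR,(TEST1,TEST2)'
offset = line.index('(')                  # start of the third column

# The bracketed pattern consumes the whole bracketed value.
bracketed = bracketed_re.match(line, offset).group()

# The no-commas pattern, tried at the same offset, stops at the embedded comma.
plain = no_commas_re.match(line, offset).group()

# The no-commas pattern also matches (with an empty string) at a comma,
# which is why it can never be used as the "did this match?" test.
empty = no_commas_re.match(',x').group()

print(repr(bracketed))
print(repr(plain))
print(repr(empty))
```

If the no-commas expression were tried first, the third column would be cut off at the embedded comma.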

Then you would not use the CSV module (which expects better formed data than you have) and instead write a simple parser for a line of text which tries to match one of these two expressions repeatedly to consume the line. Something like this (UNTESTED):


   import re

   bracketed_re = re.compile(r'\([^)]*\)')  # "(...)" with no embedded ")"
   no_commas_re = re.compile(r'[^,]*')      # a run of non-comma characters

   def split_line(line):
     # Note: a trailing comma does not produce a final empty field.
     line = line.rstrip()  # drop trailing whitespace/newline
     fields = []
     offset = 0
     while offset < len(line):
       # try the bracketed form first
       m = bracketed_re.match(line, offset)
       if m:
         field = m.group()
       else:
         m = no_commas_re.match(line, offset)   # this always matches
         field = m.group()
       fields.append(field)
       offset += len(field)
       if line.startswith(',', offset):
         # another column
         offset += 1
       elif offset < len(line):
         # e.g. text directly after a closing bracket
         raise ValueError(
           "incomplete parse at offset %d, line=%r" % (offset, line))
     return fields

Then read the lines of the file and split them into fields:

   rows = []
   with open(datafilename) as f:
     for line in f:
       fields = split_line(line)
       rows.append(fields)

So basically you're writing a little parser. If you have nested brackets things get harder.
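For the nested case, one possible approach is to drop the regular expressions and count bracket depth character by character, splitting only on commas seen at depth 0. This is a rough sketch, not something from your requirements: the name split_nested is made up, and it assumes "(" and ")" are the only bracket characters and that there is no quoting or escaping in the data.

```python
def split_nested(line):
    """Split on commas that are not inside parentheses (hypothetical helper)."""
    line = line.rstrip()   # drop trailing whitespace/newline
    fields = []
    depth = 0              # current bracket nesting level
    start = 0              # index where the current field begins
    for pos, ch in enumerate(line):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                raise ValueError(
                    "unbalanced ')' at offset %d, line=%r" % (pos, line))
        elif ch == ',' and depth == 0:
            # a separator outside any brackets ends the current field
            fields.append(line[start:pos])
            start = pos + 1
    if depth != 0:
        raise ValueError("unclosed '(' in line=%r" % (line,))
    fields.append(line[start:])   # the final field
    return fields


print(split_nested('ABC,PQR,(TEST1,(A,B),TEST2)'))
print(split_nested('FQW,RTE,MDE\n'))
```

Unlike the regular expression version, this treats brackets anywhere in a value as protecting the commas inside them, so a value like a(b,c) stays as one field, and an empty line gives a single empty field.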


Cheers,
Cameron Simpson <c...@cskk.id.au>
--
https://mail.python.org/mailman/listinfo/python-list
