On 24Sep2019 15:55, Mihir Kothari <mihir.koth...@gmail.com> wrote:
I am using python 3.4. I have a CSV file as below:

ABC,PQR,(TEST1,TEST2)
FQW,RTE,MDE

Really? No quotes around the (TEST1,TEST2) column value? I would have said this is invalid data, but that does not help you.

Basically comma-separated rows, where some rows have a column value that is
array-like, i.e. enclosed in brackets.
So I need to read the file and treat such a column as one value, i.e. not
split on a comma if it is inside the brackets.

In short I need to read a CSV file where separator inside the brackets
needs to be ignored.

Output:
Column:   1       2                3
Row1:    ABC  PQR  (TEST1,TEST2)
Row2:    FQW  RTE  MDE

Can you please help with the snippet?

I would be reaching for a regular expression. If you partition your values into 2 types: those starting and ending in a bracket, and those not, you could write a regular expression for the former:

   \([^)]*\)

which matches a string like (.....) (with, importantly, no embedded brackets, only those at the beginning and end).

And you can write a regular expression like:

   [^,]*

for a value containing no commas i.e. all the other values.

Test the bracketed one first, because the second one always matches something.
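
A quick illustration of why the order matters, as a small sketch (the sample string is made up for the demonstration):

```python
import re

bracketed_re = re.compile(r'\([^)]*\)')
no_commas_re = re.compile(r'[^,]*')

text = '(TEST1,TEST2),MDE'

# The bracketed pattern consumes the whole "(...)" value, comma included.
print(bracketed_re.match(text).group())        # (TEST1,TEST2)

# The no-commas pattern stops at the first comma, cutting the value in half.
print(no_commas_re.match(text).group())        # (TEST1

# The no-commas pattern matches even when sitting on a comma: it just
# matches the empty string, which is how an empty field shows up.
print(repr(no_commas_re.match(',x').group()))  # ''
```

So if you tried the no-commas pattern first, the bracketed one would never get a chance.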

Then you would not use the CSV module (which expects better formed data than you have) and instead write a simple parser for a line of text which tries to match one of these two expressions repeatedly to consume the line. Something like this (UNTESTED):

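   import re

   # a value wrapped in brackets, with no brackets inside it
   bracketed_re = re.compile(r'\([^)]*\)')
   # any other value: a run of non-comma characters (possibly empty)
   no_commas_re = re.compile(r'[^,]*')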

   def split_line(line):
     line = line.rstrip()  # drop trailing whitespace/newline
     fields = []
     offset = 0
     while offset < len(line):
       m = bracketed_re.match(line, offset)
       if m:
         field = m.group()
       else:
         m = no_commas_re.match(line, offset)   # this always matches
         field = m.group()
       fields.append(field)
       offset += len(field)
       if line.startswith(',', offset):
         # another column
         offset += 1
       elif offset < len(line):
         raise ValueError(
           "incomplete parse at offset %d, line=%r" % (offset, line))
     return fields

Then read the lines of the file and split them into fields:

   rows = []
   with open(datafilename) as f:
     for line in f:
       fields = split_line(line)
       rows.append(fields)

So basically you're writing a little parser. If you have nested brackets, things get harder.
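
For the nested case, one possible approach (a sketch only, not something the regular expressions above can do) is to drop the regexps and count bracket depth, splitting only on commas seen at depth zero. The name split_nested is made up for this example, and it assumes round brackets are the only grouping characters in your data:

```python
def split_nested(line):
    """Split on commas that are not inside any (...) pair."""
    line = line.rstrip()  # drop trailing whitespace/newline
    fields = []
    depth = 0   # how many "(" are currently open
    start = 0   # offset where the current field begins
    for pos, ch in enumerate(line):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                raise ValueError(
                    "unbalanced ')' at offset %d, line=%r" % (pos, line))
        elif ch == ',' and depth == 0:
            # a top-level comma ends the current field
            fields.append(line[start:pos])
            start = pos + 1
    if depth != 0:
        raise ValueError("unclosed '(' in line=%r" % (line,))
    fields.append(line[start:])  # the last field has no comma after it
    return fields

print(split_nested('ABC,PQR,(TEST1,(A,B),TEST2)'))
# ['ABC', 'PQR', '(TEST1,(A,B),TEST2)']
```

Note that this behaves slightly differently at the edges from split_line: it keeps an empty field after a trailing comma, and it returns [''] rather than [] for an empty line. Decide which you want for your data.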

Cheers,
Cameron Simpson <c...@cskk.id.au>
--
https://mail.python.org/mailman/listinfo/python-list