Hi, I have a problem which I believe has been seen before: finding the correct pattern for splitting a line correctly with the split function in the re module.
I'm new to regular expressions, and they aren't always easy for a newbie to comprehend :)

The lines I want to split look like this (the following is one line, even if your news client wraps it):

"abc ",,"-",,,,,"Doe, John D.",2004,"A long text, which may contain many characters. Dots, commas, and if I'm real unlucky: maybe even "-characters","-",32454,,

These lines come from a CSV file exported from Excel. The comma is obviously the separator, but as you can see, a comma may also occur between " ", and in that case it should not be treated as a separator.

So I pondered a way of using the " characters in the split pattern as well, something like "?,"? (an optional " before and after the comma). A " may or may not occur around a separating comma, but that pattern would also match single commas inside quoted text (see the example), so of course it goes wrong too.

Any pointer will be greatly appreciated. Maybe I'm attacking this problem the wrong way from the start? (Not that I can see another way myself :)

Regards
-- 
Hans Almåsbakk
-remove .invalid for correct email
-- 
http://mail.python.org/mailman/listinfo/python-list
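A minimal sketch of two possible approaches, written for Python 3 and offered under stated assumptions rather than as a definitive answer. The first uses the standard csv module, which already understands quoted fields, so no hand-written split pattern is needed. The second, for comparison, uses re.split with a lookahead that splits only on a comma followed by an even number of " characters (that is, a comma outside any quoted field), then strips the quotes afterwards. Assumptions: the sample is the line from the post, except that the embedded quote is written doubled ("") on the guess that Excel escapes it that way, as standard CSV does; the names LINE, SPLIT_RE, split_with_csv and split_with_re are hypothetical. The regex version also assumes the quotes on the line are balanced.

```python
import csv
import re

# Hypothetical sample: the line from the post, with the embedded quote
# doubled ("") as Excel is assumed to write it in a CSV export.
LINE = ('"abc ",,"-",,,,,"Doe, John D.",2004,'
        '"A long text, which may contain many characters. Dots, commas, '
        'and if I\'m real unlucky: maybe even ""-characters","-",32454,,')


def split_with_csv(line):
    """Parse one CSV line with the csv module's default (Excel) dialect."""
    # csv.reader accepts any iterable of strings, so wrap the line in a list.
    return next(csv.reader([line]))


# Split on a comma only if an even number of " characters follows it,
# i.e. the comma is not inside a quoted field. Assumes balanced quotes.
SPLIT_RE = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_with_re(line):
    """Hypothetical regex-only alternative: re.split plus manual unquoting."""
    fields = []
    for raw in SPLIT_RE.split(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Drop the surrounding quotes and undo the "" escaping.
            raw = raw[1:-1].replace('""', '"')
        fields.append(raw)
    return fields


if __name__ == "__main__":
    for field in split_with_csv(LINE):
        print(repr(field))
    # Both approaches should agree on this sample line.
    print(split_with_csv(LINE) == split_with_re(LINE))
```

When reading the whole file, csv.reader can iterate over the open file object directly (opened with newline='' in Python 3), which also copes with quoted fields that contain line breaks, something a line-by-line re.split cannot do.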