On Thu, 23 Aug 2018 at 20:17, sebb <seb...@gmail.com> wrote:

> On 23 August 2018 at 17:31, Benedikt Ritter <brit...@apache.org> wrote:
> > Hi,
> >
> > On Thu, 23 Aug 2018 at 12:11, sebb <seb...@gmail.com> wrote:
> >
> >> On 23 August 2018 at 07:10, Benedikt Ritter <brit...@apache.org> wrote:
> >> > Hey sebb,
> >> >
> >> > On Thu, 23 Aug 2018 at 01:23, sebb <seb...@gmail.com> wrote:
> >> >
> >> >> On 23 August 2018 at 00:01, Bruno P. Kinoshita
> >> >> <brunodepau...@yahoo.com.br.invalid> wrote:
> >> >> >
> >> >> >> Maybe I'm just not getting it, but it feels pretty messed up :-)
> >> >> >
> >> >> > Mutual feeling, and +1 for consistency. From what I understood,
> >> >> > users should be able to parse these crazy CSVs, but if they tried
> >> >> > to re-create them, with comments, then they wouldn't be able to
> >> >> > avoid the println/newline (so it wouldn't be parseable later with
> >> >> > the same reader).
> >> >> >
> >> >> > We probably need a ticket for it to aggregate the discussion and
> >> >> > maybe a possible solution.
> >> >>
> >> >> I'm wondering whether we need to be as flexible when *creating* the
> >> >> CSV files.
> >> >>
> >> >> "Be liberal in what you accept, and conservative in what you send"
> >> >> (Jon Postel)
> >> >>
> >> >> In this case send == create, as it might be sent to other less
> >> >> liberal readers.
> >> >>
> >> >> I don't have a problem with the output being less flexible, so long
> >> >> as it is sufficiently flexible (which I think it likely is already).
> >> >>
> >> >> I don't think consistency is necessary - or even desirable - here.
> >> >>
> >> >
> >> > okay, but wouldn't you expect that you can use a CSVFormat instance
> >> > to read a file that you created with it? This is currently not the
> >> > case.
> >>
> >> Sorry, I misread the problem.
> >>
> >> Yes, it should be able to read what it writes.
> >>
> >> So the issue remains: should the reader be able to parse the unusual
> >> format, or should the writer not be able to create it?
> >>
> >> I don't have a particular view on that, except that allowing LF and
> >> CRLF only seems too restricting.
> >> We should allow at least CR alone. I don't know whether there are any
> >> other reasonable separators.
> >>
> >
> > As Bruno pointed out, there seem to be formats that have record
> > separators that are not new lines. So maybe
> > CSVPrinter.printComment(String) should not scan for CR and LF but for
> > the record separator.
> >
>
> Makes sense.
>
> >>
> >> Perhaps we could just document the method to warn that using anything
> >> other than CR, LF or CRLF will produce an output file that is not
> >> parseable?
> >>
> >
> > That sounds like a good approach. But how would you implement that? You
> > probably don't want to introduce a dependency on a logging framework
> > just for that, do you?
>
> I meant: add a warning to the documentation.
>

+1 for that! CSVPrinter has almost no class-level documentation, so I
wanted to improve that anyway.

Benedikt

>
> > Regards,
> > Benedikt
> >
> >>
> >> > Regards,
> >> > Benedikt
> >> >
> >> >>
> >> >> > Cheers
> >> >> >
> >> >> > ________________________________
> >> >> > From: Benedikt Ritter <brit...@apache.org>
> >> >> > To: Commons Developers List <dev@commons.apache.org>;
> >> >> > brunodepau...@yahoo.com.br
> >> >> > Sent: Thursday, 23 August 2018 7:10 AM
> >> >> > Subject: Re: [CSV] Inconsistent record separator behavior
> >> >> >
> >> >> > Hi Bruno,
> >> >> >
> >> >> > On Wed, 22 Aug 2018 at 15:10, Bruno P. Kinoshita
> >> >> > <brunodepau...@yahoo.com.br.invalid> wrote:
> >> >> >
> >> >> >> Hi,
> >> >> >>
> >> >> >> Will try to look at the code and give a better answer during the
> >> >> >> weekend. But risking a silly question, would it mean that users
> >> >> >> are not able to parse a CSV unless each CSV row is separated by
> >> >> >> LF or CRLF?
> >> >> >
> >> >> > Yes.
> >> >> >
> >> >> >> I remember getting a CSV from a government website some time ago
> >> >> >> that was formatted in a very strange way, and if I remember well
> >> >> >> it was a small file, but without LF or CRLF. I think it was
> >> >> >> using | to separate the rows, and , for columns.
> >> >> >>
> >> >> >
> >> >> > I didn't know that there are formats that don't use a new line as
> >> >> > line separator.
> >> >> >
> >> >> >>
> >> >> >> A quick search returned at least one other person with a similar
> >> >> >> issue:
> >> >> >>
> >> >> >> https://stackoverflow.com/questions/29903202/how-to-read-csv-on-python-with-newline-separator
> >> >> >>
> >> >> >> Not sure if I understood the problem well, but in case it makes
> >> >> >> sense...
> >> >> >> my suggestion would be to perhaps confirm if we could change
> >> >> >> CSVPrinter.printComment to accept other characters for line
> >> >> >> ending?
> >> >> >>
> >> >> >
> >> >> > The inconsistency I'm seeing is that we, on the one hand, accept
> >> >> > any character sequence as a record separator. Comments are in a
> >> >> > way like special records to me. But our implementation seems to
> >> >> > put them on a new "line" using the println() method. The
> >> >> > println() method in turn uses the record separator to start a new
> >> >> > record. So it's not necessarily a new line. Nevertheless, while
> >> >> > processing a comment, we look out for CR and LF and then we call
> >> >> > println() again. Maybe I'm just not getting it, but it feels
> >> >> > pretty messed up :-)
> >> >> >
> >> >> > Regards,
> >> >> > Benedikt
> >> >> >
> >> >> >>
> >> >> >> Thanks!
> >> >> >>
> >> >> >> Bruno
> >> >> >>
> >> >> >> ________________________________
> >> >> >> From: Benedikt Ritter <brit...@apache.org>
> >> >> >> To: Commons Developers List <dev@commons.apache.org>
> >> >> >> Sent: Tuesday, 21 August 2018 7:13 PM
> >> >> >> Subject: [CSV] Inconsistent record separator behavior
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> we have this strange handling of record separator / line endings
> >> >> >> in CSV:
> >> >> >>
> >> >> >> Users can use whatever character sequence they like as a record
> >> >> >> separator. I could for example use the ! character to mark the
> >> >> >> end of a record. Then we have CSVPrinter.printComment(String).
> >> >> >> This inserts comments into a CSV output. It detects CRLF and
> >> >> >> calls println() on the CSVFormat, which in turn uses the record
> >> >> >> separator to indicate a new record...
> >> >> >>
> >> >> >> So now I'm thinking: Does it make sense to use anything else but
> >> >> >> LF or CRLF as record separator? Maybe we should deprecate
> >> >> >> CSVFormat.recordSeparator(String) and introduce a LineEnding
> >> >> >> enum where users can choose between LF and CRLF. This way we can
> >> >> >> make the behavior between parsing and printing consistent.
> >> >> >>
> >> >> >> Thoughts?
> >> >> >>
> >> >> >> Benedikt
> >> >> >>
> >> >> >> ---------------------------------------------------------------------
> >> >> >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> >> >> >> For additional commands, e-mail: dev-h...@commons.apache.org
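
For readers following the thread, the round-trip problem can be sketched with a small self-contained model. This is not the Commons CSV implementation: the class RecordSeparatorSketch and its methods printComment, printRecord and readRecords are hypothetical names, written only to mirror the behaviour described above, assuming a printer that breaks comments at CR/LF and ends every "line" with the configured record separator, and a reader that only recognises CR, LF and CRLF as record ends.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the behaviour discussed in this thread.
// It does NOT use or reproduce the Commons CSV API.
final class RecordSeparatorSketch {

    // Writes a comment; every CR, LF or CRLF inside the comment text
    // starts a new comment "line", which is ended with the record separator.
    static String printComment(String comment, char marker, String recordSeparator) {
        StringBuilder out = new StringBuilder();
        out.append(marker).append(' ');
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c == '\r' || c == '\n') {
                // Treat CRLF as one line break.
                if (c == '\r' && i + 1 < comment.length() && comment.charAt(i + 1) == '\n') {
                    i++;
                }
                out.append(recordSeparator); // the println() step
                out.append(marker).append(' ');
            } else {
                out.append(c);
            }
        }
        out.append(recordSeparator);
        return out.toString();
    }

    // Writes one record, ended with the record separator (no quoting).
    static String printRecord(String recordSeparator, String... values) {
        return String.join(",", values) + recordSeparator;
    }

    // Reader that only treats CRLF, CR and LF as record ends.
    static List<String> readRecords(String input) {
        List<String> records = new ArrayList<>();
        for (String line : input.split("\r\n|\r|\n")) {
            if (!line.isEmpty()) {
                records.add(line);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String bang = printComment("first\nsecond", '#', "!") + printRecord("!", "a", "b");
        System.out.println(bang);                     // # first!# second!a,b!
        System.out.println(readRecords(bang).size()); // 1: nothing was split

        String crlf = printComment("first\nsecond", '#', "\r\n") + printRecord("\r\n", "a", "b");
        System.out.println(readRecords(crlf).size()); // 3: two comment lines, one record
    }
}
```

With "!" as the record separator, the model reader sees the whole output as a single line, which is the "cannot read what it wrote" case; with CRLF the comment lines and the record come back separately.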