Re: [CSV] Release again

Alex Herbert Sat, 15 Jun 2019 11:59:25 -0700

> On 15 Jun 2019, at 18:57, Gary Gregory <garydgreg...@gmail.com> wrote:
> 
> We've fixed some issues immediately after 1.7. How does everyone feel about
> releasing again?
> What else needs to be addressed in the short term?
> Gary

- Bug (picked up by FindBugs):

CSVRecord is no longer serialisable as it stores the CSVParser and that is not 
serialisable. This can be fixed by making the parser transient and documenting 
the getParser() method as invalid after deserialisation. Otherwise the cascade 
of fields that must be serialisable eventually includes the original 
BufferedReader used to read the data. The parser is required for the header map 
functionality and the getParser() method. So the class can store the header map 
(easy), or overload the serialisation methods to record and load the header map 
(more effort), or not support the parser and the header map functionality after 
deserialisation, or something else. 

This should have been picked up by FindBugs during the release cycle but for 
unknown reasons the FindBugs report on the site does not show this. So anyone 
who deserialises this class will get an error as the serialVersionUID has not 
been updated and the new parser field is not serialisable.


I also raised a few issues about the handling of column headers in my recent 
fix PR:

- Bug:

If the settings are not allowing empty columns headers you can currently use a 
single empty header. This is because column headers are only checked for empty 
when they are duplicates. So it is the second empty header (the first 
duplicate) that raises an error. This test should pass but does not:

// This fails to throw an exception.
// Only 1 column name is missing and this is not recognised as a problem.
@Test(expected = IllegalArgumentException.class)
public void testHeadersMissing1ColumnException() throws Exception {
    final Reader in = new StringReader("a,,c,d\n1,2,3,4\nx,y,z,zz");
    CSVFormat.DEFAULT.withHeader().parse(in).iterator();
}

- Documentation:

If you are using the map of the header to column index it should be noted that 
under certain circumstances where you allow duplicate column headers that the 
map cannot represent the one-to-many relationship. The map is only guaranteed 
to be valid if the record was created with a format that does not allow 
duplicate or empty header names.

- Bug / Documentation

The CSVParser only stores headers names in a list of header names if they are 
not null. So the list can be shorter than the number of columns if you use a 
format that allows empty headers.

If the purpose of the list is to provide the correct order of the column 
headers then it should contain empty entries too. Otherwise I don’t see the 
purpose of storing this list unless it is documented what it will contain. This 
appears to be a list of non-null header names in the order they occur in the 
header and thus represent keys that can be used in the header map. But you can 
get that (without the ordering) using the map ketSet.








---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
Re: [CSV] Release again

Reply via email to