Re: [CSV] Release again

Alex Herbert Wed, 03 Jul 2019 01:13:52 -0700

> On 15 Jun 2019, at 19:59, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> 
> 
> 
>> On 15 Jun 2019, at 18:57, Gary Gregory <garydgreg...@gmail.com> wrote:
>> 
>> We've fixed some issues immediately after 1.7. How does everyone feel about
>> releasing again?
>> What else needs to be addressed in the short term?
>> Gary
> 
> - Bug (picked up by FindBugs):
> 
> CSVRecord is no longer serialisable as it stores the CSVParser and that is 
> not serialisable. This can be fixed by making the parser transient and 
> documenting the getParser() method as invalid after deserialisation. 
> Otherwise the cascade of fields that must be serialisable eventually includes 
> the original BufferedReader used to read the data. The parser is required for 
> the header map functionality and the getParser() method. So the class can 
> store the header map (easy), or overload the serialisation methods to record 
> and load the header map (more effort), or not support the parser and the 
> header map functionality after deserialisation, or something else. 
> 
> This should have been picked up by FindBugs during the release cycle but for 
> unknown reasons the FindBugs report on the site does not show this. So anyone 
> who deserialises this class will get an error as the serialVersionUID has not 
> been updated and the new parser field is not serialisable.
> 

I’ve raised this as a bug (CSV-248). It requires a decision on how to fix it.

> 
> I also raised a few issues about the handling of column headers in my recent 
> fix PR:
> 
> - Bug:
> 
> If the settings are not allowing empty columns headers you can currently use 
> a single empty header. This is because column headers are only checked for 
> empty when they are duplicates. So it is the second empty header (the first 
> duplicate) that raises an error. This test should pass but does not:
> 
> // This fails to throw an exception.
> // Only 1 column name is missing and this is not recognised as a problem.
> @Test(expected = IllegalArgumentException.class)
> public void testHeadersMissing1ColumnException() throws Exception {
>    final Reader in = new StringReader("a,,c,d\n1,2,3,4\nx,y,z,zz");
>    CSVFormat.DEFAULT.withHeader().parse(in).iterator();
> }

I’ve raised this as a bug (CSV-247). It is a rearrangement of the checking 
logic and I will raise a PR to fix it.

> 
> - Documentation:
> 
> If you are using the map of the header to column index it should be noted 
> that under certain circumstances where you allow duplicate column headers 
> that the map cannot represent the one-to-many relationship. The map is only 
> guaranteed to be valid if the record was created with a format that does not 
> allow duplicate or empty header names.

Any objections to updating the Javadoc to record the limitations of the header 
map?

> 
> - Bug / Documentation
> 
> The CSVParser only stores headers names in a list of header names if they are 
> not null. So the list can be shorter than the number of columns if you use a 
> format that allows empty headers.
> 
> If the purpose of the list is to provide the correct order of the column 
> headers then it should contain empty entries too. Otherwise I don’t see the 
> purpose of storing this list unless it is documented what it will contain. 
> This appears to be a list of non-null header names in the order they occur in 
> the header and thus represent keys that can be used in the header map. But 
> you can get that (without the ordering) using the map ketSet.

Is this a bug or should it be documented as a feature?

I consider it a bug. The list of header names does not always reflect the 
header, i.e. it should contain empty entries if the header was empty.

> 
> 
> 
> 
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
Re: [CSV] Release again

Reply via email to