On Thu, 20 Oct 2022 at 22:45, Gary D. Gregory <ggreg...@apache.org> wrote:
>
> Hi All (below)
>
> On 2022/10/20 18:08:31 Alex Herbert wrote:
> > On Thu, 20 Oct 2022 at 17:05, sebb <seb...@gmail.com> wrote:
> > >
> > > On Thu, 20 Oct 2022 at 15:43, Gary Gregory <garydgreg...@gmail.com> wrote:
> > > >
> > > > Would't it be simpler to deal with the serialization issue by bumping 
> > > > the
> > > > serialVersionID? We can just say that you only serialized and 
> > > > deserialize
> > > > for the same version.
> > >
> > > Are we willing to continue supporting serialisation going forward?
> > > It is not easy to ensure compatibility and avoid security issues.
> > >
> > > Are there any use-cases for allowing serialisation?
> >
> > Serialisation was broken for CSVRecord before and partially fixed in
> > 1.8 to support 1.0 to 1.6. Fields from 1.7 are not supported. The
> > release notes for 1.8 state that serialisation will not be supported
> > going forward. So this breakage of serialisation for CSVFormat has
> > precedent.
> >
> > I think Gary's suggestion to change the serial version ID commits to
> > the same path as not supporting serialisation from 2.0.
>
> I think this is the simplest solution. We can Javadoc the class and say that 
> we do not support serialization from one version to the next and that it will 
> be removed in 2.0. Check? If this OK, then I'll update git master and we can 
> move on to the duplicate headers enum.

+1

>
> >
> > Regarding the release, I am not concerned about serialisation as it
> > seems to be a lost cause. The issue is the behavioural compatibility
> > of switching from a boolean flag for duplicate headers to an enum with
> > 3 options. We should get this correct to avoid a future release having
> > to explain a behavioural compatibility change.
>
> Duplicate headers enum: We are leaning towards canceling RC1, updating the 
> PR, or creating a new PR. I'll wait to cancel RC1 until it is clear what is 
> being proposed, with a PR.

I think we need to check what happened before we had the duplicate
headers flag. This is what I have found:

CSV-239 added the flag (1.7.0) [1] in commit [2]. This was added to
allow the CSVRecord getHeaderNames to return all headers including
repeats. Before that duplicates threw an exception (see CSV-236 [3],
which predates CSV-239). Throwing an exception for duplicate headers
is mentioned in the changes log for release 1.0 [5]. Note the original
behavior was to throw for non-empty duplicates due to a fix
implemented in CSV-121 for release 1.0 [6]. So this is the original
behaviour.

CSV-264 added the enum (1.10.0) [4].

So if the duplicate headers flag has been in since 1.7 then we should
just map the behaviour to the new enum. The commit when the boolean
flag was added has this text in CSVParser:

"This will always allow a duplicate header if the header is empty"

So behaviour when the flag was added:

true - allow duplicates
false - only allow empty duplicates, throw for non-empy duplicates

I did not have time to track through whether this behaviour changed
after the initial implementation of the flag. I would think not as the
original behaviour is from 1.0. This would map to:

true -> ALLOW_ALL
false -> ALLOW_EMPTY
new -> DISALLOW

Which is what we currently have in 1.10.0 RC1. Thus the PR #276 [7] to
change the use of the flag to 'false -> DISALLOW' is not maintaining
behavioural compatibility (to 1.7, or back to 1.0).

The original review by Markus Span also found inconsistent settings of
the quotedNullString. But this was removed from PR #276 and I lost
track of whether that change was required.

Alex

[1] https://issues.apache.org/jira/browse/CSV-236
[2] 
https://github.com/apache/commons-csv/commit/030fb8e37c4024b24fac2b5404300449a6741699
[3] https://issues.apache.org/jira/browse/CSV-236
[4] https://issues.apache.org/jira/browse/CSV-264
[5] https://commons.apache.org/proper/commons-csv/changes-report.html
[6] https://issues.apache.org/jira/browse/CSV-121
[7] https://github.com/apache/commons-csv/pull/276

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to