Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1
> Would't it be simpler to deal with the serialization issue by bumping the > serialVersionID? simpler yes, but it's a different thing The PR makes the serialized forms for commons-csv versions 1.9.0 and 1.10.0 compatible. Given that serialization has been broken for several versions in commons-csv and given that "fixing" it now is a labor of love which does not add much value, I am in favor of throwing it out altogether now rather than in some future (major) version. Clearly inform about it in the release notes and be done with it. > Also note the PR will throw an NPE in the builder > when instead of using the validate() method. setDuplicateHeaderMode(null) is illegal and should be fail-fast IMO kind regards, Markus From: Gary Gregory Sent: Thursday, October 20, 2022 16:43 To: Commons Developers List Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 Would't it be simpler to deal with the serialization issue by bumping the serialVersionID? We can just say that you only serialized and deserialize for the same version. Also note the PR will throw an NPE in the builder when instead of using the validate() method. Gary On Wed, Oct 19, 2022, 18:27 Gary D. Gregory wrote: > I've commented on the PR. > TY. > Gary > > On 2022/10/19 16:51:57 Gary Gregory wrote: > > On Wed, Oct 19, 2022 at 10:01 AM Alex Herbert > wrote: > > > > > > On Wed, 19 Oct 2022 at 14:57, Gary D. Gregory > wrote: > > > > > > > > My +1 > > > > > > > > Gary > > > > > > Gary, > > > > > > PR #276 highlights a behavioural compatibility error in the 1.10.0 RC1. > > > > > > AllowDuplicates enum may be set to the incorrect value when setting > > > the allow duplicates boolean. Have you reviewed this? I believe it is > > > valid. > > > > I will re-read later tonight... > > > > Gary > > > > > > > > Alex > > > > > > - > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > > - > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > - > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1
On Fri, 21 Oct 2022 at 11:57, wrote: > > > Would't it be simpler to deal with the serialization issue by bumping the > > serialVersionID? > simpler yes, but it's a different thing > The PR makes the serialized forms for commons-csv versions 1.9.0 and 1.10.0 > compatible. > > Given that serialization has been broken for several versions in commons-csv > and given that "fixing" it now is a labor of love which does not add much > value, > I am in favor of throwing it out altogether now rather than in some future > (major) version. Clearly inform about it in the release notes and be done > with it. +1 > > Also note the PR will throw an NPE in the builder > > when instead of using the validate() method. > > setDuplicateHeaderMode(null) is illegal and should be fail-fast IMO > > kind regards, > Markus > > From: Gary Gregory > Sent: Thursday, October 20, 2022 16:43 > To: Commons Developers List > Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 > > Would't it be simpler to deal with the serialization issue by bumping the > serialVersionID? We can just say that you only serialized and deserialize > for the same version. Also note the PR will throw an NPE in the builder > when instead of using the validate() method. > > Gary > > On Wed, Oct 19, 2022, 18:27 Gary D. Gregory wrote: > > > I've commented on the PR. > > TY. > > Gary > > > > On 2022/10/19 16:51:57 Gary Gregory wrote: > > > On Wed, Oct 19, 2022 at 10:01 AM Alex Herbert > > wrote: > > > > > > > > On Wed, 19 Oct 2022 at 14:57, Gary D. Gregory > > wrote: > > > > > > > > > > My +1 > > > > > > > > > > Gary > > > > > > > > Gary, > > > > > > > > PR #276 highlights a behavioural compatibility error in the 1.10.0 RC1. > > > > > > > > AllowDuplicates enum may be set to the incorrect value when setting > > > > the allow duplicates boolean. Have you reviewed this? I believe it is > > > > valid. > > > > > > I will re-read later tonight... > > > > > > Gary > > > > > > > > > > > Alex > > > > > > > > - > > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > > > > > - > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > > > > > - > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > - > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1
I updated git master with Javadoc to signal that Serialization in CSVFormat is not supported from one version to the next. I also bumped the serial version ID from 1 to 2. All of this is noted in changes.xml. Gary On 2022/10/21 11:27:43 sebb wrote: > On Fri, 21 Oct 2022 at 11:57, wrote: > > > > > Would't it be simpler to deal with the serialization issue by bumping the > > > serialVersionID? > > simpler yes, but it's a different thing > > The PR makes the serialized forms for commons-csv versions 1.9.0 and 1.10.0 > > compatible. > > > > Given that serialization has been broken for several versions in > > commons-csv and given that "fixing" it now is a labor of love which does > > not add much value, > > I am in favor of throwing it out altogether now rather than in some future > > (major) version. Clearly inform about it in the release notes and be done > > with it. > > +1 > > > > Also note the PR will throw an NPE in the builder > > > when instead of using the validate() method. > > > > setDuplicateHeaderMode(null) is illegal and should be fail-fast IMO > > > > kind regards, > > Markus > > > > From: Gary Gregory > > Sent: Thursday, October 20, 2022 16:43 > > To: Commons Developers List > > Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 > > > > Would't it be simpler to deal with the serialization issue by bumping the > > serialVersionID? We can just say that you only serialized and deserialize > > for the same version. Also note the PR will throw an NPE in the builder > > when instead of using the validate() method. > > > > Gary > > > > On Wed, Oct 19, 2022, 18:27 Gary D. Gregory wrote: > > > > > I've commented on the PR. > > > TY. > > > Gary > > > > > > On 2022/10/19 16:51:57 Gary Gregory wrote: > > > > On Wed, Oct 19, 2022 at 10:01 AM Alex Herbert > > > wrote: > > > > > > > > > > On Wed, 19 Oct 2022 at 14:57, Gary D. Gregory > > > wrote: > > > > > > > > > > > > My +1 > > > > > > > > > > > > Gary > > > > > > > > > > Gary, > > > > > > > > > > PR #276 highlights a behavioural compatibility error in the 1.10.0 > > > > > RC1. > > > > > > > > > > AllowDuplicates enum may be set to the incorrect value when setting > > > > > the allow duplicates boolean. Have you reviewed this? I believe it is > > > > > valid. > > > > > > > > I will re-read later tonight... > > > > > > > > Gary > > > > > > > > > > > > > > Alex > > > > > > > > > > - > > > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > > > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > > > > > > > > - > > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > > > > > > > > > - > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > > > - > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > - > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1
On 2022/10/20 22:56:05 Alex Herbert wrote: > On Thu, 20 Oct 2022 at 23:43, Alex Herbert wrote: > > > > I did not have time to track through whether this behaviour changed > > after the initial implementation of the flag. I would think not as the > > original behaviour is from 1.0. This would map to: > > > > true -> ALLOW_ALL > > false -> ALLOW_EMPTY > > new -> DISALLOW > > > > Which is what we currently have in 1.10.0 RC1. Thus the PR #276 [7] to > > change the use of the flag to 'false -> DISALLOW' is not maintaining > > behavioural compatibility (to 1.7, or back to 1.0). > > PS. I just verified that PR 276 changes the DuplicateHeaderMode value > for allowDuplicates=false and does not change any tests. > > So the test suite is currently not enforcing behavioural > compatibility. This seems like a glaring hole in the tests and should > be addressed to prevent regressions. In git master, there are many tests for senders of withAllowDuplicateHeaderNames(boolean) and setAllowDuplicateHeaderNames(boolean). I filled in the coverage in testGetAllowDuplicateHeaderNames() for getAllowDuplicateHeaderNames() to reflect the _current_ behavior, right or wrong. Let's reassess now git master now please. If the PR matters still, it should be rebased on git master. Gary > > - > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1
I've removed serialization stuff from the PR and rebased it to master https://github.com/apache/commons-csv/pull/276 kind regards, Markus From: Gary D. Gregory Sent: Friday, October 21, 2022 14:18 To: dev@commons.apache.org Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 On 2022/10/20 22:56:05 Alex Herbert wrote: > On Thu, 20 Oct 2022 at 23:43, Alex Herbert wrote: > > > > I did not have time to track through whether this behaviour changed > > after the initial implementation of the flag. I would think not as the > > original behaviour is from 1.0. This would map to: > > > > true -> ALLOW_ALL > > false -> ALLOW_EMPTY > > new -> DISALLOW > > > > Which is what we currently have in 1.10.0 RC1. Thus the PR #276 [7] to > > change the use of the flag to 'false -> DISALLOW' is not maintaining > > behavioural compatibility (to 1.7, or back to 1.0). > > PS. I just verified that PR 276 changes the DuplicateHeaderMode value > for allowDuplicates=false and does not change any tests. > > So the test suite is currently not enforcing behavioural > compatibility. This seems like a glaring hole in the tests and should > be addressed to prevent regressions. In git master, there are many tests for senders of withAllowDuplicateHeaderNames(boolean) and setAllowDuplicateHeaderNames(boolean). I filled in the coverage in testGetAllowDuplicateHeaderNames() for getAllowDuplicateHeaderNames() to reflect the _current_ behavior, right or wrong. Let's reassess now git master now please. If the PR matters still, it should be rebased on git master. Gary > > - > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1
Hi Markus, The PR has conflicts and does not test its changes. I'd like feedback from the community on whether or not git master as it is now is OK for the duplicate header behavior or if changing the default is needed and if doing so is just ping-pongonging from one incompatibility to another, based on previous versions. TY! Gary On Fri, Oct 21, 2022, 08:52 wrote: > I've removed serialization stuff from the PR and rebased it to master > https://github.com/apache/commons-csv/pull/276 > > kind regards, > Markus > > From: Gary D. Gregory > Sent: Friday, October 21, 2022 14:18 > To: dev@commons.apache.org > Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 > > On 2022/10/20 22:56:05 Alex Herbert wrote: > > On Thu, 20 Oct 2022 at 23:43, Alex Herbert > wrote: > > > > > > I did not have time to track through whether this behaviour changed > > > after the initial implementation of the flag. I would think not as the > > > original behaviour is from 1.0. This would map to: > > > > > > true -> ALLOW_ALL > > > false -> ALLOW_EMPTY > > > new -> DISALLOW > > > > > > Which is what we currently have in 1.10.0 RC1. Thus the PR #276 [7] to > > > change the use of the flag to 'false -> DISALLOW' is not maintaining > > > behavioural compatibility (to 1.7, or back to 1.0). > > > > PS. I just verified that PR 276 changes the DuplicateHeaderMode value > > for allowDuplicates=false and does not change any tests. > > > > So the test suite is currently not enforcing behavioural > > compatibility. This seems like a glaring hole in the tests and should > > be addressed to prevent regressions. > > In git master, there are many tests for senders of > withAllowDuplicateHeaderNames(boolean) and > setAllowDuplicateHeaderNames(boolean). > > I filled in the coverage in testGetAllowDuplicateHeaderNames() for > getAllowDuplicateHeaderNames() to reflect the _current_ behavior, right or > wrong. > > Let's reassess now git master now please. If the PR matters still, it > should be rebased on git master. > > Gary > > > > > - > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > - > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > - > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > >
Re: [math] contribution proposal for multivariate functions optimization (2)
Hello Alex, Ichecked the architecture of MultivariateOptimizer family to compareto what I have done so far. I think that what I have done can berefactored to fit in the general framework as extension ofGradientMultivariateOptimizer even though, main differences are : - There is no need to provide explicit gradient function, as it can be computed with finite difference (secant approximation to partial derivative). - There is a need to compute the Hessian matrix. In the same way, it is computed with finite differences. - My approach uses of an extension of MultivariateFunction that have methods to provide gradient and hessian. But that is perhaps not a so good idea: it would be better to provide generic gradient and hessian classes that are operator applied to plain MultivariateFunction. I also have already written tests to cover at least all nominalcases and a few error cases. Now that I know where to put my code, Iwill integrate it in ACM clone (with all tests) and try to refactorit until it fit seamlessly in ACM classes and concepts. This may take a while. Yours truly François Laferrière Le jeudi 20 octobre 2022 à 20:18:46 UTC+2, Alex Herbert a écrit : Hi, Thanks for the interest in Commons Math. Currently all the optimisation code is in commons-math-legacy. I think the gradient based methods would fit in: org.apache.commons.math4.legacy.optim.nonlinear.scalar.gradient Can your implementations be adapted to work with the existing interfaces? The decision to move the entire 'optim' package to a new module allows a redesign of interfaces. The old and new can coexist but ideally we would want to support only one optimisation architecture. Have a look at the current classes and let us know what you think. Regards, Alex On Thu, 20 Oct 2022 at 15:36, François Laferrière wrote: > > Hello, > Sorry, previous message was a mess > Based on Apache common math, I have implemented some commonplace optimization > algorithms that could be integrated in ACM. This includes > > - Gradient Descent (en.wikipedia.org/wiki/Gradient_descent) > > - Newton Raphson >(https://en.wikipedia.org/wiki/Newton's_method_in_optimization) > > - BFGS >(https://en.wikipedia.org/wiki/Broyden–Fletcher–Goldfarb–Shanno_algorithm) > > They are implemented in such a way that other algorithms of the same family > (Newton) can be implemented easily from existing building blocks. > I clone http://gitbox.apache.org/repos/asf/commons-math.git but I am a bit > lost in the module structure. Should I put my code in one existing > commons-math4-* module (if so which one?) or should I create a new module > (for instance commons-math-optimization) ? > Many thanks in advance > François Laferrière > > - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1
Hi Gary, the PR did not have conflicts 30 minutes ago. So we must have encountered a race condition between you updating master and me rebasing and force-pushing my pr. I've done that again just now. There are no conflicts. At the same time I see parts of my PR have dribbled into master already, so if you rather implement the changes yourself, please go ahead. I am fine with closing my pr. regards, Markus From: Gary Gregory Sent: Friday, October 21, 2022 15:17 To: Commons Developers List Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 Hi Markus, The PR has conflicts and does not test its changes. I'd like feedback from the community on whether or not git master as it is now is OK for the duplicate header behavior or if changing the default is needed and if doing so is just ping-pongonging from one incompatibility to another, based on previous versions. TY! Gary On Fri, Oct 21, 2022, 08:52 wrote: > I've removed serialization stuff from the PR and rebased it to master > https://github.com/apache/commons-csv/pull/276 > > kind regards, > Markus > > From: Gary D. Gregory > Sent: Friday, October 21, 2022 14:18 > To: dev@commons.apache.org > Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 > > On 2022/10/20 22:56:05 Alex Herbert wrote: > > On Thu, 20 Oct 2022 at 23:43, Alex Herbert > wrote: > > > > > > I did not have time to track through whether this behaviour changed > > > after the initial implementation of the flag. I would think not as the > > > original behaviour is from 1.0. This would map to: > > > > > > true -> ALLOW_ALL > > > false -> ALLOW_EMPTY > > > new -> DISALLOW > > > > > > Which is what we currently have in 1.10.0 RC1. Thus the PR #276 [7] to > > > change the use of the flag to 'false -> DISALLOW' is not maintaining > > > behavioural compatibility (to 1.7, or back to 1.0). > > > > PS. I just verified that PR 276 changes the DuplicateHeaderMode value > > for allowDuplicates=false and does not change any tests. > > > > So the test suite is currently not enforcing behavioural > > compatibility. This seems like a glaring hole in the tests and should > > be addressed to prevent regressions. > > In git master, there are many tests for senders of > withAllowDuplicateHeaderNames(boolean) and > setAllowDuplicateHeaderNames(boolean). > > I filled in the coverage in testGetAllowDuplicateHeaderNames() for > getAllowDuplicateHeaderNames() to reflect the _current_ behavior, right or > wrong. > > Let's reassess now git master now please. If the PR matters still, it > should be rebased on git master. > > Gary > > > > > - > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > - > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > - > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [math] contribution proposal for multivariate functions optimization (2)
Hello. I haven't looked at the code, but thanks for your interest in contributing to Commons Math. The more straightforward path is indeed to adapt your code to what is currently in the "...legacy" module. However, you can really consider the other venue, i.e. a design specific to this family of optimizers to be defined in a separate module. [In the current "legacy" code we tried to share as much code as possible, but this has led to various design issues.] Which is best is certainly not clear-cut. My hope is that starting from a clean slate may help us make progress on several aspects of the library that we wanted to change anyways[1] (as per the JIRA records). Regards, Gilles [1] For example, usage of external matrix algebra implementations. Le ven. 21 oct. 2022 à 15:30, François Laferrière a écrit : > > Hello Alex, > Ichecked the architecture of MultivariateOptimizer family to compareto what I > have done so far. I think that what I have done can berefactored to fit in > the general framework as extension ofGradientMultivariateOptimizer even > though, main differences are : >- > There is no need to provide explicit gradient function, as it can be computed > with finite difference (secant approximation to partial derivative). > >- > There is a need to compute the Hessian matrix. In the same way, it is > computed with finite differences. > >- > My approach uses of an extension of MultivariateFunction that have methods to > provide gradient and hessian. But that is perhaps not a so good idea: it > would be better to provide generic gradient and hessian classes that are > operator applied to plain MultivariateFunction. > > > I also have already written tests to cover at least all nominalcases and a > few error cases. Now that I know where to put my code, Iwill integrate it in > ACM clone (with all tests) and try to refactorit until it fit seamlessly in > ACM classes and concepts. > > This may take a while. > > Yours truly > François Laferrière > > Le jeudi 20 octobre 2022 à 20:18:46 UTC+2, Alex Herbert > a écrit : > > Hi, > > Thanks for the interest in Commons Math. > > Currently all the optimisation code is in commons-math-legacy. I think > the gradient based methods would fit in: > > org.apache.commons.math4.legacy.optim.nonlinear.scalar.gradient > > Can your implementations be adapted to work with the existing > interfaces? The decision to move the entire 'optim' package to a new > module allows a redesign of interfaces. The old and new can coexist > but ideally we would want to support only one optimisation > architecture. Have a look at the current classes and let us know what > you think. > > Regards, > > Alex > > > > On Thu, 20 Oct 2022 at 15:36, François Laferrière > wrote: > > > > Hello, > > Sorry, previous message was a mess > > Based on Apache common math, I have implemented some commonplace > > optimization algorithms that could be integrated in ACM. This includes > > > >- Gradient Descent (en.wikipedia.org/wiki/Gradient_descent) > > > >- Newton Raphson > > (https://en.wikipedia.org/wiki/Newton's_method_in_optimization) > > > >- BFGS > > (https://en.wikipedia.org/wiki/Broyden–Fletcher–Goldfarb–Shanno_algorithm) > > > > They are implemented in such a way that other algorithms of the same family > > (Newton) can be implemented easily from existing building blocks. > > I clone http://gitbox.apache.org/repos/asf/commons-math.git but I am a bit > > lost in the module structure. Should I put my code in one existing > > commons-math4-* module (if so which one?) or should I create a new module > > (for instance commons-math-optimization) ? > > Many thanks in advance > > François Laferrière > > > > > > - > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1
Ok, I don't see any conflict anymore but the PR title does not match the code: "CSVFormat.duplicateHeaderMode requires default DISALLOW" I am ok with the changes as they stand now but we have to resolve if the default should be changed. Gary On Fri, Oct 21, 2022, 09:44 wrote: > Hi Gary, > > the PR did not have conflicts 30 minutes ago. So we must have encountered > a race condition between you updating master and me rebasing and > force-pushing my pr. > I've done that again just now. There are no conflicts. > At the same time I see parts of my PR have dribbled into master already, > so if you rather implement the changes yourself, please go ahead. I am fine > with closing my pr. > > regards, > Markus > > > From: Gary Gregory > Sent: Friday, October 21, 2022 15:17 > To: Commons Developers List > Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 > > Hi Markus, > > The PR has conflicts and does not test its changes. > > I'd like feedback from the community on whether or not git master as it is > now is OK for the duplicate header behavior or if changing the default is > needed and if doing so is just ping-pongonging from one incompatibility to > another, based on previous versions. > > TY! > > Gary > > On Fri, Oct 21, 2022, 08:52 wrote: > > > I've removed serialization stuff from the PR and rebased it to master > > https://github.com/apache/commons-csv/pull/276 > > > > kind regards, > > Markus > > > > From: Gary D. Gregory > > Sent: Friday, October 21, 2022 14:18 > > To: dev@commons.apache.org > > Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 > > > > On 2022/10/20 22:56:05 Alex Herbert wrote: > > > On Thu, 20 Oct 2022 at 23:43, Alex Herbert > > wrote: > > > > > > > > I did not have time to track through whether this behaviour changed > > > > after the initial implementation of the flag. I would think not as > the > > > > original behaviour is from 1.0. This would map to: > > > > > > > > true -> ALLOW_ALL > > > > false -> ALLOW_EMPTY > > > > new -> DISALLOW > > > > > > > > Which is what we currently have in 1.10.0 RC1. Thus the PR #276 [7] > to > > > > change the use of the flag to 'false -> DISALLOW' is not maintaining > > > > behavioural compatibility (to 1.7, or back to 1.0). > > > > > > PS. I just verified that PR 276 changes the DuplicateHeaderMode value > > > for allowDuplicates=false and does not change any tests. > > > > > > So the test suite is currently not enforcing behavioural > > > compatibility. This seems like a glaring hole in the tests and should > > > be addressed to prevent regressions. > > > > In git master, there are many tests for senders of > > withAllowDuplicateHeaderNames(boolean) and > > setAllowDuplicateHeaderNames(boolean). > > > > I filled in the coverage in testGetAllowDuplicateHeaderNames() for > > getAllowDuplicateHeaderNames() to reflect the _current_ behavior, right > or > > wrong. > > > > Let's reassess now git master now please. If the PR matters still, it > > should be rebased on git master. > > > > Gary > > > > > > > > - > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > > > > > - > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > - > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > - > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > >
Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1
>From my investigation the behaviour from 1.0 to 1.6 was to throw on duplicates, but ignore empty duplicates (so allowing pad columns in a CSV). The allowDuplicates flag was added in 1.7 to allow duplicates. DEFAULT behaviour was true! So the behavioural compatibility was broken in release 1.7. Note there is also an allowMissingColumnNames flag. Logic was: - Store column headers as we go - For each new column header, check in the existing headers - If present: -- If not empty then throw if allowDuplicateHeaderNames=true -- If empty then throw if allowMissingColumnNames=false There is some logic failure here. It would allow the first missing column name for example; only the second empty missing column name would trigger an exception. The current master has updated this logic somewhat but I think the concept is still the same. There are tests to show that the default enum value is ALLOW_ALL. This is the same behaviour as 1.7, but not 1.0 - 1.6. Setting the boolean flag to true -> ALLOW_ALL. Setting the boolean flag to false -> ALLOW_EMPTY. So I think the default settings are correct, and the use of the old flag flips the behaviour correctly. But when investigating I found another issue. I looked at the CSVFormat tests and found two fixtures missing the @Test annotation: testDuplicateHeaderElementsTrue() testDuplicateHeaderElementsTrue_Deprecated() I added these to master. The tests for duplicate headers use the builder and setHeader("A", "A"). So there are no tests for empty headers. Trying this throws an exception in CSVFormat.validate for all of these: CSVFormat.DEFAULT.builder().setAllowDuplicateHeaderNames(false).setHeader("A", "", "B", "").build(); CSVFormat.DEFAULT.builder().setDuplicateHeaderMode(DuplicateHeaderMode.ALLOW_EMPTY).setHeader("A", "", "B", "").build(); CSVFormat.DEFAULT.builder().setAllowDuplicateHeaderNames(false) .setAllowMissingColumnNames(true).setHeader("A", "", "B", "").build(); So it seems the logic when building a CSVFormat with explicit headers is different from when parsing a format in CSVParser.createHeaders(). That method has a lot of logic to set up the header map. If a CSVFormat is created with an explicit header the behaviour is different with respect to duplicate headers, and allowMissingColumnNames which is not even considered in CSVFormat.validate(). This is the current behaviour in master for the parser: @Test public void testEmptyColumnsAndDuplicateHeadersNotAllowed() throws Exception { assertThrows(IllegalArgumentException.class, () -> CSVParser.parse("a,,c,\n1,2,3,4\nw,x,y,z", CSVFormat.DEFAULT.withHeader().withAllowDuplicateHeaderNames(false))); } @Test public void testEmptyColumnsAndDuplicateHeadersNotAllowedWithMissingColumnsAllowed() throws Exception { try (CSVParser parser = CSVParser.parse("a,,c,\n1,2,3,4\nw,x,y,z", CSVFormat.DEFAULT.withHeader().withAllowDuplicateHeaderNames(false).withAllowMissingColumnNames(true))) { // noop } } So the duplicate headers cannot be considered on its own, it requires use of the allow missing column names flag. Also, if a CSVFormat has an explicit header then the CSVParser will not construct a header Map to allow column names to look up column indices. I am not sure of the best path to fix this. I would document the CSVParser header map as not being created if the header is not read from the CSV input (current behaviour). Then maybe change the logic in CSVFormat.validate() to respect the ALLOW_ALL, ALLOW_EMPTY, ALLOW_NONE duplicates choice, possibly also considering allowEmptyColumnNames which defaults to false. Ideally we should get the same exceptions when building a CSVFormat with explicitly set headers, and parsing a CSV file and getting the header from the file. Currently I do not think that is true. Alex On Fri, 21 Oct 2022 at 15:54, Gary Gregory wrote: > > Ok, I don't see any conflict anymore but the PR title does not match the > code: "CSVFormat.duplicateHeaderMode requires default DISALLOW" > > I am ok with the changes as they stand now but we have to resolve if the > default should be changed. > > Gary > > On Fri, Oct 21, 2022, 09:44 wrote: > > > Hi Gary, > > > > the PR did not have conflicts 30 minutes ago. So we must have encountered > > a race condition between you updating master and me rebasing and > > force-pushing my pr. > > I've done that again just now. There are no conflicts. > > At the same time I see parts of my PR have dribbled into master already, > > so if you rather implement the changes yourself, please go ahead. I am fine > > with closing my pr. > > > > regards, > > Markus > > > > > > From: Gary Gregory > > Sent: Friday, October 21, 2022 15:17 > > To: Commons Developers List > > Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 > > > > Hi Markus, > > > > The PR has conflicts and does not test its changes. > > > > I'd like feedback from the community on whether or not git master as it is > > now is OK for the duplic