[math] contribution proposal for multivariate functions optimization

2022-10-20 Thread François Laferrière
Hello,
Based on Apache Commons Math, I have implemented some commonplace optimization 
algorithms that could be integrated in ACM. These include

   - Gradient Descent (Gradient descent - Wikipedia)
   - Newton Raphson (Newton's method in optimization - Wikipedia)
   - BFGS (Broyden–Fletcher–Goldfarb–Shanno algorithm - Wikipedia)

They are implemented in such a way that other algorithms of the same family 
(Newton) can be implemented easily from existing building blocks.
I cloned http://gitbox.apache.org/repos/asf/commons-math.git but I am a bit lost 
in the module structure. Should I put my code in one existing commons-math4-* 
module (if so which one?) or should I create a new module (for instance 
commons-math-optimization) ?
Many thanks in advance
François Laferrière
 


[math] contribution proposal for multivariate functions optimization (2)

2022-10-20 Thread François Laferrière
Hello,
Sorry, previous message was a mess
Based on Apache Commons Math, I have implemented some commonplace optimization 
algorithms that could be integrated in ACM. These include
   
   - Gradient Descent (en.wikipedia.org/wiki/Gradient_descent)   

   - Newton Raphson 
(https://en.wikipedia.org/wiki/Newton's_method_in_optimization)   

   - BFGS 
(https://en.wikipedia.org/wiki/Broyden–Fletcher–Goldfarb–Shanno_algorithm)   

They are implemented in such a way that other algorithms of the same family 
(Newton) can be implemented easily from existing building blocks.
I cloned http://gitbox.apache.org/repos/asf/commons-math.git but I am a bit lost 
in the module structure. Should I put my code in one existing commons-math4-* 
module (if so which one?) or should I create a new module (for instance 
commons-math-optimization) ?
Many thanks in advance
François Laferrière
 
  

Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

2022-10-20 Thread Gary Gregory
Wouldn't it be simpler to deal with the serialization issue by bumping the
serialVersionUID? We can just say that you can only serialize and deserialize
within the same version. Also note the PR will throw an NPE in the builder
instead of using the validate() method.

Gary

On Wed, Oct 19, 2022, 18:27 Gary D. Gregory  wrote:

> I've commented on the PR.
> TY.
> Gary
>
> On 2022/10/19 16:51:57 Gary Gregory wrote:
> > On Wed, Oct 19, 2022 at 10:01 AM Alex Herbert 
> wrote:
> > >
> > > On Wed, 19 Oct 2022 at 14:57, Gary D. Gregory 
> wrote:
> > > >
> > > > My +1
> > > >
> > > > Gary
> > >
> > > Gary,
> > >
> > > PR #276 highlights a behavioural compatibility error in the 1.10.0 RC1.
> > >
> > > AllowDuplicates enum may be set to the incorrect value when setting
> > > the allow duplicates boolean. Have you reviewed this? I believe it is
> > > valid.
> >
> > I will re-read later tonight...
> >
> > Gary
> >
> > >
> > > Alex
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > For additional commands, e-mail: dev-h...@commons.apache.org
> > >


Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

2022-10-20 Thread sebb
On Thu, 20 Oct 2022 at 15:43, Gary Gregory  wrote:
>
> Wouldn't it be simpler to deal with the serialization issue by bumping the
> serialVersionUID? We can just say that you can only serialize and deserialize
> within the same version.

Are we willing to continue supporting serialisation going forward?
It is not easy to ensure compatibility and avoid security issues.

Are there any use-cases for allowing serialisation?

> Also note the PR will throw an NPE in the builder
> instead of using the validate() method.
>
> Gary
>
> On Wed, Oct 19, 2022, 18:27 Gary D. Gregory  wrote:
>
> > I've commented on the PR.
> > TY.
> > Gary
> >
> > On 2022/10/19 16:51:57 Gary Gregory wrote:
> > > On Wed, Oct 19, 2022 at 10:01 AM Alex Herbert 
> > wrote:
> > > >
> > > > On Wed, 19 Oct 2022 at 14:57, Gary D. Gregory 
> > wrote:
> > > > >
> > > > > My +1
> > > > >
> > > > > Gary
> > > >
> > > > Gary,
> > > >
> > > > PR #276 highlights a behavioural compatibility error in the 1.10.0 RC1.
> > > >
> > > > AllowDuplicates enum may be set to the incorrect value when setting
> > > > the allow duplicates boolean. Have you reviewed this? I believe it is
> > > > valid.
> > >
> > > I will re-read later tonight...
> > >
> > > Gary
> > >
> > > >
> > > > Alex
> > > >
> > > > -
> > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > > For additional commands, e-mail: dev-h...@commons.apache.org
> > > >
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > For additional commands, e-mail: dev-h...@commons.apache.org
> > >
> > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

2022-10-20 Thread Alex Herbert
On Thu, 20 Oct 2022 at 17:05, sebb  wrote:
>
> On Thu, 20 Oct 2022 at 15:43, Gary Gregory  wrote:
> >
> > Wouldn't it be simpler to deal with the serialization issue by bumping the
> > serialVersionUID? We can just say that you can only serialize and deserialize
> > within the same version.
>
> Are we willing to continue supporting serialisation going forward?
> It is not easy to ensure compatibility and avoid security issues.
>
> Are there any use-cases for allowing serialisation?

Serialisation was broken for CSVRecord before and partially fixed in
1.8 to support 1.0 to 1.6. Fields from 1.7 are not supported. The
release notes for 1.8 state that serialisation will not be supported
going forward. So this breakage of serialisation for CSVFormat has
precedent.

I think Gary's suggestion to change the serial version ID commits to
the same path as not supporting serialisation from 2.0.

Regarding the release, I am not concerned about serialisation as it
seems to be a lost cause. The issue is the behavioural compatibility
of switching from a boolean flag for duplicate headers to an enum with
3 options. We should get this correct to avoid a future release having
to explain a behavioural compatibility change.

Alex




Re: [math] contribution proposal for multivariate functions optimization (2)

2022-10-20 Thread Alex Herbert
Hi,

Thanks for the interest in Commons Math.

Currently all the optimisation code is in commons-math-legacy. I think
the gradient based methods would fit in:

org.apache.commons.math4.legacy.optim.nonlinear.scalar.gradient

Can your implementations be adapted to work with the existing
interfaces? The decision to move the entire 'optim' package to a new
module allows a redesign of interfaces. The old and new can coexist
but ideally we would want to support only one optimisation
architecture. Have a look at the current classes and let us know what
you think.
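
For readers less familiar with the family of methods under discussion, a fixed-step gradient descent can be sketched roughly as below. This is only an illustration under stated assumptions: the class name GradientDescentSketch and its minimize signature are hypothetical, and it uses only the JDK, not the existing 'optim' interfaces or the proposed contribution.

```java
import java.util.function.Function;

/** Hypothetical, self-contained sketch; not the Commons Math API. */
final class GradientDescentSketch {

    /**
     * Minimizes a function from its gradient using fixed-step gradient descent.
     *
     * @param gradient  maps a point to the gradient at that point
     * @param start     initial guess
     * @param stepSize  fixed step length (assumed small enough to converge)
     * @param tolerance stop once the gradient norm falls below this value
     * @param maxIter   hard cap on the number of iterations
     * @return the last point reached
     */
    static double[] minimize(Function<double[], double[]> gradient,
                             double[] start,
                             double stepSize,
                             double tolerance,
                             int maxIter) {
        double[] x = start.clone();
        for (int iter = 0; iter < maxIter; iter++) {
            double[] g = gradient.apply(x);
            double normSq = 0;
            for (double gi : g) {
                normSq += gi * gi;
            }
            if (Math.sqrt(normSq) < tolerance) {
                break; // gradient is numerically zero: stationary point reached
            }
            for (int i = 0; i < x.length; i++) {
                x[i] -= stepSize * g[i]; // move against the gradient
            }
        }
        return x;
    }

    public static void main(String[] args) {
        // f(x, y) = (x - 1)^2 + 2 * (y + 3)^2 has its minimum at (1, -3).
        Function<double[], double[]> grad =
            p -> new double[] {2 * (p[0] - 1), 4 * (p[1] + 3)};
        double[] min = minimize(grad, new double[] {0, 0}, 0.1, 1e-9, 10_000);
        System.out.println(min[0] + ", " + min[1]);
    }
}
```

Newton and BFGS variants differ mainly in how the step direction is computed from the gradient, which is why a shared building block for the iteration loop is plausible.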

Regards,

Alex



On Thu, 20 Oct 2022 at 15:36, François Laferrière
 wrote:
>
> Hello,
> Sorry, previous message was a mess
> Based on Apache Commons Math, I have implemented some commonplace optimization 
> algorithms that could be integrated in ACM. These include
>
>    - Gradient Descent (en.wikipedia.org/wiki/Gradient_descent)
>
>    - Newton Raphson 
> (https://en.wikipedia.org/wiki/Newton's_method_in_optimization)
>
>    - BFGS 
> (https://en.wikipedia.org/wiki/Broyden–Fletcher–Goldfarb–Shanno_algorithm)
>
> They are implemented in such a way that other algorithms of the same family 
> (Newton) can be implemented easily from existing building blocks.
> I cloned http://gitbox.apache.org/repos/asf/commons-math.git but I am a bit 
> lost in the module structure. Should I put my code in one existing 
> commons-math4-* module (if so which one?) or should I create a new module 
> (for instance commons-math-optimization) ?
> Many thanks in advance
> François Laferrière
>
>




Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

2022-10-20 Thread Gary D. Gregory
Hi All (below)

On 2022/10/20 18:08:31 Alex Herbert wrote:
> On Thu, 20 Oct 2022 at 17:05, sebb  wrote:
> >
> > On Thu, 20 Oct 2022 at 15:43, Gary Gregory  wrote:
> > >
> > > Wouldn't it be simpler to deal with the serialization issue by bumping the
> > > serialVersionUID? We can just say that you can only serialize and deserialize
> > > within the same version.
> >
> > Are we willing to continue supporting serialisation going forward?
> > It is not easy to ensure compatibility and avoid security issues.
> >
> > Are there any use-cases for allowing serialisation?
> 
> Serialisation was broken for CSVRecord before and partially fixed in
> 1.8 to support 1.0 to 1.6. Fields from 1.7 are not supported. The
> release notes for 1.8 state that serialisation will not be supported
> going forward. So this breakage of serialisation for CSVFormat has
> precedent.
> 
> I think Gary's suggestion to change the serial version ID commits to
> the same path as not supporting serialisation from 2.0.

I think this is the simplest solution. We can Javadoc the class and say that we 
do not support serialization from one version to the next and that it will be 
removed in 2.0. Check? If this is OK, then I'll update git master and we can move 
on to the duplicate headers enum.
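
A rough sketch of what that could look like is shown here. The class FormatSketch, its field, and the version numbers 1L and 2L are hypothetical stand-ins chosen for illustration; this is not the actual CSVFormat source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/**
 * Hypothetical stand-in for a format class (not the real CSVFormat).
 *
 * <p>Serialization is only supported between identical versions of this
 * class and is planned for removal in 2.0.</p>
 */
final class FormatSketch implements Serializable {

    // Assumed bump from 1L to 2L: streams written by the previous class
    // version are rejected with InvalidClassException when read back.
    private static final long serialVersionUID = 2L;

    private final String delimiter;

    FormatSketch(String delimiter) {
        this.delimiter = delimiter;
    }

    String getDelimiter() {
        return delimiter;
    }

    /** Serializes and deserializes an instance within the same class version. */
    static FormatSketch roundTrip(FormatSketch in) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(in);
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (FormatSketch) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new FormatSketch(",")).getDelimiter());
    }
}
```

Same-version round trips keep working, while a stream produced by a class with a different serialVersionUID fails fast rather than deserializing into an inconsistent object.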

> 
> Regarding the release, I am not concerned about serialisation as it
> seems to be a lost cause. The issue is the behavioural compatibility
> of switching from a boolean flag for duplicate headers to an enum with
> 3 options. We should get this correct to avoid a future release having
> to explain a behavioural compatibility change.

Duplicate headers enum: We are leaning towards canceling RC1, updating the PR, 
or creating a new PR. I'll wait to cancel RC1 until it is clear what is being 
proposed, with a PR.

Gary

> 
> Alex
> 



Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

2022-10-20 Thread Alex Herbert
On Thu, 20 Oct 2022 at 22:45, Gary D. Gregory  wrote:
>
> Hi All (below)
>
> On 2022/10/20 18:08:31 Alex Herbert wrote:
> > On Thu, 20 Oct 2022 at 17:05, sebb  wrote:
> > >
> > > On Thu, 20 Oct 2022 at 15:43, Gary Gregory  wrote:
> > > >
> > > > Wouldn't it be simpler to deal with the serialization issue by bumping
> > > > the serialVersionUID? We can just say that you can only serialize and
> > > > deserialize within the same version.
> > >
> > > Are we willing to continue supporting serialisation going forward?
> > > It is not easy to ensure compatibility and avoid security issues.
> > >
> > > Are there any use-cases for allowing serialisation?
> >
> > Serialisation was broken for CSVRecord before and partially fixed in
> > 1.8 to support 1.0 to 1.6. Fields from 1.7 are not supported. The
> > release notes for 1.8 state that serialisation will not be supported
> > going forward. So this breakage of serialisation for CSVFormat has
> > precedent.
> >
> > I think Gary's suggestion to change the serial version ID commits to
> > the same path as not supporting serialisation from 2.0.
>
> I think this is the simplest solution. We can Javadoc the class and say that 
> we do not support serialization from one version to the next and that it will 
> be removed in 2.0. Check? If this is OK, then I'll update git master and we can 
> move on to the duplicate headers enum.

+1

>
> >
> > Regarding the release, I am not concerned about serialisation as it
> > seems to be a lost cause. The issue is the behavioural compatibility
> > of switching from a boolean flag for duplicate headers to an enum with
> > 3 options. We should get this correct to avoid a future release having
> > to explain a behavioural compatibility change.
>
> Duplicate headers enum: We are leaning towards canceling RC1, updating the 
> PR, or creating a new PR. I'll wait to cancel RC1 until it is clear what is 
> being proposed, with a PR.

I think we need to check what happened before we had the duplicate
headers flag. This is what I have found:

CSV-239 added the flag (1.7.0) [1] in commit [2]. This was added to
allow the CSVRecord getHeaderNames to return all headers including
repeats. Before that duplicates threw an exception (see CSV-236 [3],
which predates CSV-239). Throwing an exception for duplicate headers
is mentioned in the changes log for release 1.0 [5]. Note the original
behavior was to throw for non-empty duplicates due to a fix
implemented in CSV-121 for release 1.0 [6]. So this is the original
behaviour.

CSV-264 added the enum (1.10.0) [4].

So if the duplicate headers flag has been in since 1.7 then we should
just map the behaviour to the new enum. The commit when the boolean
flag was added has this text in CSVParser:

"This will always allow a duplicate header if the header is empty"

So behaviour when the flag was added:

true - allow duplicates
false - only allow empty duplicates, throw for non-empty duplicates

I did not have time to track through whether this behaviour changed
after the initial implementation of the flag. I would think not as the
original behaviour is from 1.0. This would map to:

true -> ALLOW_ALL
false -> ALLOW_EMPTY
new -> DISALLOW

Which is what we currently have in 1.10.0 RC1. Thus the PR #276 [7] to
change the use of the flag to 'false -> DISALLOW' is not maintaining
behavioural compatibility (to 1.7, or back to 1.0).
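
The mapping described above can be written down as a short sketch. The enum below is a self-contained stand-in whose constant names follow the ones used in this message; the type name DuplicateHeaderModeSketch and the method fromLegacyFlag are hypothetical and not taken from the commons-csv source.

```java
/** Hypothetical stand-in for the duplicate header mode enum discussed in this thread. */
enum DuplicateHeaderModeSketch {
    /** Every duplicate header is accepted. */
    ALLOW_ALL,
    /** Only duplicates of empty headers are accepted. */
    ALLOW_EMPTY,
    /** No duplicate header is accepted (new option, no legacy equivalent). */
    DISALLOW;

    /**
     * Maps the legacy boolean flag to the enum so that behaviour stays
     * compatible: true maps to ALLOW_ALL, false maps to ALLOW_EMPTY.
     */
    static DuplicateHeaderModeSketch fromLegacyFlag(boolean allowDuplicateHeaderNames) {
        return allowDuplicateHeaderNames ? ALLOW_ALL : ALLOW_EMPTY;
    }

    public static void main(String[] args) {
        System.out.println(fromLegacyFlag(true));
        System.out.println(fromLegacyFlag(false));
    }
}
```

DISALLOW is deliberately unreachable from the boolean, since neither legacy value rejected empty duplicates.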

The original review by Markus Span also found inconsistent settings of
the quotedNullString. But this was removed from PR #276 and I lost
track of whether that change was required.

Alex

[1] https://issues.apache.org/jira/browse/CSV-239
[2] 
https://github.com/apache/commons-csv/commit/030fb8e37c4024b24fac2b5404300449a6741699
[3] https://issues.apache.org/jira/browse/CSV-236
[4] https://issues.apache.org/jira/browse/CSV-264
[5] https://commons.apache.org/proper/commons-csv/changes-report.html
[6] https://issues.apache.org/jira/browse/CSV-121
[7] https://github.com/apache/commons-csv/pull/276




Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

2022-10-20 Thread Alex Herbert
On Thu, 20 Oct 2022 at 23:43, Alex Herbert  wrote:
>
> I did not have time to track through whether this behaviour changed
> after the initial implementation of the flag. I would think not as the
> original behaviour is from 1.0. This would map to:
>
> true -> ALLOW_ALL
> false -> ALLOW_EMPTY
> new -> DISALLOW
>
> Which is what we currently have in 1.10.0 RC1. Thus the PR #276 [7] to
> change the use of the flag to 'false -> DISALLOW' is not maintaining
> behavioural compatibility (to 1.7, or back to 1.0).

PS. I just verified that PR 276 changes the DuplicateHeaderMode value
for allowDuplicates=false and does not change any tests.

So the test suite is currently not enforcing behavioural
compatibility. This seems like a glaring hole in the tests and should
be addressed to prevent regressions.
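
One way to pin the legacy behaviour is a small oracle that encodes what the boolean flag meant, which a regression test could then compare against the enum-based code. The sketch below is an assumption-laden illustration: the class LegacyDuplicateHeaderOracle is hypothetical, it does not call commons-csv, and it assumes that null and whitespace-only headers both count as empty.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical oracle for the legacy boolean flag; not commons-csv code. */
final class LegacyDuplicateHeaderOracle {

    /**
     * Returns whether a header row is accepted under the legacy behaviour
     * described in this thread: true allows every duplicate, false allows
     * only duplicates of empty headers.
     */
    static boolean accepts(boolean allowDuplicateHeaderNames, String... headers) {
        Set<String> seen = new HashSet<>();
        for (String header : headers) {
            // Assumption: null and whitespace-only headers both count as empty.
            boolean empty = header == null || header.trim().isEmpty();
            boolean duplicate = !seen.add(empty ? "" : header);
            if (duplicate && !empty && !allowDuplicateHeaderNames) {
                return false; // non-empty duplicate with the flag off: rejected
            }
        }
        return true;
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        check(accepts(true, "a", "b", "a"), "true must allow non-empty duplicates");
        check(accepts(true, "a", "", ""), "true must allow empty duplicates");
        check(!accepts(false, "a", "b", "a"), "false must reject non-empty duplicates");
        check(accepts(false, "a", "", ""), "false must still allow empty duplicates");
        System.out.println("legacy behaviour pinned");
    }
}
```

A change such as 'false -> DISALLOW' would fail the last check, which is the kind of regression the current tests do not catch.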




Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

2022-10-20 Thread David Dellsperger
I had just started to look into this and was going to call out the same
thing. I'm concerned about those changes, especially the allowDuplicates
change. I made a note in my ticket for work to make sure we have appropriate
test cases on our end. With the RC, we didn't see any issues with
compatibility between 1.9.0 and 1.10.0.RC.

David

On Thu, Oct 20, 2022 at 5:56 PM Alex Herbert 
wrote:

> On Thu, 20 Oct 2022 at 23:43, Alex Herbert 
> wrote:
> >
> > I did not have time to track through whether this behaviour changed
> > after the initial implementation of the flag. I would think not as the
> > original behaviour is from 1.0. This would map to:
> >
> > true -> ALLOW_ALL
> > false -> ALLOW_EMPTY
> > new -> DISALLOW
> >
> > Which is what we currently have in 1.10.0 RC1. Thus the PR #276 [7] to
> > change the use of the flag to 'false -> DISALLOW' is not maintaining
> > behavioural compatibility (to 1.7, or back to 1.0).
>
> PS. I just verified that PR 276 changes the DuplicateHeaderMode value
> for allowDuplicates=false and does not change any tests.
>
> So the test suite is currently not enforcing behavioural
> compatibility. This seems like a glaring hole in the tests and should
> be addressed to prevent regressions.
>