Re: [Math] Review of "genetic algorithm" module

Avijit Basak Sun, 20 Feb 2022 21:56:30 -0800

Hi All

        Please find my comments below:


>The build fails because of CheckStyle errors:
>https://app.travis-ci.com/github/apache/commons-math/builds/246683712
--Fixed the issues

>>>> [...]
>> >> >
>> >> >I did not suggest to remove any Javadoc, only to rephrase it as:
[...]
>> >As hinted by my comment is the previous message, I've still to
>> >clarify my own expectations; but I vaguely sense some lost
>> >opportunity for simpler usage simpler and increased performance
>> >through the caller just needs to specify the number of "worker
>> >threads".
>> -- We should have both options. Users can execute the algorithm by
>> specifying only the number of worker threads with a single population as
>> well as optimize multiple populations in a parallel fashion.
>
>Another misunderstanding (probably); we must figure out where
>the parallelism will be implemented.
>IIUC the current state of the code, optimizing multiple populations
>in parallel would be the same as launching multiple JVMs; I'd want
>to explore low-level parallelism (i.e. at the "Chromosome" level).
-- I have implemented both muti-threading and multi-population parallelism.

>
>> For the multi-population option the common thread pool would be reused
for
>> all populations.
>> >
>> >Do we at least agree that
>> >1. Adding/retrieving a "Chromosome" to/from a "Population"
>> >must be thread-safe (and is not trivial)
>> >2. Fitness computation is where most time is usually spent
>> >(so that multi-threading must be achieved at that granularity)
>> >?
>> --The way I am thinking of designing a task is that it should accept the
>> current population and return an instance of ChromosomePair.
>> The chromosomes within this pair would be added to the population by the
>> caller thread. Population won't be updated by multiple threads.
>
>Then, it would not be a multi-threaded library.
>This kind of parallelism does not need support from the library,
>and can be implemented at the application level.
>
>> The code snippet below shows the body of the method which will be
executed
>> inside the task.
>> --CUT--
>>
>> //selection
>> ChromosomePair pair = getSelectionPolicy().select(current);
>>
>> // crossover
>> if (randGen.nextDouble() < getCrossoverRate()) {
>> // apply crossover policy to create two offspringoport
>> pair = getCrossoverPolicy().crossover(pair.getFirst(), pair.getSecond());
>> }
>>
>> // mutation
>> if (randGen.nextDouble() < getMutationRate()) {
>> // apply mutation policy to the chromosomes
>> pair = new ChromosomePair(
>> getMutationPolicy().mutate(pair.getFirst()),
>> getMutationPolicy().mutate(pair.getSecond()));
>> }
>>
>> return pair;
>>
>> --CUT--
>
>One of the issue with above code is, again, that "randGen"
>must be thread-safe (and it is not, usually).
>Also, it doesn't say anything about how to ensure that
>the fitness computation is thread-safe (and if you assume
>that it will be computed outside that "task", then the
>performance gain will be very low).
>
-- Current implementation is using a Thread local version of random number
generator.

>> >
>> >I'd surmise that "multiple instances of AbstractGeneticAlgorithm"
>> >is an application concern; unless I'm missing something, it's
>> >not what I've in mind when talking about multi-threading.
>> >Actually, I was wondering whether we could implement the
>> >analog of what is in the "commons-math-neuralnet" module,
>> >where
>> >* "Neuron" is the counterpart "Chromosome"
>> >* "Network" is the counterpart of "Population".
>> --"multiple instance of AbstractGeneticAlgorithm" is related to parallel
GA
>> with multiple populations not multi-threading.
>
>Yes, as I also mentioned above.
>But I'm interested in where multi-threading can be implemented to
>be used in both cases (single population and multiple populations).
>
>> Users can also implement parallel GA in a synchronous manner although
that
>> won't be a recommended way.
>> Multi-threading is only a way to improve performance using a user's multi
>> core CPU.
>> The threads in the thread pool would only be used to execute the task as
>> mentioned in the previous comment.
>
>That's where I've some doubt.
>But be free to implement benchmarks that demonstrate the
>expected performance improvement.
>
>> I think we have some misunderstanding over here. It is better to do an
>> implementation first and start the discussion.
>> It would be more productive.
>
>Agreed. ;-)

--I have checked-in the code for multi-threading along with parallel GA.
Kindly review and share your thoughts.


Thanks & Regards
--Avijit Basak

On Sun, 20 Feb 2022 at 05:25, Gilles Sadowski <[email protected]> wrote:

> Hello.
>
> Le ven. 18 févr. 2022 à 11:32, Avijit Basak <[email protected]> a
> écrit :
> >
> > Hi All
> >
> >         Please see my comments below.
> >
> >> [...]
> >
> > --I have modified the exception class name to
> > GeneticIllegalArgumentException.
>
> The build fails because of CheckStyle errors:
> https://app.travis-ci.com/github/apache/commons-math/builds/246683712
>
> >>> [...]
> > >> >
> > >> >I did not suggest to remove any Javadoc, only to rephrase it as:
> > >> >---CUT---
> > >> >/** "Message template". */
> > >> >---CUT---
> > >> >
> > >> -- Just for confirmation. Do you need the comment to be changed from
> > >> ---CUT---
> > >> /** Error message for "out of range" condition. */
> > >> ---CUT---
> > >> *to*
> > >> ---CUT---
> > >> /** "out of range" */
> > >> ---CUT---
> > >
> > >No, sorry, I just meant that e.g.
> > >---CUT---
> > >/** Message template. */
> > >private static final String OUT_OF_RANGE = "Out of range: min={0}
> max={1}";
> > >---CUT---
> > >is sufficient. Since the field is private, the comment will mostly ever
> > >be read while looking at the source code.
> >
> > --I think we have a misunderstanding here. The field is public, not
> > private.
>
> Sorry, my mistake.
>
> > Will a hard coded "Message Template" comment be sufficient?
>
> You are right, better be somewhat more specific.
> However, it would even better if we can manage to keep
> the internal classes private. [This can be explored further
> when the design has stabilized.]
>
> >
> > >>       JDK(*ProviderBuilder.RandomSourceInternal.JDK*),
> > >> ...
> > >> }
> > >> ---CUT---
> > >> I am not sure how to achieve this for our algorithm classes.
> > >
> > >I'm a bit lost here; I don't get the relationship of this with my
> > >remark above (constant vs adaptive)...  Maybe you could file
> > >JIRA report to clarify and further discussion?
> >
> > --Let me try to clarify my original understanding. I think the proposal
> is
> > to introduce an Enum and declare each type of algorithm classes as
> > instances of enum. So it might follow this template:
> > --CUT--
> > enum GeneticAlgorithms {
> >     SGA([args...]),//Simple genetic algorithm equivalent to
> > GeneticAlgorithm class
> > AGA([args...]) //Adaptive genetic algorithm.
> > }
> > --CUT--
> > Where [args...] are selection, crossover, mutation operators and
> > probalities.
>
> I didn't mean anything like that; AFAICT there may have
> been some mixing between two comments. ;-)
>
> >
> > As the AdaptiveGeneticAlgorithm is a generalization of GeneticAlgorithm
> it
> > would be possible to represent the later by former using constant
> crossover
> > and mutation rate generators.
> > But the problem lies in a different area.
> > In enumeration the required arguments ([args...]) need to be specified
> > during the declaration using constant values.
> > But for our implementation those arguments need to be provided by the
> > client programs. I don't know how to achieve that using enum.
> > Let me know if you have any different views regarding this.
>
> We can revisit this later.
>
> > >> [...]
> > >
> > >As hinted by my comment is the previous message, I've still to
> > >clarify my own expectations; but I vaguely sense some lost
> > >opportunity for simpler usage simpler and increased performance
> > >through the caller just needs to specify the number of "worker
> > >threads".
> > -- We should have both options. Users can execute the algorithm by
> > specifying only the number of worker threads with a single population as
> > well as optimize multiple populations in a parallel fashion.
>
> Another misunderstanding (probably); we must figure out where
> the parallelism will be implemented.
> IIUC the current state of the code, optimizing multiple populations
> in parallel would be the same as launching multiple JVMs; I'd want
> to explore low-level parallelism (i.e. at the "Chromosome" level).
>
> > For the multi-population option the common thread pool would be reused
> for
> > all populations.
> > >
> > >Do we at least agree that
> > >1. Adding/retrieving a "Chromosome" to/from a "Population"
> > >must be thread-safe (and is not trivial)
> > >2. Fitness computation is where most time is usually spent
> > >(so that multi-threading must be achieved at that granularity)
> > >?
> > --The way I am thinking of designing a task is that it should accept the
> > current population and return an instance of ChromosomePair.
> > The chromosomes within this pair would be added to the population by the
> > caller thread. Population won't be updated by multiple threads.
>
> Then, it would not be a multi-threaded library.
> This kind of parallelism does not need support from the library,
> and can be implemented at the application level.
>
> > The code snippet below shows the body of the method which will be
> executed
> > inside the task.
> > --CUT--
> >
> > //selection
> > ChromosomePair pair = getSelectionPolicy().select(current);
> >
> > // crossover
> > if (randGen.nextDouble() < getCrossoverRate()) {
> > // apply crossover policy to create two offspringoport
> > pair = getCrossoverPolicy().crossover(pair.getFirst(), pair.getSecond());
> > }
> >
> > // mutation
> > if (randGen.nextDouble() < getMutationRate()) {
> > // apply mutation policy to the chromosomes
> > pair = new ChromosomePair(
> > getMutationPolicy().mutate(pair.getFirst()),
> > getMutationPolicy().mutate(pair.getSecond()));
> > }
> >
> > return pair;
> >
> > --CUT--
>
> One of the issue with above code is, again, that "randGen"
> must be thread-safe (and it is not, usually).
> Also, it doesn't say anything about how to ensure that
> the fitness computation is thread-safe (and if you assume
> that it will be computed outside that "task", then the
> performance gain will be very low).
>
> > >
> > >I'd surmise that "multiple instances of AbstractGeneticAlgorithm"
> > >is an application concern; unless I'm missing something, it's
> > >not what I've in mind when talking about multi-threading.
> > >Actually, I was wondering whether we could implement the
> > >analog of what is in the "commons-math-neuralnet" module,
> > >where
> > >* "Neuron" is the counterpart "Chromosome"
> > >* "Network" is the counterpart of "Population".
> > --"multiple instance of AbstractGeneticAlgorithm" is related to parallel
> GA
> > with multiple populations not multi-threading.
>
> Yes, as I also mentioned above.
> But I'm interested in where multi-threading can be implemented to
> be used in both cases (single population and multiple populations).
>
> > Users can also implement parallel GA in a synchronous manner although
> that
> > won't be a recommended way.
> > Multi-threading is only a way to improve performance using a user's multi
> > core CPU.
> > The threads in the thread pool would only be used to execute the task as
> > mentioned in the previous comment.
>
> That's where I've some doubt.
> But be free to implement benchmarks that demonstrate the
> expected performance improvement.
>
> > I think we have some misunderstanding over here. It is better to do an
> > implementation first and start the discussion.
> > It would be more productive.
>
> Agreed. ;-)
>
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Re: [Math] Review of "genetic algorithm" module

Reply via email to