Re: [Math] Review of "genetic algorithm" module

Gilles Sadowski Sat, 19 Feb 2022 15:55:01 -0800

Hello.

Le ven. 18 févr. 2022 à 11:32, Avijit Basak <[email protected]> a écrit :
>
> Hi All
>
>         Please see my comments below.
>
>> [...]
>
> --I have modified the exception class name to
> GeneticIllegalArgumentException.


The build fails because of CheckStyle errors:
https://app.travis-ci.com/github/apache/commons-math/builds/246683712

>>> [...]
> >> >
> >> >I did not suggest to remove any Javadoc, only to rephrase it as:
> >> >---CUT---
> >> >/** "Message template". */
> >> >---CUT---
> >> >
> >> -- Just for confirmation. Do you need the comment to be changed from
> >> ---CUT---
> >> /** Error message for "out of range" condition. */
> >> ---CUT---
> >> *to*
> >> ---CUT---
> >> /** "out of range" */
> >> ---CUT---
> >
> >No, sorry, I just meant that e.g.
> >---CUT---
> >/** Message template. */
> >private static final String OUT_OF_RANGE = "Out of range: min={0} max={1}";
> >---CUT---
> >is sufficient. Since the field is private, the comment will mostly ever
> >be read while looking at the source code.
>
> --I think we have a misunderstanding here. The field is public, not
> private.

Sorry, my mistake.

> Will a hard coded "Message Template" comment be sufficient?

You are right, better be somewhat more specific.
However, it would even better if we can manage to keep
the internal classes private. [This can be explored further
when the design has stabilized.]

>
> >>       JDK(*ProviderBuilder.RandomSourceInternal.JDK*),
> >> ...
> >> }
> >> ---CUT---
> >> I am not sure how to achieve this for our algorithm classes.
> >
> >I'm a bit lost here; I don't get the relationship of this with my
> >remark above (constant vs adaptive)...  Maybe you could file
> >JIRA report to clarify and further discussion?
>
> --Let me try to clarify my original understanding. I think the proposal is
> to introduce an Enum and declare each type of algorithm classes as
> instances of enum. So it might follow this template:
> --CUT--
> enum GeneticAlgorithms {
>     SGA([args...]),//Simple genetic algorithm equivalent to
> GeneticAlgorithm class
> AGA([args...]) //Adaptive genetic algorithm.
> }
> --CUT--
> Where [args...] are selection, crossover, mutation operators and
> probalities.

I didn't mean anything like that; AFAICT there may have
been some mixing between two comments. ;-)

>
> As the AdaptiveGeneticAlgorithm is a generalization of GeneticAlgorithm it
> would be possible to represent the later by former using constant crossover
> and mutation rate generators.
> But the problem lies in a different area.
> In enumeration the required arguments ([args...]) need to be specified
> during the declaration using constant values.
> But for our implementation those arguments need to be provided by the
> client programs. I don't know how to achieve that using enum.
> Let me know if you have any different views regarding this.

We can revisit this later.

> >> [...]
> >
> >As hinted by my comment is the previous message, I've still to
> >clarify my own expectations; but I vaguely sense some lost
> >opportunity for simpler usage simpler and increased performance
> >through the caller just needs to specify the number of "worker
> >threads".
> -- We should have both options. Users can execute the algorithm by
> specifying only the number of worker threads with a single population as
> well as optimize multiple populations in a parallel fashion.

Another misunderstanding (probably); we must figure out where
the parallelism will be implemented.
IIUC the current state of the code, optimizing multiple populations
in parallel would be the same as launching multiple JVMs; I'd want
to explore low-level parallelism (i.e. at the "Chromosome" level).

> For the multi-population option the common thread pool would be reused for
> all populations.
> >
> >Do we at least agree that
> >1. Adding/retrieving a "Chromosome" to/from a "Population"
> >must be thread-safe (and is not trivial)
> >2. Fitness computation is where most time is usually spent
> >(so that multi-threading must be achieved at that granularity)
> >?
> --The way I am thinking of designing a task is that it should accept the
> current population and return an instance of ChromosomePair.
> The chromosomes within this pair would be added to the population by the
> caller thread. Population won't be updated by multiple threads.

Then, it would not be a multi-threaded library.
This kind of parallelism does not need support from the library,
and can be implemented at the application level.

> The code snippet below shows the body of the method which will be executed
> inside the task.
> --CUT--
>
> //selection
> ChromosomePair pair = getSelectionPolicy().select(current);
>
> // crossover
> if (randGen.nextDouble() < getCrossoverRate()) {
> // apply crossover policy to create two offspringoport
> pair = getCrossoverPolicy().crossover(pair.getFirst(), pair.getSecond());
> }
>
> // mutation
> if (randGen.nextDouble() < getMutationRate()) {
> // apply mutation policy to the chromosomes
> pair = new ChromosomePair(
> getMutationPolicy().mutate(pair.getFirst()),
> getMutationPolicy().mutate(pair.getSecond()));
> }
>
> return pair;
>
> --CUT--

One of the issue with above code is, again, that "randGen"
must be thread-safe (and it is not, usually).
Also, it doesn't say anything about how to ensure that
the fitness computation is thread-safe (and if you assume
that it will be computed outside that "task", then the
performance gain will be very low).

> >
> >I'd surmise that "multiple instances of AbstractGeneticAlgorithm"
> >is an application concern; unless I'm missing something, it's
> >not what I've in mind when talking about multi-threading.
> >Actually, I was wondering whether we could implement the
> >analog of what is in the "commons-math-neuralnet" module,
> >where
> >* "Neuron" is the counterpart "Chromosome"
> >* "Network" is the counterpart of "Population".
> --"multiple instance of AbstractGeneticAlgorithm" is related to parallel GA
> with multiple populations not multi-threading.

Yes, as I also mentioned above.
But I'm interested in where multi-threading can be implemented to
be used in both cases (single population and multiple populations).

> Users can also implement parallel GA in a synchronous manner although that
> won't be a recommended way.
> Multi-threading is only a way to improve performance using a user's multi
> core CPU.
> The threads in the thread pool would only be used to execute the task as
> mentioned in the previous comment.

That's where I've some doubt.
But be free to implement benchmarks that demonstrate the
expected performance improvement.

> I think we have some misunderstanding over here. It is better to do an
> implementation first and start the discussion.
> It would be more productive.

Agreed. ;-)

Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [Math] Review of "genetic algorithm" module

Reply via email to