Re: [Math] Review of "genetic algorithm" module

Avijit Basak Sat, 02 Jul 2022 23:39:16 -0700

Hi All

            Please find my responses. Sorry for the delay.


> Comments related to new model:
>>
>> 1) Hierarchy of GeneticOperator: In the proposed model the genetic
>> operators are designed hierarchically implementing a common interface and
>> operators are accepted as a list.
>> This enables interchangeability of operators. Library users would be able
>> to use crossover and mutation operators interchangeably.
>> However, in genetic algorithms or other population based search
algorithms
>> the operators are used for broadly two purposes, exploration and
>> exploitation.
>> In GA crossover is used for exploitation and mutation is used for
>> exploration. Keeping them in a common hierarchy and allowing
>> interchangeable operation violates that purpose.
>
>I'm not sure that the semantics of "exploitation" and "exploration"
>should drive the API design.
>Saying it differently: We don't need to know how various operators
>will be used in order to implement them; hence IMO, there is no
>need to discriminate at the API level.

-- Even if we want to use a common abstraction, algorithm implementation
needs to use them in different ways.
The new proposed implementation does not consider different operators
separately. I have mentioned this in the summary section at the end.

>
>>
>> 2) Chromosome Fitness: In the new design the chromosome fitness is
>> maintained as a hashmap where the key is the chromosome itself.
>> Are we going to retain the fitness value in chromosome too otherwise
>> implementation of Comparable won't be possible?
>
>Sorry, I don't follow.
>
>> Assuming the chromosome representation would be used to calculate
hashcode,
>> it would be a very time consuming process depending on the length of
>> chromosome.
>
>Is this assumption correct?
>For what purpose do we need to compute a custom hash code?

-- A standard practice is to provide a custom implementation of hashCode
and equals method whenever an object is used as key to any hash based data
structure like HashMap we are using here.
If you think it is good to rely on the JVM specific implementation, I have
no concerns.

>
>> E.g. in binary chromosomes we have provision to allow chromosome length
>> upto (2^31 - 1).
>
>That's a lot. ;-)
>Do you have use cases where such long representations can be handled?

-- One such use case could be tuning of neural networks. Once we introduce
genetic programming there would be a lot more use cases.

>
>
>
>> Along with that the chromosome implements Comparable
>> interface.
>> Equality by Comparable interface would be decided by fitness
>
>Sure.
>
>> however
>> equality by equals() method would depend on representation.
>
>Do we need a custom "equals"?

-- Mentioned above.

>
>> As chromosomes with different representations may also have the same
>> fitness, this would produce a conflict.
>
>Please provide an example.

-- Think of a problem where we need to find out the minimum value of a
function using GA where the function is a square of sinusoid like mentioned
below.
y = Math.pow(sin(x), 2)
         where -180 deg <= x <= 180 deg
We may have the same value of y for different values of x due to the
replicative nature of the function.
Similar situations can arrive in real life optimization problems too.
Even the MathFunctionOptimization example we have used belongs to this
category of problem.

>
>>
>> 3) The model does not consider anything related to adaptive approaches.
>> Would it be a separate factory?
>
>What I've proposed is an alternative "skeleton" for the API.  Of course,
>more classes will provide specific functionality.
>An adaptive operator "is-a" specialized genetic operator (perhaps the
>notion of "Adaptive" will be defined through an interface?).

-- In "commons-math4-ga2" module separate ApplicationRate has been
introduced to handle the adaptive nature of the algorithm.
But this model always uses sorting as the default option which is
unnecessary for simple genetic algorithms. Isn't it a performance
bottleneck?

>
>>
>> 4) Comparison of chromosomes: In the current model two chromosomes are
>> compared after decoding the genotype to phenotype using the internal
>> decoder.
>> How can we address this in the new model?
>
>As mentioned above: Do we really need to compare representations?

-- This is a minor scenario and can be ignored as of now.

>
>>
>> 5) Chromosome String representation: Currently we use the toString()
method
>> to print the chromosome's phenotype. In the new model we would need to
>> avoid this approach as decoders won't be available to chromosomes.
>
>This seems like a minor issue (or perhaps no issue at all?) unless I'm
>missing something.

-- We can ignore this for now.

[...]
>> >Doing manually will be too tedious.
>> >If you are reasonably confident that you've ported everything
>> >valuable, I guess that we'll rely on coverage tools...
>> >The current log statements could also be construed as (totally)
>> >unnecessary, as they merely document statements which I would
>> >consider part of an "application", not the GA "library".
>> >I believe that we do not agree yet on where to draw the line (my
>> >proposal in MATH-1618 is related to that difference of opinion).
>> >
>> -- Probably I am missing something here. These log statements are more
>> useful for library users. If we remove all log statements how users would
>> be able to debug any issue.
>> What if there are issues in any part of library code or may be anything
>> fails due to some application related customization.
>> It may not be necessarily a library bug.
>
>I'm not clear about what is the intended purpose of logging.
>But I'd rather defer this discussion, and first focus on the actual GA
>functionality being implemented.

-- This can be postponed for now.

>>[...]
>> >AFAICT, the "PopulationStatisticsLogger" class satisfies *your* needs
>> >(as a user), but as developers we should only include functionality that
>> >is broadly useful.  I think I provided arguments that it is not the
case of
>> >that class.
>> -- So do you want this class to be removed?
>
>Short answer would be "yes".  But the rationale is that we should
>defer all functionality that just seems "nice-to-have" until the core
>API is stabilized.

-- I agree with you.

>
>>
>> >Great!
>> >
>> >> >> [...]
>> >
>> >Maybe I'm still missing something, but IMHO the distinction between
>> >adaptive vs not should not impact code at that level.
>> -- Actually I wanted to keep a provision where users can mix both
>> approaches. Maybe keep crossover probability as constant but mutation as
>> adaptive.
>
>AFAICT, this would follow from my proposal (when "AdaptiveMutation"
>and "AdaptiveCrossover" operators are implemented).

-- Addressed in the new implementation.

>
>>
>> [...]
>> >> >(3)
>> >> >o.a.c.m.ga.utils.ChromosomeRepresentationUtils
>> >> >
>> >> >It seems to be a "mixed-bag" kind of class (that is being frowned
>> >> >upon nowadays).
>> >> >Its comment refers to "random" but some methods are not using
>> >> >any randomization.  Most methods are only used in unit tests.
>> >> >
>> >> -- Actually most methods are taken from the legacy GA application. In
the
>> >> legacy library representation there was no separate concept for the
>> decoder
>> >> and all representation utility methods were kept in the corresponding
>> >> Chromosome classes. ChromosomeRepresentationUtils is created as a
>> >> placeholder for those utility methods.
>> >
>> >I'm not sure I follow what you mean.
>> >If these methods are only needed by the (legacy?) examples, they
>> >should be moved to the "ga-examples" module (where they can be
>> >removed later on without regards to BC).
>> -- These were adopted from the legacy model and should be useful in this
>> new model too.
>
>Anything not _surely_ useful (i.e. actually used by the new library)
>should be left out.  If necessary, code can be re-added later.
>
>>
>> > > > [...]
>> >
>> >OK.  [Is there a new PR?]
>> -- I shall create one.
>
>We still need to agree on my proposal...
>Would you try to follow up on that basis, i.e. add a minimal set of GA
>operators, in order to show that some of the "examples" can be run?

-- Implementation of the algorithm still needs to be corrected in
commons-math4-ga2 module as described in the summary.

>
>> >
>> >> >(5)
>> >> >Why does a "Chromosome" need an "identifier"?
>> >> >Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
>> >> >that is an internal class, where it seems that the chromosome itself
>> >> >(rather than its "id") could serve as the map's key.
>> >> -- How can we generate a map's key from the chromosome itself? Please
>> >> suggest.
>> >
>> >A class to be used as a key only needs to implement "equals" and
>> >"hashCode".
>> -- The current chromosome class implements Comparable interface which
uses
>> chromosome fitness for comparison. Use of both Comparable and equals()
>> might introduce inconsistencies.
>
>An example?

-- Inconsistency can appear in case we provide a custom implementation of
equals and hashcode following the representation of chromosome or use the
default implementation.
Since Comparable uses the fitness value to compare and as described above
two chromosomes with separate representations can have the same fitness
value this might result in inconsistency.
But in the new module chromosome does not implement Comparable, so there is
no possibility of the same.

>
>> >
>> >> >
>> >> >(6)
>> >> >o.a.c.m.ga.chromsome.AbstractChromosome
>> >> >
>> >> >Field "fitness" is not "final", yet it could be: a "FitnessFunction"
>> >> >object (used in "evaluate() to compute that field) is passed to the
>> >> >constructor.  Is there a reason for the "lazy" evaluation?
>> >> >Dropping it would make the instance immutable (and "evaluate()"
>> >> >should be renamed to "getFitness()").
>> >> >
>> >> >Why should the "FitnessFunction" be stored in every chromosome?
>> >> >
>> >> -- I have modified the fitness as final and initialized the same in
the
>> >> constructor.
>> >
>> >Better, but did you check my proposal in MATH-1618, where
>> >Chromosome and fitness are decoupled, and their relationship
>> >is held within a "Population" instance?
>> --Mentioned earlier.
>
>I still don't know whether you agree that my proposal makes it
>simpler to express a GA.

-- I think there are few points where we are not aligned and those are
mentioned in the summary section.

>
>> >
>> >> [...]
>
>> >
>> >> >
>> >> >(9)
>> >> >Naming of factory methods should be harmonized to match the
convention
>> >> >adopted in components like [RNG] and [Numbers].
>> >> >E.g. instead of "newChromosome(...)", please use "of(...)" or
>> "from(...)"
>> >> >for "value object", and "create(...)" otherwise.
>> >> >
>> >> -- I have renamed the same for Chromosome classes.
>> >> What about the nextGeneration() method of ListPopulation class.
Renaming
>> >> this to create() or from() won't convey the purpose of it.
>> >
>> >I agree, and that's why the new "Population" class (in MATH-1618) does
>> >not provide a factory method (see also the "GeneticAlgorithmFactory"
>> >class).
>> -- We can avoid the same in the current model if we agree to use a
default
>> implementation of population and remove the Population interface
following
>> your new model.
>
>So, do we adopt that "new model"?
>Or do you still have objections?

-- Mentioned above.

>
>> >
>> >> >(10)
>> >> >o.a.c.m.ga.chromosome.AbstractListChromosome
>> >> >
>> >> >Constructor is called with an argument that is a previously
instantiated
>> >> >"representation".  If the latter is mutable, the caller will be able
to
>> >> modify
>> >> >the underlying data structure of the newly created chromosome.  [The
>> >> >doc assumes immutability of the representation but this cannot be
>> >> >enforced, and mixed ownership can entail subtle bugs.]
>> >> -- I think this applies to both representation as well as generic
>> parameter
>> >> type T. But I don't see any other option but to rely on the user.
>> >
>> >The Javadoc (at line 84) is misleading in its mention of "immutable".
>> >
>> >> If you have any suggestions kindly share.
>> >
>> >I may not understand all the implications, but I'd suggest that the
>> >"representation" be instantiated within the control of the library (e.g.
>> >through a "builder"/"factory").
>> -- Currently we have the ChromosomeRepresentationUtils for the same. Its
>> methods are designed to generate the representations.
>
>My suggestion is that this design can be improved (a.o. according to my
>above suggestion).

-- Sure.

>
>> >
>> >> >
>> >> >(11)
>> >> >Do we agree that, in a GA, the most time-consuming task is the
fitness
>> >> >computation?  Hence IMO, it should be the focus of the multithreading
>> >> >tools (i.e. "ExecutorService"), probably keeping the other parts
(namely
>> >> >the genetic operators) within a simple sequential loop (as in class
>> >> >"GeneticAlgorithmFactory" in MATH-1618).
>> >> -- Current implementation uses separate threads for applying crossover
>> and
>> >> mutation operators for each pair of selected chromosomes.
>> >> I think this ensures better utilization of multi-core processors
compared
>> >> to use of multi-threading only for the fitness calculation.
>> >
>> >I have the opposite intuition: Parallel application of the genetic
>> >operators would only provide marginal gains wrt the fitness
>> >computation.
>> >In any case, I think that it will be fairly easy to modify my proposed
>> >"OffspringGenerator" class to use an "ExecutorService" (if benchmarks
>> >show that a substantial gain could indeed be achieved).
>> >
>> >> -- Some codes are checked in. But there is a conflict in the pull
>> request.
>> >> So I shall create a new one and delete the old branch itself.
>> >
>> >IMHO, we could make more substantial progress if you could
>> >first point to issues with my proposal in MATH-1618.
>> --Mentioned earlier.
>
>Well, I don't know where we stand...
>
>
>
>
>
>
>
>
>Responding below to some of my own questions following commit
>   ddfd5bf859d04cc5da604b20021ceaba9de7def6
>in branch
>  feature__MATH-1563__genetic_algorithm
>
>Le mar. 24 mai 2022 à 01:54, Gilles Sadowski <[email protected]> a
écrit :
>>
>> Hello.
>>
>> Le mer. 18 mai 2022 à 16:34, Avijit Basak <[email protected]> a
écrit :
>> >
>> > Hi All
>> >
>> >         Please find my comments below.
>> >
>> > Comments related to new model:
>> >
>> > 1) Hierarchy of GeneticOperator: In the proposed model the genetic
>> > operators are designed hierarchically implementing a common interface
and
>> > operators are accepted as a list.
>> > This enables interchangeability of operators. Library users would be
able
>> > to use crossover and mutation operators interchangeably.
>> > However, in genetic algorithms or other population based search
algorithms
>> > the operators are used for broadly two purposes, exploration and
>> > exploitation.
>> > In GA crossover is used for exploitation and mutation is used for
>> > exploration. Keeping them in a common hierarchy and allowing
>> > interchangeable operation violates that purpose.
>>
>> I'm not sure that the semantics of "exploitation" and "exploration"
>> should drive the API design.
>> Saying it differently: We don't need to know how various operators
>> will be used in order to implement them; hence IMO, there is no
>> need to discriminate at the API level.
>
>The "core" GA algorithm (see class "GeneticAlgorithmFactory") is
>oblivious to whether a genetic operator "is-a" crossover or mutation.

-- I have provided my comments related to this in the summary section at
the end.

>
>> >
>> > 2) Chromosome Fitness: In the new design the chromosome fitness is
>> > maintained as a hashmap where the key is the chromosome itself.
>> > Are we going to retain the fitness value in chromosome too otherwise
>> > implementation of Comparable won't be possible?
>>
>> Sorry, I don't follow.
>
>Implementing "Comparable" is not necessary.
-- Agreed

>
>> > Assuming the chromosome representation would be used to calculate
hashcode,
>> > it would be a very time consuming process depending on the length of
>> > chromosome.
>>
>> Is this assumption correct?
>> For what purpose do we need to compute a custom hash code?
>
>A custom "hashCode" method is not necessary.
>The only consequence seems that 2 different instances of a genotype that is
>logically the same (genotype-wise) could both be added to a population
while
>the same (in-memory) instance would only appear once in the hash map.

-- This would be fine with the proposed implementation.

>
>>
>> > E.g. in binary chromosomes we have provision to allow chromosome length
>> > upto (2^31 - 1).
>>
>> That's a lot. ;-)
>> Do you have use cases where such long representations can be handled?
>
>For now, I've use "BitSet" as the internal representation (but this could
>be changed if necessary, because it is not part of the public API).

-- Sure.

>
>> > Along with that the chromosome implements Comparable
>> > interface.
>> > Equality by Comparable interface would be decided by fitness
>>
>> Sure.
>>
>> > however
>> > equality by equals() method would depend on representation.
>>
>> Do we need a custom "equals"?
>
>We don't.

-- Agreed.

>
>> > As chromosomes with different representations may also have the same
>> > fitness, this would produce a conflict.
>>
>> Please provide an example.
>>
>> >
>> > 3) The model does not consider anything related to adaptive approaches.
>> > Would it be a separate factory?
>>
>> What I've proposed is an alternative "skeleton" for the API.  Of course,
>> more classes will provide specific functionality.
>> An adaptive operator "is-a" specialized genetic operator (perhaps the
>> notion of "Adaptive" will be defined through an interface?).
>
>We don't need anything that complex (IIUC).
>See interface "ApplicationRate" and its implementations in package "rate".

-- This is fine.

>
>> >
>> > 4) Comparison of chromosomes: In the current model two chromosomes are
>> > compared after decoding the genotype to phenotype using the internal
>> > decoder.
>> > How can we address this in the new model?
>>
>> As mentioned above: Do we really need to compare representations?
>
>Within the library, it does not seem necessary.

-- Agreed.

>
>> >
>> > 5) Chromosome String representation: Currently we use the toString()
method
>> > to print the chromosome's phenotype. In the new model we would need to
>> > avoid this approach as decoders won't be available to chromosomes.
>>
>> This seems like a minor issue (or perhaps no issue at all?) unless I'm
>> missing something.
>
>The GA does not need to print the phenotype.
>The decoder is user-defined, hence he can obviously apply it whenever
>he needs a printable version of the chromosomes.

-- Agreed.

>
>> >
>> > 6) However in the new model the generic type parameters and java.util
>> > packages are used in a more organized way. We can adopt the same in the
>> > existing model.
>>
>> I don't understand what you are proposing.
>
>I've used two generic parameters (one for genotype, one for phenotype).
>The code does not use any "@SuppressWarning" annotation.

-- Agreed.

>
>> >
>> > >If we need to document software (e.g. the "examples") produced
>> > >by a "Commons" component, it is preferable that we use and refer
>> > >to "Log4j2".
>> > >[The logger API is another matter since it does not impact how the
>> > >software is used (from a user's POV).]
>> > -- I shall make the changes.
>
>Logging statements are marginally useful in such a small piece of code.
>Tracking evolution can be implemented at the application level.
>See interface "GenerationCallback".

-- As long as users have some option to enable the same it is fine.

>
>>
>> Thanks.
>>
>> >
>> > > [...]
>> > >Doing manually will be too tedious.
>> > >If you are reasonably confident that you've ported everything
>> > >valuable, I guess that we'll rely on coverage tools...
>> > >The current log statements could also be construed as (totally)
>> > >unnecessary, as they merely document statements which I would
>> > >consider part of an "application", not the GA "library".
>> > >I believe that we do not agree yet on where to draw the line (my
>> > >proposal in MATH-1618 is related to that difference of opinion).
>> > >
>> > -- Probably I am missing something here. These log statements are more
>> > useful for library users. If we remove all log statements how users
would
>> > be able to debug any issue.
>> > What if there are issues in any part of library code or may be anything
>> > fails due to some application related customization.
>> > It may not be necessarily a library bug.
>>
>> I'm not clear about what is the intended purpose of logging.
>
>See previous response.
>
>> But I'd rather defer this discussion, and first focus on the actual GA
>> functionality being implemented.
>>
>> >
>> > >> >> >Likewise, the "PopulationStatisticsLogger" is not general
enough to
>> > >> >> >be worth being part of the library.
>> > >> >> -- I would love to keep this simple implementation of convergence
>> > >> >> listener. This can provide a very basic overview on the nature of
>> > >> >> convergence.
>> > >> >
>> > >> >As a user, you'll be able to provide the feature to your
application
>> > >> >with less than 10 lines of code (by registering a "listener").
>> > >> >Everyone else might want a slightly different output (contents
and/or
>> > >> >format) than what this class generates.
>> > >> -- Users would always have the freedom to register their own
listener.
>> > >> PopulationStatisticsLogger provides only a very basic option to
track the
>> > >> population statistics during the convergence process.
>> > >> This is never a very generic solution to meet the needs of all
users.
>> > >
>> > >AFAICT, the "PopulationStatisticsLogger" class satisfies *your* needs
>> > >(as a user), but as developers we should only include functionality
that
>> > >is broadly useful.  I think I provided arguments that it is not the
case of
>> > >that class.
>> > -- So do you want this class to be removed?
>>
>> Short answer would be "yes".  But the rationale is that we should
>> defer all functionality that just seems "nice-to-have" until the core
>> API is stabilized.
>
>Displaying statistics should be done at the application level.
>See "GenerationLogger" in class "MathFunctionOptimizer2" (adapted
>from yours).
>
>> >
>> > >Great!
>> > >
>> > >> >> [...]
>> > >
>> > >Maybe I'm still missing something, but IMHO the distinction between
>> > >adaptive vs not should not impact code at that level.
>> > -- Actually I wanted to keep a provision where users can mix both
>> > approaches. Maybe keep crossover probability as constant but mutation
as
>> > adaptive.
>>
>> AFAICT, this would follow from my proposal (when "AdaptiveMutation"
>> and "AdaptiveCrossover" operators are implemented).
>
>Such operators are not necessary (see previous comment, about
>"ApplicationRate").
>
>> >
>> > [...]
>> > >> >(3)
>> > >> >o.a.c.m.ga.utils.ChromosomeRepresentationUtils
>> > >> >
>> > >> >It seems to be a "mixed-bag" kind of class (that is being frowned
>> > >> >upon nowadays).
>> > >> >Its comment refers to "random" but some methods are not using
>> > >> >any randomization.  Most methods are only used in unit tests.
>> > >> >
>> > >> -- Actually most methods are taken from the legacy GA application.
In the
>> > >> legacy library representation there was no separate concept for the
>> > decoder
>> > >> and all representation utility methods were kept in the
corresponding
>> > >> Chromosome classes. ChromosomeRepresentationUtils is created as a
>> > >> placeholder for those utility methods.
>> > >
>> > >I'm not sure I follow what you mean.
>> > >If these methods are only needed by the (legacy?) examples, they
>> > >should be moved to the "ga-examples" module (where they can be
>> > >removed later on without regards to BC).
>> > -- These were adopted from the legacy model and should be useful in
this
>> > new model too.
>>
>> Anything not _surely_ useful (i.e. actually used by the new library)
>> should be left out.  If necessary, code can be re-added later.
>
>Representation(s) should be internal, together with the operators
>that manipulate them.
>
>> >
>> > > > > [...]
>> > >
>> > >OK.  [Is there a new PR?]
>> > -- I shall create one.
>>
>> We still need to agree on my proposal...
>> Would you try to follow up on that basis, i.e. add a minimal set of GA
>> operators, in order to show that some of the "examples" can be run?
>
>I've added the "MathFunctionOptimizer2" in the "Standalone" example
>application.  Please check that it produces the expected output.
>[I had to slightly change the function being optimized, and your decoding
>function. Why are you assuming that "Coordinates" cannot be negative?]

-- The fitness function uses the square of each dimensional value of the
coordinate. So negative signs won't impact the fitness value.
But we can always consider -ve.

>
>> > >
>> > >> >(5)
>> > >> >Why does a "Chromosome" need an "identifier"?
>> > >> >Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
>> > >> >that is an internal class, where it seems that the chromosome
itself
>> > >> >(rather than its "id") could serve as the map's key.
>> > >> -- How can we generate a map's key from the chromosome itself?
Please
>> > >> suggest.
>> > >
>> > >A class to be used as a key only needs to implement "equals" and
>> > >"hashCode".
>> > -- The current chromosome class implements Comparable interface which
uses
>> > chromosome fitness for comparison. Use of both Comparable and equals()
>> > might introduce inconsistencies.
>
>In my proposal, neither is implemented.
>
>> An example?
>>
>> > >
>> > >> >
>> > >> >(6)
>> > >> >o.a.c.m.ga.chromsome.AbstractChromosome
>> > >> >
>> > >> >Field "fitness" is not "final", yet it could be: a
"FitnessFunction"
>> > >> >object (used in "evaluate() to compute that field) is passed to the
>> > >> >constructor.  Is there a reason for the "lazy" evaluation?
>> > >> >Dropping it would make the instance immutable (and "evaluate()"
>> > >> >should be renamed to "getFitness()").
>> > >> >
>> > >> >Why should the "FitnessFunction" be stored in every chromosome?
>> > >> >
>> > >> -- I have modified the fitness as final and initialized the same in
the
>> > >> constructor.
>> > >
>> > >Better, but did you check my proposal in MATH-1618, where
>> > >Chromosome and fitness are decoupled, and their relationship
>> > >is held within a "Population" instance?
>> > --Mentioned earlier.
>>
>> I still don't know whether you agree that my proposal makes it
>> simpler to express a GA.
>
>Please have a look at the commit mentioned above, and
> * for each design issue, post a new discussion thread,
> * for each bug, file a JIRA report.
>
>> > >
>> > >> [...]
>> > >
>> > >>
>> > >> >(8)
>> > >> >@SuppressWarnings("unchecked")
>> > >> >
>> > >> >By default, I'm a bit suspicious about having to resort to these
>> > >> annotations,
>> > >> >especially for the kind of algorithms we are trying to implement.
>> > >> >What do you think of the alternative approach outlined in the ZIP
file
>> > >> >attached in MATH-1618:
>> > >> >    https://issues.apache.org/jira/browse/MATH-1618
>> > >> >?
>> > >> -- This annotation is required because we have kept an option to use
>> > >> different types of genotypes including primitive.
>> > >> Because of that our base interfaces only declares phenotype not
genotype.
>> > >> This introduced a kind of hierarchy in all operators and chromosome
>> > classes
>> > >> which required us to use the mentioned annotation.
>> > >
>> > >I may again be missing something.
>> > >Could you please explain the case that makes these annotations
>> > >necessary.
>> > -- This has been only used to avoid the warning in the place of
typecasting.
>> > However, I can work to minimize this following your new model.
>>
>> "Minimize"?
>
>No unsafe cast seems necessary.

-- Agreed.

>
>> > >
>> > >> >
>> > >> >(9)
>> > >> >Naming of factory methods should be harmonized to match the
convention
>> > >> >adopted in components like [RNG] and [Numbers].
>> > >> >E.g. instead of "newChromosome(...)", please use "of(...)" or
>> > "from(...)"
>> > >> >for "value object", and "create(...)" otherwise.
>> > >> >
>> > >> -- I have renamed the same for Chromosome classes.
>> > >> What about the nextGeneration() method of ListPopulation class.
Renaming
>> > >> this to create() or from() won't convey the purpose of it.
>> > >
>> > >I agree, and that's why the new "Population" class (in MATH-1618) does
>> > >not provide a factory method (see also the "GeneticAlgorithmFactory"
>> > >class).
>> > -- We can avoid the same in the current model if we agree to use a
default
>> > implementation of population and remove the Population interface
following
>> > your new model.
>>
>> So, do we adopt that "new model"?
>> Or do you still have objections?
>
>As the one who investigates GA behaviours, please list the concrete
>problems with my proposal.

-- Mentioned at the end in the summary section.

>
>> > >
>> > >> >(10)
>> > >> >o.a.c.m.ga.chromosome.AbstractListChromosome
>> > >> >
>> > >> >Constructor is called with an argument that is a previously
instantiated
>> > >> >"representation".  If the latter is mutable, the caller will be
able to
>> > >> modify
>> > >> >the underlying data structure of the newly created chromosome.
 [The
>> > >> >doc assumes immutability of the representation but this cannot be
>> > >> >enforced, and mixed ownership can entail subtle bugs.]
>> > >> -- I think this applies to both representation as well as generic
>> > parameter
>> > >> type T. But I don't see any other option but to rely on the user.
>> > >
>> > >The Javadoc (at line 84) is misleading in its mention of "immutable".
>> > >
>> > >> If you have any suggestions kindly share.
>> > >
>> > >I may not understand all the implications, but I'd suggest that the
>> > >"representation" be instantiated within the control of the library
(e.g.
>> > >through a "builder"/"factory").
>> > -- Currently we have the ChromosomeRepresentationUtils for the same.
Its
>> > methods are designed to generate the representations.
>>
>> My suggestion is that this design can be improved (a.o. according to my
>> above suggestion).
>
>In my proposal, representations would be not be "public", and each
>would be declared and used within a dedicated package.
>See package "gene/binary".

-- This is fine.

>
>> > >
>> > >> >
>> > >> >(11)
>> > >> >Do we agree that, in a GA, the most time-consuming task is the
fitness
>> > >> >computation?  Hence IMO, it should be the focus of the
multithreading
>> > >> >tools (i.e. "ExecutorService"), probably keeping the other parts
(namely
>> > >> >the genetic operators) within a simple sequential loop (as in class
>> > >> >"GeneticAlgorithmFactory" in MATH-1618).
>> > >> -- Current implementation uses separate threads for applying
crossover
>> > and
>> > >> mutation operators for each pair of selected chromosomes.
>> > >> I think this ensures better utilization of multi-core processors
compared
>> > >> to use of multi-threading only for the fitness calculation.
>> > >
>> > >I have the opposite intuition: Parallel application of the genetic
>> > >operators would only provide marginal gains wrt the fitness
>> > >computation.
>> > >In any case, I think that it will be fairly easy to modify my proposed
>> > >"OffspringGenerator" class to use an "ExecutorService" (if benchmarks
>> > >show that a substantial gain could indeed be achieved).
>> > >
>> > >> -- Some codes are checked in. But there is a conflict in the pull
>> > request.
>> > >> So I shall create a new one and delete the old branch itself.
>> > >
>> > >IMHO, we could make more substantial progress if you could
>> > >first point to issues with my proposal in MATH-1618.
>> > --Mentioned earlier.
>>
>> Well, I don't know where we stand...
>
>Hoping we can make significant progress so as to include the new
>GA functionality in the upcoming release of "Commons Math"...


I would like to summarize a few points which I think could be a problem in
the new model.

1) Separation of crossover and mutation operator at the API level and the
corresponding usage of the GeneticOperator in the algorithm implementation:
-- As mentioned earlier we would need a separation of GeneticOperators.
The current implementation of the algorithm does not distinguish the
crossover and mutation operator and treats them uniformly.
This violates the basic concept of the algorithm itself and needs to be
fixed.

2) Use of adaptive probability:
-- Currently the Mutation class keeps a constant probability value as well
as a separate ApplicationRate was passed for each genetic operator.
Since we are going to use the operator probability in an adaptive way we
would need to pass the generated probability as an argument to the apply
method of GeneticOperator.

3) Separation of logic for adaptive and Simple GA:
-- We might want to separate two different strategies of GA i.e. simple GA
and adaptive GA.
As for simple GA with constant probability we can avoid computation like
rank calculation.

4) OffspringGenerator following Strategy pattern:
-- Can we have an implementation of OffspringGenerator following strategy
pattern. We should have default implementation following your first
proposal.
But there can be an optional parameter in the API method, where library
users would be able to provide their custom implementation.

If you need a separate mail chain for each of the review comments mentioned
above kindly let me know.

Please share further thoughts regarding this.


Thanks & Regards
--Avijit Basak


On Wed, 1 Jun 2022 at 04:53, Gilles Sadowski <[email protected]> wrote:

> Responding below to some of my own questions following commit
>    ddfd5bf859d04cc5da604b20021ceaba9de7def6
> in branch
>   feature__MATH-1563__genetic_algorithm
>
> Le mar. 24 mai 2022 à 01:54, Gilles Sadowski <[email protected]> a
> écrit :
> >
> > Hello.
> >
> > Le mer. 18 mai 2022 à 16:34, Avijit Basak <[email protected]> a
> écrit :
> > >
> > > Hi All
> > >
> > >         Please find my comments below.
> > >
> > > Comments related to new model:
> > >
> > > 1) Hierarchy of GeneticOperator: In the proposed model the genetic
> > > operators are designed hierarchically implementing a common interface
> and
> > > operators are accepted as a list.
> > > This enables interchangeability of operators. Library users would be
> able
> > > to use crossover and mutation operators interchangeably.
> > > However, in genetic algorithms or other population based search
> algorithms
> > > the operators are used for broadly two purposes, exploration and
> > > exploitation.
> > > In GA crossover is used for exploitation and mutation is used for
> > > exploration. Keeping them in a common hierarchy and allowing
> > > interchangeable operation violates that purpose.
> >
> > I'm not sure that the semantics of "exploitation" and "exploration"
> > should drive the API design.
> > Saying it differently: We don't need to know how various operators
> > will be used in order to implement them; hence IMO, there is no
> > need to discriminate at the API level.
>
> The "core" GA algorithm (see class "GeneticAlgorithmFactory") is
> oblivious to whether a genetic operator "is-a" crossover or mutation.
>
> > >
> > > 2) Chromosome Fitness: In the new design the chromosome fitness is
> > > maintained as a hashmap where the key is the chromosome itself.
> > > Are we going to retain the fitness value in chromosome too otherwise
> > > implementation of Comparable won't be possible?
> >
> > Sorry, I don't follow.
>
> Implementing "Comparable" is not necessary.
>
> > > Assuming the chromosome representation would be used to calculate
> hashcode,
> > > it would be a very time consuming process depending on the length of
> > > chromosome.
> >
> > Is this assumption correct?
> > For what purpose do we need to compute a custom hash code?
>
> A custom "hashCode" method is not necessary.
> The only consequence seems that 2 different instances of a genotype that is
> logically the same (genotype-wise) could both be added to a population
> while
> the same (in-memory) instance would only appear once in the hash map.
>
> >
> > > E.g. in binary chromosomes we have provision to allow chromosome length
> > > upto (2^31 - 1).
> >
> > That's a lot. ;-)
> > Do you have use cases where such long representations can be handled?
>
> For now, I've use "BitSet" as the internal representation (but this could
> be changed if necessary, because it is not part of the public API).
>
> > > Along with that the chromosome implements Comparable
> > > interface.
> > > Equality by Comparable interface would be decided by fitness
> >
> > Sure.
> >
> > > however
> > > equality by equals() method would depend on representation.
> >
> > Do we need a custom "equals"?
>
> We don't.
>
> > > As chromosomes with different representations may also have the same
> > > fitness, this would produce a conflict.
> >
> > Please provide an example.
> >
> > >
> > > 3) The model does not consider anything related to adaptive approaches.
> > > Would it be a separate factory?
> >
> > What I've proposed is an alternative "skeleton" for the API.  Of course,
> > more classes will provide specific functionality.
> > An adaptive operator "is-a" specialized genetic operator (perhaps the
> > notion of "Adaptive" will be defined through an interface?).
>
> We don't need anything that complex (IIUC).
> See interface "ApplicationRate" and its implementations in package "rate".
>
> > >
> > > 4) Comparison of chromosomes: In the current model two chromosomes are
> > > compared after decoding the genotype to phenotype using the internal
> > > decoder.
> > > How can we address this in the new model?
> >
> > As mentioned above: Do we really need to compare representations?
>
> Within the library, it does not seem necessary.
>
> > >
> > > 5) Chromosome String representation: Currently we use the toString()
> method
> > > to print the chromosome's phenotype. In the new model we would need to
> > > avoid this approach as decoders won't be available to chromosomes.
> >
> > This seems like a minor issue (or perhaps no issue at all?) unless I'm
> > missing something.
>
> The GA does not need to print the phenotype.
> The decoder is user-defined, hence he can obviously apply it whenever
> he needs a printable version of the chromosomes.
>
> > >
> > > 6) However in the new model the generic type parameters and java.util
> > > packages are used in a more organized way. We can adopt the same in the
> > > existing model.
> >
> > I don't understand what you are proposing.
>
> I've used two generic parameters (one for genotype, one for phenotype).
> The code does not use any "@SuppressWarning" annotation.
>
> > >
> > > >If we need to document software (e.g. the "examples") produced
> > > >by a "Commons" component, it is preferable that we use and refer
> > > >to "Log4j2".
> > > >[The logger API is another matter since it does not impact how the
> > > >software is used (from a user's POV).]
> > > -- I shall make the changes.
>
> Logging statements are marginally useful in such a small piece of code.
> Tracking evolution can be implemented at the application level.
> See interface "GenerationCallback".
>
> >
> > Thanks.
> >
> > >
> > > > [...]
> > > >Doing manually will be too tedious.
> > > >If you are reasonably confident that you've ported everything
> > > >valuable, I guess that we'll rely on coverage tools...
> > > >The current log statements could also be construed as (totally)
> > > >unnecessary, as they merely document statements which I would
> > > >consider part of an "application", not the GA "library".
> > > >I believe that we do not agree yet on where to draw the line (my
> > > >proposal in MATH-1618 is related to that difference of opinion).
> > > >
> > > -- Probably I am missing something here. These log statements are more
> > > useful for library users. If we remove all log statements how users
> would
> > > be able to debug any issue.
> > > What if there are issues in any part of library code or may be anything
> > > fails due to some application related customization.
> > > It may not be necessarily a library bug.
> >
> > I'm not clear about what is the intended purpose of logging.
>
> See previous response.
>
> > But I'd rather defer this discussion, and first focus on the actual GA
> > functionality being implemented.
> >
> > >
> > > >> >> >Likewise, the "PopulationStatisticsLogger" is not general
> enough to
> > > >> >> >be worth being part of the library.
> > > >> >> -- I would love to keep this simple implementation of convergence
> > > >> >> listener. This can provide a very basic overview on the nature of
> > > >> >> convergence.
> > > >> >
> > > >> >As a user, you'll be able to provide the feature to your
> application
> > > >> >with less than 10 lines of code (by registering a "listener").
> > > >> >Everyone else might want a slightly different output (contents
> and/or
> > > >> >format) than what this class generates.
> > > >> -- Users would always have the freedom to register their own
> listener.
> > > >> PopulationStatisticsLogger provides only a very basic option to
> track the
> > > >> population statistics during the convergence process.
> > > >> This is never a very generic solution to meet the needs of all
> users.
> > > >
> > > >AFAICT, the "PopulationStatisticsLogger" class satisfies *your* needs
> > > >(as a user), but as developers we should only include functionality
> that
> > > >is broadly useful.  I think I provided arguments that it is not the
> case of
> > > >that class.
> > > -- So do you want this class to be removed?
> >
> > Short answer would be "yes".  But the rationale is that we should
> > defer all functionality that just seems "nice-to-have" until the core
> > API is stabilized.
>
> Displaying statistics should be done at the application level.
> See "GenerationLogger" in class "MathFunctionOptimizer2" (adapted
> from yours).
>
> > >
> > > >Great!
> > > >
> > > >> >> [...]
> > > >
> > > >Maybe I'm still missing something, but IMHO the distinction between
> > > >adaptive vs not should not impact code at that level.
> > > -- Actually I wanted to keep a provision where users can mix both
> > > approaches. Maybe keep crossover probability as constant but mutation
> as
> > > adaptive.
> >
> > AFAICT, this would follow from my proposal (when "AdaptiveMutation"
> > and "AdaptiveCrossover" operators are implemented).
>
> Such operators are not necessary (see previous comment, about
> "ApplicationRate").
>
> > >
> > > [...]
> > > >> >(3)
> > > >> >o.a.c.m.ga.utils.ChromosomeRepresentationUtils
> > > >> >
> > > >> >It seems to be a "mixed-bag" kind of class (that is being frowned
> > > >> >upon nowadays).
> > > >> >Its comment refers to "random" but some methods are not using
> > > >> >any randomization.  Most methods are only used in unit tests.
> > > >> >
> > > >> -- Actually most methods are taken from the legacy GA application.
> In the
> > > >> legacy library representation there was no separate concept for the
> > > decoder
> > > >> and all representation utility methods were kept in the
> corresponding
> > > >> Chromosome classes. ChromosomeRepresentationUtils is created as a
> > > >> placeholder for those utility methods.
> > > >
> > > >I'm not sure I follow what you mean.
> > > >If these methods are only needed by the (legacy?) examples, they
> > > >should be moved to the "ga-examples" module (where they can be
> > > >removed later on without regards to BC).
> > > -- These were adopted from the legacy model and should be useful in
> this
> > > new model too.
> >
> > Anything not _surely_ useful (i.e. actually used by the new library)
> > should be left out.  If necessary, code can be re-added later.
>
> Representation(s) should be internal, together with the operators
> that manipulate them.
>
> > >
> > > > > > [...]
> > > >
> > > >OK.  [Is there a new PR?]
> > > -- I shall create one.
> >
> > We still need to agree on my proposal...
> > Would you try to follow up on that basis, i.e. add a minimal set of GA
> > operators, in order to show that some of the "examples" can be run?
>
> I've added the "MathFunctionOptimizer2" in the "Standalone" example
> application.  Please check that it produces the expected output.
> [I had to slightly change the function being optimized, and your decoding
> function. Why are you assuming that "Coordinates" cannot be negative?]
>
> > > >
> > > >> >(5)
> > > >> >Why does a "Chromosome" need an "identifier"?
> > > >> >Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
> > > >> >that is an internal class, where it seems that the chromosome
> itself
> > > >> >(rather than its "id") could serve as the map's key.
> > > >> -- How can we generate a map's key from the chromosome itself?
> Please
> > > >> suggest.
> > > >
> > > >A class to be used as a key only needs to implement "equals" and
> > > >"hashCode".
> > > -- The current chromosome class implements Comparable interface which
> uses
> > > chromosome fitness for comparison. Use of both Comparable and equals()
> > > might introduce inconsistencies.
>
> In my proposal, neither is implemented.
>
> > An example?
> >
> > > >
> > > >> >
> > > >> >(6)
> > > >> >o.a.c.m.ga.chromsome.AbstractChromosome
> > > >> >
> > > >> >Field "fitness" is not "final", yet it could be: a
> "FitnessFunction"
> > > >> >object (used in "evaluate() to compute that field) is passed to the
> > > >> >constructor.  Is there a reason for the "lazy" evaluation?
> > > >> >Dropping it would make the instance immutable (and "evaluate()"
> > > >> >should be renamed to "getFitness()").
> > > >> >
> > > >> >Why should the "FitnessFunction" be stored in every chromosome?
> > > >> >
> > > >> -- I have modified the fitness as final and initialized the same in
> the
> > > >> constructor.
> > > >
> > > >Better, but did you check my proposal in MATH-1618, where
> > > >Chromosome and fitness are decoupled, and their relationship
> > > >is held within a "Population" instance?
> > > --Mentioned earlier.
> >
> > I still don't know whether you agree that my proposal makes it
> > simpler to express a GA.
>
> Please have a look at the commit mentioned above, and
>  * for each design issue, post a new discussion thread,
>  * for each bug, file a JIRA report.
>
> > > >
> > > >> [...]
> > > >
> > > >>
> > > >> >(8)
> > > >> >@SuppressWarnings("unchecked")
> > > >> >
> > > >> >By default, I'm a bit suspicious about having to resort to these
> > > >> annotations,
> > > >> >especially for the kind of algorithms we are trying to implement.
> > > >> >What do you think of the alternative approach outlined in the ZIP
> file
> > > >> >attached in MATH-1618:
> > > >> >    https://issues.apache.org/jira/browse/MATH-1618
> > > >> >?
> > > >> -- This annotation is required because we have kept an option to use
> > > >> different types of genotypes including primitive.
> > > >> Because of that our base interfaces only declares phenotype not
> genotype.
> > > >> This introduced a kind of hierarchy in all operators and chromosome
> > > classes
> > > >> which required us to use the mentioned annotation.
> > > >
> > > >I may again be missing something.
> > > >Could you please explain the case that makes these annotations
> > > >necessary.
> > > -- This has been only used to avoid the warning in the place of
> typecasting.
> > > However, I can work to minimize this following your new model.
> >
> > "Minimize"?
>
> No unsafe cast seems necessary.
>
> > > >
> > > >> >
> > > >> >(9)
> > > >> >Naming of factory methods should be harmonized to match the
> convention
> > > >> >adopted in components like [RNG] and [Numbers].
> > > >> >E.g. instead of "newChromosome(...)", please use "of(...)" or
> > > "from(...)"
> > > >> >for "value object", and "create(...)" otherwise.
> > > >> >
> > > >> -- I have renamed the same for Chromosome classes.
> > > >> What about the nextGeneration() method of ListPopulation class.
> Renaming
> > > >> this to create() or from() won't convey the purpose of it.
> > > >
> > > >I agree, and that's why the new "Population" class (in MATH-1618) does
> > > >not provide a factory method (see also the "GeneticAlgorithmFactory"
> > > >class).
> > > -- We can avoid the same in the current model if we agree to use a
> default
> > > implementation of population and remove the Population interface
> following
> > > your new model.
> >
> > So, do we adopt that "new model"?
> > Or do you still have objections?
>
> As the one who investigates GA behaviours, please list the concrete
> problems with my proposal.
>
> > > >
> > > >> >(10)
> > > >> >o.a.c.m.ga.chromosome.AbstractListChromosome
> > > >> >
> > > >> >Constructor is called with an argument that is a previously
> instantiated
> > > >> >"representation".  If the latter is mutable, the caller will be
> able to
> > > >> modify
> > > >> >the underlying data structure of the newly created chromosome.
> [The
> > > >> >doc assumes immutability of the representation but this cannot be
> > > >> >enforced, and mixed ownership can entail subtle bugs.]
> > > >> -- I think this applies to both representation as well as generic
> > > parameter
> > > >> type T. But I don't see any other option but to rely on the user.
> > > >
> > > >The Javadoc (at line 84) is misleading in its mention of "immutable".
> > > >
> > > >> If you have any suggestions kindly share.
> > > >
> > > >I may not understand all the implications, but I'd suggest that the
> > > >"representation" be instantiated within the control of the library
> (e.g.
> > > >through a "builder"/"factory").
> > > -- Currently we have the ChromosomeRepresentationUtils for the same.
> Its
> > > methods are designed to generate the representations.
> >
> > My suggestion is that this design can be improved (a.o. according to my
> > above suggestion).
>
> In my proposal, representations would be not be "public", and each
> would be declared and used within a dedicated package.
> See package "gene/binary".
>
> > > >
> > > >> >
> > > >> >(11)
> > > >> >Do we agree that, in a GA, the most time-consuming task is the
> fitness
> > > >> >computation?  Hence IMO, it should be the focus of the
> multithreading
> > > >> >tools (i.e. "ExecutorService"), probably keeping the other parts
> (namely
> > > >> >the genetic operators) within a simple sequential loop (as in class
> > > >> >"GeneticAlgorithmFactory" in MATH-1618).
> > > >> -- Current implementation uses separate threads for applying
> crossover
> > > and
> > > >> mutation operators for each pair of selected chromosomes.
> > > >> I think this ensures better utilization of multi-core processors
> compared
> > > >> to use of multi-threading only for the fitness calculation.
> > > >
> > > >I have the opposite intuition: Parallel application of the genetic
> > > >operators would only provide marginal gains wrt the fitness
> > > >computation.
> > > >In any case, I think that it will be fairly easy to modify my proposed
> > > >"OffspringGenerator" class to use an "ExecutorService" (if benchmarks
> > > >show that a substantial gain could indeed be achieved).
> > > >
> > > >> -- Some codes are checked in. But there is a conflict in the pull
> > > request.
> > > >> So I shall create a new one and delete the old branch itself.
> > > >
> > > >IMHO, we could make more substantial progress if you could
> > > >first point to issues with my proposal in MATH-1618.
> > > --Mentioned earlier.
> >
> > Well, I don't know where we stand...
>
> Hoping we can make significant progress so as to include the new
> GA functionality in the upcoming release of "Commons Math"...
>
> Regards,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Re: [Math] Review of "genetic algorithm" module

Reply via email to