Re: [Math] Review of "genetic algorithm" module

Gilles Sadowski Mon, 23 May 2022 16:54:46 -0700

Hello.

Le mer. 18 mai 2022 à 16:34, Avijit Basak <avijit.ba...@gmail.com> a écrit :
>
> Hi All
>
>         Please find my comments below.
>
> Comments related to new model:
>
> 1) Hierarchy of GeneticOperator: In the proposed model the genetic
> operators are designed hierarchically implementing a common interface and
> operators are accepted as a list.
> This enables interchangeability of operators. Library users would be able
> to use crossover and mutation operators interchangeably.
> However, in genetic algorithms or other population based search algorithms
> the operators are used for broadly two purposes, exploration and
> exploitation.
> In GA crossover is used for exploitation and mutation is used for
> exploration. Keeping them in a common hierarchy and allowing
> interchangeable operation violates that purpose.


I'm not sure that the semantics of "exploitation" and "exploration"
should drive the API design.
Saying it differently: We don't need to know how various operators
will be used in order to implement them; hence IMO, there is no
need to discriminate at the API level.

>
> 2) Chromosome Fitness: In the new design the chromosome fitness is
> maintained as a hashmap where the key is the chromosome itself.
> Are we going to retain the fitness value in chromosome too otherwise
> implementation of Comparable won't be possible?

Sorry, I don't follow.

> Assuming the chromosome representation would be used to calculate hashcode,
> it would be a very time consuming process depending on the length of
> chromosome.

Is this assumption correct?
For what purpose do we need to compute a custom hash code?

> E.g. in binary chromosomes we have provision to allow chromosome length
> upto (2^31 - 1).

That's a lot. ;-)
Do you have use cases where such long representations can be handled?

> Along with that the chromosome implements Comparable
> interface.
> Equality by Comparable interface would be decided by fitness

Sure.

> however
> equality by equals() method would depend on representation.

Do we need a custom "equals"?

> As chromosomes with different representations may also have the same
> fitness, this would produce a conflict.

Please provide an example.

>
> 3) The model does not consider anything related to adaptive approaches.
> Would it be a separate factory?

What I've proposed is an alternative "skeleton" for the API.  Of course,
more classes will provide specific functionality.
An adaptive operator "is-a" specialized genetic operator (perhaps the
notion of "Adaptive" will be defined through an interface?).

>
> 4) Comparison of chromosomes: In the current model two chromosomes are
> compared after decoding the genotype to phenotype using the internal
> decoder.
> How can we address this in the new model?

As mentioned above: Do we really need to compare representations?

>
> 5) Chromosome String representation: Currently we use the toString() method
> to print the chromosome's phenotype. In the new model we would need to
> avoid this approach as decoders won't be available to chromosomes.

This seems like a minor issue (or perhaps no issue at all?) unless I'm
missing something.

>
> 6) However in the new model the generic type parameters and java.util
> packages are used in a more organized way. We can adopt the same in the
> existing model.

I don't understand what you are proposing.

>
> >If we need to document software (e.g. the "examples") produced
> >by a "Commons" component, it is preferable that we use and refer
> >to "Log4j2".
> >[The logger API is another matter since it does not impact how the
> >software is used (from a user's POV).]
> -- I shall make the changes.

Thanks.

>
> > [...]
> >Doing manually will be too tedious.
> >If you are reasonably confident that you've ported everything
> >valuable, I guess that we'll rely on coverage tools...
> >The current log statements could also be construed as (totally)
> >unnecessary, as they merely document statements which I would
> >consider part of an "application", not the GA "library".
> >I believe that we do not agree yet on where to draw the line (my
> >proposal in MATH-1618 is related to that difference of opinion).
> >
> -- Probably I am missing something here. These log statements are more
> useful for library users. If we remove all log statements how users would
> be able to debug any issue.
> What if there are issues in any part of library code or may be anything
> fails due to some application related customization.
> It may not be necessarily a library bug.

I'm not clear about what is the intended purpose of logging.
But I'd rather defer this discussion, and first focus on the actual GA
functionality being implemented.

>
> >> >> >Likewise, the "PopulationStatisticsLogger" is not general enough to
> >> >> >be worth being part of the library.
> >> >> -- I would love to keep this simple implementation of convergence
> >> >> listener. This can provide a very basic overview on the nature of
> >> >> convergence.
> >> >
> >> >As a user, you'll be able to provide the feature to your application
> >> >with less than 10 lines of code (by registering a "listener").
> >> >Everyone else might want a slightly different output (contents and/or
> >> >format) than what this class generates.
> >> -- Users would always have the freedom to register their own listener.
> >> PopulationStatisticsLogger provides only a very basic option to track the
> >> population statistics during the convergence process.
> >> This is never a very generic solution to meet the needs of all users.
> >
> >AFAICT, the "PopulationStatisticsLogger" class satisfies *your* needs
> >(as a user), but as developers we should only include functionality that
> >is broadly useful.  I think I provided arguments that it is not the case of
> >that class.
> -- So do you want this class to be removed?

Short answer would be "yes".  But the rationale is that we should
defer all functionality that just seems "nice-to-have" until the core
API is stabilized.

>
> >Great!
> >
> >> >> [...]
> >
> >Maybe I'm still missing something, but IMHO the distinction between
> >adaptive vs not should not impact code at that level.
> -- Actually I wanted to keep a provision where users can mix both
> approaches. Maybe keep crossover probability as constant but mutation as
> adaptive.

AFAICT, this would follow from my proposal (when "AdaptiveMutation"
and "AdaptiveCrossover" operators are implemented).

>
> [...]
> >> >(3)
> >> >o.a.c.m.ga.utils.ChromosomeRepresentationUtils
> >> >
> >> >It seems to be a "mixed-bag" kind of class (that is being frowned
> >> >upon nowadays).
> >> >Its comment refers to "random" but some methods are not using
> >> >any randomization.  Most methods are only used in unit tests.
> >> >
> >> -- Actually most methods are taken from the legacy GA application. In the
> >> legacy library representation there was no separate concept for the
> decoder
> >> and all representation utility methods were kept in the corresponding
> >> Chromosome classes. ChromosomeRepresentationUtils is created as a
> >> placeholder for those utility methods.
> >
> >I'm not sure I follow what you mean.
> >If these methods are only needed by the (legacy?) examples, they
> >should be moved to the "ga-examples" module (where they can be
> >removed later on without regards to BC).
> -- These were adopted from the legacy model and should be useful in this
> new model too.

Anything not _surely_ useful (i.e. actually used by the new library)
should be left out.  If necessary, code can be re-added later.

>
> > > > [...]
> >
> >OK.  [Is there a new PR?]
> -- I shall create one.

We still need to agree on my proposal...
Would you try to follow up on that basis, i.e. add a minimal set of GA
operators, in order to show that some of the "examples" can be run?

> >
> >> >(5)
> >> >Why does a "Chromosome" need an "identifier"?
> >> >Method "getId()" is only used in "PopulationStatisticalSummaryImpl"
> >> >that is an internal class, where it seems that the chromosome itself
> >> >(rather than its "id") could serve as the map's key.
> >> -- How can we generate a map's key from the chromosome itself? Please
> >> suggest.
> >
> >A class to be used as a key only needs to implement "equals" and
> >"hashCode".
> -- The current chromosome class implements Comparable interface which uses
> chromosome fitness for comparison. Use of both Comparable and equals()
> might introduce inconsistencies.

An example?

> >
> >> >
> >> >(6)
> >> >o.a.c.m.ga.chromsome.AbstractChromosome
> >> >
> >> >Field "fitness" is not "final", yet it could be: a "FitnessFunction"
> >> >object (used in "evaluate() to compute that field) is passed to the
> >> >constructor.  Is there a reason for the "lazy" evaluation?
> >> >Dropping it would make the instance immutable (and "evaluate()"
> >> >should be renamed to "getFitness()").
> >> >
> >> >Why should the "FitnessFunction" be stored in every chromosome?
> >> >
> >> -- I have modified the fitness as final and initialized the same in the
> >> constructor.
> >
> >Better, but did you check my proposal in MATH-1618, where
> >Chromosome and fitness are decoupled, and their relationship
> >is held within a "Population" instance?
> --Mentioned earlier.

I still don't know whether you agree that my proposal makes it
simpler to express a GA.

> >
> >> [...]
> >
> >>
> >> >(8)
> >> >@SuppressWarnings("unchecked")
> >> >
> >> >By default, I'm a bit suspicious about having to resort to these
> >> annotations,
> >> >especially for the kind of algorithms we are trying to implement.
> >> >What do you think of the alternative approach outlined in the ZIP file
> >> >attached in MATH-1618:
> >> >    https://issues.apache.org/jira/browse/MATH-1618
> >> >?
> >> -- This annotation is required because we have kept an option to use
> >> different types of genotypes including primitive.
> >> Because of that our base interfaces only declares phenotype not genotype.
> >> This introduced a kind of hierarchy in all operators and chromosome
> classes
> >> which required us to use the mentioned annotation.
> >
> >I may again be missing something.
> >Could you please explain the case that makes these annotations
> >necessary.
> -- This has been only used to avoid the warning in the place of typecasting.
> However, I can work to minimize this following your new model.

"Minimize"?

> >
> >> >
> >> >(9)
> >> >Naming of factory methods should be harmonized to match the convention
> >> >adopted in components like [RNG] and [Numbers].
> >> >E.g. instead of "newChromosome(...)", please use "of(...)" or
> "from(...)"
> >> >for "value object", and "create(...)" otherwise.
> >> >
> >> -- I have renamed the same for Chromosome classes.
> >> What about the nextGeneration() method of ListPopulation class. Renaming
> >> this to create() or from() won't convey the purpose of it.
> >
> >I agree, and that's why the new "Population" class (in MATH-1618) does
> >not provide a factory method (see also the "GeneticAlgorithmFactory"
> >class).
> -- We can avoid the same in the current model if we agree to use a default
> implementation of population and remove the Population interface following
> your new model.

So, do we adopt that "new model"?
Or do you still have objections?

> >
> >> >(10)
> >> >o.a.c.m.ga.chromosome.AbstractListChromosome
> >> >
> >> >Constructor is called with an argument that is a previously instantiated
> >> >"representation".  If the latter is mutable, the caller will be able to
> >> modify
> >> >the underlying data structure of the newly created chromosome.  [The
> >> >doc assumes immutability of the representation but this cannot be
> >> >enforced, and mixed ownership can entail subtle bugs.]
> >> -- I think this applies to both representation as well as generic
> parameter
> >> type T. But I don't see any other option but to rely on the user.
> >
> >The Javadoc (at line 84) is misleading in its mention of "immutable".
> >
> >> If you have any suggestions kindly share.
> >
> >I may not understand all the implications, but I'd suggest that the
> >"representation" be instantiated within the control of the library (e.g.
> >through a "builder"/"factory").
> -- Currently we have the ChromosomeRepresentationUtils for the same. Its
> methods are designed to generate the representations.

My suggestion is that this design can be improved (a.o. according to my
above suggestion).

> >
> >> >
> >> >(11)
> >> >Do we agree that, in a GA, the most time-consuming task is the fitness
> >> >computation?  Hence IMO, it should be the focus of the multithreading
> >> >tools (i.e. "ExecutorService"), probably keeping the other parts (namely
> >> >the genetic operators) within a simple sequential loop (as in class
> >> >"GeneticAlgorithmFactory" in MATH-1618).
> >> -- Current implementation uses separate threads for applying crossover
> and
> >> mutation operators for each pair of selected chromosomes.
> >> I think this ensures better utilization of multi-core processors compared
> >> to use of multi-threading only for the fitness calculation.
> >
> >I have the opposite intuition: Parallel application of the genetic
> >operators would only provide marginal gains wrt the fitness
> >computation.
> >In any case, I think that it will be fairly easy to modify my proposed
> >"OffspringGenerator" class to use an "ExecutorService" (if benchmarks
> >show that a substantial gain could indeed be achieved).
> >
> >> -- Some codes are checked in. But there is a conflict in the pull
> request.
> >> So I shall create a new one and delete the old branch itself.
> >
> >IMHO, we could make more substantial progress if you could
> >first point to issues with my proposal in MATH-1618.
> --Mentioned earlier.

Well, I don't know where we stand...

Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [Math] Review of "genetic algorithm" module

Reply via email to