Hello. Le mer. 18 mai 2022 à 16:34, Avijit Basak <avijit.ba...@gmail.com> a écrit : > > Hi All > > Please find my comments below. > > Comments related to new model: > > 1) Hierarchy of GeneticOperator: In the proposed model the genetic > operators are designed hierarchically implementing a common interface and > operators are accepted as a list. > This enables interchangeability of operators. Library users would be able > to use crossover and mutation operators interchangeably. > However, in genetic algorithms or other population based search algorithms > the operators are used for broadly two purposes, exploration and > exploitation. > In GA crossover is used for exploitation and mutation is used for > exploration. Keeping them in a common hierarchy and allowing > interchangeable operation violates that purpose.
I'm not sure that the semantics of "exploitation" and "exploration" should drive the API design. Saying it differently: We don't need to know how various operators will be used in order to implement them; hence IMO, there is no need to discriminate at the API level. > > 2) Chromosome Fitness: In the new design the chromosome fitness is > maintained as a hashmap where the key is the chromosome itself. > Are we going to retain the fitness value in chromosome too otherwise > implementation of Comparable won't be possible? Sorry, I don't follow. > Assuming the chromosome representation would be used to calculate hashcode, > it would be a very time consuming process depending on the length of > chromosome. Is this assumption correct? For what purpose do we need to compute a custom hash code? > E.g. in binary chromosomes we have provision to allow chromosome length > upto (2^31 - 1). That's a lot. ;-) Do you have use cases where such long representations can be handled? > Along with that the chromosome implements Comparable > interface. > Equality by Comparable interface would be decided by fitness Sure. > however > equality by equals() method would depend on representation. Do we need a custom "equals"? > As chromosomes with different representations may also have the same > fitness, this would produce a conflict. Please provide an example. > > 3) The model does not consider anything related to adaptive approaches. > Would it be a separate factory? What I've proposed is an alternative "skeleton" for the API. Of course, more classes will provide specific functionality. An adaptive operator "is-a" specialized genetic operator (perhaps the notion of "Adaptive" will be defined through an interface?). > > 4) Comparison of chromosomes: In the current model two chromosomes are > compared after decoding the genotype to phenotype using the internal > decoder. > How can we address this in the new model? As mentioned above: Do we really need to compare representations? > > 5) Chromosome String representation: Currently we use the toString() method > to print the chromosome's phenotype. In the new model we would need to > avoid this approach as decoders won't be available to chromosomes. This seems like a minor issue (or perhaps no issue at all?) unless I'm missing something. > > 6) However in the new model the generic type parameters and java.util > packages are used in a more organized way. We can adopt the same in the > existing model. I don't understand what you are proposing. > > >If we need to document software (e.g. the "examples") produced > >by a "Commons" component, it is preferable that we use and refer > >to "Log4j2". > >[The logger API is another matter since it does not impact how the > >software is used (from a user's POV).] > -- I shall make the changes. Thanks. > > > [...] > >Doing manually will be too tedious. > >If you are reasonably confident that you've ported everything > >valuable, I guess that we'll rely on coverage tools... > >The current log statements could also be construed as (totally) > >unnecessary, as they merely document statements which I would > >consider part of an "application", not the GA "library". > >I believe that we do not agree yet on where to draw the line (my > >proposal in MATH-1618 is related to that difference of opinion). > > > -- Probably I am missing something here. These log statements are more > useful for library users. If we remove all log statements how users would > be able to debug any issue. > What if there are issues in any part of library code or may be anything > fails due to some application related customization. > It may not be necessarily a library bug. I'm not clear about what is the intended purpose of logging. But I'd rather defer this discussion, and first focus on the actual GA functionality being implemented. > > >> >> >Likewise, the "PopulationStatisticsLogger" is not general enough to > >> >> >be worth being part of the library. > >> >> -- I would love to keep this simple implementation of convergence > >> >> listener. This can provide a very basic overview on the nature of > >> >> convergence. > >> > > >> >As a user, you'll be able to provide the feature to your application > >> >with less than 10 lines of code (by registering a "listener"). > >> >Everyone else might want a slightly different output (contents and/or > >> >format) than what this class generates. > >> -- Users would always have the freedom to register their own listener. > >> PopulationStatisticsLogger provides only a very basic option to track the > >> population statistics during the convergence process. > >> This is never a very generic solution to meet the needs of all users. > > > >AFAICT, the "PopulationStatisticsLogger" class satisfies *your* needs > >(as a user), but as developers we should only include functionality that > >is broadly useful. I think I provided arguments that it is not the case of > >that class. > -- So do you want this class to be removed? Short answer would be "yes". But the rationale is that we should defer all functionality that just seems "nice-to-have" until the core API is stabilized. > > >Great! > > > >> >> [...] > > > >Maybe I'm still missing something, but IMHO the distinction between > >adaptive vs not should not impact code at that level. > -- Actually I wanted to keep a provision where users can mix both > approaches. Maybe keep crossover probability as constant but mutation as > adaptive. AFAICT, this would follow from my proposal (when "AdaptiveMutation" and "AdaptiveCrossover" operators are implemented). > > [...] > >> >(3) > >> >o.a.c.m.ga.utils.ChromosomeRepresentationUtils > >> > > >> >It seems to be a "mixed-bag" kind of class (that is being frowned > >> >upon nowadays). > >> >Its comment refers to "random" but some methods are not using > >> >any randomization. Most methods are only used in unit tests. > >> > > >> -- Actually most methods are taken from the legacy GA application. In the > >> legacy library representation there was no separate concept for the > decoder > >> and all representation utility methods were kept in the corresponding > >> Chromosome classes. ChromosomeRepresentationUtils is created as a > >> placeholder for those utility methods. > > > >I'm not sure I follow what you mean. > >If these methods are only needed by the (legacy?) examples, they > >should be moved to the "ga-examples" module (where they can be > >removed later on without regards to BC). > -- These were adopted from the legacy model and should be useful in this > new model too. Anything not _surely_ useful (i.e. actually used by the new library) should be left out. If necessary, code can be re-added later. > > > > > [...] > > > >OK. [Is there a new PR?] > -- I shall create one. We still need to agree on my proposal... Would you try to follow up on that basis, i.e. add a minimal set of GA operators, in order to show that some of the "examples" can be run? > > > >> >(5) > >> >Why does a "Chromosome" need an "identifier"? > >> >Method "getId()" is only used in "PopulationStatisticalSummaryImpl" > >> >that is an internal class, where it seems that the chromosome itself > >> >(rather than its "id") could serve as the map's key. > >> -- How can we generate a map's key from the chromosome itself? Please > >> suggest. > > > >A class to be used as a key only needs to implement "equals" and > >"hashCode". > -- The current chromosome class implements Comparable interface which uses > chromosome fitness for comparison. Use of both Comparable and equals() > might introduce inconsistencies. An example? > > > >> > > >> >(6) > >> >o.a.c.m.ga.chromsome.AbstractChromosome > >> > > >> >Field "fitness" is not "final", yet it could be: a "FitnessFunction" > >> >object (used in "evaluate() to compute that field) is passed to the > >> >constructor. Is there a reason for the "lazy" evaluation? > >> >Dropping it would make the instance immutable (and "evaluate()" > >> >should be renamed to "getFitness()"). > >> > > >> >Why should the "FitnessFunction" be stored in every chromosome? > >> > > >> -- I have modified the fitness as final and initialized the same in the > >> constructor. > > > >Better, but did you check my proposal in MATH-1618, where > >Chromosome and fitness are decoupled, and their relationship > >is held within a "Population" instance? > --Mentioned earlier. I still don't know whether you agree that my proposal makes it simpler to express a GA. > > > >> [...] > > > >> > >> >(8) > >> >@SuppressWarnings("unchecked") > >> > > >> >By default, I'm a bit suspicious about having to resort to these > >> annotations, > >> >especially for the kind of algorithms we are trying to implement. > >> >What do you think of the alternative approach outlined in the ZIP file > >> >attached in MATH-1618: > >> > https://issues.apache.org/jira/browse/MATH-1618 > >> >? > >> -- This annotation is required because we have kept an option to use > >> different types of genotypes including primitive. > >> Because of that our base interfaces only declares phenotype not genotype. > >> This introduced a kind of hierarchy in all operators and chromosome > classes > >> which required us to use the mentioned annotation. > > > >I may again be missing something. > >Could you please explain the case that makes these annotations > >necessary. > -- This has been only used to avoid the warning in the place of typecasting. > However, I can work to minimize this following your new model. "Minimize"? > > > >> > > >> >(9) > >> >Naming of factory methods should be harmonized to match the convention > >> >adopted in components like [RNG] and [Numbers]. > >> >E.g. instead of "newChromosome(...)", please use "of(...)" or > "from(...)" > >> >for "value object", and "create(...)" otherwise. > >> > > >> -- I have renamed the same for Chromosome classes. > >> What about the nextGeneration() method of ListPopulation class. Renaming > >> this to create() or from() won't convey the purpose of it. > > > >I agree, and that's why the new "Population" class (in MATH-1618) does > >not provide a factory method (see also the "GeneticAlgorithmFactory" > >class). > -- We can avoid the same in the current model if we agree to use a default > implementation of population and remove the Population interface following > your new model. So, do we adopt that "new model"? Or do you still have objections? > > > >> >(10) > >> >o.a.c.m.ga.chromosome.AbstractListChromosome > >> > > >> >Constructor is called with an argument that is a previously instantiated > >> >"representation". If the latter is mutable, the caller will be able to > >> modify > >> >the underlying data structure of the newly created chromosome. [The > >> >doc assumes immutability of the representation but this cannot be > >> >enforced, and mixed ownership can entail subtle bugs.] > >> -- I think this applies to both representation as well as generic > parameter > >> type T. But I don't see any other option but to rely on the user. > > > >The Javadoc (at line 84) is misleading in its mention of "immutable". > > > >> If you have any suggestions kindly share. > > > >I may not understand all the implications, but I'd suggest that the > >"representation" be instantiated within the control of the library (e.g. > >through a "builder"/"factory"). > -- Currently we have the ChromosomeRepresentationUtils for the same. Its > methods are designed to generate the representations. My suggestion is that this design can be improved (a.o. according to my above suggestion). > > > >> > > >> >(11) > >> >Do we agree that, in a GA, the most time-consuming task is the fitness > >> >computation? Hence IMO, it should be the focus of the multithreading > >> >tools (i.e. "ExecutorService"), probably keeping the other parts (namely > >> >the genetic operators) within a simple sequential loop (as in class > >> >"GeneticAlgorithmFactory" in MATH-1618). > >> -- Current implementation uses separate threads for applying crossover > and > >> mutation operators for each pair of selected chromosomes. > >> I think this ensures better utilization of multi-core processors compared > >> to use of multi-threading only for the fitness calculation. > > > >I have the opposite intuition: Parallel application of the genetic > >operators would only provide marginal gains wrt the fitness > >computation. > >In any case, I think that it will be fairly easy to modify my proposed > >"OffspringGenerator" class to use an "ExecutorService" (if benchmarks > >show that a substantial gain could indeed be achieved). > > > >> -- Some codes are checked in. But there is a conflict in the pull > request. > >> So I shall create a new one and delete the old branch itself. > > > >IMHO, we could make more substantial progress if you could > >first point to issues with my proposal in MATH-1618. > --Mentioned earlier. Well, I don't know where we stand... Regards, Gilles --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org