Hi All

         Please see my comments below.

>As I've already indicated, "ThreadLocalRandomSource" is, IMHO, a
>sort of workaround for a multi-thread application that does not want
>to bother managing per-thread RNG instance(s).
-- I am not clear on this. ThreadLocalRandomSource maintains
an EnumMap<RandomSource, ThreadLocal<UniformRandomProvider>>. What is meant
by "does not want to bother managing per-thread RNG instance(s)"? Could
you please elaborate on this? If this is an issue in RNG, why don't we
consider fixing it there or providing a different internal implementation?
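
To make sure we are talking about the same thing, here is a minimal sketch
of how I understand that structure, using JDK types only. DemoSource and
PerThreadRngSketch are hypothetical stand-ins of mine (java.util.Random
replaces UniformRandomProvider); this is not the Commons RNG code. It only
shows that one thread always gets the same instance back, while another
thread gets its own.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical stand-in for RandomSource; not the Commons RNG enum. */
enum DemoSource { FIRST, SECOND }

/** Sketch of a per-source, per-thread generator registry (assumption: JDK types only). */
final class PerThreadRngSketch {
    /** One ThreadLocal per source; the map is never modified after class initialization. */
    private static final Map<DemoSource, ThreadLocal<Random>> SOURCES =
        new EnumMap<>(DemoSource.class);

    static {
        for (DemoSource source : DemoSource.values()) {
            // Each thread lazily creates its own generator on first access.
            SOURCES.put(source, ThreadLocal.withInitial(Random::new));
        }
    }

    private PerThreadRngSketch() {}

    /** Returns the calling thread's generator for the given source. */
    static Random current(DemoSource source) {
        return SOURCES.get(source).get();
    }

    /** Returns true if a second thread obtains an instance different from the caller's. */
    static boolean otherThreadGetsOwnInstance(DemoSource source) {
        Random mine = current(source);
        Random[] theirs = new Random[1];
        Thread worker = new Thread(() -> theirs[0] = current(source));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return theirs[0] != null && theirs[0] != mine;
    }
}
```

Within a thread, every caller of current(DemoSource.FIRST) sees the same
object, which I take to be the "sharing" referred to in the quoted text.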

>The library should not make that decision for the application since we
>can care for both usages: Every piece of the GA that needs a RNG can
>provide factory methods that either take a "RandomSource" argument
>or create a default one.
-- The library can always use a default, or offer customization at a
global level, but IMHO it need not be at the operator level. I don't see
much use for it.
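
For reference, my reading of the factory-method proposal quoted above is
roughly the following sketch. FlipMutationSketch and its of(...) methods are
hypothetical names of mine, and java.util.Random stands in for a Commons RNG
generator; nothing here is the library's API.

```java
import java.util.Random;
import java.util.function.Supplier;

/** Hypothetical mutation operator; names are illustrative, not the library's API. */
final class FlipMutationSketch {
    private final double rate;
    private final Random rng;

    /** Private constructor: instances are created through the factory methods only. */
    private FlipMutationSketch(double rate, Random rng) {
        if (rate < 0 || rate > 1) {
            throw new IllegalArgumentException("rate out of [0, 1]: " + rate);
        }
        this.rate = rate;
        this.rng = rng;
    }

    /** Factory that creates a default generator. */
    static FlipMutationSketch of(double rate) {
        return new FlipMutationSketch(rate, new Random());
    }

    /** Factory that lets the caller choose the generator. */
    static FlipMutationSketch of(double rate, Supplier<Random> source) {
        return new FlipMutationSketch(rate, source.get());
    }

    /** Returns a mutated copy of the genes; the input array is not modified. */
    boolean[] mutate(boolean[] genes) {
        boolean[] out = genes.clone();
        for (int i = 0; i < out.length; i++) {
            if (rng.nextDouble() < rate) {
                out[i] = !out[i];
            }
        }
        return out;
    }
}
```

Note that an instance built this way owns a generator, so it would need one
copy per thread, which is the memory concern I raise further down.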

>
> >> >2. Less/no flexibility (no user's choice of random source).
> >> -- Agreed.
> -- Do we really need this much flexibility here?

>My main concern is that IMO the RNG is a prominent part of a GA
>and it is not a good design to use "ThreadLocalRandomSource".
-- RNG is definitely a prominent part. However, if we have a sharing issue
with ThreadLocalRandomSource, we need to think about an alternative
implementation of it.
>How many is "too many instances"?
>The memory used by an operator is tiny compared to a chromosome,
>even less to a population of chromosome, or two populations of them
>(parents and offsprings).
-- My concern is that we are trying to work around a performance problem in
another library, and the workaround is going to consume additional memory.

>     So I think we have a design tradeoff here performance vs memory
> consumption. I am more worried about memory as that might restrict use of
> this library beyond a certain number of dimensions in some areas.

>I'm referring to separate copies for each thread.
>How many threads/virtual CPUs are common nowadays?
>> However,
>> creating deep copy would only be possible when we strictly restrict
>> extension of operators which I want to avoid.

>How to avoid deep copies in a multi-thread library?
>Through synchronization?
-- The operator interfaces are designed like functional interfaces.
Accordingly, the current implementations of all operators are read-only:
they do not maintain any mutable state during computations either. So they
are perfectly suitable for multi-threaded operation. If you see any
deviation from this, please let me know.
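
As an illustration of what I mean by read-only, here is a minimal sketch of
a stateless one-point crossover. The class name and signature are
hypothetical and do not reproduce the existing OnePointCrossover. The
generator is passed in per call (or taken from the JDK's ThreadLocalRandom),
so the operator itself has no fields and a single instance can be called
from several threads.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical stateless crossover: no fields, so there is nothing to synchronize. */
final class StatelessOnePointCrossoverSketch {

    /** Crosses two equal-length parents at a random cut point; returns two new children. */
    int[][] crossover(int[] first, int[] second, Random rng) {
        if (first.length != second.length || first.length < 2) {
            throw new IllegalArgumentException("parents must have equal length >= 2");
        }
        // Cut point in [1, length - 1], so each child takes genes from both parents.
        int cut = 1 + rng.nextInt(first.length - 1);
        int[] child1 = first.clone();
        int[] child2 = second.clone();
        for (int i = cut; i < first.length; i++) {
            child1[i] = second[i];
            child2[i] = first[i];
        }
        return new int[][] {child1, child2};
    }

    /** Convenience overload that uses the JDK's per-thread generator. */
    int[][] crossover(int[] first, int[] second) {
        return crossover(first, second, ThreadLocalRandom.current());
    }
}
```

The parents are only read and the children are fresh arrays, so shared
chromosomes stay untouched as long as they are read-only as well.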

>
> >> So even if we provide
> >> the customization at the operator level we cannot avoid sharing.
>
> >We can, and we should.
> >What we probably can't avoid sharing is the instance that represents the
> >population of chromosomes.
> *--* In a multi-threaded optimization the chromosome instances are shared
> in case the same chromosome is chosen for crossover by the selection
> process. I missed this point earlier.
> ...

Chromosomes can be shared (if they are read-only).
--They are read-only.

>
> >> >  Mine is against using "ThreadLocalRandomSource"...
> >> -- What is the wayout other than that. Please suggest.
>
> >I think I did.
> *--* The factory based approach would be useful only when we can have
> separate copies of operators for each set of operations.

If we don't have separate copies in each thread, then the operator
will not be multithreaded...
-- If operators do not contain any mutable property then they are perfectly
usable in a multi-threaded environment.

> *--* I think we should not block the extension.

>This would be going backwards to many things that have been done
>to improve the robustness and reduce the bug counts of the Commons
>Math codes.
-- GA is different from other math functions. We need not impose the same
principle on everything.

> >Initially we discussed about having a light-weight library, for easier usage
> >than alternative existing framework(s).
> *--* We can always think of making the framework lightweight but it should
> not cost extensibility.

>There is no cost: We'll gladly merge every worthy extension into
>the Commons component.
-- I think we have a disconnect here. If the framework is not extensible,
how would anyone be able to use it in a new domain? Do you mean that the
framework should first be changed for each new domain, and that users
should only use it out of the box?

>
> >> E.g. any developer should be able to extend the
> >> IntegralChromosome class and define a child class which explicitly
> >> specifies the range of integers to be used.
>
> >It does not look like this would need an extension, only configuration
> >of the range.
> *-- *I agree. But the question is should we block the extension.

>Please find a valid use case. ;-)
-- Recently I implemented a scheduling application with commons-math 3.6. I
implemented the chromosome representing the schedule by extending
AbstractListChromosome. The mutation was also customized according to the
requirements. However, I was able to use the existing OnePointCrossover
operator. Do you think this kind of implementation would be possible if the
framework did not support extensibility?

>
> >> I have initially implemented
> >> the Binary chromosome and the corresponding binary mutation following
> >> the same pattern. However, restricting extension of concrete classes by
> >> private constructor does not prevent users from extending the abstract parent
> >> classes.
>
> >We should aim at coding the GA logic through (Java) interfaces, and not
> >expose the "abstract" classes.
> *-- *One of the primary reasons for me to contribute in Apache' GA library
> is it's simplicity and extensibility.

>"Extensibility" does not necessarily imply "inheritance"-based.
-- Can you provide a solution to the above problem without an extensibility
feature?

>In fact, we do want to *avoid* in order to more easily and more robustly
>provide other advantages such as multi-threading.
-- IMHO an immutable operator design is the best choice for supporting
multi-threading. It is also much easier to implement, even for user
extensions. Why don't we consider fixing ThreadLocalRandomSource instead?

>> I would like to have a framework
>> which should be always extensible for any problem domain with minor
>> changes.

>Any problem domain should indeed be amenable to be solved
>by the library; I don't see how that should imply a design based
>on inheritance.
-- Do you have an alternative design in mind? Kindly share it.

>> The primary reason behind this is that application domains of GA
>> are too diverse. It is not possible to implement everything in a library.
>> We don't know all possible domain areas too. If we remove the
>> extensibility from the framework it would be useless in lots of areas.

>When that occurs, people are welcome to contribute back if
>something they need is missing.
-- I think we have a disconnect here too. If the framework is not
extensible, how can users apply it to their problem domain? If it is not
extensible, it will never be used. How can we then get contributions
back?

>Your argument of "too much diversity" can be reversed, in that
>it is unlikely that one library would attract everyone that needs a
>genetic algorithm.
-- Even if it cannot attract everyone with its out-of-the-box features, it
should be extensible for those users.

>Better make a design that can handle a fraction of use cases,
>and grow as needed.
-- There are already libraries that can solve the most common use cases.
A non-extensible design would block growth to a considerable extent.

>> >Extending the functionality, if necessary, should be contributed back
>> >here
>> *-- *Sometimes the GA operators are very much specific to the domain and
>> it's hard to generalise. In those scenarios contributing back to the
>> library might not be possible.

>In such a case, how likely will it also be that whatever general
>framework this library has put in place, will also not be amenable
>to that domain's specifics?
-- Could you please frame this concern with respect to the scheduling
example provided above?

>There is always a scope from which design decisions must be taken.
>If "multi-threading" is in the scope, then the design must avoid
>inheritance (in public classes) in order to much more easily
>ensure the correctness of applications.
-- Immutable design can also take care of multi-threading.

>> However, if a library cannot be extended for
>> a new domain by users it becomes underutilised over time if not useless.

>Sure but that is a hypothetical for the long-term.
>However, if the library is buggy or slow, it will not be used at all.
-- Is there any benchmark for speed/performance? GAs are generally known
for their resource consumption rather than their running time.


Thanks & Regards
--Avijit Basak

On Wed, 22 Dec 2021 at 20:32, Gilles Sadowski <gillese...@gmail.com> wrote:

> Hello.
>
> Le mer. 22 déc. 2021 à 14:25, Avijit Basak <avijit.ba...@gmail.com> a
> écrit :
> >
> > Hi All
> >
> >         Please see my comments below.
> >
> > >> >Several problems with this approach (raised in previous messages
> IIRC):
> > >> >1. Potential performance loss in sharing the same RNG instance.
> > >> -- As per my understanding ThreadLocalRandomSource creates separate
> > >> instances of UniformRandomProvider for each thread. So I am not sure
> how
> > a
> > >> UniformRandomProvider instance is being shared. Please correct me if
> I am
> > >> wrong.
> >
> > >Within a given thread there will be *one* RNG instance; that's what I
> meant
> > >by "shared".
> > >Of course you are right that that instance is not shared by multiple
> > threads
> > >(which would be a bug).
> > >The performance loss is because it will be necessary to call
> > >  ThreadLocalRandomSource.current(RandomSource source)
> > >for each access to the RNG (since it would be a bug to store the
> returned
> > >value in e.g. an operator instance that would be shared among threads
> (as
> > >you suggest below).
> >
> > -- I tried to do a small test on it and here are the results. Output
> times
> > are in milliseconds. According to my understanding the performance loss
> is
> > mostly during creation of per thread instance of UniformRandomProvider.
> > --*CUT*--
> >     @Test
> >     void test() {
> >         int limit = 1;
> >         long start = System.currentTimeMillis();
> >         for (int i = 0; i < limit; i++) {
> >             ThreadLocalRandomSource.current(RandomSource.JDK);
> >         }
> >         System.out.println(System.currentTimeMillis() - start);
> >
> >         limit = 1000;
> >         start = System.currentTimeMillis();
> >         for (int i = 0; i < limit; i++) {
> >             ThreadLocalRandomSource.current(RandomSource.JDK);
> >         }
> >         System.out.println(System.currentTimeMillis() - start);
> >
> >         limit = 10000;
> >         start = System.currentTimeMillis();
> >         for (int i = 0; i < limit; i++) {
> >             ThreadLocalRandomSource.current(RandomSource.JDK);
> >         }
> >         System.out.println(System.currentTimeMillis() - start);
> >
> >         limit = 100000;
> >         start = System.currentTimeMillis();
> >         for (int i = 0; i < limit; i++) {
> >             ThreadLocalRandomSource.current(RandomSource.JDK);
> >         }
> >         System.out.println(System.currentTimeMillis() - start);
> >
> >         limit = 1000000;
> >         start = System.currentTimeMillis();
> >         for (int i = 0; i < limit; i++) {
> >             ThreadLocalRandomSource.current(RandomSource.JDK);
> >         }
> >         System.out.println(System.currentTimeMillis() - start);
> >
> >         limit = 10000000;
> >         start = System.currentTimeMillis();
> >         for (int i = 0; i < limit; i++) {
> >             ThreadLocalRandomSource.current(RandomSource.JDK);
> >         }
> >         System.out.println(System.currentTimeMillis() - start);
> >
> >         limit = 100000000;
> >         start = System.currentTimeMillis();
> >         for (int i = 0; i < limit; i++) {
> >             ThreadLocalRandomSource.current(RandomSource.JDK);
> >         }
> >         System.out.println(System.currentTimeMillis() - start);
> >
> >         limit = 1000000000;
> >         start = System.currentTimeMillis();
> >         for (int i = 0; i < limit; i++) {
> >             ThreadLocalRandomSource.current(RandomSource.JDK);
> >         }
> >         System.out.println(System.currentTimeMillis() - start);
> >     }
> > --*CUT*--
> > --*output*--
> > 363
> > 1
> > 2
> > 4
> > 6
> > 28
> > 244
> > 2423
> > --*output*--
>
> As I've already indicated, "ThreadLocalRandomSource" is, IMHO, a
> sort of workaround for a multi-thread application that does not want
> to bother managing per-thread RNG instance(s).
> The library should not make that decision for the application since we
> can care for both usages: Every piece of the GA that needs a RNG can
> provide factory methods that either take a "RandomSource" argument
> or create a default one.
>
> Note that your above custom benchmark is likely to mean nothing
> (please see e.g. "Commons RNG" on how to create JMH based
> benchmarks).
>
> >
> > >> >2. Less/no flexibility (no user's choice of random source).
> > >> -- Agreed.
> > -- Do we really need this much flexibility here?
>
> My main concern is that IMO the RNG is a prominent part of a GA
> and it is not a good design to use "ThreadLocalRandomSource".
>
> > >> >3. Error-prone (user can access/reuse the "UniformRandomProvider"
> > >> instances).
> > >>
> > >> >Again: "ThreadLocalRandomSource" is an ad-hoc workaround for correct
> but
> > >> >"light" usage of random number generation in a multi-threaded
> > application;
> > >> GAs
> > >> >make "heavy" use of RNG, thus it is does not seem outlandish that all
> > the
> > >> RNG
> > >> >"clients" (e.g. every "operator") creates their own instances.
> > >
> > >
> > >> >IMHO, a more important discussion would be about the expectations in
> a
> > >> >multithreaded context: E.g. should an operator be shareable by
> different
> > >> >threads?  And if not, how does the API help application developers to
> > avoid
> > >> >such pitfalls?
> > >> -- Once we implement multi-threading in GA, same crossover and
> mutation
> > >> operators will be re-used across multiple threads.
> >
> > >I would be wary to go on that path; better consider making (deep)
> copies.
> > >We can have multiple instances of an operator, all being configured in
> the
> > >same way but being different instances with no risk of a multithreading
> > bug.
> >
> > -- I don't think this would be a good design choice just to support
> > customization of RNG functionality. This will lead to too many instances
> of
> > the same operators resulting in lots of unnecessary memory consumption. I
> > think we might face memory issues for higher dimensional problems. As
> > population size requirement also increases with increase of dimension
> this
> > might lead to a major issue and need a thought.
>
> How many is "too many instances"?
> The memory used by an operator is tiny compared to a chromosome,
> even less to a population of chromosome, or two populations of them
> (parents and offsprings).
>
> >     So I think we have a design tradeoff here performance vs memory
> > consumption. I am more worried about memory as that might restrict use of
> > this library beyond a certain number of dimensions in some areas.
>
> I'm referring to separate copies for each thread.
> How many threads/virtual CPUs are common nowadays?
>
> > However,
> > creating deep copy would only be possible when we strictly restrict
> > extension of operators which I want to avoid.
>
> How to avoid deep copies in a multi-thread library?
> Through synchronization?
>
> >
> > >> So even if we provide
> > >> the customization at the operator level we cannot avoid sharing.
> >
> > >We can, and we should.
> > >What we probably can't avoid sharing is the instance that represents the
> > >population of chromosomes.
> > *--* In a multi-threaded optimization the chromosome instances are shared
> > in case the same chromosome is chosen for crossover by the selection
> > process. I missed this point earlier.
> > ...
>
> Chromosomes can be shared (if they are read-only).
>
> >
> > >> >  Mine is against using "ThreadLocalRandomSource"...
> > >> -- What is the wayout other than that. Please suggest.
> >
> > >I think I did.
> > *--* The factory based approach would be useful only when we can have
> > separate copies of operators for each set of operations.
>
> If we don't have separate copies in each thread, then the operator
> will not be multithreaded...
>
> > >Maybe it's time to create a dedicated branch for the GA functionality
> > >so that we can try out the different approaches.
> >
> >
> > >
> > > >> I think first we need to decide on whether we really need this
> > > >> customization and if yes then why. Then we can decide on alternate
> > > >> implementation options.
> > > >
> > > >> >As per the recent updates of the math-related code bases, the
> > > >> >public API should provide factory methods (constructors should
> > > >> >be private).
> > > >> -- private constructors will make public API classes non-extensible.
> > This
> > > >> will severely restrict the extensibility of this framework which I
> want
> > > to
> > > >> avoid. I am not sure why we need to remove public constructors. It
> > would
> > > be
> > > >> helpful if you could refer me to any relevant discussion thread.
> > >
> > > >  Allowing extensibility is a huge burden on library maintainers.  The
> > > >  library must have been designed to support it; hence, you should
> > > >  first describe what kind(s) of extensions (with usage examples) you
> > > >  have in mind.
> > > --The library should be extensible to support customization. Users
> should
> > > be able to customise or provide their own implementation of genetic
> > > operators for crossover and mutation. The chromosome classes should
> also
> > be
> > > open for extension.
> >
> > >I don't get why we should support extensions outside this library.
> > *--* I think we should not block the extension.
>
> This would be going backwards to many things that have been done
> to improve the robustness and reduce the bug counts of the Commons
> Math codes.
>
> >
> > >Initially we discussed about having a light-weight library, for easier
> > usage
> > >than alternative existing framework(s).
> > *--* We can always think of making the framework lightweight but it
> should
> > not cost extensibility.
>
> There is no cost: We'll gladly merge every worthy extension into
> the Commons component.
>
> >
> > >> E.g. any developer should be able to extend the
> > >> IntegralChromosome class and define a child class which explicitly
> > >> specifies the range of integers to be used.
> >
> > >It does not look like this would need an extension, only configuration
> > >of the range.
> > *-- *I agree. But the question is should we block the extension.
>
> Please find a valid use case. ;-)
>
> >
> > >> I have initially implemented
> > >> the Binary chromosome and the corresponding binary mutation following
> the
> > >> same pattern. However, restricting extension of concrete classes by
> > private
> > >> constructor does not prevent users from extending the abstract parent
> > >> classes.
> >
> > >We should aim at coding the GA logic through (Java) interfaces, and not
> > >expose the "abstract" classes.
> > *-- *One of the primary reasons for me to contribute in Apache' GA
> library
> > is it's simplicity and extensibility.
>
> "Extensibility" does not necessarily imply "inheritance"-based.
> In fact, we do want to *avoid* in order to more easily and more robustly
> provide other advantages such as multi-threading.
>
> > I would like to have a framework
> > which should be always extensible for any problem domain with minor
> > changes.
>
> Any problem domain should indeed be amenable to be solved
> by the library; I don't see how that should imply a design based
> on inheritance.
>
> > The primary reason behind this is that application domains of GA
> > are too diverse. It is not possible to implement everything in a library.
> > We don't know all possible domain areas too. If we remove the
> extensibility
> > from the framework it would be useless in lots of areas.
>
> When that occurs, people are welcome to contribute back if
> something they need is missing.
> Your argument of "too much diversity" can be reversed, in that
> it is unlikely that one library would attract everyone that needs a
> genetic algorithm.
> Better make a design that can handle a fraction of use cases,
> and grow as needed.
>
> >
> > >Extending the functionality, if necessary, should be contributed back
> here
> > *-- *Sometimes the GA operators are very much specific to the domain and
> > it's hard to generalise. In those scenarios contributing back to the
> > library might not be possible.
>
> In such a case, how likely will it also be that whatever general
> framework this library has put in place, will also not be amenable
> to that domain's specifics?
> There is always a scope from which design decisions must be taken.
>
> If "multi-threading" is in the scope, then the design must avoid
> inheritance (in public classes) in order to much more easily
> ensure the correctness of applications.
>
> > However, if a library cannot be extended for
> > a new domain by users it becomes underutilised over time if not useless.
>
> Sure but that is a hypothetical for the long-term.
> However, if the library is buggy or slow, it will not be used at all.
>
> Regards,
> Gilles
>
> >>> [...]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>

-- 
Avijit Basak
