On Sun, 2 May 2021 at 16:51, Avijit Basak <avijit.ba...@gmail.com> wrote:

> Hi
>
> >>        Note: You cannot easily just use java.util.BitSet as you wish to
> have access to the underlying long[] to store the chromosome to enable
> efficient crossover.
> --Thanks for pointing this out. However, I considered a few constraints
> while doing the implementation.
>      1) I extended the existing class AbstractListChromosome, which
> requires a generic type. This is the reason for using a list of Long.
> However, I can extend Chromosome instead and use an array of primitive long.
> BitSet also uses a similar data structure.
>      2) One problem with BitSet is that it uses the MSB to store bits. As a
> result, we won't be able to use the static utility methods of the wrapper
> class (Long) for conversion between the primitive type and a string. We will
> have to write custom code for conversion between string and integral types.
> This is the only reason I used a BLOCKSIZE of 63 instead of 64.
>

I did state you cannot use BitSet as there are requirements to access the
underlying long[] for certain operations such as crossover. Thus you have
to build a custom implementation that uses a long[] representation with the
operations you need. You can then store the bits using big or little endian
as you require. The BitSet is using LSB for bit 0 to MSB for bit 63 of each
word.
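
To illustrate the word layout, here is a minimal sketch using only JDK
calls. The class and method names are hypothetical. Note that
BitSet.toLongArray() returns a copy, so it does not give access to the
internal storage:

```java
import java.util.BitSet;

// Hypothetical helper, for illustration only.
final class BitSetLayout {
    /** Sets the given bit indices in a BitSet and returns a copy of its words. */
    static long[] wordsOf(int... setBits) {
        BitSet bs = new BitSet();
        for (int i : setBits) {
            bs.set(i);
        }
        // A new array is returned; changes to it do not affect the BitSet.
        return bs.toLongArray();
    }
}
```

For example, setting bits 0, 63 and 64 gives two words: the first has its
LSB and MSB set, the second has only its LSB set.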

Writing custom code for toString() would be simple. You can use a 256-entry
look-up table and output 8 blocks of 8 characters per long:

String[] OUTPUT = { "00000000", "00000001", "00000010", "00000011", etc. };
long[] alleles = ...;
StringBuilder sb = new StringBuilder(alleles.length * 64);
for (long bits : alleles) {
    // The order of this depends on the endianness of the representation
    sb.append(OUTPUT[(int)(bits & 0xff)])
       .append(OUTPUT[(int)((bits >> 8) & 0xff)])
       .append(OUTPUT[(int)((bits >> 16) & 0xff)])
       // etc. for shifts 24 to 56, ending the chain with ';'
}

There would be extra work for the final block of 64 if it is not complete
(i.e. fewer than 64 bits are used) to avoid extra zeros in the output.
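
A fuller sketch of the idea, under assumptions that are mine and not part
of any existing API: the class and method names are hypothetical, words are
printed in array order with the most significant byte first, and a partial
final word keeps its bits in the low positions. A different endianness
would only change the loop order.

```java
// Hypothetical formatter, for illustration only.
final class BitStringFormatter {
    /** 256-entry table: OUTPUT[b] is the 8-character binary string of byte value b. */
    private static final String[] OUTPUT = new String[256];
    static {
        for (int i = 0; i < 256; i++) {
            String s = Integer.toBinaryString(i);
            OUTPUT[i] = "00000000".substring(s.length()) + s;
        }
    }

    /**
     * @param alleles packed bits
     * @param length number of bits in use (assumed <= 64 * alleles.length)
     */
    static String toBinaryString(long[] alleles, int length) {
        StringBuilder sb = new StringBuilder(length);
        int fullWords = length >>> 6;
        for (int w = 0; w < fullWords; w++) {
            long bits = alleles[w];
            // 8 table look-ups per long, most significant byte first
            for (int shift = 56; shift >= 0; shift -= 8) {
                sb.append(OUTPUT[(int) ((bits >>> shift) & 0xff)]);
            }
        }
        // Final incomplete block: output only the bits in use, one at a time
        int rem = length & 63;
        if (rem != 0) {
            long bits = alleles[fullWords];
            for (int b = rem - 1; b >= 0; b--) {
                sb.append((bits >>> b) & 1L);
            }
        }
        return sb.toString();
    }
}
```

The per-bit loop for the final block is the simple option; it runs at most
63 times per chromosome, so it should not matter for speed.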

Writing fromString input code could use Long.parseUnsignedLong(String, int)
with a radix of 2 if you have the correct endianness per block of 64. This
allows you to take in 64 characters at a time to create the long[].
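
As a rough sketch of the parsing side (hypothetical class and method names;
it assumes each 64-character block is written most significant bit first,
and that a shorter final block fills the low bits of the last word):

```java
// Hypothetical parser, for illustration only.
final class BitStringParser {
    /** Parses a string of '0' and '1' characters into packed 64-bit words. */
    static long[] fromBinaryString(String s) {
        int length = s.length();
        long[] alleles = new long[(length + 63) >>> 6];
        for (int w = 0, from = 0; from < length; w++, from += 64) {
            int to = Math.min(from + 64, length);
            // Unsigned parsing accepts a full 64 characters,
            // including a leading '1' in the MSB position.
            alleles[w] = Long.parseUnsignedLong(s.substring(from, to), 2);
        }
        return alleles;
    }
}
```

Invalid characters cause a NumberFormatException from the JDK method, so no
extra validation is shown here.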

I do not see it as a problem to write custom code based around long[] if
the result is a large gain in speed and memory efficiency for the
implementation.

Restricting functionality to the current CM AbstractListChromosome
or Chromosome is not necessary for a new package. This is the opportunity
to build new data structures that are appropriate for the intended use.


> >>// This is not actually required...
> // int bit = cross & 63; // i.e. cross % 64
> --Do you mean the bit index does not need to be calculated? How can we handle
> crossover indexes which are not multiples of 64?
>

Sorry for not being clear. You need to create the mask to determine where
in the 64-bit long to perform the crossover. What I meant was you do not
need to identify the bit with a modulus operator. This:

int cross = ...

int index = cross / 64;
int bit = cross % 64;
long mask = 0xffff_ffff_ffff_ffffL << bit;

Is the same as:

int index = cross >>> 6;
long mask = -1L << cross;

This is because the left shift operator, applied to a long, only uses the
lowest 6 bits of the shift distance. These are all the same:

-1L << 1
-1L << (1 + 64)
-1L << (1 + 128)
-1L << (1 + 256)
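
To show how the mask could be used, here is a sketch of a single-point
crossover on two long[] parents. The names are hypothetical and the layout
is an assumption: bit i of the chromosome is bit (i % 64) of word i / 64,
and cross is within the array bounds.

```java
// Hypothetical crossover helper, for illustration only.
final class CrossoverMask {
    /** Swaps, in place, all bits at positions >= cross between a and b. */
    static void crossover(long[] a, long[] b, int cross) {
        int index = cross >>> 6;
        // The left operand must be a long: with an int -1 the shift
        // would use only the lowest 5 bits of cross.
        long mask = -1L << cross;
        // Swap the upper part of the word that contains the crossover point
        long diff = (a[index] ^ b[index]) & mask;
        a[index] ^= diff;
        b[index] ^= diff;
        // Swap all following words whole
        for (int i = index + 1; i < a.length; i++) {
            long tmp = a[i];
            a[i] = b[i];
            b[i] = tmp;
        }
    }
}
```

With a = {0, 0}, b = {-1, -1} and cross = 68, the first words are untouched
and only the upper 60 bits of the second words are exchanged.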


> >> Do you think that allele sets other than binary would be useful to
> implement? [IIUC your document above, it seems not (?).]
> --The document only describes the data structure related to Binary
> genotype. We already have an implementation of RandomKey genotype in
> commons. We can think of adding other genotypes gradually.
>
>
> Thanks & Regards
> --Avijit Basak
>
>
>
> On Sat, 1 May 2021 at 22:18, Gilles Sadowski <gillese...@gmail.com> wrote:
>
> > On Fri, 30 Apr 2021 at 17:40, Avijit Basak <avijit.ba...@gmail.com>
> > wrote:
> > >
> > > Hi
> > >
> > >          >>lot of spurious references to "Commons Numbers"
> > >              --I have only created the basic project structure. Changes
> > > need to be made. Can anyone from the existing commons team help in
> > > doing this.
> >
> > Well, you should "search and replace":
> >   "Numbers" -> "Machine Learning"
> >   commons-numbers -> commons-machinelearning
> >
> > Other things (repository URL, JIRA project name and URL) require that
> > a component be created (vote is pending).
> > [As long as those files are not part of a PR, it is not urgent to fix
> > them.]
> >
> > >          >> For sure, populate it with the code extracted from CM's
> > > "genetics" package and proceed with the enhancements.
> > > At first, I'd suggest to refactor the layout of the package (i.e.
> > > create a "subpackage" for each component of a genetic algorithm).
> > >               -- I am working on it.
> >
> > Great!
> >
> > > Did not commit the code till now.
> >
> > OK.  When you do, please ask for review on the "dev" ML.
> >
> > >           >>  Then some examination of the data-structures is required
> > > (a binary chromosome is currently stored as a "List<Integer>").
> > >               -- I have recently done some work on this. Could you
> > > please check this article and share your thoughts?
> > >                   https://arxiv.org/abs/2103.04751
> >
> > Alex already provided a thorough response.
> > It's a pity that JDK's BitSet is missing a few methods (e.g. "append")
> > for a readily usable implementation of a "binary chromosome".
> >
> > Do you think that allele sets other than binary would be useful to
> > implement? [IIUC your document above, it seems not (?).]
> >
> > >           Are we thinking to use Spark for our parallelism
> >
> > No, if the code is to reside in Commons.
> >
> > > or a simple
> > > multi-threading of Java.
> >
> > Yes, we'd depend only on JDK classes.
> >
> > > I would prefer to use java multi-threading and
> > > avoid any other framework.
> > >           In java we don't have any library which can be used for AI/ML
> > > programming with a very minimal learning curve. Can we think of
> > > fulfilling this need?
> >
> > That would be nice. Don't hesitate to enlist fellow programmers. :-)
> >
> > Regards,
> > Gilles
> >
> > >           This will be helpful for many java developers to venture into
> > > AI/ML without learning a new language like Python.
> > >
> > >
> > >>> [...]
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >
>
> --
> Avijit Basak
>
