Can I get a review of a PR that changes the ProviderBuilder [1]?
The aim is to move the creation of the seed and the RNG into the
RandomSourceInternal.
This allows for customisation of the seed creation on a per-RNG basis.
The reason is that some new generators in the library for v1.3 require
the seed is non-zero. For example this is a realistic possibility when
the seed is an array of ints of size 2.
The code change has required updating the routines using the
SeedConverter interface and adding a Seed2ArrayConverter interface:
public interface Seed2ArrayConverter<IN, OUT> extends SeedConverter<IN, OUT> {
/**
* Converts seed from input type to output type. The output type is
expected to be an array.
*
* @param seed Original seed value.
* @param outputSize Output size.
* @return the converted seed value.
*/
OUT convert(IN seed, int outputSize);
}
I've moved conversion to a new enum class for seed creation and
conversion. Previously the ProviderBuilder used maps to store all the
conversions. These are now explicitly written as conversion methods in
the enum. The amount of code is the same and the conversions are the
same. However use of the enum for conversion has removed the need to
support the Seed2ArrayConverter interface in all the converters. This
has deprecated some methods and classes. I've not yet marked them as so
in the PR.
i.e.
private static final Map<Class<?>, SeedConverter<long[], ?>> CONV_LONG_ARRAY =
new ConcurrentHashMap<Class<?>, SeedConverter<long[], ?>>();
does not have to become:
private static final Map<Class<?>, Seed2ArrayConverter<long[], ?>>
CONV_LONG_ARRAY =
new ConcurrentHashMap<Class<?>, Seed2ArrayConverter<long[], ?>>();
with the corresponding conversion as:
nativeSeed = CONV_LONG_ARRAY.get(source.getSeed()).convert((long[]) seed,
source.getSeedSize());
Currently conversions of arrays to arrays ignore the seed size. Creation
of a new seed is limited to a maximum of 128. This matches the previous
functionality. It could be changed to create the full length seed. The
impact would be more work done within synchronized blocks in the
SeedFactory. I would expect the generation to be slower but the seed
quality will be higher.
Only creation of arrays from int/long seeds will use the seed size. This
does not operate within a synchronized block.
Note that RNG constructor is obtained using reflection. Previously was
done on each invocation. However now that the method is within the
RandomSourceInternal enum caching the constructor is a natural
modification since it is always the same. I have not done this in the
constructor for the RandomSourceInternal enum as:
- it adds overhead when building all the enums (e.g.
RandomSourceInternal.values())
- it may throw lots of different types of exception. Rather than
catching them all in the constructor for the enum they can now be thrown
during creation of the RNG instance so the user gets the appropriate
stack trace.
The change to create an array using the correct size has performance
implications (see [2]):
- For small array sizes the creation is faster
- For large array sizes built using an int/long the creation is
marginally slower (but the seed should be better as it uses the SplitMix
algorithm rather than the self-seeding strategy of the BaseProvider [3])
Caching the constructor for use with reflection has improved performance.
[1] https://github.com/apache/commons-rng/pull/46
[2] https://issues.apache.org/jira/browse/RNG-75
[3] This could be tested by creating a new generator that implements the
self-seeding strategy as its output and running it through the stress
test applications.