On 11/04/2019 13:22, Gilles Sadowski wrote:
[...]
Not adding a dedicated method would mean everyone has to do this:

JumpableUniformRandomProvider rng = (JumpableUniformRandomProvider) 
RandomSource.create(…)

But adding a mirror methods:

JumpableUniformRandomProvider RandomSource::createJumpable(…)
LongJumpableUniformRandomProvider RandomSource::createLongJumpable(…)

Does seem to add clutter to RandomSource. If we leave it out for now it can be 
added in the future.
+1 (for leaving out).

Is there scope for needing the Jumpable to be detected through the API? E.g. 
add:

boolean RandomSource::isJumpable(RandomSource)

We would just need to maintain an EnumSet for Jumpable and likewise for 
LongJumpable. Again more clutter to the RandomSource interface but perhaps less 
so than a mirror create method that would throw IllegalArgumentException for 
any RandomSource specified that is not Jumpable.
+1 (for leaving out for now).

[...]
I was hoping to avoid creating a duplicate class. But actually that
would be fine and easier for testing. The implementation is trivial anyway.
Why duplicate?
IIUC, shouldn't the existing "core" class define an additional constructor
that accepts the "K" argument.  Then the current "SPLIT_MIX_64"
would use the default increment.
[Same as with "TWO_CMRES" and "TWO_CMRES_SELECT", no?.]
OK. That was where I should have gone to start with. I’ll do that.

I've just finished an initial implementation of the MSWS RNG that uses a
self-generated Weyl sequence. It works. So the idea for this can be
applied to SPLIT_MIX_64_K by using the same Weyl sequence generation in
both. Perhaps moving the code to create the Weyl sequence increment to
NumberFactory.
+1
OK. So I created a Jira ticket for the SPLIT_MIX_64_K. I’m not sure when I’ll 
get around to doing this though. It seems less important than other tasks. It 
may even be redundant if the only reason is to increase the state space for the 
generator. Each generator would still only have a period of 2^64. You can just 
create a lot of them with a weak assurance they will not overlap and that the 
uniformity is statistically the same. Since there are of the order of 2^63 
variants we are not going to be able to test this and would have to rely on the 
theory behind it.

If we add Jumpable to SplitMix with jumps of 2^32 and 2^48 then you can create 
either 2^32 rngs with no overlap for the first 2^32 output numbers or 2^16 rngs 
that can each be jumped 2^16 times with no overlap for the first 2^32 output 
numbers. That is a lot for a small parallel situation and does have the 
assurance of no overlap. Any parallel usage where longer sequences are expected 
to be used can use one of the XorShiRo generators.
I'm a bit lost here.  I thought that "SplitMix64" did not provide
"jump",

The state it is a linear sum. So can be jumped very easily.

y = mx + c

c is the current Weyl state, m the number of transitions and x the Weyl increment. So we can make SplitMix Jumpable. The maximum jump length is 2^64 which will wrap the state.

hence the
"Weyl way" to ensure no-overlap, with high probability (why "weak assurance"?)
I used the term "weak assurance" as opposed to "concrete assurance". The wording used by the JDK is "with very high probability, the set of values collectively generated by the two objects has the same statistical properties as if the same quantity of values were generated by a single object". It is a bit of semantics.


Anyways, I agree that SPLIT_MIX_64_K is low priority, but especially if we
cannot ensure that the added flexibility does not come with unsuspected
drawbacks (?).  [IIRC, the article about "TwoCmres" reported experiments for
choosing the (hard-coded) values that do produce good sequences.]

The likelihood of overlap is 1/[the number of different Weyl sequences]. This is around 2^-63. So it is small. However the base implementation for SplitMix uses a specific Weyl sequence increment based on the golden ratio. I do not know the original of this. For example is it better that any other irrational number converted to an integer, e.g. pi, Euler's number (e), sqrt(2)? The irrational number comes from the origin of the Weyl sequence which is based on irrationals. But there may be no reason that the golden ratio was chosen rather than any other source irrational.


On the subject of splitting verses jumping:

Jumping seems to be a case where the limit of the multi-threaded scenario is known beforehand, e.g. a simulation with 1024 parallel tasks, each task with a known limit on the count of random numbers required. So if your jump length is above the number of random numbers required for each task you can jump to get a single generator for each task and ensure no overlap.

Split is used when the multi-threaded scenario is not known. This is a case for SplittableRandom which is used in parallel algorithms which use forking. The point at which a fork will occur and the task size of the fork are not known beforehand. The generator is able to create new generators for each fork that will very unlikely overlap with any other generator currently in use. For SplittableRandom the probability of the same generator is about 2^-63. The probability of overlap is reduced further as the start point of the sequence is variable too and the length of the sequence that is consumed is not known. So if the number of forks is much less than 2^63 (which it should be) then split will suffice in this situation.

I have an algorithm that uses a set number of tasks to do random splitting of a dataset. The dataset is shuffled before the task so even if the same generator was used the output would work. Currently I create a new generator to allow parallel computation but I could split a master one or use a jumpable. Splitting would be faster than jumping for this scenario and perfectly acceptable.

This leans me back towards adding a Splittable interface. The conditions for a split would be that the split creates a new instance based on the current state of the generator and updates the state. This allows split to be called repeatedly on the same generator to get new distinct instances that will not coincide with very high probability. It would be akin to non-synchronized self-construction. This would then be faster than using RandomSource.create because that uses synchronized seed generation. This opens up more generators for use in recursive forking algorithms that may not care if the two forks have the same generator but require it to be created fast and thread local. So perhaps do not support it for JDK (because it is a bad generator) and rule out those with long initialisation routines TwoCmres, MT, etc. and perhaps also ruling out those with large states (e.g. Well44497b).


On the subject of jumping we have three options:

1. jump advances the state and returns a copy with the advanced state.

2. jump advances the state and returns a copy with the old state.

3. jump advances the state. Add a copy method.

I think we ruled out number 3 as having a copy command is deemed to be prone to misuse if using the same copy in multiple places.

Option 1 I think has an issue in that if you do 1 jump and believe you have two non-overlapping generators you will be wrong. Take this case:

JumpableUniformRandomProvider rng1 = ...;

UniformRandomProvider rng2 = rng1.jump();

In option 1 rng1 and rng2 are actually the same.

In option 2 rng1 and rng2 are in different states where rng1 is far ahead of rng2. So the returned state (rng2) can be documented to not overlap the existing generator (rng1) until the jump length has consumed by calls to generate numbers.


Note that neither option provides a loophole for a poor man's copy. Option 1 would create a copy but it would be of a state far ahead because it is after a jump. Option 2 creates a copy of the current state but then the current state is changed by the jump so you don't have a copy.

To get a copy of the current state would require saving the state using the RestoreableUniformRandomProvider, doing either of the jump options to get an instance of the same class, then restoring the state to both.


To rule out doing this via the RestoreableUniformRandomProvider functionality would require a method in RandomSource to restrict the implementation to just the defined interface like the unrestorable method:

UniformRandomProvider unrestorable(UniformRandomProvider delegate)

Becomes:

UniformRandomProvider restrict(UniformRandomProvider delegate)
RestorableUniformRandomProvider restrict(RestorableUniformRandomProvider delegate) JumpableUniformRandomProvider restrict(JumpableUniformRandomProvider delegate) LongJumpableUniformRandomProvider restrict(LongJumpableUniformRandomProvider delegate)

The correct method would have to be selected by casting:

LongJumpableUniformRandomProvider delegate = ...
JumpableUniformRandomProvider restrict((JumpableUniformRandomProvider) delegate)

Or do not use overloaded methods and (with A, B, C, D appropriately named):

UniformRandomProvider restrictA(UniformRandomProvider delegate)
RestorableUniformRandomProvider restrictB(RestorableUniformRandomProvider delegate) JumpableUniformRandomProvider restrictC(JumpableUniformRandomProvider delegate) LongJumpableUniformRandomProvider restrictD(LongJumpableUniformRandomProvider delegate)

A move to Java 8 would allow the restrict method for the interface to be added as a static method to the interface itself which would be neater.


Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to