On Fri, Feb 18, 2022 at 19:46, Tim Düsterhus <t...@bastelstu.be> wrote:

> Hi
>
> On 2/18/22 07:31, Go Kudo wrote:
> > I have been looking into output buffering, but I don't know the right way
> > to do it. Buffering works fine if all RNG output widths are static, but
> > it gets complicated if they are dynamic.
>
> I believe the primary issue here is that the engines are expected to
> return a uint64_t instead of a buffer with raw bytes. This requires
> you to perform many conversions between the uint64_t and the raw buffer:
>
> When calling Randomizer::getBytes() for a custom engine the following
> needs to happen:
>
> - The Engine returns a byte string.
> - This bytestring is then internally converted into a uint64_t.
> - Then, when calling Randomizer::getBytes(), this uint64_t needs to be
> converted back to a bytestring.
>
> To avoid those conversions without sacrificing too much performance, it
> might be possible to return a struct that contains a single 4- or 8-byte
> array:
>
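>      #include <stdint.h>
>
>      struct four_bytes {
>          unsigned char val[4];
>      };
>
>      /* Split a 32-bit engine result into 4 little-endian bytes
>       * without any dynamic allocation. */
>      static struct four_bytes to_four_bytes(uint32_t result)
>      {
>          struct four_bytes r;
>          r.val[0] = (result >> 0) & 0xff;
>          r.val[1] = (result >> 8) & 0xff;
>          r.val[2] = (result >> 16) & 0xff;
>          r.val[3] = (result >> 24) & 0xff;
>
>          return r;
>      }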
>
> .val can be treated as a bytestring, but it does not require dynamic
> allocation. By doing that, the internal engines (e.g. Xoshiro) would be
> consistent with the userland engines.
>
> > It is possible to solve this problem by allowing generate() itself to
> > specify the size it wants, but this would significantly hurt
> > performance.
>
> I don't think it's a good idea to add a size parameter to generate().
>
> > I've looked at the sample code, but do you really need Randomizer to
> > support this? Engine::generate() can already output binary strings of
> > variable length, up to 64 bits. You can use the Engine directly instead
> > of Randomizer::getBytes().
> >
> > What exactly is the situation where buffering by Randomizer is needed?
>
> *I* don't need anything. I'm just trying to think of use-cases and
> edge-cases. Basically: What would a user attempt to do and what would
> their expectations be?
>
> I'm not saying that this buffering *must* be implemented, but it is
> something we need to think about, because changing the behavior later is
> pretty much impossible: users might rely on a specific behavior for
> their seeded sequences. The behavior might also need to be part of the
> documentation.
>
> Basically what we need to think about is what guarantees we give. As an
> example:
>
> 1. Calling Engine::generate() with the same seed results in the same
> sequence (This guarantee we give, and it is useful).
> 2. Calling Randomizer::getInt() with the same seeded engine results in
> the same numbers for the same parameters (I think this also is useful).
> 3. Calling Randomizer::getBytes() with the same seeded engine results in
> the same byte sequence (This is something we are currently discussing).
> 4. Calling Randomizer::getBytes() simply concatenates the raw bytes
> retrieved by the Engine (This ties into (3)).
> 5. Calling Randomizer::shuffleArray() with the same seeded engine
> results in the same result for the same array (This one is more
> debatable, because then we must maintain the exact same shuffleArray()
> implementation forever).
>
> All these guarantees should be properly documented within the RFC. The
> RFC template (https://wiki.php.net/rfc/template) says:
>
>  >  Remember that the RFC contents should be easily reusable in the PHP
>  >  Documentation.
>
> So by thinking about this now and putting it in the RFC, the
> explanations can easily be copied into the documentation if the RFC
> passes the vote.
>
> One should not need to look into the implementation to understand how
> the Engines and the Randomizer are supposed to work.
>
> > I am also worried that buffering will cut off random numbers at
> > arbitrary sizes. It may cause bias in the generated results.
> >
>
> If there's bias in specific bits or bytes of the generated number, then
> getBytes(32) will already be biased even without buffering, as the raw
> bytes are what's of interest here. It does not matter if they are at the
> 1st or 4th position (for a 32-bit engine).
>
> Best regards
> Tim Düsterhus
>

Hi

I am sorry for the delay in replying.

Thank you for the clear explanation.
It is true that the RFC in its current form lacks sufficient explanation.
I'll try to fix this first.
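
For example, the getBytes() behavior from guarantees (3) and (4) could be
written down along the lines of this minimal C sketch. It assumes a 64-bit
engine whose results are split into little-endian bytes, and it does not
buffer: unused bytes of the last engine result are discarded. SplitMix64 is
only a stand-in engine, and `toy_engine`, `toy_generate` and `get_bytes` are
hypothetical names, not part of the RFC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in engine: SplitMix64, chosen only because it is short. */
typedef struct {
    uint64_t state;
} toy_engine;

/* Returns the next 64-bit result of the stand-in engine. */
static uint64_t toy_generate(toy_engine *e)
{
    uint64_t z = (e->state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Hypothetical getBytes(): fills out[0..len) by concatenating the raw
 * little-endian bytes of successive engine results. No buffering: unused
 * bytes of the last result are discarded, so every call starts on a fresh
 * engine result. */
static void get_bytes(toy_engine *e, unsigned char *out, size_t len)
{
    assert(out != NULL || len == 0);

    size_t pos = 0;
    while (pos < len) {
        uint64_t r = toy_generate(e);
        for (unsigned i = 0; i < 8 && pos < len; i++) {
            /* Byte i of the result, least significant byte first. */
            out[pos++] = (unsigned char)(r >> (8 * i));
        }
    }
}
```

With this design, get_bytes(e, buf, 4) followed by get_bytes(e, buf + 4, 4)
gives the same first four bytes as a single get_bytes(e, buf, 8) from the same
seed, but the last four bytes will generally differ, because the second call
starts on a new engine result. Buffering the leftover bytes would make both
sequences identical; either way, the RFC needs to state which one applies.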

Also, looking at implementations in other languages, I see the need to
add more RNGs, such as PCG. I will update the RFC to include these.

Here is a Rust example:
https://docs.rs/rand/latest/rand/

PCG:
https://www.pcg-random.org/index.html
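
As a rough idea of how small such an engine is, here is a minimal C sketch of
PCG32 (64-bit state, 32-bit output, XSH RR output function), following the
minimal C implementation published on pcg-random.org. How it would be exposed
as a PHP Engine, and whether PHP would keep exactly these seeding rules, is an
open question; the bounded helper uses rejection sampling to avoid modulo bias.

```c
#include <assert.h>
#include <stdint.h>

/* State of a PCG32 generator. */
typedef struct {
    uint64_t state; /* LCG state */
    uint64_t inc;   /* stream selector; must be odd */
} pcg32_random_t;

/* Advances the LCG and returns a 32-bit output computed from the old state. */
static uint32_t pcg32_random_r(pcg32_random_t *rng)
{
    uint64_t oldstate = rng->state;
    rng->state = oldstate * 6364136223846793005ULL + (rng->inc | 1);
    /* XSH RR: xorshift the high bits, then apply a state-dependent rotation. */
    uint32_t xorshifted = (uint32_t)(((oldstate >> 18u) ^ oldstate) >> 27u);
    uint32_t rot = (uint32_t)(oldstate >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));
}

/* Seeds the generator with an initial state and a stream (sequence) id. */
static void pcg32_srandom_r(pcg32_random_t *rng, uint64_t initstate,
                            uint64_t initseq)
{
    rng->state = 0U;
    rng->inc = (initseq << 1u) | 1u;
    pcg32_random_r(rng);
    rng->state += initstate;
    pcg32_random_r(rng);
}

/* Returns a uniform value in [0, bound) by rejecting the lowest
 * (2^32 mod bound) raw outputs, so every residue is equally likely. */
static uint32_t pcg32_boundedrand_r(pcg32_random_t *rng, uint32_t bound)
{
    assert(bound > 0);
    uint32_t threshold = -bound % bound; /* equals 2^32 mod bound */
    for (;;) {
        uint32_t r = pcg32_random_r(rng);
        if (r >= threshold)
            return r % bound;
    }
}
```

Note that the output depends on both `initstate` and `initseq`, so a seeded
PCG engine takes two seed inputs rather than one.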

Regards
Go Kudo
