[sage-devel] random seed framework (was: Re: problem with test framework)

Carl Witty Thu, 14 Feb 2008 20:44:44 -0800

On Feb 13, 10:21 pm, Robert Bradshaw <[EMAIL PROTECTED]>
wrote:
> On Feb 11, 2008, at 10:14 AM, Carl Witty wrote:
> > I'm still willing to work on the "randgen" class I described toward
> > the end of this thread:
> >http://groups.google.com/group/sage-devel/browse_thread/thread/
> > c2d86a2685018112/4b3136c4a784015a?#4b3136c4a784015a
>
> > Basically I'm just waiting for somebody to say "Yes, that looks like a
> > good design" before I start.
>
> I would really like to see a sane, centralized pseudo-random number
> framework. Requiring every function that uses (perhaps implicitly)
> random numbers to pass around optional randgen objects will, in my
> opinion, be both inefficient and cumbersome to program with. Rather,
> I think the best option is to have a global randgen object that holds
> states for the various frameworks we use (e.g. gmp, ntl, etc.) where
> algorithms can access it directly. Swapping out this generator for a
> new one should be handled via python contexts.


I like Robert's suggestion of using Python contexts.  So here's a
revised proposal:

I describe a module sage.misc.random, which holds a few global
functions (which will be imported into the command-line namespace) and
the class randgen.  The purpose of this module is to manage
pseudorandom number generators and their seeds.

Note that the methods of randgen are intended to be used by library
authors (like the authors of ZZ.random_element() and
RR.random_element()), not directly by end-users; end-users will
probably only use the global functions.

A large part of the purpose of this module is to enable
reproducibility: proper use of the functions in this module should
allow results to be consistent from one run of Sage to the next.  To
the extent feasible (without modifying the underlying systems), we
will also try to make results consistent from one architecture to
another.

With preparation and care, it may sometimes also be possible to use
these functions to get results that are consistent when some parts of
your algorithm change.  We refer to this as isolation; the idea is
that you can allocate a new randgen object for use by some
subalgorithm.  Then this subalgorithm can allocate as many random
numbers as it wants.  When the subalgorithm is finished, the original
randgen object (which was unaffected by the subalgorithm) is restored.
Thus, changes to your subalgorithm (which might make it request more
random numbers, for instance) do not change the behavior of the outer
algorithm (as long as the return value of the subalgorithm is
unchanged, of course).

Isolation also works the other way: a subalgorithm that wants to give
consistent answers regardless of the current random number state may
allocate a new randgen object (with either a constant seed or a seed
that is a hash of its input).  Once the subalgorithm is complete, the
original randgen object is restored.
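
To make the isolation idea concrete, here is a rough sketch in plain
Python.  It uses the standard library's global random state as a
stand-in for a real subsystem; the helper name isolated() and the
functions subalgorithm() and outer() are made up for illustration and
are not part of the proposed API.

```python
import random
from contextlib import contextmanager

@contextmanager
def isolated(seed):
    """Hypothetical helper: run a block under its own seed, then restore the outer state."""
    saved = random.getstate()       # read out the outer generator state
    random.seed(seed)               # the block gets its own, fixed seed
    try:
        yield
    finally:
        random.setstate(saved)      # outer state is exactly what it was before

def subalgorithm(draws):
    # Draws as many values as it likes; a constant seed makes its answer
    # independent of the caller's random state.
    with isolated(seed=12345):
        return [random.random() for _ in range(draws)]

def outer(draws_in_sub):
    random.seed(42)
    first = random.random()
    subalgorithm(draws_in_sub)      # may consume any number of values
    second = random.random()
    return first, second

# The outer sequence does not depend on how much the subalgorithm drew.
assert outer(1) == outer(1000)
```

Both directions of isolation show up here: outer() is unaffected by
how many numbers subalgorithm() draws, and subalgorithm() gives the
same answer no matter what state its caller was in.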

We do not attempt to provide any actual random number generators,
random algorithms, etc., in this module (or even any wrappers for
random number generators); we only wrap the seed and state handling of
underlying systems.

We can handle seed management for pseudorandom number generators with
three kinds of interfaces.

The first and nicest kind is generators where the current state is a
separate object, and pseudorandom number generation routines take a
reference to this object.  This is the nicest kind because we can
trivially provide perfect isolation.

The second kind is generators where the current state is a global
variable in the subsystem, but where we can read out the old state
before we replace it.  This still allows us to provide perfect
isolation, but doing so requires more discipline on the part of our
callers.

The third kind is generators where the current state is a global
variable which can only be written, not read.  In these cases we
cannot (reasonably) provide perfect isolation.
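
A small sketch of the three kinds, in plain Python.  The first two use
the standard random module (a random.Random instance for the first
kind, the module-level state for the second); the third uses a
made-up class, WriteOnlyGenerator, whose state can be seeded but not
read back.  None of this is proposed API, just an illustration.

```python
import random

expected = random.Random(42).random()   # first value of the seed-42 stream

# Kind 1: the state is a separate object, so isolation is trivial.
outer = random.Random(42)
inner = random.Random(7)
inner.random()                           # drawing from inner cannot disturb outer
kind1_ok = (outer.random() == expected)

# Kind 2: global state that can be read out and put back.
random.seed(42)
saved = random.getstate()                # caller must remember to save...
random.seed(7)
random.random()
random.setstate(saved)                   # ...and to restore
kind2_ok = (random.random() == expected)

# Kind 3: global state that can only be written (hypothetical subsystem).
class WriteOnlyGenerator:
    MODULUS = 2 ** 31

    def __init__(self):
        self._state = 0

    def seed(self, n):
        self._state = n % self.MODULUS

    def next(self):
        # Simple linear congruential step, for illustration only.
        self._state = (1103515245 * self._state + 12345) % self.MODULUS
        return self._state

gen = WriteOnlyGenerator()
gen.seed(42)
a = gen.next()
b = gen.next()                           # what an undisturbed caller sees second

gen.seed(42)
gen.next()                               # caller draws its first value (a)
gen.seed(7)
gen.next()                               # a subalgorithm reseeds and draws
gen.seed(42)                             # best we can do: reseed, losing our position
resumed = gen.next()
kind3_restored = (resumed == b)          # False: we are back at a, not at b
```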

The global functions are set_random_seed(), random_seed(), and
initial_seed().

set_random_seed() should only be called from the command line, never
from within library code.  When called with an integer parameter, it
creates a new randgen with the given seed, and sets that as the
current global randgen.  When called with no parameter, it picks a new
seed itself and prints it.  initial_seed() returns the initial random
number seed of the current global randgen.

  sage: set_random_seed(42)
  sage: initial_seed()
  42
  sage: set_random_seed()
  The new random number seed is: 314159265
  sage: initial_seed()
  314159265

random_seed() returns a new randgen.

  sage: random_seed(42)
  Random seed object with initial seed 42
  sage: rgen = random_seed(); rgen
  Random seed object with initial seed 2718281828

randgen objects are Python context managers, so the typical use case
for random_seed() is actually:

  sage: with random_seed(42): print(ZZ.random_element())
  -2
  sage: with random_seed(42): print(ZZ.random_element())
  -2
  sage: with random_seed(42): print(ZZ.random_element())
  -2

This is how you provide isolation in library code.

randgen is a Cython class.  The main state it holds is a
gmp_randstate_t, although it also has some other cached information.

randgen methods include:
  randstate_python()
    Returns an instance of random.Random.  The first time it is called
    on a given instance of randgen, a new random.Random is created and
    seeded from the gmp_randstate_t; this is saved, and subsequent
    calls return the same random.Random instance.
  ... There will be similar methods for every subsystem of the first
  kind (with separate random state objects) (if there are any more,
  other than Python).

  set_seed_libc(force=False)
  set_seed_ntl(force=False)
  set_seed_pari(force=False)
  set_seed_magma(force=False)
  set_seed_mathematica(force=False)
  set_seed_...()
    Sets the seed of the specified random number generator, from a new
    random number from the gmp_randstate_t.

    Whenever library code is about to use a generator, it should call
    the corresponding method.

    For each subsystem, we remember (globally) which randgen was last
    used to set its seed.  If you try to seed the subsystem again from
    the same randgen, then we return immediately without setting the
    seed (unless called with force=True, in which case the seed is set
    unconditionally; this is a performance vs. isolation tradeoff).

  new_randgen()
    Creates a new randgen object, seeded from a random number from
    this object's gmp_randstate_t.

  initial_seed()
    Returns the initial seed used to create this randgen.

  Also, Cython code can just access the gmp_randstate_t directly.
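
The methods above could behave roughly as in the following
pure-Python sketch.  Everything here is an assumption made for
illustration: a random.Random stands in for the gmp_randstate_t,
FakeSubsystem stands in for a real subsystem such as NTL or PARI, and
set_seed_fake() plays the role of one of the set_seed_...() methods.

```python
import random

class FakeSubsystem:
    """Hypothetical subsystem with a global seed that we can only set."""
    def __init__(self):
        self.seed_calls = []

    def set_seed(self, n):
        self.seed_calls.append(n)

fake = FakeSubsystem()
_last_seeded_by = {}    # subsystem name -> randgen that last seeded it (global)

class SketchRandgen:
    def __init__(self, seed):
        self._initial_seed = seed
        self._master = random.Random(seed)   # stand-in for the gmp_randstate_t
        self._py_random = None               # cache for randstate_python()

    def initial_seed(self):
        return self._initial_seed

    def randstate_python(self):
        # Created and seeded from the master state on first use, then reused.
        if self._py_random is None:
            self._py_random = random.Random(self._master.getrandbits(128))
        return self._py_random

    def set_seed_fake(self, force=False):
        # Skip the reseed if this randgen was the last one to seed the subsystem.
        if not force and _last_seeded_by.get("fake") is self:
            return
        fake.set_seed(self._master.getrandbits(32))
        _last_seeded_by["fake"] = self

    def new_randgen(self):
        # The child is seeded from a number drawn from this object's state.
        return SketchRandgen(self._master.getrandbits(128))

a = SketchRandgen(1)
b = a.new_randgen()
assert a.randstate_python() is a.randstate_python()

a.set_seed_fake()            # seeds the subsystem
a.set_seed_fake()            # same randgen again: returns immediately
b.set_seed_fake()            # different randgen: reseeds
a.set_seed_fake()            # a is no longer the last seeder: reseeds
a.set_seed_fake(force=True)  # reseeds unconditionally
assert len(fake.seed_calls) == 4
```

The skipped call is where the performance vs. isolation tradeoff
appears: without force=True, a second call from the same randgen
costs nothing but also does not reset the subsystem.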

Constructor:
  randgen()
    Create a new randgen, seeded randomly (from os.urandom() if
    available, from the system time otherwise).
  randgen(n)
    Create a new randgen, seeded from n.
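
The seed selection in the no-argument constructor might look
something like this sketch.  The helper pick_seed(), the 16-byte
width, and the microsecond resolution of the fallback are all
assumptions chosen for illustration, as is the class name.

```python
import os
import time

def pick_seed():
    """Hypothetical helper: choose a seed when the caller did not supply one."""
    try:
        # os.urandom() raises NotImplementedError if no randomness source exists.
        return int.from_bytes(os.urandom(16), "big")
    except NotImplementedError:
        # Fallback: derive a seed from the system time.
        return int(time.time() * 1000000)

class SketchRandgen:
    def __init__(self, seed=None):
        # Test against None so that an explicit seed of 0 is honoured.
        if seed is None:
            seed = pick_seed()
        self._initial_seed = seed

    def initial_seed(self):
        return self._initial_seed
```

Recording the chosen seed, even when it was picked randomly, is what
lets initial_seed() report a value that can later be used to reproduce
a run.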

The current global randgen is available from current_randgen().  So
library code might look like:

  import sage.misc.random as random

  py_random = random.current_randgen().randstate_python()
  print(py_random.random())

or:

  random.current_randgen().set_seed_gp()
  print(gp('random()'))

or, from Cython:

  cdef randgen rgen = random.current_randgen()
  mpz_urandomb(z, rgen.gmp_randstate, 512)

What do you think?
