[Numpy-discussion] Google Summer of Docs Faq and Tutorial Proposed Structure .

2020-07-04 Thread Ben Nathanson
Let me first say I'm a volunteer and not a member of the group that
will evaluate proposals. But I can tell you how I would choose.

A right-sized proposal is stronger than an overambitious one. I'd
scale back and pick a single target. From your listed experience, this
would be your first significant docs work. Before you can overhaul an
entire site, you need to get a feel for the pace of the work. The
review process alone guarantees the site can't be transformed in three
months.

For argument's sake, let's say you focus on mining the internet for
how-to subjects. This is not a bad GSoD idea, because it's open-ended:
No matter how fast you write, you'll never exhaust the supply of
topics, and if work proceeds more slowly than you'd hoped, you will
have established a methodology, and any how-tos we get are more than
we have now.

Apart from appropriate scope, a proposal needs credibility. GSoD tech
writers are a coveted resource, so a project wants to be confident
that the writer they pick will deliver the goods. We're scientists and
engineers, yourself included, so we look for evidence. Put evidence in
your proposal. Assuming you pick how-tos, show the NumPy team how
thoroughly you've considered the issues, explain your methodology and
its strengths and shortcomings, and, most importantly, give samples of
how-tos you've transformed.

It's like a job interview, but harder: Not only do you have to provide
the answers, you have to anticipate the questions.

Does that mean you have to do work upfront that you might do during
GSoD? Yes. You're staking capital. The more you put on the table, the
more confident the team can be in your sincerity and skill. That said,
you do stand to lose it all if someone else is chosen. That can happen
even if you send in a proposal that everyone agrees looks good. Your
effort will not be a waste; you'll have developed skill in drafting a
competitive proposal -- useful in grant writing, calls for papers, and
next year's GSoD.

Again, this is peer review, not official guidance. And to be clear, I
myself am not participating in GSoD; I thought I might but chose
instead to simply volunteer.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Google Summer of Docs Faq and Tutorial Proposed Structure .

2020-07-04 Thread Ben Nathanson
If you do choose how-tos, I meant to say that the first place to mine
should be this very list. For instance, a question the other day on
seeding random sequences sparked an illuminating and far-reaching
discussion.

Some things that make this list a great source:

 * Extracting how-tos from the mailing list does a real service --
questions on the mailing list are much less visible via Google search
than SO questions

 * Answers here are likely to be deep and interesting (i.e., not
simply answers you'll find in the docs)

 * We own the list; no doubts about usage rights

 * We have authoritative answers from code authors

Mining only this list would not be enough for a proposal, however;
there'd need to be something else as well (e.g., mining SO/Reddit).

On the subject of mining SO, I'd suggest not only weighting by
frequency but also searching out answers that came from the community
-- e.g. Robert Kern, Warren Weckesser, Jaime Fernández del Río
(jaime), and others whom I apologize for missing. Here again we'd add
value by giving prominence (and an imprimatur) to the best answers.


Re: [Numpy-discussion] reseed random generator (1.19)

2020-07-04 Thread Evgeni Burovski
Thanks Kevin, thanks Robert, this is very helpful!

I'd strongly agree with Matti that your explanations could/should make
it to the docs. Maybe it's something for the GSoD.

While we're on the subject, one comment and two (hopefully last) questions:

1. My two cents w.r.t. the `np.random.simple_seed()` function Robert
mentioned: I personally would find it way more confusing than a clear
explanation + example in the docs. I'd ask myself what's "simple"
here, click through to the source of this `simple_seed`, find out that
it's a docstring and a two-liner, and just copy-paste the latter into
my user code. Again, just FWIW.

2. What would be a preferred way of spelling out "give me the N-th
spawned child SeedSequence"?
The use case is that I prepare (human-readable) input files once and
run a number of computational jobs in separate OS processes. From what
Kevin said, I can of course give each worker a pair of (entropy,
worker_id) and then each of them does at startup

> parent_seq = SeedSequence(entropy)
> this_sequence = parent_seq.spawn(worker_id + 1)[worker_id]

Is this a recommended way, or is there a better API? Or does the
number of spawned children need to be known beforehand?
I'd much rather avoid serialization/deserialization if possible.

3. Is there a way of telling the number of draws a generator did?

The use case is to checkpoint the number of draws and `.advance` the
bit generator when resuming from the checkpoint. (The runs are longer
than the batch queue limits).
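For reference, this is what `.advance` does in the simplest case. A minimal sketch, assuming NumPy 1.17 or later and the default PCG64 bit generator (the seed value is arbitrary): advancing by n lands on the same state as drawing n raw 64-bit outputs. It also shows why counting draws is fragile, since Generator methods do not map one-to-one onto raw outputs.

```python
import numpy as np

# Two PCG64 bit generators seeded identically (the seed value is arbitrary).
bg_drawn = np.random.PCG64(12345)
bg_jumped = np.random.PCG64(12345)

# Consume 1000 raw 64-bit outputs from the first one...
bg_drawn.random_raw(1000)
# ...and jump the second forward by the same count without generating anything.
bg_jumped.advance(1000)

# Both now sit at the same position in the stream.
assert bg_drawn.state == bg_jumped.state
assert bg_drawn.random_raw() == bg_jumped.random_raw()

# Caveat: Generator methods built on rejection sampling (e.g. the ziggurat
# behind standard_normal, or bounded integers) can consume a variable number
# of raw values per sample, and 32-bit draws use half of a cached 64-bit
# value. The number of method calls is therefore not the distance to pass
# to advance().
```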

Thanks!

Evgeni

On Mon, Jun 29, 2020 at 11:06 PM Robert Kern  wrote:
>
> On Mon, Jun 29, 2020 at 11:30 AM Robert Kern  wrote:
>>
>> On Mon, Jun 29, 2020 at 11:10 AM Kevin Sheppard  
>> wrote:
>>>
>>> The total number of digits in the binary representation is somewhere 
>>> between 32 and 128.
>>
>>
>> I like using the standard library `secrets` module.
>>
>> >>> import secrets
>> >>> secrets.randbelow(1<<128)
>> 8080125189471896523368405732926911908
>>
>> If you want an easy-to-follow rule, just use the above snippet to get a 
>> 128-bit number. More than 128 bits won't do you any good (at least by 
>> default, the internal bottleneck inside of SeedSequence is a 128-bit pool), 
>> and 128-bit numbers are just about small enough to copy-paste comfortably.
>
>
> Sorry, `secrets.randbits(128)` is the cleaner form of this.
>
> --
> Robert Kern
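The seeding advice quoted above (record a `secrets.randbits(128)` value once, then reuse it) can be sketched end to end as follows. This assumes NumPy 1.17 or later; `default_rng` and `SeedSequence` both accept a plain Python int, and the variable names are illustrative only.

```python
import secrets
import numpy as np

# Draw a fresh 128-bit seed once (e.g. when writing the input files) and
# record it; 128 bits matches SeedSequence's default internal pool size.
entropy = secrets.randbits(128)

# Wherever the recorded value is reused, the stream is the same:
# default_rng(int) wraps the int in a SeedSequence internally.
rng_a = np.random.default_rng(entropy)
rng_b = np.random.default_rng(np.random.SeedSequence(entropy))
assert np.array_equal(rng_a.random(5), rng_b.random(5))
```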


Re: [Numpy-discussion] reseed random generator (1.19)

2020-07-04 Thread Robert Kern
On Sat, Jul 4, 2020 at 1:03 PM Evgeni Burovski 
wrote:

> Thanks Kevin, thanks Robert, this is very helpful!
>
> I'd strongly agree with Matti that your explanations could/should make
> it to the docs. Maybe it's something for the GSoD.
>
> While we're on the subject, one comment and two (hopefully last) questions:
>
> 1. My two cents w.r.t. the `np.random.simple_seed()` function Robert
> mentioned: I personally would find it way more confusing than a clear
> explanation + example in the docs. I'd ask myself what's "simple"
> here, click through to the source of this `simple_seed`, find out that
> it's a docstring and a two-liner, and just copy-paste the latter into
> my user code. Again, just FWIW.
>

Noted.


> 2. What would be a preferred way of spelling out "give me the N-th
> spawned child SeedSequence"?
> The use case is that I prepare (human-readable) input files once and
> run a number of computational jobs in separate OS processes. From what
> Kevin said, I can of course give each worker a pair of (entropy,
> worker_id) and then each of them does at startup
>
> > parent_seq = SeedSequence(entropy)
> > this_sequence = parent_seq.spawn(worker_id + 1)[worker_id]
>
> Is this a recommended way, or is there a better API? Or does the
> number of spawned children need to be known beforehand?
> I'd much rather avoid serialization/deserialization if possible.
>

Assuming that `worker_id` starts at 0:

  this_sequence = SeedSequence(entropy, spawn_key=(worker_id,))
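A quick check of this suggestion, assuming NumPy 1.17 or later: building the child directly with `spawn_key=(worker_id,)` reproduces what `spawn()` would hand out from a fresh parent, so the number of children does not need to be known up front. The entropy value is the one printed earlier in this thread; `n_workers` is illustrative and only needed for the comparison.

```python
import numpy as np
from numpy.random import SeedSequence, default_rng

entropy = 8080125189471896523368405732926911908  # recorded once, shared by all jobs
n_workers = 4  # only used here to compare against spawn()

# What a parent process would hand out via spawn()...
spawned = SeedSequence(entropy).spawn(n_workers)

for worker_id in range(n_workers):
    # ...versus what each worker can build on its own from (entropy, worker_id).
    direct = SeedSequence(entropy, spawn_key=(worker_id,))
    assert np.array_equal(direct.generate_state(4), spawned[worker_id].generate_state(4))

# Inside a worker process, only (entropy, worker_id) is needed:
worker_id = 2
rng = default_rng(SeedSequence(entropy, spawn_key=(worker_id,)))
```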


> 3. Is there a way of telling the number of draws a generator did?
>
> The use case is to checkpoint the number of draws and `.advance` the
> bit generator when resuming from the checkpoint. (The runs are longer
> than the batch queue limits).
>

There are computations you can do on the internal state of PCG64 and Philox
to get this information, but not in general, no. I do recommend serializing
the Generator or BitGenerator (or at least the BitGenerator's .state
property, which is a nice JSONable dict for PCG64) for checkpointing
purposes. Among other things, there is a cached uint32 for when odd numbers
of uint32s are drawn that you might need to handle. The state of the
default PCG64 is much smaller than MT19937. It's less work and more
reliable than computing that distance and storing the original seed and the
distance.
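A minimal sketch of the checkpointing approach described above, assuming the default PCG64 bit generator (NumPy 1.17 or later). The state is a plain dict of Python ints and strings, so it round-trips through JSON; its `has_uint32` and `uinteger` entries carry the cached uint32 mentioned above. The seed and draw counts are arbitrary.

```python
import json
import numpy as np

rng = np.random.default_rng(8080125189471896523368405732926911908)
rng.random(10)  # some work before the checkpoint

# Checkpoint: serialize the BitGenerator's state dict.
checkpoint = json.dumps(rng.bit_generator.state)

expected = rng.random(5)  # what the original run would draw next

# Resume (e.g. in a new batch job): build a PCG64-backed Generator and
# overwrite its state with the saved one.
restored = np.random.Generator(np.random.PCG64())
restored.bit_generator.state = json.loads(checkpoint)
assert np.array_equal(restored.random(5), expected)
```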

-- 
Robert Kern


Re: [Numpy-discussion] reseed random generator (1.19)

2020-07-04 Thread Neal Becker
On Sat, Jul 4, 2020 at 1:56 PM Robert Kern  wrote:


>
> 3. Is there a way of telling the number of draws a generator did?
>>
>> The use case is to checkpoint the number of draws and `.advance` the
>> bit generator when resuming from the checkpoint. (The runs are longer
>> than the batch queue limits).
>>
>
> There are computations you can do on the internal state of PCG64 and
> Philox to get this information, but not in general, no. I do recommend
> serializing the Generator or BitGenerator (or at least the BitGenerator's
> .state property, which is a nice JSONable dict for PCG64) for checkpointing
> purposes. Among other things, there is a cached uint32 for when odd numbers
> of uint32s are drawn that you might need to handle. The state of the
> default PCG64 is much smaller than MT19937. It's less work and more
> reliable than computing that distance and storing the original seed and the
> distance.
>
> --
> Robert Kern
>

Sorry, you lost me here.  If I want to save and restore the state of a
generator, can I use pickle/unpickle?


-- 
*Those who don't understand recursion are doomed to repeat it*


Re: [Numpy-discussion] reseed random generator (1.19)

2020-07-04 Thread Robert Kern
On Sat, Jul 4, 2020, 2:39 PM Neal Becker  wrote:

>
>
> On Sat, Jul 4, 2020 at 1:56 PM Robert Kern  wrote:
> 
>
>>
>> 3. Is there a way of telling the number of draws a generator did?
>>>
>>> The use case is to checkpoint the number of draws and `.advance` the
>>> bit generator when resuming from the checkpoint. (The runs are longer
>>> than the batch queue limits).
>>>
>>
>> There are computations you can do on the internal state of PCG64 and
>> Philox to get this information, but not in general, no. I do recommend
>> serializing the Generator or BitGenerator (or at least the BitGenerator's
>> .state property, which is a nice JSONable dict for PCG64) for checkpointing
>> purposes. Among other things, there is a cached uint32 for when odd numbers
>> of uint32s are drawn that you might need to handle. The state of the
>> default PCG64 is much smaller than MT19937. It's less work and more
>> reliable than computing that distance and storing the original seed and the
>> distance.
>>
>> --
>> Robert Kern
>>
>
> Sorry, you lost me here.  If I want to save and restore the state of a
> generator, can I use pickle/unpickle?
>

Absolutely.
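A minimal sketch of the pickle route, assuming NumPy 1.17 or later, where `Generator` objects are picklable together with their bit generator state. The seed and draw counts are arbitrary. As with any pickle, only load files you trust, since unpickling can execute arbitrary code.

```python
import pickle
import numpy as np

rng = np.random.default_rng(12345)
rng.standard_normal(3)  # some work before saving

blob = pickle.dumps(rng)           # or pickle.dump(rng, f) to write a file
expected = rng.standard_normal(4)  # draws the original run makes next

# Restoring gives a Generator that continues exactly where the saved one was.
resumed = pickle.loads(blob)
assert np.array_equal(resumed.standard_normal(4), expected)
```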


[Numpy-discussion] ANN: SciPy 1.5.1

2020-07-04 Thread Tyler Reddy
Hi all,

On behalf of the SciPy development team I'm pleased to announce
the release of SciPy 1.5.1, which is a bug fix release.

Sources and binary wheels can be found at:
https://pypi.org/project/scipy/
and at:  https://github.com/scipy/scipy/releases/tag/v1.5.1

One of a few ways to install this release with pip:

pip install scipy==1.5.1

=========================
SciPy 1.5.1 Release Notes
=========================

SciPy 1.5.1 is a bug-fix release with no new features
compared to 1.5.0. In particular, an issue where DLL loading
can fail for SciPy wheels on Windows with Python 3.6 has been
fixed.

Authors
=======

* Peter Bell
* Loïc Estève
* Philipp Thölke +
* Tyler Reddy
* Paul van Mulbregt
* Pauli Virtanen
* Warren Weckesser

A total of 7 people contributed to this release.
People with a "+" by their names contributed a patch for the first time.
This list of names is automatically generated, and may not be fully
complete.

Issues closed for 1.5.1
-----------------------

* #9108: documentation: scipy.spatial.KDTree vs. scipy.spatial.cKDTree
* #12218: Type error in stats.ks_2samp when alternative != 'two-sided-
* #12406: DOC: Docstring in stats.anderson function not properly formatted
* #12418: Regression in hierarchy.dendogram


Pull requests for 1.5.1
-----------------------

* #12280: BUG: Fixes gh-12218, TypeError converting int to float inside...
* #12336: BUG: KDTree should reject complex input points
* #12344: MAINT: Don't use numpy's aliases of Python builtin objects.
* #12407: DOC: Fix docstring for dist param in anderson function
* #12410: CI: Run the Azure Windows Python36 32bit tests with mode 'fast'
* #12421: Fix regression in scipy 1.5.0 in dendogram when labels is a numpy...
* #12462: MAINT: move distributor_init import after __config__ import


Checksums
=========

MD5
~~~

b71e8115d61c604cc65e5ecc556131f6  scipy-1.5.1-cp36-cp36m-macosx_10_9_x86_64.whl
0190c11f75ed28a7e56050182ca95a18  scipy-1.5.1-cp36-cp36m-manylinux1_i686.whl
c4dd717a3a0c3fe64380039e4fda663f  scipy-1.5.1-cp36-cp36m-manylinux1_x86_64.whl
baad02c954e85e7fd3d4a9fd49fc6359  scipy-1.5.1-cp36-cp36m-win32.whl
9edc3a9aedf6bffccb17101c905126d0  scipy-1.5.1-cp36-cp36m-win_amd64.whl
83479a6de66a6bc2da0990fa71cf3cec  scipy-1.5.1-cp37-cp37m-macosx_10_9_x86_64.whl
f2d5c8713b087545c5ec19cc8e46212c  scipy-1.5.1-cp37-cp37m-manylinux1_i686.whl
6a18a9636342574ae55d3a80136c550c  scipy-1.5.1-cp37-cp37m-manylinux1_x86_64.whl
5da68faf5b32c539d1cb5390df010cc8  scipy-1.5.1-cp37-cp37m-win32.whl
2ca8c59a6712e91ac78b8540ab694b53  scipy-1.5.1-cp37-cp37m-win_amd64.whl
cceb059d0cf6a70e62452deb5571ba00  scipy-1.5.1-cp38-cp38-macosx_10_9_x86_64.whl
8a65b30ccd72409704d3300922da2b7f  scipy-1.5.1-cp38-cp38-manylinux1_i686.whl
00181f52a7917d1c3d50e42a76a6df96  scipy-1.5.1-cp38-cp38-manylinux1_x86_64.whl
2aa8b6ddceaebe7b33d71dbad0e208cc  scipy-1.5.1-cp38-cp38-win32.whl
a626585d08b0991c8f2df0caacdf9997  scipy-1.5.1-cp38-cp38-win_amd64.whl
f6986798b7d22ffc5f80b749d7ec27ca  scipy-1.5.1.tar.gz
e126a1a0ff954b924a8273faa7437fe3  scipy-1.5.1.tar.xz
3bce82b23d45d1a96ee270f23176746a  scipy-1.5.1.zip
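The digests in these lists are meant to be compared against a downloaded file. A minimal sketch using only Python's standard `hashlib`; the helper names and the local file path in the usage comment are hypothetical. The SHA-256 values are the better choice, since MD5 is no longer collision-resistant.

```python
import hashlib
import io


def stream_digest(fileobj, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a binary file object, read in chunks so that
    large wheels never have to fit in memory at once."""
    h = hashlib.new(algorithm)
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def matches_checksum(path, expected_hex, algorithm="sha256"):
    """Compare a downloaded file against a published hex digest."""
    with open(path, "rb") as f:
        return stream_digest(f, algorithm) == expected_hex.strip().lower()


# Self-check on in-memory data: the SHA-256 of b"abc" is a standard test vector.
assert stream_digest(io.BytesIO(b"abc")) == (
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
)

# Hypothetical usage on a local download (the path is an assumption):
# matches_checksum("scipy-1.5.1-cp36-cp36m-win32.whl",
#                  "35d042d6499caf1a5d171baed0ebf01eb665b7af2ad98a8ff1b0e6e783654540")
```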

SHA256
~~~~~~

058e84930407927f71963a4ad8c1dc96c4d2d075636a68578195648c81f78810  scipy-1.5.1-cp36-cp36m-macosx_10_9_x86_64.whl
7908c85854c5b5b6d3ce7fefafac1ca3e23ff9ac41edabc2d46ae5dc9fa070ac  scipy-1.5.1-cp36-cp36m-manylinux1_i686.whl
8302d69fb1528ea7c7f2a1ea640d354c981b6eb8192d1c175349874209397604  scipy-1.5.1-cp36-cp36m-manylinux1_x86_64.whl
35d042d6499caf1a5d171baed0ebf01eb665b7af2ad98a8ff1b0e6e783654540  scipy-1.5.1-cp36-cp36m-win32.whl
5e0bb43ff581811ab7f27425f6b96c1ddf7591ccad2e486c9af0b910c18f7185  scipy-1.5.1-cp36-cp36m-win_amd64.whl
b4858ccbd88f4b53950fb9fc0069c1d9fea83d7cff2382e1d8b023d3f4883014  scipy-1.5.1-cp37-cp37m-macosx_10_9_x86_64.whl
eb46d8b5947ca27b0bc972cecfba8130f088a83ab3d08c1a6033d9070b3046b3  scipy-1.5.1-cp37-cp37m-manylinux1_i686.whl
fff15df01bef1243468be60c55178ed7576270b200aab08a7ffd5b8e0bbc340c  scipy-1.5.1-cp37-cp37m-manylinux1_x86_64.whl
81859ed3aad620752dd2c07c32b5d3a80a0d47c5e3813904621954a78a0ae899  scipy-1.5.1-cp37-cp37m-win32.whl
c05c6fe76228cc13c5214e9faf5f2a871a1da54473bc417ab9da310d0e5fff8b  scipy-1.5.1-cp37-cp37m-win_amd64.whl
71742889393a724dfce755b6b61228677873d269a4234e51ddaf08b998433c91  scipy-1.5.1-cp38-cp38-macosx_10_9_x86_64.whl
9323d268775991b79690f7b9a28a4e8b8c4f2b160ed9f8a90123127314e2d3c1  scipy-1.5.1-cp38-cp38-manylinux1_i686.whl
06b19a650471781056c1a2172b777b8b516e9