GitHub user freeman-lab opened a pull request:
https://github.com/apache/spark/pull/2889
Fix for sampling error in NumPy v1.9 [SPARK-3995][PYSPARK]
Change the maximum value of the default seed used during RDD sampling so that
it is strictly less than 2 ** 32. This works around a problem with NumPy v1.9,
which does not accept random seeds at or above this bound.
Adds an extra test that uses the default seed (instead of setting it
manually, as in the docstrings).
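
The bound described above can be illustrated with a minimal sketch. It is not
the code from this pull request: the names SEED_LIMIT, default_seed and
resolve_seed are hypothetical, and only the Python standard library is used so
that the example runs without NumPy or Spark installed.

```python
import random

# Exclusive upper bound on seeds, taken from the description above.
SEED_LIMIT = 2 ** 32


def default_seed(rng=random):
    """Hypothetical helper: draw a default seed in [0, 2 ** 32 - 1]."""
    # random.randint includes both endpoints, so the largest possible
    # value is SEED_LIMIT - 1, which is strictly less than 2 ** 32.
    return rng.randint(0, SEED_LIMIT - 1)


def resolve_seed(seed=None):
    """Return the caller's seed if one was given, else a bounded default."""
    return seed if seed is not None else default_seed()


if __name__ == "__main__":
    # Exercise the default path, in the spirit of the extra test
    # mentioned above (no seed set manually).
    seeds = [resolve_seed() for _ in range(1000)]
    assert all(0 <= s < SEED_LIMIT for s in seeds)
    print("all default seeds are below 2 ** 32")
```

Checking `seed is not None` rather than the truthiness of `seed` keeps an
explicit seed of 0 from being replaced by a random default.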
@mengxr
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/freeman-lab/spark pyspark-sampling
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2889.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2889
----
commit dc385ef6f103a28361de3e7599a1e15528973180
Author: freeman <[email protected]>
Date: 2014-10-22T06:45:40Z
Change maximum value for default seed
- Fixes bug in NumPy v1.9 which truncates random seeds larger than or
equal to 2 ** 32
- Add an extra test for sampling with default seed
----