Olivier Grisel added the comment:
Adding such a hook would make it possible to reimplement
cloudpickle.CloudPickler by deriving from the fast _pickle.Pickler class
(instead of the slow pickle._Pickler as done currently). This would mean
rewriting most of the CloudPickler methods to only rely
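As an illustration of what such a hook enables, here is a minimal sketch assuming the `reducer_override` method that was eventually added to the C-accelerated `pickle.Pickler` in Python 3.8; `MyClass` and its custom reduction are made up for the example:
```
import io
import pickle


class MyClass:
    def __init__(self, value):
        self.value = value


class HookedPickler(pickle.Pickler):
    # Per-object reduction hook on the fast C Pickler (Python 3.8+).
    def reducer_override(self, obj):
        if isinstance(obj, MyClass):
            # Custom reduce tuple: (reconstructor, args).
            return MyClass, (obj.value,)
        # NotImplemented falls back to the regular pickling behaviour.
        return NotImplemented


buf = io.BytesIO()
HookedPickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(MyClass(42))
assert pickle.loads(buf.getvalue()).value == 42
```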
Olivier Grisel added the comment:
As Victor said, the `time.sleep(1.0)` might lead to heisenbug-style intermittent failures. I am not
sure how to write proper strong synchronization in this case but we could
instead go for something intermediate such as the following pattern:
...
p.terminate()
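For reference, a generic polling helper along those lines (this is only a sketch of the kind of intermediate pattern meant here, not the actual test code; `p` is assumed to be a `multiprocessing.Process` started by the test):
```
import time


def wait_for(condition, timeout=10.0, interval=0.05):
    """Poll `condition` until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False


# Hypothetical usage in the test, instead of a fixed time.sleep(1.0):
# assert wait_for(lambda: not p.is_alive())
# p.terminate()
```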
New submission from Olivier Grisel :
I noticed that both pickle.Pickler (C version) and pickle._Pickler (Python
version) make unnecessary memory copies when dumping large str, bytes and
bytearray objects.
This is caused by an unnecessary concatenation of the opcode and size header with the large payload before it is written to the underlying file object.
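The idea of the fix can be illustrated as follows (a sketch only, not the actual patch; the opcode constant matches pickle protocol 4's BINBYTES8):
```
import struct

BINBYTES8 = b"\x8e"  # protocol 4 opcode: bytes object with an 8-byte length


def write_large_bytes_without_copy(write, payload):
    # Issue two write() calls instead of building a temporary
    # `header + payload` bytes object, which would transiently double
    # the memory footprint for a multi-gigabyte payload.
    write(BINBYTES8 + struct.pack("<Q", len(payload)))
    write(payload)
```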
Olivier Grisel added the comment:
I wrote a script to monitor memory usage when dumping 2 GB of data with Python
master (C pickler and Python pickler):
```
(py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping
```
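The original `large_pickle_dump.py` is not shown here; a rough reconstruction of such a monitoring script (Unix only, since it relies on the `resource` module) could look like:
```
import pickle
import resource


def peak_memory_gb():
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1e6


print("Allocating source data...")
data = b"\x00" * (2 * 1024 ** 3)  # 2 GiB payload
print("=> peak memory usage: %.3f GB" % peak_memory_gb())

print("Dumping to disk...")
with open("/tmp/large_pickle_dump.pkl", "wb") as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
print("done")
print("=> peak memory usage: %.3f GB" % peak_memory_gb())
```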
Olivier Grisel added the comment:
Note that the time difference is not significant. I reran the last command and got:
```
(py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py --use-pypickle
Allocating source data...
=> peak memory usage: 2.014 GB
Dumping to disk...
done
```
Olivier Grisel added the comment:
More benchmarks with the unix time command:
```
(py37) ogrisel@ici:~/code/cpython$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
(py37) ogrisel@ici:~/code/cpython$ time python ~/tmp/large_p
```
Olivier Grisel added the comment:
In my last comment, I also reported the user times (not spent in OS-level disk access): the code of the PR is on the order of 300-400 ms while master is around 800 ms or more.
Olivier Grisel added the comment:
I have pushed a new version of the code that now has a 10% overhead for small
bytes (instead of 40% previously).
It would be possible to optimize further, but I think that would render the code much less readable, so I would be tempted to keep it this way.
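The readability/overhead trade-off comes from dispatching on the payload size; a simplified sketch of that idea (the threshold value and the helper names here are made up, not the ones from the patch):
```
LARGE_PAYLOAD_THRESHOLD = 1 << 20  # hypothetical 1 MiB cutoff


def save_bytes(write_to_frame, write_direct, header, payload):
    if len(payload) < LARGE_PAYLOAD_THRESHOLD:
        # Small payload: a single buffered write keeps the common case fast.
        write_to_frame(header + payload)
    else:
        # Large payload: write the small header through the framer, then
        # hand the payload directly to the file object to avoid copying it.
        write_to_frame(header)
        write_direct(payload)
```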
Olivier Grisel added the comment:
Actually, I think this can still be improved while keeping it readable. Let me
try again :)
Olivier Grisel added the comment:
Alright, the last version now has ~4% overhead for small bytes.
Olivier Grisel added the comment:
BTW, I am looking at the C implementation at the moment. I think I can do it.
Olivier Grisel added the comment:
I have tried to implement the direct write bypass for the C version of the
pickler but I get a segfault in a Py_INCREF on obj during the call to
memo_put(self, obj) after the call to _Pickler_write_large_bytes.
Here is the diff of my current version of the
Olivier Grisel added the comment:
Alright, I found the source of my refcounting bug. I updated the PR to include
the C version of the dump for PyBytes.
I ran Serhiy's microbenchmarks on the C version and I could not detect any
overhead on small bytes objects while I get a ~20x speedup
Olivier Grisel added the comment:
Thanks Antoine, I updated my code to what you suggested.
Olivier Grisel added the comment:
> While we are here, wouldn't be worth to flush the buffer in the C
> implementation to the disk always after committing a frame? This will save a
> memory when dump a lot of small objects.
I think it's a good idea. The C pickler would b
Olivier Grisel added the comment:
Flushing the buffer at each frame commit will cause a medium-sized write every
64kB on average (instead of one big write at the end). So that might actually
cause a performance regression for some users if the individual file-object
writes induce significant
Olivier Grisel added the comment:
Shall we close this issue now that the PR has been merged to master?
Olivier Grisel added the comment:
Thanks for the very helpful feedback and guidance during the review.
Olivier Grisel added the comment:
I have implemented a custom subclass of the multiprocessing Pool to be able to plug a custom pickling strategy for this specific use case in joblib:
https://github.com/joblib/joblib/blob/master/joblib/pool.py#L327
In particular it can:
- detect mmap-backed numpy
Olivier Grisel added the comment:
I forgot to end a sentence in my last comment:
- detect mmap-backed numpy
should read:
- detect mmap-backed numpy arrays and pickle only the filename and other buffer metadata to reconstruct a mmap-backed array in the worker processes instead of copying the data.
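A much simplified sketch of such a reducer (the real implementation in the linked joblib/pool.py handles more cases, e.g. array views and writable modes):
```
import numpy as np


def reduce_memmap(a):
    if isinstance(a, np.memmap):
        # Ship only the metadata needed to re-open the same file in the
        # worker process instead of serializing the array's contents.
        return np.memmap, (a.filename, a.dtype, "r", a.offset, a.shape)
    # Regular in-memory arrays keep the default ndarray reduction.
    return a.__reduce__()
```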
Olivier Grisel added the comment:
> In 3.3 you can do
>
> from multiprocessing.forking import ForkingPickler
> ForkingPickler.register(MyType, reduce_MyType)
>
> Is this sufficient for you needs? This is private (and its definition has
> moved in 3.4) but it
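For reference, the same registration using the Python 3.4+ location of ForkingPickler (it moved from multiprocessing.forking to multiprocessing.reduction); `MyType` and `reduce_MyType` are placeholders:
```
from multiprocessing.reduction import ForkingPickler


class MyType:
    def __init__(self, payload):
        self.payload = payload


def reduce_MyType(obj):
    # Standard reduce tuple: (reconstructor, args).
    return MyType, (obj.payload,)


ForkingPickler.register(MyType, reduce_MyType)
```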
Olivier Grisel added the comment:
Related question: is there any good reason that would prevent passing a custom `start_method` kwarg to the `Pool` constructor to make it use an alternative `Popen` instance (that is, an instance different from the `multiprocessing._Popen` singleton)?
This
Olivier Grisel added the comment:
> Maybe it would be better to have separate contexts for each start method.
> That way joblib could use the forkserver context without interfering with the
> rest of the user's program.
Yes in general it would be great if libraries could
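A minimal sketch of the per-start-method context API that eventually landed in Python 3.4 (`forkserver` is Unix only); a library can use its own context without touching the global default:
```
import multiprocessing


def parallel_abs_sum(values):
    # A private context: does not change the program-wide start method.
    ctx = multiprocessing.get_context("forkserver")
    with ctx.Pool(processes=2) as pool:
        return sum(pool.map(abs, values))


if __name__ == "__main__":
    print(parallel_abs_sum([-1, -2, 3]))  # 6
```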
Olivier Grisel added the comment:
The ProcessPoolExecutor [1] from the concurrent.futures API would be suitable for explicitly starting and stopping the helper process for the `forkserver` mode.
[1]
http://docs.python.org/3.4/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor
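A minimal usage sketch; note that the explicit `mp_context` argument shown here was only added to `ProcessPoolExecutor` in Python 3.7, so this illustrates the idea rather than the 3.4 API:
```
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def square(x):
    return x * x


if __name__ == "__main__":
    ctx = multiprocessing.get_context("forkserver")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as executor:
        print(list(executor.map(square, range(4))))
```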
Olivier Grisel added the comment:
Richard Oudkerk: thanks for the clarification, that makes sense. I don't have
the time either in the coming month, maybe later.
Olivier Grisel added the comment:
I tested the patch on the current HEAD and it fixes a regression introduced between 3.3 and 3.4b1 that prevented building scipy from source with "pip install scipy".
--
nosy: +Olivier.Grisel
New submission from Olivier Grisel:
`pickle.whichmodule` iterates over `sys.modules` and performs `getattr` calls on those modules. Unfortunately, some modules, such as those from the `six.moves` dynamic module, can trigger imports when `getattr` is called on them, hence
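A hedged sketch of the failure mode and of the kind of defensive iteration that avoids it (this mirrors the idea, not the exact code of the patch): `getattr()` on a lazily populated module can import new modules and therefore mutate `sys.modules` while it is being iterated over.
```
import sys


def whichmodule_sketch(obj, name):
    # Iterate over a snapshot so that imports triggered as a side effect of
    # getattr() cannot invalidate the iteration over sys.modules.
    for module_name, module in list(sys.modules.items()):
        if module_name == "__main__" or module is None:
            continue
        # The default argument avoids raising AttributeError for modules
        # that do not define `name` at all.
        if getattr(module, name, None) is obj:
            return module_name
    return "__main__"
```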
Olivier Grisel added the comment:
New version of the patch to add an inline comment.
--
Added file: http://bugs.python.org/file35841/pickle_whichmodule_20140703.patch
Olivier Grisel added the comment:
No problem. Thanks Antoine for the review!
New submission from Olivier Grisel:
Here is a simple Python program that uses the new forkserver feature introduced
in 3.4b1:
name: checkforkserver.py
"""
import multiprocessing
import os

def do(i):
    print(i, os.getpid())

def test_forkserver():
    mp = multiprocess
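The script is cut off above; a hedged reconstruction of what it likely looked like (the exact original is not shown):
```
import multiprocessing
import os


def do(i):
    print(i, os.getpid())


def test_forkserver():
    mp = multiprocessing.get_context("forkserver")
    with mp.Pool(processes=2) as pool:
        pool.map(do, range(4))


if __name__ == "__main__":
    test_forkserver()
```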
Changes by Olivier Grisel :
--
type: -> crash
Olivier Grisel added the comment:
> So the question is exactly what module is being passed to
> importlib.find_spec() and why isn't it finding a spec/loader for that module.
The module is the `nosetests` Python script: module_name == 'nosetests' in this case. Howe
Olivier Grisel added the comment:
I agree that a failure to look up the module should raise an explicit exception.
> Second, there is no way that 'nosetests' will ever succeed as an import
> since, as Oliver pointed out, it doesn't end in '.py' or any other
>
Olivier Grisel added the comment:
> what is sys.modules['__main__'] and sys.modules['__main__'].__file__ if you
> run under nose?
$ cat check_stuff.py
import sys

def test_main():
    print("sys.modules['__main__']=%r"
          % sys.modules
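A hedged completion of the truncated snippet above, printing both of the values asked about in the quoted question:
```
import sys


def test_main():
    main = sys.modules["__main__"]
    print("sys.modules['__main__']=%r" % main)
    print("sys.modules['__main__'].__file__=%r" % getattr(main, "__file__", None))
```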
Olivier Grisel added the comment:
Note however that the problem is not specific to nose. If I rename my initial
'check_forserver.py' script to 'check_forserver', add the '#!/usr/bin/env
python' header and make it executable with 'chmod +x', I get the same crash.
So
Olivier Grisel added the comment:
Here is a patch that uses `imp.load_source` when the first importlib name-based
lookup fails.
Apparently it fixes the issue on my box but I am not sure whether this is the
correct way to do it.
--
keywords: +patch
Added file: http://bugs.python.org
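A sketch of the fallback idea described above (this is not the attached patch itself; `load_main_module`, its arguments, and the use of `importlib.import_module` are simplifications for illustration):
```
import imp  # deprecated since Python 3.4; used here only to mirror the idea
import importlib


def load_main_module(module_name, main_path):
    try:
        # First try the regular name-based lookup.
        return importlib.import_module(module_name)
    except ImportError:
        # The lookup fails e.g. when the main script has no .py extension
        # (such as a 'nosetests' launcher script): load it by path instead.
        return imp.load_source(module_name, main_path)
```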
Olivier Grisel added the comment:
Why has this issue been closed? Won't the spawn and forkserver modes work in Python 3.4 for Python programs started by a Python script (which is probably the majority of programs written in Python under Unix)?
Is there any reason not to use the `imp.load_s
Olivier Grisel added the comment:
> The semantics are not going to change in python 3.4 and will just stay as
> they were in Python 3.3.
Well the semantics do change: in Python 3.3 the spawn and forkserver modes did
not exist at all. The "spawn" mode existed but only implicitl
Olivier Grisel added the comment:
I can wait (or monkey-patch the stuff I need as a temporary workaround in my
code). My worry is that Python 3.4 will introduce a new feature that is very
crash-prone.
Take this simple program that uses the newly introduced `get_context` function
(the same
Olivier Grisel added the comment:
For Python 3.4:
Maybe rather than raising ImportError, we could issue a warning to notify users that names from the __main__ namespace could not be loaded and make init_module_attrs return early.
This way a multiprocessing program that only calls
Olivier Grisel added the comment:
I applied issue19946_pep_451_multiprocessing_v2.diff and I confirm that it
fixes the problem that I reported initially.