[issue35900] Add pickler hook for the user to customize the serialization of user defined functions and types.

2019-02-13 Thread Olivier Grisel
Olivier Grisel added the comment: Adding such a hook would make it possible to reimplement cloudpickle.CloudPickler by deriving from the fast _pickle.Pickler class (instead of the slow pickle._Pickler as done currently). This would mean rewriting most of the CloudPickler method to only rely
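For illustration, a hook of this kind is available in Python 3.8 and later as the `reducer_override` method of `pickle.Pickler`. The sketch below is a minimal example and not cloudpickle's implementation: the `Point` class and the choice to serialize it as a plain `dict` are assumptions made for the example.

```python
import io
import pickle


class Point:
    """Hypothetical user-defined type, used only for this example."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class DictifyingPickler(pickle.Pickler):
    """Subclass of the fast C pickler with a custom reduction hook."""

    def reducer_override(self, obj):
        if isinstance(obj, Point):
            # Serialize Point instances as plain dicts so that the loading
            # side does not need to import the Point class.
            return dict, ([("x", obj.x), ("y", obj.y)],)
        # Fall back to the default pickling behaviour for everything else.
        return NotImplemented


def dumps(obj):
    buffer = io.BytesIO()
    DictifyingPickler(buffer, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
    return buffer.getvalue()


restored = pickle.loads(dumps([Point(1, 2), "unchanged"]))
```

Returning `NotImplemented` keeps the default behaviour for every object the hook does not handle, so the subclass stays on the C code path.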

[issue36867] Make semaphore_tracker track other system resources

2019-05-13 Thread Olivier Grisel
Olivier Grisel added the comment: As Victor said, the `time.sleep(1.0)` might lead to Heisen failures. I am not sure how to write proper strong synchronization in this case but we could instead go for something intermediate such as the following pattern: ... p.terminate

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel
New submission from Olivier Grisel: I noticed that both pickle.Pickler (C version) and pickle._Pickler (Python version) make unnecessary memory copies when dumping large str, bytes and bytearray objects. This is caused by unnecessary concatenation of the opcode and size header with the
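The cost of such a concatenation can be sketched as follows. This is a simplified illustration and not the code of the PR: the helper names are hypothetical, and it assumes the header is concatenated with the payload before a single write. Real picklers pick a shorter opcode than `BINBYTES8` for small payloads.

```python
import io
import pickle
import struct


def header_for(payload):
    # Opcode followed by the payload length as an 8-byte little-endian integer.
    return pickle.BINBYTES8 + struct.pack("<Q", len(payload))


def write_concatenated(write, payload):
    # Builds a temporary bytes object as large as the payload before writing.
    write(header_for(payload) + payload)


def write_separately(write, payload):
    # Writes the small header first, then the payload itself: no large copy.
    write(header_for(payload))
    write(payload)


payload = b"x" * 1_000_000
concatenated, separate = io.BytesIO(), io.BytesIO()
write_concatenated(concatenated.write, payload)
write_separately(separate.write, payload)
```

Both functions produce the same byte stream; only the second avoids allocating a temporary object proportional to the payload size.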

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel
Olivier Grisel added the comment: I wrote a script to monitor the memory when dumping 2GB of data with python master (C pickler and Python pickler): ``` (py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py Allocating source data... => peak memory usage: 2.014 GB Dumping

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel
Olivier Grisel added the comment: Note that the time difference is not significant. I reran the last command and got: ``` (py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py --use-pypickle Allocating source data... => peak memory usage: 2.014 GB Dumping to disk... done

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel
Olivier Grisel added the comment: More benchmarks with the unix time command: ``` (py37) ogrisel@ici:~/code/cpython$ git checkout master Switched to branch 'master' Your branch is up-to-date with 'origin/master'. (py37) ogrisel@ici:~/code/cpython$ time python ~/tmp/large_p

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel added the comment: In my last comment, I also reported the user times (not spent in OS-level disk access): the code of the PR is on the order of 300-400ms while master is around 800ms or more. -- ___ Python tracker <ht

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel added the comment: I have pushed a new version of the code that now has a 10% overhead for small bytes (instead of 40% previously). It could be possible to optimize further but I think that would render the code much less readable so I would be tempted to keep it this way

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel added the comment: Actually, I think this can still be improved while keeping it readable. Let me try again :)

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel added the comment: Alright, the last version has now ~4% overhead for small bytes.

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel added the comment: BTW, I am looking at the C implementation at the moment. I think I can do it.

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel added the comment: I have tried to implement the direct write bypass for the C version of the pickler but I get a segfault in a Py_INCREF on obj during the call to memo_put(self, obj) after the call to _Pickler_write_large_bytes. Here is the diff of my current version of the

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-11 Thread Olivier Grisel
Olivier Grisel added the comment: Alright, I found the source of my refcounting bug. I updated the PR to include the C version of the dump for PyBytes. I ran Serhiy's microbenchmarks on the C version and I could not detect any overhead on small bytes objects while I get a ~20x speedup

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Olivier Grisel
Olivier Grisel added the comment: Thanks Antoine, I updated my code to what you suggested.

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Olivier Grisel
Olivier Grisel added the comment: > While we are here, wouldn't be worth to flush the buffer in the C > implementation to the disk always after committing a frame? This will save a > memory when dump a lot of small objects. I think it's a good idea. The C pickler would b

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Olivier Grisel
Olivier Grisel added the comment: Flushing the buffer at each frame commit will cause a medium-sized write every 64kB on average (instead of one big write at the end). So that might actually cause a performance regression for some users if the individual file-object writes induce significant

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Olivier Grisel
Olivier Grisel added the comment: Shall we close this issue now that the PR has been merged to master?

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Olivier Grisel
Olivier Grisel added the comment: Thanks for the very helpful feedback and guidance during the review.

[issue17560] problem using multiprocessing with really big objects?

2013-08-19 Thread Olivier Grisel
Olivier Grisel added the comment: I have implemented a custom subclass of the multiprocessing Pool to be able to plug in a custom pickling strategy for this specific use case in joblib: https://github.com/joblib/joblib/blob/master/joblib/pool.py#L327 In particular it can: - detect mmap-backed numpy

[issue17560] problem using multiprocessing with really big objects?

2013-08-19 Thread Olivier Grisel
Olivier Grisel added the comment: I forgot to end a sentence in my last comment: - detect mmap-backed numpy should read: - detect mmap-backed numpy arrays and pickle only the filename and other buffer metadata to reconstruct a mmap-backed array in the worker processes instead of copying the

[issue17560] problem using multiprocessing with really big objects?

2013-08-19 Thread Olivier Grisel
Olivier Grisel added the comment: > In 3.3 you can do > > from multiprocessing.forking import ForkingPickler > ForkingPickler.register(MyType, reduce_MyType) > > Is this sufficient for you needs? This is private (and its definition has > moved in 3.4) but it
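As a sketch of the registration mechanism quoted above: in Python 3.4 and later, `ForkingPickler` lives in `multiprocessing.reduction`. The `FileBackedBlob` type and its reducer are hypothetical, and reducing to a `pathlib.Path` is only one possible choice; it mirrors the idea of sending a filename rather than the data itself.

```python
import pathlib
import pickle
from multiprocessing.reduction import ForkingPickler


class FileBackedBlob:
    """Hypothetical wrapper around data that lives in a file on disk."""

    def __init__(self, path):
        self.path = path
        self.data = bytearray(10_000_000)  # stands in for a large buffer


def reduce_file_backed_blob(blob):
    # Send only the file location; the receiver can reopen the file itself
    # instead of receiving a copy of the whole buffer.
    return pathlib.Path, (blob.path,)


# Register the reducer for this exact type with multiprocessing's pickler.
ForkingPickler.register(FileBackedBlob, reduce_file_backed_blob)

serialized = bytes(ForkingPickler.dumps(FileBackedBlob("/tmp/data.bin")))
restored = pickle.loads(serialized)
```

The serialized form stays small because the 10 MB buffer is never pickled; only the path crosses the process boundary.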

[issue18999] Robustness issues in multiprocessing.{get, set}_start_method

2013-09-10 Thread Olivier Grisel
Olivier Grisel added the comment: Related question: is there any good reason that would prevent passing a custom `start_method` kwarg to the `Pool` constructor to make it use an alternative `Popen` instance (that is an instance different from the `multiprocessing._Popen` singleton)? This

[issue18999] Robustness issues in multiprocessing.{get, set}_start_method

2013-09-11 Thread Olivier Grisel
Olivier Grisel added the comment: > Maybe it would be better to have separate contexts for each start method. > That way joblib could use the forkserver context without interfering with the > rest of the user's program. Yes in general it would be great if libraries could
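The per-start-method contexts suggested here are what `multiprocessing.get_context` provides in Python 3.4 and later. A minimal sketch, assuming a library that wants its own start method without changing the global default; the `make_library_pool` helper is hypothetical.

```python
import multiprocessing


def make_library_pool(start_method="spawn", processes=2):
    """Create a pool bound to its own context.

    The global default start method of the user's program is left untouched.
    "forkserver" can be passed instead of "spawn" on platforms that support it.
    """
    ctx = multiprocessing.get_context(start_method)
    return ctx.Pool(processes=processes)


# A context object exposes the same API as the multiprocessing module itself.
ctx = multiprocessing.get_context("spawn")
```

A caller would typically use the pool inside an `if __name__ == "__main__":` guard, for example `with make_library_pool() as pool: pool.map(abs, [-1, -2])`.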

[issue18999] Robustness issues in multiprocessing.{get, set}_start_method

2013-09-12 Thread Olivier Grisel
Olivier Grisel added the comment: The process pool executor [1] from the concurrent futures API would be suitable to explicitly start and stop the helper process for the `forkserver` mode. [1] http://docs.python.org/3.4/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor

[issue18999] Robustness issues in multiprocessing.{get, set}_start_method

2013-09-12 Thread Olivier Grisel
Olivier Grisel added the comment: Richard Oudkerk: thanks for the clarification, that makes sense. I don't have the time either in the coming month, maybe later.

[issue19851] reload problem with submodule

2013-12-09 Thread Olivier Grisel
Olivier Grisel added the comment: I tested the patch on the current HEAD and it fixes a regression introduced between 3.3 and 3.4b1 that prevented building scipy from source with "pip install scipy". -- nosy: +Olivier.Grisel

[issue21905] RuntimeError in pickle.whichmodule when sys.modules is mutated

2014-07-02 Thread Olivier Grisel
New submission from Olivier Grisel: `pickle.whichmodule` performs an iteration over `sys.modules` and tries to perform `getattr` calls on those modules. Unfortunately some modules such as those from the `six.moves` dynamic module can trigger imports when calling `getattr` on them, hence
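The failure mode can be reproduced without `six` by iterating over a dict that is mutated from inside a module's `__getattr__`. The sketch below is an illustration, not the patch attached to the issue: `LazyModule`, `make_registry` and the two lookup functions are hypothetical names, and a plain dict stands in for `sys.modules`.

```python
import types

TARGET = object()


class LazyModule(types.ModuleType):
    """Stand-in for a module whose attribute access triggers an import."""

    def __init__(self, name, registry):
        super().__init__(name)
        self._registry = registry

    def __getattr__(self, attr):
        # Only called when normal lookup fails: simulate an import side effect
        # by adding a new entry to the registry, then report a missing attribute.
        self._registry["imported_by_" + self.__name__] = types.ModuleType("extra")
        raise AttributeError(attr)


def make_registry():
    registry = {}
    registry["lazy"] = LazyModule("lazy", registry)
    real = types.ModuleType("real")
    real.target = TARGET
    registry["real"] = real
    return registry


def find_owner_unsafe(modules, name, obj):
    # Iterates over the live dict: breaks if a getattr call mutates it.
    for module_name, module in modules.items():
        if getattr(module, name, None) is obj:
            return module_name
    return None


def find_owner_safe(modules, name, obj):
    # Iterates over a snapshot, so mutations during the loop are harmless.
    for module_name, module in list(modules.items()):
        if getattr(module, name, None) is obj:
            return module_name
    return None
```

Calling `find_owner_unsafe(make_registry(), "target", TARGET)` raises `RuntimeError` because the dict changes size during iteration, while the snapshot version finds the owning module.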

[issue21905] RuntimeError in pickle.whichmodule when sys.modules is mutated

2014-07-03 Thread Olivier Grisel
Olivier Grisel added the comment: New version of the patch to add an inline comment. -- Added file: http://bugs.python.org/file35841/pickle_whichmodule_20140703.patch

[issue21905] RuntimeError in pickle.whichmodule when sys.modules is mutated

2014-10-06 Thread Olivier Grisel
Olivier Grisel added the comment: No problem. Thanks Antoine for the review!

[issue19946] multiprocessing crash with forkserver or spawn when run from a non ".py" ending script

2013-12-10 Thread Olivier Grisel
New submission from Olivier Grisel: Here is a simple python program that uses the new forkserver feature introduced in 3.4b1: name: checkforkserver.py """ import multiprocessing import os def do(i): print(i, os.getpid()) def test_forkserver(): mp = multiprocess

[issue19946] multiprocessing crash with forkserver or spawn when run from a non ".py" ending script

2013-12-10 Thread Olivier Grisel
Changes by Olivier Grisel: -- type: -> crash

[issue19946] multiprocessing crash with forkserver or spawn when run from a non ".py" ending script

2013-12-10 Thread Olivier Grisel
Olivier Grisel added the comment: > So the question is exactly what module is being passed to > importlib.find_spec() and why isn't it finding a spec/loader for that module. The module is the `nosetests` python script. module_name == 'nosetests' in this case. Howe

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the "main" module

2013-12-11 Thread Olivier Grisel
Olivier Grisel added the comment: I agree that a failure to lookup the module should raise an explicit exception. > Second, there is no way that 'nosetests' will ever succeed as an import > since, as Oliver pointed out, it doesn't end in '.py' or any other >

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the "main" module

2013-12-11 Thread Olivier Grisel
Olivier Grisel added the comment: > what is sys.modules['__main__'] and sys.modules['__main__'].__file__ if you > run under nose? $ cat check_stuff.py import sys def test_main(): print("sys.modules['__main__']=%r" % sys.modules

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the "main" module

2013-12-11 Thread Olivier Grisel
Olivier Grisel added the comment: Note however that the problem is not specific to nose. If I rename my initial 'check_forserver.py' script to 'check_forserver', add the '#!/usr/bin/env python' header and make it 'chmod +x' I get the same crash. So

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the "main" module

2013-12-11 Thread Olivier Grisel
Olivier Grisel added the comment: Here is a patch that uses `imp.load_source` when the first importlib name-based lookup fails. Apparently it fixes the issue on my box but I am not sure whether this is the correct way to do it. -- keywords: +patch Added file: http://bugs.python.org
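The `imp` module has since been removed (Python 3.12), so the sketch below shows the same idea with `importlib` instead. It is not the patch attached to the issue: `load_script_as_module` is a hypothetical helper, and the script name and contents are made up for the example.

```python
import importlib.machinery
import importlib.util
import os
import tempfile


def load_script_as_module(module_name, path):
    """Load a Python source file as a module, even without a '.py' suffix."""
    # An explicit SourceFileLoader bypasses the suffix-based finder lookup.
    loader = importlib.machinery.SourceFileLoader(module_name, path)
    spec = importlib.util.spec_from_loader(module_name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp_dir:
    # A script with no extension, as installed console scripts often are.
    script_path = os.path.join(tmp_dir, "check_forkserver")
    with open(script_path, "w") as script:
        script.write("ANSWER = 6 * 7\n")
    module = load_script_as_module("script_main", script_path)
```

A name-based lookup would not find such a file because it has no recognized source suffix; passing the loader explicitly sidesteps that.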

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the "main" module

2013-12-13 Thread Olivier Grisel
Olivier Grisel added the comment: Why has this issue been closed? Won't the spawn and forkserver mode work in Python 3.4 for Python programs started by a Python script (which is probably the majority of programs written in Python under unix)? Is there any reason not to use the `imp.load_s

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the "main" module

2013-12-13 Thread Olivier Grisel
Olivier Grisel added the comment: > The semantics are not going to change in python 3.4 and will just stay as > they were in Python 3.3. Well the semantics do change: in Python 3.3 the spawn and forkserver modes did not exist at all. The "spawn" mode existed but only implicitl

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the "main" module

2013-12-13 Thread Olivier Grisel
Olivier Grisel added the comment: I can wait (or monkey-patch the stuff I need as a temporary workaround in my code). My worry is that Python 3.4 will introduce a new feature that is very crash-prone. Take this simple program that uses the newly introduced `get_context` function (the same

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the "main" module

2013-12-13 Thread Olivier Grisel
Olivier Grisel added the comment: For Python 3.4: Maybe rather than raising ImportError, we could issue a warning to notify the users that names from the __main__ namespace could not be loaded and make the init_module_attrs return early. This way a multiprocessing program that only calls

[issue19946] Handle a non-importable __main__ in multiprocessing

2013-12-16 Thread Olivier Grisel
Olivier Grisel added the comment: I applied issue19946_pep_451_multiprocessing_v2.diff and I confirm that it fixes the problem that I reported initially.