[issue3415] Interpreter error when running a script under debugger control
New submission from Nir <[EMAIL PROTECTED]>:

An interpreter error results in an erroneous exception when running a script under debugger control.

Full repro description: On a Windows system, try to run idle.py under the inspection of pdb.py. Note that you must set a breakpoint somewhere, otherwise pdb will not trace the script and the issue will not surface. You should get the bad exception at line 295 of multicall.py. Python complains that a local variable has been used before being assigned, while in effect it was assigned a couple of lines before that point.

Nir

--
components: Interpreter Core
messages: 70005
nosy: nirai
severity: normal
status: open
title: Interpreter error when running a script under debugger control
type: behavior
versions: Python 2.6, Python 3.0

___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue3415> ___
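For reference, reproducing under pdb looks roughly like the following (the idle.py path and breakpoint location are illustrative; any breakpoint will do, since pdb only traces the script once a breakpoint is set):

C:\> python -m pdb C:\Python26\Lib\idlelib\idle.py
(Pdb) break multicall.py:295
(Pdb) continue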
[issue7471] GZipFile.readline too slow
Nir added the comment:

First patch, please forgive the long comment :)

I submit a small patch which speeds up readline() on my data set - a 74MB (5MB .gz) log file with 600K lines. The speedup is 350%.

The source of the slowness is that the (~20KB) extrabuf is allocated/deallocated in read() and _unread() with each call to readline(). In the patch, read() returns a slice from extrabuf and defers manipulation of extrabuf to _read().

In the following, the first timeit() corresponds to reading extrabuf slices while the second timeit() corresponds to read() and _unread() as they are done today:

>>> timeit.Timer("x[1: 10100]", "x = 'x' * 2").timeit()
0.25299811363220215
>>> timeit.Timer("x[: 100]; x[100:]; x[100:] + x[: 100]", "x = 'x' * 1").timeit()
5.843876838684082

Another speedup is achieved by adding a small shortcut in readline() for the typical case in which the entire line is already in extrabuf.

The patch only addresses the typical case of calling readline() with no arguments. It does not address other problems in readline() logic. In particular, the current 512-byte chunk size is not a sweet spot: regardless of the size argument passed to readline(), read() will continue to decompress just 1024 bytes with each call, as the size of extrabuf swings around the target size argument as a result of the interaction between _unread() and read().

--
keywords: +patch
nosy: +nirai
Added file: http://bugs.python.org/file15536/gzip_7471_patch.diff

___ Python tracker <http://bugs.python.org/issue7471> ___
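A minimal sketch of the buffering idea behind the patch (illustrative names, not the actual gzip.py internals): read() hands out slices of a persistent buffer and only compacts the buffer when refilling it, instead of reslicing and reallocating on every call:

import io

class SlicingReader:
    def __init__(self, raw):
        self.raw = raw        # underlying stream/decompressor
        self.extrabuf = b''   # decompressed-but-unread bytes
        self.offset = 0       # current read position in extrabuf

    def read(self, size):
        # refill and compact only when the buffer runs low
        while len(self.extrabuf) - self.offset < size:
            chunk = self.raw.read(1024)
            if not chunk:
                break
            self.extrabuf = self.extrabuf[self.offset:] + chunk
            self.offset = 0
        data = self.extrabuf[self.offset: self.offset + size]
        self.offset += len(data)   # cheap: no buffer reallocation
        return data

r = SlicingReader(io.BytesIO(b'x' * 5000))
print(len(r.read(100)), len(r.read(100)))   # 100 100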
[issue1010] Broken bug tracker url
New submission from Nir Soffer:

In http://docs.python.org/lib/about.html, the link to "Python Bug Tracker" points to the old, closed tracker on sourceforge.net. It should be replaced with bugs.python.org.

--
components: Documentation
messages: 55253
nosy: nirs
severity: normal
status: open
title: Broken bug tracker url
versions: Python 2.5

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1010> __
[issue1011] Wrong documentation for rfc822.Message.getheader
New submission from Nir Soffer:

In http://docs.python.org/lib/message-objects.html, the getheader doc says:

"Like getrawheader(name), but strip leading and trailing whitespace. Internal whitespace is not stripped. The optional default argument can be used to specify a different default to be returned when there is no header matching name."

However, getheader is not like getrawheader. getheader returns the *last* header seen, using the message dict. getrawheader returns the *first* header line seen, searching through the list of parsed header lines. The text should also note that getheader is faster and the preferred way to get parsed headers.

--
components: Documentation
messages: 55254
nosy: nirs
severity: normal
status: open
title: Wrong documentation for rfc822.Message.getheader
versions: Python 2.5

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1011> __
[issue1072] Documentation font size too small
New submission from Nir Soffer:

The css uses a font-size of 13px. This is way too small and hard to read, especially on the high-resolution screens typically used on laptops. The font size for body text should be 100%; a user can then select the preferred font size using the browser. The Python 2.x documentation is much more readable.

--
components: Documentation
messages: 55534
nosy: nirs
severity: major
status: open
title: Documentation font size too small
versions: Python 3.0

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1072> __
[issue1072] Documentation font size too small
Nir Soffer added the comment:

The body font size is good now, but now a lot of elements are too big. Here is a list of issues in typical pages related to the font change:

Module page (e.g. http://docs.python.org/dev/library/bisect.html):
- content headings
- the bread-crumbs navigation flows out of its div when using a narrow window or a huge font in the browser
- previous|next... links in the header
- headings in the sidebar are huge
- text in the sidebar can be smaller than the body text
- copyright line - can be tiny

Main page (http://docs.python.org/dev/index.html):
- The main titles (e.g. What's new in ...) are huge - 100-120% should be fine. (CSS class biglink)
- The links description may be smaller than the regular body text

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1072> __
[issue1080] Search broken
New submission from Nir Soffer:

http://docs.python.org/dev/search.html

In Safari:
- does not find anything, e.g. a search for print.
- The sections selection does not remember the user selection, e.g. select Language Reference, search, and the page comes out with Language Reference deselected.
- The search term is not remembered - the search box is always empty.
- There are no search results.

In Firefox:
- The search interface remembers the search term and the sections selection, but the search results are always empty.

--
components: Documentation
messages: 7
nosy: nirs
severity: normal
status: open
title: Search broken
type: behavior
versions: Python 2.6, Python 3.0

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1080> __
[issue1125] bytes.split should have same interface as str.split, or different name
New submission from Nir Soffer:

>>> b'foo bar'.split()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: split() takes at least 1 argument (0 given)
>>> b'foo bar'.split(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected an object with the buffer interface

str.split and bytes.split should have the same interface, or different names.

--
components: Library (Lib)
messages: 55723
nosy: nirs
severity: normal
status: open
title: bytes.split should have same interface as str.split, or different name
versions: Python 3.0

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1125> __
[issue1123] split(None, maxsplit) does not strip whitespace correctly
Nir Soffer added the comment: typo in the title -- title: split(None, maxplit) does not strip whitespace correctly -> split(None, maxsplit) does not strip whitespace correctly __ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1123> __
[issue1123] split(None, maxplit) does not strip whitespace correctly
New submission from Nir Soffer:

The string object .split doc says (http://docs.python.org/lib/string-methods.html):

"If sep is not specified or is None, a different splitting algorithm is applied. First, whitespace characters (spaces, tabs, newlines, returns, and formfeeds) are stripped from both ends."

If the maxsplit argument is set and is smaller than the number of possible parts, whitespace is not removed. Examples:

>>> 'k: v\n'.split(None, 1)
['k:', 'v\n']
Expected: ['k:', 'v']

>>> u'k: v\n'.split(None, 1)
[u'k:', u'v\n']
Expected: [u'k:', u'v']

With larger values of maxsplit, it works correctly:

>>> 'k: v\n'.split(None, 2)
['k:', 'v']
>>> u'k: v\n'.split(None, 2)
[u'k:', u'v']

This looks like an implementation bug, because it does not make sense that the stripping depends on the maxsplit argument, and it will be hard to explain such behavior.

Maybe the stripping should be removed in Python 3? It does not make sense to strip a string behind your back when you want to split it, and the caller can easily strip the string if needed.

--
components: Library (Lib)
messages: 55720
nosy: nirs
severity: normal
status: open
title: split(None, maxplit) does not strip whitespace correctly
versions: Python 2.4, Python 2.5, Python 3.0

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1123> __
[issue1123] split(None, maxsplit) does not strip whitespace correctly
Nir Soffer added the comment: set type -- type: -> behavior __ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1123> __
[issue1126] file.fileno and file.isatty() should be implementable by any file like object
New submission from Nir Soffer:

The docs (http://docs.python.org/dev/3.0/library/stdtypes.html#sequence-types-str-bytes-list-tuple-buffer-range) warn that .fileno and .isatty should not be implemented by a file-like object. This requires clients to check whether the file object has the attribute before they call the method, instead of treating all files the same:

if hasattr(foo, 'isatty'):
    if foo.isatty():
        # ...

Instead of:

if foo.isatty():
    # ...

isatty can easily return False. fileno can return an invalid file descriptor number (-1?).

--
components: Library (Lib)
messages: 55724
nosy: nirs
severity: normal
status: open
title: file.fileno and file.isatty() should be implementable by any file like object
type: rfe
versions: Python 3.0

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1126> __
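A minimal sketch of the proposal (FileLike is a hypothetical class, not stdlib code): a file-like object that always provides the two methods, so callers need no hasattr() guard and can treat every file uniformly:

class FileLike:
    def isatty(self):
        return False      # never a terminal
    def fileno(self):
        return -1         # no underlying OS file descriptor

f = FileLike()
if f.isatty():            # safe to call on any file object
    print('connected to a terminal')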
[issue1125] bytes.split should have same interface as str.split, or different name
Nir Soffer added the comment:

Why should bytes not use the same default whitespace-splitting behavior as str?

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1125> __
[issue1014] cgi: parse_qs and parse_qsl misbehave on empty strings
Nir Soffer added the comment:

Additionally, if the default value is an empty string, you would expect it to work with empty strings. If a non-empty value were needed, it would use None as the default.

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1014> __
[issue1125] bytes.split should have same interface as str.split, or different name
Nir Soffer added the comment: set type -- type: -> rfe __ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1125> __
[issue1123] split(None, maxsplit) does not strip whitespace correctly
Nir Soffer added the comment:

I did not look into the source, but obviously there is stripping of leading and trailing whitespace. When you specify a separator you get:

>>> ' '.split(' ')
['', '', '']
>>> ' a b '.split(' ')
['', 'a', 'b', '']

So one would expect to get this without stripping:

>>> ' a b '.split()
['', 'a', 'b', '']

But you get this:

>>> ' a b '.split()
['a', 'b']

So the documentation is correct.

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1123> __
[issue1123] split(None, maxsplit) does not strip whitespace correctly
Nir Soffer added the comment: There is a problem only when maxsplit is smaller than the available splits. In other cases, the docs and the behavior match. __ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1123> __
[issue1123] split(None, maxsplit) does not strip whitespace correctly
Nir Soffer added the comment:

I quoted the str.split docs:
- http://docs.python.org/lib/string-methods.html
- http://docs.python.org/dev/library/stdtypes.html
- http://docs.python.org/dev/3.0/library/stdtypes.html

Does the string.split doc explain this:

>>> ' a b '.split(None, 1)
['a', 'b ']
>>> ' a b '.split(None, 2)
['a', 'b']

The .split method docs are clearer and describe this in a very simple way. This is a better description of the current behavior:

"If sep is not specified or is None, a different splitting algorithm is applied. First, whitespace characters (spaces, tabs, newlines, returns, and formfeeds) are stripped from the start of the string. Then, words are separated by arbitrary length strings of whitespace characters. Consecutive whitespace delimiters are treated as a single delimiter ("' 1 \t 2 \n 3 '.split()" returns "['1', '2', '3']"). If maxsplit is nonzero, at most maxsplit number of splits occur, and the remainder of the string is returned as the final element of the list, unless it is empty. Splitting an empty string or a string consisting of just whitespace returns an empty list."

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1123> __
[issue1625] bz2.BZ2File doesn't support multiple streams
Changes by Nir Aides : -- assignee: niemeyer -> nirai ___ Python tracker <http://bugs.python.org/issue1625> ___
[issue1625] bz2.BZ2File doesn't support multiple streams
Nir Aides added the comment: Hi, I attach a patch to Python 3.3 Lib/bz2.py with updated tests: cpython-bz2-streams.patch -- keywords: +needs review stage: needs patch -> patch review Added file: http://bugs.python.org/file22087/cpython-bz2-streams.patch ___ Python tracker <http://bugs.python.org/issue1625> ___
[issue1625] bz2.BZ2File doesn't support multiple streams
Nir Aides added the comment: Wait, the tests seem wrong. I'll post an update later today. -- ___ Python tracker <http://bugs.python.org/issue1625> ___
[issue1625] bz2.BZ2File doesn't support multiple streams
Nir Aides added the comment:

False alarm; go ahead with the review. I took a look too early in the morning, before the caffeine kicked in.

Note Lib/test/test_bz2.py was directly upgraded from bz2ms.patch.

A note on bz2 behavior: a BZ2Decompressor object is only good for one stream; after that, eof is set and it will refuse to continue to the next stream. This seems in line with the bzip2 manual:
http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html#bzDecompress

--
___ Python tracker <http://bugs.python.org/issue1625> ___
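That one-stream-per-decompressor behavior is why multi-stream support has to chain decompressor objects. A minimal sketch (illustrative helper, not the patch itself), using the BZ2Decompressor.unused_data attribute that holds bytes found after the end of a stream:

import bz2

def decompress_streams(data):
    out = []
    while data:
        d = bz2.BZ2Decompressor()        # fresh object per stream
        out.append(d.decompress(data))
        data = d.unused_data             # bytes after this stream's end
    return b''.join(out)

print(decompress_streams(bz2.compress(b'a') + bz2.compress(b'b')))  # b'ab'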
[issue1625] bz2.BZ2File doesn't support multiple streams
Nir Aides added the comment: Right! I updated the patch and added a test for the aligned stream/buffer case. -- Added file: http://bugs.python.org/file22114/cpython-bz2-streams.patch ___ Python tracker <http://bugs.python.org/issue1625> ___
[issue1625] bz2.BZ2File doesn't support multiple streams
Changes by Nir Aides : Removed file: http://bugs.python.org/file22087/cpython-bz2-streams.patch ___ Python tracker <http://bugs.python.org/issue1625> ___
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment:

Well, I will restate my view that we should:

1) Add a general atfork() mechanism.
2) Dive into the std lib and add handlers one by one, which, depending on the case, either do the lock/init thing or just reset the state of the library to some valid state in the child.

Once this mechanism is in place and committed with a few obvious handlers, such as the one for the logging library, other handlers can be added over time. Following this path we will slowly resolve the problem, handler by handler, without introducing the invalid-state problem.

--
___ Python tracker <http://bugs.python.org/issue6721> ___
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment:

> I believe that the comp.programming.threads post from
> David Butenhof linked above explains why adding atfork()
> handlers isn't going to solve this.

In Python, atfork() handlers will never run from signal handlers, and if I understood correctly, Charles-François described a way to "re-initialize" a Python lock safely under that assumption.

--
___ Python tracker <http://bugs.python.org/issue6721> ___
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment:

> - what would be the API of this atfork() mechanism (with an example of
> how it would be used in the library)?

The atfork API is defined in POSIX, and Gregory P. Smith proposed a Python one above that we can look into:
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html

We may need an API to reset a lock.

> - how do you find the correct order to acquire locks in the parent process?

One option is to use the import graph to determine the call order of atfork handlers. If a current std library module does not fit into this scheme we can possibly fix it when writing its handlers.

> - what do you do with locks that can be held for arbitrarily long (e.g. I/O locks)?

It is likely that such a lock does not need acquiring in the parent, and re-initializing the library in the child handler will do. A "critical section" lock that protects in-memory data should not be held for long.

--
___ Python tracker <http://bugs.python.org/issue6721> ___
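For a concrete picture of what such a Python-level API could look like: Python did not have one at the time of this discussion, but os.register_at_fork(), added later in Python 3.7, closely follows the pthread_atfork() model. A sketch under that API:

import os, threading

state_lock = threading.Lock()       # protects some in-memory state

def prepare():                      # parent, just before fork()
    state_lock.acquire()

def parent():                       # parent, just after fork()
    state_lock.release()

def child():                        # child, just after fork()
    global state_lock
    state_lock = threading.Lock()   # reset to a known, unlocked state

os.register_at_fork(before=prepare,
                    after_in_parent=parent,
                    after_in_child=child)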
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment:

> Sorry, I fail to see how the "import graph" is related to the correct
> lock acquisition order. Some locks are created dynamically, for
> example.

Import dependency is a reasonable heuristic to look into for inter-module locking order. The rationale is explained in the following pthread_atfork man page:
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html

"A higher-level package may acquire locks on its own data structures before invoking lower-level packages. Under this scenario, the order specified for fork handler calls allows a simple rule of initialization for avoiding package deadlock: a package initializes all packages on which it depends before it calls the pthread_atfork() function for itself."

(The rationale section is an interpretation which is not part of the standard.)

A caveat is that since Python is an object-oriented language, it is more common than in C that code from a higher level module will be invoked by code from a lower level module, for example by calling an object method that was overridden by the higher level module - this actually happens in the logging module (the emit method).

> That's why I asked for a specific API: when do you register a handler?
> When are they called? When are they reset?

Read the pthread_atfork man page.

> The whole point of atfork is to avoid breaking invariants and
> introduce invalid state in the child process. If there is one thing we
> want to avoid, it's precisely reading/writting corrupted data from/to
> files, so eluding the I/O problem seems foolish to me.

Please don't use insulting adjectives. If you think I am wrong, convincing me logically will do.

You can "avoid breaking invariants" using two different strategies:
1) Acquire locks before the fork and release/reset them after it.
2) Initialize the module to some known state after the fork.

For some (most?) modules it may be quite reasonable to initialize the module to a known state after the fork without acquiring its locks before the fork; this too is explained in the pthread_atfork man page:

"Alternatively, some libraries might be able to supply just a child routine that reinitializes the mutexes in the library and all associated states to some known value (for example, what it was when the image was originally executed)."

> > A "critical section" lock that protects in-memory data should not be held for long.
>
> Not necessarily. See for example I/O locks and logging module, which
> hold locks until I/O completion.

Oops, I have always used the term "critical section" to describe a lock that protects data state as tightly as possible, ideally not even across function calls, but now I see Wikipedia defines one as protecting any resource, including IO.

The logging module locks the entire emit() function, which I think is wrong. It should let the derived handler take care of locking when it needs to, if it needs to at all. The logging module is an example of a module we should reinitialize after the fork without locking its locks before the fork.

--
___ Python tracker <http://bugs.python.org/issue6721> ___
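A sketch of that "reinitialize in the child" strategy for logging (it relies on CPython-private attributes - logging._lock and logging._handlerList - so treat it as illustrative only, not a supported recipe):

import logging, threading

def reinit_logging_in_child():
    # give the module and every handler fresh, unlocked locks
    logging._lock = threading.RLock()
    for ref in logging._handlerList:     # list of weakrefs to handlers
        handler = ref()
        if handler is not None:
            handler.createLock()         # replaces handler.lock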
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment:

> Would you like to work on a patch to add an atfork mechanism?

I will start with an attempt to generate a summary "report" on this rabbit hole of a problem, the views and suggestions raised by people here, and what we may expect from the atfork approach, its limitations, etc... I will also take a deeper look into the code.

Hopefully my brain will not deadlock or fork while I am at it.

More words, I know...

--
___ Python tracker <http://bugs.python.org/issue6721> ___
[issue874900] malloc
Changes by Nir Aides : -- title: threading module can deadlock after fork -> malloc ___ Python tracker <http://bugs.python.org/issue874900> ___
[issue874900] threading module can deadlock after fork
Changes by Nir Aides : -- title: malloc -> threading module can deadlock after fork ___ Python tracker <http://bugs.python.org/issue874900> ___
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment:

Well, my brain did not deadlock, but after spinning on the problem for a while longer, it now thinks Tomaž Šolc and Steffen are right. We should try to fix the multiprocessing module so it does not deadlock single-threaded programs, and deprecate fork in multi-threaded programs.

Here is the longer version, which is a summary of what people said here in various forms, observations from diving into the code, and Googling around:

1) The rabbit hole

a) In a multi-threaded program, fork() may be called while another thread is in a critical section. That thread will not exist in the child and the critical section will remain locked. Any attempt to enter that critical section will deadlock the child.

b) POSIX included the pthread_atfork() API in an attempt to deal with the problem:
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html

c) But it turns out atfork handlers are limited to calling async-signal-safe functions, since fork may be called from a signal handler:
http://download.oracle.com/docs/cd/E19963-01/html/821-1601/gen-61908.html#gen-95948
This means atfork handlers may not actually acquire or release locks. See the opinion by David Butenhof, who was involved in the standardization effort of POSIX threads:
http://groups.google.com/group/comp.programming.threads/msg/3a43122820983fde

d) One consequence is that we can not assume third party libraries are safe to fork in a multi-threaded program. It is likely their developers consider this scenario broken.

e) It seems the general consensus across the WWW concerning this problem is that it has no solution and that a fork should be followed by exec as soon as possible. Some references:
http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html
http://austingroupbugs.net/view.php?id=62
http://sourceware.org/bugzilla/show_bug.cgi?id=4737
http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them

2) Python's killer rabbit

The standard library multiprocessing module does two things that force us into the rabbit hole; it creates worker threads and forks without exec. Therefore, any program that uses the multiprocessing module is a multi-threaded forking program.

One immediate problem is that a multiprocessing.Pool may fork from its worker thread in Pool._handle_workers(). This puts the forked child at risk of deadlock with any code that was run by the parent's main thread (the entire program logic). More problems may be found with a code review.

Other modules to look at are concurrent.futures.process (creates a worker thread and uses multiprocessing) and socketserver (ForkingMixIn forks without exec).

3) God bless the GIL

a) Python signal handlers run synchronously in the interpreter loop of the main thread, so os.fork() will never be called from a POSIX signal handler. This means Python atfork prepare and parent handlers may run any code. The code run in the child is still restricted, though, and may deadlock on any acquired locks left behind by dead threads in the standard library or lower level third party libraries.

b) It turns out the GIL also helps by synchronizing threads. Any lock held for the duration of a function call while the GIL is held will be released by the time os.fork() is called. But if a thread in the program calls a function that yields the GIL, we are in la la land again and need to watch our step.
4) Landing gently at the bottom

a) I think we should try to review and sanitize the worker threads of the multiprocessing module and other implicit worker threads in the standard library. Once we do (and since os.fork() is never run from a POSIX signal handler), the multiprocessing library should be safe to use in programs that do not start other threads.

b) Then we should declare the user scenario of mixing the threading and multiprocessing modules as broken by design.

c) Finally, optionally provide an atfork API.

The atfork API can be used to refactor existing fork handlers in the standard library, provide handlers for modules such as the logging module that will reduce the risk of deadlock in existing programs, and can be used by developers who insist on mixing threading and forking in their programs.

5) Sanitizing worker threads in the multiprocessing module

TODO :) (will try to post some ideas soon)

--
___ Python tracker <http://bugs.python.org/issue6721> ___
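For illustration, a minimal POSIX-only snippet that reproduces the core problem from (1a) above - the child inherits a lock held by a thread that no longer exists after the fork:

import os, threading, time

lock = threading.Lock()

def worker():
    with lock:
        time.sleep(2)            # hold the lock across the fork

threading.Thread(target=worker).start()
time.sleep(0.5)                  # make sure the worker holds the lock

pid = os.fork()
if pid == 0:
    # Child: the worker thread does not exist here, but the lock it
    # held is still locked; a plain acquire() would block forever.
    print('child acquired lock:', lock.acquire(timeout=1))   # False
    os._exit(0)
os.waitpid(pid, 0)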
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment:

Here is a morning reasoning exercise - please help find the flaws or refine it:

5) Sanitizing worker threads in the multiprocessing module

Sanitizing a worker thread in the context of this problem means making sure it can not create a state that may deadlock another thread that calls fork(); in other words, making it fork-safe. Keep in mind that in Python os.fork() is never called from a POSIX signal handler. So what are examples of a fork-safe thread?

a) A thread that spins endlessly doing nothing in a C for(;;) loop is safe. Another thread may call fork() without restrictions.

b) A Python thread that only calls functions that do not yield the GIL and that does not acquire locks that are held beyond a Python tick is safe. An example of such a lock is a critical-section lock acquired by a lower level third party library for the duration of a function call. Such a lock will be released by the time os.fork() is called because of the GIL.

c) A Python thread that in addition to (b) also acquires a lock that is handled at fork is safe.

d) A Python thread that in addition to (b) and (c) calls functions that yield the GIL, but while the GIL is released only calls async-signal-safe code, is safe. This is a bit tricky. We know that it is safe for thread A to fork and call async-signal-safe functions regardless of what thread B has been doing, but I do not know that thread A can fork and call non-async-signal-safe functions if thread B was only calling async-signal-safe functions.

Nevertheless it makes sense: for example, let's assume it isn't true, and that hypothetical thread A forked while thread B was doing the async-signal-safe function safe_foo(), and then thread A called the non-async-signal-safe function unsafe_bar() and deadlocked. unsafe_bar() could have deadlocked trying to acquire a lock that was acquired by safe_foo(). But if this is so, then it could also happen the other way around. Are there other practical possibilities? Either way, we could double check and whitelist the async-signal-safe functions we are interested in, in a particular implementation.

e) Socket related functions such as bind(), accept(), send() and recv(), which Python calls without holding the GIL, are all async-signal-safe. This means that in principle we can have a fork-safe worker thread for the purpose of communicating with a forked process using a socket.

--
___ Python tracker <http://bugs.python.org/issue6721> ___
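A sketch of the pattern suggested in (e) - a parent communicating with a forked child over a socketpair (illustrative only; the real multiprocessing plumbing is more involved):

import os, socket

parent_end, child_end = socket.socketpair()
pid = os.fork()
if pid == 0:                     # child process
    parent_end.close()
    child_end.sendall(b'ready') # async-signal-safe send()
    os._exit(0)
child_end.close()
print(parent_end.recv(5))        # b'ready'
os.waitpid(pid, 0)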
[issue4277] asynchat's handle_error inconsistency
Nir Soffer added the comment:

The idea is good, but it seems that the error handling should be inlined into initiate_send. Also, those 3 special exceptions should be defined once in the module instead of being repeated.

--
nosy: +nirs
___ Python tracker <http://bugs.python.org/issue4277> ___
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment:

> then multiprocessing is completely brain-damaged and has been
> implemented by a moron.

Please do not use this kind of language. Being disrespectful to other people hurts the discussion.

--
___ Python tracker <http://bugs.python.org/issue6721> ___
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment:

> (BTW: there are religions without "god", so whom shall e.g. i praise for the GIL?)

Guido? ;)

--
___ Python tracker <http://bugs.python.org/issue6721> ___
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment:

Hi Gregory,

> Gregory P. Smith added the comment:
> No Python thread is ever fork safe because the Python interpreter itself
> can never be made fork safe.
> Nor should anyone try to make the interpreter itself safe. It is too
> complex and effectively impossible to guarantee.

a) I think the term "guarantee" is not meaningful here, since the interpreter is probably too complex to guarantee it does not contain other serious problems.

b) If no Python thread is ever fork safe, can you illustrate how a trivial Python thread spinning endlessly might deadlock a child forked by another Python thread?

I was not able to find reports of deadlocks clearly related to multiprocessing worker threads, so they could be practically safe already, to the point other Python-Dev developers would be inclined to bury this as a theoretical problem :)

Anyway, there exists at least the problem of forking from the pool worker thread, and possibly other issues, so the code should be reviewed. Another latent problem is multiprocessing logging, which is disabled by default?

> There is no general solution to this, fork and threading is simply broken
> in POSIX and no amount of duct tape outside of the OS kernel can fix it.

This is why we should "sanitize" the multiprocessing module and deprecate the mixing of threading and multiprocessing. I bet most developers using Python are not even aware of this problem. We should make sure they are, through documentation.

Here is another way to look at the current situation:

1) Don't use threading for concurrency because of the GIL.
2) Don't mix threading with multiprocessing because threading and forking don't mix well.
3) Don't use multiprocessing because it can deadlock.

We should make sure developers are aware of (2) and can use (3) safely.

> My only desire is that we attempt to do the right thing when possible
> with the locks we know about within the standard library.

Right, with an atfork() mechanism.

--
___ Python tracker <http://bugs.python.org/issue6721> ___
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment:

For the record, it turns out there was a bit of misunderstanding. I used the term deprecate above to mean "warn users (through documentation) that they should not use (a feature)" and not in its python-dev sense of "remove (a feature) after a period of warning".

I do not think the possibility to mix threading and multiprocessing together should be somehow forcibly disabled.

Anyway, since my view does not seem to resonate with core developers, I'll give it a rest for now.

--
___ Python tracker <http://bugs.python.org/issue6721> ___
[issue7467] The zipfile module does not check files' CRCs, including in ZipFile.testzip
Nir Aides added the comment:

I think the patch may be simplified. Instead of keeping track of a CRC offset, invoke the CRC update directly on the 'data' variable being added to _readbuffer. Also, the call to _update_crc() before the return from read1() looks redundant. Finally, is it possible to determine end of file by the length of 'data' (computed for the crc) being 0?

--
___ Python tracker <http://bugs.python.org/issue7467> ___
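For illustration, the incremental-CRC idea being suggested (not the actual zipfile code): feed each new chunk to zlib.crc32() along with the running value, so no offset bookkeeping is needed:

import zlib

running_crc = 0
for chunk in (b'hello ', b'world'):
    running_crc = zlib.crc32(chunk, running_crc)

assert running_crc == zlib.crc32(b'hello world')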
[issue7467] The zipfile module does not check files' CRCs, including in ZipFile.testzip
Nir Aides added the comment: But you answered my question with code :) self._file_size is now unused and may be removed. -- ___ Python tracker <http://bugs.python.org/issue7467> ___
[issue9962] GzipFile doesn't have peek()
Nir Aides added the comment:

Hi Antoine,

BufferedIOBase is not documented to have peek():
http://docs.python.org/dev/py3k/library/io.html

Small notes about the patch:
1) The IOError string says "read() on write-only..."; should it be "peek() on write-only..."?
2) Should it be min() in self._read(max(self.max_read_chunk, n))?

--
___ Python tracker <http://bugs.python.org/issue9962> ___
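A rough sketch of the peek() under discussion, written against the gzip.py internals of that era (extrabuf/extrasize/extrastart/offset and the READ mode constant); illustrative only, not the committed patch:

import errno

def peek(self, n):
    if self.mode != READ:                 # READ is gzip.py's module constant
        raise IOError(errno.EBADF, "peek() on write-only GzipFile object")
    if self.extrasize == 0:
        try:
            self._read(max(n, 1024))      # decompress at least one chunk
        except EOFError:
            pass
    offset = self.offset - self.extrastart
    return self.extrabuf[offset: offset + n]   # do not advance position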
[issue9962] GzipFile doesn't have peek()
Nir Aides added the comment: Should be min(n, 1024) instead of max(...) -- ___ Python tracker <http://bugs.python.org/issue9962> ___
[issue9962] GzipFile doesn't have peek()
Nir Aides added the comment:

Right, I missed the change from self.max_read_chunk to 1024 (read_size). Shouldn't peek() limit reads to self.max_read_chunk as read() does?

--
___ Python tracker <http://bugs.python.org/issue9962> ___
[issue10037] multiprocessing.pool processes started by worker handler stops working
Changes by Nir Aides : -- nosy: +nirai ___ Python tracker <http://bugs.python.org/issue10037> ___
[issue1625] bz2.BZ2File doesn't support multiple streams
Changes by Nir Aides : -- nosy: +nirai ___ Python tracker <http://bugs.python.org/issue1625> ___
[issue9504] signal.signal/signal.alarm not working as expected
Changes by Nir Aides : -- nosy: +nirai ___ Python tracker <http://bugs.python.org/issue9504> ___
[issue11743] Rewrite PipeConnection and Connection in pure Python
Changes by Nir Aides : -- nosy: +nirai ___ Python tracker <http://bugs.python.org/issue11743> ___
[issue9971] Optimize BufferedReader.readinto
Changes by Nir Aides : -- nosy: +nirai ___ Python tracker <http://bugs.python.org/issue9971> ___
[issue6721] Locks in python standard library should be sanitized on fork
Changes by Nir Aides : -- nosy: +nirai ___ Python tracker <http://bugs.python.org/issue6721> ___
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment:

Hi,

There seem to be two alternatives for atfork handlers:

1) Acquire locks during the prepare phase and unlock them in the parent and child after fork.
2) Reset the library to some consistent state in the child after fork.

http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html

Option (2) makes sense but is probably not always applicable. Option (1) depends on being able to acquire locks in locking order, but how can we determine the correct locking order across libraries?

Initializing locks in the child after fork without acquiring them before the fork may result in corrupted program state and so is probably not a good idea.

On a positive note, if I understand correctly, Python signal handler functions are actually run in the regular interpreter loop (as pending calls) after the signal has been handled, and so os.fork() atfork handlers will not be restricted to async-signal-safe operations (since a Python fork is never done in a signal handler).

http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html
http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html
"It is therefore undefined for the fork handlers to execute functions that are not async-signal-safe when fork() is called from a signal handler."

Opinion by Butenhof, who was involved in the standardization effort of POSIX threads:
http://groups.google.com/group/comp.programming.threads/msg/3a43122820983fde

...so how can we establish correct (cross-library) locking order during the prepare stage?

Nir

--
___ Python tracker <http://bugs.python.org/issue6721> ___
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment:

I think that generally it is better to deadlock than corrupt data.

> 2) acquiring locks just before fork is probably one of the best way to
> deadlock (acquiring a lock we already hold, or acquiring a lock needed
> by another thread before it releases its own lock). Apart from adding
> dealock avoidance/recovery mechanisms - which would be far from
> trivial - I don't see how we could solve this, given that each library
> can use its own locks, not counting the user-created ones

a) We know the correct locking order in Python's std libraries, so the problem there is kind of solved.
b) We can put the burden of other locks on application developers, and since currently no one registers atfork handlers, there is no problem there yet.

> 4) Python locks differ from usual mutexes/semaphores in that they can
> be held for quite some time (for example while performing I/O). Thus,
> acquiring all the locks could take a long time, and users might get
> irritated if fork takes 2 seconds to complete.

We only need a prepare handler to acquire locks that protect data from corruption. A lock synchronizing IO which is held for long periods may possibly be initialized in the child without being acquired in a prepare handler; for example, a lock serializing logging messages. In other cases, or in general, an atfork handler may reset or reinitialize a library without acquiring locks in a prepare handler.

> 5) Finally, there's a fundamental problem with this approach, because
> Python locks can be released by a thread other than the one that owns
> it.
> Imagine this happens:
>
> T1                          T2
>                             lock.acquire()
>                             (do something without releasing lock)
> fork()
> lock.release()
>
> This is perfectly valid with the current lock implementation (for
> example, it can be used to implement a rendez-vous point so that T2
> doesn't start processing before T1 forked worker processes, or
> whatever).
> But if T1 tries to acquire lock (held by T2) before fork, then it will
> deadlock, since it will never be released by T2.

I think we do not need to acquire rendezvous locks in a prepare handler.

> > Initializing locks in child after fork without acquiring them before the
> > fork may result in corrupted program state and so is probably not a good
> > idea.
>
> Yes, but in practise, I think that this shouldn't be too much of a
> problem. Also note that you can very well have the same type of
> problem with sections not protected explicitely by locks: for example,
> if you have a thread working exclusively on an object (maybe part of a
> threadpool), a fork can very well happen while the object is in an
> inconsistent state. Acquiring locks before fork won't help that.

I think a worker thread that works exclusively on an object does not create the problem:
a) If the fork thread eventually needs to read the object then you need synchronization.
b) If the worker thread eventually writes data into a file or DB then that operation will be completed in the parent process.

To summarize, I think we should take the atfork path. An atfork handler does not need to acquire all locks, but only those required by library logic, which the handler is aware of, and as a bonus it can be used to do all sorts of stuff such as cleaning up, reinitializing a library, etc...

--
___ Python tracker <http://bugs.python.org/issue6721> ___
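The "released by another thread" property the quoted scenario relies on is easy to demonstrate; a minimal sketch of such a rendezvous:

import threading, time

lock = threading.Lock()
lock.acquire()                    # main thread takes the lock

def releaser():
    time.sleep(0.1)
    lock.release()                # a different thread may release it

threading.Thread(target=releaser).start()
lock.acquire()                    # blocks until releaser runs
print('rendezvous passed')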
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment: Is it possible the following issue is related to this one? http://bugs.python.org/issue10037 - "multiprocessing.pool processes started by worker handler stops working" -- ___ Python tracker <http://bugs.python.org/issue6721> ___
[issue6721] Locks in python standard library should be sanitized on fork
Nir Aides added the comment:

Steffen, can you explain in layman's terms?

On Sun, May 15, 2011 at 8:03 PM, Steffen Daode Nurpmeso wrote:
>
> @ Charles-François Natali wrote (2011-05-15 01:14+0200):
>> So if we really wanted to be safe, the only solution would be to
>> forbid fork() in a multi-threaded program.
>> Since it's not really a reasonable option
>
> But now - why this? The only really acceptable thing if you have
> control about what you are doing is the following:
>
> class SMP::Process
>    /*!
>    * \brief Daemonize process.
>    *[.]
>    * \note
>    * The implementation of this function is not trivial.
>    * To avoid portability no-goes and other such problems,
>    * you may \e not call this function after you have initialized
>    * Thread::enableSMP(),
>    * nor may there (have) be(en) Child objects,
>    * nor may you have used an EventLoop!
>    * I.e., the process has to be a single threaded, "synchronous" one.
>    * [.]
>    */
>    pub static si32 daemonize(ui32 _daemon_flags=df_default);
>
> namespace SMP::POSIX
>    /*!
>    * \brief \fn fork(2).
>    *[.]
>    * Be aware that this passes by all \SMP and Child related code,
>    * i.e., this simply \e is the system-call.
>    * Signal::resetAllSignalStates() and Child::killAll() are thus if
>    * particular interest; thread handling is still entirely up to you.
>    */
>    pub static sir fork(void);
>
> Which kind of programs cannot be written with this restriction?

--
___ Python tracker <http://bugs.python.org/issue6721> ___
[issue7610] Cannot use both read and readline method in same ZipExtFile object
Nir Aides added the comment:

> I do not find the existing phrasing in the IO docs ambiguous, but since
> it is obviously possible to misinterpret it, it would be good to clarify
> it. Can you suggest an alternate phrasing that would be clearer?

Replace 'may' with 'will' or 'shall' everywhere the context indicates a mandatory requirement. Since this possibly affects the entire Python documentation, does it make sense to discuss this on python-dev?

--
___ Python tracker <http://bugs.python.org/issue7610> ___
[issue7610] Cannot use both read and readline method in same ZipExtFile object
Nir Aides added the comment:

Uploaded an updated patch with a read() which calls the underlying stream enough times to satisfy the required read size.

--
Added file: http://bugs.python.org/file15941/zipfile_7610_py27_v5.diff
___ Python tracker <http://bugs.python.org/issue7610> ___
[issue7610] Cannot use both read and readline method in same ZipExtFile object
Nir Aides added the comment:

Right, I removed MAX_N from read(); it remains in read1(). If this looks good, for which versions of Python is this patch desired?

--
Added file: http://bugs.python.org/file15949/zipfile_7610_py27_v6.diff
___ Python tracker <http://bugs.python.org/issue7610> ___
[issue1068268] subprocess is not EINTR-safe
Changes by Nir Soffer : -- nosy: +nirs ___ Python tracker <http://bugs.python.org/issue1068268> ___
[issue7610] Cannot use both read and readline method in same ZipExtFile object
Nir Aides added the comment:

The related scenario is a system without zlib. How do you suggest simulating this in a test?

--
___ Python tracker <http://bugs.python.org/issue7610> ___
[issue7610] Cannot use both read and readline method in same ZipExtFile object
Nir Aides added the comment:

Unconsumed data is compressed data. If the part which handles unconsumed data did not work when zlib is available, then the existing tests would fail. In any case the unconsumed buffer is an implementation detail of zipfile.

I see a point in adding a test to make sure zipfile behaves as expected when zlib is not available, but how? Also, on which systems is zlib missing? I don't see this mentioned in the zlib docs.

--
___ Python tracker <http://bugs.python.org/issue7610> ___
[issue7610] Cannot use both read and readline method in same ZipExtFile object
Nir Aides added the comment: I actually meant how would you simulate zlib's absence on a system in which it is present? -- ___ Python tracker <http://bugs.python.org/issue7610> ___
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : -- nosy: +nirai ___ Python tracker <http://bugs.python.org/issue7946> ___
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment:

I tried Florent's modification to the write test and did not see the effect on my machine with an updated revision of Python 3.2. I am running Ubuntu Karmic 64 bit.

7s - no background threads.
20s - one background thread.

The following documentation suggests that the libc condition wakes threads according to scheduling policy and not in FIFO order, i.e. ordering on Linux is not FIFO:
http://www.opengroup.org/onlinepubs/95399/functions/pthread_cond_timedwait.html#tag_03_518_08_06

I upload a quick and dirty patch (linux-7946.patch) to the new GIL just to reflect this by avoiding the timed waits. On my machine it behaves reasonably both with the TCP server and with the write test, but so does unpatched Python 3.2.

I noticed a high context switching rate with Dave's priority GIL - with both tests it goes above 40K/s context switches.

--
Added file: http://bugs.python.org/file16567/linux-7946.patch
___ Python tracker <http://bugs.python.org/issue7946> ___
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment:

I updated the patch with a small fix and increased the ticks countdown-to-release considerably. This seems to help the OS classify CPU-bound threads as such, and actually improves IO performance.

--
Added file: http://bugs.python.org/file16570/linux-7946.patch
___ Python tracker <http://bugs.python.org/issue7946> ___
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment:

I upload bfs.patch

To apply the patch use the following commands on updated python 3.2:
$ patch -fp1 < bfs.patch
$ ./configure

The patch replaces the GIL with a scheduler. The scheduler is a simplified implementation of the recent kernel Brain F**k Scheduler by the Linux hacker Con Kolivas:
http://ck.kolivas.org/patches/bfs/sched-BFS.txt

"The goal of the Brain Fuck Scheduler, referred to as BFS from here on, is to completely do away with the complex designs of the past for the cpu process scheduler and instead implement one that is very simple in basic design. The main focus of BFS is to achieve excellent desktop interactivity and responsiveness without heuristics and tuning knobs that are difficult to understand, impossible to model and predict the effect of, and when tuned to one workload cause massive detriment to another."

Con Kolivas is the hacker whose work inspired the current CFS scheduler of the Linux kernel.

On my Core 2 Duo laptop it performs as follows compared to the other patches:

1) Florent's writenums() test: ~same
2) UDP test: x6 faster
3) cpued test: works as expected, while the other patches starve the pure Python threads.

The cpued test spins 3 threads, 2 of them pure Python and the 3rd doing time.sleep(0) every ~1ms:

import threading
import time

def foo(n):
    while n > 0:
        'y' in 'x' * n
        n -= 1

def bar(sleep, name):
    for i in range(100):
        print(name, i, sleep)
        for j in range(300):
            foo(1500)
            if sleep:
                time.sleep(0)

t0 = threading.Thread(target=bar, args=(False, 't0'))
t1 = threading.Thread(target=bar, args=(False, 't1'))
t2 = threading.Thread(target=bar, args=(True, 't2-interactive'))
list(map(threading.Thread.start, [t0, t1, t2]))
list(map(threading.Thread.join, [t0, t1, t2]))

The patch is still work in progress. In particular:
1) I still need to add support for Windows.
2) It currently requires POSIX clock_gettime() and assumes good timer resolution.
3) I have only verified that it builds on Ubuntu Karmic 64bit.
4) I still need to optimize it and address cleanup.

The scheduler is very simple, straightforward and flexible, and it addresses the tuning problems discussed recently. I think it can be a good replacement to the GIL, since Python really needs a scheduler, not a lock.

--
Added file: http://bugs.python.org/file16634/bfs.patch
___ Python tracker <http://bugs.python.org/issue7946> ___
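To make the scheduling rule concrete, here is a toy model in Python of the BFS-style pick (my reading of the design, not code from the patch): run the thread with the earliest virtual deadline, and serve already-expired threads in FIFO order:

def pick_next(runnable, now):
    # runnable: list of (deadline, arrival_seq, name) tuples
    expired = [t for t in runnable if t[0] <= now]
    if expired:
        chosen = min(expired, key=lambda t: t[1])  # FIFO among expired
    else:
        chosen = min(runnable)                     # earliest deadline first
    runnable.remove(chosen)
    return chosen[2]

threads = [(5.0, 0, 'cpu-bound'), (0.5, 1, 'io-1'), (0.7, 2, 'io-2')]
print(pick_next(threads, now=1.0))  # io-1: both io threads expired, FIFO wins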
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment:

I upload an updated bfs.patch. Apply to updated py32 and ignore the error with:
$ patch -fp1 < bfs.patch
$ ./configure

> Please give understandable benchmark numbers, including an explicit
> comparison with baseline 3.2, and patched 3.2 (e.g. gilinter.patch)

Below.

> Please also measure single-thread performance, because it looks like
> you are adding significant work inside the core eval loop

Removed most of it now. The last bit will be removed soon.

> Do you need a hi-res clock? gettimeofday() already gives you microseconds.
> It looks like a bit of imprecision shouldn't be detrimental.

I use clock_gettime() to get the thread running time to calculate slice depletion. Wall clock can not help with that.

> The magic number DEADLINE_FACTOR looks gratuitous (why 1.1^20 ?)

To my understanding it controls the CPU load (~6) beyond which threads tend to expire. Since expired threads are handled in FIFO order, IO threads do not preempt them (IO threads are chronically expired). So beyond that load IO threads become less responsive.

> By the way, I would put COND_SIGNAL inside the LOCK_MUTEX /
> UNLOCK_MUTEX pair in bfs_yield().

Done.

Here are benchmark results of the UDP test as timed with ipython, where client.work() is a single run of the client:

System: Core 2 Duo (locked at 2.4 GHz) with Ubuntu Karmic 64 bit.

Vanilla Python 3.2:
* Note: on my system the original problem discussed in this issue report does not manifest, since conditions wake up threads according to OS scheduling policy.

In [28]: %timeit -n3 client.work()
1.290 seconds (8127084.435 bytes/sec)
1.488 seconds (7045285.926 bytes/sec)
2.449 seconds (4281485.217 bytes/sec)
1.874 seconds (5594303.222 bytes/sec)
1.853 seconds (5659626.496 bytes/sec)
0.872 seconds (12023425.779 bytes/sec)
4.951 seconds (2117942.079 bytes/sec)
0.728 seconds (14409157.126 bytes/sec)
1.743 seconds (6016999.707 bytes/sec)
3 loops, best of 3: 1.53 s per loop

gilinter.patch:

In [31]: %timeit -n3 client.work()
5.192 seconds (2019676.396 bytes/sec)
1.613 seconds (6500071.475 bytes/sec)
3.057 seconds (3429689.199 bytes/sec)
3.486 seconds (3007596.468 bytes/sec)
4.324 seconds (2424791.868 bytes/sec)
0.964 seconds (10872708.606 bytes/sec)
3.510 seconds (2987722.960 bytes/sec)
1.362 seconds (7698999.458 bytes/sec)
1.013 seconds (10353913.920 bytes/sec)
3 loops, best of 3: 1.96 s per loop

PyCON patch:

In [32]: %timeit -n3 client.work()
2.483 seconds (4223256.889 bytes/sec)
1.330 seconds (7882880.263 bytes/sec)
1.737 seconds (6036251.315 bytes/sec)
1.348 seconds (7778296.679 bytes/sec)
0.983 seconds (10670811.638 bytes/sec)
1.419 seconds (7387226.333 bytes/sec)
1.057 seconds (9919412.977 bytes/sec)
2.483 seconds (4223205.791 bytes/sec)
2.121 seconds (4944231.292 bytes/sec)
3 loops, best of 3: 1.25 s per loop

bfs.patch:

In [33]: %timeit -n3 client.work()
0.289 seconds (36341875.356 bytes/sec)
0.271 seconds (38677439.991 bytes/sec)
0.476 seconds (22033958.947 bytes/sec)
0.329 seconds (31872974.070 bytes/sec)
0.478 seconds (21925125.894 bytes/sec)
0.242 seconds (43386204.271 bytes/sec)
0.213 seconds (49195701.418 bytes/sec)
0.309 seconds (33967467.196 bytes/sec)
0.256 seconds (41008076.688 bytes/sec)
3 loops, best of 3: 259 ms per loop

Output of the cpued.py test:

Vanilla Python 3.2, gilinter.patch and the PyCON patch all starve the pure Python threads and output the following:

$ ~/build/python/python32/python cpued.py
t0 0 False
t1 0 False
t2-interactive 0 True
t2-interactive 1 True
t2-interactive 2 True
t2-interactive 3 True
t2-interactive 4 True
t2-interactive 5 True t2-interactive 6 True t2-interactive 7 True . . . Output from bfs.patch run: $ ~/build/python/bfs/python cpued.py t0 0 False t1 0 False t2-interactive 0 True t0 1 False t1 1 False t2-interactive 1 True t0 2 False t1 2 False t2-interactive 2 True t0 3 False t1 3 False t2-interactive 3 True . . . Note: I have not tested on other Posix systems, and expect to have some complications on Windows, since its thread timers are low resolution (10ms+), and there are issues with its high-precision wall clock. ...will soon know better. -- Added file: http://bugs.python.org/file16644/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
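For concreteness, the thread-running-time read described in the comment above looks roughly like this; a minimal sketch, assuming a POSIX system that supports CLOCK_THREAD_CPUTIME_ID (link with -lrt on older glibc; the function name is illustrative, not the patch's):

    #include <time.h>

    /* Return the calling thread's consumed CPU time in seconds (nanosecond
     * resolution), the quantity used for slice-depletion accounting. */
    static double thread_cpu_seconds(void)
    {
        struct timespec ts;
        if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
            return -1.0;  /* clock not supported on this system */
        return (double) ts.tv_sec + (double) ts.tv_nsec * 1e-9;
    }

Unlike gettimeofday(), this clock does not advance while the thread is blocked, which is why wall clock time cannot substitute for it when measuring slice depletion.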
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : Removed file: http://bugs.python.org/file16634/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : Removed file: http://bugs.python.org/file16567/linux-7946.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment:

> Ouch. CLOCK_THREAD_CPUTIME_ID is not a required part of the standard. Only CLOCK_REALTIME is guaranteed to exist.

Right, however the man page at kernel.org says the following on CLOCK_THREAD_CPUTIME_ID: "Sufficiently recent versions of glibc and the Linux kernel support the following clocks"
http://www.kernel.org/doc/man-pages/online/pages/man2/clock_getres.2.html
The same statement shows up as early as 2003: http://www.tin.org/bin/man.cgi?section=3&topic=clock_gettime

However, if this is indeed a problem on some systems (non-Linux?), then a fallback could be attempted for them. There could also be a problem on systems where the counter exists but has low resolution (10ms+). What platforms do you think this could be a problem on?

> By the way, it's not obvious cpued tests anything meaningful. I understand the bias you are trying to avoid but constructing artificial test cases is not very useful, because we are playing with heuristics and it's always possible to defeat some expectations. That's why benchmarks should try to model/represent real-world situations.

I came up with cpued.py after reading the patches in an attempt to understand how they behave. In this case one thread is pure Python while the other occasionally releases the GIL, both CPU bound. I don't claim this is a real-world situation. However, it is a case in which bfs.patch behaves as expected.

> I've tried ccbench with your patch and there's a clear regression in latency numbers.

Please specify system and test details so I can try to look into it. On my system ccbench behaves as expected:

$ ~/build/python/bfs/python ccbench.py
== CPython 3.2a0.0 (py3k) ==
== x86_64 Linux on '' ==

--- Throughput ---

Pi calculation (Python)
threads=1: 1252 iterations/s.
threads=2: 1199 ( 95 %)
threads=3: 1178 ( 94 %)
threads=4: 1173 ( 93 %)

regular expression (C)
threads=1: 491 iterations/s.
threads=2: 478 ( 97 %)
threads=3: 472 ( 96 %)
threads=4: 477 ( 97 %)

SHA1 hashing (C)
threads=1: 2239 iterations/s.
threads=2: 3719 ( 166 %)
threads=3: 3772 ( 168 %)
threads=4: 3464 ( 154 %)

--- Latency ---

Background CPU task: Pi calculation (Python)
CPU threads=0: 0 ms. (std dev: 1 ms.)
CPU threads=1: 0 ms. (std dev: 1 ms.)
CPU threads=2: 0 ms. (std dev: 1 ms.)
CPU threads=3: 0 ms. (std dev: 1 ms.)
CPU threads=4: 0 ms. (std dev: 1 ms.)

Background CPU task: regular expression (C)
CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 6 ms. (std dev: 0 ms.)
CPU threads=2: 2 ms. (std dev: 2 ms.)
CPU threads=3: 1 ms. (std dev: 0 ms.)
CPU threads=4: 5 ms. (std dev: 7 ms.)

Background CPU task: SHA1 hashing (C)
CPU threads=0: 0 ms. (std dev: 1 ms.)
CPU threads=1: 0 ms. (std dev: 1 ms.)
CPU threads=2: 0 ms. (std dev: 1 ms.)
CPU threads=3: 1 ms. (std dev: 1 ms.)
CPU threads=4: 1 ms. (std dev: 0 ms.)

-- ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
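A fallback of the kind mentioned above could be structured as follows; a sketch only, with a hypothetical helper name, assuming the clock id may be missing at compile time or rejected at run time:

    #include <time.h>
    #include <sys/time.h>

    /* Prefer the per-thread CPU clock; fall back to wall-clock
     * gettimeofday() on systems lacking CLOCK_THREAD_CPUTIME_ID. */
    static double timestamp_seconds(void)
    {
    #ifdef CLOCK_THREAD_CPUTIME_ID
        struct timespec ts;
        if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) == 0)
            return (double) ts.tv_sec + (double) ts.tv_nsec * 1e-9;
    #endif
        struct timeval tv;
        gettimeofday(&tv, NULL);  /* microsecond resolution, wall clock */
        return (double) tv.tv_sec + (double) tv.tv_usec * 1e-6;
    }

Note the fallback changes semantics: wall clock time includes time spent blocked, so slice accounting becomes pessimistic for IO bound threads.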
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment:

> It's a dual-core Linux x86-64 system. But, looking at the patch again, the reason is obvious:
>
> #define CHECK_SLICE_DEPLETION(tstate) (bfs_check_depleted || (tstate->tick_counter % 1000 == 0))
>
> `tstate->tick_counter % 1000` is replicating the behaviour of the old GIL, which based its speculative operation on the number of elapsed opcodes (and which also gave bad latency numbers on the regex workload).

The bfs_check_depleted flag is there to work around this problem. It is raised by waiters which time out. What distribution and version of GNU/Linux are you using?

As for the CLOCK_THREAD_CPUTIME_ID clock, support was added to FreeBSD recently in version 7.1, which I guess is not good enough: http://www.freebsd.org/releases/7.1R/relnotes.html
I have not yet found anything on Solaris. Do you know of an alternative way to measure thread running time on POSIX?

-- ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
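On the alternative-clock question, one candidate worth noting (offered here as an assumption, since the thread does not settle it) is getrusage() with RUSAGE_THREAD, which returns per-thread user and system CPU time; it is Linux-specific (kernel 2.6.26+), so it does not close the FreeBSD/Solaris gap either:

    #define _GNU_SOURCE   /* RUSAGE_THREAD is a GNU/Linux extension */
    #include <sys/resource.h>

    /* Per-thread running time (user + system) in seconds. */
    static double thread_time_rusage(void)
    {
        struct rusage ru;
        if (getrusage(RUSAGE_THREAD, &ru) != 0)
            return -1.0;
        return (double) ru.ru_utime.tv_sec + (double) ru.ru_utime.tv_usec * 1e-6
             + (double) ru.ru_stime.tv_sec + (double) ru.ru_stime.tv_usec * 1e-6;
    }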
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: Well, on initial check the scheduler seems to work well with the regular gettimeofday() wall clock instead of clock_gettime(). :)

    /* Return thread running time in seconds (with nsec precision). */
    static inline long double get_thread_timestamp(void) {
        return get_timestamp(); // wall clock via gettimeofday()
        /*struct timespec ts;
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
        return (long double) ts.tv_sec + ts.tv_nsec * 0.000000001;*/
    }

Does it make things better on your system?

-- ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : Removed file: http://bugs.python.org/file16644/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: Uploaded an updated bfs.patch. The latency problem was related to the --with-computed-gotos flag. I fixed it and it seems to work fine now. I also switched to gettimeofday(), so it should now work on all POSIX systems with a high-resolution timer. -- Added file: http://bugs.python.org/file16663/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment:

> But on a busy system, won't measuring wall clock time rather than CPU time give bogus results?

This was the motivation for using clock_gettime(). I tried the wall clock version under load (including on a single-core system) and it seems to behave. Now it remains to rationalize it :)

-- ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: gilinter.patch has good IO latency in the UDP test on my system when built with --with-computed-gotos:

In [34]: %timeit -n3 client.work()
0.320 seconds (32782026.509 bytes/sec)
0.343 seconds (30561727.443 bytes/sec)
0.496 seconds (21154075.417 bytes/sec)
0.326 seconds (32171215.998 bytes/sec)
0.462 seconds (22701809.421 bytes/sec)
0.378 seconds (27722146.793 bytes/sec)
0.391 seconds (26826713.409 bytes/sec)
0.315 seconds (5858.720 bytes/sec)
0.281 seconds (37349508.136 bytes/sec)
3 loops, best of 3: 329 ms per loop

-- ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : Removed file: http://bugs.python.org/file16663/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: I updated bfs.patch. It now builds on Windows (and POSIX). -- Added file: http://bugs.python.org/file16679/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : Removed file: http://bugs.python.org/file16679/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : Added file: http://bugs.python.org/file16680/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : Removed file: http://bugs.python.org/file16680/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: I uploaded a new update to bfs.patch, which improves scheduling and reduces overhead. -- Added file: http://bugs.python.org/file16710/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : Removed file: http://bugs.python.org/file16710/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: Uploaded an update. -- Added file: http://bugs.python.org/file16830/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : Removed file: http://bugs.python.org/file16830/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: I uploaded an update to bfs.patch which improves behavior, in particular on non-Linux multi-core (4+) machines.

Hi Charles-Francois, thanks for taking the time to review this patch!

> - nothing guarantees that you'll get a msec resolution

Right, the code should behave well with low precision clocks as long as short (sub-tick) tasks are not synchronized with the slice interval. There is a related discussion of this problem in schedulers in the section on sub-tick accounting in: http://ck.kolivas.org/patches/bfs/sched-BFS.txt
On which target systems can we expect not to have a high precision clock?

> - gettimeofday returns you wall clock time: if a process that modifies time is running, e.g. ntpd, you're likely to run into trouble. The value returned is _not_ monotonic, but clock_gettime(CLOCK_MONOTONIC) is
> - inline functions are used, but it's not ANSI
> - static inline long double get_timestamp(void) {
>       struct timeval tv;
>       GETTIMEOFDAY(&tv);
>       return (long double) tv.tv_sec + tv.tv_usec * 0.000001;
>   }

I added timestamp capping to the code. The timestamp is used for waiting and therefore I think the source should be either CLOCK_REALTIME or gettimeofday().

> > `tstate->tick_counter % 1000` is replicating the behaviour of the old GIL, which based its speculative operation on the number of elapsed opcodes (and which also gave bad latency numbers on the regex workload).
> I find this suspicious too. I haven't looked at the patch in detail, but what does the number of elapsed opcodes offer you over the timeslice expiration approach?

More accurate yielding. It is possible a better mechanism can be thought of, and/or maybe it is indeed redundant.

> It is thus recommended that a condition wait be enclosed in the equivalent of a "while loop" that checks the predicate.

Done.

-- Added file: http://bugs.python.org/file16947/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
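The recommendation quoted in the last point is the standard POSIX condition-variable idiom; a minimal sketch of the pattern (names are illustrative, not taken from the patch):

    #include <pthread.h>

    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    static int may_run = 0;  /* the predicate guarded by the mutex */

    /* Re-check the predicate in a loop so that spurious wakeups and
     * wakeups consumed by other threads are handled correctly. */
    static void wait_until_may_run(void)
    {
        pthread_mutex_lock(&mutex);
        while (!may_run)
            pthread_cond_wait(&cond, &mutex);
        pthread_mutex_unlock(&mutex);
    }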
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: > the scheduling function bfs_find_task returns the first task that > has an expired deadline. since an expired deadline probably means > that the scheduler hasn't run for a while, it might be worth it to > look for the thread with the oldest deadline and serve it first, > instead of stopping at the first one This is by design of BFS as I understand it. Next thread to run is either first expired or oldest deadline: http://ck.kolivas.org/patches/bfs/sched-BFS.txt "Once a task is descheduled, it is put back on the queue, and an O(n) lookup of all queued-but-not-running tasks is done to determine which has the earliest deadline and that task is chosen to receive CPU next. The one caveat to this is that if a deadline has already passed (jiffies is greater than the deadline), the tasks are chosen in FIFO (first in first out) order as the deadlines are old and their absolute value becomes decreasingly relevant apart from being a flag that they have been asleep and deserve CPU time ahead of all later deadlines." -- ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
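Reduced to its core, the quoted policy can be expressed as a single scan; a sketch with hypothetical types, not the patch's actual bfs_find_task:

    /* Queue is kept in FIFO (arrival) order.  Serve the first task whose
     * deadline has passed; if none has, serve the earliest deadline. */
    typedef struct task {
        struct task *next;
        long double deadline;
    } task_t;

    static task_t *pick_next(task_t *queue, long double now)
    {
        task_t *t, *best = NULL;
        for (t = queue; t != NULL; t = t->next) {
            if (t->deadline <= now)
                return t;   /* oldest queued expired task: FIFO order */
            if (best == NULL || t->deadline < best->deadline)
                best = t;   /* track earliest non-expired deadline */
        }
        return best;
    }

This keeps the O(n) lookup the BFS document describes while making the FIFO-on-expiry caveat explicit: scanning a FIFO-ordered queue and stopping at the first expired deadline is what serves expired tasks in first-in-first-out order.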
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : Removed file: http://bugs.python.org/file16947/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : Added file: http://bugs.python.org/file16967/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: Yet another update to bfs.patch. I uploaded a variation on Florent's write test which prints the progress of background CPU bound threads as: thread-name timestamp progress

Here are some numbers from Windows XP 32bit with Intel q9400 (4 cores). Builds produced with VS express (alas - no optimizations, so official builds may behave differently).

BFS - 33% CPU, 37,000 context switches per second:

z:\bfs\PCbuild\python.exe y:\writes.py
t1 2.34400010109 0
t2 2.4213134 0
t1 4.6713134 1
t2 4.7963134 1
t1 7.0163242 2
t2 7.2036866 2
t1 9.375 3
t2 9.625 3
t1 11.703962 4
t2 12.0309998989 4
t1 14.046313 5
t2 14.421313 5
t1 16.407648 6
t2 16.7809998989 6
t1 18.782648 7
t2 19.125 7
t1 21.157648 8
t2 21.483676 8
t1 23.5 9
t2 23.858676 9
t1 25.858951 10
t2 26.233676 10
t1 28.2349998951 11
28.2189998627

gilinter - starves both bg threads, with a high rate of context switches. 45% CPU, 203,000 context switches per second:

z:\gilinter\PCbuild\python.exe y:\writes.py
t1 13.0939998627 0
t1 26.421313 1
t1 39.812638 2
t1 53.1559998989 3
57.5470001698

PyCON - starves one bg thread; the IO thread is slow.

Py32 - starves the IO thread, as expected.

Note: PyCON, gilinter and py32 starve the bg thread with Dave's original buffered write test as well - http://bugs.python.org/issue7946#msg101116

-- Added file: http://bugs.python.org/file16968/writes.py ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: Dave, there seems to be a bug in your patch on Windows XP. It crashes in ccbench.py with the following output:

>python_d.exe y:\ccbench.py
== CPython 3.2a0.0 (py3k) ==
== x86 Windows on 'x86 Family 6 Model 23 Stepping 10, GenuineIntel' ==

--- Throughput ---

Pi calculation (Python)
threads= 1: 840 iterations/s. balance
Fatal Python error: ReleaseMutex(mon_mutex) failed
threads= 2: 704 ( 83%) 0.8167
threads= 3: 840 (100%) 1.6706
threads= 4: 840 (100%) 2.

and the following stack trace:

ntdll.dll!7c90120e()
[Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]
python32_d.dll!Py_FatalError(const char * msg) Line 2033 C
> python32_d.dll!gil_monitor(void * arg) Line 314 + 0x24 bytes C
python32_d.dll!bootstrap(void * call) Line 122 + 0x7 bytes C
msvcr100d.dll!_callthreadstartex() Line 314 + 0xf bytes C
msvcr100d.dll!_threadstartex(void * ptd) Line 297 C
kernel32.dll!7c80b729()

-- ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: On Tue, Apr 27, 2010 at 12:23 PM, Charles-Francois Natali wrote:

> @nirai I have some more remarks on your patch:
> - /* Diff timestamp capping results to protect against clock differences
>    * between cores. */
>   _LOCAL(long double) _bfs_diff_ts(long double ts1, long double ts0) {
> I'm not sure I understand. You can have problem with multiple cores when reading directly the TSC register, but that doesn't affect gettimeofday. gettimeofday should be reliable and accurate (unless the OS is broken of course), the only issue is that since it's wall clock time, if a process like ntpd is running, then you'll run into problem

I think gettimeofday() might return different results on different cores as a result of kernel/hardware problems or clock drift issues in VM environments:
http://kbase.redhat.com/faq/docs/DOC-7864
https://bugzilla.redhat.com/show_bug.cgi?id=461640

On Windows, the high-precision counter might return different results on different cores in some hardware configurations (older multi-core processors). I attempted to alleviate these problems by using capping and by using a "python time" counter constructed from accumulated slices, on the assumption that IO bound threads are unlikely to get migrated often between cores while running. I will add references to the patch docs.

> - did you experiment with the time slice? I tried some higher values and got better results, without penalizing the latency. Maybe it could be interesting to look at it in more detail (and on various platforms).

Can you post more details on your findings? It is possible that by using a bigger slice, you helped the OS classify CPU bound threads as such and improved "synchronization" between BFS and the OS scheduler.

Notes on optimization of code taken, thanks.

-- ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
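The capping mentioned above amounts to clamping each timestamp difference before accumulating it; a sketch, with SLICE as a hypothetical constant standing in for the patch's slice length:

    #define SLICE 0.005L  /* e.g. 5 ms, for illustration only */

    /* Clamp a timestamp diff into [0, SLICE] so that a backward jump or a
     * runaway forward jump (core migration, clock drift, VM time skew)
     * cannot corrupt the accumulated "python time" counter. */
    static long double capped_diff(long double ts1, long double ts0)
    {
        long double dt = ts1 - ts0;
        if (dt < 0.0L)
            dt = 0.0L;
        else if (dt > SLICE)
            dt = SLICE;
        return dt;
    }

The design choice is that a single bogus reading then costs at most one slice of accounting error rather than throwing the scheduler's notion of elapsed time arbitrarily far off.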
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: On Wed, Apr 28, 2010 at 12:41 AM, Larry Hastings wrote: > The simple solution: give up QPC and use timeGetTime() with > timeBeginPeriod(1), which is totally > reliable but only has millisecond accuracy at best. It is preferable to use a high precision clock and I think the code addresses the multi-core time skew problem (pending testing). -- ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: Dave, there seems to be some problem with your patch on Windows:

F:\dev>z:\dabeaz-wcg\PCbuild\python.exe y:\ccbench.py -b
== CPython 3.2a0.0 (py3k) ==
== x86 Windows on 'x86 Family 6 Model 23 Stepping 10, GenuineIntel' ==

--- I/O bandwidth ---

Background CPU task: Pi calculation (Python)
CPU threads=0: 8551.2 packets/s.
CPU threads=1: 26.1 ( 0 %)
CPU threads=2: 26.0 ( 0 %)
CPU threads=3: 37.2 ( 0 %)
CPU threads=4: 33.2 ( 0 %)

-- ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: On Thu, Apr 29, 2010 at 2:03 AM, David Beazley wrote: > Wow, that is a *really* intriguing performance result with radically > different behavior than Unix. Do you have any ideas of what might be causing > it? Instrument the code and I'll send you a trace. -- ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: Dave, the behavior of your patch on Windows XP/2003 (and earlier) might be related to the way Windows boosts thread priority when a thread is signaled. Try to increase the priority of the monitor thread and the slice size. Another thing to look at is how to prevent Python CPU bound threads from starving (or otherwise messing up the scheduling of) threads of other processes. Maybe increasing the slice significantly can help with this too (50ms++ ?). XP/NT/CE scheduling and thread boosting affect all patches and the current GIL undesirably (in different ways). Maybe it is possible to make your patch work nicely on these systems: http://www.sriramkrishnan.com/blog/2006/08/tale-of-two-schedulers-win_115489794858863433.html Vista and Windows 7 involve CPU cycle counting, which results in more sensible scheduling: http://technet.microsoft.com/en-us/magazine/2007.02.vistakernel.aspx -- ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : Added file: http://bugs.python.org/file17194/nir-ccbench-xp32.log ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : Removed file: http://bugs.python.org/file16967/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: I updated bfs.patch with improvements on Windows XP. The update disables priority boosts associated with the scheduler condition on Windows for CPU bound threads. Here is a link to ccbench results: http://bugs.python.org/file17194/nir-ccbench-xp32.log

Summary: Windows XP 32bit, q9400 2.6GHz, release build (no PG optimizations). Test runs in background; ccbench modified to run both bz2 and sha1.

bfs.patch - seems to behave.
gilinter2.patch - single core: high latency, low IO bandwidth.
dabeaz_gil.patch - single core: low IO bandwidth. 4 cores: starvation of throughput threads (balance), some latency, low IO bandwidth.

-- Added file: http://bugs.python.org/file17195/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
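On Windows, disabling the boost a thread receives when it is signaled is a one-liner per thread; presumably something along these lines is what the update does (a sketch, not the patch's code):

    #include <windows.h>

    /* Opt the calling thread out of priority boosts; TRUE disables
     * boosting, FALSE restores the default behavior. */
    static void disable_priority_boost(void)
    {
        SetThreadPriorityBoost(GetCurrentThread(), TRUE);
    }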
[issue7946] Convoy effect with I/O bound threads and New GIL
Changes by Nir Aides : Removed file: http://bugs.python.org/file17195/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7946] Convoy effect with I/O bound threads and New GIL
Nir Aides added the comment: Duck, here comes another update to bfs.patch. This one includes some cleanups which simplify the code and improve behavior (on Windows XP), shutdown code, comments, and "experimental" use of TSC for timestamps, which eliminates timestamp reading overhead.

TSC (http://en.wikipedia.org/wiki/Time_Stamp_Counter) is a fast way to get a high precision timing read. On some systems this is what gettimeofday() uses under the hood, while on other systems it will use HPET or another source which is slower, typically ~1usec, but can be higher (e.g. my core 2 duo laptop occasionally goes into a few hours of charging 3usec per HPET gettimeofday() call - god knows why). This overhead is incurred twice for every GIL release/acquire pair and can be eliminated by either:
1) Hacking the scheduler not to call gettimeofday() when no other threads are waiting to run, or
2) Using TSC on platforms where it is available (the Linux BFS scheduler uses TSC).

I took the cycle.h pointed to by the Wikipedia article on TSC for a spin and it works well on my boxes. It is BSD licensed, (un)maintained?, and includes implementations for a gazillion platforms (I did not yet modify configure.in as it recommends). If it breaks on your system please ping with details.

Some benchmarks running (Florent's) writes.py on Core 2 Quad q9400, Ubuntu 64bit:

bfs.patch - 35K context switches per second, threads balanced, runtime is 3 times that of running the IO thread alone:

~/dev/python$ ~/build/python/bfs/python writes.py
t1 1.60293507576 1
t2 1.78533816338 1
t1 2.88939499855 2
t2 3.19518113136 2
t1 4.38062310219 3
t2 4.70725703239 3
t1 6.26874804497 4
t2 6.4078810215 4
t1 7.83273100853 5
t2 7.92976212502 5
t1 9.4341750145 6
t2 9.57891893387 6
t1 11.077393055 7
t2 11.164755106 7
t2 12.8495900631 8
t1 12.8979620934 8
t1 14.577999115 9
t2 14.5791089535 9
t1 15.9246580601 10
t2 16.1618289948 10
t1 17.365830183 11
t2 17.7345991135 11
t1 18.9782481194 12
t2 19.2790091038 12
t1 20.4994370937 13
t2 20.5710251331 13
21.0179870129

dabeaz_gil.patch - sometimes runs well but sometimes goes into a high level of context switches (250K/s) and produces output such as this:

~/dev/python$ ~/build/python/dabeaz/python writes.py
t1 0.742760896683 1
t1 7.50052189827 2
t2 8.63794493675 1
t1 10.1924870014 3
17.9419858456

gilinter2.patch - 300K context switches per second, bg threads starved:

~/dev/python$ ~/build/python/gilinter/python writes.py
t2 6.1153190136 1
t2 11.7834780216 2
14.5995650291

-- Added file: http://bugs.python.org/file17330/bfs.patch ___ Python tracker <http://bugs.python.org/issue7946> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
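For reference, the TSC read that cycle.h wraps is tiny on x86; a sketch for GCC/Clang on x86/x86-64 (raw cycle counts only - calibration against wall time, and the cross-core skew discussed earlier, are left to the caller):

    #include <stdint.h>

    /* Read the CPU's time stamp counter: cycles since reset. */
    static inline uint64_t read_tsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
        return ((uint64_t) hi << 32) | lo;
    }

This is why it beats a syscall-backed clock source: it is a single unprivileged instruction, on the order of tens of cycles rather than a microsecond.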
[issue27879] add os.syncfs()
Nir Soffer added the comment: Updating the Python version; this is no longer relevant to 3.6. On Linux, users can use "sync --file-system /path", but it would be nice to have something that works on multiple platforms. -- nosy: +nirs versions: +Python 3.11 -Python 3.6 ___ Python tracker <https://bugs.python.org/issue27879> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
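What "sync --file-system /path" does under the hood is open a file on the target filesystem and call syncfs(2) on the descriptor; a sketch of the same thing in C, which also shows why this is Linux-only (syncfs is a Linux-specific call, glibc 2.14+), which is exactly the portability gap the comment above points out:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    /* Flush all dirty data of the filesystem containing 'path'. */
    static int sync_filesystem_of(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;
        int r = syncfs(fd);
        close(fd);
        return r;
    }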