[ python-Bugs-1646068 ] Dict lookups fail if sizeof(Py_ssize_t) < sizeof(long)

2007-02-04 Thread SourceForge.net
Bugs item #1646068, was opened at 2007-01-27 18:23
Message generated for change (Comment added) made by ked-tao
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1646068&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
Group: Python 2.5
Status: Open
Resolution: None
Priority: 6
Private: No
Submitted By: ked-tao (ked-tao)
Assigned to: Tim Peters (tim_one)
Summary: Dict lookups fail if sizeof(Py_ssize_t) < sizeof(long)

Initial Comment:
Porting problem.

Include/dictobject.h defines PyDictEntry.me_hash as a Py_ssize_t. Everywhere 
else uses a C 'long' for hashes.

On the system I'm porting to, ints and pointers (and ssize_t) are 32-bit, but 
longs and long longs are 64-bit. Therefore, the assignments to me_hash truncate 
the hash and subsequent lookups fail.

I've changed the definition of me_hash to 'long' and (in Objects/dictobject.c) 
removed the casting from the various assignments and changed the definition of 
'i' in dict_popitem(). This has fixed my immediate problems, but I guess I've 
just reintroduced whatever problem it got changed for. The comment in the 
header says:

/* Cached hash code of me_key.  Note that hash codes are C longs.
 * We have to use Py_ssize_t instead because dict_popitem() abuses
 * me_hash to hold a search finger.
 */

... but that doesn't really explain what it is about dict_popitem() that 
requires the different type.
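
For readers unfamiliar with the term, here is a rough Python sketch of what a "search finger" does. It is a simplified model, not the C code: `FingerTable`, `table` and `finger` are hypothetical names, `None` marks an empty slot, and the separate `finger` attribute stands in for the hash field of slot 0 that the header comment says is being reused. The point it illustrates is that the stored value is a table index.

```python
class FingerTable:
    """Toy open-addressing table that remembers where popitem() last stopped."""

    def __init__(self, size=8):
        self.table = [None] * size   # None marks an empty slot
        self.finger = 0              # models slot 0's hash field when slot 0 is empty

    def popitem(self):
        if all(entry is None for entry in self.table):
            raise KeyError("popitem(): table is empty")
        i = 0
        if self.table[0] is None:
            # Slot 0 is empty, so resume from the saved finger.
            i = self.finger
            if i < 1 or i >= len(self.table):
                i = 1                # out of bounds: restart just past slot 0
            while self.table[i] is None:
                i += 1
                if i >= len(self.table):
                    i = 1
        item = self.table[i]
        self.table[i] = None
        self.finger = i + 1          # next place to start looking
        return item


t = FingerTable()
t.table[2] = ("a", 1)
t.table[5] = ("b", 2)
print(t.popitem(), t.finger)
print(t.popitem(), t.finger)
```

Because the finger only ever holds an index into the table, any integer type wide enough for a table index would do for that purpose.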

Thanks. Kev.

--

>Comment By: ked-tao (ked-tao)
Date: 2007-02-04 14:11

Message:
Logged In: YES 
user_id=1703158
Originator: YES

Hi Jim. I understand what the problem is (perhaps I didn't state it
clearly enough) - me_hash is a cache of the dict item's hash which is
compared against the hash of the object being looked up before going any
further with expensive richer comparisons. On my system, me_hash is a
32-bit quantity but hashes in general are declared 'long' which is a
64-bit quantity. Therefore for any object whose hash has any of the top 32
bits set, a dict lookup will fail as it will never get past that first
check (regardless of why that slot is being checked - it has nothing to do
with the perturbation to find the next slot).
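
The failure described above can be modelled in a few lines of Python. This is only an illustration under stated assumptions: `store_as_32bit` is a hypothetical helper that mimics assigning a 64-bit value to a 32-bit signed field, and `first_check_passes` stands in for the cheap hash comparison done before any rich comparison.

```python
MASK32 = 0xFFFFFFFF


def store_as_32bit(h):
    """Model assigning a 64-bit hash to a 32-bit signed field (keeps low 32 bits)."""
    h &= MASK32
    return h - (1 << 32) if h & 0x80000000 else h


def first_check_passes(cached_hash, lookup_hash):
    """The cheap test done before any rich comparison."""
    return cached_hash == lookup_hash


small = 12345                 # fits in 32 bits
big = (1 << 40) + 12345       # has a bit set above bit 31

print(first_check_passes(store_as_32bit(small), small))
print(first_check_passes(store_as_32bit(big), big))
```

The second check fails because the cached value has lost its high bits while the probe hash has not, so the lookup never reaches the key comparison.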

The deal is that my system is basically a 32-bit system (sizeof(int) ==
sizeof(void *) == 4, and therefore ssize_t is not unreasonably also
32-bit), but C longs are 64-bit.
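
A quick way to see whether a given platform has this combination of type sizes is to ask `ctypes`. The sketch below is written for a current Python (it needs `ctypes.c_ssize_t`); the helper names `sizes` and `hash_field_too_narrow` are made up for illustration.

```python
import ctypes


def sizes():
    """Return the byte widths of the C types discussed in this report."""
    return {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),
        "void *": ctypes.sizeof(ctypes.c_void_p),
        "ssize_t": ctypes.sizeof(ctypes.c_ssize_t),
    }


def hash_field_too_narrow(s):
    """True when a Py_ssize_t field cannot hold every C long value."""
    return s["ssize_t"] < s["long"]


s = sizes()
print(s, hash_field_too_narrow(s))
```

On the system described here the mapping would presumably show 4 for `int`, `void *` and `ssize_t` but 8 for `long`, which is the case the second function flags.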

You say "popitem assumes it can store a pointer there", but AFAICS it's
just storing an _index_, not a pointer. I was concerned that making that
index a 64-bit quantity might tickle some subtlety in the code, thinking
that perhaps it was changed from 'long' to 'Py_ssize_t' because it had to
be 32-bit for some reason. However, it seems much more likely that it was
defined like that to be more correct on a system with 64-bit addressing
and 32-bit longs (which would be more common). With that in mind, I've
attached a suggested patch which selects a reasonable type according to
the SIZEOF_ configuration defines.

WRT forcing the hashes 32-bit to "save space and time" - that means
inventing a 'Py_hash_t' type and going through the entire python source
looking for 'long's that might be used to store/calculate hashes. I think
I'll pass on that ;)

Regards, Kev.
File Added: dict.diff

--

Comment By: Jim Jewett (jimjjewett)
Date: 2007-02-02 20:20

Message:
Logged In: YES 
user_id=764593
Originator: NO

The whole point of a hash is that if it doesn't match, you can skip an
expensive comparison.  How big to make the hash is a tradeoff between how
much you'll waste calculating and storing it vs how often it will save a
"real" comparison.
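
As a small illustration of that shortcut, here is a sketch in Python. `Key` and `slot_matches` are hypothetical names; `Key.__eq__` counts its calls so the effect of the hash check is visible.

```python
class Key:
    """Key whose __eq__ counts how often the 'expensive' comparison runs."""
    eq_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.name == other.name


def slot_matches(entry_key, entry_hash, key, key_hash):
    """Identity first, then cached hash, and only then the rich comparison."""
    if entry_key is key:
        return True
    if entry_hash != key_hash:
        return False          # hashes differ: skip __eq__ entirely
    return entry_key == key


stored = Key("spam")
probe_other = Key("eggs")
probe_same = Key("spam")

print(slot_matches(stored, hash(stored), probe_other, hash(probe_other)), Key.eq_calls)
print(slot_matches(stored, hash(stored), probe_same, hash(probe_same)), Key.eq_calls)
```

When the two hashes differ, `__eq__` is never called; it runs only for the probe whose hash matches the cached one.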

The comment means that, as an implementation detail, popitem assumes it
can store a pointer there instead, so hashes need to be at least as big as
a pointer.  

Going to the larger of the two sizes will certainly solve your problem; it
just wastes some space, and maybe some time calculating the hash.  

If you want to get that space back, just make sure the truncation is
correct and consistent.  I *suspect* your problem is that when there is a
collision, either 

(1)  It is comparing a truncated value to an untruncated value, or
(2)  The perturbation to find the next slot is going wrong, because of
when the truncation happens.

--

Comment By: Georg Brandl (gbrandl)
Date: 2007-01-27 19:40

Message:
Logged In: YES 
user_id=849994
Originator: NO

This is your code, Tim.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1646068&group_id=5470
___
Python-bugs-list mailing list 

[ python-Bugs-1646068 ] Dict lookups fail if sizeof(Py_ssize_t) < sizeof(long)

2007-02-04 Thread SourceForge.net
Bugs item #1646068, was opened at 2007-01-27 13:23
Message generated for change (Comment added) made by jimjjewett
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1646068&group_id=5470

Category: Python Interpreter Core
Group: Python 2.5
Status: Open
Resolution: None
Priority: 6
Private: No
Submitted By: ked-tao (ked-tao)
Assigned to: Tim Peters (tim_one)
Summary: Dict lookups fail if sizeof(Py_ssize_t) < sizeof(long)

Comment By: Jim Jewett (jimjjewett)
Date: 2007-02-04 11:35

Message:
Logged In: YES 
user_id=764593
Originator: NO

Yes, I'm curious about what system this is ... is it a characteristic of
the whole system, or a compiler choice to get longer ints?

As to using a Py_hash_t -- it probably wouldn't be as bad as you think. 
You might get away with just masking it to throw away the high order bits
in dict and set.  (That might not work with perturbation.)  
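
To make the masking idea concrete, here is a toy sketch in Python. It assumes the only requirement is that the stored hash and the probe hash are narrowed the same way; it uses a flat list and ignores slot selection and perturbation entirely, so it says nothing about that caveat. `narrow` and `NarrowHashTable` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def narrow(h):
    """Keep only the low 32 bits of a hash (hypothetical helper)."""
    return h & MASK32


class NarrowHashTable:
    """Toy table that narrows hashes on both insert and lookup."""

    def __init__(self):
        self.entries = []            # list of (narrowed_hash, key, value)

    def insert(self, key, full_hash, value):
        self.entries.append((narrow(full_hash), key, value))

    def lookup(self, key, full_hash):
        h = narrow(full_hash)        # narrow the probe the same way
        for entry_hash, entry_key, value in self.entries:
            if entry_hash == h and entry_key == key:
                return value
        raise KeyError(key)


table = NarrowHashTable()
wide_hash = (1 << 40) + 99           # has bits set above bit 31
table.insert("k", wide_hash, "v")
print(table.lookup("k", wide_hash))
```

The lookup succeeds because both sides of the comparison have had their high-order bits discarded.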

Even if you have to change it everywhere in the source, there is some
prior art (from when hash was allowed to be a Python long), and the change
is almost certainly limited to methods with "hash" in the name which
generate a hash.  (eq/ne on the same objects may use the hash.)  Consumers
of hash really are limited to dict and derivatives.  I think dict, set, and
defaultdict may be the full list for the default distribution.


--


[ python-Bugs-1651995 ] sgmllib _convert_ref UnicodeDecodeError exception, new in 2.

2007-02-04 Thread SourceForge.net
Bugs item #1651995, was opened at 2007-02-04 22:34
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1651995&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Nagle (nagle)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmllib _convert_ref UnicodeDecodeError exception, new in 2.

Initial Comment:
   I'm running a website page through BeautifulSoup.  It parses OK with Python 
2.4, but Python 2.5 fails with an exception:

Traceback (most recent call last):
  File "./sitetruth/InfoSitePage.py", line 268, in httpfetch
    self.pagetree = BeautifulSoup.BeautifulSoup(sitetext) # parse into tree form
  File "./sitetruth/BeautifulSoup.py", line 1326, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "./sitetruth/BeautifulSoup.py", line 973, in __init__
    self._feed()
  File "./sitetruth/BeautifulSoup.py", line 998, in _feed
    SGMLParser.feed(self, markup or "")
  File "/usr/lib/python2.5/sgmllib.py", line 99, in feed
    self.goahead(0)
  File "/usr/lib/python2.5/sgmllib.py", line 133, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.5/sgmllib.py", line 291, in parse_starttag
    self.finish_starttag(tag, attrs)
  File "/usr/lib/python2.5/sgmllib.py", line 340, in finish_starttag
    self.handle_starttag(tag, method, attrs)
  File "/usr/lib/python2.5/sgmllib.py", line 376, in handle_starttag
    method(attrs)
  File "./sitetruth/BeautifulSoup.py", line 1416, in start_meta
    self._feed(self.declaredHTMLEncoding)
  File "./sitetruth/BeautifulSoup.py", line 998, in _feed
    SGMLParser.feed(self, markup or "")
  File "/usr/lib/python2.5/sgmllib.py", line 99, in feed
    self.goahead(0)
  File "/usr/lib/python2.5/sgmllib.py", line 133, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.5/sgmllib.py", line 285, in parse_starttag
    self._convert_ref, attrvalue)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa7 in position 0: ordinal not in range(128)

The code that's failing is in "_convert_ref", which is new in Python 2.5. 
That function wasn't present in 2.4.  I think the code is trying to handle 
single quotes inside of double quotes in HTML attributes, or something like 
that.
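
For reference, the error message itself comes from an ASCII decode of a byte outside the 0-127 range. In Python 2 that decode happens implicitly when a byte string is combined with a unicode string; whether that is exactly what happens inside `_convert_ref` is an assumption here. The sketch below (written for Python 3, where the decode has to be explicit) triggers the same message; `decode_ascii` is a hypothetical helper.

```python
def decode_ascii(raw):
    """Decode bytes as ASCII, returning the error text instead of raising."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)


print(decode_ascii(b"plain"))
print(decode_ascii(b"\xa7"))
```

The second call reports byte 0xa7 at position 0, matching the last line of the traceback.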

To replicate, run

http://www.bankofamerica.com
or
http://www.gm.com

through BeautifulSoup.  

Something about this code doesn't like big companies. Web sites of smaller 
companies are going through OK.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1651995&group_id=5470
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[ python-Bugs-1562193 ] IDLE Hung up after open script by command line...

2007-02-04 Thread SourceForge.net
Bugs item #1562193, was opened at 2006-09-20 09:10
Message generated for change (Comment added) made by kbk
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1562193&group_id=5470

Category: IDLE
Group: Python 2.5
>Status: Pending
>Resolution: Invalid
Priority: 5
Private: No
Submitted By: Marek Nowicki (faramir2)
Assigned to: Kurt B. Kaiser (kbk)
Summary: IDLE Hung up after open script by command line...

Initial Comment:
Hello,

I wrote this code in Python and saved it as prx.py:
--- CUT ---
from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
from time import strftime, gmtime
import urllib2
import thread
 from sys import stdout
class RequestHandler(BaseHTTPRequestHandler):
 def serve(self):
 print "%s %s %s\r\n%s" % (self.command, self.path, self.request_version, self.headers)
 header={}
 header["content-length"]=0
 for i in str(self.headers).split("\r\n"):
 j=i.split(":", 1)
 if len(j)==2:
 header[j[0].strip().lower()] = j[1].strip()
 content=self.rfile.read(int(header["content-length"]))
 print content
 url="http://faramir2.prv.pl";
 u=urllib2.urlopen(url)
 for i,j in u.info().items():
 print "%s: %s" % (i,j)
 self.server_version = "Apache"
 self.sys_version = ""
 self.send_response(200)
 self.send_header("Content-type", "text/html; charset=ISO-8859-2")
 self.send_header("Connectin", "close")
 self.end_headers()
 def do_POST(self): self.serve()
 def do_HEAD(self): self.serve()
 def do_GET(self): self.serve()
address = ("", 80)
server = HTTPServer(address, RequestHandler)
thread.start_new_thread(server.serve_forever, () )
--- CUT ---
 
When I right-click that file and select "Edit with IDLE", it opens. When
I press F5, the script runs and the *Python Shell* restarts. But when I
try to connect with a browser to http://localhost:80/, IDLE hangs. I
don't see the hang when I start IDLE from its shortcut, open prx.py from
within IDLE and run it: then everything works normally and IDLE doesn't
hang.
 
I don't know why it behaves like that, but I think it's a bug.
 
Python version: 2.5c2
Tk version: 8.4
IDLE version: 1.2c2
OS Version: Microsoft Windows XP Professional with SP2

---
Again:
* Freeze:
> "C:\Python25\pythonw.exe" "C:\Python25\Lib\idlelib\idle.pyw" -n -e "prx.py"
// then press F5 in IDLE
// while it runs, open a browser and try to open the page http://localhost:80
// IDLE freezes

* Works ok:
> "C:\Python25\pythonw.exe" "C:\Python25\Lib\idlelib\idle.pyw" -e
// open prx.py in IDLE
// press F5 in IDLE
// run a browser and try to open the page http://localhost:80
// all works ok
---

regards,
Marek

--

>Comment By: Kurt B. Kaiser (kbk)
Date: 2007-02-05 01:18

Message:
Logged In: YES 
user_id=149084
Originator: NO

Well, Tal Einat has more patience than I do, and it sounds like he
diagnosed your problem.  I'm setting this to pending; it will close in a
couple of weeks if you don't respond with further comments.

--

Comment By: Tal Einat (taleinat)
Date: 2006-12-09 12:54

Message:
Logged In: YES 
user_id=1330769
Originator: NO

Well, the issue obviously only happens when IDLE is running without a
subprocess.

The code you pasted is unindented so I'm not going to try it out...

My guess would be that your server is blocking in a way that it blocks all
threads. This is why, when it is run in the same process as IDLE's GUI,
IDLE hangs. However, when you run IDLE with a subprocess, it's the
subprocess which is blocked, so IDLE works normally. (this is what the
subprocess is there for :)
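
For comparison, here is a minimal sketch of the threading pattern under discussion, written for Python 3 with the standard `http.server` and `threading` modules. It is not a reproduction of the hang: it only shows a server running in a worker thread while the calling thread stays free to make a request and then shut the server down. `Hello` and `fetch_once` are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass                         # keep the example quiet


def fetch_once():
    """Serve from a worker thread, make one request, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), Hello)   # port 0: pick a free port
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        data = conn.getresponse().read()
        conn.close()
    finally:
        server.shutdown()            # stops serve_forever()
        server.server_close()
        worker.join()
    return data


print(fetch_once())
```

If the handler itself blocked for a long time (for example on an outgoing request, as the handler in the report does), only the worker thread would be expected to wait; anything that stalls the whole process would freeze a GUI living in that same process too.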

In any case, IDLE is behaving just fine here. This isn't a bug in IDLE.


This could be a bug with the thread module, or a bug in BaseHTTPServer, or
several other places. But it is most likely caused by some misunderstanding
on your part of blocking operations, threads, and the interaction between
them.

You should have tried posting this on c.l.py before posting a bug on SF,
and I suggest you do so now.

--




[ python-Bugs-1651995 ] sgmllib _convert_ref UnicodeDecodeError exception, new in 2.

2007-02-04 Thread SourceForge.net
Bugs item #1651995, was opened at 2007-02-04 23:34
Message generated for change (Comment added) made by wrstlprmpft
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1651995&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Nagle (nagle)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmllib _convert_ref UnicodeDecodeError exception, new in 2.

Comment By: wrstl prmpft (wrstlprmpft)
Date: 2007-02-05 08:16

Message:
Logged In: YES 
user_id=801589
Originator: NO

I had a similar problem recently and did not have time to file a
bug-report. Thanks for doing that.

The problem is the code that handles entity and character references in
SGMLParser.parse_starttag. It seems that it is not careful about
unicode/str issues.
(But maybe BeautifulSoup needs to tell it to?)

My quick'n'dirty workaround was to remove the offending char-entity from
the website before feeding it to BeautifulSoup::

  text = text.replace('&reg;', '') # remove rights reserved sign entity
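
A slightly more general variant of the same workaround, as a hedged sketch: rather than removing one named entity, drop every entity or character reference except a small allow-list that decodes to plain ASCII. The pattern is simplified, and `_REF`, `_KEEP` and `strip_risky_refs` are hypothetical names, not part of sgmllib or BeautifulSoup.

```python
import re

# Matches named or numeric references such as &reg; or &#174; (simplified pattern).
_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

# Hypothetical allow-list: references that decode to plain ASCII and are safe to keep.
_KEEP = {"amp", "lt", "gt", "quot", "apos"}


def strip_risky_refs(text):
    """Drop every reference that is not in the small ASCII allow-list."""
    def repl(match):
        return match.group(0) if match.group(1) in _KEEP else ""
    return _REF.sub(repl, text)


print(strip_risky_refs('<a title="Acme&reg; &amp; Co &#174;">'))
```

Like the one-line version, this throws information away, so it is a stopgap for getting a page parsed rather than a fix.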

cheers,
stefan


--
