string formatter %x and a class instance with __int__ or __long__ cannot handle long
Hi,

I'm using Python 2.4.4 on 32-bit x86 Linux. I have a problem printing a hex string for a value larger than 0x7fffffff when the value is passed to the % operator via an instance of a class with __int__(). If I pass a long value to the % operator directly, it works just fine.

Example 1 -- pass a long value directly. This works.

    >>> x=0x80000000
    >>> x
    2147483648L
    >>> type(x)
    <type 'long'>
    >>> "%08x" % x
    '80000000'

Example 2 -- pass an instance of a class with __int__()

    >>> class X:
    ...     def __init__(self, v):
    ...         self.v = v
    ...     def __int__(self):
    ...         return self.v
    ...
    >>> y = X(0x80000000)
    >>> "%08x" % y
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: int argument required
    >>>

The behavior looks inconsistent. Note that __int__ actually returned a long value in Example 2. "%08x" accepts either an int or a long in Example 1, but it accepts only an int in Example 2. Is this a bug or expected?

By the way, the same thing happens on a 64-bit system with a correspondingly larger value.

Regards,
Kenji Noguchi
-- http://mail.python.org/mailman/listinfo/python-list
Re: string formatter %x and a class instance with __int__ cannot handle long
2007/6/20, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
> In your second example y is an instance of class X...not an int. y.v
> is an int. Are you hoping it will cast it to an int as needed using
> your method? If so, I think you need to do so explicitly...ie "%08x"
> % int(y)
>
> ~Sean

I confirmed that "%08x" % int(y) works. And yes, that is what I was hoping for. It actually works that way if v is less than or equal to 0x7fffffff. Please try the test script below. It's essentially the same test with some more print statements. Everything except test d-3 appears to be OK.

2007/6/20, Gabriel Genellina <[EMAIL PROTECTED]>:
> It is a bug, at least for me, and I have half of a patch addressing it. As
> a workaround, convert explicitly to long before formatting.

I'm interested in your patch. What's the other half still missing?

Thanks,
Kenji Noguchi

--->8>8--->8---
#!/usr/bin/env python

class X:
    def __init__(self, v):
        self.v = v
    def __int__(self):
        print "Hey! I'm waken up!"
        return self.v

def test(arg):
    print 1,type(int(arg))
    print 2,"%08x" % int(arg)
    print 3,"%08x" % arg

a = 0x7fffffff
b = X(0x7fffffff)
c = 0x80000000
d = X(0x80000000)

print "test a" ; test(a)
print "test b" ; test(b)
print "test c" ; test(c)
print "test d" ; test(d)
--->8>8--->8---

And here is the result:

test a
1 <type 'int'>
2 7fffffff
3 7fffffff
test b
1 Hey! I'm waken up!
<type 'int'>
2 Hey! I'm waken up!
7fffffff
3 Hey! I'm waken up!
7fffffff
test c
1 <type 'long'>
2 80000000
3 80000000
test d
1 Hey! I'm waken up!
<type 'long'>
2 Hey! I'm waken up!
80000000
3 Hey! I'm waken up!
Traceback (most recent call last):
  File "", line 23, in ?
  File "", line 13, in test
TypeError: int argument required
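The explicit-conversion workaround from this message can be sketched as follows. This is a minimal illustration written for Python 3, which is an assumption on my part (the thread itself is about Python 2.4/2.5). In Python 3, %x consults __index__ rather than __int__, so the hypothetical subclass Y below adds __index__; class X mirrors the one in the test script.

```python
class X:
    """Same shape as the class in the test script: only __int__ is defined."""
    def __init__(self, v):
        self.v = v

    def __int__(self):
        return self.v


class Y(X):
    """Hypothetical variant for Python 3: __index__ lets %x accept the object directly."""
    def __index__(self):
        return self.v


big = 0x80000000

# Workaround from the thread: convert explicitly before formatting.
explicit = "%08x" % int(X(big))

# Python 3 alternative (an assumption, not part of the thread): rely on __index__.
implicit = "%08x" % Y(big)

print(explicit, implicit)
```

The explicit int() call does not depend on how the formatter treats user-defined classes, which is why it is the safer of the two spellings.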
Re: string formatter %x and a class instance with __int__ cannot handle long
I looked at the python2.5.1 source code. I noticed that in Objects/stringobject.c, around line 4684, the long type is handled as a special case, which is a hack, and everything else falls through to formatint. This explains why explicitly converting to long before formatting fixes the problem.

I made a patch, but it is a hack on a hack. I expect Python 3000 won't have this problem, since it unifies int and long.

Thanks,
Kenji Noguchi

--- stringobject.c.org  2007-06-21 13:57:54.745877000 -0700
+++ stringobject.c      2007-06-21 13:59:19.576646000 -0700
@@ -4684,6 +4684,15 @@
                case 'X':
                        if (c == 'i')
                                c = 'd';
+                       /* try to convert objects to number*/
+                       PyNumberMethods *nb;
+                       if ((nb = v->ob_type->tp_as_number) &&
+                           nb->nb_int) {
+                               v = (*nb->nb_int) (v);
+                               if(v == NULL)
+                                       goto error;
+                       }
+
                        if (PyLong_Check(v)) {
                                int ilen;
                                temp = _PyString_FormatLong(v, flags,

2007/6/21, Kenji Noguchi <[EMAIL PROTECTED]>:
> 2007/6/20, Gabriel Genellina <[EMAIL PROTECTED]>:
> > It is a bug, at least for me, and I have half of a patch addressing it. As
> > a workaround, convert explicitly to long before formatting.
>
> I'm interested in your patch. What's the other half still missing?
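The dispatch order described in this message can be modelled in a few lines. This is only a toy model in Python 3, not the C code: the names format_hex_unpatched, format_hex_patched, fits_c_long and C_LONG_MAX are all made up for illustration, and the 32-bit C long limit is an assumption matching the poster's machine.

```python
C_LONG_MAX = 2**31 - 1  # assumption: 32-bit C long, as on the poster's machine


class X:
    """Same shape as the class in the thread: only __int__ is defined."""
    def __init__(self, v):
        self.v = v

    def __int__(self):
        return self.v


def fits_c_long(n):
    """True if n would have been a plain int (not a long) in this model."""
    return -C_LONG_MAX - 1 <= n <= C_LONG_MAX


def format_hex_unpatched(v, width=8):
    """Model of the original order: check for long first, otherwise 'formatint'."""
    # Special case: the argument itself is already a big integer ("long").
    if isinstance(v, int) and not fits_c_long(v):
        return "%0*x" % (width, v)
    # Everything else: convert via __int__, but the result must fit a C long.
    n = int(v)
    if not fits_c_long(n):
        raise TypeError("int argument required")
    return "%0*x" % (width, n)


def format_hex_patched(v, width=8):
    """Model of the patch: convert to a number first, then dispatch as before."""
    return format_hex_unpatched(int(v), width)


print(format_hex_patched(X(0x80000000)))
```

In this model, X(0x80000000) misses the long branch because the object itself is not an integer, and then fails the range check after conversion. Converting first, as the patch does, sends the converted value through the long branch instead.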
Re: Tailing a log file?
Something like this? The Unix tail command does fancier things: it waits for a timeout, checks whether the file has been truncated, sleeps for some seconds depending on the incoming data, and so on.

#!/usr/bin/env python
import sys, select

while True:
    # Block until stdin has something to read.
    ins, outs, errs = select.select([sys.stdin],[],[])
    for i in ins:
        # readline() keeps its trailing newline, so the trailing comma
        # stops print from adding a second one.
        print i.readline(),

2007/6/22, Evan Klitzke <[EMAIL PROTECTED]>:
> On 6/22/07, Evan Klitzke <[EMAIL PROTECTED]> wrote:
> > Everyone,
> >
> > I'm interested in writing a python program that reads from a log file
> > and then executes actions based on the lines. I effectively want to
> > write a loop that does something like this:
> >
> > while True:
> >     log_line = log_file.readline()
> >     do_something(log_line)
> >
> > Where the readline() method blocks until a new line appears in the
> > file, unlike the standard readline() method which returns an empty
> > string on EOF. Does anyone have any suggestions on how to do this?
> > Thanks in advance!
>
> I checked the source code for tail and they actually poll the file by
> using fstat and sleep to check for changes in the file size. This
> didn't seem right so I thought about it more and realized I ought to
> be using inotify. So I guess I answered my own question.
>
> --
> Evan Klitzke <[EMAIL PROTECTED]>
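The polling approach mentioned in the quoted text (read, and sleep when nothing new has arrived) can be sketched as a generator. This is a minimal Python 3 sketch, not what tail or inotify actually do; the function name follow and the parameters poll_interval and max_idle_polls are made up for this example, and it does not handle truncation or log rotation.

```python
import time


def follow(log_file, poll_interval=1.0, max_idle_polls=None):
    """Yield complete lines from an open text file as they are appended.

    max_idle_polls is a hypothetical knob for this sketch: stop after that
    many consecutive empty reads (None means keep following forever).
    """
    idle = 0
    pending = ""  # holds a partial line until its newline arrives
    while max_idle_polls is None or idle < max_idle_polls:
        chunk = log_file.readline()
        if not chunk:
            # Nothing new yet: wait a little and try again.
            idle += 1
            time.sleep(poll_interval)
            continue
        idle = 0
        pending += chunk
        if pending.endswith("\n"):
            yield pending
            pending = ""
```

A caller would open the log, optionally seek to the end with log_file.seek(0, 2), and then loop with "for line in follow(log_file): do_something(line)".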
Re: developing web spider
Attached is the essence of my crawler. It collects the <a> tags in a given URL.

HTML parsing is not a big deal, as "tidy" does it all for you: it converts broken HTML to valid XHTML. From that point on there is a wealth of XML libraries. Just write whatever you want, such as an element handler.

I've extended it for multi-threading, limiting the number of threads for a specific web host, more flexible element handling, etc. SQLite is nice for making a URL db, by the way.

Kenji Noguchi

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys, urllib, urllib2, cookielib
import xml.dom.minidom, tidy
from xml.parsers.expat import ExpatError  # needed by the except clause in crawl()
from urlparse import urlparse, urljoin

_ua = "Mozilla/5.0 (Windows; U; Windows NT 6.0; ja; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12"

# I'm not sure if CookieJar() is thread safe
cj = cookielib.CookieJar()

class SingleCrawler:
    def __init__(self, seed_url=None):
        self.seed_url = seed_url
        self.urls = {}

    # static
    def _convert(self, html):
        # Run the page through tidy to get well-formed XHTML.
        if isinstance(html, unicode):
            html = html.encode('utf-8')
        options = dict(
            doctype='strict',
            drop_proprietary_attributes=True,
            enclose_text=True,
            output_xhtml=True,
            wrap=0,
            char_encoding='utf8',
            newline='LF',
            tidy_mark=False,
            )
        return str(tidy.parseString(html, **options))

    def _collect_urls(self, node, nest=0):
        # nodeType 1 is an element node
        if node.nodeType == 1 and node.nodeName == 'a':
            href = node.getAttribute('href')
            if not href.startswith('#'):
                p = urlparse(href)
                if p.scheme in ('', 'http', 'https'):
                    self.urls[node.getAttribute('href')] = True
                else:
                    # mailto, javascript
                    print p.scheme
        # Recurse into every child, whatever the current node is.
        for i in node.childNodes:
            self._collect_urls(i, nest+1)

    def canonicalize(self):
        # Resolve relative links against the seed URL.
        d = {}
        for url in self.urls:
            d[urljoin(self.seed_url, url).encode('ascii')] = True
        self.urls = d

    def crawl(self):
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        opener.addheaders = [('User-agent', _ua)]
        try:
            html = opener.open(self.seed_url).read()
        except urllib2.HTTPError, e:
            return None
        except urllib2.URLError, e:
            print "URL Error:", self.seed_url
            return None
        # The next line is incomplete as archived: the text between
        # '<' and '>' was lost.
        if html.startswith('')+2:]
        html = self._convert(html)
        try:
            dom = xml.dom.minidom.parseString(html)
        except ExpatError, e:
            print "ExpatError:", html
            return None
        self._collect_urls(dom.childNodes[1])
        self.canonicalize()
        return self.urls.keys()

if __name__=='__main__':
    crawler = SingleCrawler()
    crawler.seed_url = 'http://www.python.org'
    next_urls = crawler.crawl()
    print next_urls
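The link-collection step of the crawler (keep <a> hrefs, skip fragments and non-HTTP schemes, resolve against the seed URL) can be sketched with the Python 3 standard library alone. This swaps in html.parser for the tidy + minidom pipeline used above, so it is a different technique rather than a port; the names LinkCollector and collect_links are made up for this example, and no network access is involved.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip missing hrefs and same-page fragment links.
        if not href or href.startswith("#"):
            return
        # Keep relative links and http(s); drop mailto:, javascript:, etc.
        if urlparse(href).scheme in ("", "http", "https"):
            self.urls.add(urljoin(self.base_url, href))


def collect_links(html, base_url):
    """Return the sorted list of links found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return sorted(parser.urls)


sample = (
    '<a href="/about">About</a>'
    '<a href="#top">Top</a>'
    '<a href="mailto:someone@example.org">Mail</a>'
    '<a href="http://example.org/x">External</a>'
)
print(collect_links(sample, "http://www.python.org"))
```

html.parser is tolerant of broken markup, which is the role tidy plays in the original. It reports tags as events instead of building a DOM, so an element handler like the one described above would go in handle_starttag.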