Help on regular expression match
Hi,

I've run into a problem matching a regular expression in Python, and I hope one of you can help. Here are the details.

I have many tags like this:

    xxxhttp://xxx.xxx.xxx"; xxx>xxx xx xxxhttp://xxx.xxx.xxx"; xxx>xxx

I want to pull out every "http://xxx.xxx.xxx";, so I do this:

    httpPat = re.compile("(http://.*)(\")")
    result = httpPat.findall(data)

and I use this to inspect the output:

    for i in result:
        print i[2]

Surprisingly, some of the output looks like this:

    http://xxx.xxx.xxx";>xx

which was extracted from this kind of source:

    http://xxx.xxx.xxx";>xx"

Some of the results are correct, though. How can I get all of the answers as clean as "http://xxx.xxx.xxx";?

Thanks for your help.

Regards,
Johnny
--
http://mail.python.org/mailman/listinfo/python-list
Re: Help on regular expression match
Fredrik Lundh wrote:
> ".*" gives the longest possible match (you can think of it as searching
> backwards from the right end). if you want to search for "everything until
> a given character", searching for "[^x]*x" is often a better choice than
> ".*x".
>
> in this case, I suggest using something like
>
>     print re.findall("href=\"([^\"]+)\"", text)
>
> or, if you're going to parse HTML pages from many different sources, a
> real parser:
>
>     from HTMLParser import HTMLParser
>
>     class MyHTMLParser(HTMLParser):
>
>         def handle_starttag(self, tag, attrs):
>             if tag == "a":
>                 for key, value in attrs:
>                     if key == "href":
>                         print value
>
>     p = MyHTMLParser()
>     p.feed(text)
>     p.close()
>
> see:
>
>     http://docs.python.org/lib/module-HTMLParser.html
>     http://docs.python.org/lib/htmlparser-example.html
>     http://www.rexx.com/~dkuhlman/quixote_htmlscraping.html
>

Thanks for your help. I found another solution: simply adding a '?' after ".*" makes it search for the shortest string that matches the regular expression.

As for HTMLParser, there is another problem (taking my code as an example):

    import urllib
    import htmllib
    import formatter

    parser = htmllib.HTMLParser(formatter.NullFormatter())
    parser.feed(urllib.urlopen(baseUrl).read())
    parser.close()
    for url in parser.anchorlist:
        if url[0:7] == "http://":
            print url

When baseUrl="http://www.nba.com";, an HTMLParseError is raised because of a line of code "". I found that this line of code is inside
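To make the greedy versus non-greedy point above concrete, here is a minimal sketch. It is written for Python 3 (the thread uses Python 2), and the sample `text` is a made-up stand-in, since the real page source is not shown in the thread.

```python
import re

# Hypothetical sample input; not the poster's actual data.
text = ('<a href="http://a.example.com">one</a> '
        '<a href="http://b.example.com">two</a>')

# Greedy ".*" runs on to the LAST double quote it can still match before.
greedy = re.findall(r'(http://.*)"', text)

# Non-greedy ".*?" stops at the FIRST double quote after each URL.
lazy = re.findall(r'(http://.*?)"', text)

# Negated character class: "one or more characters that are not a quote".
negated = re.findall(r'href="([^"]+)"', text)

print(greedy)   # one over-long match that swallows markup between the URLs
print(lazy)     # the two clean URLs
print(negated)  # the same two clean URLs
```

The negated class is usually the more robust of the two fixes, because it states directly what may not appear inside the match.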
A problem while using anygui
Hi,

I've run into a problem while using anygui to create a GUI. Here is a brief example from Dave:

    ###
    def guidialog():
        def ok(**kw):
            win.destroy()
            app.remove(win)
        #
        anygui.link(btn_ok, ok)
        #
        app.run()
        return n #qtgui will NEVER get here
    ###

As you can see, the program never reaches the statement "return n". I googled for the problem but didn't find much help. Could anyone here give me a hand?

Thanks.

Regards,
Johnny
A problem while using urllib
Hi,

I am using urllib to grab URLs from the web. Here is the workflow of my program:

1. Get the base URL and the maximum number of URLs from the user
2. Call the filter to validate the base URL
3. Read the source of the base URL and grab all the URLs from the "href" property of "a" tags
4. Call the filter to validate every URL grabbed
5. Repeat 3-4 until the number of URLs grabbed reaches the limit

The filter has a method like this:

    # check whether the url can be connected
    def filteredByConnection(self, url):
        assert url

        try:
            webPage = urllib2.urlopen(url)
        except urllib2.URLError:
            self.logGenerator.log("Error: " + url + " ")
            return False
        except urllib2.HTTPError:
            self.logGenerator.log("Error: " + url + " not found")
            return False
        self.logGenerator.log("Connecting " + url + " successed")
        webPage.close()
        return True

But every time, once 70 to 75 URLs have been tested this way, the program crashes, and every remaining URL raises urllib2.URLError until the program exits. I tried several ways to work around it: using urllib, and adding a sleep(1) in the filter (I thought the sheer number of URLs was crashing the program). Neither works. BTW, if I use the URL at which the program crashed as the base URL, the program still crashes at the 70th-75th URL. How can I solve this problem?

Thanks for your help.

Regards,
Johnny
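A side note on the handler order in the method above, separate from the crash itself: HTTPError is a subclass of URLError, so an `except URLError` clause placed first also catches HTTP errors and the later `except HTTPError` clause is never reached. The sketch below is written for Python 3, where the urllib2 exceptions live in `urllib.error`. The helper `classify` and the example URL are hypothetical, and no network access is made.

```python
import urllib.error


def classify(exc):
    """Return a short label for an exception, most specific handler first."""
    try:
        raise exc
    except urllib.error.HTTPError as err:
        # Must come first: HTTPError is a subclass of URLError.
        return "HTTP error %d" % err.code
    except urllib.error.URLError as err:
        return "connection problem: %s" % err.reason


# Build the exceptions by hand instead of contacting a server.
not_found = urllib.error.HTTPError(
    "http://example.invalid/", 404, "Not Found", None, None)
unreachable = urllib.error.URLError("timed out")

print(issubclass(urllib.error.HTTPError, urllib.error.URLError))
print(classify(not_found))
print(classify(unreachable))
```

With the handlers in this order, a 404 can be logged as "not found" while genuine connection failures fall through to the more general clause.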
Re: A problem while using urllib
Alex Martelli wrote:
> Johnny Lee <[EMAIL PROTECTED]> wrote:
>    ...
> >    try:
> >        webPage = urllib2.urlopen(url)
> >    except urllib2.URLError:
>    ...
> >    webPage.close()
> >    return True
> >
> > But every time when I ran to the 70 to 75 urls (that means 70-75
> > urls have been tested via this way), the program will crash and all the
> > urls left will raise urllib2.URLError until the program exits. I tried
> > many ways to work it out, using urllib, set a sleep(1) in the filter (I
> > thought it was the massive urls crashed the program). But none works.
> > BTW, if I set the url from which the program crashed to base url, the
> > program will still crashed at the 70-75 url. How can I solve this
> > problem? thanks for your help
>
> Sure looks like a resource leak somewhere (probably leaving a file open
> until your program hits some wall of maximum simultaneously open files),
> but I can't reproduce it here (MacOSX, tried both Python 2.3.5 and
> 2.4.1). What version of Python are you using, and on what platform?
> Maybe a simple Python upgrade might fix your problem...
>
> Alex

Thanks for the info. I'm using 2.4.1 on Cygwin under WinXP. If you want to reproduce the problem, I can send you the source.

This morning I found that the problem is caused by urllib2. When I use urllib instead of urllib2, it doesn't crash any more. The trouble is that I want to catch the HTTP 404 error, which is handled by FancyURLopener inside urllib.urlopen(), so I can't catch it.

Regards,
Johnny
Re: A problem while using urllib
Steve Holden wrote:
> Johnny Lee wrote:
> > Alex Martelli wrote:
> > [...]
> >
> > Thanks for the info you provided. I'm using 2.4.1 on cygwin of WinXP.
> > If you want to reproduce the problem, I can send the source to you.
> >
> > This morning I found that this is caused by urllib2. When I use urllib
> > instead of urllib2, it won't crash any more. But the matters is that I
> > want to catch the HTTP 404 Error which is handled by FancyURLopener in
> > urllib.open(). So I can't catch it.
> >
>
> I'm using exactly that configuration, so if you let me have that source
> I could take a look at it for you.
>
> regards
>  Steve
> --
> Steve Holden       +44 150 684 7255  +1 800 494 3119
> Holden Web LLC                     www.holdenweb.com
> PyCon TX 2006                  www.python.org/pycon/

I've sent the source. Thanks for your help.

Regards,
Johnny
Re: A problem while using urllib
Steve Holden wrote:
> Steve Holden wrote:
> > Johnny Lee wrote:
> > [...]
> >
> >>I've sent the source, thanks for your help.
> >>
> >
> > [...]
> > Preliminary result, in case this rings bells with people who use urllib2
> > quite a lot. I modified the error case to report the actual message
> > returned with the exception and I'm seeing things like:
> >
> > http://www.holdenweb.com/./Python/webframeworks.html
> > Message:
> > Start process
> > http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
> > Error: IOError while parsing
> > http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
> > Message:
> > .
> > .
> > .
> >
> > So at least we know now what the error is, and it looks like some sort
> > of resource limit (though why only on Cygwin beats me) ... anyone,
> > before I start some serious debugging?
> >
> I realized after this post that WingIDE doesn't run under Cygwin, so I
> modified the code further to raise an error and give us a proper
> traceback. I also tested the program under the standard Windows 2.4.1
> release, where it didn't fail, so I conclude you have unearthed a Cygwin
> socket bug. Here's the traceback:
>
> End process http://www.holdenweb.com/contact.html
> Start process http://freshmeat.net/releases/192449
> Error: IOError while parsing http://freshmeat.net/releases/192449
> Message:
> Traceback (most recent call last):
>   File "Spider_bug.py", line 225, in ?
>     spider.run()
>   File "Spider_bug.py", line 143, in run
>     self.grabUrl(tempUrl)
>   File "Spider_bug.py", line 166, in grabUrl
>     webPage = urllib2.urlopen(url).read()
>   File "/usr/lib/python2.4/urllib2.py", line 130, in urlopen
>     return _opener.open(url, data)
>   File "/usr/lib/python2.4/urllib2.py", line 358, in open
>     response = self._open(req, data)
>   File "/usr/lib/python2.4/urllib2.py", line 376, in _open
>     '_open', req)
>   File "/usr/lib/python2.4/urllib2.py", line 337, in _call_chain
>     result = func(*args)
>   File "/usr/lib/python2.4/urllib2.py", line 1021, in http_open
>     return self.do_open(httplib.HTTPConnection, req)
>   File "/usr/lib/python2.4/urllib2.py", line 996, in do_open
>     raise URLError(err)
> urllib2.URLError:
>
> Looking at that part of the source of urllib2 we see:
>
>     headers["Connection"] = "close"
>     try:
>         h.request(req.get_method(), req.get_selector(), req.data,
>                   headers)
>         r = h.getresponse()
>     except socket.error, err: # XXX what error?
>         raise URLError(err)
>
> So my conclusion is that there's something in the Cygwin socket module
> that causes problems not seen under other platforms.
>
> I couldn't find any obviously-related error in the Python bug tracker,
> and I have copied this message to the Cygwin list in case someone there
> knows what the problem is.
>
> Before making any kind of bug submission you should really see if you
> can build a program shorter than the existing 220+ lines to demonstrate
> the bug, but it does look to me like your program should work (as indeed
> it does on other platforms).
>
> regards
>  Steve

But if you change urllib2 to urllib, it works under Cygwin. Do the two modules use different mechanisms to connect to the page?
Re: A problem while using urllib
Steve Holden wrote:
> Good catch, John, I suspect this is a possibility so I've added the
> following note:
>
> """The Windows 2.4.1 build doesn't show this error, but the Cygwin 2.4.1
> build does still have uncollectable objects after a urllib2.urlopen(),
> so there may be a platform dependency here. No 2.4.2 on Cygwin yet, so
> nothing conclusive as lsof isn't available."""
>
> regards
>  Steve

Maybe it really is a platform dependency problem. Take a look at this brief example (it doesn't use urllib; I just want to show a platform dependency in Python).

Here is a snapshot from DOS:
---
D:\>python
ActivePython 2.4.1 Build 247 (ActiveState Corp.) based on
Python 2.4.1 (#65, Jun 20 2005, 17:01:55) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open("t", "r")
>>> f.tell()
0L
>>> f.readline()
'http://cn.realestate.yahoo.com\n'
>>> f.tell()
28L
--

Here is a snapshot from Cygwin:
---
Johnny [EMAIL PROTECTED] /cygdrive/d
$ python
Python 2.4.1 (#1, May 27 2005, 18:02:40)
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open("t", "r")
>>> f.tell()
0L
>>> f.readline()
'http://cn.realestate.yahoo.com\n'
>>> f.tell()
31L
Question on class member in python
    class A:
        def __init__(self):
            self.member = 1

        def getMember(self):
            return self.member

    a = A()

So, is there any difference between a.member and a.getMember? Thanks for your help. :)

Regards,
Johnny
Re: Question on class member in python
Peter Otten wrote:
> Johnny Lee wrote:
>
> > Class A:
> >    def __init__(self):
> >       self.member = 1
> >
> >    def getMember(self):
> >       return self.member
> >
> > a = A()
> >
> > So, is there any difference between a.member and a.getMember? thanks
> > for your help. :)
>
> Yes. accessor methods for simple attributes are a Javaism that should be
> avoided in Python. You can always turn an attribute into a property if the
> need arises to do some calculations behind the scene
>
> >>> class A(object):
> ...     def getMember(self):
> ...             return self.a * self.b
> ...     member = property(getMember)
> ...     def __init__(self):
> ...             self.a = self.b = 42
> ...
> >>> A().member
> 1764
>
> I. e. you are not trapped once you expose a simple attribute.
>
> Peter

Thanks for your help. Maybe I should first learn how to turn an attribute into a property.
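For readers who also want to see how an attribute is turned into a property, here is a minimal sketch in Python 3 using the decorator spelling of `property`. The backing name `_member` and the validation rule are illustrative assumptions, not something taken from the thread.

```python
class A(object):
    def __init__(self):
        self._member = 1          # storage hidden behind the property

    @property
    def member(self):
        """Read access: callers still write a.member, no parentheses."""
        return self._member

    @member.setter
    def member(self, value):
        # Hypothetical rule, added only to show why a property can be useful.
        if value < 0:
            raise ValueError("member must not be negative")
        self._member = value


a = A()
print(a.member)   # attribute syntax, but the getter runs
a.member = 5      # attribute syntax, but the setter runs
print(a.member)
```

Code that used `a.member` while it was a plain attribute keeps working unchanged after the switch, which is the point Peter makes about not being trapped.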
Re: Question on class member in python
But I still wonder what the difference is between A().getMember and A().member, besides the style.
Re: Question on class member in python
Alex Martelli wrote:
> Johnny Lee <[EMAIL PROTECTED]> wrote:
>
> > But I still wonder what's the difference between the A().getMember and
> > A().member besides the style
>
> Without parentheses after it, getMember is a method. The difference
> between a method object and an integer object (which is what member
> itself is in your example) are many indeed, so your question is very
> strange. You cannot call an integer, you cannot divide methods, etc.
>
> Alex

Sorry, I didn't express myself clearly. I mean:

    b = A().getMember()
    c = A().member

What is the difference between b and c? If they are the same, what is the difference between the two ways of getting the value, besides the style?
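The distinction discussed above can be checked directly. This is a small Python 3 sketch built on the class from the original question; the variable names `same_value`, `method_is_callable` and `value_is_callable` are only illustrative.

```python
class A(object):
    def __init__(self):
        self.member = 1

    def getMember(self):
        return self.member


a = A()
b = a.getMember()   # call the accessor method
c = a.member        # read the attribute directly

same_value = (b == c)                       # both give the same integer
method_is_callable = callable(a.getMember)  # without (), this is a bound method
value_is_callable = callable(c)             # an int cannot be called

print(same_value, method_is_callable, value_is_callable)
```

So `b` and `c` hold the same value. The practical differences are the extra method call and the fact that callers of `getMember()` are tied to a method interface, while callers of `member` can later be served by a property.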
Re: Question on class member in python
It looks like there is no final word on the differences.
How to translate python into C
Hi,

First, I want to know whether the Python interpreter translates code directly into machine code, or translates it into C and then into machine code.

Second, if the code is translated directly into machine code, how can I translate it into C that behaves EXACTLY the same? If the code is translated into C first, where can I get the C source?

Thanks for your help.

Regards,
Johnny
Re: How to translate python into C
Szabolcs Nagy wrote:
> python creates bytecode (like java classes)
>
> you cannot translate python directly to c or machine code, but there
> are some projects you probably want to look into
>
> Pypy is a python implementation in python and it can be used to
> translate a python script to c or llvm code. (large project, work in
> progress)
> http://codespeak.net/pypy/dist/pypy/doc/news.html
>
> Shedskin translates python code to c++ (not all language features
> supported)
> http://shed-skin.blogspot.com/
>
> Pyrex is a nice language where you can use python and c like code and
> it translates into c code. (it is useful for creating fast python
> extension modules or a python wrapper around an existing c library)
> http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/

Thanks, Szabolcs. In fact, I want to reproduce a crash on Cygwin. I used a piece of Python code to produce the crash, and I want to translate it into C and reproduce it there. Would the tools you listed help with that? Of course, I'll try them first. :)

Regards,
Johnny
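The statement that Python creates bytecode can be observed with the standard `dis` module. The sketch below is for Python 3 and the function `add` is just a hypothetical example. The exact instruction names vary between Python versions, so no specific listing is claimed here.

```python
import dis


def add(a, b):
    """A tiny function whose compiled form we want to inspect."""
    return a + b


# The compiled bytecode is stored on the function's code object as raw bytes.
raw = add.__code__.co_code

# dis decodes those bytes into named instructions for a virtual machine,
# not into C source or native machine code.
ops = [ins.opname for ins in dis.get_instructions(add)]

print(type(raw))
print(ops)
```

The interpreter executes these instructions itself, which is why there is no intermediate C file to extract from a normal Python run.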
Re: How to translate python into C
Thanks for your tips, Niemann. :)

Regards,
Johnny
Re: How to translate python into C
Thanks, Szabolcs and Laurence. It isn't Python that crashes but Cygwin. We can locate the line number, but when we submitted the crash to the Cygwin mailing list, they told us they don't speak Python. So I'm trying to reproduce the crash in C.

Regards,
Johnny
Why the nonsense number appears?
Hi,

Please take a look at this code:

--
>>> t1 = "1130748744"
>>> t2 = "461"
>>> t3 = "1130748744"
>>> t4 = "500"
>>> time1 = t1+"."+t2
>>> time2 = t3+"."+t4
>>> print time1, time2
1130748744.461 1130748744.500
>>> float(time2) - float(time1)
0.03934332275391
>>>

Why are there so many nonsense trailing digits? Thanks for your help.

Regards,
Johnny
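The trailing digits come from binary floating point: a value such as 1130748744.461 has no exact representation as a float, so the subtraction exposes the rounding error. A minimal sketch in Python 3, assuming the goal is a clean three-digit difference, shows two common ways to deal with it. The names `approx`, `exact` and `rounded` are illustrative.

```python
from decimal import Decimal

time1 = "1130748744.461"
time2 = "1130748744.500"

# Binary floats: each input is rounded to the nearest representable value,
# so the difference is close to 0.039 but not exactly 0.039.
approx = float(time2) - float(time1)

# Decimal works in base 10 on the digits as written, so the result is exact.
exact = Decimal(time2) - Decimal(time1)

# If floats are required, round to the precision the data actually has.
rounded = round(approx, 3)

print(repr(approx))
print(exact)
print(rounded)
```

Which option fits depends on the use: `Decimal` when the decimal digits themselves matter, rounding or formatted output when only the display is the concern.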
What's the matter with this code section?
Here is the source:

    #! /bin/python [EMAIL PROTECTED] This is a xunit test framework for python, see TDD for more details

    class TestCase:
        def setUp(self):
            print "setUp in TestCase"
            pass
        def __init__(self, name):
            print "__init__ in TestCase"
            self.name = name
        def run(self):
            print "run in TestCase"
            self.setUp()
            method = getattr(self, self.name)
            method()

    class WasRun(TestCase):
        def __init__(self, name):
            print "__init__ in WasRun"
            self.wasRun = None
            TestCase.__init__(self, name)
        def testMethod(self):
            print "testMethod in WasRun"
            self.wasRun = 1
        def run(self):
            print "run in WasRun"
            method = getattr(self, self.name)
            method()
        def setUp(self):
            print "in setUp of WasRun"
            self.wasSetUp = 1

    class TestCaseTest(TestCase):
        def testRunning(self):
            print "testRunning in TestCaseTest"
            test = WasRun("testMethod")
            assert(not test.wasRun)
            test.run()
            assert(test.wasRun)
        def testSetUp(self):
            print "testSetUp in TestCaseTest"
            test = WasRun("testMethod")
            test.run()
            assert(test.wasSetUp)

    # the program starts here
    print "starts TestCaseTest(\"testRunning\").run()"
    TestCaseTest("testRunning").run()
    print "starts TestCaseTest(\"testSetUp\").run()"
    TestCaseTest("testSetUp").run()

And here is the result of running it under Cygwin:

    $ ./xunit.py
    starts TestCaseTest("testRunning").run()
    __init__ in TestCase
    run in TestCase
    setUp in TestCase
    testRunning in TestCaseTest
    __init__ in WasRun
    __init__ in TestCase
    run in WasRun
    testMethod in WasRun
    starts TestCaseTest("testSetUp").run()
    __init__ in TestCase
    run in TestCase
    setUp in TestCase
    testSetUp in TestCaseTest
    __init__ in WasRun
    __init__ in TestCase
    run in WasRun
    testMethod in WasRun
    Traceback (most recent call last):
      File "./xunit.py", line 51, in ?
        TestCaseTest("testSetUp").run()
      File "./xunit.py", line 16, in run
        method()
      File "./xunit.py", line 45, in testSetUp
        assert(test.wasSetUp)
    AttributeError: WasRun instance has no attribute 'wasSetUp'
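The output above shows "run in WasRun" but never "in setUp of WasRun": WasRun overrides run() without calling setUp(), so wasSetUp is never assigned and the assert fails with AttributeError. The following is one possible fix, sketched in Python 3 with the print statements removed. Initialising wasSetUp in __init__ is an extra precaution added here, not something from the original post.

```python
class TestCase(object):
    def __init__(self, name):
        self.name = name

    def setUp(self):
        pass

    def run(self):
        # Template method: always call setUp() before the test method.
        self.setUp()
        method = getattr(self, self.name)
        method()


class WasRun(TestCase):
    def __init__(self, name):
        TestCase.__init__(self, name)
        self.wasRun = None
        self.wasSetUp = None   # defined up front so the attribute always exists

    def setUp(self):
        self.wasSetUp = 1

    def testMethod(self):
        self.wasRun = 1

    # No run() override here: the inherited TestCase.run() calls setUp().


test = WasRun("testMethod")
test.run()
print(test.wasSetUp, test.wasRun)
```

If WasRun really needs its own run(), an alternative is to keep the override and have it call TestCase.run(self) instead of repeating the body.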
What's the difference between VAR and _VAR_?
Hi,

I'm new to Python and I was wondering what the difference is between the two code sections below:

(I)
    class TestResult:
        _pass_ = "pass"
        _fail_ = "fail"
        _exception_ = "exception"

(II)
    class TestResult:
        pass = "pass"
        fail = "fail"
        exception = "exception"

Thanks for your help.
Re: What's the difference between VAR and _VAR_?
So, going by what you said, are the following two code sections exactly the same?

(I)
    class TestResult:
        _passxxx_ = "pass"

(II)
    class TestResult:
        passxxx = "pass"
Re: What's the difference between VAR and _VAR_?
Erik Max Francis wrote:
>
> No, of course not. One defines a class variable named `_passxxx_', the
> other defines one named `passxxx'.
>

I mean besides the difference in name...
Re: What's the difference between VAR and _VAR_?
Erik Max Francis wrote:
>
> You're going to have to be more clear; I don't understand your question.
> What's the difference between
>
>     a = 1
>
> and
>
>     b = 1
>
> besides the difference of name?
>

I thought there must be something special about naming a variable with '_' as the first character. Maybe it's just a programming style and I read too much into it...
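For completeness, underscores are mostly a naming convention, with two real effects worth knowing: names with two leading underscores inside a class body are name-mangled, and `pass` (as used in section (II) of the original question) is a reserved word and cannot be assigned to at all. The sketch below is for Python 3, and the attribute names other than `_exception_` are made up for illustration.

```python
import keyword


class TestResult(object):
    passed = "pass"             # ordinary public name
    _failed = "fail"            # one leading underscore: "internal" by convention only
    __secret = "hidden"         # two leading underscores: mangled to _TestResult__secret
    _exception_ = "exception"   # underscores on both ends: just an ordinary name


print(TestResult._failed)               # still reachable; nothing enforces privacy
print(TestResult._exception_)           # behaves exactly like any other attribute
print(TestResult._TestResult__secret)   # the mangled name is how it is stored
print(keyword.iskeyword("pass"))        # 'pass' is reserved, so `pass = "pass"` cannot compile
```

So `_passxxx_` and `passxxx` differ only in name, as Erik says. The cases that behave differently are the double-leading-underscore form and reserved words.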
Would you pls tell me a tool to step debug python program?
Hi,

I'm having trouble understanding the code I have at hand, and I wonder whether there are any tools that let me step through it in a debugger, just like F10 in VC.

Thanks for your help.

Regards,
Johnny
An interesting python problem
Hi,

Look at the following commands at the Python command line. See anything interesting? :)

>>> class A:
        i = 0
>>> a = A()
>>> b = A()
>>> a.i = 1
>>> print a.i, b.i
1 0

---

>>> class A:
        arr = []
>>> a = A()
>>> b = A()
>>> a
<__main__.A instance at 0x00C96698>
>>> b
<__main__.A instance at 0x00CA0760>
>>> A
>>> a.arr.append("haha")
>>> print a.arr , b.arr
['haha'] ['haha']
>>> a.arr = ["xixi"]
>>> print a.arr , b.arr
['xixi'] ['haha']
>>> A.arr
['haha']
>>> A.arr.append("xx")
>>> A.arr
['haha', 'xx']
>>> a.arr
['xixi']
>>> b.arr
['haha', 'xx']
>>> b.arr.pop()
'xx'
>>> b.arr
['haha']
>>> A.arr
['haha']

---

>>> class X:
        def __init__(self):
            self.arr = []
>>> m = X()
>>> n = X()
>>> m.arr.append("haha")
>>> print m.arr, n.arr
['haha'] []
Re: An interesting python problem
bruno modulix wrote:
>
> I don't see anything interesting nor problematic here. If you understand
> the difference between class attributes and instance attributes, the
> difference between mutating an object and rebinding a name, and the
> attribute lookup rules in Python, you'll find that all this is the
> normal and expected behavior.
>
> Or did I miss something ?
>

No, you didn't miss anything as far as I can see. Thanks for your help. :)
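The three ideas named in the reply (class versus instance attributes, mutating versus rebinding, and attribute lookup) can be shown side by side. This is a condensed Python 3 sketch of the session from the original post; the helper variable names are illustrative.

```python
class A(object):
    arr = []                 # class attribute: one list shared through the class


a = A()
b = A()

# Mutating: lookup finds A.arr via the class, and append changes that one list.
a.arr.append("haha")
shared_before = a.arr is b.arr

# Rebinding: assignment creates a NEW instance attribute on `a` only.
a.arr = ["xixi"]
shared_after = a.arr is b.arr
in_instance_dict = "arr" in vars(a), "arr" in vars(b)


class X(object):
    def __init__(self):
        self.arr = []        # instance attribute: a fresh list per object


m = X()
n = X()
m.arr.append("haha")

print(shared_before, shared_after, in_instance_dict)
print(A.arr, a.arr, b.arr)
print(m.arr, n.arr)
```

Reading an attribute falls back from the instance to the class, while assigning always writes to the instance, which accounts for every line of the original session.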
Re: No newline using printf
Roy Smith wrote:
>
> For closer control over output, use the write() function. You want
> something like:
>
>     import sys
>     for i in range(3):
>         sys.stdout.write (str(i))

Here is the output on my machine:

>>> import sys
>>> for i in range(3):
...     sys.stdout.write(str(i))
...
012>>>

Why does the prompt appear directly after the output? That may not be what is expected.
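The prompt lands right after "012" because write() emits exactly the characters it is given and, unlike print, never adds a newline of its own. A minimal Python 3 sketch follows; the helper `write_digits` and its `out` parameter are hypothetical names introduced for illustration.

```python
import sys


def write_digits(n, out=sys.stdout):
    """Write the digits 0..n-1 on one line, then end the line explicitly."""
    for i in range(n):
        out.write(str(i))    # no separator and no newline is added
    out.write("\n")          # terminate the line so the prompt starts on a new one


write_digits(3)
```

In a script the missing newline is usually harmless, but at the interactive prompt it makes the next `>>>` appear on the same line as the output.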