Help on regular expression match

2005-09-22 Thread Johnny Lee
Hi,
   I've run into a problem matching a regular expression in Python. I
hope one of you can help me. Here are the details:

   I have many tags like this:
  xxxhttp://xxx.xxx.xxx" xxx>xxx
  xx
  xxxhttp://xxx.xxx.xxx" xxx>xxx
  .
   And I want to extract every "http://xxx.xxx.xxx", so I do it
like this:
  httpPat = re.compile("(http://.*)(\")")
  result = httpPat.findall(data)
   I use this to observe my output:
  for i in result:
      print i[0]   # each result is a (url, quote) tuple; the URL is group 1
   Surprisingly, I get some output like this:
  http://xxx.xxx.xxx">xx
   In fact it comes from this kind of source:
  http://xxx.xxx.xxx">xx"
   But some results are right. How can I get all the
answers clean, like "http://xxx.xxx.xxx"? Thanks for your help.


Regards,
Johnny

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Help on regular expression match

2005-09-23 Thread Johnny Lee

Fredrik Lundh wrote:
> ".*" gives the longest possible match (you can think of it as searching
> backwards from the right end).  if you want to search for "everything
> until a given character", searching for "[^x]*x" is often a better choice
> than ".*x".
>
> in this case, I suggest using something like
>
> print re.findall("href=\"([^\"]+)\"", text)
>
> or, if you're going to parse HTML pages from many different sources, a
> real parser:
>
> from HTMLParser import HTMLParser
>
> class MyHTMLParser(HTMLParser):
>
>     def handle_starttag(self, tag, attrs):
>         if tag == "a":
>             for key, value in attrs:
>                 if key == "href":
>                     print value
>
> p = MyHTMLParser()
> p.feed(text)
> p.close()
>
> see:
>
> http://docs.python.org/lib/module-HTMLParser.html
> http://docs.python.org/lib/htmlparser-example.html
> http://www.rexx.com/~dkuhlman/quixote_htmlscraping.html
>
> 

Thanks for your help.
I found another solution: simply adding a '?' after ".*", which
makes it search for the shortest possible match for the regular
expression.
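
The three patterns discussed above (greedy ".*", non-greedy ".*?", and the negated character class) can be compared side by side. This is only a minimal Python 3 sketch, not code from the thread; the `data` string is a made-up sample with two links on one line.

```python
import re

# Hypothetical sample: two links on the same line.
data = ('<a href="http://a.example.com" class="x">A</a> '
        '<a href="http://b.example.com">B</a>')

# Greedy: ".*" runs to the LAST double quote on the line.
greedy = re.findall(r'(http://.*)"', data)

# Non-greedy: ".*?" stops at the first double quote after each URL.
lazy = re.findall(r'(http://.*?)"', data)

# Negated class: "anything except a double quote", anchored on href=.
negated = re.findall(r'href="([^"]+)"', data)

print(greedy)   # a single over-long match
print(lazy)
print(negated)
```

The non-greedy and negated-class versions return the same two URLs here. The negated class is usually the safer habit, because it cannot run past a quote at all.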
As for HTMLParser, I hit another problem there (taking my code as an example):

import urllib
import htmllib
import formatter
parser = htmllib.HTMLParser(formatter.NullFormatter())
parser.feed(urllib.urlopen(baseUrl).read())
parser.close()
for url in parser.anchorlist:
    if url[0:7] == "http://":
        print url

When baseUrl = "http://www.nba.com", an HTMLParseError is raised because of
a line of code "". I found that this line of code is inside

A problem while using anygui

2005-09-30 Thread Johnny Lee
Hi,
   I've run into a problem while using anygui to create a GUI. Here is a
brief example from Dave:

###
def guidialog():
    def ok(**kw):
        win.destroy()
        app.remove(win)
#
    anygui.link(btn_ok, ok)
#
    app.run()
    return n #qtgui will NEVER get here
###

   As you can see, the program never reaches the statement "return n".
I googled the problem but didn't find much help. Could anyone here
give me a hand? Thanks.

regards, 
Johnny



A problem while using urllib

2005-10-11 Thread Johnny Lee
Hi,
   I am using urllib to grab URLs from the web. Here is the workflow of
my program:

1. Get the base URL and the maximum number of URLs from the user
2. Call the filter to validate the base URL
3. Read the source of the base URL and grab all the URLs from the "href"
attribute of "a" tags
4. Call the filter to validate every URL grabbed
5. Repeat steps 3-4 until the number of URLs grabbed reaches the limit
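
The workflow above could be sketched roughly as follows. This is a Python 3 illustration under stated assumptions, not the poster's program: `crawl`, `fetch`, and `is_valid` are hypothetical names, the base URL is counted toward the limit, and links are pulled out with a simple regular expression rather than a real HTML parser.

```python
import re
from collections import deque

# Assumption: links look like <a ... href="...">; real pages need a parser.
HREF_RE = re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)


def crawl(base_url, max_urls, fetch, is_valid):
    """Collect up to max_urls valid URLs reachable from base_url.

    fetch(url) returns the page source as a string and is_valid(url)
    returns True or False. Both are supplied by the caller.
    """
    if not is_valid(base_url):           # step 2
        return []
    seen = [base_url]
    queue = deque([base_url])
    while queue and len(seen) < max_urls:
        page = fetch(queue.popleft())    # step 3
        for url in HREF_RE.findall(page):
            if len(seen) >= max_urls:    # step 5: stop at the limit
                break
            if url not in seen and is_valid(url):   # step 4
                seen.append(url)
                queue.append(url)
    return seen
```

Passing `fetch` and `is_valid` in as arguments keeps the loop testable without touching the network, which also makes it easier to cut a failing crawler down to a short reproduction.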

   In filter there is a method like this:

--
# check whether the url can be connected
def filteredByConnection(self, url):
    assert url

    try:
        webPage = urllib2.urlopen(url)
    # HTTPError is a subclass of URLError, so it must be caught first;
    # otherwise the URLError clause swallows it.
    except urllib2.HTTPError:
        self.logGenerator.log("Error: " + url + " not found")
        return False
    except urllib2.URLError:
        self.logGenerator.log("Error: " + url + " ")
        return False
    self.logGenerator.log("Connecting " + url + " succeeded")
    webPage.close()
    return True
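
One detail worth checking in a method like this is the order of the except clauses: HTTPError is a subclass of URLError, so a URLError clause placed first also catches HTTP errors such as 404. Below is a minimal Python 3 sketch of the same check using `urllib.request` and `urllib.error` (the Python 3 homes of the urllib2 names). `check_url` and its `opener` parameter are hypothetical, added only so the sketch can be tried without a network connection.

```python
import urllib.error
import urllib.request


def check_url(url, opener=urllib.request.urlopen):
    """Return (ok, message) for url; `opener` is injectable for offline use."""
    try:
        page = opener(url)
    except urllib.error.HTTPError as err:
        # Subclass first: the server answered, but with an error status.
        return False, "HTTP error %d" % err.code
    except urllib.error.URLError as err:
        # Base class second: no usable answer (DNS failure, refused, ...).
        return False, "connection failed: %s" % err.reason
    page.close()
    return True, "ok"
```

With this ordering a 404 and a failed connection produce different messages, which the original log lines were apparently meant to distinguish.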


   But every time the program reaches roughly the 70th to 75th URL (that
is, after 70-75 URLs have been tested this way), it fails, and every
remaining URL raises urllib2.URLError until the program exits. I tried
several workarounds: switching to urllib, and adding a sleep(1) in the
filter (I thought the sheer number of URLs was the cause). Neither helped.
BTW, if I use the URL at which the program failed as the new base URL, the
program still fails at the 70th-75th URL. How can I solve this
problem? Thanks for your help.

Regards,
Johnny



Re: A problem while using urllib

2005-10-11 Thread Johnny Lee

Alex Martelli wrote:
> Johnny Lee <[EMAIL PROTECTED]> wrote:
>...
> >try:
> >   webPage = urllib2.urlopen(url)
> >except urllib2.URLError:
>...
> >webPage.close()
> >return True
> > 
> >
> >But every time when I ran to the 70 to 75 urls (that means 70-75
> > urls have been tested via this way), the program will crash and all the
> > urls left will raise urllib2.URLError until the program exits. I tried
> > many ways to work it out, using urllib, set a sleep(1) in the filter (I
> > thought it was the massive urls crashed the program). But none works.
> > BTW, if I set the url from which the program crashed to base url, the
> > program will still crashed at the 70-75 url. How can I solve this
> > problem? thanks for your help
>
> Sure looks like a resource leak somewhere (probably leaving a file open
> until your program hits some wall of maximum simultaneously open files),
> but I can't reproduce it here (MacOSX, tried both Python 2.3.5 and
> 2.4.1).  What version of Python are you using, and on what platform?
> Maybe a simple Python upgrade might fix your problem...
>
>
> Alex

Thanks for the info you provided. I'm using Python 2.4.1 under Cygwin on
WinXP. If you want to reproduce the problem, I can send the source to you.

This morning I found that the problem is caused by urllib2. When I use
urllib instead of urllib2, it doesn't crash any more. But the catch is that
I want to catch the HTTP 404 error, which is handled by FancyURLopener in
urllib.urlopen(), so I can't catch it.

Regards,
Johnny



Re: A problem while using urllib

2005-10-12 Thread Johnny Lee

Steve Holden wrote:
> Johnny Lee wrote:
> > Alex Martelli wrote:
> >
> >>Johnny Lee <[EMAIL PROTECTED]> wrote:
> >>   ...
> >>
> >>>   try:
> >>>  webPage = urllib2.urlopen(url)
> >>>   except urllib2.URLError:
> >>
> >>   ...
> >>
> >>>   webPage.close()
> >>>   return True
> >>>
> >>>
> >>>   But every time when I ran to the 70 to 75 urls (that means 70-75
> >>>urls have been tested via this way), the program will crash and all the
> >>>urls left will raise urllib2.URLError until the program exits. I tried
> >>>many ways to work it out, using urllib, set a sleep(1) in the filter (I
> >>>thought it was the massive urls crashed the program). But none works.
> >>>BTW, if I set the url from which the program crashed to base url, the
> >>>program will still crashed at the 70-75 url. How can I solve this
> >>>problem? thanks for your help
> >>
> >>Sure looks like a resource leak somewhere (probably leaving a file open
> >>until your program hits some wall of maximum simultaneously open files),
> >>but I can't reproduce it here (MacOSX, tried both Python 2.3.5 and
> >>2.4.1).  What version of Python are you using, and on what platform?
> >>Maybe a simple Python upgrade might fix your problem...
> >>
> >>
> >>Alex
> >
> >
> > Thanks for the info you provided. I'm using 2.4.1 on cygwin of WinXP.
> > If you want to reproduce the problem, I can send the source to you.
> >
> > This morning I found that this is caused by urllib2. When I use urllib
> > instead of urllib2, it won't crash any more. But the matters is that I
> > want to catch the HTTP 404 Error which is handled by FancyURLopener in
> > urllib.open(). So I can't catch it.
> >
>
> I'm using exactly that configuration, so if you let me have that source
> I could take a look at it for you.
>
> regards
>   Steve
> --
> Steve Holden   +44 150 684 7255  +1 800 494 3119
> Holden Web LLC www.holdenweb.com
> PyCon TX 2006  www.python.org/pycon/


I've sent the source, thanks for your help.

Regards,
Johnny



Re: A problem while using urllib

2005-10-12 Thread Johnny Lee

Steve Holden wrote:
> Steve Holden wrote:
> > Johnny Lee wrote:
> > [...]
> >
> >>I've sent the source, thanks for your help.
> >>
> >
> > [...]
> > Preliminary result, in case this rings bells with people who use urllib2
> > quite a lot. I modified the error case to report the actual message
> > returned with the exception and I'm seeing things like:
> >
> > http://www.holdenweb.com/./Python/webframeworks.html
> > Message: 
> > Start process
> > http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
> > Error: IOError while parsing
> > http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
> > Message: 
> > .
> > .
> > .
> >
> > So at least we know now what the error is, and it looks like some sort
> > of resource limit (though why only on Cygwin beats me) ... anyone,
> > before I start some serious debugging?
> >
> I realized after this post that WingIDE doesn't run under Cygwin, so I
> modified the code further to raise an error and give us a proper
> traceback. I also tested the program under the standard Windows 2.4.1
> release, where it didn't fail, so I conclude you have unearthed a Cygwin
> socket bug. Here's the traceback:
>
> End process http://www.holdenweb.com/contact.html
> Start process http://freshmeat.net/releases/192449
> Error: IOError while parsing http://freshmeat.net/releases/192449
> Message: 
> Traceback (most recent call last):
>   File "Spider_bug.py", line 225, in ?
>     spider.run()
>   File "Spider_bug.py", line 143, in run
>     self.grabUrl(tempUrl)
>   File "Spider_bug.py", line 166, in grabUrl
>     webPage = urllib2.urlopen(url).read()
>   File "/usr/lib/python2.4/urllib2.py", line 130, in urlopen
>     return _opener.open(url, data)
>   File "/usr/lib/python2.4/urllib2.py", line 358, in open
>     response = self._open(req, data)
>   File "/usr/lib/python2.4/urllib2.py", line 376, in _open
>     '_open', req)
>   File "/usr/lib/python2.4/urllib2.py", line 337, in _call_chain
>     result = func(*args)
>   File "/usr/lib/python2.4/urllib2.py", line 1021, in http_open
>     return self.do_open(httplib.HTTPConnection, req)
>   File "/usr/lib/python2.4/urllib2.py", line 996, in do_open
>     raise URLError(err)
> urllib2.URLError: 
>
> Looking at that part of the source of urllib2 we see:
>
>         headers["Connection"] = "close"
>         try:
>             h.request(req.get_method(), req.get_selector(), req.data,
>                       headers)
>             r = h.getresponse()
>         except socket.error, err: # XXX what error?
>             raise URLError(err)
>
> So my conclusion is that there's something in the Cygwin socket module
> that causes problems not seen under other platforms.
>
> I couldn't find any obviously-related error in the Python bug tracker,
> and I have copied this message to the Cygwin list in case someone there
> knows what the problem is.
>
> Before making any kind of bug submission you should really see if you
> can build a program shorter than the existing 220+ lines to demonstrate
> the bug, but it does look to me like your program should work (as indeed
> it does on other platforms).
>
> regards
>   Steve
> --
> Steve Holden   +44 150 684 7255  +1 800 494 3119
> Holden Web LLC www.holdenweb.com
> PyCon TX 2006  www.python.org/pycon/

But if you change urllib2 to urllib, it works under Cygwin. Do they
use different mechanisms to connect to the page?



Re: A problem while using urllib

2005-10-13 Thread Johnny Lee

Steve Holden wrote:
> Good catch, John, I suspect this is a possibility so I've added the
> following note:
>
> """The Windows 2.4.1 build doesn't show this error, but the Cygwin 2.4.1
> build does still have uncollectable objects after a urllib2.urlopen(),
> so there may be a platform dependency here. No 2.4.2 on Cygwin yet, so
> nothing conclusive as lsof isn't available."""
>
> regards
>   Steve
> --
> Steve Holden   +44 150 684 7255  +1 800 494 3119
> Holden Web LLC www.holdenweb.com
> PyCon TX 2006  www.python.org/pycon/

Maybe it really is a platform-dependency problem. Take a look at this
brief example (it doesn't use urllib; it just shows how Python's
behavior can depend on the platform):

Here is a snapshot from DOS:
---
D:\>python
ActivePython 2.4.1 Build 247 (ActiveState Corp.) based on
Python 2.4.1 (#65, Jun 20 2005, 17:01:55) [MSC v.1310 32 bit (Intel)]
on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open("t", "r")
>>> f.tell()
0L
>>> f.readline()
'http://cn.realestate.yahoo.com\n'
>>> f.tell()
28L

--

Here is a snapshot from Cygwin:
---
Johnny [EMAIL PROTECTED] /cygdrive/d
$ python
Python 2.4.1 (#1, May 27 2005, 18:02:40)
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open("t", "r")
>>> f.tell()
0L
>>> f.readline()
'http://cn.realestate.yahoo.com\n'
>>> f.tell()
31L




Question on class member in python

2005-10-17 Thread Johnny Lee
class A:
    def __init__(self):
        self.member = 1

    def getMember(self):
        return self.member

a = A()

So, is there any difference between a.member and a.getMember? thanks
for your help. :)

Regards, 
Johnny



Re: Question on class member in python

2005-10-17 Thread Johnny Lee

Peter Otten wrote:

> Johnny Lee wrote:
>
> > Class A:
> >def __init__(self):
> >   self.member = 1
> >
> >def getMember(self):
> >   return self.member
> >
> > a = A()
> >
> > So, is there any difference between a.member and a.getMember? thanks
> > for your help. :)
>
> Yes. accessor methods for simple attributes are a Javaism that should be
> avoided in Python. You can always turn an attribute into a property if the
> need arises to do some calculations behind the scene
>
> >>> class A(object):
> ...     def getMember(self):
> ...         return self.a * self.b
> ...     member = property(getMember)
> ...     def __init__(self):
> ...         self.a = self.b = 42
> ...
> >>> A().member
> 1764
>
> I. e. you are not trapped once you expose a simple attribute.
>
> Peter

Thanks for your help. Maybe I should first learn how to turn an attribute
into a property.
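
For reference, turning an attribute into a property might look like the sketch below. It is a Python 3 illustration, not code from the thread: the `_member` backing attribute and the non-negative check are assumptions made up for the example.

```python
class A:
    def __init__(self):
        self._member = 1   # backing attribute (hypothetical name)

    @property
    def member(self):
        # Runs on every read of a.member, yet callers use plain attribute syntax.
        return self._member

    @member.setter
    def member(self, value):
        # Runs on every assignment to a.member; the check is only an illustration.
        if value < 0:
            raise ValueError("member must be non-negative")
        self._member = value


a = A()
a.member = 5       # goes through the setter
print(a.member)    # goes through the getter
```

Code that already reads or assigns `a.member` keeps working unchanged after the switch, which is the reason a plain attribute is a safe starting point.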


Re: Question on class member in python

2005-10-18 Thread Johnny Lee
But I still wonder what's the difference between the A().getMember and
A().member besides the style



Re: Question on class member in python

2005-10-18 Thread Johnny Lee

Alex Martelli wrote:

> Johnny Lee <[EMAIL PROTECTED]> wrote:
>
> > But I still wonder what's the difference between the A().getMember and
> > A().member besides the style
>
> Without parentheses after it, getMember is a method.  The difference
> between a method object and an integer object (which is what member
> itself is in your example) are many indeed, so your question is very
> strange.  You cannot call an integer, you cannot divide methods, etc.
>
>
> Alex

Sorry, I didn't express myself clearly. I mean:
b = A().getMember()
c = A().member
What's the difference between b and c? If they are the same, what's the
difference between the two ways of getting the value, besides style?
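
A quick way to compare the two forms is to run them side by side. This is a minimal Python 3 sketch using the class from the start of the thread; nothing in it goes beyond what the replies already state.

```python
class A:
    def __init__(self):
        self.member = 1

    def getMember(self):
        return self.member


b = A().getMember()   # call the accessor method
c = A().member        # read the attribute directly
print(b == c, type(b), type(c))

method = A().getMember   # no parentheses: a bound method, not the value
print(callable(method), method())
```

For a plain attribute like this one the two values are equal. The accessor only adds a method call, and if some computation is needed later, a property can supply it without changing how callers write `a.member`.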


Re: Question on class member in python

2005-10-20 Thread Johnny Lee
It looks like there is no final word on the differences.



How to translate python into C

2005-10-28 Thread Johnny Lee
Hi,
   First, I want to know whether the Python interpreter translates the
code directly into machine code, or translates it into C and then into
machine code.
   Second, if the code is translated directly into machine code, how
can I translate it into C that behaves EXACTLY the same? If the code is
first translated into C, where can I get the C source?
   Thanks for your help.

Regards, 
Johnny



Re: How to translate python into C

2005-10-28 Thread Johnny Lee

Szabolcs Nagy wrote:
> python creates bytecode (like java classes)
>
>
> you cannot translate python directly to c or machine code, but there
> are some projects you probably want to look into
>
>
> Pypy is a python implementation in python and it can be used to
> translate a python script to c or llvm code. (large project, work in
> progress)
> http://codespeak.net/pypy/dist/pypy/doc/news.html
>
>
> Shedskin translates python code to c++ (not all language features
> supported)
> http://shed-skin.blogspot.com/
>
>
> Pyrex is a nice language where you can use python and c like code and
> it translates into c code. (it is useful for creating fast python
> extension modules or a python wrapper around an existing c library)
> http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/

Thanks, Szabolcs. In fact, I want to reproduce a crash on Cygwin. I
used a section of Python code to produce the crash, and I want to
translate it into C and reproduce it there. Would the tools you
mentioned help with this? Of course, I'll try them first. :)
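
On the bytecode point above: the standard `dis` module shows what the interpreter actually compiles a function to. This is a small Python 3 sketch with a made-up `add` function. The exact instruction names vary between Python versions, so no expected listing is given.

```python
import dis


def add(a, b):
    return a + b


# Print the bytecode instructions the interpreter executes for add().
dis.dis(add)

# The raw bytecode is stored on the function's code object as bytes.
print(add.__code__.co_code)
```

The listing is neither C nor machine code. It is run by the interpreter's own evaluation loop, which is why there is no C source to extract for a given script.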

Regards, 
Johnny



Re: How to translate python into C

2005-10-28 Thread Johnny Lee
Thanks for your tips Niemann:)

Regards, 
Johnny



Re: How to translate python into C

2005-10-29 Thread Johnny Lee
Thanks Szabolcs and Laurence. It's not Python that crashes but Cygwin.
We can locate the line number, but when we reported the crash to the
Cygwin mailing list, they told us they don't speak Python. So
I'm just trying to reproduce the crash in C.

Regards, 
Johnny



Why the nonsense number appears?

2005-10-31 Thread Johnny Lee
Hi,
   Pls take a look at this code:

--
>>> t1 = "1130748744"
>>> t2 = "461"
>>> t3 = "1130748744"
>>> t4 = "500"
>>> time1 = t1+"."+t2
>>> time2 = t3+"."+t4
>>> print time1, time2
1130748744.461 1130748744.500
>>> float(time2) - float(time1)
0.039000034332275391
>>>

   Why are there so many meaningless trailing digits? Thanks for your help.
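
The trailing digits come from binary floating point: a value such as 1130748744.461 has no exact binary representation, so the subtraction exposes the rounding error. Below is a minimal Python 3 sketch of two common ways around it, using the standard `decimal` module or rounding the float result. The variable names are illustrative only.

```python
from decimal import Decimal

time1 = "1130748744.461"
time2 = "1130748744.500"

# Binary floats: the inputs are already rounded when they are parsed.
binary_diff = float(time2) - float(time1)

# Decimal parses the strings exactly, so the difference is exact too.
exact_diff = Decimal(time2) - Decimal(time1)

print(binary_diff)
print(exact_diff)
print(round(binary_diff, 3))   # often good enough for millisecond timestamps
```

Which one to use depends on the data. For timestamps with millisecond resolution, rounding to three places is usually sufficient, and `Decimal` is the better fit when the exact decimal value matters.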

Regards, 
Johnny



What's the matter with this code section?

2005-08-24 Thread Johnny Lee
Here is the source:

#! /bin/python

[EMAIL PROTECTED] This is a xunit test framework for python, see TDD for more
details

class TestCase:
    def setUp(self):
        print "setUp in TestCase"
        pass
    def __init__(self, name):
        print "__init__ in TestCase"
        self.name = name
    def run(self):
        print "run in TestCase"
        self.setUp()
        method = getattr(self, self.name)
        method()

class WasRun(TestCase):
    def __init__(self, name):
        print "__init__ in WasRun"
        self.wasRun = None
        TestCase.__init__(self, name)
    def testMethod(self):
        print "testMethod in WasRun"
        self.wasRun = 1
    def run(self):
        print "run in WasRun"
        method = getattr(self, self.name)
        method()
    def setUp(self):
        print "in setUp of WasRun"
        self.wasSetUp = 1

class TestCaseTest(TestCase):
    def testRunning(self):
        print "testRunning in TestCaseTest"
        test = WasRun("testMethod")
        assert(not test.wasRun)
        test.run()
        assert(test.wasRun)
    def testSetUp(self):
        print "testSetUp in TestCaseTest"
        test = WasRun("testMethod")
        test.run()
        assert(test.wasSetUp)

# the program starts here
print "starts TestCaseTest(\"testRunning\").run()"
TestCaseTest("testRunning").run()
print "starts TestCaseTest(\"testSetUp\").run()"
TestCaseTest("testSetUp").run()



And here is the result running under cygwin:

$ ./xunit.py
starts TestCaseTest("testRunning").run()
__init__ in TestCase
run in TestCase
setUp in TestCase
testRunning in TestCaseTest
__init__ in WasRun
__init__ in TestCase
run in WasRun
testMethod in WasRun
starts TestCaseTest("testSetUp").run()
__init__ in TestCase
run in TestCase
setUp in TestCase
testSetUp in TestCaseTest
__init__ in WasRun
__init__ in TestCase
run in WasRun
testMethod in WasRun
Traceback (most recent call last):
  File "./xunit.py", line 51, in ?
TestCaseTest("testSetUp").run()
  File "./xunit.py", line 16, in run
method()
  File "./xunit.py", line 45, in testSetUp
assert(test.wasSetUp)
AttributeError: WasRun instance has no attribute 'wasSetUp'
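
The output above shows "run in WasRun" but never "in setUp of WasRun": WasRun overrides run() and its version does not call setUp(), so wasSetUp is never assigned. One possible fix, sketched here in Python 3 with the print statements left out, is to drop the override so that the inherited TestCase.run() calls setUp() first.

```python
class TestCase:
    def __init__(self, name):
        self.name = name

    def setUp(self):
        pass

    def run(self):
        self.setUp()                   # always runs before the test method
        getattr(self, self.name)()


class WasRun(TestCase):
    def __init__(self, name):
        self.wasRun = None
        TestCase.__init__(self, name)

    def setUp(self):
        self.wasSetUp = 1

    def testMethod(self):
        self.wasRun = 1

    # No run() override here: the inherited run() calls setUp() itself.


test = WasRun("testMethod")
test.run()
print(test.wasRun, test.wasSetUp)
```

If WasRun does need its own run(), calling self.setUp() at the top of it would have the same effect.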



What's the difference between VAR and _VAR_?

2005-09-08 Thread Johnny Lee
Hi,
   I'm new to Python and I was wondering what the difference is between
the two code sections below:

(I)
class TestResult:
    _pass_ = "pass"
    _fail_ = "fail"
    _exception_ = "exception"

(II)
class TestResult:
    pass = "pass"
    fail = "fail"
    exception = "exception"

   Thanks for your help.



Re: What's the difference between VAR and _VAR_?

2005-09-08 Thread Johnny Lee
So, as you said, the following two code sections are exactly the same?

(I)
class TestResult:
    _passxxx_ = "pass"

(II)
class TestResult:
    passxxx = "pass"



Re: What's the difference between VAR and _VAR_?

2005-09-08 Thread Johnny Lee

Erik Max Francis wrote:
>
> No, of course not.  One defines a class variable named `_passxxx_', the
> other defines one named `passxxx'.
> 


I mean besides the difference of name...



Re: What's the difference between VAR and _VAR_?

2005-09-08 Thread Johnny Lee

Erik Max Francis wrote:
>
> You're going to have to be more clear; I don't understand your question.
>   What's the difference between
>
>   a = 1
>
> and
>
>   b = 1
>
> besides the difference of name?
>

I thought there must be something special about naming a variable with '_'
as the first character. Maybe it's just a programming style and I was
overthinking it...
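
For what it's worth, underscores do carry meaning in a few specific positions, and names like `_pass_` are not among them. The sketch below is a Python 3 illustration with made-up attribute names. It also shows why a class attribute literally named `pass` cannot work: `pass` is a reserved keyword.

```python
import keyword


class TestResult:
    passed = "pass"         # ordinary public name
    _pass_ = "pass"         # underscores on both sides: no special meaning
    _failed = "fail"        # one leading underscore: "internal" by convention only
    __error = "exception"   # two leading underscores: mangled to _TestResult__error


print(TestResult._pass_, TestResult._failed)
print(TestResult._TestResult__error)     # the mangled name is what gets stored
print(hasattr(TestResult, "__error"))    # the unmangled name is not found from outside
print(keyword.iskeyword("pass"))         # `pass = "pass"` would be a syntax error
```

Names with double underscores on both sides, such as `__init__`, are reserved for the language's own special methods.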



Would you pls tell me a tool to step debug python program?

2005-09-12 Thread Johnny Lee
Hi,
   I'm having trouble understanding the code at hand, and I wonder
whether there are any useful tools that let me step through it in a
debugger, like F10 in VC...

Thanks for your help.
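
One option that ships with Python itself is the `pdb` module. A whole script can be run under it with `python -m pdb script.py`, or a breakpoint can be placed in the code as in the sketch below. The `total` function is made up for illustration, and the breakpoint line is left commented out so the example runs without stopping.

```python
import pdb


def total(values):
    result = 0
    for v in values:
        # Uncomment the next line to stop here and step interactively:
        # pdb.set_trace()
        result += v
    return result


print(total([1, 2, 3]))
```

At the `(Pdb)` prompt, `n` steps over the next line (roughly F10 in VC), `s` steps into a call, `p name` prints a variable, and `c` continues running.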

Regards,
Johnny



An interesting python problem

2005-09-14 Thread Johnny Lee
Hi,
   Look at the following commands at the Python command line. See anything
interesting? :)

>>> class A:
    i = 0
>>> a = A()
>>> b = A()
>>> a.i = 1
>>> print a.i, b.i
1 0

---

>>> class A:
    arr = []
>>> a = A()
>>> b = A()
>>> a
<__main__.A instance at 0x00C96698>
>>> b
<__main__.A instance at 0x00CA0760>
>>> A

>>> a.arr.append("haha")
>>> print a.arr , b.arr
['haha'] ['haha']
>>> a.arr = ["xixi"]
>>> print a.arr , b.arr
['xixi'] ['haha']
>>> A.arr
['haha']
>>> A.arr.append("xx")
>>> A.arr
['haha', 'xx']
>>> a.arr
['xixi']
>>> b.arr
['haha', 'xx']
>>> b.arr.pop()
'xx'
>>> b.arr
['haha']
>>> A.arr
['haha']

-

>>> class X:
    def __init__(self):
        self.arr = []
>>> m = X()
>>> n = X()
>>> m.arr.append("haha")
>>> print m.arr, n.arr
['haha'] []
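
The sessions above come down to two rules: mutating a class attribute through an instance changes the one shared object, while assigning to the attribute through an instance creates a new attribute on that instance only. Here is a compact Python 3 sketch of the same experiment, looking at each instance's `__dict__` to show where the attribute lives.

```python
class A:
    arr = []              # one list, stored on the class


a, b = A(), A()

a.arr.append("haha")      # mutation: changes the shared class list
print(b.arr)              # b sees the change, because b.arr is A.arr

a.arr = ["xixi"]          # rebinding: creates an instance attribute on a only
print(a.arr, b.arr, A.arr)

# The instance dictionaries show where each name is stored.
print("arr" in a.__dict__, "arr" in b.__dict__)
```

Creating the list in `__init__`, as class X does above, gives each instance its own list from the start.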



Re: An interesting python problem

2005-09-14 Thread Johnny Lee

bruno modulix wrote:
>
> I dont see anything interesting nor problematic here. If you understand
> the difference between class attributes and instance attributes, the
> difference between mutating an object and rebinding a name, and the
> attribute lookup rules in Python, you'll find that all this is the
> normal and expected behavior.
>
> Or did I miss something ?
> 

No, you didn't miss anything as far as I can see. Thanks for your help :)



Re: No newline using printf

2005-09-15 Thread Johnny Lee

Roy Smith wrote:
>
> For closer control over output, use the write() function.  You want
> something like:
>
> import sys
> for i in range(3):
>     sys.stdout.write(str(i))

here is the output of my machine:

 >>> import sys
 >>> for i in range(3):
 ... sys.stdout.write(str(i))
 ...
 012>>>

Why does the prompt follow directly after the output? That doesn't seem to be what was expected.
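
The prompt lands on the same line because write() emits exactly the characters it is given and nothing else, so no newline is ever written. Writing one explicitly after the loop moves the prompt to its own line. This is a minimal Python 3 sketch, and the `write_digits` helper is a made-up name.

```python
import sys


def write_digits(stream, count=3):
    """Write 0..count-1 with no separators, then end the line."""
    for i in range(count):
        stream.write(str(i))
    stream.write("\n")    # write() never adds a newline on its own


write_digits(sys.stdout)
```

Taking the stream as an argument also makes it easy to send the same output to a file or to an in-memory buffer.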
