Logging module gives duplicate log entries

2007-08-21 Thread Shiao
Hi,
I am getting duplicate log entries with the logging module.

The following behaves as expected, leading to one log entry for each
logged event:

logging.basicConfig(level=logging.DEBUG, filename='/tmp/foo.log')

But this results in two entries for each logged event:

applog = logging.getLogger()
applog.setLevel(logging.DEBUG)
hdl = logging.FileHandler('/tmp/foo.log')
applog.addHandler(hdl)


The app is based on the web.py framework, so I guess my problem may be
connected to some interaction with other uses of logging within the
framework. This is not specific to the root logger; the same happens
with logging.getLogger('foo').

Any clue would be more than welcome.

best,
ShiaoBu

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Logging module gives duplicate log entries

2007-08-21 Thread Shiao

>
> You need to remove the handler from the logging object
>
> # remove the handler once you are done
> applog.removeHandler(hdl)
>
> Cheers,
> amit.
>

I'm not sure how this could help.



Re: Logging module gives duplicate log entries

2007-08-21 Thread Shiao
Maybe my question wasn't very clear. What I meant is that these four
lines lead in my case to two entries per logged event:

applog = logging.getLogger()
applog.setLevel(logging.DEBUG)
hdl = logging.FileHandler('/tmp/foo.log')
applog.addHandler(hdl)

However if I REPLACE the above by:

logging.basicConfig(level=logging.DEBUG, filename='/tmp/foo.log')

things work as expected.
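
One common cause of this symptom, which the thread does not confirm for
this case, is that the setup code runs more than once (for example when a
module is imported or reloaded twice), so the same logger ends up with two
FileHandlers. basicConfig() does nothing if the root logger already has
handlers, which would explain why it does not show the problem. A minimal
sketch of a guard, assuming Python 3; add_file_handler_once is a
hypothetical helper name, not part of the logging module:

```python
import logging
import os


def add_file_handler_once(logger, path, level=logging.DEBUG):
    """Attach a FileHandler for `path` unless the logger already has one.

    Hypothetical helper for illustration; not part of the logging module.
    """
    target = os.path.abspath(path)
    for handler in logger.handlers:
        # FileHandler stores the absolute path of its file in baseFilename.
        if isinstance(handler, logging.FileHandler) and handler.baseFilename == target:
            return handler  # already configured, do not add a second handler
    handler = logging.FileHandler(path)
    logger.setLevel(level)
    logger.addHandler(handler)
    return handler
```

Calling add_file_handler_once(logging.getLogger(), '/tmp/foo.log') twice
leaves a single handler attached, so each event is written once. Printing
applog.handlers right after setup is a quick way to check whether a
duplicate handler is really what is happening.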



Unicode regex and Hindi language

2008-11-28 Thread Shiao
The regex below identifies words in all languages I tested, but not in
Hindi:

# -*- coding: utf-8 -*-

import re
pat = re.compile('^(\w+)$', re.U)
langs = ('English', '中文', 'हिन्दी')

for l in langs:
    m = pat.search(l.decode('utf-8'))
    print l, m and m.group(1)

Output:

English English
中文 中文
हिन्दी None

From this I assumed that the Hindi text contains punctuation or other
characters that prevent the word match. Now, even more puzzling is
this:

pat = re.compile('^(\W+)$', re.U) # note: now \W

for l in langs:
    m = pat.search(l.decode('utf-8'))
    print l, m and m.group(1)

Output:

English None
中文 None
हिन्दी None

How can the Hindi be both not a word and "not not a word"??

Any clue would be much appreciated!

Best.
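
A likely explanation, not confirmed anywhere in this thread: the Hindi word
mixes letters with combining marks (vowel signs and the virama, Unicode
categories Mc and Mn). As far as I know, \w in re only covers alphanumerics
and the underscore, so the marks count as \W while the consonants count as
\w, and neither ^(\w+)$ nor ^(\W+)$ can match the whole string. The sketch
below (Python 3; categories and is_word are hypothetical helper names) lists
the category of each code point and tests words with unicodedata instead of
\w:

```python
import unicodedata

# The word from the post, spelled by code point to avoid encoding surprises.
HINDI = '\u0939\u093f\u0928\u094d\u0926\u0940'


def categories(text):
    """Return the Unicode general category of each code point in text."""
    return [unicodedata.category(ch) for ch in text]


def is_word(text):
    """True if every code point is a letter (L*), mark (M*), number (N*) or '_'."""
    return bool(text) and all(
        ch == '_' or unicodedata.category(ch)[0] in 'LMN' for ch in text
    )


if __name__ == '__main__':
    print(categories(HINDI))
    for word in ('English', '中文', HINDI):
        print(word, is_word(word))
```

Treating mark categories as word characters is a design choice made for this
sketch; whether it is right depends on what "word" should mean for the
application.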



Identifying unicode punctuation characters with Python regex

2008-11-14 Thread Shiao
Hello,
I'm trying to build a regex in python to identify punctuation
characters in all the languages. Some regex implementations support an
extended syntax \p{P} that does just that. As far as I know, python re
doesn't. Any idea of a possible alternative?

Apart from manually including the punctuation character range for each
and every language, I don't see how this can be done.

Thanks in advance for any suggestions.

John


Re: Identifying unicode punctuation characters with Python regex

2008-11-14 Thread Shiao
On Nov 14, 11:27 am, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > I'm trying to build a regex in python to identify punctuation
> > characters in all the languages. Some regex implementations support an
> > extended syntax \p{P} that does just that. As far as I know, python re
> > doesn't. Any idea of a possible alternative?
>
> You should use character classes. You can generate them automatically
> from the unicodedata module: check whether unicodedata.category(c)
> starts with "P".
>
> Regards,
> Martin

Thanks Martin. I'll do this.
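
Martin's suggestion could be sketched roughly as follows, assuming Python 3.
build_punct_pattern is a hypothetical name; re.escape is used so that
characters such as \, ] and - are safe inside the character class:

```python
import re
import sys
import unicodedata


def build_punct_pattern():
    """Compile a character class matching every code point in a P* category.

    Hypothetical helper for illustration; scans the whole Unicode range once.
    """
    chars = [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)).startswith('P')
    ]
    # Escape each character so none of them is read as class syntax.
    return re.compile('[' + ''.join(re.escape(c) for c in chars) + ']')


if __name__ == '__main__':
    pattern = build_punct_pattern()
    print(pattern.findall('Hello, world!'))
```

Building the pattern walks over a million code points, so it is worth doing
once at import time and reusing the compiled object.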


Re: Identifying unicode punctuation characters with Python regex

2008-11-14 Thread Shiao
On Nov 14, 12:30 pm, "Mark Tolonen" <[EMAIL PROTECTED]> wrote:
> "Mark Tolonen" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>
> > "Shiao" <[EMAIL PROTECTED]> wrote in message
> > news:[EMAIL PROTECTED]
> >> Hello,
> >> I'm trying to build a regex in python to identify punctuation
> >> characters in all the languages. Some regex implementations support an
> >> extended syntax \p{P} that does just that. As far as I know, python re
> >> doesn't. Any idea of a possible alternative?
>
> >> Apart from manually including the punctuation character range for each
> >> and every language, I don't see how this can be done.
>
> >> Thanks in advance for any suggestions.
>
> >> John
>
> > You can always build your own pattern.  Something like (Python 3.0rc2):
>
> > >>> import unicodedata
> > >>> Po=''.join(chr(x) for x in range(65536) if unicodedata.category(chr(x)) == 'Po')
> > >>> import re
> > >>> r=re.compile('['+Po+']')
> > >>> x='我是美國人。'
> > >>> x
> > '我是美國人。'
> > >>> r.findall(x)
> > ['。']
>
> > -Mark
>
> This was an interesting problem.  Need to escape \ and ] to find all the
> punctuation correctly, and it turns out those characters are sequential in
> the Unicode character set, so ] was coincidentally escaped in my first
> attempt.
>
> IDLE 3.0rc2
> >>> import re
> >>> import unicodedata as u
> >>> A=''.join(chr(i) for i in range(65536))
> >>> P=''.join(chr(i) for i in range(65536) if u.category(chr(i))[0]=='P')
> >>> len(A)
> 65536
> >>> len(P)
> 491
> >>> len(re.findall('['+P+']',A)) # ] was naturally escaped
> 490
> >>> set(P)-set(re.findall('['+P+']',A)) # so only missing \
> {'\\'}
> >>> P=P.replace('\\','\\\\').replace(']','\\]')   # escape both of them.
> >>> len(re.findall('['+P+']',A))
> 491
>
> -Mark

Mark,
Many thanks. I feel almost ashamed I got away with it so easily :-)