Re: regex question

2008-08-05 Thread Fred Mangusta
Chris wrote: Doesn't work for his use case, as he wants to keep periods marking the end of a sentence. Exactly. Thanks to all of you anyway; I now have a better understanding of how to proceed :) F. -- http://mail.python.org/mailman/listinfo/python-list

regex question

2008-08-05 Thread Fred Mangusta
Hi, I would like to delete every '.' that occurs inside a number. In other words, I'd like to replace each '.' character with something (say, nothing at all) when the '.' acts as a decimal separator. E.g. 500.675 > 500675 but also 1.000.456.344 > 1
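One way to read the request above is "drop a '.' only when it has a digit on both sides". The sketch below assumes that reading; the names `DOT_IN_NUMBER` and `strip_number_dots` are made up for illustration and do not come from the thread. Because a sentence-ending period is followed by whitespace or the end of the text rather than a digit, it is left alone.

```python
import re

# Assumption: a '.' is "inside a number" only when a digit sits on both sides.
# Lookbehind/lookahead check the neighbours without consuming them.
DOT_IN_NUMBER = re.compile(r"(?<=\d)\.(?=\d)")

def strip_number_dots(text):
    """Remove every '.' that is directly between two digits."""
    return DOT_IN_NUMBER.sub("", text)

print(strip_number_dots("500.675"))
print(strip_number_dots("1.000.456.344"))
print(strip_number_dots("It costs 1.000. Then it ends."))
```

Note that this rule cannot tell a decimal point from a thousands separator, so it would also turn 3.14 into 314; whether that is acceptable depends on the corpus.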

Re: Nlp, Python and period

2008-08-04 Thread Fred Mangusta
Hi Paul, thanks for replying. I'm interested in knowing more about your regex approach, but as you point out in your comment, it seems that access to the sourceforge mail archive is restricted. Is there any way I can read about it? Would you be so kind as to cut and paste it here, for instance? Tha

Nlp, Python and period

2008-08-04 Thread Fred Mangusta
Hi, are you aware of any NLP packages or algorithms in Python to spot whether a '.' represents the end of a sentence or rather something else (e.g. Mr., [EMAIL PROTECTED], etc.)? Thanks F.
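As a rough illustration of the problem described above, here is a minimal heuristic sketch using only the standard library. It is not the approach proposed in the thread: the abbreviation list, the "next word starts with a capital" rule, and the names `ABBREVIATIONS`, `is_sentence_end` and `split_sentences` are all assumptions made for this example.

```python
# Hypothetical, deliberately tiny abbreviation list; a real one would be much longer.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "prof.", "e.g.", "i.e.", "etc."}

def is_sentence_end(token, next_token):
    """Guess whether the '.' that ends `token` closes a sentence."""
    if not token.endswith("."):
        return False
    if token.lower() in ABBREVIATIONS:
        return False
    # End of text, or the following token starts with an uppercase letter.
    return next_token is None or next_token[:1].isupper()

def split_sentences(text):
    """Split whitespace-separated text into sentences using the guess above."""
    tokens = text.split()
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if is_sentence_end(tok, nxt):
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing text that has no closing period
        sentences.append(" ".join(current))
    return sentences

print(split_sentences("Mr. Smith arrived. He paid 3.50 dollars."))
```

A heuristic like this fails on cases such as an abbreviation that really does end a sentence ("... and so on, etc."), which is why trained sentence tokenizers are usually preferred for real corpora.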

Re: Case tagging and python

2008-07-31 Thread Fred Mangusta
Hi, I came up with the following procedure:
    ALLCAPS = "|ALLCAPS"
    NOCAPS = "|NOCAPS"
    MIDCAPS = "|MIDCAPS"
    CAPS = "|CAPS"
    DIGIT = "|DIGIT"
    def test_case(w):
        w_out = ''
        if w.isalpha():  # if the comma is not part of it
            if w.isupper():
                w_out = w.lower() + ALLCAPS
                r
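The procedure above is cut off in the archive, so the following is only a guess at how such a tagger could be completed, not the author's code. The tag constants are taken from the post; the function name `tag_case` and the meaning assigned to each tag (CAPS for an initial capital only, MIDCAPS for any other mix, tokens that are neither alphabetic nor numeric returned unchanged) are assumptions.

```python
ALLCAPS = "|ALLCAPS"
NOCAPS = "|NOCAPS"
MIDCAPS = "|MIDCAPS"
CAPS = "|CAPS"
DIGIT = "|DIGIT"

def tag_case(w):
    """Return the lowercased word followed by a tag recording its original case."""
    if w.isdigit():
        return w + DIGIT
    if not w.isalpha():
        return w  # assumption: leave punctuation and mixed tokens untouched
    if w.isupper():
        return w.lower() + ALLCAPS
    if w.islower():
        return w + NOCAPS
    if w[0].isupper() and w[1:].islower():
        return w.lower() + CAPS
    return w.lower() + MIDCAPS  # any other mix of upper and lower case

print(" ".join(tag_case(w) for w in "NASA sent Hello to iPod 42".split()))
```

Under these assumptions a single capital letter such as "A" is tagged ALLCAPS, because `str.isupper()` is checked before the initial-capital test.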

Case tagging and python

2008-07-31 Thread Fred Mangusta
Hi, I'm relatively new to programming in general, and totally new to Python, and I've been told that this language is particularly good for what I need to do. Let me explain. I have a large corpus of English text, in the form of several files. First of all I would like to scan each file. Then, f