Paul McGuire wrote:
> "Ola K" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
> > Hi,
> > I am pretty new to Python and I want to make a script that will search
> > for the following options:
> > 1) words made of uppercase characters -only- (like "YES")
> > 2) words made of lowercase characters -only- (like "yes")
> > 3) and words with only the first letter capitalized (like "Yes")
> > * and I need to do all these considering the fact that not all letters
> > are indeed English letters.
> >
> > I went through different documentation sections but couldn't find a right
> > condition, function or method for it.
> > Suggestions will be very much appreciated...
> > --Ola
> >
>
> Ola,
>
> You may be new to Python, but are you new to regular expressions too? I am
> no wiz at them, but here is a script that takes a stab at what you are
> trying to do. (For more regular expression info, see
> http://www.amk.ca/python/howto/regex/.)
>
> The script has these steps:
> - create strings containing all unicode chars that are considered "lower"
> and "upper", using the unicode.is* methods
> - use these strings to construct 3 regular expressions (or "re"s), one for
> words of all lowercase letters, one for words of all uppercase letters, and
> one for words that start with an uppercase letter followed by at least one
> lowercase letter.
> - use each re to search the string u"YES yes Yes", and print the found
> matches
>
> I've used unicode strings throughout, so this should be applicable to your
> text consisting of letters beyond the basic Latin set (since Outlook Express
> is trying to install Israeli fonts when reading your post, I assume these
> are the characters you are trying to handle).

I'd guessed the OP was in Israel from his e-mail address. If that's what
Outlook Express is doing, then that's conclusive proof :-)

An aside to the OP: Pardon my ignorance, but does Hebrew have upper and
lower case?

> You may have to do some setup
> of your locale for proper handling of unicode.isupper, etc.,

Whatever gave you that impression?

> but I hope this
> gives you a jump start on your problem.
>
> -- Paul
>
>
> import sys
> import re
>
> uppers = u"".join( unichr(i) for i in range(sys.maxunicode)
>                     if unichr(i).isupper() )
> lowers = u"".join( unichr(i) for i in range(sys.maxunicode)
>                     if unichr(i).islower() )

Just in case the OP is running a 32-bit unicode implementation, you
might want to make that xrange, not range :-)

>
> allUpperRe = ur"\b[%s]+\b" % uppers
> allLowerRe = ur"\b[%s]+\b" % lowers
> capWordRe = ur"\b[%s][%s]+\b" % (uppers,lowers)
>
> regexes = [
>     (allUpperRe, "all upper"),
>     (allLowerRe, "all lower"),
>     (capWordRe, "title case"),
>     ]
> for reString,label in regexes:
>     reg = re.compile(reString)
>     result = reg.findall(u" YES yes Yes ")
>     print label,":",result
>
> Prints:
> all upper : [u'YES']
> all lower : [u'yes']
> title case : [u'Yes']

Cheers,
John
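
For anyone reading this on Python 3, where unichr, xrange and ur"" literals
no longer exist, here is a minimal sketch of the same three-way
classification. It is not Paul's script: instead of building huge character
classes for a regex, it assumes it is enough to pull out words with \w+ and
test each character with str.isupper() and str.islower(), which consult the
Unicode character database rather than the locale. The names classify and
find_words are made up for illustration.

```python
import re


def classify(word):
    """Return 'all upper', 'all lower', 'title case', or None.

    Hypothetical helper: checks every character individually, so a
    caseless letter or a digit anywhere in the word disqualifies it.
    """
    if not word:
        return None
    if all(ch.isupper() for ch in word):
        return "all upper"
    if all(ch.islower() for ch in word):
        return "all lower"
    # One uppercase letter followed by at least one lowercase letter,
    # mirroring the "title case" regex in the quoted script.
    if len(word) > 1 and word[0].isupper() and all(ch.islower() for ch in word[1:]):
        return "title case"
    return None


def find_words(text):
    """Group the words of `text` by the three categories above."""
    results = {"all upper": [], "all lower": [], "title case": []}
    # In Python 3, \w in a str pattern matches Unicode word characters.
    for word in re.findall(r"\w+", text):
        label = classify(word)
        if label is not None:
            results[label].append(word)
    return results


print(find_words(" YES yes Yes "))
```

On the test string from the thread this prints
{'all upper': ['YES'], 'all lower': ['yes'], 'title case': ['Yes']}.
Letters that have no case at all are neither upper nor lower by these
tests, so a word written entirely in such letters lands in none of the
three lists.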