"Ola K" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> Hi,
> I am pretty new to Python and I want to make a script that will search
> for the following options:
> 1) words made of uppercase characters -only- (like "YES")
> 2) words made of lowercase characters -only- (like "yes")
> 3) and words with only the first letter capitalized (like "Yes")
> * and I need to do all these considering the fact that not all letters
> are indeed English letters.
>
> I went through different documentation sections but couldn't find a right
> condition, function or method for it.
> Suggestions will be very much appreciated...
> --Ola

Ola,
You may be new to Python, but are you new to regular expressions too? I am no whiz at them, but here is a script that takes a stab at what you are trying to do. (For more regular expression info, see http://www.amk.ca/python/howto/regex/.)

The script has these steps:
- create strings containing all Unicode characters that are considered "upper" and "lower", using the unicode.isupper and unicode.islower methods
- use these strings to construct 3 regular expressions (or "re"s): one for words of all uppercase letters, one for words of all lowercase letters, and one for words that start with an uppercase letter followed by at least one lowercase letter
- use each re to search the string u" YES yes Yes ", and print the matches found

I've used Unicode strings throughout, so this should be applicable to your text consisting of letters beyond the basic Latin set (since Outlook Express is trying to install Israeli fonts when reading your post, I assume these are the characters you are trying to handle). The unicode.is* methods consult the Unicode character database rather than your locale, but the regular expressions must be compiled with the re.UNICODE flag so that \b treats non-ASCII letters as word characters. I hope this gives you a jump start on your problem.

-- Paul

import sys
import re

# Every code point whose character is classified as uppercase / lowercase.
uppers = u"".join( unichr(i) for i in range(sys.maxunicode + 1)
                   if unichr(i).isupper() )
lowers = u"".join( unichr(i) for i in range(sys.maxunicode + 1)
                   if unichr(i).islower() )

# Whole words made only of uppercase letters, only of lowercase letters,
# or of one uppercase letter followed by one or more lowercase letters.
allUpperRe = ur"\b[%s]+\b" % uppers
allLowerRe = ur"\b[%s]+\b" % lowers
capWordRe = ur"\b[%s][%s]+\b" % (uppers, lowers)

regexes = [
    (allUpperRe, "all upper"),
    (allLowerRe, "all lower"),
    (capWordRe, "title case"),
    ]

for reString, label in regexes:
    # re.UNICODE makes \b use the Unicode definition of a word character;
    # without it, \b only recognizes ASCII letters and digits.
    reg = re.compile(reString, re.UNICODE)
    result = reg.findall(u" YES yes Yes ")
    print label, ":", result

Prints:
all upper : [u'YES']
all lower : [u'yes']
title case : [u'Yes']

-- 
http://mail.python.org/mailman/listinfo/python-list
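The same approach can be sketched in Python 3, where unichr becomes chr, every str is Unicode, and \b is Unicode-aware by default for str patterns, so no extra flag is needed. This is a hypothetical port rather than code from the thread: the helper name classify and the Greek sample words are illustrative assumptions, and the re.escape calls are an added precaution when building the character classes.

```python
import re
import sys

# Every code point whose character Python classifies as upper/lower case.
# Scanning 0..sys.maxunicode takes a moment, but it only runs once.
UPPERS = "".join(chr(i) for i in range(sys.maxunicode + 1) if chr(i).isupper())
LOWERS = "".join(chr(i) for i in range(sys.maxunicode + 1) if chr(i).islower())

# Precaution (not in the original post): escape anything that could be
# special inside a [...] character class.
_U = re.escape(UPPERS)
_L = re.escape(LOWERS)

# For str patterns in Python 3, \b already uses Unicode word characters.
PATTERNS = [
    (re.compile(r"\b[%s]+\b" % _U), "all upper"),
    (re.compile(r"\b[%s]+\b" % _L), "all lower"),
    (re.compile(r"\b[%s][%s]+\b" % (_U, _L)), "title case"),
]


def classify(text):
    """Hypothetical helper: map each label to the list of words it matches."""
    return {label: regex.findall(text) for regex, label in PATTERNS}


if __name__ == "__main__":
    for label, words in classify(" YES yes Yes ΝΑΙ ναι Ναι ").items():
        print(label, ":", words)
```

Called on " YES yes Yes ΝΑΙ ναι Ναι ", classify should return ['YES', 'ΝΑΙ'] for "all upper", ['yes', 'ναι'] for "all lower" and ['Yes', 'Ναι'] for "title case". Scripts without case, such as Hebrew, match none of the three patterns, because Unicode classifies their letters as neither uppercase nor lowercase.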