On Wed, 27 Apr 2016 01:38 am, subhabangal...@gmail.com wrote:
> I am trying to send you a revised example.
> list1=[u"('koteeswaram/BHPERSN engaged/NA ','class1')",
> u"('koteeswaram/BHPERSN is/NA ','class1')"]

Please don't use generic names that mean nothing, like "list1". We can see
it is a list, but what is it for? Use a name that describes the purpose of
the list. Even "input" and "output" are better names.

> [('koteeswaram/BHPERSN engaged/NA ','class1'),
> ('koteeswaram/BHPERSN is/NA ','class1')]

What is this? The output? Don't make us guess what things are.

My *guess* is that you have a list of Unicode strings that look like this:

    u"('aaa/TAG bbb/TAG ','class1')"

and you want to do six things:

- normalise the string;
- convert the Unicode string to ASCII, ignoring anything that isn't ASCII;
- delete the parentheses in the string;
- delete the leading and trailing single quotes;
- split the string on the comma;
- combine them into a tuple.

So let's make some functions:


# Untested
import unicodedata

def remove_parentheses(string):
    # Strip one pair of enclosing parentheses, if present.
    if string.startswith("(") and string.endswith(")"):
        string = string[1:-1]
    return string

def remove_single_quotes(string):
    # Strip one pair of enclosing single quotes, if present.
    if string.startswith("'") and string.endswith("'"):
        string = string[1:-1]
    return string

def convert(string):
    # Python 2: `unicode` is the text string type.
    if not isinstance(string, unicode):
        raise TypeError("expected unicode, but got %s" % type(string).__name__)
    string = unicodedata.normalize('NFKD', string)
    string = string.encode('ascii', 'ignore')
    string = remove_parentheses(string)
    # Assumes exactly one comma, separating the two parts.
    first_part, second_part = string.split(",")
    first_part = remove_single_quotes(first_part)
    second_part = remove_single_quotes(second_part)
    return (first_part, second_part)


input = [ ... ]  # your input strings
output = []
for string in input:
    output.append(convert(string))


--
Steven
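
P.S. Another option, offered only as a sketch: each input string looks like
the repr of a Python tuple, so ast.literal_eval from the standard library may
be able to parse it directly, without stripping parentheses and quotes by
hand. The names parse_pair, tagged_strings and pairs are made up for this
example, and it assumes every string really is a well-formed two-item tuple
literal. Anything else raises an exception instead of being silently mangled.

```python
import ast
import unicodedata


def parse_pair(text):
    """Turn a string such as "('aaa/TAG bbb/TAG ','class1')" into a tuple."""
    # Same clean-up as before: normalise, then drop non-ASCII characters.
    text = unicodedata.normalize('NFKD', text)
    text = text.encode('ascii', 'ignore').decode('ascii')
    # literal_eval evaluates Python literals only; it does not run
    # arbitrary code the way eval would.
    pair = ast.literal_eval(text)
    if not (isinstance(pair, tuple) and len(pair) == 2):
        raise ValueError("expected a 2-tuple, but got %r" % (pair,))
    return pair


# Hypothetical input, copied from the example strings in the question.
tagged_strings = [
    u"('koteeswaram/BHPERSN engaged/NA ','class1')",
    u"('koteeswaram/BHPERSN is/NA ','class1')",
]
pairs = [parse_pair(s) for s in tagged_strings]
```

Unlike splitting on the comma, this also copes with a comma inside one of
the quoted parts, since the parsing follows Python's own literal syntax.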