Peter Otten wrote:

> gesh...@gmail.com wrote:
>> how to write a function taking a string parameter, which returns it after
>> you delete the spaces, punctuation marks, accented characters in python ?
>
> Looks like you want to remove more characters than you want to keep. In
> this case I'd decide what characters to keep first, e. g. (assuming
> Python 3)

However, with *that* approach (which is different from the OP’s request),
regular expression matching might turn out to be more efficient:

-----------------------------------------------------------
import re

# Keep only the runs of ASCII letters and join them.
print("".join(re.findall(r'[a-z]+', "...", flags=re.IGNORECASE)))
-----------------------------------------------------------

For the OP’s original request, regular expressions may still be the better
approach.  For example:

----------------------------------------------------------------------
import re

# Delete whitespace, punctuation and the listed accented characters.
# re.sub() already returns a str, so no join() is needed.
print(re.sub(r'[\s,;.?!ÀÁÈÉÌÍÒÓÙÚÝ]+', "", "...", flags=re.IGNORECASE))
----------------------------------------------------------------------

or

----------------------------------------------------------------------
import re

# Keep the runs of all other characters and join them.
print("".join(re.findall(r'[^\s,;.?!ÀÁÈÉÌÍÒÓÙÚÝ]+', "...", flags=re.IGNORECASE)))
----------------------------------------------------------------------

> >>> import string
> >>> keep = string.ascii_letters + string.digits
> >>> keep
> 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
>
> Now you can iterate over the characters and check if you want to preserve
> it for each of them:

The good thing about this part of the approach you suggested is that you can
build regular expressions from strings, too:

  keep = '[' + 'a-z' + r'\d' + ']'

> >>> def clean(s, keep):
> ...     return "".join(c for c in s if c in keep)
> ...

Why would one prefer this over "".join(filter(lambda c: c in keep, s))?

> >>> clean("<alpha> äöü ::42", keep)
> 'alpha42'
> >>> clean("<alpha> äöü ::42", string.ascii_letters)
> 'alpha'
>
> If you are dealing with a lot of text you can make this a bit more
> efficient with the str.translate() method.
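
Before moving on to str.translate(): the variants discussed so far can be
compared side by side.  This is only a sketch, assuming Python 3.  The
function names and the precompiled patterns are made up for illustration,
and the character classes are ASCII-only to match `keep` above, so this
covers the "keep" approach, not the OP's original "remove" request.

```python
import re
import string

KEEP = string.ascii_letters + string.digits

# Compiled once; ASCII letters and digits only (an assumption matching KEEP).
KEEP_RE = re.compile(r'[A-Za-z0-9]+')
DROP_RE = re.compile(r'[^A-Za-z0-9]+')


def clean_genexp(s, keep=KEEP):
    # Generator expression, as in the quoted clean().
    return "".join(c for c in s if c in keep)


def clean_filter(s, keep=KEEP):
    # filter() is a builtin, not a str method; the lambda takes one argument.
    return "".join(filter(lambda c: c in keep, s))


def clean_findall(s):
    # Collect the runs of characters to keep and concatenate them.
    return "".join(KEEP_RE.findall(s))


def clean_sub(s):
    # Delete the runs of characters to discard; sub() already returns a str.
    return DROP_RE.sub("", s)


sample = "<alpha> äöü ::42"
for f in (clean_genexp, clean_filter, clean_findall, clean_sub):
    print(f.__name__, repr(f(sample)))
```

All four print 'alpha42' for this input, so the choice between them is a
matter of readability and speed, not of results.
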
> Create a mapping that maps all
> characters that you want to keep to themselves
>
> >>> m = str.maketrans(keep, keep)
> >>> m[ord("a")]
> 97
> >>> m[ord(">")]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> KeyError: 62
>
> and all characters that you want to discard to None

Why would creating a *larger* list for *more* operations be *more* efficient?

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me.
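
PS: Whether the str.translate() variant is actually faster is easiest to
settle by measuring.  A minimal sketch, assuming Python 3: the class name
KeepOnly, the sample text and the repetition count are made up for
illustration, and no timing results are claimed here.  Note that the table
does not have to list every character to discard; a dict subclass with
__missing__ can answer None for anything that is not in `keep`.

```python
import string
import timeit

KEEP = string.ascii_letters + string.digits


class KeepOnly(dict):
    # str.translate() looks up each code point by subscription. For a dict
    # subclass, a missing key triggers __missing__; returning None there
    # tells translate() to delete that character. Nothing is stored, so
    # the table never grows beyond the characters to keep.
    def __missing__(self, key):
        return None


TABLE = KeepOnly(str.maketrans(KEEP, KEEP))


def clean_translate(s):
    return s.translate(TABLE)


def clean_genexp(s, keep=KEEP):
    return "".join(c for c in s if c in keep)


# Hypothetical workload: the sample string from the thread, repeated.
sample = "<alpha> äöü ::42" * 1000

# Both variants must agree before their timings mean anything.
assert clean_translate(sample) == clean_genexp(sample)

for f in (clean_translate, clean_genexp):
    seconds = timeit.timeit(lambda: f(sample), number=20)
    print(f"{f.__name__}: {seconds:.3f} s")
```

The numbers depend on the interpreter version and on the input, so they
should be taken on one's own machine with representative text.
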