On 02/04/2019 01:52, Steven D'Aprano wrote:
Here's a partial list of English prefixes that somebody doing text
processing might want to remove to get at the root word:

     a an ante anti auto circum co com con contra contro de dis
     en ex extra hyper il im in ir inter intra intro macro micro
     mono non omni post pre pro sub sym syn tele un uni up

I count fourteen clashes:

     a: an ante anti
     an: ante anti
     co: com con contra contro
     ex: extra
     in: inter intra intro
     un: uni

(That's over a third of this admittedly incomplete list of prefixes.)
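
For illustration, here is a short Python sketch that checks the prefix list above for clashes. A clash is an entry that is itself the start of another entry. The names PREFIXES and find_clashes are hypothetical and were chosen only for this example.

```python
# The prefix list quoted above, split on whitespace.
PREFIXES = """
a an ante anti auto circum co com con contra contro de dis
en ex extra hyper il im in ir inter intra intro macro micro
mono non omni post pre pro sub sym syn tele un uni up
""".split()

def find_clashes(prefixes):
    """Map each prefix to the other listed prefixes that start with it."""
    clashes = {}
    for short in prefixes:
        longer = [p for p in prefixes if p != short and p.startswith(short)]
        if longer:
            clashes[short] = longer
    return clashes

for short, longer in find_clashes(PREFIXES).items():
    print(f"{short}: {' '.join(longer)}")
```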

I can think of at least one English suffix pair that clashes: -ify, -fy.
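
The following minimal sketch shows why such a pair matters. With a naive loop, the result depends on which suffix is tried first. The suffix list and both helper functions are hypothetical and are for illustration only. Trying the longest suffix first is one common workaround, but it is not a complete fix.

```python
SUFFIXES = ["fy", "ify"]  # hypothetical list containing the clashing pair

def strip_first_match(word, suffixes):
    """Strip the first listed suffix that matches (order-sensitive)."""
    for suffix in suffixes:
        if suffix and word.endswith(suffix):
            return word[:-len(suffix)]
    return word

def strip_longest_match(word, suffixes):
    """Strip the longest matching suffix, regardless of list order."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix and word.endswith(suffix):
            return word[:-len(suffix)]
    return word

print(strip_first_match("classify", SUFFIXES))    # classi
print(strip_longest_match("classify", SUFFIXES))  # class
```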

You're beginning to persuade me that cut/trim methods/functions aren't a good idea :-)

So far we have two slightly dubious use cases, plus one of my own.

1. Stripping file extensions. Personally I find that treating filenames like filenames (i.e. using os.path or, nowadays, pathlib) results in me thinking more appropriately about what I'm doing.
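
To illustrate the filename point, here is a small sketch using the standard os.path.splitext and pathlib.PurePath. The filenames are made-up examples. Both functions treat a leading dot as part of the name rather than as an extension. A naive cut at the last dot does not.

```python
import os.path
from pathlib import PurePath

name = "archive.tar.gz"

# os.path.splitext splits off only the last extension.
print(os.path.splitext(name))         # ('archive.tar', '.gz')

# pathlib exposes the same information as attributes.
p = PurePath(name)
print(p.stem, p.suffix, p.suffixes)   # archive.tar .gz ['.tar', '.gz']

# A dotfile has no extension; a naive cut at the last dot loses the whole name.
dotfile = ".bashrc"
print(repr(dotfile.rpartition(".")[0]))  # ''
print(os.path.splitext(dotfile))         # ('.bashrc', '')
print(repr(PurePath(dotfile).suffix))    # ''
```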

2. Stripping prefixes and suffixes to get to root words. Python has been used for natural language work for over a decade, and I don't think I've heard any great call from linguists for the functionality. English just isn't that accommodating :-) There are too many common exception cases for such a straightforward approach not to cause confusion.
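
Here is a rough sketch of the problem. It assumes a naive stripper that removes the longest listed prefix without checking a dictionary. The prefix subset and the naive_root helper are hypothetical.

```python
PREFIXES = ["un", "in", "inter", "pre"]  # hypothetical illustrative subset

def naive_root(word, prefixes=PREFIXES):
    """Strip the longest listed prefix the word starts with (no dictionary check)."""
    for prefix in sorted(prefixes, key=len, reverse=True):
        if word.startswith(prefix):
            return word[len(prefix):]
    return word

# Only the first word has a real prefix; the rest are mangled:
# unhappy -> happy, uncle -> cle, index -> dex, interest -> est, pretty -> tty
for word in ["unhappy", "uncle", "index", "interest", "pretty"]:
    print(word, "->", naive_root(word))
```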

3. My most common use case (not very common at that) is for stripping annoying prompts off text-based APIs. I'm happy using .startswith() and string slicing for that, though your point about the repeated use of the string to be stripped off (or worse, hard-coding its length) is well made.
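
For comparison, here is a sketch of the startswith-and-slice approach. It takes the length from the prompt string itself instead of hard-coding it. The prompt value and the strip_prompt name are hypothetical.

```python
PROMPT = ">>> "  # hypothetical prompt emitted by some text-based API

def strip_prompt(line, prompt=PROMPT):
    """Remove a leading prompt if present; otherwise return line unchanged."""
    # len(prompt) avoids repeating a magic number such as line[4:].
    if line.startswith(prompt):
        return line[len(prompt):]
    return line

print(strip_prompt(">>> print(1)"))  # print(1)
```

In Python 3.9 and later, line.removeprefix(prompt) behaves the same way (PEP 616).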

I am beginning to worry slightly that there are usually more appropriate things to do than simply cutting off affixes, and that by providing these particular batteries we might be encouraging poor practice.

--
Rhodri James *-* Kynesim Ltd
_______________________________________________
Python-ideas mailing list
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
