On Tue, Mar 23, 2021 at 5:16 AM Karen Shaeffer via Python-list <python-list@python.org> wrote:
>
> Hi Chris,
> Thanks for your comment.
>
> > Python doesn't work with UTF-8 encoded code points; it works with
> > Unicode code points. Are you looking for something that checks whether
> > something is a palindrome, or locates palindromes within it?
> >
> > def is_palindrome(txt):
> >     return txt == txt[::-1]
> >
> > Easy.
>
> Of course, it’s easy. It’s a Pythonic idiom! But it doesn’t work. And you know
> that. You even explained a few reasons why it doesn’t work below. There are
> many more instances of strings that do not work. Here are two:
>
> idx = 6   A man, a plan, a canal: Panama   is_palindrome() = False
> idx = 17  ab́cdeedcb́a   is_palindrome() = False
>
> The palindrome isn’t worth any more time. It isn’t even a good example.
>
> In my experience processing unstructured, multilingual text, you encounter a
> wide array of variances in both the text and in the encoding details,
> including outright errors. You have to account for all of them, because
> 99.99% of that text is valuable to you.
>
> The key idea: If you care about the details, working with unstructured
> multilingual text is complicated. There are no easy solutions.
>
> >
> > Efficiently finding substring palindromes would be a bit harder, but
> > that'd be true even if you restricted it to ASCII. The advantage of
> > Python's way of doing it is that, if you have a method that would work
> > with ASCII bytes, the exact same thing will work with a Unicode
> > string.
> >
> > There's another big wrinkle not touched here, and that's what to do
> > with combining characters. Python makes it easy to normalize text as
> > much as is possible, and an NFC normalization would help a lot, but
> > it's not going to do everything. So you may want to first define a
> > proper way to split a string into whatever you're defining a character
> > to be, and that's a very difficult problem, regardless of programming
> > language. For example, Arabic text changes in visual shape when
> > letters are next to each other, and Greek has two different forms for
> > the letter sigma (U+03C2 and U+03C3) - should those distinctions
> > affect palindromminess? What about ligatures - is U+FB01 "ﬁ" a single
> > character, or should it be matched by "if" on the other end?
> >
> > What part of this is trivial in Go?
>
> Go is simpler than Python. Both languages have the capabilities to solve any
> text processing problem. I’m still learning Go, so I can’t really say more.
>
> Personally, I like Python for text processing. You can usually get
> satisfactory results very quickly for most of the input space. And if you
> don’t care about all the gotchas, then you are good to go.
>
> I have no more time for this. Thanks for your comment. I learned a little
> reading the long thread dealing with .title(). (chuckles ;)
>

Hey, you're the one who brought up palindrome testing as a difficult
problem in Python :) Your post implied that it was easier in Go, and I
can't see that that's possible.

ChrisA
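
P.S. For the two strings quoted above, here is a minimal sketch of one
possible policy, not a general answer. It assumes that a "character" means
a base letter or digit plus any combining marks that follow it, that case
is ignored, and that spaces and punctuation don't count. The names
clusters() and is_palindrome_loose() are made up for illustration, and the
splitting is only a rough stand-in for real grapheme segmentation, so it
does nothing for the Arabic shaping question.

```python
import unicodedata


def clusters(txt):
    """Split text into letters/digits, each with its trailing combining marks.

    Assumed policy for this sketch: NFC-normalize, casefold, and drop
    anything that is not alphanumeric (spaces, punctuation, symbols).
    """
    out = []
    keep_marks = False  # True while the most recent base character was kept
    for ch in unicodedata.normalize("NFC", txt).casefold():
        if unicodedata.combining(ch):
            # Nonzero combining class: attach the mark to its base character,
            # unless that base was dropped (e.g. a mark following punctuation).
            if keep_marks:
                out[-1] += ch
        else:
            keep_marks = ch.isalnum()
            if keep_marks:
                out.append(ch)
    return out


def is_palindrome_loose(txt):
    """Compare the cluster sequence with its reverse."""
    units = clusters(txt)
    return units == units[::-1]


if __name__ == "__main__":
    for sample in ("A man, a plan, a canal: Panama",
                   "ab\u0301cdeedcb\u0301a"):
        print(repr(sample), is_palindrome_loose(sample))
```

Reversing the list of clusters rather than the string keeps each combining
mark with its own letter, which is where txt[::-1] goes wrong on the second
string. Accents still count under this policy, so whether "é" should match
"e" remains a decision you'd have to make for your own data.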