> On 8 Jun 2022, at 18:01, Dave <d...@looktowindward.com> wrote:
>
> Hi,
>
> This is a tool I’m using on my own files to save me time. Basically all or
> most of the tracks were imported with different versions of iTunes over the
> years. There are three problems:
>
> 1. File System characters are replaced (you can’t have ‘/’ or ‘:’ in a file
> name).

ok

> 2. Smart Quotes were added at some point, these need to be replaced.

ok

> 3. Other characters, due to the name being of non-English origin.

Why is this a problem? It's only if the chars are confusing or will not
compare that there is something to fix. All modern OSes allow Unicode
filenames.
Barry

> If I find others I’ll add them.
>
> I’m using MusicBrainz to do a fuzzy match and get the correct name.
>
> It’s not perfect, but works for 99% of files which is good enough for me!
>
> Cheers
> Dave
>
>
>> On 8 Jun 2022, at 18:23, Avi Gross via Python-list <python-list@python.org>
>> wrote:
>>
>> Dave,
>>
>> Your goal is to compare titles and there can be endless replacements needed
>> if you allow the text to contain anything but ASCII.
>>
>> Have you considered stripping out things instead? I mean remove lots of
>> stuff that is not ASCII in the first place and perhaps also remove lots of
>> extra punctuation like single quotes or question marks or redundant white
>> space and compare the sort of skeletons of the two?
>>
>> And even if that fails, could you have a measure of how different they are
>> and tolerate it if they were, say, off by one letter? Albeit "My desert"
>> matching "My Dessert" might not be a valid match, with one being a song
>> about an arid environment and the other about food you don't need!
>>
>> Your seemingly simple need can expand into a fairly complex project. There
>> may be many ideas on how to deal with it but not anything perfect enough to
>> catch all cases as even a trained human may have to make decisions at times
>> and not match what other humans do. We have examples like the TV show
>> "NUMB3RS" that used a perfectly valid digit 3 to stand for an "E" but yet is
>> often written when I look it up as NUMBERS. You have obvious cases where
>> titles of songs may contain composite symbols like "œ" which will not
>> compare to one where it is written out as "oe" so the idea of comparing is
>> quite complex and the best you might do is heuristic.
>>
>> Unicode has many symbols that are almost the same or even look the same, or
>> maybe do so in one font versus another. There are libraries of functions
>> that allow some kinds of comparisons or conversions that you could look into
>> but the gain for you may not be worth it.
>> Nothing stops a person from naming a song any way they want and I speak
>> many languages and often see a song re-titled in the local language and
>> using the local alphabet, often mixed with another.
>>
>> Your original question is perhaps now many questions, depending on what you
>> choose. You started by wanting to know how to compare and it is moving on to
>> how to delete parts or make substitutions or use regular expressions and it
>> can get worse. You can, for example, take a string and identify the words
>> within it and create a regular expression that inserts sequences between the
>> words that match any zero or one or more non-word characters such as spaces,
>> tabs, punctuation or non-ASCII, so that song titles with the same words in a
>> sequence match no matter what is between them. The possibilities are endless
>> but consider some of the techniques that are used by some programs that
>> parse text and suggest alternate spellings or even programs like Google
>> Translate that can take a sentence and then suggest you may mean a slightly
>> altered sentence with one word changed to fit better.
>>
>> You need to decide what you want to deal with and what will be
>> mis-classified by your program. Some of us have suggested folding the case
>> of the words but that means a song about a dark skinned person in Poland
>> called "Black Polish" would match a song about keeping your shoes dark with
>> "black polish" so I keep repeating it is very hard, or frankly impossible, to
>> catch every case I can imagine and the many I can't!
>>
>> But the emphasis here is not your overall problem. It is about whether and
>> how the computer language called Python, and perhaps some add-on modules,
>> can be used to solve each smaller need such as recognizing a pattern or
>> replacing text. It can do quite a bit but only when the specification of the
>> problem is exact.
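For what it's worth, the "compare the skeletons" and "measure how different they are" ideas above could be sketched roughly as follows. This is only an illustration using the standard library (unicodedata, re, difflib); the function names skeleton() and similarity() are made up for this example, and any threshold for "close enough" is left to the reader to pick.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def skeleton(title: str) -> str:
    """Reduce a title to lowercase ASCII words separated by single spaces.

    Hypothetical helper for illustration, not part of any library.
    """
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop everything that is not ASCII (combining marks, smart quotes, ...).
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Remove remaining punctuation, keeping letters, digits and whitespace.
    no_punct = re.sub(r"[^A-Za-z0-9\s]", "", ascii_only)
    # Fold case and collapse runs of whitespace.
    return " ".join(no_punct.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a 0.0 to 1.0 similarity score between the two skeletons."""
    return SequenceMatcher(None, skeleton(a), skeleton(b)).ratio()


print(skeleton("Don\u2019t  Stop!") == skeleton("Don't Stop"))  # True
print(similarity("My desert", "My Dessert"))  # high, but below 1.0
```

As warned above, this is a heuristic: "My desert" and "My Dessert" score as near matches, and a ligature such as "œ" has no Unicode decomposition, so it is simply dropped here rather than turned into "oe" unless you map it explicitly first.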
>>
>>
>>
>>
>> -----Original Message-----
>> From: Dave <d...@looktowindward.com>
>> To: python-list@python.org
>> Sent: Wed, Jun 8, 2022 5:09 am
>> Subject: Re: How to replace characters in a string?
>>
>> Hi,
>>
>> Thanks for this!
>>
>> So, is there a copy function/method that returns a MutableString like in
>> Objective-C? I’ve solved this problem before in a number of languages like
>> Objective-C and AppleScript.
>>
>> Basically there is a set of common characters that need “normalizing” and I
>> have a method that replaces them in a string, so:
>>
>> myString = [myString normalizeCharacters];
>>
>> Would return a new string with all the “common” replacements applied.
>>
>> Since the following gives an error:
>>
>> myString = 'Hello'
>> myNewstring = myString.replace(myString,'e','a')
>>
>> TypeError: 'str' object cannot be interpreted as an integer
>>
>> I can’t see a way to do this in Python?
>>
>> All the Best
>> Dave
>>
>>
>>> On 8 Jun 2022, at 10:14, Chris Angelico <ros...@gmail.com> wrote:
>>>
>>> On Wed, 8 Jun 2022 at 18:12, Dave <d...@looktowindward.com> wrote:
>>>
>>>> I tried this but it doesn’t seem to work?
>>>> myCompareFile1 = ascii(myTitleName)
>>>> myCompareFile1.replace("\u2019", "'")
>>>
>>> Strings in Python are immutable. When you call ascii(), you get back a
>>> new string, but it's one that has actual backslashes and such in it.
>>> (You probably don't need this step, other than for debugging; check
>>> the string by printing out the ASCII version of it, but stick to the
>>> original for actual processing.) The same is true of the replace()
>>> method; it doesn't change the string, it returns a new string.
>>>
>>> >>> word = "spam"
>>> >>> print(word.replace("sp", "h"))
>>> ham
>>> >>> print(word)
>>> spam
>>>
>>> ChrisA
>>> --
>>> https://mail.python.org/mailman/listinfo/python-list
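To tie the thread together: since replace() returns a new string, an equivalent of the normalizeCharacters method asked about earlier might look like the sketch below. It uses str.maketrans() and str.translate() from the standard library. The name normalize_characters and the contents of the replacement table are assumptions for illustration only (in particular, mapping '/' and ':' to '_' is a guess at the desired direction); adjust the table to whatever your files actually need.

```python
# Hypothetical replacement table: the mappings are examples, not a
# definitive list of what iTunes or MusicBrainz produce.
REPLACEMENTS = str.maketrans({
    "\u2018": "'",   # left single smart quote
    "\u2019": "'",   # right single smart quote
    "\u201c": '"',   # left double smart quote
    "\u201d": '"',   # right double smart quote
    "/": "_",        # characters not allowed in file names
    ":": "_",
})


def normalize_characters(text: str) -> str:
    """Return a new string with the table's replacements applied."""
    return text.translate(REPLACEMENTS)


title = "Don\u2019t Stop: Live"
cleaned = normalize_characters(title)
print(cleaned)   # Don't Stop_ Live
print(title)     # the original string is unchanged

# The single-replacement form takes only the old and new substrings;
# the string itself is not passed as an argument.
print("Hello".replace("e", "a"))   # Hallo
```

The TypeError quoted earlier comes from passing three arguments to replace(): the optional third argument is a count of replacements and must be an integer.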