Hi,

This is a tool I’m using on my own files to save me time. All or most of the tracks were imported with different versions of iTunes over the years. There are three problems:

1. File system characters were replaced (you can’t have ‘/’ or ‘:’ in a file name).
2. Smart quotes were added at some point; these need to be replaced.
3. Other characters, due to the name being of non-English origin.

If I find others I’ll add them.

I’m using MusicBrainz to do a fuzzy match and get the correct name. It’s not perfect, but it works for 99% of files, which is good enough for me!

Cheers
Dave

> On 8 Jun 2022, at 18:23, Avi Gross via Python-list <python-list@python.org>
> wrote:
>
> Dave,
>
> Your goal is to compare titles and there can be endless replacements needed
> if you allow the text to contain anything but ASCII.
>
> Have you considered stripping out things instead? I mean remove lots of stuff
> that is not ASCII in the first place and perhaps also remove lots of extra
> punctuation like single quotes or question marks or redundant white space and
> compare the sort of skeletons of the two?
>
> And even if that fails, could you have a measure of how different they are
> and tolerate if they were say off by one letter, albeit "My desert" matching
> "My Dessert" might not be a valid match with one being a song about an arid
> environment and the other about food you don't need!
>
> Your seemingly simple need can expand into a fairly complex project. There
> may be many ideas on how to deal with it but not anything perfect enough to
> catch all cases, as even a trained human may have to make decisions at times
> and not match what other humans do. We have examples like the TV show
> "NUMB3RS" that used a perfectly valid digit 3 to stand for an "E" but yet is
> often written, when I look it up, as NUMBERS. You have obvious cases where
> titles of songs may contain composite symbols like "œ" which will not compare
> to one where it is written out as "oe", so the idea of comparing is quite
> complex and the best you might do is a heuristic.
>
> Unicode has many symbols that are almost the same or even look the same or
> maybe in one font versus another.
> There are libraries of functions that allow
> some kinds of comparisons or conversions that you could look into but the
> gain for you may not be worth it. Nothing stops a person from naming a song
> any way they want and I speak many languages and often see a song re-titled
> in the local language and using the local alphabet mixed often with another.
>
> Your original question is perhaps now many questions, depending on what you
> choose. You started by wanting to know how to compare and it is moving on to
> how to delete parts or make substitutions or use regular expressions and it
> can get worse. You can, for example, take a string and identify the words
> within it and create a regular expression that inserts sequences between the
> words that match zero, one, or more non-word characters such as spaces,
> tabs, punctuation or non-ASCII, so that song titles with the same words in a
> sequence match no matter what is between them. The possibilities are endless
> but consider some of the techniques that are used by some programs that parse
> text and suggest alternate spellings or even programs like Google Translate
> that can take a sentence and then suggest you may mean a slightly altered
> sentence with one word changed to fit better.
>
> You need to decide what you want to deal with and what will be mis-classified
> by your program. Some of us have suggested folding the case of the words but
> that means a song about a dark-skinned person in Poland called "Black Polish"
> would match a song about keeping your shoes dark with "black polish", so I
> keep repeating it is very hard, or frankly impossible, to catch every case I
> can imagine and the many I can't!
>
> But the emphasis here is not your overall problem. It is about whether and
> how the computer language called Python, and perhaps some add-on modules, can
> be used to solve each smaller need such as recognizing a pattern or replacing
> text.
> It can do quite a bit but only when the specification of the problem is
> exact.
>
>
> -----Original Message-----
> From: Dave <d...@looktowindward.com>
> To: python-list@python.org
> Sent: Wed, Jun 8, 2022 5:09 am
> Subject: Re: How to replace characters in a string?
>
> Hi,
>
> Thanks for this!
>
> So, is there a copy function/method that returns a MutableString like in
> Objective-C? I’ve solved this problem before in a number of languages like
> Objective-C and AppleScript.
>
> Basically there is a set of common characters that need “normalizing” and I
> have a method that replaces them in a string, so:
>
> myString = [myString normalizeCharacters];
>
> Would return a new string with all the “common” replacements applied.
>
> Since the following gives an error:
>
> myString = 'Hello'
> myNewstring = myString.replace(myString,'e','a')
>
> TypeError: 'str' object cannot be interpreted as an integer
>
> I can’t see a way to do this in Python?
>
> All the Best
> Dave
>
>
>> On 8 Jun 2022, at 10:14, Chris Angelico <ros...@gmail.com> wrote:
>>
>> On Wed, 8 Jun 2022 at 18:12, Dave <d...@looktowindward.com> wrote:
>>
>>> I tried this but it doesn’t seem to work?
>>> myCompareFile1 = ascii(myTitleName)
>>> myCompareFile1.replace("\u2019", "'")
>>
>> Strings in Python are immutable. When you call ascii(), you get back a
>> new string, but it's one that has actual backslashes and such in it.
>> (You probably don't need this step, other than for debugging; check
>> the string by printing out the ASCII version of it, but stick to the
>> original for actual processing.) The same is true of the replace()
>> method; it doesn't change the string, it returns a new string.
>>
>> >>> word = "spam"
>> >>> print(word.replace("sp", "h"))
>> ham
>> >>> print(word)
>> spam
>>
>> ChrisA
>> --
>> https://mail.python.org/mailman/listinfo/python-list
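
On the replace() question quoted above: str.replace() takes the old substring, the new substring and an optional integer count, so passing the string itself as the first argument pushes 'a' into the count slot, which is where the TypeError comes from. A minimal sketch of a normalizeCharacters-style helper might look like the following. The name normalize_characters and the entries in the table are hypothetical, chosen only for illustration; they are not the actual replacement list used by the tool in this thread.

```python
# Hypothetical replacement table: the entries are illustrative only.
REPLACEMENTS = str.maketrans({
    "\u2018": "'",   # left single smart quote
    "\u2019": "'",   # right single smart quote
    "\u201c": '"',   # left double smart quote
    "\u201d": '"',   # right double smart quote
})


def normalize_characters(text: str) -> str:
    """Return a NEW string with the common replacements applied."""
    return text.translate(REPLACEMENTS)


# The call from the question, with the arguments in the right places.
greeting = "Hello"
new_greeting = greeting.replace("e", "a")   # old, new (optional count omitted)

title = "Don\u2019t Stop"
fixed = normalize_characters(title)         # `title` itself is left unchanged
print(new_greeting, "|", fixed)
```

Because strings are immutable, the result always has to be assigned to a name; there is no mutable copy to ask for, and none is needed.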
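
The "compare the skeletons" and "measure how different they are" ideas quoted above can be sketched with the standard library alone. This is only one possible reading of that suggestion, not the method the tool in this thread uses (that relies on MusicBrainz). The helper names skeleton and similarity are hypothetical, and any acceptance threshold would have to be picked by experiment.

```python
import difflib
import re
import unicodedata


def skeleton(title: str) -> str:
    """Reduce a title to lower-case ASCII words separated by single spaces."""
    # NFKD splits accented letters into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop everything non-ASCII: combining marks, smart quotes, and also
    # letters with no ASCII decomposition (so a ligature such as "œ" is lost).
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Remove straight apostrophes so "Don't" and "Don’t" give the same result.
    ascii_only = ascii_only.replace("'", "")
    # Collapse runs of punctuation and white space into one space, fold case.
    return re.sub(r"[^A-Za-z0-9]+", " ", ascii_only).strip().lower()


def similarity(a: str, b: str) -> float:
    """Ratio between 0 and 1; 1.0 means the two skeletons are identical."""
    return difflib.SequenceMatcher(None, skeleton(a), skeleton(b)).ratio()


print(skeleton("Caf\u00e9: Del Mar"))
print(similarity("My desert", "My Dessert"))
```

As the quoted message warns, a high ratio is only a hint: "My desert" and "My Dessert" score close to 1.0 here while being different songs.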
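
The regular-expression idea quoted above (same words in the same order, with anything non-word allowed between them) could be sketched as below. The function name words_pattern is hypothetical. Note two assumptions: case is ignored, which brings back the "Black Polish" problem mentioned in the thread, and Python's default \w treats non-ASCII letters as word characters, so accented letters are not skipped as separators here.

```python
import re


def words_pattern(title: str) -> re.Pattern:
    """Build a pattern matching the same words with any non-word text between."""
    words = re.findall(r"\w+", title)
    # \W* permits zero or more non-word characters (spaces, tabs, punctuation)
    # between consecutive words and at either end of the candidate string.
    body = r"\W*".join(re.escape(word) for word in words)
    return re.compile(rf"\W*{body}\W*", re.IGNORECASE)


pattern = words_pattern("Hello, World!")
print(bool(pattern.fullmatch("hello -- world")))
print(bool(pattern.fullmatch("hello there world")))
```

Using fullmatch() rather than search() keeps a title with extra words from matching a shorter one.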