Hi,

This is a tool I’m using on my own files to save me time. Basically, most of 
the tracks were imported with different versions of iTunes over the years. There 
are three problems:

1.   File system characters are replaced (you can’t have ‘/’ or ‘:’ in a file 
name).
2.   Smart quotes were added at some point; these need to be replaced.
3.   Other characters that come from names of non-English origin.

If I find others I’ll add them.
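
A minimal sketch of how problems 2 and 3 could be handled in Python, assuming a small hand-maintained table of replacements. The names `REPLACEMENTS` and `normalize_title` are hypothetical, not part of the actual tool. It does not try to undo problem 1, since a character that was replaced in the file name cannot be recovered by simple substitution.

```python
import unicodedata

# Hypothetical replacement table: smart quotes mapped to plain ASCII quotes.
# Extend it as new cases turn up.
REPLACEMENTS = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_title(title: str) -> str:
    """Return a NEW string with smart quotes straightened and accents removed."""
    straightened = title.translate(REPLACEMENTS)
    # NFKD splits an accented letter into a base letter plus combining marks,
    # which are then dropped.
    decomposed = unicodedata.normalize("NFKD", straightened)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(normalize_title("Don\u2019t Stop"))
print(normalize_title("Caf\u00e9 del Mar"))
```

Characters that have no Unicode decomposition pass through unchanged, so they would still need an explicit entry in the table.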

I’m using MusicBrainz to do a fuzzy match and get the correct name.

It’s not perfect, but it works for 99% of files, which is good enough for me!

Cheers
Dave


> On 8 Jun 2022, at 18:23, Avi Gross via Python-list <python-list@python.org> 
> wrote:
> 
> Dave,
> 
> Your goal is to compare titles and there can be endless replacements needed 
> if you allow the text to contain anything but ASCII.
> 
> Have you considered stripping out things instead? I mean remove lots of stuff 
> that is not ASCII in the first place and perhaps also remove lots of extra 
> punctuation like single quotes or question marks or redundant white space and 
> compare the sort of skeletons of the two? 
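
The "skeleton" idea above could be sketched roughly as follows. The function name `skeleton` is hypothetical, and lower-casing is an extra assumption that can be dropped if case matters.

```python
import re


def skeleton(title: str) -> str:
    """Reduce a title to lower-case ASCII letters and digits, single-spaced."""
    # Drop every character that cannot be encoded as ASCII.
    ascii_only = title.encode("ascii", "ignore").decode("ascii")
    # Remove punctuation: keep only letters, digits and whitespace.
    no_punctuation = re.sub(r"[^A-Za-z0-9\s]", "", ascii_only)
    # Collapse runs of whitespace and fold case.
    return " ".join(no_punctuation.split()).lower()


# Two spellings of the same title reduce to the same skeleton.
print(skeleton("Don\u2019t  Stop!") == skeleton("Don't Stop"))
```

Stripping is lossy: an accented letter is removed entirely rather than replaced by its base letter, so two titles can share a skeleton without being the same song.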
> 
> And even if that fails, could you have a measure of how different they are 
> and tolerate if they were say off by one letter albeit "My desert" matching 
> "My Dessert" might not be a valid match with one being a song about an arid 
> environment and the other about food you don't need!
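
One way to get such a measure from the standard library is `difflib.SequenceMatcher`. This is only a sketch: the helper names and the 0.9 threshold are assumptions picked for illustration, not tuned values.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio between 0.0 and 1.0; 1.0 means identical ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_probable_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two titles as the same when they are similar enough."""
    return similarity(a, b) >= threshold


print(similarity("My desert", "My Dessert"))
print(is_probable_match("My desert", "My Dessert"))
```

With this threshold the desert/dessert pair is accepted as a match, which is exactly the kind of false positive described above, so the threshold is a trade-off rather than a fix.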
> 
> Your seemingly simple need can expand into a fairly complex project. There 
> may be many ideas on how to deal with it but not anything perfect enough to 
> catch all cases as even a trained human may have to make decisions at times 
> and not match what other humans do. We have examples like the TV show 
> "NUMB3RS" that used a perfectly valid digit 3 to stand for an "E" but yet is 
> often written when I look it up as NUMBERS. You have obvious cases where 
> titles of songs may contain composite symbols like "œ" which will not compare 
> to one where it is written out as "oe" so the idea of comparing is quite 
> complex and the best you might do is heuristic.
> 
> UNICODE has many symbols that are almost the same or even look the same or 
> maybe in one font versus another. There are libraries of functions that allow 
> some kinds of comparisons or conversions that you could look into but the 
> gain for you may not be worth it. Nothing stops a person from naming a song 
> any way they want and I speak many languages and often see a song re-titled 
> in the local language and using the local alphabet mixed often with another.
> 
> Your original question is perhaps now many questions, depending on what you 
> choose. You started by wanting to know how to compare and it is moving on to 
> how to delete parts or make substitutions or use regular expressions and it 
> can get worse. You can, for example, take a string and identify the words 
> within it and create a regular expression that inserts sequences between the 
> words that match any zero or one or more non-word characters such as spaces, 
> tabs, punctuation or non-ASCII, so that song titles with the same words in a 
> sequence match no matter what is between them. The possibilities are endless 
> but consider some of the techniques that are used by some programs that parse 
> text and suggest alternate spellings  or even programs like Google Translate 
> that can take a sentence and then suggest you may mean a slightly altered 
> sentence with one word changed to fit better. 
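
The regular-expression idea in the paragraph above might look something like this. `words_pattern` is a hypothetical helper, and "word" is assumed to mean a run of ASCII letters and digits.

```python
import re

SEPARATOR = r"[^A-Za-z0-9]"   # anything that is not an ASCII letter or digit


def words_pattern(title: str) -> "re.Pattern[str]":
    """Build a pattern matching the same words with any separators between."""
    words = re.findall(r"[A-Za-z0-9]+", title)
    # One or more separator characters between consecutive words.
    body = (SEPARATOR + "+").join(re.escape(word) for word in words)
    # Optional separators before the first word and after the last one.
    return re.compile(SEPARATOR + "*" + body + SEPARATOR + "*", re.IGNORECASE)


pattern = words_pattern("Hello, World")
print(bool(pattern.fullmatch("hello - world!")))
print(bool(pattern.fullmatch("Hello Cruel World")))
```

Using `+` between words requires at least one separator; switching it to `*` would also let words that have been run together match.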
> 
> You need to decide what you want to deal with and what will be mis-classified 
> by your program. Some of us have suggested folding the case of the words but 
> that means a song about a dark skinned person in Poland called "Black Polish" 
> would match a song about keeping your shoes dark with "black polish" so I 
> keep repeating it is very hard, or frankly impossible, to catch every case I 
> can imagine and the many I can't!
> 
> But the emphasis here is not your overall problem. It is about whether and 
> how the computer language called python, and perhaps some add-on modules, can 
> be used to solve each smaller need such as recognizing a pattern or replacing 
> text. It can do quite a bit but only when the specification of the problem is 
> exact. 
> 
> 
> 
> 
> -----Original Message-----
> From: Dave <d...@looktowindward.com>
> To: python-list@python.org
> Sent: Wed, Jun 8, 2022 5:09 am
> Subject: Re: How to replace characters in a string?
> 
> Hi,
> 
> Thanks for this! 
> 
> So, is there a copy function/method that returns a MutableString like in 
> Objective-C? I’ve solved this problem before in a number of languages like 
> Objective-C and AppleScript.
> 
> Basically there is a set of common characters that need “normalizing” and I 
> have a method that replaces them in a string, so:
> 
> myString = [myString normalizeCharacters];
> 
> Would return a new string with all the “common” replacements applied.
> 
> Since the following gives an error :
> 
> myString = 'Hello'
> myNewstring = myString.replace(myString,'e','a')
> 
> TypeError: 'str' object cannot be interpreted as an integer
> 
> I can’t see a way to do this in Python.
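
For reference, `str.replace(old, new[, count])` is called on the string itself, so the string is not passed again as an argument. In the call above, `'a'` ends up in the optional `count` position, which is why Python complains that a `str` cannot be interpreted as an integer. A minimal corrected sketch (the variable names are illustrative):

```python
my_string = "Hello"

# replace() returns a new string; the original is left untouched.
my_new_string = my_string.replace("e", "a")
print(my_new_string)   # Hallo
print(my_string)       # Hello

# To "change" a string, rebind the name to the returned value.
# Calls can be chained because each one returns a new str.
my_string = my_string.replace("e", "a").replace("H", "J")
print(my_string)       # Jallo
```

No mutable string type is needed for this: rebinding the name to the result has the same effect as the Objective-C line `myString = [myString normalizeCharacters];`.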
> 
> All the Best
> Dave
> 
> 
>> On 8 Jun 2022, at 10:14, Chris Angelico <ros...@gmail.com> wrote:
>> 
>> On Wed, 8 Jun 2022 at 18:12, Dave <d...@looktowindward.com> wrote:
>> 
>>> I tried this but it doesn’t seem to work?
>>> myCompareFile1 = ascii(myTitleName)
>>> myCompareFile1.replace("\u2019", "'")
>> 
>> Strings in Python are immutable. When you call ascii(), you get back a
>> new string, but it's one that has actual backslashes and such in it.
>> (You probably don't need this step, other than for debugging; check
>> the string by printing out the ASCII version of it, but stick to the
>> original for actual processing.) The same is true of the replace()
>> method; it doesn't change the string, it returns a new string.
>> 
>> >>> word = "spam"
>> >>> print(word.replace("sp", "h"))
>> ham
>> >>> print(word)
>> spam
>> 
>> ChrisA
>> -- 
>> https://mail.python.org/mailman/listinfo/python-list
> 

-- 
https://mail.python.org/mailman/listinfo/python-list