Re: How to replace characters in a string?

Barry Scott Wed, 08 Jun 2022 10:43:56 -0700


> On 8 Jun 2022, at 18:01, Dave <d...@looktowindward.com> wrote:
> 
> Hi,
> 
> This is a tool I’m using on my own files to save me time. Basically, most 
> of the tracks were imported with different versions of iTunes over the years. 
> There are three problems:
> 
> 1.   File system characters are replaced (you can’t have '/' or ':' in a file 
> name).
ok
> 2.   Smart quotes were added at some point; these need to be replaced.
ok
> 3.   Other characters, due to names being of non-English origin.
Why is this a problem? It's only when the chars are confusing or will not 
compare that there is something to fix.
All modern OSes allow Unicode filenames.
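One case where names "will not compare" even though they look identical is when the same accented letter is stored in two different Unicode forms. A minimal sketch, assuming that is the kind of mismatch involved; the helper name same_text is made up for illustration:

```python
import unicodedata

def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "Beyonc\u00e9"      # e-acute stored as a single code point
decomposed = "Beyonce\u0301"   # plain e followed by a combining acute accent

print(composed == decomposed)           # False
print(same_text(composed, decomposed))  # True
```

Nothing is removed or replaced here; both strings are only brought to the same form before comparing.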


Barry


> 
> If I find others I’ll add them.
> 
> I’m using MusicBrainz to do a fuzzy match and get the correct name.
> 
> It’s not perfect, but it works for 99% of files, which is good enough for me!
> 
> Cheers
> Dave
> 
> 
>> On 8 Jun 2022, at 18:23, Avi Gross via Python-list <python-list@python.org> 
>> wrote:
>> 
>> Dave,
>> 
>> Your goal is to compare titles and there can be endless replacements needed 
>> if you allow the text to contain anything but ASCII.
>> 
>> Have you considered stripping out things instead? I mean remove lots of 
>> stuff that is not ASCII in the first place and perhaps also remove lots of 
>> extra punctuation like single quotes or question marks or redundant white 
>> space and compare the sort of skeletons of the two? 
>> 
>> And even if that fails, could you have a measure of how different they are 
>> and tolerate it if they were, say, off by one letter? That said, "My desert" 
>> matching "My Dessert" might not be a valid match, with one being a song 
>> about an arid environment and the other about food you don't need!
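One way to get such a measure from the standard library is difflib.SequenceMatcher, which returns a ratio between 0.0 and 1.0. This is only a sketch: the helper names and the 0.9 threshold are assumptions to be tuned, not recommended values:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0.0, 1.0], ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()

def close_enough(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two titles as a match when their ratio reaches the threshold."""
    return similarity(a, b) >= threshold

print(close_enough("My desert", "My Dessert"))   # True, which may be a false match
print(close_enough("My desert", "Your dinner"))  # False
```

The first result shows exactly the risk described above: a high score says the spellings are close, not that the songs are the same.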
>> 
>> Your seemingly simple need can expand into a fairly complex project. There 
>> may be many ideas on how to deal with it but not anything perfect enough to 
>> catch all cases as even a trained human may have to make decisions at times 
>> and not match what other humans do. We have examples like the TV show 
>> "NUMB3RS" that used a perfectly valid digit 3 to stand for an "E" but yet is 
>> often written when I look it up as NUMBERS. You have obvious cases where 
>> titles of songs may contain composite symbols like "œ" which will not 
>> compare to one where it is written out as "oe" so the idea of comparing is 
>> quite complex and the best you might do is heuristic.
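For the "œ" versus "oe" case specifically, Unicode normalization does not help, because that character has no decomposition; an explicit table is needed. A small sketch with a hypothetical and deliberately incomplete table:

```python
import unicodedata

# Hypothetical table; extend it as new cases turn up.
LIGATURES = str.maketrans({
    "\u0153": "oe",  # œ
    "\u0152": "OE",  # Œ
    "\u00e6": "ae",  # æ
    "\u00c6": "AE",  # Æ
})

def expand_ligatures(text: str) -> str:
    """Return a new string with the ligatures in the table written out."""
    return text.translate(LIGATURES)

# Normalization leaves the ligature untouched...
print(unicodedata.normalize("NFKD", "\u0153") == "\u0153")  # True
# ...while the explicit table writes it out.
print(expand_ligatures("C\u0153ur") == "Coeur")             # True
```

Cases like "NUMB3RS" are not covered by any such table and would still need a fuzzy comparison.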
>> 
>> Unicode has many symbols that look almost the same, or even exactly the 
>> same, at least in one font or another. There are libraries of functions that 
>> allow some kinds of comparisons or conversions that you could look into but 
>> the gain for you may not be worth it. Nothing stops a person from naming a 
>> song any way they want and I speak many languages and often see a song 
>> re-titled in the local language and using the local alphabet mixed often 
>> with another.
>> 
>> Your original question is perhaps now many questions, depending on what you 
>> choose. You started by wanting to know how to compare and it is moving on to 
>> how to delete parts or make substitutions or use regular expressions and it 
>> can get worse. You can, for example, take a string and identify the words 
>> within it and create a regular expression that inserts sequences between the 
>> words that match any zero or one or more non-word characters such as spaces, 
>> tabs, punctuation or non-ASCII, so that song titles with the same words in a 
>> sequence match no matter what is between them. The possibilities are endless 
>> but consider some of the techniques that are used by some programs that 
>> parse text and suggest alternate spellings or even programs like Google 
>> Translate that can take a sentence and then suggest you may mean a slightly 
>> altered sentence with one word changed to fit better. 
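The regular-expression idea described above can be sketched as follows. The function name words_pattern is made up, and re.ASCII is used on the assumption that non-ASCII characters should count as separators rather than as parts of words:

```python
import re

def words_pattern(title: str) -> re.Pattern:
    """Build a pattern matching the same words with anything non-word between."""
    # With re.ASCII, \w only matches ASCII letters, digits and underscore.
    words = re.findall(r"\w+", title, flags=re.ASCII)
    # \W* allows zero or more non-word characters between consecutive words.
    body = r"\W*".join(re.escape(word) for word in words)
    return re.compile(r"\W*" + body + r"\W*", re.ASCII | re.IGNORECASE)

pattern = words_pattern("Don't Stop")
print(bool(pattern.fullmatch("Don\u2019t -- Stop!")))  # True
print(bool(pattern.fullmatch("Don't Go")))             # False
```

Because the separator may be empty, "DontStop" would match as well, which may or may not be what is wanted.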
>> 
>> You need to decide what you want to deal with and what will be 
>> mis-classified by your program. Some of us have suggested folding the case 
>> of the words but that means a song about a dark skinned person in Poland 
>> called "Black Polish" would match a song about keeping your shoes dark with 
>> "black polish" so I keep repeating it is very hard, or frankly impossible, to 
>> catch every case I can imagine and the many I can't!
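For reference, case folding in Python is str.casefold(), which is a more aggressive version of lower() intended for comparisons. A short sketch (the helper name is illustrative) that also shows the false match described above:

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings after case folding both."""
    return a.casefold() == b.casefold()

print(same_ignoring_case("Black Polish", "black polish"))  # True, wanted or not
print(same_ignoring_case("STRASSE", "stra\u00dfe"))        # True
```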
>> 
>> But the emphasis here is not your overall problem. It is about whether and 
>> how the computer language called Python, and perhaps some add-on modules, 
>> can be used to solve each smaller need such as recognizing a pattern or 
>> replacing text. It can do quite a bit but only when the specification of the 
>> problem is exact. 
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: Dave <d...@looktowindward.com>
>> To: python-list@python.org
>> Sent: Wed, Jun 8, 2022 5:09 am
>> Subject: Re: How to replace characters in a string?
>> 
>> Hi,
>> 
>> Thanks for this! 
>> 
>> So, is there a copy function/method that returns a MutableString like in 
>> Objective-C? I’ve solved this problem before in a number of languages like 
>> Objective-C and AppleScript.
>> 
>> Basically there is a set of common characters that need “normalizing” and I 
>> have a method that replaces them in a string, so:
>> 
>> myString = [myString normalizeCharacters];
>> 
>> Would return a new string with all the “common” replacements applied.
>> 
>> Since the following gives an error :
>> 
>> myString = 'Hello'
>> myNewstring = myString.replace(myString,'e','a')
>> 
>> TypeError: 'str' object cannot be interpreted as an integer
>> 
>> I can’t see a way to do this in Python. 
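The TypeError comes from the extra first argument: replace() is a method of the string itself and takes (old, new[, count]), so in the call above 'a' lands in the count position. A sketch of the working call, plus one possible Python counterpart of normalizeCharacters; the table contents and the name normalize_characters are assumptions for illustration:

```python
my_string = "Hello"
# The string is the object the method is called on, not an argument.
my_new_string = my_string.replace("e", "a")
print(my_new_string)  # Hallo
print(my_string)      # Hello (the original is unchanged)

# Hypothetical table of "common" replacements; extend as needed.
COMMON = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})

def normalize_characters(text: str) -> str:
    """Return a new string with all replacements in COMMON applied."""
    return text.translate(COMMON)

print(normalize_characters("It\u2019s \u201cfine\u201d"))  # It's "fine"
```

No mutable string type is needed: each call returns a new string, which can be assigned back to the same name.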
>> 
>> All the Best
>> Dave
>> 
>> 
>>> On 8 Jun 2022, at 10:14, Chris Angelico <ros...@gmail.com> wrote:
>>> 
>>> On Wed, 8 Jun 2022 at 18:12, Dave <d...@looktowindward.com> wrote:
>>> 
>>>> I tried this but it doesn’t seem to work:
>>>> myCompareFile1 = ascii(myTitleName)
>>>> myCompareFile1.replace("\u2019", "'")
>>> 
>>> Strings in Python are immutable. When you call ascii(), you get back a
>>> new string, but it's one that has actual backslashes and such in it.
>>> (You probably don't need this step, other than for debugging; check
>>> the string by printing out the ASCII version of it, but stick to the
>>> original for actual processing.) The same is true of the replace()
>>> method; it doesn't change the string, it returns a new string.
>>> 
>>> >>> word = "spam"
>>> >>> print(word.replace("sp", "h"))
>>> ham
>>> >>> print(word)
>>> spam
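Applied to the two lines quoted above, that means dropping ascii() and keeping the value that replace() returns. A minimal sketch with a made-up title:

```python
title = "Don\u2019t Stop"

# ascii() returns a repr with literal backslash escapes, so the real
# U+2019 character is no longer in the result and cannot be replaced.
escaped = ascii(title)
print("\u2019" in escaped)   # False
print("\\u2019" in escaped)  # True

# replace() returns a new string; the result has to be assigned.
fixed = title.replace("\u2019", "'")
print(fixed)                 # Don't Stop
print(title == fixed)        # False, the original still has the smart quote
```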
>>> 
>>> ChrisA
>>> -- 
>>> https://mail.python.org/mailman/listinfo/python-list
