Re: Filtering XArray Datasets?

2022-06-07 Thread Martin Di Paola

Hi, I'm not an expert on this so this is an educated guess:

You are passing drop=True, so I presume that you want to delete the rows
of your dataset that do not satisfy a condition.

That's a problem.

If the underlying original data is stored in a dense contiguous array,
deleting chunks of it will leave it with "holes". Unless the backend
supports sparse implementations, it will likely go for the
easiest solution: copying the non-deleted rows into a new array.

I don't know the details of your particular problem, but most of the time
the trick is to avoid loading the whole dataset.

Try to see if instead of loading all the dataset and then performing the
filtering/selection, you can do the filtering during the loading.

An alternative is to filter "before" doing the real work. For
example, if you have a CSV of >100GB you could write a program X that
copies the dataset into a new CSV, filtering as it goes. Then you
load the filtered dataset and do the real work in a program Y.

I explicitly named X and Y because, in principle, they are two different
programs that could even use two different technologies.
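
A minimal sketch of what "program X" could look like, assuming the data is a
CSV with a longitude column. The function name filter_csv, the column names
and the sample values are made up for illustration; the point is only that
rows are copied one at a time, so the full dataset is never held in memory.

```python
import csv
import io


def filter_csv(src, dst, keep_row):
    """Copy the header and every row for which keep_row(row) is true.

    src and dst are open text files (or file-like objects).
    Returns the number of data rows that were kept.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:  # only one row is in memory at a time
        if keep_row(row):
            writer.writerow(row)
            kept += 1
    return kept


# Tiny in-memory stand-in for the >100GB file (values are invented).
source = io.StringIO("longitude,value\n45.0,1\n55.5,2\n59.9,3\n61.0,4\n")
target = io.StringIO()
kept = filter_csv(source, target,
                  lambda row: 50 < float(row["longitude"]) < 60)
print(kept)
print(target.getvalue())
```

With real files you would pass objects opened with open(path, newline="")
instead of the StringIO stand-ins; program Y then loads only the filtered file.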

I hope this email gives you some hints on how to fix it. In my last
project I had a similar problem and I ended up doing the filtering in
Python and the "real work" in Julia.

Thanks!
Martin.


On Mon, Jun 06, 2022 at 02:28:41PM -0800, Israel Brewster wrote:

I have some large (>100GB) datasets loaded into memory in a two-dimensional (X
and Y) NumPy array backed XArray dataset. At one point I want to filter the data
using a boolean array created by performing a boolean operation on the dataset;
that is, I want to filter the dataset for all points with a longitude value
greater than, say, 50 and less than 60, just to give an example (hopefully that
all makes sense?).

Currently I am doing this by creating a boolean array (data['latitude']>50, for
example), and then applying that boolean array to the dataset using .where(), with
drop=True. This appears to work, but has two issues:

1) It’s slow. On my large datasets, applying where can take several minutes 
(vs. just seconds to use a boolean array to index a similarly sized numpy array)
2) It uses large amounts of memory (which is REALLY a problem when the array is 
already using 100GB+)

What it looks like is that values corresponding to True in the boolean array 
are copied to a new XArray object, thereby potentially doubling memory usage 
until it is complete, at which point the original object can be dropped, 
thereby freeing the memory.

Is there any solution for these issues? Some way to do an in-place filtering?
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

--
https://mail.python.org/mailman/listinfo/python-list



Re: Filtering XArray Datasets?

2022-06-07 Thread Peter Otten

On 07/06/2022 00:28, Israel Brewster wrote:

I have some large (>100GB) datasets loaded into memory in a two-dimensional (X
and Y) NumPy array backed XArray dataset. At one point I want to filter the data
using a boolean array created by performing a boolean operation on the dataset;
that is, I want to filter the dataset for all points with a longitude value
greater than, say, 50 and less than 60, just to give an example (hopefully that
all makes sense?).

Currently I am doing this by creating a boolean array (data['latitude']>50, for
example), and then applying that boolean array to the dataset using .where(), with
drop=True. This appears to work, but has two issues:

1) It’s slow. On my large datasets, applying where can take several minutes 
(vs. just seconds to use a boolean array to index a similarly sized numpy array)
2) It uses large amounts of memory (which is REALLY a problem when the array is 
already using 100GB+)

What it looks like is that values corresponding to True in the boolean array 
are copied to a new XArray object, thereby potentially doubling memory usage 
until it is complete, at which point the original object can be dropped, 
thereby freeing the memory.

Is there any solution for these issues? Some way to do an in-place filtering?


Can XArray-s be sorted and resized in place? If so, you can sort by
longitude <= 50, search for the index of the first row with longitude <= 50,
and then resize the array.

(If the order of rows matters, the sort algorithm has to be stable.)
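
The sort-then-resize idea boils down to moving the rows you want to keep to
the front and then cutting off the tail. Whether XArray can do this in place
is left open above; the sketch below only illustrates the idea on a plain
Python list, and compact_in_place is a hypothetical helper, not an XArray or
NumPy function.

```python
def compact_in_place(rows, keep):
    """Stable in-place filter: move kept rows forward, then truncate."""
    write = 0
    for read in range(len(rows)):
        if keep(rows[read]):
            rows[write] = rows[read]  # kept rows retain their order
            write += 1
    del rows[write:]  # shrink the list; no second full-size copy is made
    return rows


longitudes = [45.0, 55.5, 61.0, 59.9, 50.0]
compact_in_place(longitudes, lambda lon: 50 < lon < 60)
print(longitudes)
```

Because rows are only ever copied towards the front, the relative order of
the kept rows is preserved without needing a stable sort.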


How to test characters of a string

2022-06-07 Thread Dave
Hi,

I’m new to Python and have a simple problem that I can’t seem to find the
answer to.

I want to test the first two characters of a string to check if they are numeric
(00 to 99) and if so remove the first three chars from the string.

Example: if “05 Trinket” I want “Trinket”, but “Trinket” I still want
“Trinket”. I can’t for the life of me work out how to do it in Python.

All the Best
Dave



Re: How to test characters of a string

2022-06-07 Thread dn
On 08/06/2022 07.35, Dave wrote:
> Hi,
> 
> I’m new to Python and have a simple problem that I can’t seem to find the
> answer to.

> I want to test the first two characters of a string to check if they are
> numeric (00 to 99) and if so remove the first three chars from the string.
> 
> Example: if “05 Trinket” I want “Trinket”, but “Trinket” I still want
> “Trinket”. I can’t for the life of me work out how to do it in Python.

This sounds like an assignment or quiz-question. We could provide an/the
answer - but then you wouldn't learn how to solve it for yourself...


There is a gentle introduction to "slicing" (taking the first two
characters) at
https://docs.python.org/3/tutorial/introduction.html?highlight=slicing

Characters can be turned into int[egers], as discussed at
https://docs.python.org/3/library/stdtypes.html#typesnumeric

Of course precautions must be taken in case the string is not an
integer, eg https://docs.python.org/3/tutorial/errors.html?highlight=except

Another approach might be to use isnumeric(),
https://docs.python.org/3/library/stdtypes.html?highlight=isnum
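
Putting the first three pointers together (slicing, int() and catching the
exception) might look like the sketch below. The function name
strip_number_via_int is invented for illustration. Note one caveat of this
route: int() also accepts strings such as " 5" or "+5", so it is looser than
"two digit characters".

```python
def strip_number_via_int(name):
    """Remove the first three characters if the first two parse as an int."""
    prefix = name[:2]  # slicing never raises, even on short strings
    try:
        int(prefix)
    except ValueError:  # prefix was not a number (or was empty)
        return name
    return name[3:]


print(strip_number_via_int("05 Trinket"))
print(strip_number_via_int("Trinket"))
```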


NB the first URL-pointer is a page in the Python Tutorial. Reading that
in its entirety may be a good investment of time!

Are you aware of the Python Tutor list?

-- 
Regards,
=dn


Re: How to test characters of a string

2022-06-07 Thread De ongekruisigde
On 2022-06-07, Dave  wrote:
> Hi,
>
> I’m new to Python and have a simple problem that I can’t seem to find the
> answer to.
>
> I want to test the first two characters of a string to check if they are
> numeric (00 to 99) and if so remove the first three chars from the string.
>
> Example: if “05 Trinket” I want “Trinket”, but “Trinket” I still want
> “Trinket”. I can’t for the life of me work out how to do it in Python.


  s[3:] if s[0:2].isdigit() else s
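
The one-liner above can be wrapped in a small function for reuse. This is
only a sketch: the name strip_track_number and the extra length check are
additions, not part of the original reply. The check matters for very short
strings, because the bare expression turns a one-character string such as
"7" into an empty string.

```python
def strip_track_number(s):
    """Drop a two-digit prefix plus the single character that follows it."""
    if len(s) >= 3 and s[0:2].isdigit():
        return s[3:]
    return s


print(strip_track_number("05 Trinket"))
print(strip_track_number("Trinket"))
```

Keep in mind that str.isdigit() is true for more than the ASCII digits 0-9
(superscript digits, for example), which is usually harmless for file names.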


> All the Best
> Dave
>

-- 
 You're rewriting parts of Quake in *Python*?
 MUAHAHAHA


Re: How to test characters of a string

2022-06-07 Thread Dave
Thanks a lot for this! isdigit() was the method I was looking for and couldn’t
find.

I have another problem related to this; the following code uses the code you
just sent. I am getting a file's ID3 tags using eyed3. This part seems to work
and I get the expected values: in this case myTitleName (track name) is set to
“Deadlock Holiday” and myCompareFileName is set to “01 Deadlock Holiday” (file
name with the track number prepended). The isdigit() test works and
myCompareFileName is set to “Deadlock Holiday”, so they should match, right?

However, the if myCompareFileName != myTitleName test always gives a mismatch!
What could cause two strings that look the same to fail to match?

    myCompareFileName = myFile
    if myCompareFileName[0].isdigit() and myCompareFileName[1].isdigit():
        myCompareFileName = myCompareFileName[3:]

    if myCompareFileName != myTitleName:
        print('File Name Mismatch - Artist: ',myArtistName,'  Album: ',myAlbumName,'  Track:',myTitleName,'  File: ',myFile)

Thanks a lot
Dave



Re: How to test characters of a string

2022-06-07 Thread 2QdxY4RzWzUUiLuE
On 2022-06-07 at 21:35:43 +0200,
Dave  wrote:

> I’m new to Python and have a simple problem that I can’t seem to find
> the answer to.

> I want to test the first two characters of a string to check if they
> are numeric (00 to 99) and if so remove the first three chars from the
> string.

> Example: if “05 Trinket” I want “Trinket”, but “Trinket” I still want
> “Trinket”. I can’t for the life of me work out how to do it in Python.

How would you do it without Python?

Given that if the string is called x, then x[y] is the y'th character
(where what you would call "the first character," Python calls "the
zeroth character"), describe the steps you would take *as a person* (or
in some other programming language, if you know one) to carry out this
task.

Translating that algorithm to Python is the next step.  Perhaps
https://docs.python.org/3/library/string.html can help.


Re: How to test characters of a string

2022-06-07 Thread De ongekruisigde
On 2022-06-07, Stefan Ram  wrote:
> Dave  writes:
>>Example: if "05 Trinket" I want "Trinket"
>
>   We're not supposed to write complete solutions, 

Okay, wasn't aware of this group policy; will keep it in mind.



Re: How to test characters of a string

2022-06-07 Thread De ongekruisigde
On 2022-06-07, Dave  wrote:
> Thanks a lot for this! isdigit() was the method I was looking for and couldn’t
> find.
>
> I have another problem related to this; the following code uses the code you
> just sent. I am getting a file's ID3 tags using eyed3. This part seems to work
> and I get the expected values: in this case myTitleName (track name) is set to
> “Deadlock Holiday” and myCompareFileName is set to “01 Deadlock Holiday”
> (file name with the track number prepended). The isdigit() test works and
> myCompareFileName is set to “Deadlock Holiday”, so they should match, right?
>
> However, the if myCompareFileName != myTitleName test always gives a mismatch!
> What could cause two strings that look the same to fail to match?

Possibly leading or trailing spaces, or upper/lower case differences?
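
One way to test both guesses at once is a small diagnostic helper like the
sketch below. The name describe_mismatch is made up for illustration. It
checks for whitespace and case differences in turn, and otherwise prints both
strings through ascii() so that invisible or non-ASCII characters show up as
escapes.

```python
def describe_mismatch(left, right):
    """Return a short explanation of how two strings differ."""
    if left == right:
        return 'equal'
    if left.strip() == right.strip():
        return 'whitespace differs'
    if left.strip().casefold() == right.strip().casefold():
        return 'case differs'
    # ascii() makes hidden characters visible as backslash escapes
    return 'different: ' + ascii(left) + ' vs ' + ascii(right)


print(describe_mismatch('Deadlock Holiday ', 'Deadlock Holiday'))
print(describe_mismatch('deadlock holiday', 'Deadlock Holiday'))
print(describe_mismatch('Deadlock Holiday.mp3', 'Deadlock Holiday'))
```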




Re: How to test characters of a string

2022-06-07 Thread Dave
It depends on the language I’m using; in Objective-C, I’d use isNumeric. I just
wanted to know what the equivalent is in Python.

If you know the answer why don’t you just tell me and if you don’t, don’t post!




Re: How to test characters of a string

2022-06-07 Thread Dave
Hi,

No, I’ve checked for leading/trailing whitespace. It seems to be related to the
variables that are returned from eyed3; in this case, for instance, I added a
check for None:

    myTitleName = myID3.tag.title
    if myTitleName is None:
        continue

Seems like it can return a null object (or None?).
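
Python's null object is None. A guard that covers a missing file object, a
missing tag and a missing title in one place could be sketched as below. To
keep the example runnable without eyed3 installed, it uses SimpleNamespace
stand-ins rather than real eyed3 objects; the assumption that the loaded
object or its tag attribute may be None as well as the title is mine, and
title_or_none is a hypothetical helper.

```python
from types import SimpleNamespace


def title_or_none(audio_file):
    """Return the title, or None if the file, its tag or the title is missing."""
    tag = getattr(audio_file, "tag", None)   # works even if audio_file is None
    return getattr(tag, "title", None)       # works even if tag is None


# Stand-ins that mimic the shape of the objects described above.
tagged = SimpleNamespace(tag=SimpleNamespace(title="Deadlock Holiday"))
untagged = SimpleNamespace(tag=None)

for candidate in (tagged, untagged, None):
    title = title_or_none(candidate)
    if title is None:
        continue  # skip files without usable tag data
    print(title)
```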

 


Re: How to test characters of a string

2022-06-07 Thread Dave
Hi,

Found it! The file name had .mp3 at the end; the problem was being masked by
null objects (or whatever) being returned by eyed3.

I checked for null objects and then stripped off the .mp3, and it's mostly
working now. I’ve got a few other eyed3 errors to do with null objects but I
can sort those out tomorrow.
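
For stripping the extension, os.path.splitext is a possible alternative to
slicing off a fixed number of characters, since it also copes with extensions
of other lengths. A sketch, with track_title_from_file_name as an invented
name:

```python
import os.path


def track_title_from_file_name(file_name):
    """Drop the extension and an optional two-digit track number prefix."""
    stem, _extension = os.path.splitext(file_name)
    if len(stem) >= 3 and stem[:2].isdigit():
        stem = stem[3:]
    return stem


print(track_title_from_file_name("01 Deadlock Holiday.mp3"))
print(track_title_from_file_name("Intro.flac"))
```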

Thanks for your help - All the Best
Dave



Re: How to test characters of a string

2022-06-07 Thread Barry


> On 7 Jun 2022, at 22:04, Dave  wrote:
> 
> It depends on the language I’m using, in Objective C, I’d use isNumeric, 
> just wanted to know what the equivalent is in Python.
> 
> If you know the answer why don’t you just tell me and if you don’t, don’t 
> post!

People ask homework questions here, and we try to teach a student with hints,
not finished answers.
Your post was mistaken for a homework question.

Barry



Re: How to test characters of a string

2022-06-07 Thread Chris Angelico
On Wed, 8 Jun 2022 at 07:24, Barry  wrote:
>
>
>
> > On 7 Jun 2022, at 22:04, Dave  wrote:
> >
> > It depends on the language I’m using, in Objective C, I’d use isNumeric, 
> > just wanted to know what the equivalent is in Python.
> >
> > If you know the answer why don’t you just tell me and if you don’t, don’t 
> > post!
>
> People ask home work questions here and we try to teach a student with hints 
> not finished answers.
> Your post was confused with a home work question.
>

In the future, to make it look less like a homework question, show
your current code, which would provide context. Last I checked,
homework questions don't usually involve ID3 tags in MP3 files :)

ChrisA


Re: How to test characters of a string

2022-06-07 Thread Dave
Ah, OK, will do. I was just trying to be as brief as possible; I will post more
fully in future.



Re: How to test characters of a string

2022-06-07 Thread 2QdxY4RzWzUUiLuE
On 2022-06-08 at 07:29:03 +1000,
Chris Angelico  wrote:

> On Wed, 8 Jun 2022 at 07:24, Barry  wrote:
> >
> >
> >
> > > On 7 Jun 2022, at 22:04, Dave  wrote:
> > >
> > > It depends on the language I’m using, in Objective C, I’d use isNumeric, 
> > > just wanted to know what the equivalent is in Python.
> > >
> > > If you know the answer why don’t you just tell me and if you don’t, don’t 
> > > post!
> >
> > People ask home work questions here and we try to teach a student with 
> > hints not finished answers.
> > Your post was confused with a home work question.
> >
> 
> In the future, to make it look less like a homework question, show
> your current code, which would provide context. Last I checked,
> homework questions don't usually involve ID3 tags in MP3 files :)

The original question in this thread didn't say anything about MP3
files.  Jumping to that conclusion from strings like '05 Trinket' was
left as an exercise for the interested reader.  :-)


Re: How to test characters of a string

2022-06-07 Thread MRAB

On 2022-06-07 21:23, Dave wrote:

> Thanks a lot for this! isdigit() was the method I was looking for and couldn’t
> find.
>
> I have another problem related to this; the following code uses the code you
> just sent. I am getting a file's ID3 tags using eyed3. This part seems to work
> and I get the expected values: in this case myTitleName (track name) is set to
> “Deadlock Holiday” and myCompareFileName is set to “01 Deadlock Holiday” (file
> name with the track number prepended). The isdigit() test works and
> myCompareFileName is set to “Deadlock Holiday”, so they should match, right?

OT, but are you sure about that name? Isn't it "Dreadlock Holiday" (by 
10cc)?


[snip]


Re: How to test characters of a string

2022-06-07 Thread 2QdxY4RzWzUUiLuE
On 2022-06-07 at 23:07:42 +0100,
Regarding "Re: How to test characters of a string,"
MRAB  wrote:

> On 2022-06-07 21:23, Dave wrote:
> > Thanks a lot for this! isDigit was the method I was looking for and 
> > couldn’t find.
> > 
> > I have another problem related to this, the following code uses the code 
> > you just sent. I am getting a files ID3 tags using eyed3, this part seems 
> > to work and I get expected values in this case myTitleName (Track name) is 
> > set to “Deadlock Holiday” and myCompareFileName is set to “01 Deadlock 
> > Holiday” (File Name with the Track number prepended). The is digit test 
> > works and myCompareFileName is set to  “Deadlock Holiday”, so they should 
> > match, right?
> > 
> OT, but are you sure about that name? Isn't it "Dreadlock Holiday" (by
> 10cc)?

Edsger Dijkstra originally wrote Deadlock Holiday for his band, The
Semaphores.  10cc lost the race condition and had to change the lyrics.

Sorry.


Re: How to test characters of a string

2022-06-07 Thread Christian Gollwitzer

Am 07.06.22 um 21:56 schrieb Dave:

> It depends on the language I’m using, in Objective C, I’d use isNumeric, just
> wanted to know what the equivalent is in Python.



Your problem is also a typical case for regular expressions. You can 
create an expression for "starts with any number of digits plus optional 
whitespace" and then replace this with nothing:



chris@linux-tb9f:~> ipython
Python 3.6.15 (default, Sep 23 2021, 15:41:43) [GCC]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import re

In [2]: s='05 Trinket'   

In [3]: re.sub(r'^\d+\s*', '', s)
Out[3]: 'Trinket'




If it doesn't match, it will do nothing:

In [4]: s='Es geht los'  

In [5]: re.sub(r'^\d+\s*', '', s)
Out[5]: 'Es geht los'


Some people on this list don't like regexes, but tasks like this are what they
are made for, and they work well.


^ is "starts with"
\d is any digit
\s is any whitespace character
+ is one or more
* is zero or more
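
As a reusable function, the same substitution could be sketched as follows.
The names LEADING_NUMBER and strip_leading_number are invented for
illustration. Note that the pattern removes any run of leading digits, so a
title that legitimately starts with a number is affected too.

```python
import re

# ^ anchors at the start, \d+ is one or more digits, \s* is optional whitespace
LEADING_NUMBER = re.compile(r'^\d+\s*')


def strip_leading_number(text):
    """Remove leading digits and any whitespace that follows them."""
    return LEADING_NUMBER.sub('', text)


print(strip_leading_number('05 Trinket'))
print(strip_leading_number('Es geht los'))
print(strip_leading_number('10cc'))  # caveat: the digits are part of the name
```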

Christian






Re: How to test characters of a string

2022-06-07 Thread Christian Gollwitzer

Am 07.06.22 um 23:01 schrieb Christian Gollwitzer:


> In [3]: re.sub(r'^\d+\s*', '', s)
> Out[3]: 'Trinket'



That RE matches what you intended to do, but not exactly what you
wrote in the OP. That would be '^\d\d.', which means: start with exactly two
digits followed by any character.
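
A sketch of that stricter pattern in use, with TWO_DIGIT_PREFIX and
strip_two_digit_prefix as illustrative names. Unlike the earlier expression,
it leaves a single leading digit alone and needs a third character to match.

```python
import re

# Exactly two digits at the start, then any one character (except a newline)
TWO_DIGIT_PREFIX = re.compile(r'^\d\d.')


def strip_two_digit_prefix(text):
    """Remove the first three characters if the first two are digits."""
    return TWO_DIGIT_PREFIX.sub('', text)


print(strip_two_digit_prefix('05 Trinket'))
print(strip_two_digit_prefix('5 Trinket'))
print(strip_two_digit_prefix('Trinket'))
```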


Christian


Re: How to test characters of a string

2022-06-07 Thread De ongekruisigde
On 2022-06-08, Christian Gollwitzer  wrote:
> Am 07.06.22 um 21:56 schrieb Dave:
>> It depends on the language I’m using, in Objective C, I’d use isNumeric, 
>> just wanted to know what the equivalent is in Python.
>> 
>
> Your problem is also a typical case for regular expressions. You can 
> create an expression for "starts with any number of digits plus optional 
> whitespace" and then replace this with nothing:

Regular expressions are overkill for this and much slower than the
simple isdigit based solution.
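
No measurements were posted, so here is a sketch of how one could check the
speed claim on one's own machine with timeit. The function names and the
sample strings are made up, the regex used is the strict two-digit form so
that both functions do the same job, and the timings will vary by machine and
Python version.

```python
import re
import timeit

PATTERN = re.compile(r'^\d\d.')


def with_isdigit(s):
    return s[3:] if s[0:2].isdigit() else s


def with_regex(s):
    return PATTERN.sub('', s)


samples = ['05 Trinket', 'Trinket']

# Make sure both approaches agree before comparing their speed.
for sample in samples:
    assert with_isdigit(sample) == with_regex(sample)

for func in (with_isdigit, with_regex):
    seconds = timeit.timeit(lambda: [func(s) for s in samples], number=20000)
    print(func.__name__, round(seconds, 4), 'seconds')
```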


>> chris@linux-tb9f:~> ipython
>> Python 3.6.15 (default, Sep 23 2021, 15:41:43) [GCC]
>> Type 'copyright', 'credits' or 'license' for more information
>> IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.
>>
>> In [1]: import re
>>
>> In [2]: s='05 Trinket'
>>
>> In [3]: re.sub(r'^\d+\s*', '', s)
>> Out[3]: 'Trinket'
>>
>
> If it doesn't match, it will do nothing:
>
>> In [4]: s='Es geht los'
>>
>> In [5]: re.sub(r'^\d+\s*', '', s)
>> Out[5]: 'Es geht los'
>
> Some people on this list don't like regexes but for tasks like this they 
> are made and working well.

Regular expressions are indeed extremely powerful and useful, but I tend
to avoid them when there's a (faster) normal solution.


> ^ is "starts with"
> \d is any digit
> \s is any space
> + is one or more
> * is zero or more
>
> Christian



Re: How to test characters of a string

2022-06-07 Thread dn
>>>> It depends on the language I’m using, in Objective C, I’d use isNumeric,
>>>> just wanted to know what the equivalent is in Python.
>>>>
>>>> If you know the answer why don’t you just tell me and if you don’t, don’t
>>>> post!
>>>
>>> People ask home work questions here and we try to teach a student with 
>>> hints not finished answers.
>>> Your post was confused with a home work question.
>>
>> In the future, to make it look less like a homework question, show
>> your current code, which would provide context. Last I checked,
>> homework questions don't usually involve ID3 tags in MP3 files :)

Ah, so that's where I've seen it before!
(thanks for scratching my head @Chris - but watch-out for splinters!)

Yes, the problem has been used as a training exercise, eg same song but
in different albums/play-lists, different capitalisation, and such-like;
ie 'data cleaning' and harmonisation - good for use at the intersection
of Python and SQL (or NoSQL).


Knowing the background, and thus the particular need, would have saved a
lot of time - giving the answer as code (per one of the contributions)
would have taken considerably less effort than looking-up and citing the
docs.

Perhaps then, the 'learning-opportunity' is that when such questions pop
into one's mind, 'the docs' is *the* recommended first-call?


> The original question in this thread didn't say anything about MP3
> files.  Jumping to that conclusion from strings like '05 Trinket' was
> left as an exercise for the interested reader.  :-)

This reader's interest was to figure-out why "trinket" didn't refer to
some small decoration or 'bling', nor to a Python training tool
(https://trinket.io/), but to a music group/video series.
(even more-surprising: that this grey-beard recognised one of their tracks).


On the other side of the relationship, writers are expected to follow
the PSF Code of Conduct (https://www.python.org/psf/conduct/), eg
respect, acknowledgement, grace...

Such also encourages (positive) responses when asking future questions...


Now that you (@Dave) have revealed yourself as more than a raw-beginner,
and to have skills transferable to the Python world, it'll be great to
see you 'here', contributing to others' posts...
-- 
Regards,
=dn


Re: How to test characters of a string

2022-06-07 Thread Dave
Yes, it was probably just a typo on my part.

I’ve now fixed the majority of cases but I’ve still got two strings that look
identical but fail to match, this time (again by 10cc) “I’m Mandy Fly Me”.

I’m putting money on it being a UTF-8 problem but I’m stuck on how to handle it.
It’s probably the single quote in I’m, although it has worked with other songs.

Any ideas?

All the Best
Cheers
Dave

Here is the whole function/method or whatever it’s called in Python:


#
#   checkMusicFiles
#

def checkMusicFiles(theBaseMusicLibraryFolder):
    myArtistDict = []

    #
    #  Loop thru Artists Folder
    #
    myArtistsFoldlerList = getFolderList(theBaseMusicLibraryFolder)
    myArtistCount = 0
    for myArtistFolder in myArtistsFoldlerList:
        print('Artist: ' + myArtistFolder)
        #
        #  Loop thru Albums Folder
        #
        myAlbumList = getFolderList(theBaseMusicLibraryFolder + myArtistFolder)
        for myAlbum in myAlbumList:
            print('Album: ' + myAlbum)

            #
            #  Loop thru Tracks (Files) Folder
            #
            myAlbumPath = theBaseMusicLibraryFolder + myArtistFolder + '/' + myAlbum + '/'
            myFilesList = getFileList(myAlbumPath)
            for myFile in myFilesList:
                myFilePath = myAlbumPath + myFile
                myID3 = eyed3.load(myFilePath)
                if myID3 is None:
                    continue

                myArtistName = myID3.tag.artist
                if myArtistName is None:
                    continue

                myAlbumName = myID3.tag.album
                if myAlbumName is None:
                    continue

                myTitleName = myID3.tag.title
                if myTitleName is None:
                    continue

                myCompareFileName = myFile[0:-4]
                if myCompareFileName[0].isdigit() and myCompareFileName[1].isdigit():
                    myCompareFileName = myFile[3:-4]

                if myCompareFileName != myTitleName:
                    myLength1 = len(myCompareFileName)
                    myLength2 = len(myTitleName)
                    print('File Name Mismatch - Artist: [' + myArtistName + ']  Album: ['
                          + myAlbumName + ']  Track: [' + myTitleName + ']  File: ['
                          + myCompareFileName + ']')
                    if (myLength1 == myLength2):
                        print('lengths match: ',myLength1)
                    else:
                        print('lengths mismatch: ',myLength1,'  ',myLength2)

                    print(' ')

    return myArtistsFoldlerList






> On 8 Jun 2022, at 00:07, MRAB  wrote:
> 
> On 2022-06-07 21:23, Dave wrote:
>> Thanks a lot for this! isDigit was the method I was looking for and couldn’t 
>> find.
>> I have another problem related to this, the following code uses the code you 
>> just sent. I am getting a files ID3 tags using eyed3, this part seems to 
>> work and I get expected values in this case myTitleName (Track name) is set 
>> to “Deadlock Holiday” and myCompareFileName is set to “01 Deadlock Holiday” 
>> (File Name with the Track number prepended). The is digit test works and 
>> myCompareFileName is set to  “Deadlock Holiday”, so they should match, right?
> OT, but are you sure about that name? Isn't it "Dreadlock Holiday" (by 10cc)?
> 
> [snip]


Re: How to test characters of a string

2022-06-07 Thread Dave
rotfl! Nice one! 

> On 8 Jun 2022, at 00:24, 2qdxy4rzwzuui...@potatochowder.com wrote:
> 
> On 2022-06-07 at 23:07:42 +0100,
> Regarding "Re: How to test characters of a string,"
> MRAB  wrote:
> 
>> On 2022-06-07 21:23, Dave wrote:
>>> Thanks a lot for this! isDigit was the method I was looking for and 
>>> couldn’t find.
>>> 
>>> I have another problem related to this, the following code uses the code 
>>> you just sent. I am getting a files ID3 tags using eyed3, this part seems 
>>> to work and I get expected values in this case myTitleName (Track name) is 
>>> set to “Deadlock Holiday” and myCompareFileName is set to “01 Deadlock 
>>> Holiday” (File Name with the Track number prepended). The is digit test 
>>> works and myCompareFileName is set to  “Deadlock Holiday”, so they should 
>>> match, right?
>>> 
>> OT, but are you sure about that name? Isn't it "Dreadlock Holiday" (by
>> 10cc)?
> 
> Edsger Dijkstra originally wrote Deadlock Holiday for his band, The
> Semaphores.  10cc lost the race condition and had to change the lyrics.
> 
> Sorry.


Re: How to test characters of a string

2022-06-07 Thread dn
On 08/06/2022 10.18, De ongekruisigde wrote:
> On 2022-06-08, Christian Gollwitzer  wrote:
>> Am 07.06.22 um 21:56 schrieb Dave:
>>> It depends on the language I’m using, in Objective C, I’d use isNumeric, 
>>> just wanted to know what the equivalent is in Python.
>>>
>>
>> Your problem is also a typical case for regular expressions. You can 
>> create an expression for "starts with any number of digits plus optional 
>> whitespace" and then replace this with nothing:
> 
> Regular expressions are overkill for this and much slower than the
> simple isdigit based solution.

...

> Regular expressions are indeed extremely powerful and useful but I tend
> to avoid them when there's a (faster) normal solution.

Yes, simple solutions are (likely) easier to read.

RegEx-s are more powerful (and well worth learning for this reason), but
are only 'readable' to those who use them frequently.

Has either of you performed a timeit comparison?
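
For anyone who wants to try it, a minimal sketch of such a timeit
comparison might look like the following. The function names and the sample
string are illustrative assumptions; the regex is the one Christian posted
and the isdigit version mirrors the two-digit check in Dave's code. Timings
depend on the machine and the Python version, so none are quoted here.

```python
import re
import timeit

# Pattern from Christian's post: leading digits plus optional whitespace.
LEADING_DIGITS = re.compile(r'^\d+\s*')


def strip_with_regex(name):
    """Remove any run of leading digits and the whitespace after it."""
    return LEADING_DIGITS.sub('', name)


def strip_with_isdigit(name):
    """Remove a two-digit track number and the single separator after it."""
    if len(name) > 2 and name[0].isdigit() and name[1].isdigit():
        return name[3:]
    return name


def compare(sample='05 Trinket', number=100_000):
    """Print how long each approach takes for `number` calls."""
    for func in (strip_with_regex, strip_with_isdigit):
        seconds = timeit.timeit(lambda: func(sample), number=number)
        print(f'{func.__name__}: {seconds:.4f} s for {number} calls')


if __name__ == '__main__':
    compare()
```

Note that the two functions are not equivalent: the regex accepts any number
of digits, the isdigit version only exactly two plus one separator character.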
-- 
Regards,
=dn


Re: How to test characters of a string

2022-06-07 Thread Dave
I hate regEx and avoid it whenever possible, I’ve never found something that 
was impossible to do without it.

> On 8 Jun 2022, at 00:49, dn  wrote:
> 
> On 08/06/2022 10.18, De ongekruisigde wrote:
>> On 2022-06-08, Christian Gollwitzer  wrote:
>>> Am 07.06.22 um 21:56 schrieb Dave:
>>>> It depends on the language I’m using, in Objective C, I’d use isNumeric, 
>>>> just wanted to know what the equivalent is in Python.
>>>> 
>>> 
>>> Your problem is also a typical case for regular expressions. You can 
>>> create an expression for "starts with any number of digits plus optional 
>>> whitespace" and then replace this with nothing:
>> 
>> Regular expressions are overkill for this and much slower than the
>> simple isdigit based solution.
> 
> ...
> 
>> Regular expressions are indeed extremely powerful and useful but I tend
>> to avoid them when there's a (faster) normal solution.
> 
> Yes, simple solutions are (likely) easier to read.
> 
> RegEx-s are more powerful (and well worth learning for this reason), but
> are only 'readable' to those who use them frequently.
> 
> Has either of you performed a timeit comparison?
> -- 
> Regards,
> =dn


Re: How to test characters of a string

2022-06-07 Thread MRAB

On 2022-06-07 23:24, Dave wrote:

> Yes, it was probably just a typeo on my part.

You've misspelled "typo"!

> I’ve now fixed the majority of cases but still got two strings that look
> identical but fail to match, this time (again by 10cc), “I’m Mandy Fly Me”.


Try printing the asciified string:

>>> print(ascii("I’m Mandy Fly Me"))
'I\u2019m Mandy Fly Me'

What you typed above has "smart quotes". Maybe that's also the problem 
in the program: straight single quote/apostrophe vs smart single quote.
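
If that turns out to be the cause, one possible way to handle it is to
normalise both strings before comparing them. The sketch below is only an
assumption about what might help: the helper name normalise_title and the
two sample strings are made up for illustration, and the NFC step is an
extra precaution for composed vs decomposed accents rather than something
the output above proves is needed.

```python
import unicodedata

# Map typographic ("smart") quotes to their plain ASCII equivalents.
SMART_QUOTES = str.maketrans({
    '\u2018': "'",  # left single quotation mark
    '\u2019': "'",  # right single quotation mark / typographic apostrophe
    '\u201c': '"',  # left double quotation mark
    '\u201d': '"',  # right double quotation mark
})


def normalise_title(text):
    """Return text in NFC form with smart quotes replaced by straight ones."""
    return unicodedata.normalize('NFC', text).translate(SMART_QUOTES)


tag_title = 'I\u2019m Mandy Fly Me'   # hypothetical value from the ID3 tag
file_title = "I'm Mandy Fly Me"       # hypothetical value from the file name

print(ascii(tag_title), ascii(file_title))
print(normalise_title(tag_title) == normalise_title(file_title))
```

Printing ascii() of both strings first, as above, shows whether the quote
characters really differ before any normalisation is added to the program.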



> I’m putting money on it being a utf8 problem but I’m stuck on how to handle it.
> It’s probably the single quote in I’m, although it has worked with other songs.
>
> Any ideas?
>
> All the Best
> Cheers
> Dave
>
> Here is the whole function/method or whatever it’s called in Python:
>
> #
> #   checkMusicFiles
> #
>
> def checkMusicFiles(theBaseMusicLibraryFolder):
>     myArtistDict = []
>
>     #
>     #  Loop thru Artists Folder
>     #
>     myArtistsFoldlerList = getFolderList(theBaseMusicLibraryFolder)
>     myArtistCount = 0
>     for myArtistFolder in myArtistsFoldlerList:
>         print('Artist: ' + myArtistFolder)
>         #
>         #  Loop thru Albums Folder
>         #
>         myAlbumList = getFolderList(theBaseMusicLibraryFolder + myArtistFolder)
>         for myAlbum in myAlbumList:
>             print('Album: ' + myAlbum)
>
>             #
>             #  Loop thru Tracks (Files) Folder
>             #
>             myAlbumPath = theBaseMusicLibraryFolder + myArtistFolder + '/' + myAlbum + '/'
>             myFilesList = getFileList(myAlbumPath)
>             for myFile in myFilesList:
>                 myFilePath = myAlbumPath + myFile
>                 myID3 = eyed3.load(myFilePath)
>                 if myID3 is None:
>                     continue
>
>                 myArtistName = myID3.tag.artist
>                 if myArtistName is None:
>                     continue
>
>                 myAlbumName = myID3.tag.album
>                 if myAlbumName is None:
>                     continue
>
>                 myTitleName = myID3.tag.title
>                 if myTitleName is None:
>                     continue
>
>                 myCompareFileName = myFile[0:-4]
>                 if myCompareFileName[0].isdigit() and myCompareFileName[1].isdigit():
>                     myCompareFileName = myFile[3:-4]
>
>                 if myCompareFileName != myTitleName:
>                     myLength1 = len(myCompareFileName)
>                     myLength2 = len(myTitleName)
>                     print('File Name Mismatch - Artist: [' + myArtistName + ']  Album: ['
>                           + myAlbumName + ']  Track: [' + myTitleName + ']  File: ['
>                           + myCompareFileName + ']')
>                     if (myLength1 == myLength2):
>                         print('lengths match: ',myLength1)
>                     else:
>                         print('lengths mismatch: ',myLength1,'  ',myLength2)
>
>                     print(' ')
>
>     return myArtistsFoldlerList


"myArtistsFoldlerList"?

And so many variables starting with "my". Not wrong; just ... :-)








>> On 8 Jun 2022, at 00:07, MRAB  wrote:
>>
>> On 2022-06-07 21:23, Dave wrote:
>>> Thanks a lot for this! isDigit was the method I was looking for and couldn’t
>>> find.
>>> I have another problem related to this, the following code uses the code you
>>> just sent. I am getting a files ID3 tags using eyed3, this part seems to work
>>> and I get expected values in this case myTitleName (Track name) is set to
>>> “Deadlock Holiday” and myCompareFileName is set to “01 Deadlock Holiday” (File
>>> Name with the Track number prepended). The is digit test works and
>>> myCompareFileName is set to  “Deadlock Holiday”, so they should match, right?
>>
>> OT, but are you sure about that name? Isn't it "Dreadlock Holiday" (by 10cc)?
>>
>> [snip]






Re: How to test characters of a string

2022-06-07 Thread Avi Gross via Python-list
Amazing how some people bring out the heavy artillery, first! LOL!

If the question was how to remove any initial digits and perhaps whitespace in 
a string, it is fairly easy to do without any functions to test if there are 
digits before the title. I mean look at initial characters and move forward if 
it is between '0' and '9' or a space. Duh!
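
As a rough sketch of that idea (the function name is my own invention, not
anything from the thread), str.lstrip() with an explicit set of characters
does exactly this kind of "move forward while it is a digit or a space":

```python
def strip_leading_track_number(name):
    """Drop leading ASCII digits and spaces; leave everything else alone."""
    return name.lstrip('0123456789 ')


for title in ('05 Trinket', 'Es geht los', '100 Letters'):
    print(repr(strip_leading_track_number(title)))
```

The last sample is there on purpose: "100 Letters" comes back as "Letters",
which is the kind of over-eager stripping the titles further down run into.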

Sure, a regular expression that matches anything following a run of digits and 
whitespace and before a ".MP3" or the end of the entry will be easy to extract 
and compare after removing any left/right whitespace in both things being 
compared and coercing both to the same case.

But the solution may be doomed to failure when it sees things like:
"100 Letters" 
"1+1" 
"10,000 hours"
"1 Trillion Dollar$" 
"2,000 Light Years From Home"


So is it necessary to insist on an exact pattern of two digits followed by a 
space? 


That would fail on "44 Minutes", "40 Oz. Dream", "50 Mission Cap", "50 Ways to 
Say Goodbye", "99 Ways to Die" 

It looks to me like you need to compare TWICE just in case. If it matches in 
the original (perhaps with some normalization of case and whitespace), fine. If 
not, will they match if one or both have something to remove as a prefix, such 
as "02 "? And if you are comparing items where the same song is in two different 
numeric sequences on different disks, ...
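
A minimal sketch of the compare-twice idea, under some assumptions of mine:
the prefix is exactly two digits followed by one space, only the file name
(not the tag) carries it, and the names titles_match and normalise are
hypothetical helpers rather than anything from the original program.

```python
import re

# Assumed prefix format: exactly two digits followed by a single space.
TRACK_PREFIX = re.compile(r'^\d{2} ')


def normalise(text):
    """Collapse runs of whitespace, trim the ends and ignore case."""
    return ' '.join(text.split()).casefold()


def titles_match(file_stem, tag_title):
    """Compare as-is first; only if that fails, retry without the prefix."""
    wanted = normalise(tag_title)
    if normalise(file_stem) == wanted:
        return True
    stripped = TRACK_PREFIX.sub('', file_stem.strip())
    return normalise(stripped) == wanted


print(titles_match('02 Trinket', 'Trinket'))
print(titles_match('50 Ways to Say Goodbye', '50 Ways to Say Goodbye'))
print(titles_match('01 50 Ways to Say Goodbye', '50 Ways to Say Goodbye'))
```

Because the unmodified comparison runs first, a title that legitimately
starts with a number is never damaged when the file has no track prefix.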




-Original Message-
From: Christian Gollwitzer 
To: python-list@python.org
Sent: Tue, Jun 7, 2022 6:01 pm
Subject: Re: How to test characters of a string

Am 07.06.22 um 21:56 schrieb Dave:
> It depends on the language I’m using, in Objective C, I’d use isNumeric, just 
> wanted to know what the equivalent is in Python.
> 

Your problem is also a typical case for regular expressions. You can 
create an expression for "starts with any number of digits plus optional 
whitespace" and then replace this with nothing:

> chris@linux-tb9f:~> ipython
> Python 3.6.15 (default, Sep 23 2021, 15:41:43) [GCC]
> Type 'copyright', 'credits' or 'license' for more information
> IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.
> 
> In [1]: import re
> 
> In [2]: s='05 Trinket'
> 
> In [3]: re.sub(r'^\d+\s*', '', s)
> Out[3]: 'Trinket'
> 

If it doesn't match, it will do nothing:

> In [4]: s='Es geht los'
> 
> In [5]: re.sub(r'^\d+\s*', '', s)
> Out[5]: 'Es geht los'

Some people on this list don't like regexes, but they were made for tasks 
like this and work well.

^ is "starts with"
\d is any digit
\s is any whitespace character
+ is one or more of the preceding item
* is zero or more of the preceding item
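
The same pattern can also be written with re.VERBOSE so that the legend
lives next to each piece. This is just a sketch; the name LEADING_NUMBER and
the third sample string are illustrative additions, not part of the session
above.

```python
import re

LEADING_NUMBER = re.compile(r'''
    ^      # anchor at the start of the string
    \d+    # one or more digits
    \s*    # zero or more whitespace characters
''', re.VERBOSE)

for title in ('05 Trinket', 'Es geht los', '5Trinket'):
    print(repr(LEADING_NUMBER.sub('', title)))
```

Since \s* also matches nothing at all, the digits are removed even when no
space follows them, as the third sample shows.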

Christian



