Re: python3, regular expression and bytes text

2019-10-14 Thread Eko palypse
On Monday, 14 October 2019 at 13:56:09 UTC+2, Chris Angelico wrote: > > (My apologies for saying this in reply to an unrelated post, but I > also don't see those posts, so it's not easy to reply to them.) > > ChrisA Nothing to apologize for, and thank you for the clarification. I was already checking my s

Re: python3, regular expression and bytes text

2019-10-14 Thread Chris Angelico
On Mon, Oct 14, 2019 at 10:41 PM Eko palypse wrote: > > On Sunday, 13 October 2019 at 21:20:26 UTC+2, moi wrote: > > [Do not know why I spent hours with this...] > > > > To answer your question. > > Yes, I confirm. > > It seems that as soon as one works with bytes and when > > a char is encoded in

Re: python3, regular expression and bytes text

2019-10-14 Thread Eko palypse
On Sunday, 13 October 2019 at 21:20:26 UTC+2, moi wrote: > [Do not know why I spent hours with this...] > > To answer your question. > Yes, I confirm. > It seems that as soon as one works with bytes and when > a char is encoded in more than 1 byte, "re" runs into > trouble. > First, sorry for a

Re: python3, regular expression and bytes text

2019-10-12 Thread Eko palypse
First of all, many thanks to everyone for the active participation. @Chris Angelico I think I understand what you illustrated with the byte example; it makes sense. As it was developed for 8-bit encodings only, it cannot be used for multibyte encodings. @Richard Damon and @MRAB thank you very much for

Re: python3, regular expression and bytes text

2019-10-12 Thread MRAB
On 2019-10-12 20:57, Eko palypse wrote: You cannot. First, \w in re.LOCALE works only when the text is encoded with the locale encoding (cp1252 in your case). Second, re.LOCALE supports only 8-bit charsets. So even if you set the utf-8 locale, it would not help. Regular expressions with re.LO

Re: python3, regular expression and bytes text

2019-10-12 Thread Chris Angelico
On Sun, Oct 13, 2019 at 7:16 AM Richard Damon wrote: > > On 10/12/19 3:46 PM, Eko palypse wrote: > > Thank you very much for your answer. > > > >> You have to be able to match bytes, not strings. > > May I ask you to elaborate on this, sorry non-native English speaker. > > The buffer I receive is

Re: python3, regular expression and bytes text

2019-10-12 Thread MRAB
On 2019-10-12 20:48, Serhiy Storchaka wrote: 12.10.19 21:08, Eko palypse wrote: So how can I make it work with utf8 encoded text? You cannot. First, \w in re.LOCALE works only when the text is encoded with the locale encoding (cp1252 in your case). Second, re.LOCALE supports only 8-bit charsets

Re: python3, regular expression and bytes text

2019-10-12 Thread Richard Damon
On 10/12/19 3:46 PM, Eko palypse wrote: > Thank you very much for your answer. > >> You have to be able to match bytes, not strings. > May I ask you to elaborate on this, sorry non-native English speaker. > The buffer I receive is a byte-like buffer. > >> I don't think you'll be able to 100% reliab

Re: python3, regular expression and bytes text

2019-10-12 Thread Chris Angelico
On Sun, Oct 13, 2019 at 6:54 AM Eko palypse wrote: > > Thank you very much for your answer. > > > You have to be able to match bytes, not strings. > > May I ask you to elaborate on this, sorry non-native English speaker. > The buffer I receive is a byte-like buffer. When you're matching text (the

Re: python3, regular expression and bytes text

2019-10-12 Thread Eko palypse
> You cannot. First, \w in re.LOCALE works only when the text is encoded > with the locale encoding (cp1252 in your case). Second, re.LOCALE > supports only 8-bit charsets. So even if you set the utf-8 locale, it > would not help. > > Regular expressions with re.LOCALE are slow. It may be more

Re: python3, regular expression and bytes text

2019-10-12 Thread Eko palypse
Thank you very much for your answer. > You have to be able to match bytes, not strings. May I ask you to elaborate on this, sorry non-native English speaker. The buffer I receive is a byte-like buffer. > I don't think you'll be able to 100% reliably match bytes in this way. > You're asking it to

Re: python3, regular expression and bytes text

2019-10-12 Thread Serhiy Storchaka
12.10.19 21:08, Eko palypse wrote: So how can I make it work with utf8 encoded text? You cannot. First, \w in re.LOCALE works only when the text is encoded with the locale encoding (cp1252 in your case). Second, re.LOCALE supports only 8-bit charsets. So even if you set the utf-8 locale, it w
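
Since re.LOCALE cannot handle multibyte encodings, one possible workaround is to decode the buffer, match with a str pattern (where \w is Unicode-aware), and translate the match positions back into byte offsets. The following is only a minimal sketch of that idea, not something proposed in the messages above; the function name `find_words_with_byte_offsets` is hypothetical, and the approach assumes that decoding the buffer is acceptable in the first place.

```python
import re


def find_words_with_byte_offsets(data: bytes, encoding: str = "utf-8"):
    """Hypothetical helper: match \\w+ on decoded text, report byte offsets."""
    text = data.decode(encoding)
    results = []
    for m in re.finditer(r"\w+", text):  # str pattern: \w is Unicode-aware
        # Re-encode the text before the match to learn its length in bytes.
        start = len(text[:m.start()].encode(encoding))
        end = start + len(m.group().encode(encoding))
        results.append((m.group(), start, end))
    return results


utf8 = "Ärger im Paradies".encode("utf-8")
hits = find_words_with_byte_offsets(utf8)

for word, start, end in hits:
    # Slicing the original buffer with the byte offsets gives the same word.
    print(start, end, utf8[start:end])
```

Re-encoding the prefix for every match is quadratic in the worst case, so for large buffers a single pass that builds a character-to-byte offset table would likely be preferable.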

Re: python3, regular expression and bytes text

2019-10-12 Thread Chris Angelico
On Sun, Oct 13, 2019 at 5:11 AM Eko palypse wrote: > > What needs to be set in order to be able to use a re search within > utf8 encoded bytes? You have to be able to match bytes, not strings. > So how can I make it work with utf8 encoded text? > Note, decoding it to a string isn't preferred as
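
To illustrate what "matching bytes, not strings" can look like in practice, here is a rough sketch of a bytes pattern that spells out UTF-8 sequences by hand. It is an assumption-laden simplification: the name `UTF8_WORD` is made up for this example, and the pattern treats every non-ASCII character as a word character, which is cruder than a real Unicode \w (non-ASCII punctuation would also match).

```python
import re

# A "word" is a run of ASCII word characters or multibyte UTF-8 sequences:
# a lead byte (0xC2-0xF4) followed by one or more continuation bytes (0x80-0xBF).
# Assumption: every non-ASCII character counts as a word character.
UTF8_WORD = re.compile(rb"(?:[A-Za-z0-9_]|[\xc2-\xf4][\x80-\xbf]+)+")

utf8 = "Ärger im Paradies".encode("utf-8")
words = UTF8_WORD.findall(utf8)

print(words)
```

The pattern never decodes the buffer, so match positions are byte offsets directly, at the cost of having no notion of which code points are actually letters.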

python3, regular expression and bytes text

2019-10-12 Thread Eko palypse
What needs to be set in order to be able to use a re search within utf8 encoded bytes? My test, being on a Windows PC with cp1252 setup, looks like this:

    import re
    import locale
    cp1252 = 'Ärger im Paradies'.encode('cp1252')
    utf8 = 'Ärger im Paradies'.encode('utf-8')
    print('cp1252:', cp1252)
    pr
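
The test in this message is cut off, so the following is not the poster's code but a small independent sketch of the underlying difference. It shows how the same text encodes to one byte per character in cp1252 but to two bytes for "Ä" in UTF-8, and what a bytes pattern with \w matches when no flag is given (ASCII word characters only). The variable names are chosen for this example.

```python
import re

text = "Ärger im Paradies"
cp1252 = text.encode("cp1252")  # "Ä" becomes the single byte 0xC4
utf8 = text.encode("utf-8")     # "Ä" becomes the two bytes 0xC3 0x84

# Without re.LOCALE, \w in a bytes pattern matches only [a-zA-Z0-9_],
# so the non-ASCII bytes are skipped in both encodings.
ascii_words_cp1252 = re.findall(rb"\w+", cp1252)
ascii_words_utf8 = re.findall(rb"\w+", utf8)

print("cp1252:", cp1252)
print("utf8  :", utf8)
print(ascii_words_cp1252)
print(ascii_words_utf8)
```

With re.LOCALE under a cp1252 locale the single byte 0xC4 can be classified as a letter, whereas the two UTF-8 bytes are examined one at a time, which is why the flag cannot help with UTF-8 input.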