On Monday, 14 October 2019 13:56:09 UTC+2, Chris Angelico wrote:
>
> (My apologies for saying this in reply to an unrelated post, but I
> also don't see those posts, so it's not easy to reply to them.)
>
> ChrisA
Nothing to apologize for, and thank you for the clarification,
I was already checking my s
On Mon, Oct 14, 2019 at 10:41 PM Eko palypse wrote:
>
> On Sunday, 13 October 2019 21:20:26 UTC+2, moi wrote:
> > [Do not know why I spent hours with this...]
> >
> > To answer your question.
> > Yes, I confirm.
> > It seems that as soon as one works with bytes and when
> > a char is encoded in
On Sunday, 13 October 2019 21:20:26 UTC+2, moi wrote:
> [Do not know why I spent hours with this...]
>
> To answer your question.
> Yes, I confirm.
> It seems that as soon as one works with bytes and when
> a char is encoded in more than 1 byte, "re" runs into
> trouble.
>
First, sorry for a
First of all many thanks to everyone for the active participation.
@Chris Angelico
I think I understand what you illustrated with the byte example;
it makes sense. As it was developed for 8-bit encodings only,
it cannot be used for multibyte encodings.
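To make the 8-bit versus multibyte difference concrete, here is a minimal sketch using the test string from the original question. It only shows how many bytes each encoding produces; the variable names are illustrative.

```python
text = 'Ärger im Paradies'

cp1252 = text.encode('cp1252')  # 8-bit charset: one byte per character
utf8 = text.encode('utf-8')     # multibyte: 'Ä' needs two bytes here

# Character count versus byte counts in each encoding.
print(len(text), len(cp1252), len(utf8))

# The leading 'Ä' is a single byte in cp1252 but two bytes in UTF-8.
print(cp1252[:1], utf8[:2])
```

A per-byte character class such as \w can classify the single cp1252 byte, but it has no way to treat the two UTF-8 bytes as one character.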
@Richard Damon and @MRAB
thank you very much for
On 2019-10-12 20:57, Eko palypse wrote:
You cannot. First, \w in re.LOCALE works only when the text is encoded
with the locale encoding (cp1252 in your case). Second, re.LOCALE
supports only 8-bit charsets. So even if you set the utf-8 locale, it
would not help.
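A small sketch of what this means in practice, assuming decoding is acceptable at least for the search itself: a bytes pattern without re.LOCALE only treats ASCII as word characters, while a str pattern on the decoded text is Unicode-aware. The helper byte_span is hypothetical and only illustrates one way to map a match back to offsets in the original buffer.

```python
import re

utf8 = 'Ärger im Paradies'.encode('utf-8')

# Bytes pattern without re.LOCALE: \w matches ASCII word characters only,
# so the two bytes of 'Ä' are skipped.
ascii_only = re.findall(rb'\w+', utf8)

# After decoding, \w in a str pattern is Unicode-aware by default.
text = utf8.decode('utf-8')
unicode_words = re.findall(r'\w+', text)


def byte_span(text, match, encoding='utf-8'):
    """Hypothetical helper: convert a str match span to byte offsets."""
    start = len(text[:match.start()].encode(encoding))
    end = len(text[:match.end()].encode(encoding))
    return start, end


m = re.search(r'im', text)
span = byte_span(text, m)
print(ascii_only, unicode_words, span, utf8[span[0]:span[1]])
```

Re-encoding the prefix for every match is simple but not cheap on large buffers, so this is only a starting point.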
Regular expressions with re.LO
On Sun, Oct 13, 2019 at 7:16 AM Richard Damon wrote:
>
> On 10/12/19 3:46 PM, Eko palypse wrote:
> > Thank you very much for your answer.
> >
> >> You have to be able to match bytes, not strings.
> > May I ask you to elaborate on this? Sorry, non-native English speaker.
> > The buffer I receive is
On 2019-10-12 20:48, Serhiy Storchaka wrote:
12.10.19 21:08, Eko palypse wrote:
So how can I make it work with utf8 encoded text?
You cannot. First, \w in re.LOCALE works only when the text is encoded
with the locale encoding (cp1252 in your case). Second, re.LOCALE
supports only 8-bit charsets
On 10/12/19 3:46 PM, Eko palypse wrote:
> Thank you very much for your answer.
>
>> You have to be able to match bytes, not strings.
> May I ask you to elaborate on this? Sorry, non-native English speaker.
> The buffer I receive is a byte-like buffer.
>
>> I don't think you'll be able to 100% reliab
On Sun, Oct 13, 2019 at 6:54 AM Eko palypse wrote:
>
> Thank you very much for your answer.
>
> > You have to be able to match bytes, not strings.
>
> May I ask you to elaborate on this? Sorry, non-native English speaker.
> The buffer I receive is a byte-like buffer.
When you're matching text (the
> You cannot. First, \w in re.LOCALE works only when the text is encoded
> with the locale encoding (cp1252 in your case). Second, re.LOCALE
> supports only 8-bit charsets. So even if you set the utf-8 locale, it
> would not help.
>
> Regular expressions with re.LOCALE are slow. It may be more
Thank you very much for your answer.
> You have to be able to match bytes, not strings.
May I ask you to elaborate on this? Sorry, non-native English speaker.
The buffer I receive is a byte-like buffer.
> I don't think you'll be able to 100% reliably match bytes in this way.
> You're asking it to
12.10.19 21:08, Eko palypse wrote:
So how can I make it work with utf8 encoded text?
You cannot. First, \w in re.LOCALE works only when the text is encoded
with the locale encoding (cp1252 in your case). Second, re.LOCALE
supports only 8-bit charsets. So even if you set the utf-8 locale, it
w
On Sun, Oct 13, 2019 at 5:11 AM Eko palypse wrote:
>
> What needs to be set in order to be able to use a re search within
> utf8 encoded bytes?
You have to be able to match bytes, not strings.
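One possible reading of "match bytes" is sketched below: the pattern is written in terms of UTF-8 byte ranges instead of \w. This is an illustration, not something proposed in the thread. The name UTF8_WORD is made up, and the pattern assumes that every multibyte sequence counts as a word character, which is not true for non-ASCII punctuation or symbols.

```python
import re

# Assumption: ASCII word characters, or a UTF-8 lead byte followed by
# one to three continuation bytes, are treated as part of a "word".
UTF8_WORD = re.compile(rb'(?:[A-Za-z0-9_]|[\xc2-\xf4][\x80-\xbf]{1,3})+')

utf8 = 'Ärger im Paradies'.encode('utf-8')

# Matches are bytes objects and the spans are byte offsets into the buffer.
words = [m.group() for m in UTF8_WORD.finditer(utf8)]
first = UTF8_WORD.search(utf8)
print(words, first.span())
```

The pattern does not validate the encoding either, so it can match malformed sequences; it only keeps the search in the byte domain.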
> So how can I make it work with utf8 encoded text?
> Note, decoding it to a string isn't preferred as
What needs to be set in order to be able to use a re search within
utf8 encoded bytes?
My test, run on a Windows PC with a cp1252 setup, looks like this:
import re
import locale
cp1252 = 'Ärger im Paradies'.encode('cp1252')
utf8 = 'Ärger im Paradies'.encode('utf-8')
print('cp1252:', cp1252)
pr