On Sat, Jan 18, 2014 at 8:41 PM, Gregory Ewing
wrote:
> Chris Angelico wrote:
>>
>> On Fri, Jan 17, 2014 at 8:10 PM, Mark Lawrence
>> wrote:
>>>
>>> Every time I see it I picture Inspector
>>> Clouseau, "A BOM!!!" :)
>>
>>
>> Special delivery, a berm! Were you expecting one?
>
>
> A berm? Is th
Chris Angelico wrote:
On Fri, Jan 17, 2014 at 8:10 PM, Mark Lawrence wrote:
Every time I see it I picture Inspector
Clouseau, "A BOM!!!" :)
Special delivery, a berm! Were you expecting one?
A berm? Is that anything like a shrubbery?
--
Greg
--
https://mail.python.org/mailman/listinfo/pyth
On 17/01/2014 18:43, Tim Chase wrote:
On 2014-01-17 09:10, Mark Lawrence wrote:
Slight aside, any chance of changing the subject of this thread, or
even ending the thread completely? Why? Every time I see it I
picture Inspector Clouseau, "A BOM!!!" :)
In discussions regarding BOMs, I regular
On 2014-01-17 09:10, Mark Lawrence wrote:
> Slight aside, any chance of changing the subject of this thread, or
> even ending the thread completely? Why? Every time I see it I
> picture Inspector Clouseau, "A BOM!!!" :)
In discussions regarding BOMs, I regularly get the "All your base"
meme from
On 01/17/2014 08:46 AM, Pete Forman wrote:
Chris Angelico writes:
On Fri, Jan 17, 2014 at 8:10 PM, Mark Lawrence wrote:
Slight aside, any chance of changing the subject of this thread, or even
ending the thread completely? Why? Every time I see it I picture Inspector
Clouseau, "A BOM!!!" :)
On Sat, Jan 18, 2014 at 3:30 AM, Rustom Mody wrote:
> If you or I break a standard then, well, we broke a standard.
> If Microsoft breaks a standard the standard is obliged to change.
>
> Or as the saying goes, everyone is equal though some are more equal.
https://en.wikipedia.org/wiki/800_pound_
Chris Angelico writes:
> On Fri, Jan 17, 2014 at 8:10 PM, Mark Lawrence
> wrote:
>> Slight aside, any chance of changing the subject of this thread, or even
>> ending the thread completely? Why? Every time I see it I picture Inspector
>> Clouseau, "A BOM!!!" :)
>
> Special delivery, a berm! We
On Sat, Jan 18, 2014 at 3:26 AM, Pete Forman wrote:
> It would have been nice if there was an eighth encoding scheme defined
> there UTF-8NB which would be UTF-8 with BOM not allowed.
Or call that one UTF-8, and the one with the marker can be UTF-8-MS-NOTEPAD.
ChrisA
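
For what it's worth, Python already ships something close to that split. A small sketch, assuming only the standard 'utf-8' and 'utf-8-sig' codecs (the sample string is made up for illustration): 'utf-8' never writes the marker, 'utf-8-sig' always does. Note that 'utf-8' does not reject a BOM on input, it just hands it back as U+FEFF, so it is not quite the "BOM not allowed" scheme Pete describes.

```python
import codecs

text = 'caf\u00e9'  # illustrative sample, not from the thread

plain = text.encode('utf-8')        # no signature written
signed = text.encode('utf-8-sig')   # 3-byte signature EF BB BF prepended

assert signed == codecs.BOM_UTF8 + plain

# Reading: 'utf-8-sig' strips the signature if present and tolerates its absence.
assert signed.decode('utf-8-sig') == text
assert plain.decode('utf-8-sig') == text

# Reading with plain 'utf-8' keeps the signature as a leading U+FEFF.
assert signed.decode('utf-8') == '\ufeff' + text
```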
On Friday, January 17, 2014 9:56:28 PM UTC+5:30, Pete Forman wrote:
> Rustom Mody writes:
> > On Friday, January 17, 2014 7:10:05 AM UTC+5:30, Tim Chase wrote:
> >> On 2014-01-17 11:14, Chris Angelico wrote:
> >> > UTF-8 specifies the byte order
> >> > as part of the protocol, so you don't need t
Rustom Mody writes:
> On Friday, January 17, 2014 7:10:05 AM UTC+5:30, Tim Chase wrote:
>> On 2014-01-17 11:14, Chris Angelico wrote:
>> > UTF-8 specifies the byte order
>> > as part of the protocol, so you don't need to mark it.
>
>> You don't need to mark it when writing, but some idiots use it
On Fri, Jan 17, 2014 at 8:47 PM, Mark Lawrence wrote:
> On 17/01/2014 09:43, Chris Angelico wrote:
>>
>> On Fri, Jan 17, 2014 at 8:10 PM, Mark Lawrence
>> wrote:
>>>
>>> Slight aside, any chance of changing the subject of this thread, or even
>>> ending the thread completely? Why? Every time I
On 17/01/2014 09:43, Chris Angelico wrote:
On Fri, Jan 17, 2014 at 8:10 PM, Mark Lawrence wrote:
Slight aside, any chance of changing the subject of this thread, or even
ending the thread completely? Why? Every time I see it I picture Inspector
Clouseau, "A BOM!!!" :)
Special delivery, a be
On Fri, Jan 17, 2014 at 8:10 PM, Mark Lawrence wrote:
> Slight aside, any chance of changing the subject of this thread, or even
> ending the thread completely? Why? Every time I see it I picture Inspector
> Clouseau, "A BOM!!!" :)
Special delivery, a berm! Were you expecting one?
ChrisA
On 17/01/2014 01:40, Tim Chase wrote:
On 2014-01-17 11:14, Chris Angelico wrote:
UTF-8 specifies the byte order
as part of the protocol, so you don't need to mark it.
You don't need to mark it when writing, but some idiots use it
anyway. If you're sniffing a file for purposes of reading, you
On Friday, January 17, 2014 7:10:05 AM UTC+5:30, Tim Chase wrote:
> On 2014-01-17 11:14, Chris Angelico wrote:
> > UTF-8 specifies the byte order
> > as part of the protocol, so you don't need to mark it.
> You don't need to mark it when writing, but some idiots use it
> anyway. If you're sniffin
On 2014-01-17 11:14, Chris Angelico wrote:
> UTF-8 specifies the byte order
> as part of the protocol, so you don't need to mark it.
You don't need to mark it when writing, but some idiots use it
anyway. If you're sniffing a file for purposes of reading, you need
to look for it and remove it from
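
A minimal sketch of that look-and-remove step for the UTF-8 case, in case it helps. The function name read_utf8_text is made up for illustration; codecs.BOM_UTF8 is the standard library constant for the bytes EF BB BF.

```python
import codecs


def read_utf8_text(filename):
    """Read a UTF-8 file, dropping a leading BOM if one is present.

    Hypothetical helper, not code from this thread.
    """
    with open(filename, 'rb') as f:
        data = f.read()
    # Only a BOM at the very start of the file is a signature.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode('utf-8')
```

Opening the file with encoding='utf-8-sig' should give the same result without the manual slicing; the explicit version just shows what is being stripped.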
On Thu, 16 Jan 2014 11:37:29 -0800, Albert-Jan Roskam wrote:
> On Thu, 1/16/14, Chris
> Angelico wrote:
>
> Subject: Re: Guessing the encoding from a BOM
> To:
> Cc: "python-list@python.org"
> Date: Thursday, January 16,
On Fri, Jan 17, 2014 at 6:37 AM, Albert-Jan Roskam wrote:
> Can you elaborate on that? Unless your utf-8 files will only contain ascii
> characters I do not understand why you would not want a bom utf-8.
It's completely unnecessary, and could cause problems (the BOM is
actually whitespace, albei
On Thu, 1/16/14, Chris Angelico wrote:
Subject: Re: Guessing the encoding from a BOM
To:
Cc: "python-list@python.org"
Date: Thursday, January 16, 2014, 7:06 PM
On Fri, Jan 17, 2014 at 5:01 AM,
Björn Lindqvist
wrote:
> 201
On 2014-01-17 05:06, Chris Angelico wrote:
> > You might want to add the utf8 bom too: '\xEF\xBB\xBF'.
>
> I'd actually rather not. It would tempt people to pollute UTF-8
> files with a BOM, which is not necessary unless you are MS Notepad.
If the intent is to just sniff and parse the file acco
On Fri, Jan 17, 2014 at 5:01 AM, Björn Lindqvist wrote:
> 2014/1/16 Steven D'Aprano :
>> def guess_encoding_from_bom(filename, default):
>>     with open(filename, 'rb') as f:
>>         sig = f.read(4)
>>     if sig.startswith((b'\xFE\xFF', b'\xFF\xFE')):
>>         return 'utf_16'
>>     elif si
2014/1/16 Steven D'Aprano :
> def guess_encoding_from_bom(filename, default):
>     with open(filename, 'rb') as f:
>         sig = f.read(4)
>     if sig.startswith((b'\xFE\xFF', b'\xFF\xFE')):
>         return 'utf_16'
>     elif sig.startswith((b'\x00\x00\xFE\xFF', b'\xFF\xFE\x00\x00')):
>
On 01/15/2014 10:55 PM, Steven D'Aprano wrote:
On Thu, 16 Jan 2014 14:47:00 +1100, Ben Finney wrote:
+1. I'd like a custom exception class, sub-classed from ValueError.
Why ValueError? It's not really an "invalid value" error, it's more a "my
heuristic isn't good enough" failure. (Maybe the file
On Thu, 16 Jan 2014 14:47:00 +1100, Ben Finney wrote:
> Steven D'Aprano writes:
>
>> enc = guess_encoding_from_bom("filename")
>> if enc == something:
>>     # Can't guess, fall back on an alternative strategy
>>     ...
>> else:
>>     f = open("filename", encoding=enc)
>>
>>
>> If I forget to check th
On Thu, 16 Jan 2014 16:01:56 +1100, Chris Angelico wrote:
> On Thu, Jan 16, 2014 at 1:13 PM, Steven D'Aprano
> wrote:
>> if sig.startswith((b'\xFE\xFF', b'\xFF\xFE')):
>>     return 'utf_16'
>> elif sig.startswith((b'\x00\x00\xFE\xFF', b'\xFF\xFE\x00\x00')):
>>     return 'utf_32'
On 01/15/2014 07:47 PM, Ben Finney wrote:
Steven D'Aprano writes:
(4) Don't return anything, but raise an exception. (But
which exception?)
+1. I'd like a custom exception class, sub-classed from ValueError.
+1
--
~Ethan~
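
A sketch of what option (4) might look like with such a class. The name EncodingGuessError is invented here for illustration, and the body is only a guess at how the pieces would fit together, not Steven's actual code.

```python
class EncodingGuessError(ValueError):
    """Hypothetical exception: no recognised BOM was found."""


def guess_encoding_from_bom(filename):
    """Return a codec name based on the file's BOM, or raise."""
    with open(filename, 'rb') as f:
        sig = f.read(4)
    # The 4-byte UTF-32 marks are tested before the 2-byte UTF-16 ones,
    # because FF FE 00 00 also starts with FF FE.
    if sig.startswith((b'\x00\x00\xFE\xFF', b'\xFF\xFE\x00\x00')):
        return 'utf_32'
    if sig.startswith((b'\xFE\xFF', b'\xFF\xFE')):
        return 'utf_16'
    raise EncodingGuessError('no recognised BOM in %r' % (filename,))
```

A caller who wants a fallback wraps the call in try/except EncodingGuessError (or plain ValueError); a caller who forgets gets a traceback instead of a silently wrong encoding.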
On Thu, Jan 16, 2014 at 1:13 PM, Steven D'Aprano
wrote:
> if sig.startswith((b'\xFE\xFF', b'\xFF\xFE')):
>     return 'utf_16'
> elif sig.startswith((b'\x00\x00\xFE\xFF', b'\xFF\xFE\x00\x00')):
>     return 'utf_32'
I'd swap the order of these two checks. If the file starts FF FE
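
For reference, a sketch of the function with the two checks swapped. The final "return default" is an assumption about the part of the original that is not quoted here. The reason for the swap is that the UTF-32 little-endian mark FF FE 00 00 begins with the UTF-16 little-endian mark FF FE, so the longer pattern has to be tried first.

```python
def guess_encoding_from_bom(filename, default):
    """Guess a codec name from the BOM, falling back to `default`."""
    with open(filename, 'rb') as f:
        sig = f.read(4)
    # Longest marks first: every UTF-32-LE file also matches the
    # UTF-16-LE test, so the UTF-16 branch must come second.
    if sig.startswith((b'\x00\x00\xFE\xFF', b'\xFF\xFE\x00\x00')):
        return 'utf_32'
    elif sig.startswith((b'\xFE\xFF', b'\xFF\xFE')):
        return 'utf_16'
    return default  # assumed fallback; the quoted snippet stops before this
```

This is still a heuristic: a UTF-16-LE file whose first character is U+0000 would also start FF FE 00 00 and be reported as UTF-32.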
Steven D'Aprano writes:
> enc = guess_encoding_from_bom("filename")
> if enc == something:
>     # Can't guess, fall back on an alternative strategy
>     ...
> else:
>     f = open("filename", encoding=enc)
>
>
> If I forget to check the returned result, I should get an explicit
> failure as
I have a function which guesses the likely encoding used by text files by
reading the BOM (byte order mark) at the beginning of the file. A
simplified version:
def guess_encoding_from_bom(filename, default):
    with open(filename, 'rb') as f:
        sig = f.read(4)
    if sig.startswith((b'\x