On Tue, Feb 10, 2015 at 5:52 AM, Skip Montanaro wrote:
>
> This snapshot was taken against a running LibreOffice instance here at work
> (on Linux). It would appear the fancy schmancy apostrophe was hosed up before
> the data ever got to me. Had a guy here with Windows pop up the original file
On Mon, Feb 9, 2015 at 11:54 AM, Matthew Ruffalo wrote:
> I think it's most likely that the encoding issues happened in the export
> from XLSX to CSV (unless the data is malformed in the original XLSX
> file, of course).
Aha! Lookee here... (my apologies to all you HTML mail haters - sometimes it
On Mon, Feb 9, 2015 at 2:38 PM, Skip Montanaro wrote:
> On Mon, Feb 9, 2015 at 2:05 PM, Zachary Ware wrote:
> > If all else fails, you can try ftfy to fix things:
> > http://ftfy.readthedocs.org/en/latest/
>
> Thanks for the pointer. I would prefer to not hand-mangle this stuff
> in case I get
On Mon, Feb 9, 2015 at 11:32 AM, Skip Montanaro wrote:
> LibreOffice spit out a CSV file
> (with those three odd bytes). My script sucked in the CSV file and
> inserted data into my SQLite db.
If all else fails, you can try ftfy to fix things:
http://ftfy.readthedocs.org/en/latest/
>>> import
On 09/02/2015 03:44, Skip Montanaro wrote:
I am trying to process a CSV file using Python 3.5 (CPython tip as of a
week or so ago). According to chardet[1], the file is encoded as utf-8:
>>> s = open("data/meets-usms.csv", "rb").read()
>>> len(s)
562272
>>> import chardet
>>> chardet.detect(
On 02/09/2015 12:30 PM, Skip Montanaro wrote:
> Thanks, Chris. Are you telling me I should have defined the input file
> encoding for my CSV file as CP-1252, or that something got hosed on
> the export from XLSX to CSV? Or something else?
>
> Skip
Hi Skip-
I think it's most likely that the encodi
On Tue, Feb 10, 2015 at 4:32 AM, Skip Montanaro wrote:
> On Sun, Feb 8, 2015 at 10:51 PM, Steven D'Aprano wrote:
>> The second question is, are you
>> using Windows?
>
> No, I'm on a Mac (as I think I indicated in my original note). All
> transformations occurred on a Mac. LibreOffice spit out
On Tue, Feb 10, 2015 at 4:30 AM, Skip Montanaro wrote:
> On Sun, Feb 8, 2015 at 9:58 PM, Chris Angelico wrote:
>> Those three characters are the CP-1252 decode of the bytes for U+2019
>> in UTF-8 (E2 80 99). Not sure if that helps any, but given that it was
>> an XLSX file, Windows codepages are
On Sun, Feb 8, 2015 at 10:51 PM, Steven D'Aprano wrote:
> The second question is, are you
> using Windows?
No, I'm on a Mac (as I think I indicated in my original note). All
transformations occurred on a Mac. LibreOffice spit out a CSV file
(with those three odd bytes). My script sucked in the C
On Sun, Feb 8, 2015 at 9:58 PM, Chris Angelico wrote:
> Those three characters are the CP-1252 decode of the bytes for U+2019
> in UTF-8 (E2 80 99). Not sure if that helps any, but given that it was
> an XLSX file, Windows codepages are reasonably likely to show up.
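
For reference, the mix-up Chris describes can be reproduced with the standard library alone. This is only a minimal sketch, not code from the thread: the variable names are made up for illustration, and the repair step assumes the text was mis-decoded as CP-1252 exactly once, which may not match what actually happened to the original file.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the "fancy" apostrophe.
apostrophe = "\u2019"

# Its UTF-8 encoding is the three bytes E2 80 99.
utf8_bytes = apostrophe.encode("utf-8")
print(utf8_bytes.hex())

# Decoding those bytes as CP-1252 instead of UTF-8 yields three
# characters: U+00E2, U+20AC and U+2122.
mojibake = utf8_bytes.decode("cp1252")
print([f"U+{ord(ch):04X}" for ch in mojibake])

# Assumption: exactly one wrong decode happened. In that case the damage
# can be undone by reversing the steps.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == apostrophe)
```

Note that the reverse step raises UnicodeEncodeError if the string contains characters that CP-1252 cannot encode, so it is not a general-purpose fix.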
Thanks, Chris. Are you telling
Skip Montanaro wrote:
> sqlite> select meetname from swimmeet where meetname like
> '%Barracuda%Patrick%';
> Anderson Barracudas St. Patrick's Day Swim Meet
> Anderson Barracuda Masters - 2010 St. Patrick’s Day Swim Meet
> Anderson Barracuda Masters 2011 St. Patrick’s Day Swim Meet
> Anderson
On Mon, Feb 9, 2015 at 2:44 PM, Skip Montanaro wrote:
> Anderson Barracuda Masters - 2010 St. Patrick’s Day Swim Meet
Those three characters are the CP-1252 decode of the bytes for U+2019
in UTF-8 (E2 80 99). Not sure if that helps any, but given that it was
an XLSX file, Windows codepages are
I am trying to process a CSV file using Python 3.5 (CPython tip as of a
week or so ago). According to chardet[1], the file is encoded as utf-8:
>>> s = open("data/meets-usms.csv", "rb").read()
>>> len(s)
562272
>>> import chardet
>>> chardet.detect(s)
{'encoding': 'utf-8', 'confidence': 0.99}
so