On Mon, 20 Aug 2012 00:44:22 -0400, Roy Smith wrote:
> In article <5031bb2f$0$29972$c3e8da3$54964...@news.astraweb.com>,
> Steven D'Aprano wrote:
>
>> > So it may be with utf-8 someday.
>>
>> Only if you believe that people's ability to generate data will remain
>> lower than people's ability
On 08/19/2012 11:51 AM, wxjmfa...@gmail.com wrote:
> Five minutes after I closed my interactive interpreter windows,
> the day I tested this stuff, I thought:
> "Too bad I did not note the extremely bad cases I found. I'm pretty
> sure this problem will arrive on the table".
Reading through this
On Aug 19, 11:11 pm, wxjmfa...@gmail.com wrote:
> On Sunday, 19 August 2012 19:48:06 UTC+2, Paul Rubin wrote:
>
> > But they are not ASCII pages, they are (as stated) MOSTLY ASCII.
> > E.g. the characters are 99% ASCII but 1% non-ASCII, so PEP 393 chooses
> > a much more memory-expensive enco
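
A rough sketch of the arithmetic behind the "99% ASCII, 1% non-ASCII" point quoted above, assuming the PEP 393 rule that a string is stored at 1, 2, or 4 bytes per code point depending on its widest character. The helper names and the choice of the euro sign as the single non-ASCII character are illustrative assumptions, and the figures count character payload only, not the per-object header.

```python
def pep393_width(text):
    """Bytes per code point under the PEP 393 rule: 1, 2, or 4."""
    top = max(map(ord, text), default=0)
    if top < 0x100:        # fits in Latin-1
        return 1
    if top < 0x10000:      # fits in the Basic Multilingual Plane
        return 2
    return 4               # needs the full 4-byte form


def payload_sizes(text):
    """Payload bytes for three ways of storing the same text."""
    return {
        "pep393": len(text) * pep393_width(text),
        "utf8": len(text.encode("utf-8")),
        "ucs4": len(text.encode("utf-32-le")),  # fixed 4 bytes per code point
    }


pure_ascii = "a" * 100
# Hypothetical sample: 99 ASCII characters plus one euro sign (U+20AC).
mostly_ascii = "a" * 99 + "\u20ac"

print(payload_sizes(pure_ascii))
print(payload_sizes(mostly_ascii))
```

Under these assumptions the single euro sign doubles the PEP 393 payload of the whole string, while the UTF-8 form grows by only two bytes and the UCS-4 form is unchanged at four bytes per character.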
Steven D'Aprano writes:
> Paul Rubin already told you about his experience using OCR to generate
> multiple terabytes of text, and how he would not be happy if that was
> stored in UCS-4.
That particular text was stored on disk as compressed XML that had UTF-8
in the data fields, but I think R