Re: The Cost of Dynamism (was Re: Pyhon 2.x or 3.x, which is faster?)

BartC Sat, 12 Mar 2016 03:57:07 -0800

On 12/03/2016 02:20, Chris Angelico wrote:

On Sat, Mar 12, 2016 at 12:16 PM, BartC <[email protected]> wrote:

'Switch' testing benchmark. The little program shown below reads a text file
(I used the entire CPython C sources, 6MB), and counts the number of
characters in each category: upper, lower, digit and other.

(Note there are other ways to approach this task, but a proper 'lexer'
usually does more than count. 'Switch' then becomes invaluable.)


Are you assuming that the files are entirely ASCII? (They're not.) Or
are you simply declaring that all non-ASCII characters count as
"other"?

Once again, you cannot ignore Unicode and pretend that everything's
ASCII, or eight-bit characters, or something. Asking if a character is
upper/lower/digit/other is best done with the unicodedata module.

If you're looking at fast processing of language source code (in a thread partly about efficiency), then you cannot ignore the fact that the vast majority of characters being processed are going to have ASCII codes.

Language syntax could anyway stipulate that certain tokens can only consist of characters within the ASCII range.


So I'm not ignoring Unicode, but being realistic.

(My benchmark was anyway just demonstrating a possible use for 'switch' that more or less matched your own example!)
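
To make the two positions concrete, here is a rough sketch of how they can coexist in Python: plain range comparisons for ASCII characters, with a fallback to the unicodedata module for everything else. This is not the benchmark program from earlier in the thread. The names classify and count_categories are made up for illustration, and mapping only the general categories Lu, Ll and Nd to upper, lower and digit is an assumption about what "category" should mean for non-ASCII text.

```python
import unicodedata
from collections import Counter


def classify(ch):
    """Return 'upper', 'lower', 'digit' or 'other' for one character.

    Hypothetical helper, not taken from the original benchmark.
    """
    if ch < "\x80":
        # ASCII fast path: simple range comparisons, no table lookup.
        if "a" <= ch <= "z":
            return "lower"
        if "A" <= ch <= "Z":
            return "upper"
        if "0" <= ch <= "9":
            return "digit"
        return "other"
    # Non-ASCII: fall back to the Unicode general category.
    # Assumption: only Lu, Ll and Nd map to the three named classes.
    category = unicodedata.category(ch)
    if category == "Lu":
        return "upper"
    if category == "Ll":
        return "lower"
    if category == "Nd":
        return "digit"
    return "other"


def count_categories(text):
    """Count how many characters of each class appear in text."""
    return Counter(classify(ch) for ch in text)
```

If most of the input really is ASCII, nearly every character takes the first branch, and the unicodedata lookup is only paid for on the rare non-ASCII character. Whether that split is actually faster than calling unicodedata.category on everything would need measuring.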


--
Bartc

--
https://mail.python.org/mailman/listinfo/python-list
