Balanced trees (was: Re: Tuples and immutability)

2014-03-08 Thread Marko Rauhamaa
Ian Kelly :

> I already mentioned this earlier in the thread, but a balanced binary
> tree might implement += as node insertion and then return a different
> object if the balancing causes the root node to change.

True.

Speaking of which, are there plans to add a balanced tree to the
"batteries" of Python? Timers, cache aging and the like need it. I'm
using my own AVL tree implementation, but I'm wondering why Python
still doesn't have one.

In fact, since asyncio has timers but Python doesn't have balanced
trees, I'm led to wonder how good the asyncio implementation can be.

Note that Java "batteries" include TreeMap.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Balanced trees (was: Re: Tuples and immutability)

2014-03-08 Thread Ian Kelly
On Sat, Mar 8, 2014 at 1:34 AM, Marko Rauhamaa  wrote:
> Speaking of which, are there plans to add a balanced tree to the
> "batteries" of Python? Timers, cache aging and the like need it. I'm
> using my own AVL tree implementation, but I'm wondering why Python
> still doesn't have one.

None currently that I'm aware of.  If you want to propose adding one,
I suggest reading:

http://docs.python.org/devguide/stdlibchanges.html

> In fact, since asyncio has timers but Python doesn't have balanced
> trees, I'm led to wonder how good the asyncio implementation can be.

Peeking at the code, it appears to use a heapq-based priority queue.
Why would a balanced binary tree be better?


Re: Balanced trees

2014-03-08 Thread Marko Rauhamaa
Ian Kelly :

> Peeking at the code, it appears to use a heapq-based priority queue.
> Why would a balanced binary tree be better?

AFAIK, a heap queue has no operation for deleting an arbitrary element,
forcing you to leave the canceled timers in the queue to be deleted
later.

In a very typical scenario, networking entities start timers very
frequently (depending on the load, maybe at 100..1000 Hz) but cancel
virtually every one of them, leading to some wakeup churn and extra
memory load. I don't know if the churn is better or worse than the tree
balancing overhead.

Imagine a web server that receives HTTP connections. You might want to
specify a 10-minute idle timeout for the connections. In the heapq timer
implementation, your connection objects are kept in memory for 10
minutes even if they are closed gracefully because the canceled timer
maintains a reference to the object.

Of course, it may be that the heapq implementation sets the callback to
None leaving only the minimal timer object lingering and waiting to come
out of the digestive tract.
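
As a rough sketch of that last variant (not asyncio's actual code), here is a heapq-based queue where canceling only clears the callback. The names `Timer`, `TimerQueue`, `start` and `run_due` are made up for illustration. The canceled entry stays in the heap until its deadline reaches the top, but it no longer keeps the connection object alive.

```python
import heapq
import itertools


class Timer:
    """Hypothetical timer handle; cancel() only marks it as dead."""

    def __init__(self, when, callback):
        self.when = when
        self.callback = callback

    def cancel(self):
        # Drop the reference to the callback (and whatever it keeps alive);
        # the small Timer object itself stays in the heap for now.
        self.callback = None


class TimerQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal deadlines

    def start(self, when, callback):
        timer = Timer(when, callback)
        heapq.heappush(self._heap, (when, next(self._counter), timer))
        return timer

    def run_due(self, now):
        """Pop every entry due at or before `now`; skip canceled ones."""
        fired = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, timer = heapq.heappop(self._heap)
            if timer.callback is not None:
                timer.callback()
                fired += 1
        return fired

    def __len__(self):
        return len(self._heap)
```

The trade-off is the one described above: cancellation is O(1), but the heap can hold many dead entries between wakeups.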


Marko


How to recover the default "Library/Python/" folder on Mac?

2014-03-08 Thread Harry Wood

How can I recover the default "Library/Python/" folder on a Mac?

I deleted it by mistake. I have tried the following steps:

- Step 1. Downloaded and installed the Python DMG from python.org.

   Result: There are no Python folders under Library after I installed the
   Python DMG.

- Step 2. Tried "brew install python" in Terminal.

   Result: $ brew uninstall python
           Error: No such keg: 

What should I do now? Please give me some specific directions. Thanks.


Critic: New Python Front Page

2014-03-08 Thread Nils-Hero Lindemann
Hi,

(Please forgive (or correct) my mistakes; I am not a native speaker.)

http://www.python.org/community/sigs/retired/parser-sig/towards-standard/
* Formatting bug
* Needs breadcrumb navigation ("Where am I?")
  (I came there via http://theory.stanford.edu/~amitp/yapps/)
* The blue background makes the top links invisible when I look at my
  laptop screen from a 60% angle. I would prefer white.
* One third of the space on the y-axis is used for 15 navigation
  links and a search bar.
* One third of the space on the x-axis is used for one sentence about,
  and a link to, the PSF.
* I cannot easily get rid of this by using e.g. HackTheWeb
  (https://addons.mozilla.org/de/firefox/addon/hack-the-web/).
  Please compare with e.g. Wikipedia, where it is easy to isolate the
  main content.

http://www.python.org/community/
That picture is scary.

Regards, Nils

-- 
Nils-Hero Lindemann 


Python performance

2014-03-08 Thread JCosta
I did some work in C# and Java, and I converted some applications to Python; I 
noticed that Python is much slower than the other languages.

Is this normal ?
Thanks


spam (was: Re: extract from json)

2014-03-08 Thread Mark Lawrence

On 08/03/2014 03:49, Chris Angelico wrote:
> On Sat, Mar 8, 2014 at 2:21 PM,   wrote:
>> I think it's better if you (CENSORED) off.
>
> Teddybubu, please understand that the above comment is from a spammer
> and does not reflect the prevailing attitude of this list. I don't
> like to make content-free posts like this, but as you already have the
> answer you need, there's not a lot for me to add :)
>
> ChrisA



This particular PITA of a spammer is one of the very few that I see on 
Thunderbird via gmane.  I believe that Terry Reedy amongst others does a 
good job of keeping us relatively spam free.  Thanks all.


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



Re: spam (was: Re: extract from json)

2014-03-08 Thread Chris Angelico
On Sun, Mar 9, 2014 at 12:13 AM, Mark Lawrence  wrote:
> On 08/03/2014 03:49, Chris Angelico wrote:
>>
>> On Sat, Mar 8, 2014 at 2:21 PM,   wrote:
>>>
>>> I think it's better if you (CENSORED) off.
>>
>>
>> Teddybubu, please understand that the above comment is from a spammer
>> and does not reflect the prevailing attitude of this list. I don't
>> like to make content-free posts like this, but as you already have the
>> answer you need, there's not a lot for me to add :)
>>
>> ChrisA
>>
>
> This particular PITA of a spammer is one of the very few that I see on
> Thunderbird via gmane.  I believe that Terry Reedy amongst others does a
> good job of keeping us relatively spam free.  Thanks all.

Normally I ignore him (and yes, I see those posts too, in Gmail).
Stand-alone posts aren't an issue. I just didn't want a new poster to
see that reply go through unchallenged.

ChrisA


Re: Python performance

2014-03-08 Thread Chris Angelico
On Sat, Mar 8, 2014 at 11:53 PM, JCosta  wrote:
> I did some work in c# and java and I converted some application to Python; I 
> noticed Python is much slower than the other languages.
>
> Is this normal ?
> Thanks

The first thing to look at is the conversion. If you convert idiomatic
Java code into the nearest-equivalent Python, it won't be idiomatic
Python, and it'll probably underperform. (This is especially true if
you create a whole lot of objects, use long chains of classes with
dots, and so on. Java follows dotted name chains at compile time,
Python does at run time.)

Another thing to consider is that Python, while very convenient, isn't
always the fastest at heavy numerical computation. For that, there are
some dedicated libraries, like NumPy, which can do that for you.

But it's also worth checking whether the speed difference even
matters. Are you able to see a real difference, as a human, or is this
just benchmarks? It's not a problem for something to take 5ms in
Python that would take 2ms in Java, if that time is spent responding
to a user's click - the user won't see that difference!

If you post a bit of code, we can help you to see what's going on.
Best bit of code to post would be the slowest - and since you're
talking about performance, you _have_ profiled your code and found
which bit's the slowest, right? :)
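
If you have not profiled yet, a minimal sketch with the standard library's cProfile looks something like this. The function `slow_sum` is only a stand-in for whatever code you suspect, and `profile_call` is a made-up helper name.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive loop standing in for the code under suspicion."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(slow_sum, 100000)
print(report)
```

For a whole script, running `python -m cProfile -s cumulative your_script.py` gives a similar report without changing the code.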

ChrisA


Re: Critic: New Python Front Page

2014-03-08 Thread Ned Batchelder

On 3/8/14 7:44 AM, Nils-Hero Lindemann wrote:
> Hi,
>
> (Please forgive (or correct) my mistakes, i am non native)
>
> http://www.python.org/community/sigs/retired/parser-sig/towards-standard/
> * Formatting bug
> * Needs breadcrumb navigation ("Where am i?")
>    (i came there via http://theory.stanford.edu/~amitp/yapps/)
> * blue background makes top links invisible when i look on laptop
>    screen from a 60% angle. Prefer white.
> * 1 third of space on y-axis is used for 15 navigation
>    links and a search bar.
> * 1 third of space on x-axis is used for one sentence about and a link
>    to PSF.
> * I can not easily get rid of this by using e.g. HackTheWeb
>    (https://addons.mozilla.org/de/firefox/addon/hack-the-web/).
>    Please compare with e.g. Wikipedia, it is easy there to isolate the
>    main content.
>
> http://www.python.org/community/
> That picture is scary.
>
> Regards, Nils



The source for the site is on github, and they are taking and resolving 
issues there: https://github.com/python/pythondotorg/issues


--
Ned Batchelder, http://nedbatchelder.com



Re: gdb unable to read python frame information

2014-03-08 Thread Wesley
The Python debuginfo package is installed...
Still, py-bt, py-locals, etc. cannot read the Python frame.


Re: gdb unable to read python frame information

2014-03-08 Thread Wesley
1. install gdb from source with configure option --with-python

2. install python from source with configure option --with-pydebug

3. Got error in gdb here:
2.6.6 (r266:84292, Jan 22 2014, 09:42:36) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
(gdb) py-bt
Undefined command: "py-bt".  Try "help".
(gdb) python
>import libpython
>end
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/share/gdb/python/libpython.py", line 49, in <module>
    _type_size_t = gdb.lookup_type('size_t')
gdb.error: No type named size_t.
Error while executing Python code.
(gdb) 


Re: Python performance

2014-03-08 Thread Tim Chase
On 2014-03-08 04:53, JCosta wrote:
> I did some work in c# and java and I converted some application to
> Python; I noticed Python is much slower than the other languages.
> 
> Is this normal ?

It depends.

Did you write C#/Java in Python (i.e., use C# or Java idioms in
Python), or did you write Pythonic code?

Check the performance characteristics of your algorithms and containers
(for example, whether you used an O(1) algorithm/container in C#/Java
but an O(N) one in Python) and make sure they match.
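
A small illustration of that kind of mismatch, with made-up sizes: a membership test on a list scans the elements one by one, while the same test on a set is a hash lookup. A port that replaces a Java HashSet with a Python list would hit exactly this.

```python
import timeit

n = 20000
as_list = list(range(n))
as_set = set(as_list)
missing = -1  # worst case for the list: every element is compared

# Same question, two containers: O(N) scan versus O(1) average hash lookup.
list_time = timeit.timeit(lambda: missing in as_list, number=200)
set_time = timeit.timeit(lambda: missing in as_set, number=200)

print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

The exact numbers depend on the machine, but the gap grows with `n`.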

What sorts of operations are you doing?  Are you CPU-bound, I/O
bound, or memory-bound?  Have you profiled to see where the hot-spots
are?

Personally, I've found that most of my code is I/O-bound (disk or
network), and that CPU has very rarely been much of a problem. When
there is trouble, I usually check my algorithm first; occasionally I'm
stuck with an O(N^2) algorithm, and no choice of language would help.
For some folks, using one of the specialty math libraries can speed up
numeric processing.  If I know that memory could be an issue, I tend to
switch to disk-based data stores to head off any trouble.

-tkc






Re: Critic: New Python Front Page

2014-03-08 Thread Ned Batchelder

On 3/8/14 8:31 AM, Ned Batchelder wrote:
> On 3/8/14 7:44 AM, Nils-Hero Lindemann wrote:
>> Hi,
>>
>> (Please forgive (or correct) my mistakes, i am non native)
>>
>> http://www.python.org/community/sigs/retired/parser-sig/towards-standard/
>> * Formatting bug
>> * Needs breadcrumb navigation ("Where am i?")
>>    (i came there via http://theory.stanford.edu/~amitp/yapps/)
>> * blue background makes top links invisible when i look on laptop
>>    screen from a 60% angle. Prefer white.
>> * 1 third of space on y-axis is used for 15 navigation
>>    links and a search bar.
>> * 1 third of space on x-axis is used for one sentence about and a link
>>    to PSF.
>> * I can not easily get rid of this by using e.g. HackTheWeb
>>    (https://addons.mozilla.org/de/firefox/addon/hack-the-web/).
>>    Please compare with e.g. Wikipedia, it is easy there to isolate the
>>    main content.
>>
>> http://www.python.org/community/
>> That picture is scary.
>>
>> Regards, Nils
>
> The source for the site is on github, and they are taking and resolving
> issues there: https://github.com/python/pythondotorg/issues



Also, I agree with you about the picture on the community page: 
https://github.com/python/pythondotorg/issues/265


--
Ned Batchelder, http://nedbatchelder.com



Re: gdb unable to read python frame information

2014-03-08 Thread Mark Lawrence

On 08/03/2014 13:32, Wesley wrote:
> python debuginfo is installed...
> Still,py-bt, py-locals.etc cannot read python frame



If you don't provide context people are less likely to help you.

--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



Re: Assertions are bad, m'kay?

2014-03-08 Thread Steven D'Aprano
On Fri, 07 Mar 2014 16:15:36 -0800, Dan Stromberg wrote:

> On Fri, Mar 7, 2014 at 3:11 AM, Steven D'Aprano
>  wrote:
> 
> 
>> Assertions are not bad! They're just misunderstood and abused.
> 
>> You should read this guy's blog post on when to use assert:
>>
>> http://import-that.dreamwidth.org/676.html
> 
> Nice article.
> 
> BTW, what about:
> 
> if value >= 3:
>     raise AssertionError('value must be >= 3')
> 
> ?

The error message is misleading. But you've probably noticed that by 
now :-)

What about it? Since it's missing any context, it could be a good use of 
an exception or a terrible use. Where does value come from? Why is there 
a restriction on the value?

As I see it, there are likely two reasons for writing such a test:

1) You're testing a value that comes from the user, or some 
   library you don't control; or

2) You're testing some internal invariant, a contract between 
   two parts of your own code, a piece of internal logic, etc.


In the first case, I don't think you should raise AssertionError. A 
ValueError would be more appropriate.

In the second case, using an assert might be better, since that gives you 
the opportunity to remove it at compile-time, if you choose.
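
A minimal sketch of the two cases side by side. The functions `set_volume` and `_scale` and the 0..10 range are invented for illustration only.

```python
def set_volume(level):
    """Public entry point: `level` comes from a caller we do not control."""
    if not 0 <= level <= 10:
        # Case 1: bad input from outside, so raise a regular exception.
        raise ValueError("level must be between 0 and 10, got %r" % (level,))
    return _scale(level)


def _scale(level):
    """Internal helper: callers inside this module have already validated."""
    # Case 2: internal invariant; skipped entirely when Python runs with -O.
    assert 0 <= level <= 10, "set_volume() should have rejected this"
    return level * 10
```

Running with `python -O` removes the assert in `_scale` but leaves the ValueError check in `set_volume` in place, which is the behaviour you want for each case.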




-- 
Steven D'Aprano
http://import-that.dreamwidth.org/


Re: gdb unable to read python frame information

2014-03-08 Thread Wesley
Now I attach with gdb python -p <pid>, then import libpython.
py-bt prints nothing, and py-locals reports:
Unable to locate python frame

What's going on?


Re: gdb unable to read python frame information

2014-03-08 Thread Wesley
So, let me clarify. To try this out, I set up a clean machine:

CentOS 6.5, 64-bit.
Now I try this:
1. Install gdb 7.7 from source, with configure option --with-python

2. Install Python 2.6.6 from source, with configure option --with-pydebug

3. Run a Python script

4. From the command line, run gdb python -p <pid> to attach to the running script

5. Within gdb, issue python, import libpython, end
   (no errors)
6. py-bt outputs nothing, and py-locals says "Unable to locate python frame"

Here is the snippet:
[root@localhost Python-2.6.6]# gdb python 52315
GNU gdb (GDB) 7.7
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
Attaching to program: /home/nipen/test/Python-2.6.6/python, process 52315
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols 
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00000030a98e15c3 in ?? ()
(gdb) bt
#0  0x00000030a98e15c3 in ?? ()
#1  0x00007f4cf68d1219 in ?? ()
#2  0x0000000000000000 in ?? ()
(gdb) py-bt
Undefined command: "py-bt".  Try "help".
(gdb) python
>import libpython
>end
(gdb) py-bt
(gdb) 
(gdb) py-locals
Unable to locate python frame
(gdb) 
Unable to locate python frame


Re: Critic: New Python Front Page

2014-03-08 Thread Nils-Hero Lindemann
Hello,

> The source for the site is on github, and they are taking and resolving
> issues there: https://github.com/python/pythondotorg/issues

Thanks for pointing me to the right place. I copypasted my mail to ...
https://github.com/python/pythondotorg/issues/266

also i added a comment to ...
https://github.com/python/pythondotorg/issues/265

Regards, Nils


-- 
Nils-Hero Lindemann 


Re: How to extract contents of inner text of html tag?

2014-03-08 Thread Jason Friedman
> for line in all_kbd:
>     if line.string == None:

I modified your code slightly:

for line in all_kbd:
    print(line)
    sys.exit()
    if line.string == None:

Running the new script yields:
$ python shibly.py

cp -v --remove-destination /usr/share/zoneinfo/

   \
/etc/localtime


Meaning that
all_kbd=soup.find_all('kbd')

yields only a single string, not multiple strings as I'm guessing you expected.

You might also consider running your program as:

python -m pdb your_program.py


Re: Python performance

2014-03-08 Thread Marko Rauhamaa
JCosta :

> I did some work in c# and java and I converted some application to
> Python; I noticed Python is much slower than the other languages.
>
> Is this normal ?

Yes. The main reason is the dot notation, which in C through Java is
implemented by the compiler as a fixed offset to a memory structure.
High-level programming languages such as Python implement it through a
hash table lookup.

That's the price of keeping everything dynamic: the structural content
is free to change any time during the execution of the program. I have
heard (but not experienced first-hand) that some ingenious heuristic
optimizations have made Common Lisp code come close to C-style
performance. Google was gung ho about repeating the feat on Python, but
seems to have given up.

The second costly specialty of Python is the way objects are
instantiated. Each object is given a "personalized" dispatch table. That
costs time and memory but is extremely nice for the programmer.
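
A small sketch of what that dynamism looks like in practice. The class `Connection` is invented for the example, and "dispatch table" above is informal wording; what the code shows is the ordinary per-instance `__dict__`.

```python
class Connection:
    def __init__(self, host):
        self.host = host

    def describe(self):
        return "connection to " + self.host


conn = Connection("example.org")

# Instance attributes live in a per-object dictionary.
print(conn.__dict__)          # {'host': 'example.org'}

# conn.host is resolved at run time; for a plain instance attribute like
# this one it ends up reading the same dictionary entry.
assert conn.host == conn.__dict__["host"]

# Because the lookup is dynamic, one object can be given its own method
# without affecting other instances of the class.
conn.describe = lambda: "patched"
other = Connection("example.net")
print(conn.describe())        # patched
print(other.describe())       # connection to example.net
```

A compiler for a statically typed language can turn `conn.host` into a fixed offset; here the name has to be looked up each time because the object may have changed since the last access.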

In a word, Python is a godsend if its performance is good enough for
your needs. For other needs, you have other programming languages, and
you buy the performance dearly.

Java is a great programming language, as C# must also be. However, for
the needs where you need to drop out of Python, one must ask if you
weren't better off writing some core parts in C and integrating them
with Python.


Marko


Re: Python performance

2014-03-08 Thread JCosta
On Saturday, 8 March 2014 12:53:57 UTC, JCosta wrote:
> I did some work in c# and java and I converted some application to Python; I 
> noticed Python is much slower than the other languages.
>
> Is this normal ?
>
> Thanks

...

Thanks for the help (Chris, Tim and Marko); it's clear to me now.


Re: Python performance

2014-03-08 Thread Mark Lawrence

On 08/03/2014 18:30, JCosta wrote:
> On Saturday, 8 March 2014 12:53:57 UTC, JCosta wrote:
>> I did some work in c# and java and I converted some application to Python; I
>> noticed Python is much slower than the other languages.
>>
>> Is this normal ?
>>
>> Thanks
>
> ...
>
> Thanks for the help (Chris, Tim and Marko) and  it´s clear now for me ...



You might like to check this out 
https://wiki.python.org/moin/PythonSpeed/PerformanceTips


Would you also please read and action this 
https://wiki.python.org/moin/GoogleGroupsPython to prevent us seeing the 
double line spacing and single line paragraphs above, thanks.


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



Re: Python performance

2014-03-08 Thread Ned Batchelder

On 3/8/14 7:53 AM, JCosta wrote:
> I did some work in c# and java and I converted some application to Python; I
> noticed Python is much slower than the other languages.
>
> Is this normal ?
> Thanks



Your question, and the replies so far in this thread, have overlooked 
the difference between language and implementation.  Python as a 
language has no inherent speed.  Your question is really about CPython, 
the reference and most-common implementation of the language.  It 
interprets virtual-machine bytecode, and so will pay a penalty for 
compute-bound code.


But PyPy is another implementation of Python.  It uses a JIT to produce 
native code automatically, and can impressively speed up the execution 
of Python programs.  You should give it a try to see if it will help in 
your situation.


--
Ned Batchelder, http://nedbatchelder.com



Re: Balanced trees (was: Re: Tuples and immutability)

2014-03-08 Thread Dan Stromberg
On Sat, Mar 8, 2014 at 12:34 AM, Marko Rauhamaa  wrote:
> Ian Kelly :
>
>> I already mentioned this earlier in the thread, but a balanced binary
>> tree might implement += as node insertion and then return a different
>> object if the balancing causes the root node to change.
>
> True.
>
> Speaking of which, are there plans to add a balanced tree to the
> "batteries" of Python? Timers, cache aging and the like need it. I'm
> using my own AVL tree implementation, but I'm wondering why Python
> still doesn't have one.

I think it'd probably be a good idea to add one or more balanced
binary trees to the standard library.  But I suspect it's been tried
before, and didn't happen.  It might be good to add an _un_balanced
tree too, since they do quite well with random keys.

Here's a performance comparison I did of a bunch of tree types in Python:
http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-01/


Re: Balanced trees

2014-03-08 Thread Mark Lawrence

On 08/03/2014 19:58, Dan Stromberg wrote:
> On Sat, Mar 8, 2014 at 12:34 AM, Marko Rauhamaa  wrote:
>> Ian Kelly :
>>
>>> I already mentioned this earlier in the thread, but a balanced binary
>>> tree might implement += as node insertion and then return a different
>>> object if the balancing causes the root node to change.
>>
>> True.
>>
>> Speaking of which, are there plans to add a balanced tree to the
>> "batteries" of Python? Timers, cache aging and the like need it. I'm
>> using my own AVL tree implementation, but I'm wondering why Python
>> still doesn't have one.
>
> I think it'd probably be a good idea to add one or more balanced
> binary trees to the standard library.  But I suspect it's been tried
> before, and didn't happen.  It might be good to add an _un_balanced
> tree too, since they do quite well with random keys.
>
> Here's a performance comparison I did of a bunch of tree types in Python:
> http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-01/



I've found this link useful http://kmike.ru/python-data-structures/

I also don't want all sorts of data structures added to the Python 
library.  I believe that there are advantages to leaving specialist data 
structures on pypi or other sites, plus it means Python in a Nutshell 
can still fit in your pocket and not a 40 ton articulated lorry, unlike 
the Java equivalent.


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence



Re: Balanced trees

2014-03-08 Thread Marko Rauhamaa
Mark Lawrence :

> I believe that there are advantages to leaving specialist data
> structures on pypi or other sites, plus it means Python in a Nutshell
> can still fit in your pocket and not a 40 ton articulated lorry,
> unlike the Java equivalent.

An ordered map is a foundational data structure as opposed to, say, a
priority queue, let alone something like urllib2.

If I had to choose between a hash table and AVL (or RB) tree in the
standard library, it would definitely have to be the latter. It is more
generally usable, has fewer corner cases and probably has an equal
performance even in hash tables' sweet spot.


Marko


Re: Python programming

2014-03-08 Thread John Ladasky
On Friday, March 7, 2014 4:38:54 PM UTC-8, Dennis Lee Bieber wrote:
> On Fri, 7 Mar 2014 10:03:35 -0800 (PST), John Ladasky
>  declaimed the following:
> 
>>   More than once, I have queried Google with the phrase "Why isn't FORTRAN
>> dead yet?"  For some reason, it lives on.  I can't say that I understand
>> why.  
>
>   Well, for one thing, no one can justify rewriting all the numerics
> libraries... LAPACK http://en.wikipedia.org/wiki/LAPACK , NEC-2
> http://en.wikipedia.org/wiki/Numerical_Electromagnetics_Code (and likely
> NEC-4).

I have used Numpy for years, and I'm pretty sure that Numpy calls LAPACK under 
the hood.  But if that is true, then I get LAPACK as a pre-compiled binary.  I 
didn't need a FORTRAN compiler until last week.

If one or two specialized applications are the only reason we are keeping a 50 
year-old programming language around, I would be tempted to rewrite those 
applications -- in C, at least.  C's not dead yet!  (It's just resting!)


Re: Balanced trees

2014-03-08 Thread Roy Smith
In article <87eh2ctmht@elektro.pacujo.net>,
 Marko Rauhamaa  wrote:

> If I had to choose between a hash table and AVL (or RB) tree in the
> standard library, it would definitely have to be the latter. It is more
> generally usable, has fewer corner cases and probably has an equal
> performance even in hash tables' sweet spot.

The C++ folks made that decision, and people spent the next 10 years 
complaining, "Why is there no hash table in STL?"


Re: Tuples and immutability

2014-03-08 Thread Gregory Ewing

Ian Kelly wrote:
> class LessThanFilter:
>
>     def __init__(self, the_list):
>         self._the_list = the_list
>
>     def __getitem__(self, bound):
>         return [x for x in self._the_list if x < bound]
>
>
> filter = LessThanFilter([10, 20, 30, 40, 50])
> filter[25] += [15, 17, 23]
>
> Should that last line not raise an exception?


In this case it will fail to catch what is probably an error,
but you can't expect the language to find all your bugs for
you. If you wrote the same bug this way:

   filter[25].extend([15, 17, 23])

it wouldn't be caught either.

What's happening is that we're trying to use the syntax
a += b to mean two different things:

1) Shorthand for a = a + b

2) A way of expressing an in-place modification, such
   as a.extend(b)

Case (2) is not really an assignment at all, so arguably
it shouldn't require the LHS to support assignment.
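
The tuple case from earlier in this thread shows both meanings colliding. This is a small sketch of current CPython behaviour, where the in-place step succeeds and the assignment step then fails:

```python
t = ([10, 20],)

try:
    t[0] += [30]          # roughly: t[0] = t[0].__iadd__([30])
except TypeError as exc:
    print("assignment step failed:", exc)

# The in-place extension already happened before the failed assignment.
print(t)                  # ([10, 20, 30],)

# Spelling the in-place modification explicitly needs no assignment.
t[0].extend([40])
print(t)                  # ([10, 20, 30, 40],)
```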

--
Greg


Re: Tuples and immutability

2014-03-08 Thread Gregory Ewing

Ian Kelly wrote:
> I already mentioned this earlier in the thread, but a balanced binary
> tree might implement += as node insertion and then return a different
> object if the balancing causes the root node to change.


That would be a really bad way to design a binary tree
implementation. What if there is another reference to
the tree somewhere? It's still going to be referring to
the old root object, and will have an incoherent view
of the data -- partly old and partly new.

If you're going to have a mutable tree, it needs to be
encapsulated in a stable top-level object.

--
Greg


Re: Balanced trees

2014-03-08 Thread Dan Stromberg
On Sat, Mar 8, 2014 at 1:21 PM, Marko Rauhamaa  wrote:
> If I had to choose between a hash table and AVL (or RB) tree in the
> standard library, it would definitely have to be the latter. It is more
> generally usable, has fewer corner cases and probably has an equal
> performance even in hash tables' sweet spot.

Actually, in the performance comparison I mentioned previously, I
compared Python dict's to a bunch of different balanced trees and one
unbalanced tree. The dictionary was much faster, though granted, it
was the only one in C.

That URL again:
http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-01/


How is unicode implemented behind the scenes?

2014-03-08 Thread Dan Stromberg
OK, I know that Unicode data is stored in an encoding on disk.

But how is it stored in RAM?

I realize I shouldn't write code that depends on any relevant
implementation details, but knowing some of the more common
implementation options would probably help build an intuition for
what's going on internally.

I've heard that characters are no longer all c bytes wide internally,
so is it sometimes utf-8?

Thanks.


Re: How is unicode implemented behind the scenes?

2014-03-08 Thread MRAB

On 2014-03-09 02:08, Dan Stromberg wrote:
> OK, I know that Unicode data is stored in an encoding on disk.
>
> But how is it stored in RAM?
>
> I realize I shouldn't write code that depends on any relevant
> implementation details, but knowing some of the more common
> implementation options would probably help build an intuition for
> what's going on internally.
>
> I've heard that characters are no longer all c bytes wide internally,
> so is it sometimes utf-8?


No.

From Python 3.3, it's an array of 1, 2 or 4 bytes per codepoint.

In Python terms:

if all(c <= '\xFF' for c in string):
    use 1 byte per codepoint
elif all(c <= '\xFFFF' for c in string):
    use 2 bytes per codepoint
else:
    use 4 bytes per codepoint



Re: Tuples and immutability

2014-03-08 Thread Ian Kelly
On Sat, Mar 8, 2014 at 5:40 PM, Gregory Ewing
 wrote:
> Ian Kelly wrote:
>>
>> class LessThanFilter:
>>
>>     def __init__(self, the_list):
>>         self._the_list = the_list
>>
>>     def __getitem__(self, bound):
>>         return [x for x in self._the_list if x < bound]
>>
>>
>> filter = LessThanFilter([10, 20, 30, 40, 50])
>> filter[25] += [15, 17, 23]
>>
>> Should that last line not raise an exception?
>
>
> In this case it will fail to catch what is probably an error,
> but you can't expect the language to find all your bugs for
> you. If you wrote the same bug this way:
>
>    filter[25].extend([15, 17, 23])
>
> it wouldn't be caught either.
>
> What's happening is that we're trying to use the syntax
> a += b to mean two different things:
>
> 1) Shorthand for a = a + b
>
> 2) A way of expressing an in-place modification, such
>    as a.extend(b)
>
> Case (2) is not really an assignment at all, so arguably
> it shouldn't require the LHS to support assignment.

In my view the second one is wrong.  a += b should be understood as
being equivalent to a = a + b, but with the *possible* and by no means
guaranteed optimization that the operation may be performed in-place.

In fact, if you read the documentation for lists, you may notice that
while they clearly cover the + operator and the extend method, they do
not explicitly document the list class's += operator.  So although I'm
not entirely sure whether it is intentional or not, and I would be
quite surprised if some implementation were actually to differ on this
point, the language does *not* from what I can see guarantee that the
+= operator on lists is equivalent to calling .extend.

That having been said, code that uses += and relies on the operation
to be performed in-place should be considered buggy.  If you need the
operation to be performed in-place, then use in-place methods like
list.extend.  If you need the operation not to be performed in-place,
then use a = a + b.  If you're ambivalent on the in-place issue and
just want to write polymorphic code, that's when you should consider
using +=.
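
To make the distinction concrete, here is a minimal sketch of what += does
today in CPython for a list, for a tuple, and for a list stored inside a
tuple. The variable names are only illustrative, and as noted above the
in-place behaviour for lists is what the implementation does rather than
something this post claims the documentation promises.

```python
# list: += extends the existing object, so an alias sees the change
a = [1, 2]
alias = a
a += [3]
print(a is alias, alias)

# tuple: += builds a new tuple and rebinds the name; the alias is untouched
t = (1, 2)
t_alias = t
t += (3,)
print(t is t_alias, t_alias)

# list inside a tuple: the in-place extend happens first, then the
# assignment back into the tuple raises TypeError
container = ([10],)
try:
    container[0] += [20]
except TypeError as exc:
    print("TypeError:", exc)
print(container)
```

The last case is the one this thread started from: the exception is raised,
yet the inner list has already been modified.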


Re: How is unicode implemented behind the scenes?

2014-03-08 Thread MRAB

On 2014-03-09 02:40, MRAB wrote:

On 2014-03-09 02:08, Dan Stromberg wrote:

OK, I know that Unicode data is stored in an encoding on disk.

But how is it stored in RAM?

I realize I shouldn't write code that depends on any relevant
implementation details, but knowing some of the more common
implementation options would probably help build an intuition for
what's going on internally.

I've heard that characters are no longer all c bytes wide internally,
so is it sometimes utf-8?


No.

  From Python 3.3, it's an array of 1, 2 or 4 bytes per codepoint.

In Python terms:

if all(c <= '\xFF' for c in string):
    use 1 byte per codepoint
elif all(c <= '\xFFFF' for c in string):
    use 2 bytes per codepoint
else:
    use 4 bytes per codepoint


Oops! That should, of course, be:

if all(c <= '\xFF' for c in string):
    use 1 byte per codepoint
elif all(c <= '\uFFFF' for c in string):
    use 2 bytes per codepoint
else:
    use 4 bytes per codepoint
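
The same rule as a runnable function, as a rough sketch only: the name
bytes_per_codepoint is made up for illustration, and this mirrors the
selection rule rather than CPython's actual C code.

```python
def bytes_per_codepoint(string):
    """Return the width (1, 2 or 4) that the rule above would select."""
    if all(c <= '\xFF' for c in string):
        return 1
    elif all(c <= '\uFFFF' for c in string):
        return 2
    else:
        return 4


# ASCII, Latin-1, a BMP character (euro sign) and an astral character
for sample in ("spam", "caf\xe9", "\u20ac100", "\U0001F600"):
    print(ascii(sample), bytes_per_codepoint(sample))
```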



Re: How is unicode implemented behind the scenes?

2014-03-08 Thread Steven D'Aprano
On Sat, 08 Mar 2014 18:08:38 -0800, Dan Stromberg wrote:

> OK, I know that Unicode data is stored in an encoding on disk.
> 
> But how is it stored in RAM?

There are various common ways to store Unicode strings in RAM.

The first, UTF-16, treats every character [aside: technically, a code 
point] as a double byte rather than a single byte. So the letter "A" is 
stored as two bytes 0x0041 (or 0x4100 depending on your platform's byte 
order). Using two bytes allows for a maximum of 65536 different 
characters, *way* too few for the whole Unicode character set, so UTF-16 
has an escaping mechanism where characters beyond ordinal 0xFFFF are 
stored as *two* "characters" (again, actually, code points) called 
surrogate pairs.

That means that a sequence of (say) four human-readable characters may, 
depending on those characters, take up anything from eight bytes to 
sixteen bytes, and you cannot tell which until you walk through the 
sequence inspecting each pair of bytes:

while there are still pairs of bytes to inspect:
    c = get_next_pair()
    if is_low_surrogate(c):
        error
    elif is_high_surrogate(c):
        d = get_next_pair()
        if not is_low_surrogate(d):
            error
        print make_char_from_surrogate_pair(c, d)
    else:
        print make_char_from_double_byte(c)

So UTF-16 is a *variable width* (could be 1 unit, could be 2 units) 
*double byte* encoding (each unit is two bytes).
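
For anyone who wants to run that loop, here is a minimal Python 3 sketch of
it. It assumes little-endian input with no byte order mark, and the function
name decode_utf16_le is invented for this example; in real code you would
simply call bytes.decode("utf-16-le").

```python
import struct


def decode_utf16_le(data):
    """Decode little-endian UTF-16 bytes into a list of code points."""
    if len(data) % 2:
        raise ValueError("odd number of bytes")
    # Split the buffer into 16-bit units
    units = struct.unpack("<%dH" % (len(data) // 2), data)
    code_points = []
    i = 0
    while i < len(units):
        c = units[i]
        i += 1
        if 0xDC00 <= c <= 0xDFFF:
            raise ValueError("low surrogate without a high surrogate")
        if 0xD800 <= c <= 0xDBFF:
            if i >= len(units) or not 0xDC00 <= units[i] <= 0xDFFF:
                raise ValueError("high surrogate without a low surrogate")
            d = units[i]
            i += 1
            # Combine the two 10-bit halves and add the 0x10000 offset
            c = 0x10000 + ((c - 0xD800) << 10) + (d - 0xDC00)
        code_points.append(c)
    return code_points


text = "A\u20ac\U0001F600"
encoded = text.encode("utf-16-le")
decoded = decode_utf16_le(encoded)
print(len(text), "characters in", len(encoded), "bytes")
print([hex(cp) for cp in decoded])
```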

Prior to Python 3.3, using UTF-16 was an option when compiling Python's 
source code. Such versions of the interpreter are called "narrow builds".

Another option is UTF-32. UTF-32 uses four bytes for every character. 
That's enough to store every Unicode character, and then some, so there 
are no surrogate pairs needed. But every character takes up four bytes: 
"A" would be stored as 0x00000041 or 0x41000000. Although UTF-32 is 
faster than UTF-16, because you don't have to walk the string checking 
each individual pair of bytes to see if they are part of a surrogate, 
strings use up to twice as much memory as UTF-16 whether they need it or 
not. (And four times more memory than ASCII strings.)

Prior to Python 3.3, UTF-32 was a build option too. Such versions of the 
interpreter are called "wide builds".

Another option is to use UTF-8 internally. With UTF-8, every character 
uses between 1 and 4 bytes. By design, ASCII characters are stored using 
a single byte, the same byte they would have in old fashioned single-byte 
ASCII: the letter "A" is stored as 0x41. (The algorithm used by UTF-8 can 
continue up to six bytes, but there is no need to since there aren't that 
many Unicode characters.) Because it's variable-width, you have the same 
variable-width issues as UTF-16, only even more so, but because most 
common characters (at least for English speakers) use only 1 or 2 bytes, 
it's much more compact than either.
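
A quick way to see the size differences is to encode a few characters with
each of the three encodings and count the bytes. This is only a sketch using
the standard codecs; the sample characters are arbitrary choices.

```python
# "A", e-acute, the euro sign, and one character outside the BMP
samples = ["A", "\xe9", "\u20ac", "\U0001F600"]
codecs = ("utf-8", "utf-16-le", "utf-32-le")

for ch in samples:
    # Number of bytes each encoding needs for this single character
    sizes = {codec: len(ch.encode(codec)) for codec in codecs}
    print("U+%04X" % ord(ch), sizes)
```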

No version of Python has, to my knowledge, used UTF-8 internally. Some 
other languages, such as Go and Haskell, do, and consequently string 
processing is slow for them.

In Python 3.3, CPython introduced an internal scheme that gives the best 
of all worlds. When a string is created, Python uses a different 
implementation depending on the characters in the string:

* If all the characters are ASCII or Latin-1, then the string uses 
  a single byte per character.

* If all the characters are no greater than ordinal value 0xFFFF, 
  then UTF-16 is used. Because none of the characters exceed 0xFFFF, 
  no surrogate pairs are required.

* Only if there is at least one ord() greater than 0xFFFF does 
  Python use UTF-32 for that string.
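
You can observe the three cases from Python itself. The sketch below assumes
CPython 3.3 or later, since sys.getsizeof reports implementation-specific
sizes; the helper name bytes_per_char is made up for this example. It
measures how much a string grows when more copies of one character are
added, which cancels out the fixed object header.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per code point for a string made only of ch."""
    # Growing the string by n characters grows the object by n * width bytes
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


for ch in ("a", "\xe9", "\u20ac", "\U0001F600"):
    print("U+%04X" % ord(ch), bytes_per_char(ch))
```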

The end result is that creating strings is slightly slower, as Python may 
have to inspect each character at most twice to decide what system to 
use. But memory use is much improved: Python has *many* strings (every 
function, method and class uses many strings in their implementation) and 
the memory savings can be considerable. Depending on your application and 
what you do with those strings, that may even lead to time savings as 
well as memory savings.




-- 
Steven D'Aprano
http://import-that.dreamwidth.org/


Re: Tuples and immutability

2014-03-08 Thread Ian Kelly
On Sat, Mar 8, 2014 at 5:45 PM, Gregory Ewing
 wrote:
> Ian Kelly wrote:
>
>> I already mentioned this earlier in the thread, but a balanced binary
>> tree might implement += as node insertion and then return a different
>> object if the balancing causes the root node to change.
>
>
> That would be a really bad way to design a binary tree
> implementation. What if there is another reference to
> the tree somewhere? It's still going to be referring to
> the old root object, and will have an incoherent view
> of the data -- partly old and partly new.
>
> If you're going to have a mutable tree, it needs to be
> encapsulated in a stable top-level object.

Well, as I parenthetically noted the first time I brought it up,
"whether this is good design is tangential; it's a possible design".
The language shouldn't be written such that only those designs deemed
"good" by some external committee can be implemented.  What you
dismiss as "really bad" may be exactly what somebody else needs, and
maybe they intend that there won't be other references.


Re: How is unicode implemented behind the scenes?

2014-03-08 Thread Chris Angelico
On Sun, Mar 9, 2014 at 1:08 PM, Dan Stromberg  wrote:
> OK, I know that Unicode data is stored in an encoding on disk.
>
> But how is it stored in RAM?
>
> I realize I shouldn't write code that depends on any relevant
> implementation details, but knowing some of the more common
> implementation options would probably help build an intuition for
> what's going on internally.
>
> I've heard that characters are no longer all c bytes wide internally,
> so is it sometimes utf-8?
>

As of Python 3.3, it's as MRAB described. If you like, Python chooses
between one of three (or four) encodings, based on what can handle the
string:

1) ASCII (there are some minor differences with 7-bit strings, eg it
knows the conversion to UTF-8 is the identity function)
2) Latin-1
3) UCS-2
4) UCS-4

This means that finding the Nth codepoint in a string is simply a
matter of shifting N by either 0, 0, 1, or 2, and picking the right
number of bytes from that position. You can read the gory details in
PEP 393:

http://www.python.org/dev/peps/pep-0393/

but the important bit here is the "kind", which is 01 for Latin-1, 10
for UCS-2, 11 for UCS-4. (The "ascii-only" flag is stored elsewhere.)
There's a functionally-identical field in Pike's strings, called
size_shift - 0 for ASCII or Latin-1, 1 for UCS-2, 2 for UCS-4.
Whichever it is, it's really efficient - and as an added bonus, all
those ASCII-only strings that scripts are full of (you know, words
like "print" and "len" and "int") are stored compactly, so it's much
tighter than the 3.2 builds, even narrow ones. It's pretty awesome!
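
As a rough sketch of the "shift N and pick the bytes" idea: the code below
uses bytes produced by the latin-1, utf-16-le and utf-32-le codecs as
stand-ins for the internal buffers, and nth_code_point is a name invented
for this example. It is not how CPython is written, just the same arithmetic.

```python
def nth_code_point(buffer, shift, n):
    """Fetch code point n from a fixed-width little-endian buffer.

    shift is 0, 1 or 2, meaning 1, 2 or 4 bytes per code point.
    """
    width = 1 << shift
    start = n << shift
    return int.from_bytes(buffer[start:start + width], "little")


# (text, stand-in codec, shift); each text fits the width of its codec
samples = [
    ("caf\xe9", "latin-1", 0),
    ("\u20ac100", "utf-16-le", 1),
    ("\U0001F600!", "utf-32-le", 2),
]
for text, codec, shift in samples:
    buf = text.encode(codec)
    print(codec, [hex(nth_code_point(buf, shift, i)) for i in range(len(text))])
```

No scanning is needed: the offset of item N depends only on N and the shift.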

ChrisA


Re: How is unicode implemented behind the scenes?

2014-03-08 Thread Roy Smith
In article <531bd709$0$29985$c3e8da3$54964...@news.astraweb.com>,
 Steven D'Aprano  wrote:

> There are various common ways to store Unicode strings in RAM.
> 
> The first, UTF-16.
> [...]
> Another option is UTF-32.
> [...]
> Another option is to use UTF-8 internally.
> [...]
> In Python 3.3, CPython introduced an internal scheme that gives the best 
> of all worlds. When a string is created, Python uses a different 
> implementation depending on the characters in the string:

This was an excellent post, but I would take exception to the "best of 
all worlds" statement.  I would put it a little less absolutely and say 
something like, "a good compromise for many common use cases".  I would 
even go with, "... for most common use cases".  But, there are 
situations where it loses.


Re: How is unicode implemented behind the scenes?

2014-03-08 Thread Rustom Mody
On Sunday, March 9, 2014 8:20:49 AM UTC+5:30, Steven D'Aprano wrote:
> No version of Python has, to my knowledge, used UTF-8 internally. Some 
> other languages, such as Go and Haskell, do, and consequently string 
> processing is slow for them.

Haskell: It's more like: "Here's the menu, take your pick"
http://blog.ezyang.com/2010/08/strings-in-haskell/




Re: How is unicode implemented behind the scenes?

2014-03-08 Thread Chris Angelico
On Sun, Mar 9, 2014 at 2:01 PM, Roy Smith  wrote:
> In article <531bd709$0$29985$c3e8da3$54964...@news.astraweb.com>,
>  Steven D'Aprano  wrote:
>
>> There are various common ways to store Unicode strings in RAM.
>>
>> The first, UTF-16.
>> [...]
>> Another option is UTF-32.
>> [...]
>> Another option is to use UTF-8 internally.
>> [...]
>> In Python 3.3, CPython introduced an internal scheme that gives the best
>> of all worlds. When a string is created, Python uses a different
>> implementation depending on the characters in the string:
>
> This was an excellent post, but I would take exception to the "best of
> all worlds" statement.  I would put it a little less absolutely and say
> something like, "a good compromise for many common use cases".  I would
> even go with, "... for most common use cases".  But, there are
> situations where it loses.

It's universally good for string indexing/slicing on binary CPUs
(there's no point using a 24-bit or 21-bit representation on an
Intel-compatible CPU, even though they'd be just as good as UTF-32).
It's not a compromise, so much as a recognition that Python offers
convenient operators for indexing and slicing. If, on the other hand,
Python fundamentally worked with U+0020 separated words (REXX has a
whole set of word-based functions), then it might be better to
represent strings as lists of words internally. Or if the string
operations are primarily based on the transitions between Unicode
types of "space" and "non-space", which would be more likely these
days, then something of that sort would still work. Anyway, it's based
on the operations the language makes convenient, and which will
therefore be common and expected to be fast: those are the operations
to optimize for.

If the only thing you ever do with a string is iterate sequentially
over its characters, UTF-8 would be the perfect representation. It's
compact, you can concatenate strings without re-encoding, and it
iterates forwards easily. But it sucks for "give me character #142857
from this string", so it's a bad choice for Python.
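
To illustrate why indexing is the weak spot, here is a minimal sketch of
fetching character N from UTF-8 bytes. It assumes the input is valid UTF-8,
and utf8_nth_char is a name made up for this example. The point is that it
has to walk from the start of the buffer, counting lead bytes, so the cost
grows with N.

```python
def utf8_nth_char(data, n):
    """Return character number n (0-based) from valid UTF-8 bytes."""
    if n < 0:
        raise IndexError(n)
    count = -1
    for offset, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; anything else starts a character
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                end = offset + 1
                while end < len(data) and data[end] & 0xC0 == 0x80:
                    end += 1
                return data[offset:end].decode("utf-8")
    raise IndexError(n)


text = "a\xe9\u20ac\U0001F600z"
data = text.encode("utf-8")
print([ascii(utf8_nth_char(data, i)) for i in range(len(text))])
```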

ChrisA


Re: How is unicode implemented behind the scenes?

2014-03-08 Thread Ned Batchelder

On 3/8/14 9:08 PM, Dan Stromberg wrote:

OK, I know that Unicode data is stored in an encoding on disk.

But how is it stored in RAM?

I realize I shouldn't write code that depends on any relevant
implementation details, but knowing some of the more common
implementation options would probably help build an intuition for
what's going on internally.

I've heard that characters are no longer all c bytes wide internally,
so is it sometimes utf-8?

Thanks.



In abstract terms, a Unicode string is a sequence of integers (code 
points).  There are lots of ways to store a sequence of integers.


In Python 2.x, it's either a vector of 16-bit ints, or 32-bit ints. 
These are the Unicode representations known as UTF-16 and UTF-32, 
respectively, and which you have depends on whether you have a "narrow" 
or "wide" build of Python.  You can tell the difference by examining 
sys.maxunicode, which is 65535 (narrow) or 1114111 (wide).
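
A small sketch of that check. On Python 3.3 and later the narrow value
should never appear, so the first branch only matters on older builds.

```python
import sys

print(hex(sys.maxunicode))
if sys.maxunicode == 0xFFFF:
    # Only possible before Python 3.3
    print("narrow build: astral characters are stored as surrogate pairs")
else:
    print("wide build, or Python 3.3+: every code point is one item")
```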


In Python 3.3, the representation was changed from narrow/wide to the 
so-called Flexible String Representation which others here have 
described.  It uses either 1, 2, or 4 bytes per code point, depending 
on the set of code points in the string.  It's specified in PEP 393: 
http://legacy.python.org/dev/peps/pep-0393/


--
Ned Batchelder, http://nedbatchelder.com



Re: gdb unable to read python frame information

2014-03-08 Thread Wesley
Anybody has suggestions?

This really makes me crazy...


Re: How is unicode implemented behind the scenes?

2014-03-08 Thread Dan Sommers
On Sun, 09 Mar 2014 03:50:49 +, Steven D'Aprano wrote:

> ... UTF-16 ... the letter "A" is stored as two bytes 0x0041 (or 0x4100
> depending on your platform's byte order) ...

At the risk of being pedantic, the two bytes are 0x00 and 0x41, and the
order in which they appear in memory depends on your platform and even
your particular view of that platform (do stacks grow up or down?  are
addresses of higher memory larger or smaller?).
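
A quick sketch that shows the individual bytes, using the explicit
big-endian and little-endian codecs so the result does not depend on the
machine it runs on:

```python
import sys

for codec in ("utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    # bytes.hex() prints the bytes in the order they sit in the buffer
    print(codec, "A".encode(codec).hex())

print("native byte order here:", sys.byteorder)
```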

> ... UTF-32 ... "A" would be stored as 0x00000041 or 0x41000000 ...

Or even some other sequence if you're on a PDP-11.

See .

But you knew that.  ;-)

Pedantic'ly yours,
Dan


process.popen with Japanese args => UTF8 JAVA

2014-03-08 Thread Jun Tanaka
Hello,

I have tried to use subprocess.Popen to run a Java program with a
Japanese-language argument. test.java is compiled as UTF-8.
'日本語' below means "Japanese" in Japanese.
But it does not work. If anyone knows this matter well, please help.

Jun

python code>
import subprocess

sentence = '日本語'
filename = 'japanese'
java_file = 'test'
cmd = "java {0} {1} {2}".format(java_file, sentence, filename)
proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT)
python code end>>
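
The post does not say what goes wrong, so the following is only a sketch of
one variation worth trying, not a confirmed fix: pass the arguments as a
list and drop shell=True, so that no shell gets a chance to reinterpret the
Japanese text. It assumes Python 3 and that the Java program prints UTF-8;
the helper names build_java_command and run_java are invented for this
example.

```python
import subprocess


def build_java_command(java_file, sentence, filename):
    """Build an argument list; each value is passed as exactly one argument."""
    return ["java", java_file, sentence, filename]


def run_java(java_file, sentence, filename):
    """Run the Java program and return (exit status, decoded output)."""
    cmd = build_java_command(java_file, sentence, filename)
    # No shell=True: the list goes to the OS without shell quoting rules
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    # Assumption: the Java side writes UTF-8; undecodable bytes are replaced
    return proc.returncode, output.decode("utf-8", errors="replace")


print(build_java_command("test", "日本語", "japanese"))
```

Calling run_java('test', '日本語', 'japanese') would then start the program.
If the text is still garbled, the locale of the environment and the
encoding the JVM uses for its arguments are the next things to look at.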