Re: Brython - Python in the browser

2012-12-20 Thread Pierre Quentel
On Thursday, 20 December 2012 01:07:15 UTC+1, Terry Reedy wrote:
> On 12/19/2012 1:19 PM, Pierre Quentel wrote:
> > The objective of Brython is to replace Javascript by Python as the
> > scripting language for web browsers, making it usable on all
> > terminals including smartphones, tablets, connected TVs, etc. Please
> > forgive the lack of ambition ;-)
> 
> This sounds similar to pyjs, but the latter has two big problems: a)
> personality conflicts and splits among the developers; b) last I knew, it
> was stuck on Python 2.

It is indeed different from pyjs: both translate Python into Javascript, but
with Brython the translation is done on the fly by the browser, while with
pyjs it is done once, ahead of time, by a Python script.
> I think your home page/doc/announcement should specify Python 3 at the
> top, so it is not a mystery until one reads down to
> "Brython supports most keywords and functions of Python 3 : "

Done on the home page.
> "lists are created with [] or list(), tuples with () or tuple(),
> dictionaries with {} or dict() and sets with set()"
> 
> Non-empty sets are also created with {} and you should support that.

Ok, I put this point in the issue tracker.
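For reference, the distinction Terry is pointing at can be checked in any Python 3 interpreter:

```python
# {} is an empty dict literal; only a non-empty {...} of bare values is a set.
assert type({}) is dict
assert type({1, 2, 3}) is set
assert type({"a": 1}) is dict
assert {1, 2, 3} == set([1, 2, 3])
```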
> > The best introduction is to visit the Brython site
> > (http://www.brython.info).
> 
> That says that my browser, Firefox 17, does not support HTML5. Golly
> gee. I don't think any browser supports all of that moving target, and
> Gecko apparently supports about as large a subset as most.
> https://en.wikipedia.org/wiki/Comparison_of_layout_engines_%28HTML5%29
> It is possible that FF still does not support the particular feature
> needed for the clock, but then the page should say just that. Has the
> latest FF (17) actually been tested?

I changed the error message, adding "or Javascript is turned off".
> > To create an element, for instance an HTML anchor:
> > doc <= A('Python', href="http://www.python.org")
> 
> To me, that is an awful choice and I urge you to change it.
> 
> '<=' is not just an operator, it is a comparison operator. It normally
> returns False or True. Numpy array comparison returns arrays of booleans,
> so the meaning is extended, not completely changed. People will often be
> using it with its normal meaning in conditionals elsewhere, so this usage
> creates strong cognitive dissonance. Also, using an expression as a
> statement is allowed, but except in the interactive interpreter, it only
> makes sense with an expression that obviously has side-effects or could
> have side-effects (like the expression 'mylist.sort()'). It just looks
> wrong to an experienced Python programmer like me.
> 
> It also is unnecessary. Use '+=' or '|='. The former means just what you
> want the statement to do, and the latter is at least somewhat related
> (bit-or assignment), is rarely used, and is very unlikely to be used in
> code intended for a browser.
I'm afraid I am going to disagree. The document is a tree structure, and today 
Python doesn't have a syntax for easily manipulating trees. To add a child to a 
node, using an operator instead of a function call saves a lot of typing; <= 
looks like a left arrow, which is a visual indication of the meaning "receive 
as child". |= doesn't have this arrow shape.

+= is supported by Brython, but it means something different. <= means "add 
child"; the addition operator + means "add brother".

For instance,

d = UL(LI('test1'))
d += UL(LI('test2'))
doc <= d

will show two unordered lists at the same level, while

d = UL(LI('test1'))
d <= UL(LI('test2'))
doc <= d

will nest the second list inside the first one

In fact, even in CPython there could be a built-in tree class that could be 
managed by a syntax such as this one.
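As a rough illustration of that idea, here is a toy tree class (purely hypothetical, not Brython's actual implementation) that overloads <= as "append child":

```python
class Node:
    """Minimal tree node; <= appends a child, mimicking Brython's doc <= elt."""
    def __init__(self, tag, *children):
        self.tag = tag
        self.children = list(children)

    def __le__(self, child):
        # node <= child : add child and return the node so calls can chain
        self.children.append(child)
        return self

d = Node("UL", Node("LI", "test1"))
d <= Node("UL", Node("LI", "test2"))   # nests the second list inside the first
assert [c.tag for c in d.children] == ["LI", "UL"]
```

The cost, as Terry notes, is that ordinary <= comparisons between nodes no longer behave as comparisons.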
> > It still lacks important features of Python, mostly list
> > comprehensions and classes;
> 
> Since Python 3 has 4 types of comprehensions, while Python 2 only has
> list comprehensions, I took this to mean that Brython was Python 2.
> 
> And yes, I am all in favor of being able to use a subset of Py3 instead
> of javascript. A full Python interpreter in a browser is too dangerous.
> (Actually, I think javascript is too, but that is a different issue.)
> Python translated to javascript cannot be worse than javascript. I
> presume the same would be true if the javascript step were omitted and
> Python were directly compiled to the virtual machines defined by current
> javascript engines.
> 
> -- 
> Terry Jan Reedy

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Brython - Python in the browser

2012-12-20 Thread Pierre Quentel
On Thursday, 20 December 2012 01:54:44 UTC+1, Ian wrote:
> On Wed, Dec 19, 2012 at 5:07 PM, Terry Reedy wrote:
> > That says that my browser, Firefox 17, does not support HTML5. Golly gee. I
> > don't think any browser supports all of that moving target, and Gecko
> > apparently supports about as large a subset as most.
> > https://en.wikipedia.org/wiki/Comparison_of_layout_engines_%28HTML5%29
> > It is possible that FF still does not support the particular feature needed
> > for the clock, but then the page should say just that. Has the latest FF
> > (17) actually been tested?
> 
> It works for me using FF 17.0.1.
> 
> >> To create an element, for instance an HTML anchor:
> >> doc <= A('Python', href="http://www.python.org")
> >
> > To me, that is an awful choice and I urge you to change it.
> 
> +1.  The DOM already has a well-established API.  The following may
> require more typing:
> 
> link = document.createElement('a')
> link.setAttribute("href", "http://www.python.org/")
> link.appendChild(document.createTextNode('Python'))
> document.body.appendChild(link)
> 
> But it is much clearer in intent.  Since these methods map directly to
> DOM methods, I know exactly what I expect them to do, and I can look
> them up in the browser documentation if I have any doubts.  With the
> one-liner above, I don't know exactly what that maps to in actual DOM
> calls, and so I'm a lot less clear on what exactly it is supposed to
> do.  I'm not even entirely certain whether it's actually equivalent to
> my code above.
> 
> I suggest that Brython should have a "low-level" DOM API that matches
> up to the actual DOM in as close to a 1:1 correspondence as possible.
> Then if you want to have a higher-level API that allows whiz-bang
> one-liners like the above, build it as an abstraction on top of the
> low-level API and include it as an optional library.  This has the
> added benefit that if the user runs into an obscure bug where the
> fancy API breaks on some particular operation on some specific
> browser, they will still have the option of falling back to the
> low-level API to work around it.  It would also make the conversion
> barrier much lower for web programmers looking to switch to Brython,
> if they can continue to use the constructs that they're already
> familiar with but just write them in Python instead of JavaScript.

We don't have the same point of view. Mine is to offer an alternative to 
Javascript, with the simplicity and elegance of the Python syntax, for a 
programmer who wants to develop a web application and doesn't know Javascript. 
Ultimately this means that the whole DOM API would be described without any 
mention of Javascript, only with the Python API.

With this idea in mind, asking Brython to have a Javascript-like low-level API 
is like asking CPython to support iteration with a low-level construct like 
"for i=0; i<10; i++" along with "for i in range(10)". The Python engine is 
stable enough that we don't have to inspect the bytecode for debugging; 
similarly, when Brython is mature enough, you won't have to look at the 
generated Javascript code (which you can still do, though, e.g. in the console).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Brython - Python in the browser

2012-12-20 Thread Chris Angelico
On Thu, Dec 20, 2012 at 8:37 PM, Pierre Quentel wrote:
> I'm afraid I am going to disagree. The document is a tree structure, and 
> today Python doesn't have a syntax for easily manipulating trees. To add a 
> child to a node, using an operator instead of a function call saves a lot of 
> typing ; <= looks like a left arrow, which is a visual indication of the 
> meaning "receive as child". |= doesn't have this arrow shape

This is the reasoning that gave us the C++ stdio system, where:

cout << "Hello, world!\n";

is the way to make console output. Quite frankly, I don't like it;
when I write C++ code, I use printf same as in C. I'd much rather work
with methods than with operators that try to look like the flowing of
data, but actually have a quite different meaning.

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Brython - Python in the browser

2012-12-20 Thread Jamie Paul Griffin
* Ian Kelly  [2012-12-19 17:54:44 -0700]:

> On Wed, Dec 19, 2012 at 5:07 PM, Terry Reedy  wrote:
> > That says that my browser, Firefox 17, does not support HTML5. Golly gee. I
> > don't think any browser supports all of that moving target, and Gecko
> > apparently supports about as large a subset as most.
> > https://en.wikipedia.org/wiki/Comparison_of_layout_engines_%28HTML5%29
> > It is possible the FF still does not support the particular feature needed
> > for the clock, but then the page should say just that. Has the latest FF
> > (17) actually been tested?
> 
> It works for me using FF 17.0.1.

I'm using FF 13 on OpenBSD and it works for me too. 
-- 
http://mail.python.org/mailman/listinfo/python-list


how to detect the encoding used for a specific text data ?

2012-12-20 Thread iMath
 how to detect the encoding used for a specific text data ?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to detect the encoding used for a specific text data ?

2012-12-20 Thread Jussi Piitulainen
iMath writes:

>  how to detect the encoding used for a specific text data ?

The practical thing to do is to try an encoding and see whether you
find the expected frequent letters of the relevant languages in the
decoded text, or the most frequent words. This is likely to help you
decide between some of the most common encodings. Some decoding
attempts may even raise an exception, which should be a clue.

Strictly speaking, it cannot be done with complete certainty. There
are lots of Finnish texts that are identical whether you think they
are in Latin-1 or Latin-9. A further text from the same source might
still reveal the difference, so the distinction matters.

Short Finnish texts might also be identical whether you think they are
in Latin-1 or UTF-8, but the situation is different: a couple of
frequent letters turn into nonsense in the wrong encoding. It's easy
to tell at a glance.

Sometimes texts declare their encoding. That should be a clue, but in
practice the declaration may be false. Sometimes there is a stray
character that violates the declared or assumed encoding, or a part of
the text is in one encoding and another part in another. Bad source.
You decide how important it is to deal with the mess. (This only
happens in the real world.)

Good luck.
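The try-and-see approach can be sketched in a few lines (the function name and candidate list are my own):

```python
def guess_encoding(data, candidates=("utf-8", "iso-8859-1", "iso-8859-15")):
    # Return the first candidate that decodes cleanly; a UnicodeDecodeError
    # rules an encoding out.  Note that latin-1/-9 never raise, so strict
    # encodings like utf-8 must come first; the latin-1 vs latin-9
    # ambiguity described above of course remains.
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

assert guess_encoding("p\u00e4iv\u00e4\u00e4".encode("utf-8")) == "utf-8"
assert guess_encoding(b"\xe4") == "iso-8859-1"   # a lone 0xE4 is invalid utf-8
```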
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Virtualenv loses context

2012-12-20 Thread rhythmicdevil
Brought my laptop out of hibernation to do some work this morning. I attempted 
to run one of my ETLs and got the following error. I made no changes since it 
was running yesterday.



[swright@localhost app]$ python etl_botnet_meta.py --mode dev -f
Traceback (most recent call last):
  File "etl_botnet_meta.py", line 9, in 
from make_zip import EtlMakeZip
  File "/home/swright/workspace/botnet_etl/app/make_zip.py", line 1, in 
import M2Crypto, os, time, datetime, json, hashlib, base64, zipfile
ImportError: No module named M2Crypto
[swright@localhost app]$ python -m site
sys.path = [
'/home/swright/workspace/botnet_etl/app',
'/usr/lib/python2.6/site-packages/pymongo-2.3-py2.6-linux-x86_64.egg',
'/usr/lib/python2.6/site-packages/pip-1.2.1-py2.6.egg',
'/usr/lib64/python26.zip',
'/usr/lib64/python2.6',
'/usr/lib64/python2.6/plat-linux2',
'/usr/lib64/python2.6/lib-tk',
'/usr/lib64/python2.6/lib-old',
'/usr/lib64/python2.6/lib-dynload',
'/usr/lib64/python2.6/site-packages',
'/usr/lib64/python2.6/site-packages/gst-0.10',
'/usr/lib64/python2.6/site-packages/gtk-2.0',
'/usr/lib64/python2.6/site-packages/webkit-1.0',
'/usr/lib/python2.6/site-packages',
'/usr/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg-info',
]
USER_BASE: '/home/swright/.local' (exists)
USER_SITE: '/home/swright/.local/lib/python2.6/site-packages' (doesn't exist)
ENABLE_USER_SITE: True
[swright@localhost app]$
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to detect the encoding used for a specific text data ?

2012-12-20 Thread Stefan H. Holek
On 20.12.2012, at 12:57, iMath wrote:

> how to detect the encoding used for a specific text data ?

http://pypi.python.org/pypi?%3Aaction=search&term=detect+encoding

-- 
Stefan H. Holek
ste...@epy.co.at

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to detect the encoding used for a specific text data ?

2012-12-20 Thread iMath
which package to use ?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to detect the encoding used for a specific text data ?

2012-12-20 Thread Jussi Piitulainen
iMath writes:

> which package to use ?

Read the text in as a "bytes object" (bytes); then it has a .decode
method that you can experiment with. Strings (str) are Unicode and
have an .encode method. These methods allow you to specify a desired
encoding and what to do when there are errors.

help(bytes.decode)
help(str.encode)
help(open)


In Python 2.7 and before, strings seem to do double duty and have both
the .encode and .decode methods, so Python version matters here.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to detect the encoding used for a specific text data ?

2012-12-20 Thread Christian Heimes
On 20.12.2012 12:57, iMath wrote:
>  how to detect the encoding used for a specific text data ?

You can't.

It's not possible unless the file format can specify the encoding
somehow, e.g. like XML's header <?xml version="1.0" encoding="utf-8"?>.
Sometimes you can try and make an educated guess. But it's just a guess
and it may give you wrong results.

Christian
-- 
http://mail.python.org/mailman/listinfo/python-list


Python3 + sqlite3: Where's the bug?

2012-12-20 Thread Johannes Bauer
Hi group,

I've run into a problem using Python3.2 and sqlite3 db access that I
can't quite wrap my head around. I'm pretty sure there's a bug in my
program, but I can't see where. Help is greatly appreciated. I've
created a minimal example to demonstrate the phenomenon (attached at
bottom).

First, the program creates a db and inits two tables "foo" and "bar",
which both only have an "int" value. Then "foo" is populated with unique
ints.

A fetchmanychks function is supposed to have the same behavior as
fetchall(), but instead perform the operation in many subsequent
fetchmany() chunks.

When I traverse the foo table using cursor cur1 and insert into the bar
table using cursor cur2, I receive at some point:

Traceback (most recent call last):
  File "y.py", line 25, in 
cur2.execute("INSERT INTO bar (id) VALUES (?);", (v,))
sqlite3.IntegrityError: PRIMARY KEY must be unique

Which means that the fetchmany() read returns the *same* value again!
How is this possible? If I either

- Remove the "db.commit()"
- Replace fetchmanychks(cur1) by cur1.fetchall()

it works without error -- but I want neither (I want regular commits
because sqlite3 becomes horribly slow when the journal becomes large, the
tables have nothing to do with each other anyway, and atomicity is not
needed in my case).

Do I grossly misunderstand fetchmany() or where's my bug here?

Thanks in advance,
Joe




#!/usr/bin/python3.2
import sqlite3

db = sqlite3.connect("foobar.sqlite")
cur1 = db.cursor()
cur2 = db.cursor()

def fetchmanychks(cursor):
cursor.execute("SELECT id FROM foo;")
while True:
result = cursor.fetchmany()
if len(result) == 0:
break
for x in result:
yield x

cur1.execute("CREATE TABLE foo (id integer PRIMARY KEY);")
cur1.execute("CREATE TABLE bar (id integer PRIMARY KEY);")
for i in range(0, 20, 5):
cur1.execute("INSERT INTO foo VALUES (?);", (i,))
db.commit()

ctr = 0
for (v, ) in fetchmanychks(cur1):
cur2.execute("INSERT INTO bar (id) VALUES (?);", (v,))
ctr += 1
if ctr == 100:
db.commit()
ctr = 0



-- 
>> Where exactly had you predicted the quake again?
> At least not publicly!
Ah, the newest and to this day most ingenious trick of our great
cosmologists: the secret prediction.
 - Karl Kaos about Rüdiger Thomas in dsa
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Py 3.3, unicode / upper()

2012-12-20 Thread Johannes Bauer
On 19.12.2012 16:40, Chris Angelico wrote:

> You may not be familiar with jmf. He's one of our resident trolls, and
> he has a bee in his bonnet about PEP 393 strings, on the basis that
> they take up more space in memory than a narrow build of Python 3.2
> would, for a string with lots of BMP characters and one non-BMP.

I was not :-( Thanks for the heads up and the good summary on what the
issue was about.

Best regards,
Johannes

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python3 + sqlite3: Where's the bug?

2012-12-20 Thread Chris Angelico
On Fri, Dec 21, 2012 at 1:52 AM, Johannes Bauer  wrote:
> def fetchmanychks(cursor):
> cursor.execute("SELECT id FROM foo;")
> while True:
> result = cursor.fetchmany()
> if len(result) == 0:
> break
> for x in result:
> yield x

I'm not familiar with sqlite, but from working with other databases,
I'm wondering if possibly your commits are breaking the fetchmany.

Would it spoil your performance improvements to do all the fetchmany
calls before yielding anything? Alternatively, can you separate the
two by opening a separate database connection for the foo-reading (so
it isn't affected by the commit)?

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python3 + sqlite3: Where's the bug?

2012-12-20 Thread Johannes Bauer
On 20.12.2012 16:05, Chris Angelico wrote:
> On Fri, Dec 21, 2012 at 1:52 AM, Johannes Bauer  wrote:
>> def fetchmanychks(cursor):
>> cursor.execute("SELECT id FROM foo;")
>> while True:
>> result = cursor.fetchmany()
>> if len(result) == 0:
>> break
>> for x in result:
>> yield x
> 
> I'm not familiar with sqlite, but from working with other databases,
> I'm wondering if possibly your commits are breaking the fetchmany.

Hmm, but this:

def fetchmanychks(cursor):
cursor.execute("SELECT id FROM foo;")
while True:
result = cursor.fetchone()
if result is not None:
yield result
else:
break

Works nicely -- only the fetchmany() makes the example break.

> Would it spoil your performance improvements to do all the fetchmany
> calls before yielding anything?

Well this would effectively then be a fetchall() call -- this is
problematic since the source data is LARGE (speaking of gigabytes of
data here).

> Alternatively, can you separate the
> two by opening a separate database connection for the foo-reading (so
> it isn't affected by the commit)?

At that point in the code I don't actually have a filename anymore,
merely the connection. But shouldn't the cursor actually be the
"correct" solution? I.e. in theory, should the example work at all or am
I thinking wrong?

Because if I'm approaching this from the wrong angle, I'll have no
choice but to change all that code to open separate connections to the
same file (something that currently are no provisions for).

Best regards,
Johannes

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python3 + sqlite3: Where's the bug?

2012-12-20 Thread Chris Angelico
On Fri, Dec 21, 2012 at 2:20 AM, Johannes Bauer  wrote:
> Hmm, but this:
>
> result = cursor.fetchone()
> yield result
>
> Works nicely -- only the fetchmany() makes the example break.

Okay, now it's sounding specific to sqlite. I'll bow out. :)

>
>> Would it spoil your performance improvements to do all the fetchmany
>> calls before yielding anything?
>
> Well this would effectively then be a fetchall() call -- this is
> problematic since the source data is LARGE (speaking of gigabytes of
> data here).

That would be a "yes", then. Scratch that!

>> Alternatively, can you separate the
>> two by opening a separate database connection for the foo-reading (so
>> it isn't affected by the commit)?
>
> At that point in the code I don't actually have a filename anymore,
> merely the connection. But shouldn't the cursor actually be the
> "correct" solution? I.e. in theory, should the example work at all or am
> I thinking wrong?

You say "db.commit()", not "cur2.commit()", so I don't see that a
cursor would un-break what part-way commits is breaking.

> Because if I'm approaching this from the wrong angle, I'll have no
> choice but to change all that code to open separate connections to the
> same file (something that currently are no provisions for).

Is that an sqlite limitation, or just one of your code?

I poked around at the sqlite3 docs, but didn't find any obvious
"clone" option on the connection, nor a way to retrieve the file name.
That would have been fairly convenient. Oh well.

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python3 + sqlite3: Where's the bug?

2012-12-20 Thread Hans Mulder
On 20/12/12 16:20:13, Johannes Bauer wrote:
> On 20.12.2012 16:05, Chris Angelico wrote:
>> On Fri, Dec 21, 2012 at 1:52 AM, Johannes Bauer  wrote:
>>> def fetchmanychks(cursor):
>>> cursor.execute("SELECT id FROM foo;")
>>> while True:
>>> result = cursor.fetchmany()
>>> if len(result) == 0:
>>> break
>>> for x in result:
>>> yield x
>>
>> I'm not familiar with sqlite, but from working with other databases,
>> I'm wondering if possibly your commits are breaking the fetchmany.

Yes, that's what it looks like.

I think that should be considered a bug in fetchmany.

> Hmm, but this:
> 
> def fetchmanychks(cursor):
>   cursor.execute("SELECT id FROM foo;")
>   while True:
>   result = cursor.fetchone()
>   if result is not None:
>   yield result
>   else:
>   break
> 
> Works nicely -- only the fetchmany() makes the example break.

On my system, fetchmany() defaults to returning only one row.

The documentation says that the default should be the optimal
number of rows per chunk for the underlying database engine.
If the optimum is indeed fetching one row at a time, then
maybe you could consider using fetchone() as a work-around.
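That default is easy to verify: the DB-API `arraysize` attribute on sqlite3 cursors starts at 1, and fetchmany() returns `arraysize` rows when called without an argument:

```python
import sqlite3

db = sqlite3.connect(":memory:")
cur = db.cursor()
cur.execute("CREATE TABLE t (id integer PRIMARY KEY)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in (1, 2, 3)])
cur.execute("SELECT id FROM t ORDER BY id")

assert cur.arraysize == 1               # DB-API default chunk size
assert cur.fetchmany() == [(1,)]        # hence one row per fetchmany() call
assert cur.fetchmany(2) == [(2,), (3,)] # explicit size overrides the default
assert cur.fetchmany() == []            # result set exhausted
```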

>> Would it spoil your performance improvements to do all the
>> fetchmany calls before yielding anything?
> 
> Well this would effectively then be a fetchall() call -- this is
> problematic since the source data is LARGE (speaking of gigabytes
> of data here).
> 
>> Alternatively, can you separate the
>> two by opening a separate database connection for the foo-reading
>> (so it isn't affected by the commit)?
> 
> At that point in the code I don't actually have a filename anymore,
> merely the connection. But shouldn't the cursor actually be the
> "correct" solution? I.e. in theory, should the example work at all
> or am I thinking wrong?

I think you're right and that fetchmany is broken.

> Because if I'm approaching this from the wrong angle, I'll have no
> choice but to change all that code to open separate connections to
> the same file (something that currently are no provisions for).
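One way out that keeps both chunked reads and regular commits is keyset pagination: each chunk is a fresh, fully-fetched SELECT, so no open result set ever spans a commit. A sketch (in-memory database, illustrative names, not Johannes's actual code):

```python
import sqlite3

db = sqlite3.connect(":memory:")
cur1 = db.cursor()
cur2 = db.cursor()
cur1.execute("CREATE TABLE foo (id integer PRIMARY KEY)")
cur1.execute("CREATE TABLE bar (id integer PRIMARY KEY)")
cur1.executemany("INSERT INTO foo VALUES (?)", [(i,) for i in range(100)])
db.commit()

def iter_ids(cursor, chunk=10):
    # Keyset pagination: restart the SELECT after the last seen key.
    # Each chunk is fetched completely before any row is yielded, so an
    # intervening commit cannot invalidate a half-read result set.
    last = -1
    while True:
        rows = cursor.execute(
            "SELECT id FROM foo WHERE id > ? ORDER BY id LIMIT ?",
            (last, chunk)).fetchall()
        if not rows:
            break
        for (v,) in rows:
            yield v
        last = rows[-1][0]

for n, v in enumerate(iter_ids(cur1), 1):
    cur2.execute("INSERT INTO bar (id) VALUES (?)", (v,))
    if n % 25 == 0:
        db.commit()          # safe: the reader holds no open result set
db.commit()
assert cur2.execute("SELECT COUNT(*) FROM bar").fetchone()[0] == 100
```

The trade-off is one extra index lookup per chunk, which the primary-key index makes cheap.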


Hope this helps,

-- HansM
-- 
http://mail.python.org/mailman/listinfo/python-list


Data Driven Process

2012-12-20 Thread balparmak

I need to come up with Python code to automate a map creation process, but I 
am a little lacking in Python knowledge.

My problem is that I have 20 mxd files which need an update process in a 
timely manner (this is just one project), plus PDF creation for each mxd and 
then appending the PDFs. At the moment, I have a tool which creates a map 
index shape file at the layout view for each mxd.

I want to:
1. Loop through the index shape file's attributes (first one in the table of 
contents)
2. Select or zoom into the first row, then
 - get the mxd name from the attribute table (it's available in the 
attribute table)
 - specify the destination folder to save the mxd (I think I can get 
that from the attribute table as well) or specify it in the code
 - save the mxd to the specified location
 - specify the destination folder to save the PDF file
 - save the PDF to the specified location
3. Move to the next row and do the same process as in step 2
4. When the last row is reached, append all PDF files created into the same 
location as the PDFs
5. Close

Any help would be much appreciated.

Thank you 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python3 + sqlite3: Where's the bug?

2012-12-20 Thread inq1ltd
On Thursday, December 20, 2012 03:52:39 PM Johannes Bauer wrote:
> Hi group,
> 
> I've run into a problem using Python3.2 and sqlite3 db access that I
> can't quite wrap my head around. I'm pretty sure there's a bug in my
> program, but I can't see where. Help is greatly appreciated. I've
> created a minimal example to demonstrate the phaenomenon (attached at
> bottom).
> 
> First, the program creates a db and inits two tables "foo" and "bar",
> which both only have a "int" value. Then "foo" is populated with unique
> ints.
> 
> A fetchmanychks function is supposed to have the same behavior as
> fetchall(), but instead perform the operation in many subsequent
> fetchmany() chunks.
> 
> When I traverse the foo table using cursor cur1 and insert into the bar
> table using cursor cur2, I receive at some point:
> 
> Traceback (most recent call last):
>   File "y.py", line 25, in 
> cur2.execute("INSERT INTO bar (id) VALUES (?);", (v,))
> sqlite3.IntegrityError: PRIMARY KEY must be unique
> 
> Which means that the fetchmany() read returns the *same* value again!
> How is this possible? If I either
> 
> - Remove the "db.commit()"
> - Replace fetchmanychks(cur1) by cur1.fetchall()
> 
> it works without error -- but I want neither (I want regular commits
> because sqlite3 becomes horribly slow when the journal becomes large and
> the tables nothing to do with each other anyways and atomicity is not
> needed in my case).
> 
> Do I grossly misunderstand fetchmany() or where's my bug here?
> 
> Thanks in advance,
> Joe
> 
> 
Joe,

Both of the following addresses will get you to the same place.

You will get an answer from the sqlite help site.

sqlite-us...@sqlite.org

General Discussion of SQLite Database 


jd
inqvista.com


> 
> 
> #!/usr/bin/python3.2
> import sqlite3
> 
> db = sqlite3.connect("foobar.sqlite")
> cur1 = db.cursor()
> cur2 = db.cursor()
> 
> def fetchmanychks(cursor):
>   cursor.execute("SELECT id FROM foo;")
>   while True:
>   result = cursor.fetchmany()
>   if len(result) == 0:
>   break
>   for x in result:
>   yield x
> 
> cur1.execute("CREATE TABLE foo (id integer PRIMARY KEY);")
> cur1.execute("CREATE TABLE bar (id integer PRIMARY KEY);")
> for i in range(0, 20, 5):
>   cur1.execute("INSERT INTO foo VALUES (?);", (i,))
> db.commit()
> 
> ctr = 0
> for (v, ) in fetchmanychks(cur1):
>   cur2.execute("INSERT INTO bar (id) VALUES (?);", (v,))
>   ctr += 1
>   if ctr == 100:
>   db.commit()
>   ctr = 0
> 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Data Driven Process

2012-12-20 Thread Grant Rettke
If you set up 2 sample mxd files that are dead simple, along with some
code to demonstrate what you are attempting, and some unit tests, then
you will be helping people to help you.

Most people probably do not have experience or familiarity with what
you are attempting.

On Thu, Dec 20, 2012 at 11:08 AM,   wrote:
> mxd file



-- 
Grant Rettke | ACM, AMA, COG, IEEE
gret...@acm.org | http://www.wisdomandwonder.com/
Wisdom begins in wonder.
((λ (x) (x x)) (λ (x) (x x)))
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to detect the encoding used for a specific text data ?

2012-12-20 Thread rurpy
On Thursday, December 20, 2012 4:57:19 AM UTC-7, iMath wrote:
> how to detect the encoding used for a specific text data ?

The chardet package will probably do what you want:
  http://pypi.python.org/pypi/chardet

  
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Data Driven Process

2012-12-20 Thread balparmak
Thank you for your reply, Grant.

I am trying to attach mxd's but no chance. As you said, I don't have much 
experience in Python. I used to work with VBA, but it's not an option anymore 
with the new ArcGIS 10.

How can I add mxd's here? 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Data Driven Process

2012-12-20 Thread Grant Rettke
Maybe try posting them on your blog.

On Thu, Dec 20, 2012 at 12:08 PM,  wrote:

> Thank you for your reply Grant,
>
> I am trying to attach mxd's but no chance. As you said that i dont have
> much experience in python. I used to work with VBA but its not an option
> anymore with new ArcGIS 10.
>
> How can I add mxd's here?
> --
> http://mail.python.org/mailman/listinfo/python-list
>



-- 
Grant Rettke | ACM, AMA, COG, IEEE
gret...@acm.org | http://www.wisdomandwonder.com/
Wisdom begins in wonder.
((λ (x) (x x)) (λ (x) (x x)))
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Virtualenv loses context

2012-12-20 Thread Ian Kelly
On Thu, Dec 20, 2012 at 5:50 AM,   wrote:
> Brought my laptop out of hibernation to do some work this morning. I 
> attempted to run one of my ETLs and got the following error. I made no 
> changes since it was running yesterday.
>
>
>
> [swright@localhost app]$ python etl_botnet_meta.py --mode dev -f

Are you sure you activated the virtual env?  I see no indication of it
in your prompt.
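A quick way to check from inside the interpreter whether a virtualenv is active (the attribute names differ between classic virtualenv and the stdlib venv, hence the getattr):

```python
import sys

def in_virtualenv():
    # Classic virtualenv sets sys.real_prefix; venv makes sys.base_prefix
    # differ from sys.prefix.  Outside any env, both checks fail.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix

print(in_virtualenv())
```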
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Py 3.3, unicode / upper()

2012-12-20 Thread wxjmfauth
Fact:
In order to work comfortably and efficiently with a "scheme for
the coding of characters", be it unicode or any other coding scheme,
one has to take two things into account: 1) work with a unique set
of characters and 2) work with a contiguous block of code points.

At this point, it should be noticed that I did not even write about
the real coding, only about characters and code points.

Now, let's take a look at what happens when one breaks the rules
above and, precisely, if one attempts to work with multiple
characters sets or if one divides - artificially - the whole range
of the unicode code points in chunks.

The first (and it should be quite obvious) consequence is that
you create bloated, unnecessary and useless code. I simplify
the flexible string representation (FSR) and will use an "ascii" / 
"non-ascii" model/terminology.

If you are an "ascii" user, the FSR model makes no sense. An
"ascii" user will use, per definition, only "ascii" characters.

If you are a "non-ascii" user, the FSR model is also nonsense,
because you are per definition a "non-ascii" user of
"non-ascii" characters. Any optimisation for "ascii" users just
becomes irrelevant.

In one sense, to escape from this, you have to be at the same time
a non "ascii" user and a non "non-ascii" user. Impossible.
In both cases, a FSR model is useless and in both cases you are
forced to use bloated and unnecessary code.

The rule is to treat every character of a unique set of characters
of a coding scheme in, how to say, an "equal way". The problematic
can be seen the other way, every coding scheme has been built
to work with a unique set of characters, otherwhile it is not
properly working!

The second negative aspect of this splitting, is just the 
splitting itsself. One can optimize every subset of characters,
one will always be impacted by the "switch" between the subsets.
One more reason to work with a unique set characters or this is
the reason why every coding scheme handle a unique set of
characters.

Up to now, I have spoken only about characters and sets of
characters, not about the coding of the characters.
There is a point which is quite hard to understand and also hard
to explain. It becomes obvious with some experience.

When one works with a coding scheme, one always has to think in
characters / code points. If one takes the perspective of encoded
code points, it simply does not work, or may not work very well
(memory/speed). The whole problem is that it is impossible to
work with characters directly; one is forced to manipulate encoded code
points as characters. Unicode is built and thought to work with
code points, not with encoded code points. The serialization,
the transformation code point -> encoded code point, is "only" a
technical and secondary process. Surprise: all the Unicode
coding schemes (utf-8, 16, 32) work with the same
set of characters. They differ in the serialization, but
they all work with a unique set of characters.
The utf-16 / ucs-2 pair is an interesting case. Their encoding
mechanisms are nearly the same; the difference lies in the sets of
characters.

There is another way to understand the problem empirically:
the historical evolution of character coding. Practically
all the coding schemes have been created to handle different sets of
characters, or new coding schemes have been created because that was
the only way to work properly. If it had been possible to work
with multiple coding schemes, I'm pretty sure a solution would
have emerged. It never happened, and otherwise it would not have been
necessary to create iso10646 or Unicode. Nor would it have been necessary
to create all these codings iso-8859-***, cp***, mac**, which are
all *based on sets of characters*.

plan9 attempted to work with multiple character sets; it did not
work very well. The main issue: the switch between the codings.

A solution à la FSR cannot work, or cannot work in an optimized way.
It is not a coding scheme, it is a composite of coding schemes
handling several character sets. Hard to imagine something worse.

Contrary to what has been said, the bad cases I presented here are
not corner cases. There is practically and systematically a regression
in Py33 compared to Py32.
That's very easy to test. I did all my tests in the light of what
I explained above. It was no surprise for me to see this expectedly
bad behaviour.

Python is not my tool. If I may give a piece of advice: take a
scientific approach.
I suggest the core devs first spend their time proving that
an FSR model can beat the existing models (purely on the C level).
Then, if they succeed, implement it.

My feeling is that most people are defending this FSR simply
because it exists, not because of its intrinsic quality.

Hint: I suggest the experts take a comprehensive look at the
cmap table of OpenType fonts (pure Unicode technology).
Those people know how to work.

I would be very happy to be wrong. Unfortunately, I'm afraid
it's not t

Re: Py 3.3, unicode / upper()

2012-12-20 Thread wxjmfauth
On Wednesday, December 19, 2012, 22:31:42 UTC+1, Ian wrote:
> On Wed, Dec 19, 2012 at 2:18 PM,   wrote:
> 
> > latin-1 (iso-8859-1) ? are you sure ?
> 
> 
> 
> Yes.
> 
> 
> 
> >>> sys.getsizeof('a')
> 
> 26
> 
> >>> sys.getsizeof('ab')
> 
> 27
> 
> >>> sys.getsizeof('aé')
> 
> 39
> 
> 
> 
> Compare to:
> 
> 
> 
> >>> sys.getsizeof('a\u0100')
> 
> 42
> 
> 
> 
> The reason for the difference you posted is that pure ASCII strings
> 
> have a further optimization, which I glossed over and which is purely
> 
> a savings in overhead:
> 
> 
> 
> >>> sys.getsizeof('abcde') - sys.getsizeof('a')
> 
> 4
> 
> >>> sys.getsizeof('ábçdê') - sys.getsizeof('á')
> 
> 4

-

I know all of this. And this is exactly what I explained.
I do not care about this optimization. I'm not an ascii user.
As a non-ascii user, this optimization is simply irrelevant.

What should a Python user think if he sees his strings
consuming more memory just because he uses non-ascii
characters, or sees his strings changing just because
he "uppercases" them?
Unicode is here to serve everybody.

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Py 3.3, unicode / upper()

2012-12-20 Thread wxjmfauth
On Wednesday, December 19, 2012, 22:23:15 UTC+1, Ian wrote:
> On Wed, Dec 19, 2012 at 1:55 PM,   wrote:
> 
> > Yes, it is correct (or can be considered correct).
> 
> > I do not wish to discuss the typographical problem
> 
> > of "Das Grosse Eszett". The web is full of pages on the
> 
> > subject. However, I never succeeded in finding an "official
> 
> > position" from Unicode. The best information I found seems
> 
> > to indicate (to converge on) that U+1E9E is now the "supported"
> 
> > uppercase form of U+00DF (see DIN).
> 
> 
> 
> Is this link not official?
> 
> 
> 
> http://unicode.org/cldr/utility/character.jsp?a=00DF
> 
> 
> 
> That defines a full uppercase mapping to SS and a simple uppercase
> 
> mapping to U+00DF itself, not U+1E9E.  My understanding of the simple
> 
> mapping is that it is not allowed to map to multiple characters,
> 
> whereas the full mapping is so allowed.
> 
> 
> 
> > What is bothering me, is more the implementation. The Unicode
> 
> > documentation says roughly this: if something can not be
> 
> > honoured, there is no harm, but do not implement a workaround.
> 
> > In that case, I'm not sure Python is doing the best.
> 
> 
> 
> But this behavior is per the specification, not a workaround.  I think
> 
> the worst thing we could do in this regard would be to start diverging
> 
> from the specification because we think we know better than the
> 
> Unicode Consortium.
> 
> 
> 
> 
> 
> > If "wrong", this can be considered as programmatically correct
> 
> > or logically acceptable (Py3.2)
> 
> >
> 
> >>> 'Straße'.upper().lower().capitalize() == 'Straße'
> 
> True
> 
> >
> 
> > while this will *always* be problematic (Py3.3)
> 
> >
> 
> >>> 'Straße'.upper().lower().capitalize() == 'Straße'
> 
> False
> 
> 
> 
> On the other hand (Py3.2):
> 
> 
> 
> >>> 'Straße'.upper().isupper()
> 
> False
> 
> 
> 
> vs. Py3.3:
> 
> 
> 
> >>> 'Straße'.upper().isupper()
> 
> True
> 
> 
> 
> There is probably no one clearly correct way to handle the problem,
> 
> but personally this contradiction bothers me more than the example
> 
> that you posted.



At least, we agree on the problem posed by this very special case.
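For reference, the behaviours both posters quote can be checked directly in Python 3.3 or later, where upper() applies the full case mapping of U+00DF (to "SS") while 3.2 left it unchanged:

```python
# Python 3.3+: upper() uses the full case mapping, so 'ß' maps to 'SS'.
s = 'Straße'
up = s.upper()
print(up)            # STRASSE
print(up.isupper())  # True (unlike 3.2, where 'STRAßE'.isupper() was False)

# The round trip no longer reproduces the original string:
print(up.lower().capitalize() == s)  # False: 'Strasse' != 'Straße'
```

This is exactly the trade-off discussed above: 3.3 follows the full mapping from the Unicode case-folding tables, at the price of `upper()` no longer being invertible for such strings.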

jmf
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Py 3.3, unicode / upper()

2012-12-20 Thread wxjmfauth
On Thursday, December 20, 2012, 06:32:42 UTC+1, Terry Reedy wrote:
> On 12/19/2012 10:12 PM, Westley Martínez wrote:
> 
> > On Wed, Dec 19, 2012 at 09:54:20PM -0500, Terry Reedy wrote:
> 
> >> On 12/19/2012 9:03 PM, Chris Angelico wrote:
> 
> >>> On Thu, Dec 20, 2012 at 5:27 AM, Ian Kelly  wrote:
> 
>   From what I've been able to discern, [jmf's] actual complaint about PEP
> 
>  393 stems from misguided moral concerns.  With PEP-393, strings that
> 
>  can be fully represented in Latin-1 can be stored in half the space
> 
>  (ignoring fixed overhead) compared to strings containing at least one
> 
>  non-Latin-1 character.  jmf thinks this optimization is unfair to
> 
>  non-English users and immoral; he wants Latin-1 strings to be treated
> 
>  exactly like non-Latin-1 strings (I don't think he actually cares
> 
>  about non-BMP strings at all; if narrow-build Unicode is good enough
> 
>  for him, then it must be good enough for everybody).
> 
> >>>
> 
> >>> Not entirely; most of his complaints are based on performance (speed
> 
> >>> and/or memory) of 3.3 compared to a narrow build of 3.2, using silly
> 
> >>> edge cases to prove how much worse 3.3 is, while utterly ignoring the
> 
> >>> fact that, in those self-same edge cases, 3.2 is buggy.
> 
> >>
> 
> >> And the fact that stringbench.py is overall about as fast with 3.3
> 
> >> as with 3.2 *on the same Windows 7 machine* (which uses narrow build
> 
> >> in 3.2), and that unicode operations are not far from bytes
> 
> >> operations when the same thing can be done with both.
> 
> >>
> 
> >> --
> 
> >> Terry Jan Reedy
> 
> >
> 
> > Really, why should we be so obsessed with speed anyways?  Isn't
> 
> > improving the language and fixing bugs far more important?
> 
> 
> 
> Being conservative, there are probably at least 10 enhancement patches 
> 
> and 30 bug fix patches for every performance patch. Performance patches 
> 
> are considered enhancements and only go in new versions with 
> 
> enhancements, where they go through the extended alpha, beta, candidate 
> 
> test and evaluation process.
> 
> 
> 
> In the unicode case, Jim discovered that find was several times slower 
> 
> in 3.3 than 3.2 and claimed that that was a reason to not use 3.3. I ran 
> 
> the complete stringbench.py and discovered that find (and consequently 
> 
> find and replace) are the only operations with such a slowdown. I also 
> 
> discovered that another at least as common operation, encoding strings 
> 
> that only contain ascii characters to ascii bytes for transmission, is 
> 
> several times as fast in 3.3. So I reported that unless one is only 
> 
> finding substrings in long strings, there is no reason to not upgrade to 
> 
> 3.3.
> 
> 
> 
> -- 
> 
> Terry Jan Reedy



I showed a case where Py33 works 10 times slower than Py32,
"replace". You devs spent your time correcting that case.

Now, if I put on the table an example working 20 times
slower, will you spend your time optimizing that?

I'm afraid it is the FSR which is problematic, not the
corner cases.

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Py 3.3, unicode / upper()

2012-12-20 Thread MRAB

On 2012-12-20 19:19, wxjmfa...@gmail.com wrote:

> Fact.
> In order to work comfortably and efficiently with a "scheme for
> the coding of characters", be it Unicode or any other coding scheme,
> one has to take into account two things: 1) work with a unique set
> of characters and 2) work with a contiguous block of code points.
>
> At this point, it should be noticed that I have not even written about
> the actual encoding, only about characters and code points.
>
> Now, let's take a look at what happens when one breaks the rules
> above, precisely, when one attempts to work with multiple
> character sets or divides, artificially, the whole range
> of the Unicode code points into chunks.
>
> The first (and quite obvious) consequence is that
> you create bloated, unnecessary and useless code. I will simplify
> the flexible string representation (FSR) and use an "ascii" /
> "non-ascii" model/terminology.
>
> If you are an "ascii" user, an FSR model makes no sense. An
> "ascii" user will, by definition, use only "ascii" characters.
>
> If you are a "non-ascii" user, the FSR model is also nonsense,
> because you are by definition a "non-ascii" user of
> "non-ascii" characters. Any optimisation for "ascii" users
> simply becomes irrelevant.
>
> In one sense, to escape from this, you would have to be at the same time
> a non-"ascii" user and a non-"non-ascii" user. Impossible.
> In both cases, an FSR model is useless and in both cases you are
> forced to use bloated and unnecessary code.
>
> The rule is to treat every character of a unique set of characters
> of a coding scheme in, how to say, an "equal way". The problem
> can be seen the other way around: every coding scheme has been built
> to work with a unique set of characters, otherwise it would not
> work properly!


[snip]
It's true that in an ideal world you would treat all codepoints the
same. However, this is a case where "practicality beats purity".

In order to accommodate every codepoint you need 3 bytes per codepoint
(although for pragmatic reasons it's 4 bytes per codepoint).

But not all codepoints are used equally. Those in the "astral plane",
for example, are used rarely, so the vast majority of the time you
would be using twice as much memory as strictly necessary. There are
also, in reality, many times in which strings contain only ASCII-range
codepoints, although they may not be visible to the average user, being
the names of functions and attributes in program code, or tags and
attributes in HTML and XML.

FSR is a pragmatic solution to dealing with limited resources.

Would you prefer there to be a switch that makes strings always use 4
bytes per codepoint for those users and systems where memory is no
object?
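The size differences under discussion are easy to observe with `sys.getsizeof`. Exact byte counts vary with the CPython version and platform, so only the relative ordering matters; this sketch assumes a PEP 393 (3.3+) build:

```python
import sys

ascii_s  = 'abcd'           # 1 byte per code point, compact ASCII layout
latin1_s = 'abcé'           # still 1 byte per code point, a bit more overhead
bmp_s    = 'abc\u0100'      # 2 bytes per code point
astral_s = 'abc\U0001F600'  # 4 bytes per code point

# All four strings have four characters; only the widest code point
# present determines the per-character storage width.
sizes = [sys.getsizeof(s) for s in (ascii_s, latin1_s, bmp_s, astral_s)]
print(sizes)
```

On a narrow 3.2 build the first three would cost the same per character and the astral string would silently become two code units; under FSR each string pays only for the widest character it actually contains.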
--
http://mail.python.org/mailman/listinfo/python-list


Strange effect with import

2012-12-20 Thread Jens Thoms Toerring
Hi,

   I hope that this isn't a stupid question, asked already a
hundred times, but I haven't found anything definitive on
the problem I got bitten by. I have two Python files like
this:

 S1.py --
import random
import S2

class R( object ) :
r = random.random( )

if __name__ == "__main__" :
print R.r
S2.p( )

 S2.py --
import S1

def p( ) :
print S1.R.r

and my expectation was that the static variable 'r' of class
R would be identical when accessed from S1.py and S2.py.
Unfortunately, that isn't the case, the output is different
(and R seems to get instantiated twice).

But when I define R in S2.py instead

 S1.py --
import S2

print S2.R.r
S2.p( )

 S2.py --
import random

class R( object ) :
r = random.random( )

def p( ) :
print R.r

or, alternatively, if I put the definition of class R into
a third file which I then import from the other 2 files,
things suddenly start to work as expected. Can someone
explain what's going on here? I found this a bit
surprising.

This is, of course, not my "real" code - it would be much
more sensible to pass the number to the function in the
second file as an argument - but it is the smallest possible
program I could come up with that demonstrates the
problem. In my "real" code it's unfortunately not possible
to pass that number to whatever is going to use it in the
other file; I have to simulate a kind of global variable
shared between different files.

Best regards, Jens
-- 
  \   Jens Thoms Toerring  ___  j...@toerring.de
   \__  http://toerring.de
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Data Driven Process

2012-12-20 Thread balparmak
Hi Grant,

Can you help me with this?

I am working with the Python code below in ArcGIS, trying to zoom, one by one,
to the features of each row of a shapefile's attribute table, through to the
end of the table, without having to select them first.

I am trying to use the code below, but it requires that a row be selected.

import arcpy

mxd = arcpy.mapping.MapDocument('CURRENT')

df = arcpy.mapping.ListDataFrames(mxd, "Layers") [0]

df.zoomToSelectedFeatures()

arcpy.RefreshActiveView()

Any Help? 

Thanks

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Data Driven Process

2012-12-20 Thread Jeffrey Ciesla
I'm just learning Python, so I doubt I could be much help, but I'd like to see 
how this progresses, maybe learn a little more about the language.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Strange effect with import

2012-12-20 Thread Dave Angel
On 12/20/2012 03:39 PM, Jens Thoms Toerring wrote:
> Hi,
>
>I hope that this isn't a stupid question, asked already a
> hundred times, but I haven't found anything definitive on
> the problem I got bitten by. I have two Python files like
> this:
>
>  S1.py --
> import random
> import S2
>
> class R( object ) :
> r = random.random( )
>
> if __name__ == "__main__" :
> print R.r
> S2.p( )
>
>  S2.py --
> import S1

You have a big problem right here.  You have two modules importing each
other.  Any time you have direct or indirect mutual imports, you have
the potential for trouble.

That trouble gets much worse since you are actually running one of these
as a script.  Presumably you're running S1.py as a script.  The script's
module object is NOT the same one as the other module S2 gets by
importing S1.  Don't do that.

Move the common code into a third module, and import that one from both
places.  Then it'll only exist once.
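The fix described above can be sketched end to end. The file names here (`shared.py`, `s1.py`, `s2.py`) are illustrative, not the OP's real ones; the point is that the class with the random class attribute lives in exactly one module that both others import:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

FILES = {
    # The class with the random class attribute lives in ONE module...
    "shared.py": """
        import random
        class R(object):
            r = random.random()
    """,
    # ...and both other modules import it from there.
    "s2.py": """
        import shared
        def p():
            return shared.R.r
    """,
    "s1.py": """
        import shared
        import s2
        if __name__ == "__main__":
            # Same module object in both places -> same attribute value.
            print(shared.R.r == s2.p())
    """,
}

with tempfile.TemporaryDirectory() as d:
    for name, body in FILES.items():
        with open(os.path.join(d, name), "w") as f:
            f.write(textwrap.dedent(body))
    result = subprocess.run([sys.executable, "s1.py"], cwd=d,
                            capture_output=True, text=True)

print(result.stdout.strip())  # True
```

With the circular import, the same comparison printed False, because the script's copy of the class and the imported copy were distinct objects.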



> def p( ) :
> print S1.R.r
>
> and my expectation was that the static variable 'r' of class
> R would be identical when accessed from S1.py and S2.py.
> Unfortunately, that isn't the case, the output is different
> (and R seems to get instantiated twice).
>
> But when I define R in S2.py instead
>
>  S1.py --
> import S2
>
> print S2.R.r
> S2.p( )
>
>  S2.py --
> import random
>
> class R( object ) :
> r = random.random( )
>
> def p( ) :
> print R.r
>
> or, alternatively, if I put the definition of class R into
> a third file which I then import from the other 2 files,
> things suddenly start to work as expected. Can someone
> explain what's going on here? I found this a bit
> surprising.
>
> This is, of course, not my "real" code - it would be much
> more sensible to pass the number to the function in the
> second file as an argument - but it is the smallest possible
> program I could come up with that demonstrates the
> problem. In my "real" code it's unfortunately not possible
> to pass that number to whatever is going to use it in the
> other file; I have to simulate a kind of global variable
> shared between different files.
>
> Best regards, Jens


-- 

DaveA

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Data Driven Process

2012-12-20 Thread Steven D'Aprano
On Thu, 20 Dec 2012 10:08:20 -0800, balparmak wrote:

> Thank you for your reply Grant,
> 
> I am trying to attach mxd's but no chance. As you said that i dont have
> much experience in python. I used to work with VBA but its not an option
> anymore with new ArcGIS 10.
> 
> How can I add mxd's here?

The same way you would attach any other file.

What program are you using to send these posts? Are you using email or 
Usenet? Posting from a web interface or a smart phone or a desktop 
application? We are not mind-readers, nor are we watching you, so how can 
we tell you what button to click or command to give?



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Strange effect with import

2012-12-20 Thread Peter Otten
Jens Thoms Toerring wrote:

> Hi,
> 
>I hope that this isn't a stupid question, asked already a
> hundred times, but I haven't found anything definitive on
> the problem I got bitten by. I have two Python files like
> this:
> 
>  S1.py --
> import random
> import S2
> 
> class R( object ) :
> r = random.random( )
> 
> if __name__ == "__main__" :
> print R.r
> S2.p( )
> 
>  S2.py --
> import S1
> 
> def p( ) :
> print S1.R.r
> 
> and my expectation was that the static variable 'r' of class
> R would be identical when accessed from S1.py and S2.py.
> Unfortunately, that isn't the case, the output is different
> (and R seems to get instantiated twice).
> 
> But when I define R in S2.py instead
> 
>  S1.py --
> import S2
> 
> print S2.R.r
> S2.p( )
> 
>  S2.py --
> import random
> 
> class R( object ) :
> r = random.random( )
> 
> def p( ) :
> print R.r
> 
> or, alternatively, if I put the definition of class R into
> a third file which I then import from the other 2 files,
> things suddenly start to work as expected.

That's the correct approach.

> Can someone
> explain what's going on here? I found this a bit
> surprising.

You should never import your program's main module anywhere else in the 
program. When Python imports a module it looks it up by the module's name in 
the sys.modules cache. For the main script that name will be "__main__" 
regardless of the file's actual name, so a subsequent "import S2" will 
result in a cache miss and a new module instance.

Similar problems occur when there is a PYTHONPATH pointing into a package 
and you have both

import package.module

and

import module

Again you will end up with two module instances, one called 
"package.module", the other just "module".

> This is, of course, not my "real" code - it would be much
> more sensible to pass the number to the function in the
> second file as an argument - but it is the smallest possible
> program I could come up with that demonstrates the
> problem. In my "real" code it's unfortunately not possible
> to pass that number to whatever is going to use it in the
> other file; I have to simulate a kind of global variable
> shared between different files.



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Data Driven Process

2012-12-20 Thread Steven D'Aprano
On Thu, 20 Dec 2012 12:39:38 -0800, balparmak wrote:

> I am working with the python code below in ArcGIS to zoom into a
> shapefile's attribute table row features without selected until the end
> of table one by one.
>
> I am trying to use this code but this one requires that a row is
> selected.

Then select a row.

When you get an error message that tells you what is required, don't 
argue with it, fix the problem that it tells you. If arcpy requires you 
to select a row to work with, then you have to select a row to work with.


> import arcpy
> mxd = arcpy.mapping.MapDocument('CURRENT')
> df = arcpy.mapping.ListDataFrames(mxd, "Layers") [0]
> df.zoomToSelectedFeatures()

How do you expect to zoom to selected features if you have no selected 
features? Before this line, you need to select the feature you want to 
zoom to.

> arcpy.RefreshActiveView()
> 
> Any Help?



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Py 3.3, unicode / upper()

2012-12-20 Thread Chris Angelico
On Fri, Dec 21, 2012 at 7:20 AM, MRAB  wrote:
> On 2012-12-20 19:19, wxjmfa...@gmail.com wrote:
>> The rule is to treat every character of a unique set of characters
>> of a coding scheme in, how to say, an "equal way". The problem
>> can be seen the other way around: every coding scheme has been built
>> to work with a unique set of characters, otherwise it would not
>> work properly!
>>
> It's true that in an ideal world you would treat all codepoints the
> same. However, this is a case where "practicality beats purity".

Actually no. Not all codepoints are the same. Ever heard of Huffman
coding? It's a broad technique used in everything from PK-ZIP/gzip
file compression to the Morse code ("here come dots!"). It exploits
and depends on a dramatically unequal usage distribution pattern, as
all text (he will ask "All?" You will respond "All!" He will
understand -- referring to Caesar) exhibits.
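The skewed distribution being appealed to here is easy to demonstrate; a toy illustration (any English text works), where a handful of symbols dominate and most of the alphabet is rare:

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog " * 100
freq = Counter(text)

# Space, 'o', 'e', ... dominate; 'q' and 'z' are rare. Huffman-style
# schemes give the frequent symbols the short codes; PEP 393 analogously
# gives the overwhelmingly common ASCII-only strings the narrow
# one-byte-per-character representation.
print(freq.most_common(3))
```

The analogy is loose (PEP 393 picks a width per string, not per character), but the premise is the same: optimize for the distribution you actually see.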

In the case of strings in a Python program, it's fairly obvious that
there will be *many* that are ASCII-only; and what's more, most of the
long strings will either be ASCII-only or have a large number of
non-ASCII characters. However, your microbenchmarks usually look at
two highly unusual cases: either a string with a huge number of ASCII
chars and one non-ASCII, or all the same non-ASCII (usually for your
replace() tests). I haven't seen strings like either of those come up.

Can you show us a performance regression in an  *actual* *production*
*program*? And make sure you're comparing against a wide build, here.

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Data Driven Process

2012-12-20 Thread Dave Angel
On 12/20/2012 01:08 PM, balpar...@gmail.com wrote:
> Thank you for your reply Grant, 
>
> I am trying to attach mxd's but no chance. As you said that i dont have much 
> experience in python. I used to work with VBA but its not an option anymore 
> with new ArcGIS 10.
>
> How can I add mxd's here? 

MXD doesn't seem to be a text format.  So don't try to attach it to a
text mailing list.  Instead put it on a web site, and put a link to it
in your message.  You also should attempt to make the sample file(s)
small, so people don't have to download something large.

Having said that, I personally won't be able to help, as I know nothing
about ArcGis.

-- 

DaveA

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Data Driven Process

2012-12-20 Thread balparmak
On Thursday, December 20, 2012 2:03:52 PM UTC-7, Steven D'Aprano wrote:
> On Thu, 20 Dec 2012 12:39:38 -0800, balparmak wrote:
> 
> > I am working with the python code below in ArcGIS to zoom into a
> > shapefile's attribute table row features without selected until the end
> > of table one by one.
> >
> > I am trying to use this code but this one requires that a row is
> > selected.
> 
> Then select a row.
> 
> When you get an error message that tells you what is required, don't
> argue with it, fix the problem that it tells you. If arcpy requires you
> to select a row to work with, then you have to select a row to work with.
> 
> > import arcpy
> > mxd = arcpy.mapping.MapDocument('CURRENT')
> > df = arcpy.mapping.ListDataFrames(mxd, "Layers") [0]
> > df.zoomToSelectedFeatures()
> 
> How do you expect to zoom to selected features if you have no selected
> features? Before this line, you need to select the feature you want to
> zoom to.
> 
> > arcpy.RefreshActiveView()
> >
> > Any Help?
> 
> -- 
> Steven

What I am thinking is that I can specify the first layer (the mapindex
shapefile) in the table of contents (using ArcGIS 10) and then specify the row
of its attribute table, then create a loop to go through the attribute table's
records until the end of it.

Thanks again,
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Data Driven Process

2012-12-20 Thread balparmak

I thought that with Python you could specify the first layer's attribute table
in the table of contents and then go through the records in ArcGIS.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Py 3.3, unicode / upper()

2012-12-20 Thread Terry Reedy

On 12/20/2012 2:19 PM, wxjmfa...@gmail.com wrote:



> If you are an "ascii" user, an FSR model makes no sense. An
> "ascii" user will, by definition, use only "ascii" characters.
>
> If you are a "non-ascii" user, the FSR model is also nonsense,
> because you are by definition a "non-ascii" user of
> "non-ascii" characters. Any optimisation for "ascii" users
> simply becomes irrelevant.


This is a false dichotomy. Conclusions based on falsity are false.


> In one sense, to escape from this, you would have to be at the same time
> a non-"ascii" user and a non-"non-ascii" user. Impossible.


This is wrong. Every Python user is an ascii user. All names in the 
stdlib are ascii-only. These names all become strings in code objects. 
All docstrings (with a couple of rare exceptions) are ascii-only. They 
also become strings. *Every Python user* benefits from the new system in 
3.3.


Some Python users are also non-ascii users. This includes many English 
speakers, as many English texts include non-ascii characters. (Just for 
starters, the copyright and trademark symbols are not in the ascii set.)



> Contrary to what has been said, the bad cases I presented here are
> not corner cases. There is practically and systematically a regression
> in Py33 compared to Py32.


I posted evidence otherwise. Jim never responded to those posts. Instead 
he repeats the falsehood refuted by evidence.



> That's very easy to test.


Yes. Run stringbench.py on the OS/machine on 3.2 and 3.3 as I did.

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: Strange effect with import

2012-12-20 Thread Steven D'Aprano
On Thu, 20 Dec 2012 20:39:19 +, Jens Thoms Toerring wrote:

> Hi,
> 
>I hope that this isn't a stupid question, asked already a
> hundred times, but I haven't found anything definitive on the problem I
> got bitten by. I have two Python files like this:
>
>  S1.py --
> import random
> import S2
> 
> class R( object ) :
> r = random.random( )
> 
> if __name__ == "__main__" :
> print R.r
> S2.p( )
> 
>  S2.py --
> import S1
> 
> def p( ) :
> print S1.R.r
> 
> and my expectation was that the static variable 'r' of class R 

The terminology we prefer here is "class attribute", not "static 
variable". Attributes are always assigned in dynamic storage, whether 
they are per-instance or on the class.



> would be
> identical when accessed from S1.py and S2.py. Unfortunately, that isn't
> the case, the output is different (and R seems to get instantiated
> twice).

You don't instantiate R at all. You only ever refer to the class object, 
you never instantiate it to create an instance. What you are actually 
seeing is a side-effect of the way Python modules are imported:

- Python modules are instances that are instantiated at import 
  time, and then cached by module name;

- the module name is *usually* the file name (sans .py extension), 
  except when you are running it as a script, in which case it 
  gets set to the special value "__main__" instead.

So the end result is that you actually end up with THREE module objects, 
__main__, S2 and S1, even though there are only two module *files*. Both 
__main__ and S1 are instantiated from the same source code and contain 
the same objects: both have a class called R, with fully-qualified names 
__main__.R and S1.R, but they are separate objects.
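The three-module-objects effect can be demonstrated with a small script that imports its own file; `selfimp` is a made-up name for illustration:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# A script that imports itself under its file name: the import creates a
# SECOND module object, so the class body (and random.random()) runs twice.
script = textwrap.dedent("""
    import random
    import sys

    class R(object):
        r = random.random()

    if __name__ == "__main__":
        import selfimp   # same file, but a distinct module object
        print(R.r == selfimp.R.r)
        print(sorted(m for m in ("__main__", "selfimp") if m in sys.modules))
""")

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "selfimp.py"), "w") as f:
        f.write(script)
    result = subprocess.run([sys.executable, "selfimp.py"], cwd=d,
                            capture_output=True, text=True)

print(result.stdout)
```

The script prints `False` (two independent random draws) and shows both `__main__` and `selfimp` sitting in `sys.modules`: same file, two module objects, two class objects.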


[...]
> or, alternatively, if I put the defintion of class R into a third file
> which I then import from the other 2 files, things suddenly start to
> work as expected/ Can someone explain what's going one here? I found
> this a bit surprising.

You have a combination of two tricky situations:

* A circular import: module S1 imports S2, and S2 imports S1.

* A .py file, S1.py, being used as both an importable module 
  and a runnable script.

Circular imports are usually hard to get rid at the best of time. 
Combined with the second factor, they can lead to perplexing errors, as 
you have just found out.


> This is, of course, not my "real" code - it would be much more sensible
> to pass the number to the function in the second file as an argument -
> but is the smallest possinle program I could come up with that
> demonstrate the problem. 

And let me say sincerely, thank you for doing so! You would be amazed how 
many people do not make any effort to simplify their problem before 
asking for help.


> In my "real" code it's unfortunately not
> possible to pass that number to whatever is going to use it in the
>  other file, I have to simulate a kind of global variable
> shared between different files.

Well, I find that hard to believe. "Not convenient"? I could believe 
that. "Difficult"? Maybe. "Tricky"? I could even believe that. But "not 
possible"? No, I don't believe that it is impossible to pass variables 
around as method arguments.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Build and runtime dependencies

2012-12-20 Thread Jack Silver
I have two Linux From Scratch machine.

On the first one (the server), I want to build and install Python 3.3.0 on a
shared filesystem and access it from the second one (the client). These
machines are fairly minimal in terms of the number of packages installed. I
just want to install Python on this filesystem and nothing else.

I would like to know what build and runtime dependencies are
needed on both machines.

My understanding is that the core CPython interpreter only needs a C
compiler to be built. For the extension modules, I think that only the
development headers of some additional libraries are needed on the server
machine. Hence, I do not need to install all those libraries on the client
machine. Right?

I would like to build as many modules as I can, so I have a complete Python
installation. Here is the list of dependencies I think I need to install on
the server machine:

expat
bzip2
gdbm
openssl
libffi
zlib
tk
sqlite
valgrind
bluez

Anything else? Is there anything I need to install on the client too?

Thanks

Jack
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Py 3.3, unicode / upper()

2012-12-20 Thread Terry Reedy

On 12/20/2012 2:57 PM, wxjmfa...@gmail.com wrote:


I showed a case where Py33 works 10 times slower than Py32:
"replace". You the devs spent your time correcting that case.


I discovered that it is the 'find' part of find and replace that is 
slower. The comparison is worse on Windows than on *nix. There is an 
issue on the tracker so it may be improved someday. Most devs are not 
especially bothered and would rather fix errors as part of their 
volunteer work.



Now, if I'm putting on the table an example working 20 times
slower, will you spend your time to optimize that?

I'm afraid this is the FSR which is problematic, not the
corner cases.


I showed another case where 3.3 is a thousand, a million times faster 
than 3.2. Does that make the old way 'problematic'?


Don't you think that the bugs (wrong answers) in narrow builds are 
'problematic'? Do you really think that getting wrong answers faster is 
better than getting right answers possibly slower?


The 'find' operation is just 1 of about 30 that are tested by 
stringbench.py. Run that on 3.3 and 3.2, as I did, before talking about 
FSR as 'problematic'.


--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: Py 3.3, unicode / upper()

2012-12-20 Thread Terry Reedy

On 12/20/2012 2:40 PM, wxjmfa...@gmail.com wrote:


What should a Python user think, if he sees his strings
are consuming more memory just because he uses non-ascii
characters


What should a Python user think, if he (or she) sees his (or her) 
strings sometimes or often consuming less memory than they did previously?


I think the person should be grateful that people volunteered to make 
the improvement, rather than ungratefully bitch about it.


> or he sees his strings are changing just because

he "uppercases" them.


Uppercasing strings is supposed to change strings.


Unicode is here to serve anybody.


This we agree on. Python3.3 unicode serves everybody better than 3.2 does.

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: Py 3.3, unicode / upper()

2012-12-20 Thread Steven D'Aprano
On Thu, 20 Dec 2012 11:40:21 -0800, wxjmfauth wrote:

> I do not care
> about this optimization. I'm not an ascii user. As a non ascii user,
> this optimization is just irrelevant.

WRONG.

Every Python user is an ASCII user. Every Python program has hundreds or 
thousands of ASCII strings.

# === example ===
import random


There's already one ASCII string in your code: the module name "random" 
is ASCII. Let's look inside that module:

py> dir(random)
['BPF', 'LOG4', 'NV_MAGICCONST', 'RECIP_BPF', 'Random', 'SG_MAGICCONST', 
'SystemRandom', 'TWOPI', '_BuiltinMethodType', '_MethodType', 
'_Sequence', '_Set', '__all__', '__builtins__', '__cached__', '__doc__', 
'__file__', '__initializing__', '__loader__', '__name__', '__package__', 
'_acos', '_ceil', '_cos', '_e', '_exp', '_inst', '_log', '_pi', 
'_random', '_sha512', '_sin', '_sqrt', '_test', '_test_generator', 
'_urandom', '_warn', 'betavariate', 'choice', 'expovariate', 
'gammavariate', 'gauss', 'getrandbits', 'getstate', 'lognormvariate', 
'normalvariate', 'paretovariate', 'randint', 'random', 'randrange', 
'sample', 'seed', 'setstate', 'shuffle', 'triangular', 'uniform', 
'vonmisesvariate', 'weibullvariate']


That's another 58 ASCII strings. Let's pick one of those:

py> dir(random.Random)
['VERSION', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', 
'__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', 
'__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', 
'__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', 
'__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', 
'__subclasshook__', '__weakref__', '_randbelow', 'betavariate', 'choice', 
'expovariate', 'gammavariate', 'gauss', 'getrandbits', 'getstate', 
'lognormvariate', 'normalvariate', 'paretovariate', 'randint', 'random', 
'randrange', 'sample', 'seed', 'setstate', 'shuffle', 'triangular', 
'uniform', 'vonmisesvariate', 'weibullvariate']

That's another 51 ASCII strings. Let's pick one of them:

py> dir(random.Random.shuffle)
['__annotations__', '__call__', '__class__', '__closure__', '__code__', 
'__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', 
'__eq__', '__format__', '__ge__', '__get__', '__getattribute__', 
'__globals__', '__gt__', '__hash__', '__init__', '__kwdefaults__', 
'__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', 
'__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', 
'__sizeof__', '__str__', '__subclasshook__']

And another 34 ASCII strings.

So to get access to just *one* method of *one* class of *one* module, we 
have already seen up to 144 ASCII strings. (Some of them will be 
duplicated.)

Even if every one of *your* classes, methods, functions, modules and 
variables are using non-ASCII names, you will still use ASCII strings for 
built-in functions and standard library modules.


> What should a Python user think, if he sees his strings are consuming
> more memory just because he uses non-ascii characters

WRONG!

His strings are consuming just as much memory as they need to. You cannot 
fit ten thousand different characters into a single byte. A single byte 
can represent only 2**8 = 256 characters. Two bytes can only represent 
65536 characters at most. Four bytes can represent the entire range of 
every character ever represented in human history, and more, but it is 
terribly wasteful: most strings do not use a billion different 
characters, and so use of a four-byte character encoding uses up to four 
times as much memory as necessary.


You are imagining that non-ASCII users are being discriminated against, 
with their strings being unfairly bloated. But that is not the case. 
Their strings would be equally large in a Python wide-build, give or take 
whatever overhead of the string object changes from version to 
version. If you are not comparing a wide-build of Python to Python 3.3, 
then your comparison is faulty. You are comparing "buggy Unicode, cannot 
handle the supplementary planes" with "fixed Unicode, can handle the 
supplementary planes". Python 3.2 narrow builds save memory by 
introducing bugs into Unicode strings. Python 3.3 fixes those bugs and 
still saves memory.
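
The memory behaviour described above is easy to check for yourself on
CPython 3.3 or later with sys.getsizeof. Exact figures vary by platform
and Python version, so this sketch only asserts the ordering:

```python
import sys

# Under the FSR (PEP 393, CPython 3.3+), a string is stored with
# 1, 2 or 4 bytes per character depending on its widest character.
ascii_s  = "a" * 1000           # all code points < 256  -> 1 byte each
bmp_s    = "\u20ac" * 1000      # EURO SIGN, U+20AC      -> 2 bytes each
astral_s = "\U0001F600" * 1000  # emoji, U+1F600         -> 4 bytes each

assert sys.getsizeof(ascii_s) < sys.getsizeof(bmp_s) < sys.getsizeof(astral_s)
```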


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Strange effect with import

2012-12-20 Thread Jens Thoms Toerring
Thanks a lot to all three of you: that helped me understand
the errors of my ways! You just saved me a few more hours
of head-scratching;-)

A few replies to the questions and comments by Steven:

Steven D'Aprano  wrote:
> On Thu, 20 Dec 2012 20:39:19 +, Jens Thoms Toerring wrote:
> > and my expectation was that the static variable 'r' of class R 

> The terminology we prefer here is "class attribute", not "static 
> variable". Attributes are always assigned in dynamic storage, whether 
> they are per-instance or on the class.

I'm coming from C/C++ and that's where my terminology is from;
I know I still have to learn a lot more about Python ;-)



> > In my "real" code it's unfortunately not
> > possible to pass that number to whatever is going to use it in the
> >  other file, I have to simulate a kind of global variable
> > shared between different files.

> Well, I find that hard to believe. "Not convenient"? I could believe 
> that. "Difficult"? Maybe. "Tricky"? I could even believe that. But "not 
> possible"? No, I don't believe that it is impossible to pass variables 
> around as method arguments.

You are rather likely right and I probably should have written:
"I don't see any way to pass that variable to the object that
is supposed to use it". Perhaps you have an idea how it could
be done correctly when I explain the complete picture: I'm
writing a TCP server, based on SocketServer:

 server = SocketServer.TCPServer(("192.168.1.10", 12345), ReqHandler)

where ReqHandler is the name of a class derived from
SocketServer.BaseRequestHandler

 class ReqHandler(SocketServer.BaseRequestHandler):
 ...

A new instance of this class is generated for each connection
request to the server. In the call that creates the server I can
only specify the name of the class but no arguments to be passed
to it on instantiation - at least I found nothing in the
documentation. On the other hand I need to get some information into
this class and thus the only idea I came up with was to use some
kind of global variable for the purpose. Perhaps there's a much
better way to do that but I haven't found one yet. Or perhaps it
is an omission in the design of SocketServer or (more likely) my
misunderstanding of the documentation (as I wrote, I'm relatively
new to Python).
  Thank you and best regards, Jens
-- 
  \   Jens Thoms Toerring  ___  j...@toerring.de
   \__  http://toerring.de
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Py 3.3, unicode / upper()

2012-12-20 Thread Terry Reedy

On 12/20/2012 2:19 PM, wxjmfa...@gmail.com wrote:


My feeling is that most of the people are defending this FSR simply
because it exists, not because of its intrinsic quality.


The fact, contrary to your feeling, is that I was initially dubious that 
it could be made to work as well as it does. I was only really convinced 
when I ran stringbench in response to your over-generalized assertions.


It is also a fact that I proposed on the tracker and pydev list a 
different method of fixing the length and index bugs in narrow builds. 
It only saved space relative to wide builds but did not have the 
additional space-saving of the new scheme for ascii and latin-1 text.


--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: Brython - Python in the browser

2012-12-20 Thread Terry Reedy



On Thu, Dec 20, 2012 at 8:37 PM, Pierre Quentel
 wrote:

I'm afraid I am going to disagree. The document is a tree
structure, and today Python doesn't have a syntax for easily
manipulating trees.


What Python does have is 12 versions of the augmented assignment 
statement: +=, -=, *=, /=, //=, %=, **=, >>=, <<=, &=, ^=, |=.
Moreover, these are *intended* to be implemented in place, by mutation, 
for mutable objects, with possibly class-specific meanings.


>> To add a child to a node, using an operator

instead of a function call saves a lot of typing ;


We agree. Just use the proper sort of operator. I believe you said 
elsewhere that you *are* using one augmented assignment, +=, to add a 
sibling. That is a proper use. I am saying to use another to add a child.


<= is a comparison expression operator, which is completely different. 
It is just wrong for this usage. I am 99.9% sure you will come to regret 
it eventually. Better to make the change now than in Brython2 or Brython3.


>> <= looks like a

left arrow, which is a visual indication of the meaning "receive as
child". |= doesn't have this arrow shape


If you want to talk shape, I could argue that you should use -= for 
adding a sibling (horizontal link, -) and |= for adding a child 
(vertical link, |). Since you probably want to stick with += and like 
the 'arrowness' of <=, use the augmented assignment operator <<= instead 
of comparison operator <=.
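
For what it's worth, <<= can carry exactly the "receive as child" meaning
Pierre wants, because a class can implement the augmented operator itself.
A minimal sketch with a hypothetical Node class (not Brython's actual
implementation):

```python
class Node:
    """Hypothetical tree node, for illustration only."""
    def __init__(self, tag):
        self.tag = tag
        self.children = []

    def __ilshift__(self, child):
        # implements  parent <<= child : mutate in place, return self
        self.children.append(child)
        return self  # augmented assignment rebinds the name to this value

root = Node("div")
root <<= Node("a")   # reads as "root receives child <a>"
assert [c.tag for c in root.children] == ["a"]
```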


--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: Py 3.3, unicode / upper()

2012-12-20 Thread Ian Kelly
On Thu, Dec 20, 2012 at 12:19 PM,   wrote:
> The first (and it should be quite obvious) consequence is that
> you create bloated, unnecessary and useless code. I simplify
> the flexible string representation (FSR) and will use an "ascii" /
> "non-ascii" model/terminology.
>
> If you are an "ascii" user, a FSR model has no sense. An
> "ascii" user will use, per definition, only "ascii characters".
>
> If you are a "non-ascii" user, the FSR model is also nonsense,
> because you are per definition a "non-ascii" user of
> "non-ascii" characters. Any optimisation for "ascii" users just
> becomes irrelevant.
>
> In one sense, to escape from this, you have to be at the same time
> a non "ascii" user and a non "non-ascii" user. Impossible.
> In both cases, a FSR model is useless and in both cases you are
> forced to use bloated and unnecessary code.

As Terry and Steven have already pointed out, there is no such thing
as a "non-ascii" user.  Here I will take the complementary approach
and point out that there is also no such thing as an "ascii" user.
There are only users whose strings are 99.99% (or more) ASCII.  A user
may think that his program will never be given any non-ASCII input to
deal with, but experience tells us that this thought is probably
wrong.

Suppose you were to split the Unicode representation into separate
"ASCII-only" and "wide" data types.  Then which data type is the
correct one to choose for an "ascii" user?  The correct answer is
*always* the wide data type, for the reason stated above.  If the user
chooses the ASCII-only data type, then as soon his program encounters
non-ASCII data, it breaks.  The only users of the ASCII-only data type
then would be the authors of buggy programs.  The same issue applies
to narrow (UTF-16) data types.  So there really are only two viable,
non-buggy options for Unicode representations: FSR, or always wide
(UTF-32).  The latter is wildly inefficient in many cases, so Python
went with FSR.

A third option might be proposed, which would be to have a build
switch between FSR or always wide, with the promise that the two will
be indistinguishable at the Python level (apart from the amount of
memory used).  This is probably not on the table, however, as it would
have a non-negligible maintenance cost, and it's not clear that
anybody other than you would actually want it.

> A solution à la FSR cannot work, or cannot work in an optimized way.
> It is not a coding scheme, it is a composite of coding schemes
> handling several characters sets. Hard to imagine something worse.

It is not a composite of coding schemes.  The str type deals with
exactly *one* character set -- the UCS.  The different representations
are not different coding schemes.  They are *all* UTF-32.  The only
significant difference between the representations is that the leading
zero bytes of each character are made implicit (i.e. truncated) if the
nature of the string allows it.
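
That point about implicit leading zero bytes can be made concrete with an
explicit UTF-32 encoding, where the padding the FSR drops is spelled out.
This uses only documented codecs, not CPython internals:

```python
s = "abc"
# In big-endian UTF-32 every code unit is 4 bytes; for ASCII text
# three of them are zero padding -- exactly the bytes the FSR omits.
assert s.encode("utf-32-be") == b"\x00\x00\x00a\x00\x00\x00b\x00\x00\x00c"
```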

> Contrary to what has been said, the bad cases I presented here are
> not corner cases.

The only significantly regressive case that you've presented here has
been str.replace on inputs engineered for bad performance.  That's why
people characterize them as corner cases -- because that's exactly
what they are.

> There is practically and systematically a regression
> in Py33 compared to Py32.
> That's very easy to test. I did all my tests in the light of what
> I explained above. It was not a surprise for me to see this expectedly
> bad behaviour.

Have you run stringbench.py yet?  When I ran it on my system, the full
set of Unicode benchmarks ran in 268.15 seconds for Python 3.2 versus
198.77 seconds for Python 3.3.  That's a 26% overall speedup for the
covered benchmarks, which seem reasonably thorough.  That does not
demonstrate a "systematic regression".  If anything, that shows a
systematic improvement.

Your cherry-picking of benchmarks is like a driver who has two routes
to their destination; one takes ten minutes on average but has one
annoyingly long traffic light, while the second takes fifteen minutes
on average but has no traffic lights (and a correspondingly higher
accident rate).  Yet for some reason you insist that the second route
is better because the traffic light makes the first route
"systematically" slower.
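
Anyone who wants to check such claims for themselves can measure a range of
operations with timeit rather than a single hand-picked case. A sketch (the
string and operations are arbitrary examples; the numbers will differ per
machine, so none are asserted here):

```python
import timeit

# A mostly-ASCII string with one non-Latin-1 character, so the FSR
# stores it with 2 bytes per character.
setup = "s = 'abc' * 10000 + '\\u20ac'"
t_find = timeit.timeit("s.find('xyz')", setup=setup, number=1000)
t_replace = timeit.timeit("s.replace('abc', 'xyz')", setup=setup, number=1000)
print(t_find, t_replace)  # run the same script under 3.2 and 3.3 to compare
```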
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: help with making my code more efficient

2012-12-20 Thread Chris Angelico
On Fri, Dec 21, 2012 at 11:19 AM, larry.mart...@gmail.com
 wrote:
> This code works, but it takes way too long to run - e.g. when cdata has 
> 600,000 elements (which is typical for my app) it takes 2 hours for this to 
> run.
>
> Can anyone give me some suggestions on speeding this up?
>

It sounds like you may have enough data to want to not keep it all in
memory. Have you considered switching to a database? You could then
execute SQL queries against it.
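
A minimal sketch of that approach with the standard library's sqlite3 module
(the table, column names, and sample rows are made up for illustration; a
real dataset would use a file-backed database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for real data
conn.execute("CREATE TABLE events (tool TEXT, time REAL, message TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("A", 10.0, "ERROR x"), ("A", 12.0, "info"), ("B", 50.0, "info")])
conn.execute("CREATE INDEX idx_tool_time ON events (tool, time)")

# All rows within 5 time units of a matching message on the same tool,
# plus the matching rows themselves.
rows = conn.execute("""
    SELECT DISTINCT e.tool, e.time, e.message
    FROM events e JOIN events m
      ON e.tool = m.tool
     AND m.message LIKE '%ERROR%'
     AND ABS(e.time - m.time) <= 5
""").fetchall()
```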

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Build and runtime dependencies

2012-12-20 Thread Miki Tebeka
On Thursday, December 20, 2012 2:11:45 PM UTC-8, Jack Silver wrote:
> I have two Linux From Scratch machine.
> Hence, I do not need to install all those libraries on the client machine. 
> Right ?
It depends on what the client needs. For example if you use zlib compression in 
the protocol, you'll need this in the client as well to uncompress.

I suggest you use the exact same Python builds on both machines, it'll save you 
some headache in the future.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: help with making my code more efficient

2012-12-20 Thread larry.mart...@gmail.com
On Thursday, December 20, 2012 5:38:03 PM UTC-7, Chris Angelico wrote:
> On Fri, Dec 21, 2012 at 11:19 AM, larry.mart...@gmail.com
> 
>  wrote:
> 
> > This code works, but it takes way too long to run - e.g. when cdata has 
> > 600,000 elements (which is typical for my app) it takes 2 hours for this to 
> > run.
> 
> >
> 
> > Can anyone give me some suggestions on speeding this up?
> 
> >
> 
> 
> 
> It sounds like you may have enough data to want to not keep it all in
> 
> memory. Have you considered switching to a database? You could then
> 
> execute SQL queries against it.

It came from a database. Originally I was getting just the data I wanted using 
SQL, but that was taking too long also. I was selecting just the messages I 
wanted, then for each one of those doing another query to get the data within 
the time diff of each. That was resulting in tens of thousands of queries. So I 
changed it to pull all the potential matches at once and then process it in 
python. 

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Strange effect with import

2012-12-20 Thread Terry Reedy

On 12/20/2012 5:52 PM, Jens Thoms Toerring wrote:


You are rather likely right and I probably should have written:
"I don't see any way to pass that variable to the object that
is supposed to use it". Perhaps you have an idea how it could
be done correctly when I explain the complete picture: I'm
writing a TCP server, based on SocketServer:

  server = SocketServer.TCPServer(("192.168.1.10", 12345), ReqHandler)

where ReqHandler is the name of a class derived from
SocketServer.BaseRequestHandler


You misunderstood the doc. You pass the class, not the name of the class.
From 21.19.4.1. socketserver.TCPServer Example
server = socketserver.TCPServer((HOST, PORT), MyTCPHandler)

MyTCPHandler is the actual class. What gets 'passed' at the C level in 
CPython is a reference to that class that TCPServer can use to call it, but 
conceptually, at the Python level, think of it as the class. In the 
code, you enter the name without quotes and that expression evaluates to 
the (reference to the) class that gets passed.


If the signature required the name, the example would have had 
'MyTCPHandler', with the quotes, to pass the name as a string.


Very few builtin functions require names as strings. open('filename'), 
somebytes.encode(encoding='encoding-name', errors = 
'error-handler-name') are two that come to mind. Notice that these are 
situations where requiring a non-string object would be inconvenient at 
best.




  class ReqHandler(SocketServer.BaseRequestHandler):
  ...

A new instance of this class is generated for each connection
request to the server. In the call that creates the server I can
only specify the name of the class but no arguments to be passed


Code those arguments directly into the handle method of your version of 
MyTCPHandler. Or if you need to override multiple methods and use the 
same values in multiple methods, override __init__ and add self.x = 
x-value statements.
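
One concrete reading of that advice, sketched with Python 3's socketserver
spelling (the thread uses Python 2's SocketServer) and placeholder values:

```python
import socketserver

class ReqHandler(socketserver.BaseRequestHandler):
    # configuration shared by every connection; set before serving starts
    greeting = b"hello"

    def handle(self):
        # each connection's handler sees the class attribute via self
        self.request.sendall(self.greeting)

# the "global" is now just an attribute on the class object,
# assigned once before the server is created
ReqHandler.greeting = b"configured elsewhere"
```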


--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: Strange effect with import

2012-12-20 Thread Hans Mulder
On 20/12/12 23:52:24, Jens Thoms Toerring wrote:
> I'm writing a TCP server, based on SocketServer:
> 
>  server = SocketServer.TCPServer(("192.168.1.10", 12345), ReqHandler)
> 
> where ReqHandler is the name of a class derived from
> SocketServer.BaseRequestHandler
> 
>  class ReqHandler(SocketServer.BaseRequestHandler):
>  ...
> 
> A new instance of this class is generated for each connection
> request to the server. In the call that creates the server I can
> only specify the name of the class but no arguments to be passed
> to it on instantiation - at least I found nothing in the docu-
> mentation.

What happens if instead of a class you pass a function that
takes the same arguments as the SocketServer.BaseRequestHandler
constructor and returns a new instance of your ReqHandler?

That's not quite what the documentation calls for, but I'd hope
it's close enough.


Maybe something like this:

class ReqHandler(SocketServer.BaseRequestHandler):
    def __init__(self, request, client_address, server, ham, spam):
        # set the extra attributes first: BaseRequestHandler.__init__
        # calls handle(), which may already need them
        self.ham = ham
        self.spam = spam
        SocketServer.BaseRequestHandler.__init__(
            self, request, client_address, server)


And later:

import functools

server = SocketServer.TCPServer(("192.168.1.10", 12345),
   functools.partial(ReqHandler, ham="hello", spam=42))

> On the other hand I need to get some information into
> this class and thus the only idea I came up with was to use some
> kind of global variable for the purpose. Perhaps there's a much
> better way to do that but I haven't found one yet. Or perhaps it
> is an omission in the design of SocketServer

I think you could call it a weakness in the design of SocketServer.

Life would be easier if it took as an optional third argument some
sequence that it would pass as extra arguments when it instantiates
the handler instance.  Then you wouldn't have to play with functools
(or closures, or global variables) to solve your problems.

> or (more likely) my mis-understanding of the documentation
> (as I wrote I'm relatively new to Python).

From where I sit, it looks like the authors of the SocketServer
module didn't expect subclasses of BaseRequestHandler to need
extra attributes beyond their base class.

Or maybe they thought everybody knew functools.partial.
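
For readers who don't know it: functools.partial wraps any callable,
pre-binding some arguments, so the result can stand in wherever the original
class was expected. A minimal illustration with a dummy class (no real
sockets involved):

```python
import functools

class Handler:
    """Stand-in for a request handler class, for illustration only."""
    def __init__(self, request, client_address, server, ham=None, spam=None):
        self.ham, self.spam = ham, spam

# pre-bind the extra keyword arguments; factory is itself callable
factory = functools.partial(Handler, ham="hello", spam=42)

# TCPServer would call factory(request, client_address, server)
h = factory("fake-request", ("127.0.0.1", 0), None)
assert (h.ham, h.spam) == ("hello", 42)
```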


Hope this helps,

-- HansM

-- 
http://mail.python.org/mailman/listinfo/python-list


Question regarding mod_python and a script for web.

2012-12-20 Thread John Pennington

Hi Everyone,
I'm a linux admin that was tasked by his python programming boss to solve a 
problem my boss is having with a web form he wrote on our site. Unfortunately 
for me, I lack any experience whatsoever with python and very little with 
programming on the web, so my hope is someone can point me in the right 
direction for solving this.
Basically, the problem is this: we have a web form that collects data such as
NAME, SSN, EMAIL address etc. When the user hits submit,
the URI posts to the query string like the following:
https://test.uchast.com/admit/supp.py?fname=john&lname=fenn&mi=ted&ssn=123456789&ssn_confirm=123456789&phone=412-658-3178&email=jojo%40uc.com&alt_email=jojo12%40yahoo.com&lsacid=&grad_date=May-2013&program=JD&step=2
Which is bad, as we are going to be collecting data like Social Security
numbers.
We are using mod_python:

WSGIScriptAlias /myapp /var/www/html/admit/index.py
RewriteEngine On
RewriteRule ^/admit https://test.uchast.com/admit
Order deny,allow
Allow from all
SSLRequireSSL
DirectoryIndex index.py
AddHandler mod_python .py
PythonHandler mod_python.cgihandler
###PythonHandler mod_python.publisher
PythonDebug On
Does anyone have an idea how to make sure a Python script doesn't put the data
in the query string? Am I even making sense? Any and all help would be greatly
appreciated because, as I mentioned, I'm as new as it gets.
Thanks,
John  -- 
http://mail.python.org/mailman/listinfo/python-list


Re: Strange effect with import

2012-12-20 Thread Jens Thoms Toerring
Terry Reedy  wrote:
> >   server = SocketServer.TCPServer(("192.168.1.10", 12345), ReqHandler)
> >
> > where ReqHandler is the name of a class derived from
> > SocketServer.BaseRequestHandler

> You misunderstood the doc. You pass the class, not the name of the class.
>  From 21.19.4.1. socketserver.TCPServer Example
>  server = socketserver.TCPServer((HOST, PORT), MyTCPHandler)

Yes, I meant "the class", but I'm a bit weak on nomenclature in
Python;-)

> > A new instance of this class is generated for each connection
> > request to the server. In the call that creates the server I can
> > only specify the name of the class but no arguments to be passed

> Code those arguments directly into the handle method of your version of 
> MyTCPhandler. Or if you need to override multiple methods and use the 
> same values in multiple methods, override __init__ and add self.x = 
> x-value statements.

Sorry, you lost me there: what does "code those arguments
directly into the handle method" mean? According to the
documentation (or at least to my understanding of it ;-) the handle()
method is supposed to accept just one argument, 'self'. And
even if I were to change the method to accept more arguments
and that didn't blow up in my face, where would they be
coming from (and from where would I pass them)?

   Best regards, Jens
-- 
  \   Jens Thoms Toerring  ___  j...@toerring.de
   \__  http://toerring.de
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: help with making my code more efficient

2012-12-20 Thread Dave Angel
On 12/20/2012 07:19 PM, larry.mart...@gmail.com wrote:
> I have a list of tuples that contains a tool_id, a time, and a message. I 
> want to select from this list all the elements where the message matches some 
> string, and all the other elements where the time is within some diff of any 
> matching message for that tool. 
>
> Here is how I am currently doing this:

No, it's not.  This is a fragment of code, without enough clues as to
what else is going on.  We can guess, but that's likely to make a mess.

First question is whether this code works exactly correctly?  Are you
only concerned about speed, not fixing features?  As far as I can tell,
the logic that includes the time comparison is bogus.  You don't do
anything there to worry about the value of tup[2], just whether some
item has a nearby time.  Of course, I could misunderstand the spec.

Are you making a global called 'self' ?  That name is by convention only
used in methods to designate the instance object.  What's the attribute
self?

Can cdata have duplicates, and are they significant?  Are you including
the time building that as part of your 2 hour measurement?  Is the list
sorted in any way?

Chances are your performance bottleneck is the doubly-nested loop.  You
have a list comprehension at top-level code, and inside it calls a
function that also loops over the 600,000 items.  So the inner loop gets
executed 360 billion times.  You can cut this down drastically by some
judicious sorting, as well as by having a map of lists, where the map is
keyed by the tool.
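
One way to realize that suggestion, sketched against the tuple layout shown
in the quoted code. It assumes cdata rows are (tool, time, message) tuples
with comparable numeric times; grouping by tool plus a sorted-list bisect
turns the quadratic scan into roughly O(n log n):

```python
from collections import defaultdict
from bisect import bisect_left

def filter_rows(cdata, message, tdiff):
    """Keep matching rows, plus rows on the same tool within tdiff of a match."""
    times_by_tool = defaultdict(list)   # tool -> times of matching messages
    for tool, t, msg in cdata:
        if message in msg:
            times_by_tool[tool].append(t)
    for times in times_by_tool.values():
        times.sort()

    def keep(tool, t, msg):
        if message in msg:
            return True
        times = times_by_tool.get(tool)
        if not times:
            return False
        i = bisect_left(times, t - tdiff)  # first match time >= t - tdiff
        return i < len(times) and times[i] <= t + tdiff

    return [row for row in cdata if keep(*row)]
```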

>
> # record time for each message matching the specified message for each tool
> messageTimes = {}

You're building a dictionary;  are you actually using the value (1), or
is only the key relevant?  A set is a dict without a value.  But more
importantly, you never look up anything in this dictionary.  So why
isn't it a list?  For that matter, why don't you just use the
messageTimes list?

> for row in cdata:   # tool, time, message
> if self.message in row[2]:
> messageTimes[row[0], row[1]] = 1
>
> # now pull out each message that is within the time diff for each matched 
> message
> # as well as the matched messages themselves
>
> def determine(tup):
> if self.message in tup[2]: return True  # matched message 
>
> for (tool, date_time) in messageTimes:
> if tool == tup[0]:
> if abs(date_time-tup[1]) <= tdiff: 
>return True
>
> return False
> 
> cdata[:] = [tup for tup in cdata if determine(tup)]

As the code exists, there's no need to copy the list.  Just do a simple
bind.

>
> This code works, but it takes way too long to run - e.g. when cdata has 
> 600,000 elements (which is typical for my app) it takes 2 hours for this to 
> run. 
>
> Can anyone give me some suggestions on speeding this up?
>
> TIA!


-- 

DaveA

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Strange effect with import

2012-12-20 Thread Jens Thoms Toerring
Hans Mulder  wrote:
> What happens if instead of a class you pass a function that
> takes the same arguments as the SocketServer.BaseRequestHandler
> constructor and returns a new instance of your ReqHandler?

> That's not quite what the documentation calls for, but I'd hope
> it's close enough.

Interesting idea - I'm not yet at a level of Python wizardry
that I would dare to do something that's not explicitly
blessed by the documentation ;-)

> Maybe something like this:

> class ReqHandler(SocketServer.BaseRequestHandler):
>     def __init__(self, request, client_address, server, ham, spam):
>         self.ham = ham
>         self.spam = spam
>         SocketServer.BaseRequestHandler.__init__(
>             self, request, client_address, server)
> 

> And later:

> import functools

> server = SocketServer.TCPServer(("192.168.1.10", 12345),
>functools.partial(ReqHandler, ham="hello", spam=42))

Ok, that's still way over my head at the moment ;-) I will have
to read up on functools tomorrow; it's the first time I've heard of
it but it looks quite interesting at first glance.

Thank you for these ideas. I'll need a bit of time to figure out
these new concepts and I don't think I'm up to it tonight
anymore ;-)
 Best regards, Jens
-- 
  \   Jens Thoms Toerring  ___  j...@toerring.de
   \__  http://toerring.de
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Build and runtime dependencies

2012-12-20 Thread Hans Mulder
On 20/12/12 23:11:45, Jack Silver wrote:
> I have two Linux From Scratch machines.
> 
> On the first one (the server), I want to build and install Python 3.3.0 in a
> shared filesystem and access it from the second one (the client). These
> machines are fairly minimal in terms of the software installed.
> I just want to install Python on this filesystem and nothing else.
> 
> I would like to know what build and runtime dependencies
> are needed on both machines.
> 
> My understanding is that the core CPython interpreter only needs a C
> compiler to be built.

You need the whole C toolchain: compiler, linker, make, etc.

> For the extension modules, I think that only the
> development headers of some additional libraries are needed on the
> server machine. Hence, I do not need to install all those libraries on
> the client machine. Right ?

Wrong.

Those libraries are typically shared libraries (i.e. .so files). You'll
have to install the shared libraries on both the server and the clients.

The development headers are used only at build time, so they are only
needed on the server.

I don't know the package naming conventions on your distro, but on
Debian the packages you only need on the server tend to contain the
word "dev".  For example, 'sqlite' would be installed on the client
and both 'sqlite' and 'sqlite-dev' on the server.

> I would like to build as many modules as I can, so I have a complete Python
> installation. Here is the list of dependencies I think I need to install
> on the server machine :
> 
> expat
> bzip2
> gdbm
> openssl
> libffi
> zlib
> tk
> sqlite
> valgrind
> bluez
> 
> Anything else?

The source comes with a script named "configure" that tries to find
the headers it needs to build as many extension modules as possible.

When the script is done, it prints a list of modules it could not
find the headers for.  When this list contains modules you'd like
to build, that means that you're still missing some dependency.

Keep in mind that some modules cannot be built on Linux (for example,
the MacOS module can only be built on MacOS Classic), so you shouldn't
expect to be able to build everything.

> Is there anything I need to install on the client too ?

Yes, the .so files, the actual shared libraries used by these
extensions.


Hope this helps,

-- HansM

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: help with making my code more efficient

2012-12-20 Thread larry.mart...@gmail.com
On Thursday, December 20, 2012 6:17:04 PM UTC-7, Dave Angel wrote:
> On 12/20/2012 07:19 PM, larry.mart...@gmail.com wrote:
> 
> > I have a list of tuples that contains a tool_id, a time, and a message. I 
> > want to select from this list all the elements where the message matches 
> > some string, and all the other elements where the time is within some diff 
> > of any matching message for that tool. 
> 
> > Here is how I am currently doing this:
> 
> No, it's not.  This is a fragment of code, without enough clues as to
> 
> what else is going on.  We can guess, but that's likely to make a mess.

Of course it's a fragment - it's part of a large program and I was just showing 
the relevant parts. 

> First question is whether this code works exactly correctly?  

Yes, the code works. I end up with just the rows I want. 

> Are you only concerned about speed, not fixing features?  

Don't know what you mean by 'fixing features'. The code does what I want, it 
just takes too long.

> As far as I can tell, the logic that includes the time comparison is bogus.  

Not at all. 

> You don't do  anything there to worry about the value of tup[2], just whether 
> some
> item has a nearby time.  Of course, I could misunderstand the spec.

The data comes from a database. tup[2] is a datetime column. tdiff comes from a 
datetime.timedelta() 

> Are you making a global called 'self' ?  That name is by convention only
> used in methods to designate the instance object.  What's the attribute
> self?

Yes, self is my instance object. self.message contains the string of interest 
that I need to look for. 

> Can cdata have duplicates, and are they significant? 

No, it will not have duplicates.

> Are you including  the time building that as part of your 2 hour measurement? 

No, the 2 hours is just the time to run the 

cdata[:] = [tup for tup in cdata if determine(tup)]

> Is the list sorted in any way?

Yes, the list is sorted by tool and datetime.

> Chances are your performance bottleneck is the doubly-nested loop.  You
> have a list comprehension at top-level code, and inside it calls a
> function that also loops over the 600,000 items.  So the inner loop gets
> executed 360 billion times.  You can cut this down drastically by some
> judicious sorting, as well as by having a map of lists, where the map is
> keyed by the tool.

Thanks. I will try that.

> > # record time for each message matching the specified message for each tool
> 
> > messageTimes = {}
>
> You're building a dictionary;  are you actually using the value (1), or
>  is only the key relevant?  

Only the keys.

> A set is a dict without a value.  

Yes, I could use a set, but I don't think that would make it measurably faster. 

> But more importantly, you never look up anything in this dictionary.  So why
> isn't it a list?  For that matter, why don't you just use the
> messageTimes list?

Yes, it could be a list too.
 
> > for row in cdata:   # tool, time, message
> 
> > if self.message in row[2]:
> 
> > messageTimes[row[0], row[1]] = 1
> 
> >
> 
> > # now pull out each message that is within the time diff for each matched 
> > message
> 
> > # as well as the matched messages themselves
> 
> >
> 
> > def determine(tup):
> 
> > if self.message in tup[2]: return True  # matched message 
> 
> >
> 
> > for (tool, date_time) in messageTimes:
> 
> > if tool == tup[0]:
> 
> > if abs(date_time-tup[1]) <= tdiff: 
> 
> >return True
> 
> >
> 
> > return False
> 
> > 
> 
> > cdata[:] = [tup for tup in cdata if determine(tup)]
> 
> 
> 
> As the code exists, there's no need to copy the list.  Just do a simple
> bind.

This statement is to remove the items from cdata that I don't want. I don't 
know what you mean by bind. I'm not familiar with that python function. 

> 
> 
> 
> >
> 
> > This code works, but it takes way too long to run - e.g. when cdata has 
> > 600,000 elements (which is typical for my app) it takes 2 hours for this to 
> > run. 
> 
> >
> 
> > Can anyone give me some suggestions on speeding this up?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Question regarding mod_python and a script for web.

2012-12-20 Thread ian douglas
Short answer: Use the POST method on the form instead of GET. Depending on
how you process the form, you might need to make a few changes to the script
that answers the request.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: help with making my code more efficient

2012-12-20 Thread MRAB

On 2012-12-21 00:19, larry.mart...@gmail.com wrote:

I have a list of tuples that contains a tool_id, a time, and a message. I want 
to select from this list all the elements where the message matches some 
string, and all the other elements where the time is within some diff of any 
matching message for that tool.

Here is how I am currently doing this:

# record time for each message matching the specified message for each tool
messageTimes = {}
for row in cdata:   # tool, time, message
    if self.message in row[2]:
        messageTimes[row[0], row[1]] = 1


It looks like 'messageTimes' is really a set of tool/time pairs.

You could make it a dict of sets of time; in other words, a set of
times for each tool:

from collections import defaultdict

messageTimes = defaultdict(set)
for row in cdata:   # tool, time, message
    if self.message in row[2]:
        messageTimes[row[0]].add(row[1])


# now pull out each message that is within the time diff for each matched 
message
# as well as the matched messages themselves

def determine(tup):
    if self.message in tup[2]: return True  # matched message

    for (tool, date_time) in messageTimes:
        if tool == tup[0]:
            if abs(date_time-tup[1]) <= tdiff:
                return True

    return False


def determine(tup):
    if self.message in tup[2]: return True  # matched message

    # Scan through the times for the tool given by tup[0].
    for date_time in messageTimes[tup[0]]:
        if abs(date_time - tup[1]) <= tdiff:
            return True

    return False


cdata[:] = [tup for tup in cdata if determine(tup)]

This code works, but it takes way too long to run - e.g. when cdata has 600,000 
elements (which is typical for my app) it takes 2 hours for this to run.

Can anyone give me some suggestions on speeding this up?



--
http://mail.python.org/mailman/listinfo/python-list


Re: Brython - Python in the browser

2012-12-20 Thread Steven D'Aprano
On Thu, 20 Dec 2012 18:59:39 -0500, Terry Reedy wrote:

>> On Thu, Dec 20, 2012 at 8:37 PM, Pierre Quentel
>>  wrote:
>>> I'm afraid I am going to disagree. The document is a tree structure,
>>> and today Python doesn't have a syntax for easily manipulating trees.
> 
> What Python does have is 11 versions of the augmented assignment
> statement: +=, -=, *=, /=, //=, %=, **=, >>=, <<=, &=, ^=, |=. Moreover,
> these are *intended* to be implemented in place, by mutation, for
> mutable objects, with possibly class-specific meanings.

I don't believe that is the case. The problem is that augmented 
assignment that mutates can be rather surprising to anyone who expects 
"a += b" to be a short cut for "a = a + b".

py> a = [1, 2, 3]; b = [99]; another = a
py> a = a + b
py> print(a, another)  # What I expect.
[1, 2, 3, 99] [1, 2, 3]

py> a = [1, 2, 3]; b = [99]; another = a
py> a += b
py> print(a, another)  # Surprise!
[1, 2, 3, 99] [1, 2, 3, 99]


Whichever behaviour you pick, you're going to surprise somebody. So I 
wouldn't say that mutate in place is *intended* or preferred in any way, 
only that it is *allowed* as an optimization if the class designer 
prefers so.

One might even have a class where (say) __iadd__ is defined but __add__ 
is not.
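A toy illustration of such a class (invented for this note, not from the thread): it defines `__iadd__` only, so `+=` works by in-place mutation while plain `+` raises `TypeError`.

```python
class Accumulator:
    """Supports ``a += n`` but deliberately defines no __add__."""
    def __init__(self):
        self.total = 0
    def __iadd__(self, n):
        self.total += n   # mutate in place ...
        return self       # ... and hand back self, as augmented assignment expects

a = Accumulator()
a += 5
a += 3
print(a.total)  # 8

try:
    a + 1
except TypeError:
    print("plain + is not supported")
```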

[...]
> <= is a comparison expression operator, which is completely different.

<= is a comparison operator for ints, floats, strings, lists, ... but not 
necessarily for *everything*. That's the beauty and horror of operator 
overloading. Any operator can mean anything.

If it were intended to only return a flag, then 1) Python would enforce 
that rule, and 2) the numpy people would be most upset.

I have no opinion on the usefulness or sensibility of using <= as an in-
place mutator method in this context, but I will say that if I were 
designing my own mini-DSL, I would not hesitate to give "comparison 
operators" some other meaning. Syntax should be judged in the context of 
the language you are using, not some other language. If you are using a 
DSL, then normal Python rules don't necessarily apply. <= in particular 
looks just like a left-pointing arrow and is an obvious candidate for 
overloading.
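A toy version of that kind of overloading (invented names; this is not Brython's actual implementation): `<=` is given the meaning "attach as child", so it reads as a left-pointing arrow feeding elements into a node.

```python
class Tag:
    def __init__(self, name):
        self.name = name
        self.children = []

    def __le__(self, child):
        # DSL-style "arrow": attach the child instead of comparing.
        self.children.append(child)
        return self  # give the expression a (truthy) value

body = Tag("body")
body <= Tag("h1")
body <= Tag("p")
print([c.name for c in body.children])  # ['h1', 'p']
```

One wrinkle worth noting: Python chains comparisons, so `a <= b <= c` means `(a <= b) and (b <= c)`, which attaches `c` to `b` rather than to `a` -- part of the "horror" side of operator overloading.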


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: help with making my code more efficient

2012-12-20 Thread Mitya Sirenef

On 12/20/2012 08:46 PM, larry.mart...@gmail.com wrote:

On Thursday, December 20, 2012 6:17:04 PM UTC-7, Dave Angel wrote:

On 12/20/2012 07:19 PM, larry.mart...@gmail.com wrote:


I have a list of tuples that contains a tool_id, a time, and a message. I want 
to select from this list all the elements where the message matches some 
string, and all the other elements where the time is within some diff of any 
matching message for that tool.
Here is how I am currently doing this:

No, it's not.  This is a fragment of code, without enough clues as to

what else is going on.  We can guess, but that's likely to make a mess.

Of course it's a fragment - it's part of a large program and I was just showing 
the relevant parts.


First question is whether this code works exactly correctly?

Yes, the code works. I end up with just the rows I want.


Are you only concerned about speed, not fixing features?

Don't know what you mean by 'fixing features'. The code does what I want, it 
just takes too long.


As far as I can tell, the logic that includes the time comparison is bogus.

Not at all.


You don't do  anything there to worry about the value of tup[2], just whether 
some
item has a nearby time.  Of course, I could misunderstand the spec.

The data comes from a database. tup[2] is a datetime column. tdiff comes from a 
datetime.timedelta()


Are you making a global called 'self' ?  That name is by convention only
used in methods to designate the instance object.  What's the attribute
self?

Yes, self is my instance object. self.message contains the string of interest 
that I need to look for.


Can cdata have duplicates, and are they significant?

No, it will not have duplicates.


Are you including  the time building that as part of your 2 hour measurement?

No, the 2 hours is just the time to run the

cdata[:] = [tup for tup in cdata if determine(tup)]


Is the list sorted in any way?

Yes, the list is sorted by tool and datetime.


Chances are your performance bottleneck is the doubly-nested loop.  You
have a list comprehension at top-level code, and inside it calls a
function that also loops over the 600,000 items.  So the inner loop gets
executed 360 billion times.  You can cut this down drastically by some
judicious sorting, as well as by having a map of lists, where the map is
keyed by the tool.

Thanks. I will try that.


# record time for each message matching the specified message for each tool
messageTimes = {}

You're building a dictionary;  are you actually using the value (1), or
  is only the key relevant?

Only the keys.


A set is a dict without a value.

Yes, I could use a set, but I don't think that would make it measurably faster.


But more importantly, you never look up anything in this dictionary.  So why
isn't it a list?  For that matter, why don't you just use the
messageTimes list?

Yes, it could be a list too.
  

for row in cdata:   # tool, time, message
    if self.message in row[2]:
        messageTimes[row[0], row[1]] = 1

# now pull out each message that is within the time diff for each matched 
# message
# as well as the matched messages themselves

def determine(tup):
    if self.message in tup[2]: return True  # matched message
    for (tool, date_time) in messageTimes:
        if tool == tup[0]:
            if abs(date_time-tup[1]) <= tdiff:
                return True
    return False

cdata[:] = [tup for tup in cdata if determine(tup)]



As the code exists, there's no need to copy the list.  Just do a simple
bind.

This statement is to remove the items from cdata that I don't want. I don't 
know what you mean by bind. I'm not familiar with that python function.





This code works, but it takes way too long to run - e.g. when cdata has 600,000 
elements (which is typical for my app) it takes 2 hours for this to run.
Can anyone give me some suggestions on speeding this up?



This code probably is not faster, but it's simpler and may be easier for
you to work with to experiment with speed-improving changes:


diffrng = 1

L = [
    # id, time, string
    (1, 5, "ok"),
    (1, 6, "ok"),
    (1, 7, "no"),
    (1, 8, "no"),
]

match_times = [t[1] for t in L if "ok" in t[2]]

def in_range(timeval):
    return bool(min(abs(timeval - v) for v in match_times) <= diffrng)

print([t for t in L if in_range(t[1])])


But it really sounds like you could look into optimizing the db
query and db indexes, etc.


--
Lark's Tongue Guide to Python: http://lightbird.net/larks/

--
http://mail.python.org/mailman/listinfo/python-list


Re: help with making my code more efficient

2012-12-20 Thread Mitya Sirenef

On 12/20/2012 09:39 PM, Mitya Sirenef wrote:

On 12/20/2012 08:46 PM, larry.mart...@gmail.com wrote:

On Thursday, December 20, 2012 6:17:04 PM UTC-7, Dave Angel wrote:

On 12/20/2012 07:19 PM, larry.mart...@gmail.com wrote:

I have a list of tuples that contains a tool_id, a time, and a 
message. I want to select from this list all the elements where the 
message matches some string, and all the other elements where the 
time is within some diff of any matching message for that tool.

Here is how I am currently doing this:

No, it's not.  This is a fragment of code, without enough clues as to

what else is going on.  We can guess, but that's likely to make a mess.
Of course it's a fragment - it's part of a large program and I was 
just showing the relevant parts.



First question is whether this code works exactly correctly?

Yes, the code works. I end up with just the rows I want.


Are you only concerned about speed, not fixing features?
Don't know what you mean by 'fixing features'. The code does what I 
want, it just takes too long.


As far as I can tell, the logic that includes the time comparison is 
bogus.

Not at all.

You don't do  anything there to worry about the value of tup[2], 
just whether some

item has a nearby time.  Of course, I could misunderstand the spec.
The data comes from a database. tup[2] is a datetime column. tdiff 
comes from a datetime.timedelta()



Are you making a global called 'self' ? That name is by convention only
used in methods to designate the instance object.  What's the attribute
self?
Yes, self is my instance object. self.message contains the string of 
interest that I need to look for.



Can cdata have duplicates, and are they significant?

No, it will not have duplicates.

Are you including  the time building that as part of your 2 hour 
measurement?

No, the 2 hours is just the time to run the

cdata[:] = [tup for tup in cdata if determine(tup)]


Is the list sorted in any way?

Yes, the list is sorted by tool and datetime.


Chances are your performance bottleneck is the doubly-nested loop.  You
have a list comprehension at top-level code, and inside it calls a
function that also loops over the 600,000 items.  So the inner loop 
gets

executed 360 billion times.  You can cut this down drastically by some
judicious sorting, as well as by having a map of lists, where the 
map is

keyed by the tool.

Thanks. I will try that.

# record time for each message matching the specified message for 
each tool

messageTimes = {}

You're building a dictionary;  are you actually using the value (1), or
  is only the key relevant?

Only the keys.


A set is a dict without a value.
Yes, I could use a set, but I don't think that would make it 
measurably faster.


But more importantly, you never look up anything in this dictionary.  
So why

isn't it a list?  For that matter, why don't you just use the
messageTimes list?

Yes, it could be a list too.

for row in cdata:   # tool, time, message
    if self.message in row[2]:
        messageTimes[row[0], row[1]] = 1

# now pull out each message that is within the time diff for each 
# matched message
# as well as the matched messages themselves

def determine(tup):
    if self.message in tup[2]: return True  # matched message
    for (tool, date_time) in messageTimes:
        if tool == tup[0]:
            if abs(date_time-tup[1]) <= tdiff:
                return True
    return False

cdata[:] = [tup for tup in cdata if determine(tup)]



As the code exists, there's no need to copy the list.  Just do a simple
bind.
This statement is to remove the items from cdata that I don't want. I 
don't know what you mean by bind. I'm not familiar with that python 
function.





This code works, but it takes way too long to run - e.g. when cdata 
has 600,000 elements (which is typical for my app) it takes 2 hours 
for this to run.

Can anyone give me some suggestions on speeding this up?



This code probably is not faster but it's simpler and may be easier 
for you to work with

to experiment with speed-improving changes:


diffrng = 1

L = [
    # id, time, string
    (1, 5, "ok"),
    (1, 6, "ok"),
    (1, 7, "no"),
    (1, 8, "no"),
]

match_times = [t[1] for t in L if "ok" in t[2]]

def in_range(timeval):
    return bool(min(abs(timeval - v) for v in match_times) <= diffrng)

print([t for t in L if in_range(t[1])])


But it really sounds like you could look into optimizing the db
query and db indexes, etc.




Actually, it might be slower. This version of in_range should be better:

def in_range(timeval):
    return any(abs(timeval - v) <= diffrng for v in match_times)



--
Lark's Tongue Guide to Python: http://lightbird.net/larks/

--
http://mail.python.org/mailman/listinfo/python-list


Re: help with making my code more efficient

2012-12-20 Thread Chris Angelico
On Fri, Dec 21, 2012 at 11:43 AM, larry.mart...@gmail.com
 wrote:
> It came from a database. Originally I was getting just the data I wanted 
> using SQL, but that was taking too long also. I was selecting just the 
> messages I wanted, then for each one of those doing another query to get the 
> data within the time diff of each. That was resulting in tens of thousands of 
> queries. So I changed it to pull all the potential matches at once and then 
> process it in python.

Then the best thing to do is figure out how to solve your problem in
SQL. Any decent database engine will be able to optimize that
beautifully, and without multiple recursive searches. You may need to
create an index, but maybe not even that.

I can't speak for other engines, but PostgreSQL has an excellently
helpful mailing list, if you have problems with that side of it. But
have a shot at writing the SQL; chances are it'll work out easily.
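For instance, with invented table and column names, the whole filter can be a single self-join (sketched here with sqlite3 and integer times; real data would use proper timestamps and an index on (tool, time)):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (tool TEXT, time INTEGER, message TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("t1", 5, "ok"), ("t1", 6, "no"), ("t1", 9, "no"), ("t2", 5, "no")],
)

# One self-join instead of tens of thousands of per-match queries: keep
# every row that lies within :diff of a matching row for the same tool
# (the matching rows themselves satisfy ABS(...) = 0).
rows = conn.execute("""
    SELECT DISTINCT e.tool, e.time, e.message
    FROM events e
    JOIN events m ON m.tool = e.tool
                 AND m.message LIKE '%' || :msg || '%'
                 AND ABS(e.time - m.time) <= :diff
    ORDER BY e.tool, e.time
""", {"msg": "ok", "diff": 1}).fetchall()

print(rows)  # [('t1', 5, 'ok'), ('t1', 6, 'no')]
```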

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Keeping a Tkinter GUI alive during a long running process

2012-12-20 Thread Kevin Walzer
I maintain a Tkinter application that's a front-end to to a package 
manger, and I have never been able to find a way to keep the app from 
locking up at some point during the piping in of the package manager's 
build output into a text widget. At some point the buffer is overwhelmed 
and the app simply can't respond anymore, or writes data to the text 
widget after locking up for a period.


I've long used the typical Tkinter design pattern of opening a pipe to 
the external command, and letting it do its thing. However, after a 
time, this locks up the app. If I try to throttle the buffer with some 
combination of "update" or "after" or "update_idletasks," that keeps the 
data flowing, but it comes in too slowly and keeps flowing in long after 
the external process has terminated.


Below is a sample function that illustrates how I approach this issue. 
Can someone suggest a better approach?


 #install a fink package
def installPackage(self):

self.package = self.infotable.getcurselection()
if not self.package:
showwarning(title='Error', message='Error', detail='Please 
select a package name.', parent=self)

return
else:
self.clearData()
self.packagename = self.package[0][1]
self.status.set('Installing %s' % self.packagename)
self.setIcon(self.phynchronicity_install)
self.playSound('connect')
self.showProgress()
self.file = Popen('echo %s | sudo -S %s -y install %s' % 
(self.passtext, self.finkpath.get(), self.packagename), shell=True, 
bufsize=0, stdout=PIPE).stdout

for line in self.file:
self.inserturltext(line)
self.after(5000, self.update_idletasks)
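One common way around this (a sketch under invented names, not Kevin's code): do the blocking reads in a background thread, hand lines to the GUI through a `queue.Queue`, and have the Tk side drain the queue from a short `after()` callback so the event loop never blocks.

```python
import queue
import subprocess
import threading

def start_reader(cmd, q):
    """Run cmd, pushing each output line into q from a background thread.

    A None sentinel marks end of output, so the GUI knows when to stop polling.
    """
    def pump():
        with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
            for line in proc.stdout:
                q.put(line)
        q.put(None)
    threading.Thread(target=pump, daemon=True).start()

def drain(q, limit=100):
    """Pull at most `limit` queued lines; call this from the Tk side."""
    lines = []
    while len(lines) < limit:
        try:
            item = q.get_nowait()
        except queue.Empty:
            break
        if item is None:
            break
        lines.append(item)
    return lines

# In the Tk app, the polling callback would look something like:
#   def poll(self):
#       for line in drain(self.q):
#           self.text.insert("end", line)
#       self.after(50, self.poll)   # re-schedule; never blocks the event loop
```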

--
Kevin Walzer
Code by Kevin
http://www.codebykevin.com
--
http://mail.python.org/mailman/listinfo/python-list


Re: help with making my code more efficient

2012-12-20 Thread Dave Angel
On 12/20/2012 08:46 PM, larry.mart...@gmail.com wrote:
> On Thursday, December 20, 2012 6:17:04 PM UTC-7, Dave Angel wrote:
>> 
> Of course it's a fragment - it's part of a large program and I was just 
> showing the relevant parts. 
But it seems these are methods in a class, or something, so we're
missing context.  And you use self without it being an argument to the
function.  Like it's a global.
> 
> Yes, the code works. I end up with just the rows I want.
>> Are you only concerned about speed, not fixing features?  
> Don't know what you mean by 'fixing features'. The code does what I want, it 
> just takes too long.
>
>> As far as I can tell, the logic that includes the time comparison is bogus.  
> Not at all. 
>
>> You don't do  anything there to worry about the value of tup[2], just 
>> whether some
>> item has a nearby time.  Of course, I could misunderstand the spec.
> The data comes from a database. tup[2] is a datetime column. tdiff comes from 
> a datetime.timedelta() 
I thought that tup[1] was the datetime.  In any case, the loop makes no
sense to me, so I can't really optimize it, just make suggestions.
>
>> Are you making a global called 'self' ?  That name is by convention only
>> used in methods to designate the instance object.  What's the attribute
>> self?
> Yes, self is my instance object. self.message contains the string of interest 
> that I need to look for. 
>
>> Can cdata have duplicates, and are they significant? 
> No, it will not have duplicates.
>
>> Is the list sorted in any way?
> Yes, the list is sorted by tool and datetime.
>
>> Chances are your performance bottleneck is the doubly-nested loop.  You
>> have a list comprehension at top-level code, and inside it calls a
>> function that also loops over the 600,000 items.  So the inner loop gets
>> executed 360 billion times.  You can cut this down drastically by some
>> judicious sorting, as well as by having a map of lists, where the map is
>> keyed by the tool.
> Thanks. I will try that.

So in your first loop, you could simply split the list into separate
lists, one per tup[0] value, and store the lists as dictionary items,
keyed by that tool string.

Then inside the determine() function, make a local ref to the particular
list for the tool.
   recs = messageTimes[tup[0]]

Instead of a for loop over recs, use a binary search to identify the
first item that's >= date_time-tdiff.  Then if it's less than
date_time+tdiff, return True, otherwise False.  Check out the bisect
module.  Function bisect_left() should do what you want in a sorted list.
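A sketch of that combination (toy data, with integer times standing in for the poster's datetimes): one sorted list of match times per tool, then a binary search instead of the inner loop.

```python
from bisect import bisect_left
from collections import defaultdict

message = "ok"   # stand-in for self.message
tdiff = 1        # stand-in for the timedelta

cdata = [  # (tool, time, message), sorted by tool and time
    ("t1", 5, "ok"),
    ("t1", 6, "no"),
    ("t1", 9, "no"),
    ("t2", 5, "no"),
]

# One sorted list of matching times per tool.
times_by_tool = defaultdict(list)
for tool, t, msg in cdata:
    if message in msg:
        times_by_tool[tool].append(t)

def determine(tup):
    tool, t, msg = tup
    if message in msg:
        return True  # matched message
    recs = times_by_tool[tool]
    # Index of the first match time >= t - tdiff: O(log n), not a full scan.
    i = bisect_left(recs, t - tdiff)
    return i < len(recs) and recs[i] <= t + tdiff

cdata = [tup for tup in cdata if determine(tup)]
print(cdata)  # [('t1', 5, 'ok'), ('t1', 6, 'no')]
```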


>>> cdata[:] = [tup for tup in cdata if determine(tup)]
>>
>>
>> As the code exists, there's no need to copy the list.  Just do a simple
>> bind.
> This statement is to remove the items from cdata that I don't want. I don't 
> know what you mean by bind. I'm not familiar with that python function. 

Every "assignment" to a simple name is really a rebinding of that name.

cdata = [tup for tup in cdata if determine(tup)]

will rebind the name to the new object, much quicker than copying.  If
this is indeed a top-level line, it should be equivalent.  But if in
fact this is inside some other function, it may violate some other
assumptions.  In particular, if there are other names for the same
object, then you're probably stuck with modifying it in place, using
slice notation.
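The difference shows up as soon as a second name (or a caller) holds the same list; a small illustration:

```python
def keep_even(data):
    # Rebinding: creates a new list; the caller's list is untouched.
    data = [x for x in data if x % 2 == 0]

def keep_even_inplace(data):
    # Slice assignment: mutates the existing list; every reference sees it.
    data[:] = [x for x in data if x % 2 == 0]

a = [1, 2, 3, 4]
alias = a
keep_even(a)
print(a)            # [1, 2, 3, 4] -- unchanged
keep_even_inplace(a)
print(a, alias)     # [2, 4] [2, 4] -- both names see the change
```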

BTW, a set is generally much more memory efficient than a dict, when you
don't use the "value".  But since I think you'll be better off with a
dict of lists, it's a moot point.

-- 

DaveA

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: help with making my code more efficient

2012-12-20 Thread Roy Smith
In article ,
 "larry.mart...@gmail.com"  wrote:

> On Thursday, December 20, 2012 5:38:03 PM UTC-7, Chris Angelico wrote:
> > On Fri, Dec 21, 2012 at 11:19 AM, larry.mart...@gmail.com
> > 
> >  wrote:
> > 
> > > This code works, but it takes way too long to run - e.g. when cdata has 
> > > 600,000 elements (which is typical for my app) it takes 2 hours for this 
> > > to run.
> > 
> > >
> > 
> > > Can anyone give me some suggestions on speeding this up?
> > 
> > >
> > 
> > 
> > 
> > It sounds like you may have enough data to want to not keep it all in
> > 
> > memory. Have you considered switching to a database? You could then
> > 
> > execute SQL queries against it.
> 
> It came from a database. Originally I was getting just the data I wanted 
> using SQL, but that was taking too long also. I was selecting just the 
> messages I wanted, then for each one of those doing another query to get the 
> data within the time diff of each. That was resulting in tens of thousands of 
> queries. So I changed it to pull all the potential matches at once and then 
> process it in python. 

If you're doing free-text matching, an SQL database may not be the right 
tool.  I suspect you want to be looking at some kind of text search 
engine, such as http://lucene.apache.org/ or http://xapian.org/.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Brython - Python in the browser

2012-12-20 Thread Chris Angelico
On Fri, Dec 21, 2012 at 1:05 PM, Steven D'Aprano
 wrote:
> On Thu, 20 Dec 2012 18:59:39 -0500, Terry Reedy wrote:
>> What Python does have is 11 versions of the augmented assignment
>> statement: +=, -=, *=, /=, //=, %=, **=, >>=, <<=, &=, ^=, |=. Moreover,
>> these are *intended* to be implemented in place, by mutation, for
>> mutable objects, with possibly class-specific meanings.
>
> I don't believe that is the case. The problem is that augmented
> assignment that mutates can be rather surprising to anyone who expects
> "a += b" to be a short cut for "a = a + b".

This is confusing only because it violates the principle that exists
with methods, that it _either_ mutates _or_ returns. The augmented
assignment operators must return, and in some cases also mutate, hence
confusion.

> One might even have a class where (say) __iadd__ is defined but __add__
> is not.

That would be plausible, if it had an easy way to clone an object.
This would in fact be my preferred way to do things if the clone
operation is expensive - such as in this case. Adding two DOM trees
could be prohibitively expensive (if the tree is deep), but parenting
a tree to another is cheap.

>> <= is a comparison expression operator, which is completely different.
>
> <= is a comparison operator for ints, floats, strings, lists, ... but not
> necessarily for *everything*. That's the beauty and horror of operator
> overloading. Any operator can mean anything.
>
> If it were intended to only return a flag, then 1) Python would enforce
> that rule, and 2) the numpy people would be most upset.

There's a difference between returning a different data type that
makes good sense (compare two arrays and get an array of booleans) and
abusing an operator for its visual characteristics. The former is a
good reason to have the language grant freedom; the latter is proof
that freedom can be used in many ways. I'm not saying it's always
wrong, but it certainly isn't right as often as the other is.

> I have no opinion on the usefulness or sensibility of using <= as an in-
> place mutator method in this context, but I will say that if I were
> designing my own mini-DSL, I would not hesitate to give "comparison
> operators" some other meaning. Syntax should be judged in the context of
> the language you are using, not some other language. If you are using a
> DSL, then normal Python rules don't necessarily apply. <= in particular
> looks just like a left-pointing arrow and is an obvious candidate for
> overloading.

But there's no corresponding => arrow! How
can you make your DSL look like PHP arrays without that vital array
creation operator?

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


redirect standard output problem

2012-12-20 Thread iMath
redirect standard output problem

Why does the result only print A but leave out 888?

import sys

class RedirectStdoutTo:
    def __init__(self, out_new):
        self.out_new = out_new
    def __enter__(self):
        sys.stdout = self.out_new
    def __exit__(self, *args):
        sys.stdout = sys.__stdout__


print('A')
with open('out.log', mode='w', encoding='utf-8') as a_file, RedirectStdoutTo(a_file):
    print('B')
    print('C')

print(888)
-- 
http://mail.python.org/mailman/listinfo/python-list


Pass and return

2012-12-20 Thread iMath
Pass and return
Are these two functions the same ?

def test():
return 
 
def test():
pass
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Pass and return

2012-12-20 Thread Mitya Sirenef

On 12/21/2012 12:23 AM, iMath wrote:

Pass and return
Are these two functions the same ?

def test():
return
  
def test():

pass


I believe they are the same, but these statements have
different meanings in other circumstances, e.g.:

class A(object): pass

def test():
    if x: return
    else: ...  # do something

In the first example (in a class), return would be invalid.

In the second example, return would return None from the function;
pass would result in continuing execution after the if/else block.

Btw you can use disassemble function to look into what
these functions do:

>>> def a(): pass
>>> def b():return
>>> from dis import dis
>>> dis(a)
  1   0 LOAD_CONST   0 (None)
  3 RETURN_VALUE
>>> dis(b)
  1   0 LOAD_CONST   0 (None)
  3 RETURN_VALUE


So indeed they should be the same.

 -m

--
Lark's Tongue Guide to Python: http://lightbird.net/larks/

--
http://mail.python.org/mailman/listinfo/python-list


Re: Pass and return

2012-12-20 Thread Mitya Sirenef

On 12/21/2012 12:23 AM, iMath wrote:

Pass and return
Are these two functions the same ?

def test():
return
  
def test():

pass



From the point of style, of course, the latter is
much better because that's the idiomatic way
to define a no-op function. With a return, it
looks like you might have forgotten to add the
value to return or deleted it by mistake.

 -m

--
Lark's Tongue Guide to Python: http://lightbird.net/larks/

--
http://mail.python.org/mailman/listinfo/python-list


Re: Pass and return

2012-12-20 Thread Chris Angelico
On Fri, Dec 21, 2012 at 4:23 PM, iMath  wrote:
> Pass and return
> Are these two functions the same ?
>
> def test():
> return
>
> def test():
> pass

They're different statements, but in this case they happen to
accomplish the same thing.

The pass statement means "do nothing". For instance:

while input("Enter 5 to continue: ")!="5":
  pass

The return statement means "stop executing this function now, and
return this value, or None if no value".

Running off the end of a function implicitly returns None.

So what you have is one function that stops short and returns None,
and another that does nothing, then returns None. The functions
accomplish exactly the same thing, as does this:

test = lambda: None

All three compile to the same short block of code - load the constant
None, and return it.
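That claim can be checked directly in CPython by comparing the compiled
bytecode of all three (a quick sketch; the function names here are made
up for illustration):

```python
import dis

def test_return():
    return

def test_pass():
    pass

test_lambda = lambda: None

# In CPython, all three compile to the same bytecode:
# load the constant None, then return it.
assert (test_return.__code__.co_code
        == test_pass.__code__.co_code
        == test_lambda.__code__.co_code)

dis.dis(test_lambda)
```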

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


compile python 3.3 with bz2 support

2012-12-20 Thread Isml
hi, everyone:
 I want to compile python 3.3 with bz2 support on RedHat 5.5 but fail to do
that. Here is how I do it:
 1. download bzip2 and compile it (make; make -f Makefile_libbz2_so; make
install)
 2. change to the python 3.3 source directory: ./configure
--with-bz2=/usr/local/include
 3. make
 4. make install

 after the installation completes, I test it:
 [root@localhost Python-3.3.0]# python3 -c "import bz2"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.3/bz2.py", line 21, in <module>
    from _bz2 import BZ2Compressor, BZ2Decompressor
ImportError: No module named '_bz2'

 By the way, RedHat 5.5 has a built-in python 2.4.3. Would it be a problem?
--
http://mail.python.org/mailman/listinfo/python-list
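Once a rebuilt Python can import _bz2 successfully, a quick round-trip is
a simple way to confirm the module actually works (a generic sanity check,
not taken from the thread):

```python
import bz2

data = b"bz2 support check " * 64
packed = bz2.compress(data)
assert len(packed) < len(data)          # repetitive input compresses well
assert bz2.decompress(packed) == data   # round-trip is lossless
print("bz2 OK")
```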


Re: redirect standard output problem

2012-12-20 Thread Chris Angelico
On Fri, Dec 21, 2012 at 4:23 PM, iMath  wrote:
> redirect standard output problem
>
> why the result only print A but leave out 888 ?

No idea, because when I paste your code into the Python 3.3
interpreter or save it to a file and run it, it does exactly what I
would expect. A and 888 get sent to the screen, B and C go to the
file.
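For reference, a hypothetical reconstruction of what such a script
typically looks like (the OP's actual code is not quoted in this reply,
so the filename and structure here are guesses based on the described
output):

```python
import sys

print("A")                      # goes to the screen
saved = sys.stdout
with open("out.txt", "w") as f:
    sys.stdout = f              # redirect: prints now go to the file
    print("B")
    print("C")
    sys.stdout = saved          # restore before the file closes
print(888)                      # back on the screen
```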

What environment are you working in? Python version, operating system,
any little details that just might help us help you.

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list