Re: 2 + 2 = 5

2012-07-05 Thread Laszlo Nagy

On 2012-07-04 21:37, Paul Rubin wrote:

I just came across this (https://gist.github.com/1208215):

 import sys
 import ctypes
 pyint_p = ctypes.POINTER(ctypes.c_byte*sys.getsizeof(5))
 five = ctypes.cast(id(5), pyint_p)
 print(2 + 2 == 5) # False
 five.contents[five.contents[:].index(5)] = 4
 print(2 + 2 == 5) # True (must be sufficiently large values of 2 there...)

Heh.  The author is apparently anonymous, I guess for good reason.


>>> five.contents[five.contents[:].index(5)] = 4
>>> 5
4
>>> 5 is 4
True

But this I don't understand:

>>> 5+0
4
>>> 5+1
4
>>> 5+2
6


--
http://mail.python.org/mailman/listinfo/python-list


Re: How to safely maintain a status file

2012-07-08 Thread Laszlo Nagy
On Sun, 8 Jul 2012 21:29:41 +1000, Richard Baron Penman 
 declaimed the following in gmane.comp.python.general:

and then on startup read from tmp_file if status_file does not exist.
But this seems awkward.


It also violates your requirement -- since the "crash" could take
place with a partial "temp file".

I'd suggest that, rather than deleting the old status file, you
rename IT -- and only delete it IF you successfully rename the temp
file.
Yes, this is much better. Almost perfect. Don't forget to consult your 
system documentation, and check if the rename operation is atomic or 
not. (Most probably it will only be atomic if the original and the 
renamed file are on the same physical partition and/or mount point).


But even if the rename operation is atomic, there is still a race 
condition. Your program can be terminated after the original status file 
has been deleted, but before the temp file has been renamed. In that case, 
the status file will be missing (your program already did something, it 
just could not write out the new status).


Here is an algorithm that can always write and read a status (but it 
might not be the latest one). You can keep the last two status files.


Writer:

* create a temp file, write the new status info into it
* create the lock file if needed
* flock it
* try:
    * delete the older status file
    * rename the temp file to the new status file
* finally: unlock the lock file

Reader:

* flock the lock file
* try:
    * select the newer status file
    * read the status info
* finally: unlock the lock file

It is guaranteed that you will always have a status to read, and in most 
cases it will be the latest one (because the writer only holds the lock 
for a short time). However, it is still not perfect, because the writer 
may be waiting for the reader to unlock, so the new status info may not 
be written immediately.
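
Just to make it concrete, here is a minimal sketch of that writer/reader 
pair, using fcntl.flock (POSIX only; the file names and the two-file 
rotation are my assumptions, on Windows you would need msvcrt.locking or 
a lock directory instead):

import os
import fcntl

LOCK_PATH = "status.lock"
STATUS_A, STATUS_B = "status.a", "status.b"   # we keep the last two status files

def write_status(data):
    tmp = "status.tmp"
    with open(tmp, "wb") as f:             # create temp file, write new status info
        f.write(data)
    with open(LOCK_PATH, "wb") as lockf:   # create lock file if needed
        fcntl.flock(lockf, fcntl.LOCK_EX)  # flock it
        try:
            # overwrite the older of the two status files
            older = min((STATUS_A, STATUS_B),
                key=lambda p: os.path.getmtime(p) if os.path.exists(p) else 0)
            if os.path.exists(older):
                os.unlink(older)
            os.rename(tmp, older)
        finally:
            fcntl.flock(lockf, fcntl.LOCK_UN)

def read_status():
    with open(LOCK_PATH, "wb") as lockf:
        fcntl.flock(lockf, fcntl.LOCK_EX)
        try:
            # select the newer existing status file
            existing = [p for p in (STATUS_A, STATUS_B) if os.path.exists(p)]
            newest = max(existing, key=os.path.getmtime)
            with open(newest, "rb") as f:
                return f.read()
        finally:
            fcntl.flock(lockf, fcntl.LOCK_UN)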


It would really help if you could tell us what you are trying to do that 
needs a status file.


Best,

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy



You are contradicting yourself. Either the OS is providing a fully
atomic rename or it doesn't. All POSIX compatible OS provide an atomic
rename functionality that renames the file atomically or fails without
losing the target side. On a POSIX OS it doesn't matter if the target exists.
This is not a contradiction. Although the rename operation is atomic, 
the whole "change status" process is not. This is because there are two 
operations: #1 delete the old status file and #2 rename the new status 
file. And because there are two operations, there is still a race 
condition. I see no contradiction here.


You don't need locks or any other fancy stuff. You just need to make
sure that you flush the data and metadata correctly to the disk and
force a re-write of the directory inode, too. It's a standard pattern on
POSIX platforms and well documented in e.g. the maildir RFC.
That is not entirely true. We are talking about two processes. One is 
reading a file, the other one is writing it. They can run at the same 
time, so forcing the disk cache to be flushed won't help.


--
http://mail.python.org/mailman/listinfo/python-list


Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy



Renaming files is the wrong way to synchronize a
crawler.  Use a database that has ACID properties, such as
SQLite.  Far fewer I/O operations are required for small updates.
It's not the 1980s any more.
I agree with this approach. However, the OP specifically asked about 
"how to update status file".

--
http://mail.python.org/mailman/listinfo/python-list


Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy



Sorry, but you are wrong. It's just one operation that boils down to
"point name to a different inode". After the rename op the file name
either points to a different inode or still to the old name in case of
an error. The OS guarantees that all processes either see the first or
second state (in other words: atomic).

POSIX has no operation that actually deletes a file. It just has an
unlink() syscall that removes an associated name from an inode. As soon
as an inode has no names and is not referenced by a file descriptor, the
file content and inode are removed by the operating system. rename() is
more like a link() followed by an unlink() wrapped in a system wide
global lock.

Then please help me understand this.

"Good" case:

process #1: unlink(old status file)
process #1: rename(new status file)
process #2: open(new status file)
process #2: read(new status file)

"Bad" case:

process #1: unlink(old status file)
process #2: open(???) -- there is no file on disk here, this system call 
returns with an error!

process #1: rename(new status file)

If it were possible to rename + unlink in one step, then it would be 
okay. Can you please explain what I am missing?


--
http://mail.python.org/mailman/listinfo/python-list


Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy



This is not a contradiction. Although the rename operation is atomic,
the whole "change status" process is not. It is because there are two
operations: #1 delete old status file and #2. rename the new status
file. And because there are two operations, there is still a race
condition. I see no contradiction here.

On Posix systems, you can avoid the race condition.  The trick is to
skip step #1.  The rename will implicitly delete the old file, and
it will still be atomic.  The whole process now consists of a single
step, so the whole process is now atomic.
Well, I didn't know that this was going to work. At least it does not 
work on Windows 7 (which is supposed to be POSIX compatible?):


>>> f = open("test.txt","wb+")
>>> f.close()
>>> f2 = open("test2.txt","wb+")
>>> f2.close()
>>> import os
>>> os.rename("test2.txt","test.txt")
Traceback (most recent call last):
  File "", line 1, in 
WindowsError: [Error 183] File already exists
>>>

I have also tried this on FreeBSD and it worked.

Now, let's go back to the original question:


This works well on Linux but Windows raises an error when status_file already 
exists.


It SEEMS that the OP wanted a solution for Windows.


--
http://mail.python.org/mailman/listinfo/python-list


Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy



Windows doesn't suppport atomic renames if the right side exists.  I
suggest that you implement two code paths:

if os.name == "posix":
 rename = os.rename
else:
 def rename(a, b):
 try:
 os.rename(a, b)
 except OSError, e:
 if e.errno != 183:
 raise
 os.unlink(b)
 os.rename(a, b)


Problem is, if the process is stopped between unlink and rename, there 
would be no status file.
Yes, and actually it does not need to be an abnormal termination. It is 
enough if the OS scheduler puts this process on hold for some time...


But the problem can be solved by using a lock file. However, in that case, 
reading the status file can become a blocking operation.
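
Something along these lines, for example (a rough sketch only; the lock 
file name and the O_CREAT|O_EXCL spin lock are my assumptions, and a 
crashed process would leave the lock file behind):

import os
import time
import errno

def acquire_lock(lock_path="status.lock"):
    # O_CREAT | O_EXCL makes the creation atomic; spin until we get the lock.
    while True:
        try:
            return os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except OSError, e:
            if e.errno != errno.EEXIST:
                raise
            time.sleep(0.05)  # somebody else holds the lock

def release_lock(fd, lock_path="status.lock"):
    os.close(fd)
    os.unlink(lock_path)

def replace_status(tmp_path, status_path):
    fd = acquire_lock()
    try:
        if os.path.exists(status_path):
            os.unlink(status_path)  # needed on Windows, rename cannot overwrite
        os.rename(tmp_path, status_path)
    finally:
        release_lock(fd)

Readers would have to take the same lock before opening the status file, 
which is exactly the blocking read mentioned above.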

--
http://mail.python.org/mailman/listinfo/python-list


Re: Implicit conversion to boolean in if and while statements

2012-07-16 Thread Laszlo Nagy



...

Traceback (most recent quip last):
   Author: "", line 7, in 
LogicalFallacyError: "Reductio ad absurdum"


Deary deary me Rick. Reductio ad adsurdum is not a fallacy. It is a
counter-argument to an argument or claim, by showing that the premise of
the original claim leads to an absurd conclusion.

You have claimed that we should always be explicit whenever we write. But
you do not actually live up to your own advice, because you can't: it is
absurd to try to be explicit about everything all the time. You have
misunderstood the purpose of the Zen of Python: it is not to claim that
everything should be explicit, but to avoid code that is hard to
understand because things which need to be explicit for clarity are
implied by other parts of your code.


I agree. There is no point in treating explicitness as an absolute anyway. 
It is not a yes/no question. You cannot say that somebody is "not explicit"; 
that is not something that can be decided. But you can say that he was "not 
explicit enough" in a concrete case. There is an accepted level of 
explicitness. You can probably always be more explicit, or less explicit. 
Being more explicit is not the goal in itself. But it is good practice to be 
more explicit if it helps you achieve the real goal: for example, writing a 
program that can be maintained easily.


--
http://mail.python.org/mailman/listinfo/python-list


Re: Implicit conversion to boolean in if and while statements

2012-07-16 Thread Laszlo Nagy



This syntax is explicit *enough*. We don't need to be any more
explicit.

But if you are going to argue that "if obj" is *explicit enough*, then
apply your argument consistently to "String"+1.75 also. Why must we be
explicit about string conversion BUT not boolean conversion? Can you
reduce this to the absurd? Or will you just choose to ignore this
valid point?
Not all decisions in Python are based purely on the "explicit enough" 
thing. Some things in the language cannot be explained using 
explicitness alone. So, when we cannot fully explain the ' why 
String+1.75 ' question with statements about explicitness then it 
doesn't mean that anybody lied or wrote something wrong. :-)



--
http://mail.python.org/mailman/listinfo/python-list


Re: Implicit conversion to boolean in if and while statements

2012-07-17 Thread Laszlo Nagy

On 2012-07-17 10:23, Andrew Berg wrote:

I don't want that, but I am suggesting that it would be consistent with
the idea of "something or nothing".
Don't confuse names and objects. You can only test the truth value of 
objects. If you don't have a name in a namespace, then you don't have a 
way to reference anything (including the False object).


Using the same logic you could also say that an "if" statement with no 
condition at all should evaluate to False:


if:
    print "This never gets executed"

But that makes no sense.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Implicit conversion to boolean in if and while statements

2012-07-17 Thread Laszlo Nagy



Not really. It doesn't quack like anything.

Actually, there is no "it". So we cannot talk about how it quacks. :-D

--
http://mail.python.org/mailman/listinfo/python-list


How to represent dates BC

2012-07-24 Thread Laszlo Nagy

>>> import datetime
>>> old_date = datetime.date(1,12,31)
>>> str(old_date)
'0001-12-31'
>>> one_year = datetime.timedelta(days=365)
>>> str(one_year)
'365 days, 0:00:00'
>>> old_date - 10*one_year
Traceback (most recent call last):
  File "", line 1, in 
OverflowError: date value out of range
>>>


My main problem is that I have an application that stores dates in a 
PostgreSQL database. The PostgreSQL date type is capable of storing 
dates from 4713 BC to 294276 AD.


http://www.postgresql.org/docs/9.2/static/datatype-datetime.html

The application itself stores historical data of events. Apparently, the 
Python datetime.date object cannot handle dates before 1 AD. The 
psycopg2 driver converts date values to date objects. But not in this case:


>>> conn = dbpool.borrow("central")
>>> conn.getqueryvalue("select '1311-03-14 BC'::date")
Traceback (most recent call last):
  File "", line 1, in 
 (some more tracelog here).
data = cur.fetchone()
ValueError: year is out of range
>>>

What is a good solution? I could - in theory - store the dates in a 
text field, but then I won't be able to create indices on dates, 
add/subtract other date values etc.


I could try to always use something like:

select extract(year from date_field) as year,extract(month from 
date_field) as month,extract(day from date_field) as day 


but this is really messy!

What is the good representation here? Should I implement my own date 
type? (I wouldn't want to.)


Thanks,

  Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Re: How to represent dates BC

2012-07-24 Thread Laszlo Nagy

On 2012-07-24 12:29, Christian Heimes wrote:

Am 24.07.2012 11:55, schrieb Laszlo Nagy:

What is the good representation here? Should I implement my own date
type? (I wouldn't want to.)

JDN [1] is a commonly used format for wide ranges of dates. I've used it
in the past to refer to days BC. PyPI offers a Python module [2] that
looks well written and documented.
It was really useful, thanks. I also figured out how to convert a Julian 
day number back to a date in PostgreSQL:


to_date(2455452::text, 'J')

So it is possible to create indices.
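
For illustration, the day-number arithmetic itself is easy to do in 
Python as well; here is a minimal sketch of the Gregorian-to-JDN 
conversion (the standard Fliegel-Van Flandern style formula, using 
astronomical year numbering where 1 BC is year 0 and 2 BC is year -1 - 
this is my own sketch, not the PyPI module mentioned above):

def gregorian_to_jdn(year, month, day):
    # valid for proleptic Gregorian dates; year 0 = 1 BC, -1 = 2 BC, ...
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)

print gregorian_to_jdn(2000, 1, 1)    # 2451545
print gregorian_to_jdn(-1310, 3, 14)  # 1311 BC, still just a plain integer

The resulting integer can be stored in a plain integer column, indexed, 
and converted back with to_date(..., 'J') on the PostgreSQL side.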

   L

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Daemon loses __file__ reference after a while

2012-07-24 Thread Laszlo Nagy

On 2012-07-24 14:17, ivdn...@gmail.com wrote:

Hello all,

I have a deamon process that runs for some considerable time (weeks) without 
any problems. At some point it starts throwing the following exception:

   File "/some/path/scheduler.py", line 376, in applyrule
 result = execrule(rule_code)
   File "/some/path/scheduler.py", line 521, in execrule
 rulepath = 
os.path.dirname(__file__)+"/"+'/'.join(rule['modules'])+"/"+rule['rulename']
NameError: name '__file__' is not defined
It is not a direct solution to your problem, but can you save the value 
of os.path.dirname(__file__) into another variable?
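
Something like this at the top of scheduler.py, for example (RULE_DIR is 
just a name I made up):

import os

# Evaluate __file__ once, at import time, and keep the result around.
RULE_DIR = os.path.dirname(os.path.abspath(__file__))

Then execrule() could build rulepath from RULE_DIR instead of calling 
os.path.dirname(__file__) every time.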


--
http://mail.python.org/mailman/listinfo/python-list


Re: Daemon loses __file__ reference after a while.

2012-07-24 Thread Laszlo Nagy



If you use fork(), it drops all file descriptors and creates new
ones - maybe that is when __file__ is lost...?

I don't think this is the case. He wrote that the process runs for weeks 
without problems, and code using __file__ is being executed all the time.

--
http://mail.python.org/mailman/listinfo/python-list


Generating valid identifiers

2012-07-26 Thread Laszlo Nagy
I have a program that creates various database objects in PostgreSQL. 
There is a DOM, and for each element in the DOM, a database object is 
created (schema, table, field, index and tablespace).


I do not want this program to generate very long identifiers. They would 
increase SQL parsing time, and they don't look good. Let's just say that the 
limit should be 32 characters. But I also want to be able to recognize the 
identifiers when I look at their modified/truncated names.


So I have come up with this solution:

- I have restricted original identifiers not to contain the dollar sign. 
They can only contain [A-Z] or [a-z] or [0-9] and the underscore. Here 
is a valid example:


"group1_group2_group3_some_field_name"

- I'm trying to use a hash function to reduce the length of the 
identifier when it is too long:


import hashlib
import base64

class Connection(object):
    # ... more code here
    @classmethod
    def makename(cls, basename):
        if len(basename) > 32:
            h = hashlib.sha256()
            h.update(basename)
            tail = base64.b64encode(h.digest(), "_$")[:10]
            return basename[:30] + "$" + tail
        else:
            return basename

Here is the result:

print repr(Connection.makename("some_field_name"))
'some_field_name'
print repr(Connection.makename("group1_group2_group3_some_field_name"))
'group1_group2_group3_some_fiel$AyQVQUXoyf'

So, if the identifier is too long, then I use a modified version, that 
should be unique, and similar to the original name. Let's suppose that 
nobody wants to crack this modified hash on purpose.


And now, the questions:

* Would it be a problem to use CRC32 instead of SHA? (Since security is 
not a problem, and CRC32 is faster.)
* I'm truncating the digest value to 10 characters.  Is it safe enough? 
I don't want to use more than 10 characters, because then it wouldn't be 
possible to recognize the original name.
* Can somebody think of a better algorithm, that would give a bigger 
chance of recognizing the original identifier from the modified one?


Thanks,

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Re: Generating valid identifiers

2012-07-26 Thread Laszlo Nagy



* Would it be a problem to use CRC32 instead of SHA? (Since security is
not a problem, and CRC32 is faster.)

What happens if you get a collision?

That is, you have two different long identifiers:

a.b.c.d...something
a.b.c.d...anotherthing

which by bad luck both hash to the same value:

a.b.c.d.$AABB99
a.b.c.d.$AABB99

(or whatever).
Yes, that was the question. How do I avoid that? (Of course I can avoid 
that by using a full sha256 hash value.)

* Can somebody think of a
better algorithm, that would give a bigger chance of recognizing the
original identifier from the modified one?

Rather than truncating the most significant part of the identifier, the
field name, you should truncate the least important part, the middle.

a.b.c.d.e.f.g.something

goes to:

a.b...g.something

or similar.

Yes, this is a good idea. Thank you.
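
A possible variant of the earlier makename() along those lines (only a 
sketch; the 32 character limit, the "$" separator and the 2/3 head, 1/3 
tail split are my assumptions):

import hashlib
import base64

def makename(basename, limit=32):
    if len(basename) <= limit:
        return basename
    digest = base64.b64encode(hashlib.sha256(basename).digest(), "_$")[:6]
    keep = limit - len(digest) - 1   # room left for original characters
    head = keep * 2 // 3             # keep the beginning (group names)...
    tail = keep - head               # ...and the end (the field name)
    return basename[:head] + "$" + digest + basename[-tail:]

print makename("group1_group2_group3_some_field_name")
# e.g. 'group1_group2_gr$XXXXXXield_name' (32 characters, XXXXXX = digest)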


--
http://mail.python.org/mailman/listinfo/python-list


Re: Generating valid identifiers

2012-07-27 Thread Laszlo Nagy



As a side note, the odds of having at least one hash collision among
multiple tables are known as the birthday problem.  At 4 hex digits
there are 65536 possible digests, and it turns out that at 302 tables
there is a >50% chance that at least one pair of those names have the
same 4-digit digest.  That doesn't mean you should be concerned if you
have 302 tables in your Django Oracle database, though, because those
colliding tables also have to match completely in the first 26
characters of their generated names, which is not that common.  If a
collision ever did occur, the resolution would be simple: manually set
the name of one of the offending tables in the model definition.

With 16 ** 10 possible digests, the probability of collision hits 50%
at 1234605 tables.
Thank you for the precise explanation. :-) Well, if Django and Oracle 
use this, then it can't be a very bad idea. :-)
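
The numbers are also easy to double check with the usual birthday-problem 
approximation (a quick sanity check, not part of the original post):

import math

def p_collision(n, buckets):
    # P(at least one collision) ~ 1 - exp(-n*(n-1) / (2*buckets))
    return 1.0 - math.exp(-float(n) * (n - 1) / (2 * buckets))

print p_collision(302, 16 ** 4)        # ~0.50 for 4 hex digits
print p_collision(1234605, 16 ** 10)   # ~0.50 for 10 hex digits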

--
http://mail.python.org/mailman/listinfo/python-list


Re: Generating valid identifiers

2012-07-27 Thread Laszlo Nagy



Unless an attacker can select the field names, in which case they may be
able to improve those odds significantly. In the case of MD5, they can
possibly improve those odds to 1 in 1, since MD5 is vulnerable to
collision attacks. Not so for some (all?) of the SHA hashes, at least not
yet, but they're much more expensive to calculate.

If the OP sticks with his intention to use CRC32, the odds won't be
anywhere near that low. CRC32 is neither collision-resistant nor
cryptographically random, and only generates eight hex digits, not ten.


I'm not affraid of attackers. As I said, nobody will want to "crack" the 
hash. This is for an in-house project. Only the dba can create new 
database objects. If the dba wants to do something wrong, he can simply 
delete whole database. He doesn't need to crack any hash value. :-)


So yes, CRC32 is not collision-resistant, and not cryptographically 
random. But in my case, sha256 is not collision resistant either 
(because I'm using the first few chars of the digest value only). And 
cryptographic randomness is not a requirement.


Given these circumstances, maybe using CRC32 would be fine too.

I wonder what kind of hash Django uses for Oracle.


--
http://mail.python.org/mailman/listinfo/python-list


Re: Generating valid identifiers

2012-07-27 Thread Laszlo Nagy
> With 16 ** 10 possible digests, the probability of collision hits 50% 
at 1234605 tables



Actually, I'm using base64 encoding, so it is 64**10. I guess using 6 
characters will be enough.

--
http://mail.python.org/mailman/listinfo/python-list


Re: simplified Python parsing question

2012-07-30 Thread Laszlo Nagy


I appreciate the help because I believe that once this is working, 
it'll make a significant difference in the ability for disabled 
programmers to write code again as well as be able to integrate within 
existing development team and their naming conventions. 


Did you try to use pygments?

http://pygments.org/docs/api/

It already contains a lexer for Python source code. You can create a 
lexer (a pygments.lexer.Lexer subclass) and then call its get_tokens method.


Then you can use this to identify statements:

http://docs.python.org/reference/simple_stmts.html

Fortunately, almost all statements begin with a keyword. There are some 
exceptions:


expression statement
assignment statement

I would first tokenize the code, then divide it by statement keywords. 
Finally, you just need to find expression/assignment statements in the 
remaining sections. (Maybe there is a better way to do it.)
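
A tiny sketch of what the tokenizing step looks like (the sample source 
string is made up):

from pygments.lexers import PythonLexer
from pygments.token import Keyword

code = "for i in range(3):\n    x = i * 2\n"
lexer = PythonLexer()
for tokentype, value in lexer.get_tokens(code):
    # statement keywords ('for', 'if', 'while', 'return', ...) show up
    # as Keyword tokens, which is what you would split on
    if tokentype in Keyword:
        print "keyword:", repr(value)
    else:
        print tokentype, repr(value)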


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: simplified Python parsing question

2012-07-30 Thread Laszlo Nagy




yeah the problem is also little more complicated than simple parsing 
of Python code. For example, one example (from the white paper)


*meat space blowback = Friends and family [well-meaning attempt]

*could that be parsed by the tools you mention?


It is not valid Python code. Pygments is able to tokenize code that is 
not valid Python, because it is not parsing, just tokenizing. 
But if you put a bunch of random tokens into a file, then of course you 
will never be able to split that into statements.


Probably, you will need to process indent/dedent tokens and identify the 
"level" of each statement. Then you can tell which file, class, inner 
class or method you are in. Inside one "level" or code block, you 
could try to divide the code into statements.


Otherwise, I have no idea how a blind person could navigate Python 
source. In fact I have no idea how they use regular programs. So I'm 
afraid I cannot help too much with this. :-(



--
http://mail.python.org/mailman/listinfo/python-list


Re: Pass data to a subprocess

2012-07-31 Thread Laszlo Nagy
> I think I got it now, if I already just mix the start before another 
add, inside the Process.run it won't see the new data that has been 
added after the start. So this way is perfectly safe only until the 
process is launched, if it's running I need to use some 
multiprocess-aware data structure, is that correct?


Yes. Read this:

http://docs.python.org/library/multiprocessing.html#exchanging-objects-between-processes

You can use Queues and Pipes. Actually, these are basic elements of the 
multiprocessing module and they are well documented. I wonder if you 
read the documentation at all, before posting questions here.
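
Just to illustrate the shape of it (a trivial sketch, not your actual 
code):

from multiprocessing import Process, Queue

def worker(q):
    # the child blocks here until the parent sends something
    while True:
        item = q.get()
        if item is None:
            break
        print "child got:", item

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    q.put("data added after start()")  # visible to the already running child
    q.put(None)                        # tell the child to stop
    p.join()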



--
http://mail.python.org/mailman/listinfo/python-list


Re: Pass data to a subprocess

2012-08-01 Thread Laszlo Nagy


As I wrote "I found many nice things (Pipe, Manager and so on), but
actually even
this seems to work:" yes I did read the documentation.

Sorry, I did not want be offensive.


I was just surprised that it worked better than I expected even
without Pipes and Queues, but now I understand why..

Anyway now I would like to be able to detach subprocesses to avoid the
nasty code reloading that I was talking about in another thread, but
things get more tricky, because I can't use queues and pipes to
communicate with a running process that it's noit my child, correct?

Yes, I think that is correct. Instead of detaching a child process, you 
can create independent processes and use other frameworks for IPC. For 
example, Pyro.  It is not as effective as multiprocessing.Queue, but in 
return, you will have the option to run your service across multiple 
servers.


The most effective IPC is usually through shared memory. But there is no 
OS independent standard Python module that can communicate over shared 
memory. Except multiprocessing of course, but AFAIK it can only be used 
to communicate between fork()-ed processes.

--
http://mail.python.org/mailman/listinfo/python-list


Re: Pass data to a subprocess

2012-08-01 Thread Laszlo Nagy




Thanks, there is another thing which is able to interact with running
processes in theory:
https://github.com/lmacken/pyrasite

I don't know though if it's a good idea to use a similar approach for
production code, as far as I understood it uses gdb..  In theory
though I could be able to set up every subprocess with all the data
they need, so I might not even need to share data between them.

Anyway now I had another idea to avoid to be able to stop the main
process without killing the subprocesses, using multiple forks.  Does
the following makes sense?  I don't really need these subprocesses to
be daemons since they should quit when done, but is there anything
that can go wrong with this approach?
One thing is sure: os.fork() doesn't work under Microsoft Windows. Under 
Unix, I'm not sure if os.fork() can be mixed with 
multiprocessing.Process.start(). I could not find official documentation 
on that. This must be tested on your actual platform. And don't forget 
to use Queue.get() in your test. :-)


--
http://mail.python.org/mailman/listinfo/python-list


Re: Pass data to a subprocess

2012-08-01 Thread Laszlo Nagy




Yes I know we don't care about Windows for this particular project..
I think mixing multiprocessing and fork should not harm, but probably
is unnecessary since I'm already in another process after the fork so
I can just make it run what I want.

Otherwise is there a way to do same thing only using multiprocessing?
(running a process that is detachable from the process that created it)

I'm afraid there is no way to do that. I'm not even sure if 
multiprocessing.Queue will work if you detach a forked process.

--
http://mail.python.org/mailman/listinfo/python-list


Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy

On 2012-08-01 12:39, andrea crotti wrote:

We're having some really obscure problems with gzip.
There is a program running with python2.7 on a 2.6.18-128.el5xen (red
hat I think) kernel.

Now this program does the following:
if filename == 'out2.txt':
  out2 = open('out2.txt')
elif filename == 'out2.txt.gz':
  out2 = open('out2.txt.gz')

A gzip file is binary. You should open it in binary mode:

out2 = open('out2.txt.gz', "rb")

Otherwise carriage return and newline characters will be converted (depending 
on the platform).


--
http://mail.python.org/mailman/listinfo/python-list


Re: Pass data to a subprocess

2012-08-01 Thread Laszlo Nagy



The most effective IPC is usually through shared memory. But there is no
OS independent standard Python module that can communicate over shared
memory.

It's true that shared memory is faster than serializing objects over a
TCP connection.  On the other hand, it's hard to imagine anything
written in Python where you would notice the difference.

Well, except in response times. ;-)

The TCP stack likes to wait after you call send() on a socket. Yes, you 
can use setsockopt/TCP_NODELAY, but my experience is that response times 
with TCP can be long, especially when you have to do many 
request-response pairs.


It also depends on the protocol design - if you can reduce the number of 
request-response pairs then it helps a lot.

--
http://mail.python.org/mailman/listinfo/python-list


Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy



very simple right? But sometimes we get a checksum error.

Do you have a traceback showing the actual error?



  - CRC is at the end of the file and is computed against the whole
file (last 8 bytes)
  - after the CRC there is the \ marker for the EOF
  - readline() doesn't trigger the checksum generation in the
beginning, but only when the EOF is reached
  - until a file is flushed or closed you can't read the new content in it
How do you write the file? Is it written from another Python program? 
Can we see the source code of that?


but the problem is that we can't reproduce it, because doing it
manually on the same files it works perfectly,
and the same files some time work some time don't work.
The problem might be with the saved file. Once you get an error for a 
given file, can you reproduce the error using the same file?


The files are on a shared NFS drive, I'm starting to think that it's a
network/fs problem, which might truncate the file
adding an EOF before the end and thus making the checksum fail..
But is it possible?
Or what else could it be?

Can your try to run the same program on a local drive?
--
http://mail.python.org/mailman/listinfo/python-list


Re: Pass data to a subprocess

2012-08-01 Thread Laszlo Nagy

On 2012-08-01 12:59, Roy Smith wrote:

In article ,
  Laszlo Nagy  wrote:


Yes, I think that is correct. Instead of detaching a child process, you
can create independent processes and use other frameworks for IPC. For
example, Pyro.  It is not as effective as multiprocessing.Queue, but in
return, you will have the option to run your service across multiple
servers.

You might want to look at beanstalk (http://kr.github.com/beanstalkd/).
We've been using it in production for the better part of two years.  At
a 30,000 foot level, it's an implementation of queues over named pipes
over TCP, but it takes care of a zillion little details for you.

Looks very simple to use. Too bad that it doesn't work on Windows systems.
--
http://mail.python.org/mailman/listinfo/python-list


Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy


- The file is written with the linux gzip program.
- no I can't reproduce the error with the same exact file that did
failed, that's what is really puzzling,

How do you make sure that no process is reading the file before it is 
fully flushed to disk?


Possible way of testing for this kind of error: before you open a file, 
use os.stat to determine its size, and write out the size and the file 
path into a log file. Whenever an error occurs, compare the actual size 
of the file with the logged value. If they are different, then you have 
tried to read from a file that was growing at that time.
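
Something along these lines, for example (the log file name and format 
are made up):

import os

def open_logged(path, logpath="readsizes.log"):
    size_before = os.stat(path).st_size
    with open(logpath, "a") as log:
        log.write("%s %d\n" % (path, size_before))
    return open(path, "rb")

# When a CRC error occurs, compare os.stat(path).st_size with the logged
# value; if they differ, the file was still growing when it was opened.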


Suggestion: from the other process, write the file into a different file 
(for example, "file.gz.tmp"). Once the file is flushed and closed, use 
os.rename() to give its final name. On POSIX systems, the rename() 
operation is atomic.




   there seems to be no clear pattern and just randmoly fails. The file
is also just open for read from this program,
   so in theory no way that it can be corrupted.
Yes, there is. Gzip stores CRC for compressed *blocks*. So if the file 
is not flushed to the disk, then you can only read a fragment of the 
block, and that changes the CRC.


   I also checked with lsof if there are processes that opened it but
nothing appears..
lsof doesn't work very well over NFS. You can have other processes on 
different computers (!) writing the file. lsof only lists the processes 
on the system it is executed on.


- can't really try on the local disk, might take ages unfortunately
(we are rewriting this system from scratch anyway)



--
http://mail.python.org/mailman/listinfo/python-list


Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy




Thanks a lotl, someone that writes on the file while reading might be
an explanation, the problem is that everyone claims that they are only
reading the file.
If that is true, then make that file system read only. Soon it will turn 
out who is writing them. ;-)


Apparently this file is generated once and a long time after only read
by two different tools (in sequence), so this could not be possible
either in theory.. I'll try to investigate more in this sense since
it's the only reasonable explation I've found so far.

A safe solution would be to develop a system where files go through 
"states" in a predefined order:


* allow programs to write into files with an .incomplete extension
* allow them to rename the file to .complete
* create a single program that renames .complete files to .gz files 
AFTER making them read-only for everybody else
* readers should only read .gz files
* .gz files are then guaranteed to be complete
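
The third step could be a small stand-alone script, something like this 
(the directory is made up, the files are assumed to be named name.complete, 
and chmod has no real effect on Windows permissions):

import os
import stat
import glob

# promote every *.complete file: make it read-only, then rename it to *.gz
for path in glob.glob("/data/incoming/*.complete"):
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    os.rename(path, path[:-len(".complete")] + ".gz")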


--
http://mail.python.org/mailman/listinfo/python-list


Re: Pass data to a subprocess

2012-08-01 Thread Laszlo Nagy



things get more tricky, because I can't use queues and pipes to
communicate with a running process that it's noit my child, correct?


Yes, I think that is correct.

I don't understand why detaching a child process on Linux/Unix would
make IPC stop working.  Can somebody explain?

It is implemented with shared memory. I think (although I'm not 100% 
sure) that shared memory is created *and freed up* (shm_unlink() system 
call) by the parent process. It makes sense, because the child processes 
will surely die with the parent. If you detach a child process, then it 
won't be killed with its original parent. But the shared memory will be 
freed by the original parent process anyway. I suspect that the child 
that has mapped that shared memory segment will try to access a freed-up 
resource, and cause a segfault or something similar.

--
http://mail.python.org/mailman/listinfo/python-list


Re: Pass data to a subprocess

2012-08-01 Thread Laszlo Nagy



Yes, I think that is correct.

I don't understand why detaching a child process on Linux/Unix would
make IPC stop working.  Can somebody explain?

It is implemented with shared memory. I think (although I'm not 100% 
sure) that shared memory is created *and freed up* (shm_unlink() 
system call) by the parent process. It makes sense, because the child 
processes will surely die with the parent. If you detach a child 
process, then it won't be killed with its original parent. But the 
shared memory will be freed by the original parent process anyway. I 
suspect that the child that has mapped that shared memory segment will 
try to access a freed up resource, do a segfault or something similar.
So detaching the child process will not make IPC stop working. But 
exiting from the original parent process will. (And why else would you 
detach the child?)


--
http://mail.python.org/mailman/listinfo/python-list


Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy



Thanks a lot, that makes a lot of sense..  I haven't given this detail
before because I didn't write this code, and I forgot that there were
threads involved completely, I'm just trying to help to fix this bug.

Your explanation makes a lot of sense, but it's still surprising that
even just reading files without ever writing them can cause troubles
using threads :/
Make sure that file objects are not shared between threads, if that is 
possible. It will probably solve the problem (if it is related to 
threads).

--
http://mail.python.org/mailman/listinfo/python-list


Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy



Make sure that file objects are not shared between threads. If that is
possible. It will probably solve the problem (if that is related to
threads).


Well I just have to create a lock I guess right?
That is also a solution. You need to call file.read() inside an acquired 
lock. But not this way:

with lock:
    # open file
    # read content

That example would keep the lock acquired for the whole time the file is 
open, so the file object cannot really be shared between threads.


More likely:

import threading
import gzip

## Open the file once
lock = threading.Lock()
fin = gzip.open(file_path)  # file_path: path of the .gz file
# Now you can share the file object between threads.

# and do this inside any thread:
## data needed: block until the file object becomes usable
with lock:
    data = fin.read()  # other threads are blocked while I'm reading
## use your data here, meanwhile other threads can read


--
http://mail.python.org/mailman/listinfo/python-list


Re: Pass data to a subprocess

2012-08-01 Thread Laszlo Nagy



I still don't get it.  shm_unlink() works the same way unlink() does.
The resource itself doesn't cease to exist until all open file handles
are closed. From the shm_unlink() man page on Linux:

The operation of shm_unlink() is analogous to unlink(2): it
removes a shared memory object name, and, once all processes
have unmapped the object, de-allocates and destroys the
contents of the associated memory region. After a successful
shm_unlink(), attempts to shm_open() an object with the same
name will fail (unless O_CREAT was specified, in which case a
new, distinct object is created).

Even if the parent calls shm_unlink(), the shared-memory resource will

continue to exist (and be usable) until all processes that are holding
open file handles unmap/close them.  So not only will detached
children not crash, they'll still be able to use the shared memory
objects to talk to each other.

I stand corrected. It should still be examined what kind of shared memory 
is used under non-Linux systems. System V on AIX? And what about 
Windows? So maybe the general answer is still no. But I guess that the 
OP wanted this to work on a specific system.


Dear Andrea Crotti! Please try to detach two child processes, exit from 
the main process, and communicate over a multiprocessing queue. It will 
possibly work. Sorry for my bad advice.

--
http://mail.python.org/mailman/listinfo/python-list


Re: CRC-checksum failed in gzip

2012-08-02 Thread Laszlo Nagy


Technically, that is correct, but IMHO its complete nonsense to share 
the file object between threads in the first place. If you need the 
data in two threads, just read the file once and then share the 
read-only, immutable content. If the file is small or too large to be 
held in memory at once, just open and read it on demand. This also 
saves you from having to rewind the file every time you read it.


Am I missing something?
We suspect that his program reads the same file object from different 
threads. At least this would explain his problem. I agree with you - 
usually it is not a good idea to share a file object between threads. 
This is what I told him the first time. But it is not in our hands - he 
already has a program that needs to be fixed. It might be easier for him 
to protect read() calls with a lock, because that can be done 
mechanically, without thinking too much.

--
http://mail.python.org/mailman/listinfo/python-list


Re: CRC-checksum failed in gzip

2012-08-02 Thread Laszlo Nagy



One last thing I would like to do before I add this fix is to actually
be able to reproduce this behaviour, and I thought I could just do the
following:

import gzip
import threading


class OpenAndRead(threading.Thread):
    def run(self):
        fz = gzip.open('out2.txt.gz')
        fz.read()
        fz.close()


if __name__ == '__main__':
    for i in range(100):
        OpenAndRead().start()


But no matter how many threads I start, I can't reproduce the CRC
error, any idea how I can try to help it happening?
Your example did not share the file object between threads. Here is an 
example that does:


class OpenAndRead(threading.Thread):
    def run(self):
        global fz
        fz.read(100)

if __name__ == '__main__':
    fz = gzip.open('out2.txt.gz')
    for i in range(10):
        OpenAndRead().start()

Try this with a huge file. And here is the one that should never throw a 
CRC error, because the file object is protected by a lock:


class OpenAndRead(threading.Thread):
    def run(self):
        global fz
        global fl
        with fl:
            fz.read(100)

if __name__ == '__main__':
    fz = gzip.open('out2.txt.gz')
    fl = threading.Lock()
    for i in range(2):
        OpenAndRead().start()



The code in run should be shared by all the threads since there are no
locks, right?
The code is shared but the file object is not. In your example, a new 
file object is created every time a thread is started.


--
http://mail.python.org/mailman/listinfo/python-list


Re: I thought I understood how import worked...

2012-08-07 Thread Laszlo Nagy

On 2012-08-07 15:55, Ben Finney wrote:

Roy Smith  writes:


So, it appears that you *can* import a module twice, if you refer to
it by different names! This is surprising.

The tutorial is misleading on this. It it says plainly:

 A module can contain executable statements as well as function
 definitions. […] They are executed only the *first* time the module
 is imported somewhere.

 http://docs.python.org/tutorial/modules.html>

but it doesn't make clear that a module can exist in the ‘sys.modules’
list multiple times under different names.
sys.modules is a dict. But yes, there can be multiple "instances" of the 
same module loaded.


What I do with bigger projects is that I always use absolute module 
names. For example, when I develop a project called "project1" that has 
several sub packages, then I always do these kinds of imports:


from project1.package1.subpackage2.submodule3 import *
from project1.package1.subpackage2 import submodule3
from project1.package1.subpackage2.submodule3 import some_class

Even from a source file that is inside project1.package1.subpackage2, I 
tend to import them the same way. This makes sure that every module is 
imported under the same package path.


You just need to make sure that the main project has a unique name 
(which is usually the case) and that it is on your sys.path (which is 
usually the case, especially when the script is started in the project's 
directory).


The cost is that you have to type more. The benefit is that you can be 
sure that you are importing the thing that you want to import, and there 
will be no multiple imports for the same module.


Maybe somebody can suggest a method that works even better.

For small projects without sub-packages, it is not a problem.

Best,

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Re: I thought I understood how import worked...

2012-08-07 Thread Laszlo Nagy

On 2012-08-08 06:14, Ben Finney wrote:

Cameron Simpson  writes:


All of you are saying "two names for the same module", and variations
thereof. And that is why the doco confuses.

I would expect less confusion if the above example were described as
_two_ modules, with the same source code.

That's not true though, is it? It's the same module object with two
different references, I thought.

They are not the same. Proof:

$ mkdir test
$ cd test
$ touch __init__.py
$ touch m.py
$ cd ..
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.append('test')
>>> import m
>>> from test import m
>>> import m
>>> from test import m as m2
>>> m is m2
False
>>> m.a = 3
>>> m2.a
Traceback (most recent call last):
  File "", line 1, in 
AttributeError: 'module' object has no attribute 'a'

So it is still true that top level code gets executed only once, when 
the module is first imported. The trick is that a module is not a file. 
It is a module object that is created from a file, with a name. If you 
change the name, then you create ("import") a new module.


You can also use the reload() function to execute module level code 
again. It won't create a new module object; it will just update the 
contents of the very same module object:

Python 2.7.3 (default, Apr 20 2012, 22:39:59)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import test.m
>>> a = test.m
>>> import os
>>> test.m is a
True
>>> os.system("echo \"import sys\" >> test/m.py")
0
>>> reload(test.m) # Updates the module object

>>> test.m is a # They are still the same
True
>>> a.sys # So a.sys now exists

>>>


--
http://mail.python.org/mailman/listinfo/python-list


Re: type(None)()

2012-08-16 Thread Laszlo Nagy

On 2012-08-16 14:47, Hans Mulder wrote:

On 8/08/12 04:14:01, Steven D'Aprano wrote:

NoneType raises an error if you try to create a second instance. bool
just returns one of the two singletons (doubletons?) again.

py> type(None)()
Traceback (most recent call last):
   File "", line 1, in 
TypeError: cannot create 'NoneType' instances

Why is that?
Because None is a singleton. It is the only instance of its class. This 
is very useful because it allows you to write conditions like this:


if obj is None:
    do_something()


--
http://mail.python.org/mailman/listinfo/python-list


Re: Why doesn't Python remember the initial directory?

2012-08-19 Thread Laszlo Nagy

On 2012-08-19 22:42, kj wrote:


As far as I've been able to determine, Python does not remember
(immutably, that is) the working directory at the program's start-up,
or, if it does, it does not officially expose this information.

Does anyone know why this is?  Is there a PEP stating the rationale
for it?

Thanks!
When you start the program, you have a current directory. When you 
change it, then it is changed. In what way would you want Python to 
remember a directory? You can, for example, save it into a variable at 
startup and use it later. Can you please show us some example code that 
demonstrates your actual problem?
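
For example, capturing it yourself is a one-liner, as long as it runs 
before anything calls os.chdir() (INITIAL_CWD is just a name I made up):

import os

# as early as possible, e.g. at the top of your main module
INITIAL_CWD = os.getcwd()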

--
http://mail.python.org/mailman/listinfo/python-list


Re: Guarding arithmetic

2012-08-23 Thread Laszlo Nagy



That can work ONLY if the division of 1/0 doesn't raise an exception.
This is why the concept of NaN exists; I'm not sure if there's a way
to tell Python to return NaN instead of bombing, but it's most likely
only possible with floating point, not integer.
For integers, Python will always raise an exception when you try to 
divide by zero. And integers have nothing to do with NaN, because NaN is 
meaningful for floating point numbers only. Python can be compiled to 
raise floating point exceptions. (On Python 2, this is a compile time 
option: FPECTL. On Python 3, this can be configured at runtime:  
http://docs.python.org/library/fpectl.html )




--
http://mail.python.org/mailman/listinfo/python-list


Re: Guarding arithmetic

2012-08-23 Thread Laszlo Nagy

On 2012-08-23 11:05, Mark Carter wrote:

Suppose I want to define a function "safe", which returns the argument passed 
if there is no error, and 42 if there is one. So the setup is something like:

def safe(x):
# WHAT WOULD DEFINE HERE?

print safe(666) # prints 666
print safe(1/0) # prints 42

I don't see how such a function could be defined. Is it possible?
You are very vague. "There is an error" - but what kind of error? To 
catch all possible exceptions you could do:


def unsafe(x):
    # put your code here...
    pass

def safe(x):
    try:
        return unsafe(x)
    except:
        return 42

Generally, it is a bad idea. Exception handlers were invented because 
they give you a way to handle any error in the call chain. When an 
exception occurs, the interpreter will start searching for an 
appropriate exception handler traversing up in the call chain. By 
converting exceptions into return values, you are bypassing this search. 
Then you will have to write conditions instead of exception handlers 
inside ALL methods in the call chain, creating a "manual" search for the 
handler of the exception. In most cases, this will make your code 
difficult, error prone and hard to read.


In some special cases, this can be a good idea to do.

Can you please let us know when and how would you like to use it?


--
http://mail.python.org/mailman/listinfo/python-list


Re: Guarding arithmetic

2012-08-23 Thread Laszlo Nagy



def safe(deferred, default=42, exception=Exception):
    try:
        return deferred()
    except exception:
        return default


What a beautiful solution! I was wondering if the following would be 
possible:



def test(thing, default, *exc_classes):
    try:
        thing()
    except *exc_classes:
        return default


But it is syntactically invalid.

Here is a workaround that is not so beautiful:


def test(thing, default, *exc_classes):
    try:
        thing()
    except Exception, e:
        for cls in exc_classes:
            if isinstance(e, cls):
                return default
        raise

print test((lambda: 1/0), -1, ValueError, ZeroDivisionError)  # prints -1


--
http://mail.python.org/mailman/listinfo/python-list


Re: Does a wxPython program not run on 64bit Windows?

2012-08-24 Thread Laszlo Nagy

On 2012-08-24 07:37, Levi Nie wrote:

Does  a wxPython  program not run on 64bit Windows?
Did you at least try to download wxPython? Because the download page 
shows the 64bit and the 32bit versions as well. :-)


http://wxpython.org/download.php

By the way, the 32bit version will gladly run on a 64bit Windows. 
However, you will have to install the 32bit version of Python for that. 
(And yes, 32bit Python also runs on 64bit Windows.)


wxPython also runs on OS X, and probably on any platform that runs gtk. 
(Practically, all popular unices.)

--
http://mail.python.org/mailman/listinfo/python-list


Re: something about split()???

2012-08-24 Thread Laszlo Nagy

On 2012-08-15 07:33, Ramchandra Apte wrote:

filter is bad when you use lambda with it
there are (good) cases for filter

On 14 August 2012 22:39, Jean-Michel Pichavant wrote:


Ramchandra Apte wrote:

(Much) more Pythonic solution:
>>> filter(None,"|".split("|"))

On 14 August 2012 15:14, Andreas Tawn
mailto:andreas.t...@ubisoft.com>
>> wrote:

> I have a question about the split function. Suppose a = "|", and
> when I use a.split("|") I get the list ['', ''], but I want to
> get the empty list. What should I do?



Too many top posters again :-(

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Extract Text Table From File

2012-08-27 Thread Laszlo Nagy

On 2012-08-27 11:53, Huso wrote:

Hi,

I am trying to extract some text table data from a log file. I am trying 
different methods, but I don't seem to get anything to work. I am kind of new 
to python as well. Hence, appreciate if someone could help me out.


#
# Write test data to test.txt
#

data = """
ROUTES TRAFFIC RESULTS, LSR
TRG  MP   DATE   TIME
 37  17 120824   

R TRAFF   NBIDS   CCONG   NDV  ANBLO   MHTIME  NBANSW
AABBCCO 6.4 204 0.0   115 1.0   113.4 144
AABBCCI 3.0 293       115 1.0    37.0 171
DDEEFFO 0.2   5 0.0    59 0.0   107.6   3
EEFFEEI 0.0   0        59 0.0     0.0   0
HHGGFFO 0.0   0 0.0    30 0.0     0.0   0
HHGGFFI 0.3  15        30 0.0    62.2   4
END
"""
fout = open("test.txt","wb+")
fout.write(data)
fout.close()

#
# This is how you iterate over a file and process its lines
#
fin = open("test.txt","r")
for line in fin:
# This is one possible way to extract values.
values = line.strip().split()
print values


This will print:

[]
['ROUTES', 'TRAFFIC', 'RESULTS,', 'LSR']
['TRG', 'MP', 'DATE', 'TIME']
['37', '17', '120824', '']
[]
['R', 'TRAFF', 'NBIDS', 'CCONG', 'NDV', 'ANBLO', 'MHTIME', 'NBANSW']
['AABBCCO', '6.4', '204', '0.0', '115', '1.0', '113.4', '144']
['AABBCCI', '3.0', '293', '115', '1.0', '37.0', '171']
['DDEEFFO', '0.2', '5', '0.0', '59', '0.0', '107.6', '3']
['EEFFEEI', '0.0', '0', '59', '0.0', '0.0', '0']
['HHGGFFO', '0.0', '0', '0.0', '30', '0.0', '0.0', '0']
['HHGGFFI', '0.3', '15', '30', '0.0', '62.2', '4']
['END']


The "values" list in the last line contains these values. This will work 
only if you don't have spaces in your values. Otherwise you can use 
regular expressions to parse a line. See here:


http://docs.python.org/library/re.html

Since you did not give an exact specification of your file format, it is 
hard to give a concrete program that parses your file(s).


Best,

Laszlo



--
http://mail.python.org/mailman/listinfo/python-list


Re: Extract Text Table From File

2012-08-27 Thread Laszlo Nagy



Hi,

Thank you for the information.
The exact way I want to extract the data is like as below.

TRG, MP and DATE and TIME is common for that certain block of traffic.
So I am using those and dumping it with the rest of the data into sql.
Table will have all headers (TRG, MP, DATE, TIME, R, TRAFF, NBIDS, CCONG, NDV, 
ANBLO, MHTIME, NBANSW).

So from this text, the first data will be 37, 17, 120824, , AABBCCO, 6.4, 
204, 0.0, 115, 1.0, 113.4, 144.
How many blocks do you have in a file? Do you want to create different 
data sets for those blocks? How do you identify those blocks? (E.g. are 
they all saved into the same database table the same way?)


Anyway here is something:

import re

# Example data line: AABBCCO 6.4 204 0.0   115 1.0   113.4 144
pattern = re.compile(r"""([A-Z]{7})""" + 7 * r"""\s+([\d\.]+)""")

#
# This is how you iterate over a file and process its lines
#
fin = open("test.txt", "r")
blocks = []
block = None
for line in fin:
    # This is one possible way to extract values.
    values = line.strip().split()
    if values == ['R', 'TRAFF', 'NBIDS', 'CCONG', 'NDV', 'ANBLO',
                  'MHTIME', 'NBANSW']:
        # a new column header starts a new block
        if block is not None:
            blocks.append(block)
        block = []
    elif block is not None:
        res = pattern.match(line.strip())
        if res:
            values = list(res.groups())
            values[1:] = map(float, values[1:])
            block.append(values)
if block is not None:
    blocks.append(block)

for idx, block in enumerate(blocks):
    print "BLOCK", idx
    for values in block:
        print values

This prints:

BLOCK 0
['AABBCCO', 6.4, 204.0, 0.0, 115.0, 1.0, 113.4, 144.0]
['DDEEFFO', 0.2, 5.0, 0.0, 59.0, 0.0, 107.6, 3.0]
['HHGGFFO', 0.0, 0.0, 0.0, 30.0, 0.0, 0.0, 0.0]

--
http://mail.python.org/mailman/listinfo/python-list


Re: Extract Text Table From File

2012-08-27 Thread Laszlo Nagy

On 2012-08-27 13:23, Huso wrote:

Hi,

There can be any number of blocks in the log file.
I distinguish the block by the start header 'ROUTES TRAFFIC RESULTS, LSR' and 
ending in 'END'. Each block will have a unique [date + time] value.

I tried the code you mentioned, it works for the data part.
But I need to get the TRG, MP, DATE and TIME for the block with those data as 
well. This is the part that i'm really tangled in.

Thanking,
Huso
Well, I suggest that you try to understand my code and make changes to 
it. It is not too hard. Start by reading the documentation of the 
"re" module. Python is worth learning, especially for mining data out 
of text files. :-)


Best,

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Looking for an IPC solution

2012-08-31 Thread Laszlo Nagy
There are just so many IPC modules out there. I'm looking for a solution 
for developing a new multi-tier application. The core application will 
be running on a single computer, so the IPC should use shared 
memory (or mmap) and have very short response times. But there will be a 
tier that holds application state for clients, and there will be 
lots of clients. So that tier needs to go to different computers, i.e. 
the same IPC should also be accessible over TCP/IP. Most messages will be 
simple data structures, nothing complicated. The ability to run on PyPy 
would be a plus, and so would running on both Windows and Linux.


I have seen a stand-alone cross-platform IPC server before that could 
serve "channels" and send/receive messages over these channels. But I 
don't remember its name and now I cannot find it. Can somebody please help?


Thanks,

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Re: Looking for an IPC solution

2012-08-31 Thread Laszlo Nagy

Zeromq (suggested by someone) is an option since it's pretty fast for
most purposes, but I don't think it uses shared memory.

Interesting question. The documentation says:

http://api.zeromq.org/2-1:zmq-ipc

The inter-process transport is currently only implemented on operating 
systems that provide UNIX domain sockets.


(OFF: Would it be possible to add local IPC support for Windows using 
mmap()? I have seen others doing it.)


At least, it is functional on Windows, and it excels on Linux. I just 
need to make the transport configurable. Good enough for me.
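
For the record, the transport is just part of the address string, so 
switching between local IPC and TCP is a one-line configuration change. 
A minimal pyzmq sketch (the endpoint strings are only examples):

import zmq

ENDPOINT = "ipc:///tmp/myapp.sock"   # or "tcp://127.0.0.1:5555" for remote clients

ctx = zmq.Context()
server = ctx.socket(zmq.REP)
server.bind(ENDPOINT)

client = ctx.socket(zmq.REQ)
client.connect(ENDPOINT)

client.send("ping")
print server.recv()   # 'ping'
server.send("pong")
print client.recv()   # 'pong'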

The closest
thing I can think of to what you're asking is MPI, intended for
scientific computation.  I don't know of general purpose IPC that uses
it though I've thought it would be interesting.  There are also some
shared memory modules around, including POSH for shared objects, but
they don't switch between memory and sockets AFAIK.

Based on your description, maybe what you really want is Erlang, or
something like it for Python.  There would be more stuff to do than just
supply an IPC library.
Yes, although I would really like to do this job in Python. I'm going to 
make some tests with zeromq. If the speed is good for local 
inter-process communication, then I'll give it a try.


Thanks,

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Async client for PostgreSQL?

2012-08-31 Thread Laszlo Nagy
Is there any extension for Python that can do async I/O for PostgreSQL 
with tornadoweb's ioloop?


Something like:

class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        pg_connection.(long_taking_query_sql, params,
                       callback=self.on_query_opened)

    def on_query_opened(self, query):
        self.write(process_rows(query))
        self.finish()



What would be an alternative?

The theoretical problem: suppose there are 100 clients (web browsers) 
connected to the server with keep alive connections. They are doing 
long-polls and they are also sending/receiving events (with short 
response times). Each web browser has an associated state stored on the 
server side, in the memory (as an object tree). The state is bound to 
the client with a session id. Most requests will have to be responded 
with small amounts of data, calculated from the session state, or 
queried from the database. Most database queries are simple, running for 
about 100msec. But a few of them will run for 1sec or more. Number of 
requests ending in database queries is relatively low (10/sec). Other 
requests can be responded must faster.  but they are much more frequent 
(100/sec, that is. 1 request/sec/client).  There is a big global cache 
full of (Python) objects. Their purpose is to reduce the number of 
database queries. These objects in the global cache are emitting events 
to other objects found in the client sessions. Generally, it is not 
possible to tell what request will end in a database query.


Multi-threading is not an option because the number of clients is too high 
(running 100 threads is not good). This is why I decided to use async 
I/O. Tornadoweb looks good for most requirements: async I/O, store 
session state in objects etc. The biggest problem is that psycopg is not 
compatible with this model. If I use blocking I/O calls inside a request 
handler, then they will block all other requests most of the time, 
resulting in slow response times.


What would be a good solution for this?

Thanks,

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Re: to know binary

2012-09-01 Thread Laszlo Nagy

On 2012-09-01 06:15, contro opinion wrote:

there is a only line in the file nanmed test:
1234
when i open it whit  xxd
xxd  test
what i  get is :
000: 3132 3334 0a 1234.
can you explain it ?
At offset zero (000): chr(0x31) + chr(0x32) + chr(0x33) + chr(0x34) + 
chr(0x0a) = '1' + '2' + '3' + '4' + '\n'. xxd displays this as '1234.' 
because the dot stands for the non-printable newline character.


Does this have anything to do with Python? If I were you, and it was 
something that needed explaining, then I would rather start with books 
about programming before asking questions on a mailing list that is not 
related to the question.

--
http://mail.python.org/mailman/listinfo/python-list


Re: Async client for PostgreSQL?

2012-09-01 Thread Laszlo Nagy



Hi

does running on tornado imply that you would not consider twisted 
http://twistedmatrix.com ?


If not, twisted has exactly this capability hiding long running 
queries on whatever db's behind deferToThread().

All right, I was reading its documentation

http://twistedmatrix.com/documents/10.1.0/api/twisted.internet.threads.deferToThread.html

It doesn't tell too much about it: "Run a function in a thread and 
return the result as a Deferred.".


Run a function but in what thread? Does it create a new thread for every 
invocation? In that case, I don't want to use this. My example case: 10% 
of 100 requests/second deal with a database. But it does not mean that 
one db-related request will do a single db API call only. They will 
almost always do more: start transaction, parse and open query, fetch 
with cursor, close query, open another query etc. then commit 
transaction. 8 API calls to do a quick fetch + update (usually under 
100msec, but it might be blocked by another transaction for a while...) 
So we are talking about 80 database API calls per seconds at least. It 
would be insane to initialize a new thread for each invocation. And 
wrapping these API calls into a single closure function is not useful 
either, because that function would not be able to safely access the 
state that is stored in the main thread, unless you protect it with 
locks. But that is the whole point of an async I/O server: to avoid 
slow locks, expensive threads and context switching.


Maybe deferToThread uses a thread pool? But it doesn't say much about 
it. (Am I reading the wrong documentation?) BTW I could try a version 
that uses a thread pool.
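
If it turns out that deferToThread dispatches to the reactor's shared
thread pool (which is what the Twisted source seems to suggest), then the
"several API calls in one transaction" problem could probably be handled
with twisted.enterprise.adbapi: runInteraction() runs a whole function in
one pooled thread, inside one transaction. A rough, untested sketch (the
table and column names are made up):

from twisted.enterprise import adbapi

# A fixed number of worker threads and database connections.
dbpool = adbapi.ConnectionPool("psycopg2", database="mydb",
                               cp_min=3, cp_max=10)

def quick_fetch_and_update(txn, user_id):
    # Runs in one pooled thread, inside one transaction.
    txn.execute("SELECT balance FROM accounts WHERE id=%s", (user_id,))
    balance = txn.fetchone()[0]
    txn.execute("UPDATE accounts SET balance=%s WHERE id=%s",
                (balance + 1, user_id))
    return balance

def on_result(balance):
    print "previous balance was", balance   # back in the reactor thread

d = dbpool.runInteraction(quick_fetch_and_update, 42)
d.addCallback(on_result)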


It is sad, by the way. We have async I/O servers for Python that can be 
used for large number of clients, but most external modules/extensions 
do not support their I/O loops. Including the extension modules of the 
most popular databases. So yes, you can use Twisted or tornadoweb as long 
as you do not need to call *some* API functions that are blocking. (By 
*some* I mean: far fewer blocking than non-blocking ones, but quite a few.) 
We also have synchronous Python servers, but we cannot get rid of the 
GIL, Python threads are expensive and slow, so they cannot be used for a 
large number of clients. And finally, we have messaging services/IPC 
like zeromq. They are probably the most expensive, but they scale very 
well. But you need more money to operate the underlying hardware. I'm 
starting to think that I did not get a quick answer because my use case 
(100 clients) falls into the "heavy weight" category, and the solution 
is to invest more in the hardware. :-)


Thanks,

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Re: Async client for PostgreSQL?

2012-09-02 Thread Laszlo Nagy


Hmm, I was suggesting that you could replace the whole DB driver with 
a webservice implemented with twisted, if you rule out threads then 
with ampoule doing it with a process pool and consume this webservice 
with the tornado side asynchronously.
I see. I'm sorry, I misunderstood. So this would be a wrapper around 
PostgreSQL. Very interesting idea. (In this case, I'm not sure how to 
safely manage transactions.)


production level example thread pool based DB API:
Just to give you some ballpark figures, I'm running a game server with 
a daily peak of about 1500 parallel permanent connections and 50k 
games played every day (avg game duration 13min, peak request 
frequency close to 100req/sec) with a lot of statistics going into a 
MySQL DB on US$2k worth of hardware. Twisted as basis sitting atop 
FreeBSD, started the latest version in March, its running since then, 
no restarts, no reboots, no problems.

Thank you.
--
http://mail.python.org/mailman/listinfo/python-list


tornado.web ioloop add_timeout eats CPU

2012-09-02 Thread Laszlo Nagy
JavaScript clients (browsers) do long poll requests. Each request can 
take up to 10 seconds before the server responds. On the server side, 
every client has a queue of messages that needs to be sent to the 
client. When the long poll request comes in, the server checks if there 
are messages to be sent out. If there are no outgoing messages, then it 
does not finish the response, but calls ioloop's add_timeout method for 
doing further checks. After 10 seconds (if there are no new messages) 
the server returns 304/not modified. If there is a message, then it is 
sent back to the client as fast as possible, and the client comes back 
with another long poll immediately.


These message queues are used for UI updates and also for instant 
messaging. UI must be responsive. For this reason, any message in the 
outgoing queue should be sent out to the client within 0.1 seconds. 
Sometimes (rarely) lots of messages arrive quickly, and in those cases 
it would be good to send them out even faster. What I did is that in the 
first 0.1 seconds, I call add_timeout with 0.01 seconds. So if the 
outgoing queue is full of messages, then they are delivered quickly.  
After 0.1 seconds lapsed, add_timeout is called with 0.1 sec parameter. 
So the server load is reduced because most clients are inactive, and 
they are going to get callbacks in every 0.1 sec.


Here are the two most important methods of my request handler:

@tornado.web.asynchronous
def post(self):
    """Handle POST requests."""
    # Disable caching
    self.set_header("Cache-Control","no-cache, must-revalidate")
    self.set_header("Expires","Mon, 26 Jul 1997 05:00:00 GMT")
    self.poll_start = time.time()
    action = self.get_argument("action")
    if action=="poll":
        self.poll()
    elif action=="message":
        self.process_incoming(self.get_argument("message"))
    else:
        self.set_status(400)
        self.finish()

def poll(self):
    """Handle POLL request for the browser's message loop.

    This method monitors the outgoing message queue, and sends
    new messages to the browser when they come in (or until
    self.poll_interval seconds elapsed)."""
    poll_elapsed = time.time() - self.poll_start
    if poll_elapsed<0.1:
        poll_step = 0.01
    else:
        poll_step = 0.1
    if poll_elapsed < self.poll_interval:
        ...

And here is my problem. If I point 5 browsers to the server, then I get 
2% CPU load (Intel i5 2.8GHz on amd64 Linux). But why? Most of the time, 
the server should be sleeping. cProfile tells this:


   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  845.146  845.146 <string>:1(<module>)
  1135775  832.283    0.001  832.283    0.001 {method 'poll' of 
'select.epoll' objects}


I have copied out the two relevant rows only. As you can see, total 
runtime was 845 seconds, and 832 seconds were spent in "epoll". 
Apparently, CPU load goes up linearly as I connect more clients. It 
means that 50 connected clients would do 20% CPU load. Which is 
ridiculous, because they don't do anything but wait for messages to be 
processed. Something terribly wrong, but I cannot figure out what?


Actually I could not try this with 50 clients. If I open 15 clients, 
then the server starts dropping connections. (Tried from Firefox and 
Chrome too.) If I change the poll() method this way:


        else:
            print "No messages after %.2f seconds"%poll_elapsed
            self.set_status(304)
            self.finish()

then I see this in the log:

No messages after 10.01 seconds
ERROR:root:Uncaught exception POST /client (127.0.0.1)
HTTPRequest(protocol='http', host='127.0.0.1:', method='POST', 
uri='/client', version='HTTP/1.1', remote_ip='127.0.0.1', 
body='_xsrf=df157469a62142d7b28c5a4880dd8478&action=poll', 
headers={'Referer': 'http://127.0.0.1:/', 'Content-Length': '50', 
'Accept-Language': 'en-us;q=0.8,en;q=0.5', 'Accept-Encoding': 'gzip, 
deflate', 'Host': '127.0.0.1:', 'Accept': '*/*', 'User-Agent': 
'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 
Firefox/15.0', 'Connection': 'keep-alive', 'X-Requested-With': 
'XMLHttpRequest', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 
'Cookie': 
'sid="MS1acHd5b3V1WHFOQU1BbTVmSXJEeVhkLys=|1346652787|e045d786fdb89b73220a2c77ef89572d0c16901e"; 
_xsrf=df157469a62142d7b28c5a4880dd8478; 
xsfr=df157469a62142d7b28c5a4880dd8478', 'Content-Type': 
'application/x-www-form-urlencoded; charset=UTF-8'})

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/tornado/stack_context.py", 
line 183, in wrapped

callback(*args, **kwargs)
  File "/home/gandalf/Python/Projects/test/client.py", line 67, in poll
self.finish()
  File "/usr/lib/python2.7/dist-packages/tornado/web.py", line 641, in 
finish

self.request.finish()
  File "/usr/lib/python2.7/dist-packages/tornado/httpserver.py", line 
411, in finish

self.co

Re: tornado.web ioloop add_timeout eats CPU

2012-09-04 Thread Laszlo Nagy



What's wrong is the 1,135,775 calls to "method 'poll' of
'select.epoll' objects".

I was affraid you are going to say that. :-)

With five browsers waiting for messages over 845 seconds, that works
out to each  waiting browser inducing 269 epolls per second.

Almost equally important is what the problem is *not*. The problem is
*not* spending the vast majority of time in epoll; that's *good* news.
The problem is *not* that CPU load goes up linearly as we connect more
clients. This is an efficiency problem, not a scaling problem.

So what's the fix? I'm not a Tornado user; I don't have a patch.
Obviously Laszlo's polling strategy is not performing, and the
solution is to adopt the event-driven approach that epoll and Tornado
do well.
Actually, I have found a way to overcome this problem, and it seems to 
be working. Instead of calling add_timeout from every request, I save 
the request objects in a list, and operate a "message distributor" 
service in the background that routes messages to clients, and finish 
their long poll requests when needed. The main point is that the 
"message distributor" has a single entry point, and it is called back at 
given intervals. So the number of callbacks per second does not increase 
with the number of clients. Now the CPU load is about 1% with one 
client, and it is the same with 15 clients. While the response time is 
the same (50-100msec). It is efficient enough for me.
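
Roughly, the idea looks like this (a simplified sketch; the session/queue
API is made up):

import time
import tornado.ioloop

class MessageDistributor(object):
    """One periodic callback that serves all waiting long poll requests."""

    def __init__(self, interval_msec=50):
        self.waiting = []   # list of (handler, poll_start) pairs
        self.callback = tornado.ioloop.PeriodicCallback(
            self.distribute, interval_msec)
        self.callback.start()

    def add(self, handler):
        self.waiting.append((handler, time.time()))

    def distribute(self):
        still_waiting = []
        for handler, started in self.waiting:
            messages = handler.session.pop_outgoing()   # made-up API
            if messages:
                handler.write({"messages": messages})
                handler.finish()
            elif time.time() - started > 10.0:
                handler.set_status(304)
                handler.finish()
            else:
                still_waiting.append((handler, started))
        self.waiting = still_waiting

distributor = MessageDistributor()
# ...and in the request handler, instead of calling add_timeout:
#     distributor.add(self)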


I understand that most people take a different approach: they do a fast 
poll request from the browser every 2 seconds or so. But this is not 
good for me, because then it can take 2 seconds to send a message from 
one browser to another, which is not acceptable in my case. Implementing 
long polls with a threaded server would be trivial, but a threaded 
server cannot handle 100+ simultaneous (long running) requests, because 
that would require 100+ threads to be running.


This central "message distributor" concept seems to be working. I have to 
pay about 1-2% CPU overhead for being able to send messages from one 
browser to another within 100msec, which is fine.


I could not have done this without your help.

Thank you!

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Re: Error 32 - Broken Pipe . Please Help!!

2012-09-04 Thread Laszlo Nagy

On 2012-09-04 19:08, Sreenath k wrote:

Error:


Exception in thread Thread-1:
Traceback (most recent call last):
   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
 self.run()
   File 
"/usr/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/monitor.py", 
line 575, in run
 already_pickled=True)
   File "/usr/lib/python2.7/dist-packages/spyderlib/utils/bsdsocket.py", line 
24, in write_packet
 sock.send(struct.pack("l", len(sent_data)) + sent_data)
error: [Errno 32] Broken pipe

Code :

#code

s=1
f=0
c=0

for i in range (1,100):
    c=0
    for j in (1,i):
        s+=j
        c=0
        for k in range(1,(s/2+1)):
            #print s
            t=s%k
            if t==0:
                c+=1
            if c>=5:
                f=1
                print s
                break

print s

#code ends.


The program is runnning. It has been more than 10 minutes ( currently). Should 
I raise my hopes for an answer ?
It must not be your full program. The traceback shows "Thread-1" which 
indicates that you are using the threading module. The other possibility 
is that you are feeding the output of your program into another program 
with a pipeline. In that case, the exception might have occurred in the 
other program, not yours.

--
http://mail.python.org/mailman/listinfo/python-list


Re: Looking for an IPC solution

2012-09-06 Thread Laszlo Nagy





How about the standard multiprocessing module? It supports shared
memory, remote processes, and will most probably work under PyPy:
http://docs.python.org/library/multiprocessing.html


I always thought, that the multiprocessing module does NOT use shared
memory (at least not under windows)
It uses mmap() under Windows. (I'm not an expert, but this is what I was 
told by others.) I did not know that multiprocessing can be used over 
TCP/IP. :) Probably I'll use zmq instead, because it has other nice 
features (auto reconnect, publisher/subscriber, multicast etc.)
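
(For reference, the remote part apparently goes through
multiprocessing.managers. A minimal, untested sketch with a made-up
address and authkey:)

from multiprocessing.managers import BaseManager
import Queue

# --- server process ---
queue = Queue.Queue()

class QueueManager(BaseManager):
    pass

QueueManager.register("get_queue", callable=lambda: queue)
manager = QueueManager(address=("", 50000), authkey="secret")
manager.get_server().serve_forever()

# --- client process (may run on another machine, talks TCP/IP) ---
class RemoteQueueManager(BaseManager):
    pass

RemoteQueueManager.register("get_queue")
remote = RemoteQueueManager(address=("server.example.com", 50000),
                            authkey="secret")
remote.connect()
q = remote.get_queue()
q.put("hello")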


I would be very interested in a cross platform shared mem solution for
python.
Could you please point me to the right section.


As far as I know, POSIX compatible shared memory does not exist on 
Windows. I remember a thread about this on the PostgreSQL mailing list - 
the Windows version of PostgreSQL somehow emulates shared memory too. I 
wanted to use shared memory because response times are much faster than 
TCP/IP.


--
http://mail.python.org/mailman/listinfo/python-list


Re: Looking for an IPC solution

2012-09-06 Thread Laszlo Nagy



Hi Laszlo,

There aren't a lot of ways to create a Python object in an "mmap" buffer.  "mmap" is conducive to arrays of arrays.  
For variable-length structures like strings and lists, you need "dynamic allocation".  The C functions "malloc" and 
"free" allocate memory space, and file creation and deletion routines operate on disk space.  However "malloc" doesn't 
allow you to allocate memory space within memory that's already allocated.  Operating systems don't provide that capability, and doing it 
yourself amounts to creating your own file system.  If you did, you still might not be able to use existing libraries like the STL or 
Python, because one address might refer to different locations in different processes.

One solution is to keep a linked list of free blocks within your "mmap" buffer. 
 It is prone to slow access times and segment fragmentation.  Another solution is to 
create many small files with fixed-length names.  The minimum file size on your system 
might become prohibitive depending on your constraints, since a 4-byte integer could 
occupy 4096 bytes on disk or more.  Or you can serialize the arguments and return values 
of your functions, and make requests to a central process.
I'm not sure about the technical details, but I was told that the 
multiprocessing module uses mmap() under Windows. And it is faster than 
TCP/IP. So I guess the same thing could be used from zmq, under Windows. 
(It is not a big concern, I plan to operate server on Unix. Some clients 
might be running on Windows, but they will use TCP/IP.)

--
http://mail.python.org/mailman/listinfo/python-list


Re: Looking for an IPC solution

2012-09-06 Thread Laszlo Nagy



Probably the fastest IPC/RPC implementation for Python should be
OmniOrbpy:

http://omniorb.sourceforge.net/

It's cross-platform, language-independent and standard-(Corba-)
compliant.
I don't want to use IDL though. Clients will be written in Python, and 
it would be a waste of time to write IDL files.



I have seen a stand alone cross platform IPC server before that could
serve "channels", and send/receive messages using these channels. But
I don't remember its name and now I cannot find it. Can somebody
please help?

If it's just for "messaging", Spread should be interesting:

http://www.spread.org/

Also cross-platform & language-independent.

Looks promising. This is what I have found about it:

http://stackoverflow.com/questions/35490/spread-vs-mpi-vs-zeromq


So, it really depends on whether you are trying to build a parallel 
system or distributed system. They are related to each other, but the 
implied connotations/goals are different. Parallel programming deals 
with increasing computational power by using multiple computers 
simultaneously. Distributed programming deals with reliable 
(consistent, fault-tolerant and highly available) group of computers.


I don't know the full theory behind distributed programming or parallel 
programming. ZMQ seems easier to use.




--
http://mail.python.org/mailman/listinfo/python-list


Re: submit jobs on multi-core

2012-09-10 Thread Laszlo Nagy

On 2012-09-11 06:16, Dhananjay wrote:

Dear all,

I have a python script in which I have a list of files to input one by 
one and for each file I get a number as an output.

I used for loop to submit the file to script.
My script uses one file at a time and returns the output.

My computer has 8 cores.
Is there any way that I could submit 8 jobs at a time and get all the 
output faster ?
In other words, how can I modify my script so that I could submit 8 
jobs together on 8 different processors ?


I am bit new to this stuff, please suggest me some directions.
You should first look at the multiprocessing module. It is part of the 
standard library.


http://docs.python.org/library/multiprocessing.html
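
For example, something along these lines could work (a sketch;
process_file stands for whatever your script does with one input file):

from multiprocessing import Pool

def process_file(filename):
    # Whatever your script currently does with a single file,
    # returning the number you need.
    return len(open(filename).read())          # placeholder work

if __name__ == "__main__":
    filenames = ["input1.txt", "input2.txt"]   # your list of files
    pool = Pool(processes=8)                   # one worker per core
    results = pool.map(process_file, filenames)
    print results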


--
http://mail.python.org/mailman/listinfo/python-list


Re: What's wrong with my arc tangens calculation?

2012-09-14 Thread Laszlo Nagy



but when i lookup tg in a paper table (last decade math book) i've got these 
values:

tg(63'10'') = 1.9768
tg(63'20'') = 1.9912
tg(63'30'') = 2.0057

For me python should return something more like 63'2x'' than 63'4x'' (because 
63'30'' is higher than 2.0)

what's wrong?


63° 30' is 63.5°. So nothing is wrong. (You know, 1° = 60 arc minutes!)
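
You can check it quickly (a small sketch):

import math

angle = math.degrees(math.atan(2.0))
print angle                                   # about 63.4349 degrees
print int(angle), (angle - int(angle)) * 60   # about 63 degrees, 26 arc minutes

So atan(2.0) is about 63° 26', which indeed falls between 63° 20' (1.9912)
and 63° 30' (2.0057) in your table.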
--
http://mail.python.org/mailman/listinfo/python-list


Re: Moving folders with content

2012-09-17 Thread Laszlo Nagy

On 2012-09-15 06:36, jyoun...@kc.rr.com wrote:

Hello,

I am working in both OS X Snow Leopard and Lion (10.6.8 and 10.7.4).
I'm simply wanting to move folders (with their content) from various
servers to the hard drive and then back to different directories on the
servers.

I want to be careful not to remove any metadata or resource forks from
the files in the directories.  I did a bit of researching on shutil, and
looks like it is similar to using "cp -p" and copystat(), which I believe
will keep the resource fork, etc.

Here's the code I came up with.  I'm curious if anyone finds fault with
this, or if there's a better way to do this?
Not in this particular case, because you know that these directories are 
on different computers. But instead of rmtree+copytree, I would rather 
use shutil.move() because it will use os.rename() when the source and 
the destination are on the same filesystem. Much much faster.

--
http://mail.python.org/mailman/listinfo/python-list


reportlab and python 3

2012-09-17 Thread Laszlo Nagy

Reportlab is on the wall of shame. http://python3wos.appspot.com/

Is there other ways to create PDF files from python 3? There is pyPdf. I 
haven't tried it yet, but it seems that it is a low level library. It 
does not handle "flowables" that are automatically split across pages. 
It does not handle "table headers" that are repeated automatically on 
the top of every page (when the table does not fit on a page). I need a 
higher level API, with features comparable to reportlab. Is there such 
thing?


Thanks,

   Laszlo
--
http://mail.python.org/mailman/listinfo/python-list


Re: reportlab and python 3

2012-09-18 Thread Laszlo Nagy



A big yes and it is very easy. I assume you know how
to write a plain text file with Python :-).

Use your Python to generate a .tex file and let it compile
with one of the pdf TeX engines.

Potential problems:
- It requires a TeX installation (a no problem).
- Of course it requires some TeX knowledge. Learning it
is not so complicated. Learn how to use TeX with a text
editor and you will quickly understand what you have to
program in Python. Bonus: you learn at the same time
a good text editing engine.

I can not figure out something more simple, versatile and
powerful.

jmf

This is a good idea. Thank you. I wanted to learn TeX anyway. The TeX 
installation is problematic. I also want to use this under MS Windows. 
Yes, I know here is MikTeX for Windows. But there is significant 
difference. ReportLab can be embedded into a small program created with 
py2exe. LaTeX on the other side is a 150MB separate installation package 
that must be installed separately by hand.


But in my particular case, it is still a good solution.

Thanks,

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Re: reportlab and python 3

2012-09-18 Thread Laszlo Nagy



I understood, you have Python on a platform and starting
from this you wish to create pdf files.
Obviously, embedding "TeX" is practically a no solution,
although distibuting a portable standalone TeX distribution
is a perfectly viable solution, especially on Windows!

To "I wanted to learn TeX anyway.":
I can only warmly recommend to start with one of the two
unicode compliant engines, LuaTeX or XeTeX.
All right. Which one is better? :-) I'm a total beginner. I would 
also like to use mathematical expressions but I guess they are both 
capable of that. Another requirement would be: easy installation under 
unix and windows, good multilingual support.



--
http://mail.python.org/mailman/listinfo/python-list


Re: how to do draw pattern with python?

2012-09-21 Thread Laszlo Nagy

On 2012-09-21 15:36, echo.hp...@gmail.com wrote:

may i know how to shift the bits using only looping and branching??




xx
.x..x.
..xx..
..xx..
.x..x.
xx


What kind of bits? What are these dots and x's anyway? Are they 
strings? Or binary data?


I recommend this for reading:

http://www.catb.org/esr/faqs/smart-questions.html
--
http://mail.python.org/mailman/listinfo/python-list


Module badly compiled to pyc?

2012-09-27 Thread Laszlo Nagy
Today I had a strange experience. I copied some updated py files 
(modules) to a directory on a remote server, overwriting the old ones. 
The pyc files on the server were older. Many programs are importing 
these modules, and most of them are started as background jobs (from 
cron). They started to throw all kinds of errors. I checked the py 
files, and they did have class definitions inside. However when I tried 
to use them I got AttributeError exceptions telling that those things 
were not in the module.


I checked their contents by importing them, and indeed they were not 
defined. Finally I checked the module.__file__ attributes to see 
that they are imported from the right place. The __file__ contained the 
path to the compiled pyc file, but the path was correct. So finally I 
have deleted all pyc files, and suddenly every program was working 
again. (Starting the interpreter again and importing the modules again 
did not solve the problem.)


I suspect that there were two (or more) programs starting at the same 
time, writing the same pyc file at the same time. It happened with two 
modules today. Over the years, I have always copied files to this 
server, and let background programs compile the pyc files as needed.  I 
have never experienced anything like this before, and I cannot reproduce 
the error.


The question is this: do you think this could happen? Is it possible 
that something else caused the problem? What else could it be? What are 
the chances that it will happen again, and how can I prevent it?


Thanks,

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Re: howto handle nested for

2012-09-28 Thread Laszlo Nagy

On 2012-09-28 16:39, Neal Becker wrote:

I know this should be a fairly basic question, but I'm drawing a blank.

I have code that looks like:
  
   for s0 in xrange (n_syms):

 for s1 in xrange (n_syms):
 for s2 in xrange (n_syms):
 for s3 in xrange (n_syms):
 for s4 in range (n_syms):
 for s5 in range (n_syms):

Now I need the level of nesting to vary dynamically.  (e.g., maybe I need to add
for  s6 in range (n_syms))

Smells like a candidate for recursion.  Also sounds like a use for yield.  Any
suggestions?

In your example, it seems that the iterable of the for loop is always the 
same: range(n_syms), where n_syms is a number. Is that true? If that is 
so, then here is something useful:


import copy

class MultiLevelIterator(object):
    def __init__(self, levels, n):
        assert levels > 0
        assert n > 0
        self.levels = levels
        self.values = [0]*levels
        self.exhausted = False
        self.n = n

    def __iter__(self):
        return self

    def next(self):
        if self.exhausted:
            raise StopIteration
        res = copy.copy(self.values)
        idx = self.levels-1
        while idx >= 0:
            self.values[idx] += 1
            if self.values[idx] >= self.n:
                self.values[idx] = 0
                idx -= 1
            else:
                return res
        # The counter wrapped around: return the last combination once,
        # then stop on the next call.
        self.exhausted = True
        return res

i = MultiLevelIterator(2, 3)
for values in i:
    print values

This will print:

[0, 0]
[0, 1]
[0, 2]
[1, 0]
[1, 1]
[1, 2]
[2, 0]
[2, 1]
[2, 2]


--
http://mail.python.org/mailman/listinfo/python-list


Re: Reading properties file in Python, except using ConfigParser()

2012-10-05 Thread Laszlo Nagy

On 2012-10-05 09:20, justmailha...@gmail.com wrote:

Hi All,

How to read properties file in Python? I found ConfigParser() but it has a 
'section' limitation, so looking for other alternatives.

http://wiki.python.org/moin/ConfigParserShootout
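
If the only problem is that ConfigParser insists on [section] headers, one
common workaround is to prepend a fake section before parsing (a sketch,
assuming a plain key=value properties file; the file name is made up):

import ConfigParser
import StringIO

def read_properties(path):
    # Prepend a dummy section header so ConfigParser accepts the file.
    content = "[properties]\n" + open(path).read()
    parser = ConfigParser.RawConfigParser()
    parser.readfp(StringIO.StringIO(content))
    return dict(parser.items("properties"))

print read_properties("app.properties")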

--
http://mail.python.org/mailman/listinfo/python-list


Re: how to build Object for List data

2012-10-08 Thread Laszlo Nagy


Seq  validation
1   Program3,1,3,4  # max(F1,F3) to F4
..
n
How to use Python to read the text file and build the data as an object
class?
Open the file using the open() command. Then iterate over the lines 
within a stateful algorithm that parses the lines with regular expressions.
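
Something along these lines could be a starting point (a sketch only; the
exact line layout is an assumption based on your example, and the file
name is made up):

import re

class Rule(object):
    """One validation rule read from the text file."""
    def __init__(self, seq, validation, comment):
        self.seq = seq
        self.validation = validation
        self.comment = comment

# hypothetical layout: "<seq>   <validation>   # <comment>"
line_re = re.compile(r"^(\d+)\s+(\S+)\s*(?:#\s*(.*))?$")

rules = []
with open("rules.txt") as f:
    for line in f:
        m = line_re.match(line.strip())
        if m:
            rules.append(Rule(int(m.group(1)), m.group(2), m.group(3)))

for r in rules:
    print r.seq, r.validation, r.comment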


What did you try so far?
--
http://mail.python.org/mailman/listinfo/python-list


Re: bad httplib latency due to IPv6 use

2012-10-17 Thread Laszlo Nagy



What I'm wondering is this:
1. The server only serves on IPv4, changing this to IPv6 would probably
help. However, I wouldn't consider this a bug, or?

I'd say it's a bug in your TCP/IP stack.  An IP shouldn't take that long
to figure out that it is not configured to connect to IPv6 addresses.
It might also be, that he has a firewall installed that is blocking 
access to ::1. In that case, it takes much more time to figure out that 
you cannot connect. Because in that case, it is not a "connection 
refused" problem, but a "trying to connect to a closed/not responding 
port" problem.

--
http://mail.python.org/mailman/listinfo/python-list


Re: SSH Connection with Python

2012-10-25 Thread Laszlo Nagy

On 2012-10-25 12:16, Schneider wrote:

Hi Folkz,
how can i create a SSH-Connection with python? I have to send some 
commands to the remote host and parse their answers.

greatz Johannes

http://www.lag.net/paramiko/

Another solution would be to use subprocess and/or pexpect
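
With paramiko, the basic pattern looks roughly like this (a sketch; host
name, user and password are placeholders):

import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("remote.example.com", username="johannes", password="secret")
stdin, stdout, stderr = client.exec_command("uname -a")
print stdout.read()     # parse the command's output here
client.close()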


--
http://mail.python.org/mailman/listinfo/python-list


Re: using smtp sent large file upto 60MB

2012-12-04 Thread Laszlo Nagy



Thanks for the suggestion. The next task will be FTP to a user folder. But
the first task is how to send huge files using Python.
Most SMTP servers are configured not to accept attachments bigger than 
10 or 15MB. In general, you should never send emails with >5MB 
attachments. Not because it is not possible, but because it is 
unreliable, and the solution is never in your hand. The solution depends 
on the SMTP server configuration, and in most cases you don't have 
access to the computers holding the final destination of the emails.


If you still don't want to accept this suggestion, then go ahead! Write 
a program, send out 100MB emails, and you will see for yourself that it 
just doesn't work.



--
http://mail.python.org/mailman/listinfo/python-list


urllib.urlretrieve never returns???

2012-03-17 Thread Laszlo Nagy
See attached example code. I have a program that calls exactly the same 
code, and here is the strange thing about it:


 * Program is started as "start.py", e.g. with an open console. In this
   case, the program works!
 * Program is started as "start.pyw", e.g. with no open console under
   Windows 7 64bit - the program does not work!

In the latter case, "log.txt" only contains "#1" and nothing else. If I 
look at pythonw.exe in the task manager, it shows +1 thread every 
time I click the button, and "#1" is appended to the file.


Seems like urllib.urlretrieve() does not return at all!

Using wx.PySimpleApp(redirect=True) does not help either - nothing is 
printed on the redirected console.


Unfortunately, I cannot send the whole program, because it is too big 
and also because it is confidential.


Question is: how is it possible that urllib.urlretrieve() does not 
return? It is part of a system library. What could I have possibly done 
to screw it up? And moreover, how is it possible that it does not return 
ONLY when the program is started with pythonw.exe?


Thanks,

   Laszlo

import wx
import wx.html
import urllib
import thread


class Main(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self,None,-1)

        btn = wx.Button(self,-1,"Test")
        btn.Bind(wx.EVT_BUTTON,self.OnTest,btn)

        self.imgProduct = wx.html.HtmlWindow(self,-1)
        if "gtk2" in wx.PlatformInfo:
            self.imgProduct.SetStandardFonts()

        sizer = wx.BoxSizer(wx.VERTICAL)
        sizer.Add(btn,0,wx.EXPAND)
        sizer.Add(self.imgProduct,1,wx.EXPAND)
        self.SetSizer(sizer)
        self.SetMinSize((800,600))
        self.SetSize((800,600))

    def OnTest(self,evt):
        imgurl = "http://www.shopzeus.hu/thumbnail.php?width=200&image=pyramid/PP0830.jpg"
        thread.start_new_thread(self.GetThumbnail,(imgurl,))

    def Log(self,msg):
        fout = open("log.txt","a")
        fout.write(repr(msg)+"\n")
        fout.close()

    def GetThumbnail(self,imgurl):
        self.Log("#1")
        try:
            fpath = urllib.urlretrieve(imgurl)[0]
        except:
            self.Log(traceback.format_exc())
            return
        self.Log("#2")
        wx.CallAfter(self.imgProduct.SetPage,"<img src='%s'>"%fpath)
        self.Log("#3")


app = wx.PySimpleApp(redirect=True)
frm = Main()
frm.Show()
app.MainLoop()
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: urllib.urlretrieve never returns???

2012-03-17 Thread Laszlo Nagy

On 2012-03-17 17:34, Chris Angelico wrote:

2012/3/18 Laszlo Nagy:

In the later case, "log.txt" only contains "#1" and nothing else. If I look
at pythonw.exe from task manager, then its shows +1 thread every time I
click the button, and "#1" is appended to the file.

        try:
            fpath = urllib.urlretrieve(imgurl)[0]
        except:
            self.Log(traceback.format_exc())
            return


Just a stab in the dark, but I'm wondering if something's throwing an
exception when it's running without a GUI? Your except clause
references 'traceback' which I don't see anywhere else - or is that
part of the code you can't share? Anyway, my guess is that something
perhaps tries to write to the console and can't, and it gets stuck
inside format_exc(). Is that testable?
You are right, I should have added "import traceback". However, I tried 
this:


try:
    fpath = urllib.urlretrieve(imgurl)[0]
except:
    self.Log("Exception")
    return


and still nothing was logged. Another proof is that the number of 
threads is increased every time I press the button. So I'm 100% sure 
that urlretrieve does not return from the call.



--
http://mail.python.org/mailman/listinfo/python-list


Re: urllib.urlretrieve never returns???

2012-03-19 Thread Laszlo Nagy

On 2012-03-17 19:18, Christian Heimes wrote:

Am 17.03.2012 15:13, schrieb Laszlo Nagy:

See attached example code. I have a program that calls exactly the same
code, and here is the strange thing about it:

   * Program is started as "start.py", e.g. with an open console. In this
 case, the program works!
   * Program is started as "start.pyw", e.g. with no open console under
 Windows 7 64bit - the program does not work!

The pythonw.exe may not have the rights to access network resources.
Have you set a default timeout for sockets?

import socket
socket.setdefaulttimeout(10) # 10 seconds

A pyw Python doesn't have stdout, stderr or stdin. Any attempt to write
to them (e.g. print statement, default exception logger) can cause an
exception.

Yes, that is why I started the app with

wx.PySimpleApp(redirect=True)

so stdout is redirected into a window. Unfortunately, nothing is logged.

I'm going to turn off the firewall and start as admin today. That might 
be the solution. Thanks!

--
http://mail.python.org/mailman/listinfo/python-list


Re: urllib.urlretrieve never returns???

2012-03-19 Thread Laszlo Nagy

The pythonw.exe may not have the rights to access network resources.

Have you set a default timeout for sockets?

import socket
socket.setdefaulttimeout(10) # 10 seconds
I have added pythonw.exe to allowed exceptions. Disabled firewall 
completely. Set socket timeout to 10 seconds. Still nothing.


urllib.urlretrieve does not return from call

any other ideas?


--
http://mail.python.org/mailman/listinfo/python-list


Re: urllib.urlretrieve never returns???

2012-03-20 Thread Laszlo Nagy

Here you can find the example program and the original post.

http://code.activestate.com/lists/python-list/617894/


I gather you are running urlretrieve in a separate thread, inside a GUI?

Yes.


I have learned that whenever I have inexplicable behaviour in a function,
I should check my assumptions. In this case, (1) are you sure you have
the right urlretrieve, and (2) are you sure that your self.Log() method
is working correctly? Just before the problematic call, do this:

# was:
fpath = urllib.urlretrieve(imgurl)[0]

# becomes:
print(urllib.__file__, urlretrieve)
self.Log(urllib.__file__, urlretrieve)
fpath = urllib.urlretrieve(imgurl)[0]
I called self.Log() after each line, and also from a general "except:" 
clause. Definitely, the line after urlretrieve is not executed, and no 
exception is raised. Number of threads goes up (visible from task manager).


It is true that the program uses another module that uses the socket 
module and multiple threads. (These are written in pure python.)


If I remove the other module, then there is no error, however it renders 
the application useless. If I start the program with a console (e.g. 
with python.exe instead of pythonw.exe) then it works. Looks like 
opening a console solves the problem, although nothing is ever printed 
on the console.

and ensure that you haven't accidentally shadowed them with something
unexpected. Does the output printed to the console match the output
logged?
Well, this cannot be tested. If there is a console, then there is no 
problem.


What happens if you take the call to urlretrieve out of the thread and
call it by hand?

Then it works.

Run urllib.urlretrieve(imgurl) directly in the
interactive interpreter. Does it still hang forever?

Then it works perfectly.


When you say it "never" returns, do you mean *never* or do you mean "I
gave up waiting after five minutes"? What happens if you leave it to run
all day?
I did not try that. But I have already set socket timeout to 10 seconds, 
and definitely it is not waiting for a response from the server.


How big are the files you are trying to retrieve?

34 KB

Try retrieving a really small file. Then try retrieving a non-existent file.

Good point. I'll try to retrieve a nonexistent file when I get home. :)


What happens if you call urlretrieve with a reporthook argument? Does it
print anything?
I'll try this too. I'll also try using pycurl or the low level socket 
module instead.


What happens if you try to browse to imgurl in your web browser? Are you
sure the problem is with urlretrieve and not the source?

Yes.


--
http://mail.python.org/mailman/listinfo/python-list


Re: urllib.urlretrieve never returns???

2012-03-20 Thread Laszlo Nagy

On 2012-03-20 8:08, Laszlo Nagy wrote:

Here you can find the example program and the original post.

http://code.activestate.com/lists/python-list/617894/


I gather you are running urlretrieve in a separate thread, inside a GUI?

Yes.


I have learned that whenever I have inexplicable behaviour in a 
function,

I should check my assumptions. In this case, (1) are you sure you have
the right urlretrieve, and (2) are you sure that your self.Log() method
is working correctly? Just before the problematic call, do this:

# was:
fpath = urllib.urlretrieve(imgurl)[0]

# becomes:
print(urllib.__file__, urlretrieve)
self.Log(urllib.__file__, urlretrieve)
fpath = urllib.urlretrieve(imgurl)[0]
I called self.Log() after each line, and also from a general "except:" 
clause. Definitely, the line after urlretrieve is not executed, and no 
exception is raised. Number of threads goes up (visible from task 
manager).


It is true that the program uses another module that uses the socket 
module and multiple threads. (These are written in pure python.)


If I remove the other module, then there is no error, however it 
renders the application useless. If I start the program with a console 
(e.g. with python.exe instead of pythonw.exe) then it works. Looks 
like opening a console solves the problem, although nothing is ever 
printed on the console.

and ensure that you haven't accidentally shadowed them with something
unexpected. Does the output printed to the console match the output
logged?
Well, this cannot be tested. If there is a console, then there is no 
problem.


What happens if you take the call to urlretrieve out of the thread and
call it by hand?

Then it works.

Run urllib.urlretrieve(imgurl) directly in the
interactive interpreter. Does it still hang forever?

Then it works perfectly.


When you say it "never" returns, do you mean *never* or do you mean "I
gave up waiting after five minutes"? What happens if you leave it to run
all day?
I did not try that. But I have already set socket timeout to 10 
seconds, and definitely it is not waiting for a response from the server.


How big are the files you are trying to retrieve?

34 KB
Try retrieving a really small file. Then try retrieving a 
non-existent file.

Good point. I'll try to retrieve a nonexistent file when I get home. :)


Today I got a different error message printed on console (program 
started with python.exe)




Unhandled exception in thread started by FrameLocEdit.GetThumbnail of Object of type 'wxPanel *' at 0x4f85300>
>>Unhandled exception in thread started by FrameLocEdit.GetThumbnail of Object of type 'wxPanel *' at 0x4f85300>
>>Unhandled exception in thread started by FrameLocEdit.GetThumbnail of Object of type 'wxPanel *' at 0x4f85300>

>>Unhandled exception in thread started by
Traceback (most recent call last):

Traceback (most recent call last):

Traceback (most recent call last):
of 
>>  File "C:\Python\Projects\Warehouserclient_v3\locedit.py", line 917, 
in GetThumbnail
  File "C:\Python\Projects\Warehouserclient_v3\locedit.py", line 917, 
in GetThumbnail
  File "C:\Python\Projects\Warehouserclient_v3\locedit.py", line 917, 
in GetThumbnail


sys.excepthook is missing
Traceback (most recent call last):

I have never seen a traceback like this before. I didn't install any 
excepthook myself.


Program is using wxPython, socket, threads, threading, PIL.

Here is something even more strange. If I click on the button three 
times, then absolutely nothing gets printed on stdout. However, if I 
close the program with file/exit (actually, calling 
wx.PySimpleApp.ExitMainLoop) then suddenly three stack traces are 
printed on stdout, all lines mixed up:


Unhandled exception in thread started by FrameLocEdit.GetThumbnail of Object of type 'wxPanel *' at 0x4fb530

0>
>>Unhandled exception in thread started by Unhandled exception in 
thread started by 0x4fb5300>
>>Unhandled exception in thread started by Unhandled exception in 
thread started by 0x4fb5300>
>>Traceback (most recent call last):FrameLocEdit.GetThumbnail of Object of type 'wxPanel *' at 0x4fb5300>
>>Traceback (most recent call last):FrameLocEdit.GetThumbnail of Object of type 'wxPanel *' at 0x4fb5300>
>>Traceback (most recent call last):  File 
"C:\Python\Projects\Warehouserclient_v3\locedit.py", line 917, in 
GetThumbnail



Traceback (most recent call last):

  File "C:\Python\Projects\Warehouserclient_v3\locedit.py", line 917, 
in GetThumbnail

Traceback (most recent call last):
  File "C:\Python\Projects\Warehouserclient_v3\locedit.py", line 917, 
in GetThumbnail
  File "C:\Python\Projects\Warehouserclient_v3\locedit.py", line 
917, in GetThumbnail
  File "C:\Python\Projects\Warehouserclient_v3\lo

Re: urllib.urlretrieve never returns??? [SOLVED] - workaround

2012-03-20 Thread Laszlo Nagy



I'll be experimenting with pyCurl now.
By replacing the GetThumbnail method with this brainless example, taken 
from the pyCurl demo:



def GetThumbnail(self,imgurl):
    class Test:
        def __init__(self):
            self.contents = ''

        def body_callback(self, buf):
            self.contents = self.contents + buf

    self.Log("#1: "+repr(imgurl))
    try:
        t = Test()
        c = pycurl.Curl()
        c.setopt(c.URL, imgurl)
        c.setopt(c.WRITEFUNCTION, t.body_callback)
        self.Log("#2")
        c.perform()
        self.Log("#3")
        c.close()
        self.Log("#4")
        fpath = os.path.join(os.environ["TEMP"],"thumbnail.jpg")
        fout = open(fpath,"wb+")
        self.Log("#5: "+repr(fpath))
        try:
            fout.write(t.contents)
        finally:
            fout.close()
        self.Log("#6")
    except:
        self.Log(traceback.format_exc())
        return
    self.Log("#7")
    wx.CallAfter(self.imgProduct.SetPage,"""<img src="%s">"""%fpath)
    self.Log("#8")

Everything works perfectly, in all modes: console, no console, started 
directly and started in separate thread.


So the problem must be with urllib. Maybe wxPython installs some except 
hooks, who knows? If somebody feels up to it, I can start narrowing 
down the problem to the smallest possible application. But only if 
someone knows how to debug core code because I don't. Otherwise I'll 
just use pyCURL.


Thank you for your help!

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Re: urllib.urlretrieve never returns???

2012-03-21 Thread Laszlo Nagy

On 2012-03-20 22:26, Prasad, Ramit wrote:

I just looked at your source file on ActiveState and noticed that
you do not import traceback. That is why you are getting the
AttributeError. Now you should be getting a much better error
once you import it:)
Nope. That would result in a NameError. After adding "import traceback", 
I still get several AttributeError messages. The traceback should 
contain the exact line number, and a description about what object is 
missing what attribute anyway. My code is pure Python, and under no 
circumstances should it be able to start a blocked call for a system 
library function.

--
http://mail.python.org/mailman/listinfo/python-list


Re: Async IO Server with Blocking DB

2012-04-06 Thread Laszlo Nagy

There is asyncmongo!

http://pypi.python.org/pypi/asyncmongo/0.1.3

Although I have never tried it. It has support for async I/O for mongodb 
and tornadoweb. Here is a bit old article about it:


http://www.dunnington.net/entry/asynchronous-mongodb-in-tornado-with-asyncmongo

I have a related question. I just ran into this post, when I was 
wondering how to implement sessions with tornado. It is also a blocking 
db access, right? Say, what if I run four instances of tornado on a 
computer (one instance per CPU core) and load-balance requests between 
these instances with ngnix. It cannot be guaranteed that requests from 
the same user will always go to the same web server process. So the same 
session must somehow be accessed from multiple processes. But tornado 
does not have built-in session support.  I have read this article about 
the issue:


http://milancermak.posterous.com/benchmarking-tornados-sessions-0

but I found no actual code for session managers. Does anyone know a 
good, tested implementation of sessions for tornado? Or should I just 
create my own? I'm thinking about running mongodb and using it for 
storing sessions (plus also using it for storing application data...)


Thanks

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


documentation for asyncmongo?

2012-04-07 Thread Laszlo Nagy
Is there any kind of API documentation for asyncmongo? On GITHub they 
say "asyncmongo syntax strives to be similar to pymongo".


However, many basic things do not work or they are not similar.

http://api.mongodb.org/python/2.1.1/tutorial.html

Example from pymongo:


 db.collection_names()

[u'posts', u'system.indexes']

The same in asyncmongo:

TypeError: 'Cursor' object is not callable

Even the connection is different: pymongo.Connect versus 
asyncmongo.Client. It has a "pool_id" parameter and what the heck is 
that? Is there no documentation for asyncmongo at all?


Thanks,

   Laszlo

-- 
http://mail.python.org/mailman/listinfo/python-list


wx MenuItem - icon is missing

2011-07-05 Thread Laszlo Nagy

def onPopupMenu(self,evt):
    menu = wx.Menu()
    for title,bitmap in self.getPopupMenuItems():
        item = wx.MenuItem(None,-1,title)
        if bitmap:
            item.SetBitmap(bitmap)
        menu.AppendItem(item)
        menu.Bind(wx.EVT_MENU,self.onPopupMenuItemSelected,item)
    self.PopupMenu(menu, evt.GetPoint())
    menu.Destroy()

I have read somewhere that under GTK, I have to assign the bitmap before 
Append-ing the MenuItem to the Menu. So I did, but it doesn't work. Menu 
item icons are not showing up in Ubuntu. On Windows 7, everything is 
fine. What am I doing wrong?


System: Ubuntu 11 amd64
Python: 2.7.1+
wx.__version__ '2.8.11.0'

Thanks,

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Re: wx MenuItem - icon is missing

2011-07-05 Thread Laszlo Nagy



1. Post a complete example that demonstrates the problem so that we don't have 
to dummy up a wx app ourselves to try your code.


import sys
import wx
from wx.lib.embeddedimage import PyEmbeddedImage

img = PyEmbeddedImage(

"iVBORw0KGgoNSUhEUgAAABAQCAYf8/9hBmJLR0QAAAD5Q7t/"

"CXBIWXMAAAsTAAALEwEAmpwYB3RJTUUH1QYGDS8dXc5KpwAAADV0RVh0Q29tbWVudAAo"

"YykgMjAwNCBKYWt1YiBTdGVpbmVyCgpDcmVhdGVkIHdpdGggVGhlIEdJTVCQ2YtvAAAB+klE"

"QVQ4y52TwWpTQRSGvzkzwV7S1pI2CFptC3VhUkjabsSlrxBKF0UQdONW3BsK7sQnUPA9pLos"

"WtskzW3AgopKi6jtxYQSY3LnuEi8pYsidjbD8M98c/7/zJil5dJDoMzZRtksLZf0zt3bZzr9"

"7Olz3N/F5tbLUze2Wkek0wHtdgdrhaGhcywu3AQ4BjSbB6cCPrzfw1ohOmzRbB5x/cZcoiWA"

"mZm5UwFTUzmMMagqxhiMMYlmlpZLGjXbPLh/77/8rz56wqULmX4F3W6P8upjfnU6fVUV/QdA"

"RI4t3FpZ4dXaC7yHi5OTfN3fx/uYkfNjtH5GqPcE6RGMCNHhASKG/g0eFwQBla03XJ2dRVUJ"

"w5B8Po+1ljAMyeVyiAiNRgPFsDhfJJVK0e12qdUrSLvdxsceVU1CAojjGDB0Oh289wB4Vay1"

"6GBOLFyZmuH1+joYw0Q2y85OA+9jxjLjvNvdBVXGMhMoUKvVEkgC+PzpI8VioW+h0SCXu4Zz"

"jnq9znyxiIhQrdZwzlEoFJIqNysbyCB2nHN47/G9HtZanHOISNJ3EQP0S0+lUie7MHl5msrm"

"W8Awns2yXa/jrCU9PMx2GGJUGQoCfg/aPDo6ShRFJ1/i/MICANZa4ulpDGBE0EGARoS9vS98"

"//GNw+hgEHIfUK5WN878nf8AhFzLEABZzNIASUVORK5CYII=")



class MyFrame(wx.Frame):

    def __init__(self, parent, id=-1, title='Popup image test',
                 pos=wx.DefaultPosition, size=(200, 200),
                 style=wx.DEFAULT_FRAME_STYLE):
        wx.Frame.__init__(self, parent, id, title, pos, size, style)

        lst = wx.ListCtrl(self,-1,style=wx.LC_REPORT)
        lst.InsertColumn(0, "Column 01")
        for i in range(100):
            lst.InsertStringItem(sys.maxint, "Right click me %s"%i)

        lst.Bind(wx.EVT_LIST_ITEM_RIGHT_CLICK, self.onPopupMenu, lst)
        self.Bind(wx.EVT_CLOSE, self.OnClose)

    def OnClose(self, event):
        self.Destroy()

    def onPopupMenu(self,evt):
        global img
        menu = wx.Menu()
        item = wx.MenuItem(None,-1,u"Test")
        item.SetBitmap(img.GetBitmap())
        menu.AppendItem(item)
        #menu.Bind(wx.EVT_MENU,self.onPopupMenuItemSelected,item)
        self.PopupMenu(menu, evt.GetPoint())
        menu.Destroy()


app = wx.App()
frame = MyFrame(None)
frame.Show()
app.MainLoop()

Under windows, this displays the icon for the popup menu item. Under GTK 
it doesn't and there is no error message, no exception.


Thanks

L

--
http://mail.python.org/mailman/listinfo/python-list


Re: wx MenuItem - icon is missing

2011-07-06 Thread Laszlo Nagy



Under windows, this displays the icon for the popup menu item. Under GTK it 
doesn't and there is no error message, no exception.


I get different results than you.

Under Ubuntu 9.04 w with wx 2.8.9.1, when I right click I see a menu item 
called test with little icon of a calculator or something.

Under OS X 10.6 with wx 2.8.12.0 and Win XP with wx 2.8.10.1, when I right 
click I get this --

Traceback (most recent call last):
   File "x.py", line 46, in onPopupMenu
 item = wx.MenuItem(None,-1,u"Test")
   File 
"/usr/local/lib/wxPython-unicode-2.8.12.0/lib/python2.6/site-packages/wx-2.8-mac-unicode/wx/_core.py",
 line 11481, in __init__
 _core_.MenuItem_swiginit(self,_core_.new_MenuItem(*args, **kwargs))
wx._core.PyAssertionError: C++ assertion "parentMenu != NULL" failed at 
/BUILD/wxPython-src-2.8.12.0/src/common/menucmn.cpp(389) in wxMenuItemBase(): menuitem 
should have a menu
I guess I'll have to write to the wxPython mailing list. Seriously, 
adding a simple menu to something is supposed to be platform 
independent, but we got four different results on four systems. :-(


Thank you for trying out though.


--
http://mail.python.org/mailman/listinfo/python-list


Re: wx MenuItem - icon is missing

2011-07-07 Thread Laszlo Nagy



I can understand why it's frustrating but menu items with icons on them 
aren't exactly common, so you're wandering into territory that's probably not 
so throughly explored (nor standard across platforms). Now that I think about 
it, I don't know that I've ever seen one under OSX, and I don't even know if 
it's supported at all.
Maybe you are right, I'm not familiar with OS X. But they are common in 
GTK, Qt and Windows.

Me, I would start by addressing the error in the traceback. wx doesn't seem 
happy with an orphan menu item; why not create a wx.Menu and assign the menu 
item to that? It might solve your icon problem; you never know.

I did create it:

menu = wx.Menu() # wx.Menu created here.
item = wx.MenuItem(None,-1,u"Test")
item.SetBitmap(img.GetBitmap())
menu.AppendItem(item) # Item added to menu here.


In defense of wxPython, we have three wx apps in our project and they contain 
very little platform-specific code. To be fair, we've had to rewrite some code 
after we found that it worked on one platform but not another, but generally 
we're able to find code that works on all platforms. We have only a couple of 
places where we were forced to resort to this kind of thing:

if wx.Platform == "__WXGTK__":
   do X
elif wx.Platform == "__WXMAC__":
   do Y
etc.

Hmmm then probably I'll have to install other OS too. :-)

--
http://mail.python.org/mailman/listinfo/python-list


python.org is down?

2011-07-24 Thread Laszlo Nagy
Can it be a problem on my side? I have tried from several different 
computers. I cannot even ping it.


--
http://mail.python.org/mailman/listinfo/python-list


eval, exec and execfile dilemma

2011-07-30 Thread Laszlo Nagy


  Hi,

I have a program that generates source code for a function. I need to 
compile that function so that it can be executed (relatively) fast 
later. In most cases, I'll be generating about 10 functions per second, 
and evaluate them frequently. But sometimes I'll have to generate 100 
functions per second (upper limit). I also need to change the dict of 
global variables that are accessible from that function. Here is the 
dilemma:


   * the eval function can only be used to evaluate expressions. The
 source code of the generated function is too complex to be put
 into a lambda expression. The def statement is not handled by
 eval(), so I cannot use eval().
   * the exec statement can be used to execute a def statement.
 However, I see no way to change the globals, so I cannot use the
 exec statement.
   * the execfile statement could do everything that I want (e.g.
 support statements + changing globals). But I cannot pass the
 source code as a string. Looks like I MUST give a file name as the
 first argument. Even worse, this can't be a temporary file
 (because it has no name on some systems). This I consider a
 security problem. Yes, I know that I might be able to create a
 directory with good permissions, and then write out 10-100 files
 in every second for compiling... but it seems a waste of time, and
 why would I create 100 files per second on an unknown filesystem
 when I only need to pre-compile some functions?

Are there any other options? (Python 2.7 + numpy, should be compatible 
with Windows, Linux and FreeBSD)


Thanks,

   Laszlo

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: eval, exec and execfile dilemma

2011-07-30 Thread Laszlo Nagy



UnboundLocalError: local variable 'bar' referenced before assignment


This works, though (at least it does on 2.7):

--> exec "def foo():\n\tglobal bar\n\tbar+=1\n\treturn 1\n"
--> bar = 9
--> foo()
1
--> bar
10

Laszlo, why do you think you can't use exec?

I'm sorry, first I was not aware of the "in globals locals" part.

Then I also got an UnboundLocalError. Then I also figured out that I can 
use "global bar", but I did not want to. Reason: 100 functions/second 
installed to global namespace doesn't sound well. However, now I 
discovered the locals parameter, and I figured out that I can force the 
compiler to treat the name of the function to be a local name. I do this 
by reserving its name in locals. Then the compiler has no choice but to 
place it in the local namespace:


locals = {'_f_':None}
globals ={} # More specialized version in my program...
exec "def _f_(a):\n\treturn a+1\n\n" in globals,locals
print locals['_f_'](4) # prints '5'

This is good because it is not interfering with module level globals(), 
nor the local namespace. Also good because I can restrict what is 
visible from inside the function.


Thank you for your help.

Best,

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Re: eval, exec and execfile dilemma

2011-07-31 Thread Laszlo Nagy



100 functions/second
installed to global namespace doesn't sound well.

What on earth are you doing needing to use exec to create hundreds of
functions??

:-)

Have you considered not using exec at all, and using a good old-fashioned
factory function and closures?

def factory(x):
    def inner(param):
        return param + x
    return inner

plusone = factory(1)
plustwo = factory(2)


I'm betting that this will be much faster than exec, and much more readable.
I'm working on a program that creates pivot tables from 
multi-dimensional databases. The user is able to give expressions in a 
tiny language. These expressions are then converted to Python source 
code, and compiled into functions.


The generated function is called with several different numpy arrays. In 
most cases, there are only a few functions are created (e.g. when the 
user changes the expression) and they are called many times. But 
sometimes (for example, when creating charts from the data) I have to 
generate a separate function for every fact set in the database. When 
there are many data series with lots of data in the graph, some 100 
functions needs to be generated very fast.


This cannot be done using factory functions, because the function code 
depends on the user's expression. It COULD be done in a different way: 
parsing the user's expression into an abstract syntax tree and then 
providing methods on the AST nodes to evaluate themselves. But this 
approach would use too many Python method calls. By generating the 
function source code, I can reduce the number of Python method calls 
needed from several thousand to ten or so. In most cases, the user will 
enter an expression that can be turned into a lambda function 
(eval + lambda). But with more elaborate expressions, I cannot 
efficiently convert them to a lambda expression.
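
Here is a minimal sketch of the generate-and-compile idea (the tiny
language translation is omitted; the expression below is assumed to be
already translated into Python, and the whitelisted names are made up):

import numpy as np

def compile_measure(expr_src, column_names):
    # expr_src: a Python expression over the column names, already
    # produced from the user's tiny language.
    args = ", ".join(column_names)
    src = "def _measure_(%s):\n\treturn %s\n" % (args, expr_src)
    globals_ns = {'__builtins__': {}, 'np': np}  # only whitelisted names
    locals_ns = {}
    exec src in globals_ns, locals_ns
    return locals_ns['_measure_']

# One compiled call works on whole numpy arrays, instead of thousands of
# per-row Python method calls.
profit = compile_measure("np.maximum(price - cost, 0) * quantity",
                         ["price", "cost", "quantity"])
print profit(np.array([10.0, 5.0]), np.array([7.0, 6.0]), np.array([3, 2]))
# prints something like [ 9.  0.]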


Of course the user can always write the expression in pure Python source 
code, but there are obvious problems with that...


Best,

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


How to avoid confusion with method names between layers of a package

2011-07-31 Thread Laszlo Nagy


  Hi All,

I have a package with several layers of code. The bottom layer contains 
classes and methods for dealing with tabular data. The second layer works 
with multi-dimensional data: it provides a data model with a unified API 
for accessing multi-dimensional databases. The top layer is responsible 
for displaying views of the data model.


The data model can be used in other (non-GUI) applications as well. When 
I was developing the data model, I tried to follow PEP 8. For 
method names, it says:

"Use the function naming rules: lowercase with words separated by underscores as 
necessary to improve readability."



I also tried to follow Demeter's law. 
(http://en.wikipedia.org/wiki/Law_of_Demeter)


Here are some classes from different layers:

Facts - access facts data (non-visual)
Query - query facts with cache, holds a reference to a Facts (non-visual)
Cube - drill facts for any number of dimensions, holds a reference to a 
Query (non-visual)

CubeGrid - displays a cube in a pivot grid (visual component)

Some methods in one class are also useful in another. For example, a 
Facts instance can tell the number of measures in the database. A Cube 
instance indirectly "owns" a Facts instance (through a Query). So 
Cube.get_measure_count() is a wrapper for Query.get_measure_count() 
which is a wrapper for Facts.get_measure_count(). This is good, because 
- given a cube instance - you can call *cube.get_measure_count()* 
instead of *cube._query._facts.get_measure_count()*. The latter would be 
ugly, and require extra knowledge about the inner structure of the Cube.
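
A stripped-down sketch of that wrapper chain (the method bodies are made
up; the real classes do much more):

class Facts(object):
    def __init__(self, measures):
        self._measures = measures
    def get_measure_count(self):
        return len(self._measures)

class Query(object):
    def __init__(self, facts):
        self._facts = facts
    def get_measure_count(self):
        # thin wrapper, so callers never reach into self._facts
        return self._facts.get_measure_count()

class Cube(object):
    def __init__(self, query):
        self._query = query
    def get_measure_count(self):
        # cube.get_measure_count() instead of
        # cube._query._facts.get_measure_count()
        return self._query.get_measure_count()

cube = Cube(Query(Facts(["amount", "price"])))
print cube.get_measure_count()   # prints 2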


So far so good. The problem is that a CubeGrid instance is also a 
wx.Grid instance, and different naming conventions apply there. Method 
names in wxPython come from C++ and use CamelCase. So there is a naming 
conflict. What should I do?


Solution #1: Mix CamelCase and PEP 8 names in the CubeGrid class. Very 
ugly, inconsistent.
Solution #2: Convert old PEP 8 names in the CubeGrid class, so every 
method in CubeGrid will have CamelCase names. Very inconsistent! Who 
would understand that CubeGrid.GetMeasureCount() is the same as 
Facts.get_measure_count()?
Solution #3: Drop Demeter's law here, and always use 
*CubeGrid.GetCube().get_measure_count()* - doesn't look very nice.


Any other ideas?

Thanks,

   Laszlo

-- 
http://mail.python.org/mailman/listinfo/python-list


SocketServer exception after upgrading to 2.7

2011-08-10 Thread Laszlo Nagy
Exception happened during processing of request from ('80.99.165.122', 
56069)

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/SocketServer.py", line 284, in 
_handle_request_noblock

self.process_request(request, client_address)
  File "/usr/local/lib/python2.7/SocketServer.py", line 311, in 
process_request

self.shutdown_request(request)
  File "/usr/local/lib/python2.7/SocketServer.py", line 459, in 
shutdown_request

request.shutdown(socket.SHUT_WR)
TypeError: shutdown() takes exactly 0 arguments (1 given)


I get this error after upgrading to Python 2.7, with a program that is 
based on SocketServer and SimpleXMLRPCDispatcher.


Any idea how to fix this?

Thanks,

   Laszlo


--
http://mail.python.org/mailman/listinfo/python-list


Re: problem with GTK language

2011-08-12 Thread Laszlo Nagy

On 2011-08-10 16:22, Peter Irbizon wrote:

Hello,
I have a strange problem with GTK language in PyGTK. When I run the .py 
file it shows all GTK labels in my default Windows language (Slovak). But 
when I compile it with py2exe and run the exe file, all labels are in 
English. Why does this happen? How can I define on program startup which 
language to use? I would like to choose between English and Slovak. I 
tried to set the locale in my application but no luck. What am I doing wrong?

Do you have this in your main program?


import locale
locale.setlocale(locale.LC_ALL, '')



--
http://mail.python.org/mailman/listinfo/python-list


Re: testing if a list contains a sublist

2011-08-15 Thread Laszlo Nagy



hi list,
what is the best way to check if a given list (let's call it l1) is
totally contained in a second list (l2)?

for example:
l1 = [1,2], l2 = [1,2,3,4,5] ->  l1 is contained in l2
l1 = [1,2,2,], l2 = [1,2,3,4,5] ->  l1 is not contained in l2
l1 = [1,2,3], l2 = [1,3,5,7] ->  l1 is not contained in l2

my problem is the second example, which makes it impossible to work with
sets instead of lists. But something like set.issubset for lists would
be nice.

greatz Johannes

The fastest, error-free and simplest solution is to use sets:

>>> l1 = [1,2]
>>> l2 = [1,2,3,4,5]
>>> set(l1)-set(l2)
set([])
>>> set(l2)-set(l1)
set([3, 4, 5])
>>>

With big lists this is not very memory efficient, though. But I must 
tell you, sometimes I use this method for lists with millions of 
integers, and it is very fast and reliable, and memory has not been a 
concern for me - a few million integers in a set still fit comfortably 
in memory. Read the docs about set operators for creating unions, 
symmetric differences etc.


Best,

   Laszlo

--
http://mail.python.org/mailman/listinfo/python-list


Re: testing if a list contains a sublist

2011-08-16 Thread Laszlo Nagy



Error free? Consider this stated requirement:

l1 = [1,2,2,], l2 = [1,2,3,4,5] ->  l1 is not contained in l2
If you look at it the strict way, the "containment" relation for lists 
would mean this:



l1 = []
l2 = [1,l1,2]   # l2 CONTAINS l1

But you are right, I was wrong. So let's clarify what the OP wants!

For example:

l1 = [1,2,2,], l2 = [2,1,2,3,4,5]


What is the relation between these two lists? Does l2 contain l1 or not? 
In other words, is this "containment" relation interpreted on multisets, 
without considering the order of the items?




It also completely ignores list order, which would make [9,8,7] a
sublist of [5,6,7,8,9].
Exactly. However, from Johannes' original post it was not clear whether 
the order of the elements counts or not.


If this is interpreted as a multiset relation, it would be easier to 
use collections.Counter. If the order of the elements is important, then 
he can start with a Boyer-Moore style search algorithm.
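
A small sketch of both interpretations (the helper names are made up, and
the ordered check below is a naive scan rather than real Boyer-Moore):

from collections import Counter

def contains_multiset(l2, l1):
    # True when every element of l1 occurs in l2 at least as many times.
    # Counter subtraction drops non-positive counts, so an empty result
    # means nothing in l1 is missing from l2.
    return not (Counter(l1) - Counter(l2))

def contains_ordered(l2, l1):
    # True when l1 appears in l2 as a contiguous run, order preserved.
    n = len(l1)
    return any(l2[i:i + n] == l1 for i in range(len(l2) - n + 1))

print contains_multiset([1, 2, 3, 4, 5], [1, 2])      # True
print contains_multiset([1, 2, 3, 4, 5], [1, 2, 2])   # False
print contains_ordered([1, 2, 3, 4, 5], [2, 3])       # True
print contains_ordered([5, 6, 7, 8, 9], [9, 8, 7])    # False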


Best,

  Laszlo

--
http://mail.python.org/mailman/listinfo/python-list

