Re: Handling transactions in Python DBI module
Chris Angelico writes:
> ...
> When I advise my students on basic databasing concepts, I recommend
> this structure:
>
> conn = psycopg2.connect(...)
>
> with conn, conn.cursor() as cur:
>     cur.execute(...)
>
> The transaction block should always start at the 'with' block and end
> when it exits. As long as you never nest them (including calling other
> database-using functions from inside that block), it's easy to reason
> about the database units of work - they always correspond perfectly to
> the code blocks.

In my context (web applications), I strongly discourage this use - at
least when "conn" does not handle subtransactions properly.

In a web application, the main transaction should in general be
controlled at the request level: a request should either succeed as a
whole or have no side effects whatsoever. This prohibits local
(somewhere deep in a component) transaction control.
--
https://mail.python.org/mailman/listinfo/python-list
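A minimal sketch of what request-level control might look like (not from the thread; `run_request` and the handler names are hypothetical, and sqlite3 merely stands in for a real database):

```python
import sqlite3

def run_request(conn, handler):
    # Wrap an entire request in one transaction: commit only if the
    # handler succeeds, roll back every side effect otherwise.
    try:
        result = handler(conn)
        conn.commit()
        return result
    except Exception:
        conn.rollback()
        raise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

def good(conn):
    conn.execute("INSERT INTO t VALUES (1)")
    return "ok"

def bad(conn):
    conn.execute("INSERT INTO t VALUES (2)")  # will be rolled back
    raise RuntimeError("request failed")

run_request(conn, good)
try:
    run_request(conn, bad)
except RuntimeError:
    pass

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # only the successful request's row survives
```

The point of the pattern is exactly the one made above: no component deep inside `handler` ever calls commit() or rollback() itself.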
Re: Handling transactions in Python DBI module
Israel Brewster writes:
> I am working on implementing a Python DB API module, and am hoping I
> can get some help with figuring out the workflow of handling
> transactions. In my experience (primarily with psycopg2) the workflow
> goes like this:
>
> - When you open a connection (or is it when you get a cursor? I *think*
>   it is on opening a connection), a new transaction is started

All databases I have seen so far associate transaction control with the
connection, not with the cursor -- this is important, as for some
applications you need several independent cursors at the same time which
nevertheless must belong to the same transaction. Your cursor API may
give you "commit/rollback" -- but only as a convenience; those
operations operate on the connection, not the cursor.

> My primary confusion is that at least for the DB I am working on, to
> start/rollback/commit a transaction, you execute the appropriate SQL
> statement (the C library I'm using doesn't have any transactional
> commands, not that it should). However, to execute the statement, you
> need a cursor. So how is this *typically* handled? Does the connection
> object keep an internal cursor that it uses to manage transactions?

When you open a connection, it is in an "initial mode". I have seen as
"initial mode":

* "no transaction mode": a transaction is automatically started when an
  SQL command is executed
* "auto commit mode": each SQL command is run in its own transaction;
  use "BEGIN" to enter explicit transaction control.
--
https://mail.python.org/mailman/listinfo/python-list
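Both initial modes happen to be observable in the standard library's sqlite3 module, so here is a small sketch (behavior as of Python 3.6+, where DDL no longer triggers an implicit commit):

```python
import sqlite3

# "No transaction mode": sqlite3's default. An implicit BEGIN is issued
# before the first data-modifying statement, and work stays pending
# until commit() or rollback().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x)")
conn.execute("INSERT INTO t VALUES (1)")
print(conn.in_transaction)   # an implicit transaction is now open
conn.rollback()              # discards the INSERT
pending_rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# "Auto commit mode": isolation_level=None makes every statement its
# own transaction; explicit control would require executing BEGIN.
auto = sqlite3.connect(":memory:", isolation_level=None)
auto.execute("CREATE TABLE t (x)")
auto.execute("INSERT INTO t VALUES (1)")
print(auto.in_transaction)   # nothing pending: the INSERT committed
auto_rows = auto.execute("SELECT COUNT(*) FROM t").fetchone()[0]

print(pending_rows, auto_rows)
```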
tarfile : read from a socket?
https://docs.python.org/2/library/tarfile.html says:

  tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

  Return a TarFile object for the pathname name.

(How) can I read a tar file from a (tcp) socket?
I do not have a pathname but a socket object from socket.create_connection()

--
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum IZUS/TIK         E-Mail: horlac...@tik.uni-stuttgart.de
Universitaet Stuttgart         Tel:    ++49-711-68565868
Allmandring 30a                Fax:    ++49-711-682357
70550 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/
--
https://mail.python.org/mailman/listinfo/python-list
Re: Handling transactions in Python DBI module
On Thu, Feb 11, 2016 at 6:59 PM, dieter wrote:
> In my context (web applications), I strongly discourage this use - at
> least when "conn" does not handle subtransactions properly.
>
> In a web application, the main transaction should in general be
> controlled at the request level: a request should either succeed as a
> whole or have no side effects whatsoever. This prohibits local
> (somewhere deep in a component) transaction control.

Hmm. I'm not 100% convinced that web apps should behave that way; but
part of the simplicity comes from requiring that database-dependent code
should not call other database-dependent code, as that would create a
nested transaction.

(That said, though, it's entirely possible that psycopg2 could handle a
nested "with conn" as "SAVEPOINT" and either "RELEASE SAVEPOINT" or
"ROLLBACK TO SAVEPOINT". But I wouldn't recommend depending on that
without confirming it in the docs.)

Bear in mind, the rule I gave was a broad and general rule for students
to follow, not a hard-and-fast rule for all databasing. It's designed
such that the student can learn the exceptions later on, but meanwhile,
his code will be correct (if occasionally a little warped to avoid
nesting transactions).

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
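The savepoint mechanics themselves are easy to demonstrate. This sketch uses sqlite3 (which also supports SAVEPOINT) and says nothing about whether psycopg2 actually maps a nested "with conn" onto savepoints:

```python
import sqlite3

# isolation_level=None so we manage the transaction entirely by hand.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE t (x)")

cur.execute("BEGIN")
cur.execute("INSERT INTO t VALUES (1)")
cur.execute("SAVEPOINT sp1")              # nested unit of work starts here
cur.execute("INSERT INTO t VALUES (2)")
cur.execute("ROLLBACK TO SAVEPOINT sp1")  # undo only the nested part
cur.execute("RELEASE SAVEPOINT sp1")
cur.execute("COMMIT")                     # outer work (the first INSERT) survives

rows = cur.execute("SELECT x FROM t").fetchall()
print(rows)
```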
Re: asyncio and blocking - an update
"Chris Angelico" wrote in message news:CAPTjJmor8dMv2TDtq8RHQgWeSAaZgAmxK9gFth=oojhidwh...@mail.gmail.com... So really, the question is: Is this complexity buying you enough performance that it's worthwhile? Indeed, that is the question. Actually, in my case it is not quite the question. Firstly, although it took me a little while to get AsyncCursor working, it does not feel unduly complex, and actually feels quite light-weight. My tests show fairly consistently that my approach is slightly (5-10%) slower than run_in_executor(), so if that was the only issue I would not hesitate to abandon my approach. However, my concern is not to maximise database performance, but to ensure that in an asynchronous environment, one task does not block the others from responding. My tests simulate a number of tasks running concurrently and trying to access the database. Among other measurements, I track the time that each database access commences. As I expected, tasks run with 'run_in_executor' run sequentially, i.e. the next one only starts when the previous one has finished. This is not because the tasks themselves are sequential, but because 'fetchall()' is (I think) a blocking operation. Conversely, with my approach, all the tasks start within a short time of each other. Because I can process the rows as they are received, it seems to give each task a fairer time allocation. Not to mention that there are very likely to be other non-database tasks running concurrently, and they should also be more responsive. It would be quite difficult to simulate all of this, so I confess that I am relying on gut instinct at the moment. Frank -- https://mail.python.org/mailman/listinfo/python-list
Re: Cygwin and Python3
On 2/10/2016 11:46 PM, blindanag...@nowhere.net wrote:
> On 10/02/2016 23:05, Mike S wrote:
>> On 2/10/2016 5:05 AM, Mark Lawrence wrote:
>> [snip]
>>> Have you seen this?
>>> http://www.davidbaumgold.com/tutorials/set-up-python-windows/
>> I have now, but I'm perfectly happy with the free versions of Visual
>> Studio.
>> [snip]
>> I don't see any references to VS on that page so I don't know what
>> you're referring to.
> I suspect that Mark is reacting indirectly to the misleading
> implication on that page that it is necessary to install Cygwin if you
> want to develop Python code on Windows.

Thanks for explaining, I had no idea what that comment might be based on.
--
https://mail.python.org/mailman/listinfo/python-list
Suggestions for best practices when automating geocoding task
Good Morning,

I welcome feedback and suggestions for libraries or resources in order
to automate the following:

1. Given a directory of CSV files (each containing an address field)
   a. Read each CSV file
   b. Use the address in each row as part of a query and send a request
      to an external API in order to geocode the address
   c. Write the response to each row and return the updated file

I have been wondering if a series of decorators could be used here.
Moreover, for the request component, has anyone explored using Tornado
or Twisted to create a queue for requests?

Thank you again for your feedback.
--
https://mail.python.org/mailman/listinfo/python-list
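The read/geocode/write steps above can be sketched with the standard csv module. `geocode()` here is a hypothetical stub standing in for the external API call, and the field names are assumptions:

```python
import csv
import io

def geocode(address):
    # Placeholder for the external API call -- a real implementation
    # would issue an HTTP request here (urllib, requests, etc.).
    return {"lat": 0.0, "lon": 0.0}

def process(infile, outfile):
    # Read each row, geocode its address field, and write the row back
    # out with the geocoding response appended as extra columns.
    reader = csv.DictReader(infile)
    fields = reader.fieldnames + ["lat", "lon"]
    writer = csv.DictWriter(outfile, fieldnames=fields)
    writer.writeheader()
    for row in reader:
        row.update(geocode(row["address"]))
        writer.writerow(row)

# In-memory files stand in for the files in the directory.
src = io.StringIO("name,address\nA,1 Main St\n")
dst = io.StringIO()
process(src, dst)
print(dst.getvalue())
```

For the rate-limited request queue, the same `process` loop could feed an asyncio/Tornado queue instead of calling `geocode` synchronously.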
Re: tarfile : read from a socket?
Have you tried the socket.makefile() method?
--
https://mail.python.org/mailman/listinfo/python-list
timedelta and float multiplication in python 2
I ran into a problem today where I had to determine the mean point
between two datetimes. Here is an example:

>>> start = datetime.datetime(2016,2,11)
>>> stop = datetime.datetime.now()
>>> mean = start + (stop-start)*0.5
Traceback (most recent call last):
  File "", line 1, in
TypeError: unsupported operand type(s) for *: 'datetime.timedelta' and 'float'

A workaround could be something like this:

>>> start + datetime.timedelta(seconds=0.5*((stop-start).total_seconds()))
datetime.datetime(2016, 2, 11, 5, 45, 45, 818009)

The interesting thing is that total_seconds() already returns fractions
of seconds, so it would be almost trivial to implement timedelta
multiplication with floats.

I have checked and it does work with Python 3. But it does not work with
Python 2 - is there a good reason for this?

Thanks,

   Laszlo
--
https://mail.python.org/mailman/listinfo/python-list
Re: tarfile : read from a socket?
On 02/11/2016 09:31 AM, Ulli Horlacher wrote:
> https://docs.python.org/2/library/tarfile.html says:
>
>   tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)
>
>   Return a TarFile object for the pathname name.
>
> (How) can I read a tar file from a (tcp) socket?
> I do not have a pathname but a socket object from socket.create_connection

# First you construct a file object with makefile.
fo = socket.makefile()

# Then you use the fileobj argument with tarfile.open.
tarfile.open(mode='r', fileobj=fo)
--
https://mail.python.org/mailman/listinfo/python-list
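For completeness, a self-contained sketch of the whole idea (not from the thread): `socket.socketpair()` stands in for a real TCP connection, and the stream modes 'w|'/'r|' are used because a socket file object cannot seek. The archive here is small enough to fit in the socket buffer; a real writer and reader would run concurrently on different hosts.

```python
import io
import socket
import tarfile

# socketpair() gives two connected sockets in one process.
a, b = socket.socketpair()

# Writer side: stream a small archive into the socket.
# 'w|' tells tarfile the target is a non-seekable stream.
wf = a.makefile("wb")
with tarfile.open(fileobj=wf, mode="w|") as tw:
    data = b"hello"
    info = tarfile.TarInfo(name="greeting.txt")
    info.size = len(data)
    tw.addfile(info, io.BytesIO(data))
wf.close()
a.close()  # signal end-of-stream to the reader

# Reader side: consume the archive from the socket.
# 'r|' likewise never seeks on the underlying file object.
received = {}
rf = b.makefile("rb")
with tarfile.open(fileobj=rf, mode="r|") as tr:
    for member in tr:
        received[member.name] = tr.extractfile(member).read()
print(received)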
Re: timedelta and float multiplication in python 2
On Thu, Feb 11, 2016 at 9:39 PM, Nagy László Zsolt wrote:
> I have checked and it does work with Python 3. But it does not work
> with Python 2 - is there a good reason for this?

Mainly that Python 3 has had six years of development since Python 2.7,
and Python 2 has been getting only bugfixes and security patches since
then. I would recommend migrating to Python 3 if you want this feature.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: Heap Implementation
On Feb 10, 2016, at 1:23 PM, "Sven R. Kunze" wrote:
> Hi Cem,
>
> On 08.02.2016 02:37, Cem Karan wrote:
>> My apologies for not writing sooner, but work has been quite busy
>> lately (and likely will be for some time to come).
>
> no problem here. :)
>
>> I read your approach, and it looks pretty good, but there may be one
>> issue with it; how do you handle the same item being pushed into the
>> heap more than once? In my simple simulator, I'll push the same object
>> into my event queue multiple times in a row. The priority is the
>> moment in the future when the object will be called. As a result,
>> items don't have unique priorities. I know that there are methods of
>> handling this from the client-side (tuples with unique counters come
>> to mind), but if your library can handle it directly, then that could
>> be useful to others as well.
>
> I've pondered about that in the early design phase. I considered it a
> slowdown for my use-case without benefit.
>
> Why? Because I always push a fresh object ALTHOUGH it might be equal
> comparing attributes (priority, deadline, etc.).
>
> That's the reason why I need to ask again: why push the same item on a
> heap?
>
> Are we talking about function objects? If so, then your concern is
> valid. Would you accept a solution that would involve wrapping the
> function in another object carrying the priority? Would you prefer a
> wrapper that's defined by xheap itself so you can just use it?

Yes. I use priority queues for event loops. The items I push in are
callables (sometimes callbacks, sometimes objects with __call__()) and
the priority is the simulation date that they should be called. I push
the same item multiple times in a row because it will modify itself by
the call (e.g., the location of an actor is calculated by its velocity
and the date).

There are certain calls that I tend to push in all at once because the
math for calculating when the event should occur is somewhat expensive,
and always returns multiple dates at once. That is also why deleting or
changing events can be useful; I know that at least some of those events
will be canceled in the future, which makes deleting useful.

Note that it is also possible to cancel an event by marking it as
cancelled, and then simply not executing it when you pop it off the
queue, but I've found that there are a few cases in my simulations where
the number of dead events in the queue exceeds the number of live
events, which does have an impact on memory and operational speed
(maintaining the heap invariant). There isn't much difference though,
but I need FAST code to deal with the size of my simulations (thousands
to tens of thousands of actors, over hundreds of millions of
simulations, which is why I finally had to give up on Python and switch
to pure C).

Having a wrapper defined by xheap would be ideal; I suspect that I won't
be the only one that needs to deal with this, so having it centrally
located would be best. It may also make it possible for you to optimize
xheap's behavior in some way.

Thanks,
Cem Karan
--
https://mail.python.org/mailman/listinfo/python-list
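The "tuples with unique counters" client-side trick mentioned above, plus the lazy-cancellation scheme, can be sketched with the standard heapq module. This illustrates the wrapper idea only; it is not xheap's actual API, and the `Event` class is hypothetical:

```python
import heapq
import itertools

class Event:
    """Wrap a callable with an execution date so the same payload can
    sit in the heap several times without ever being compared itself."""
    _counter = itertools.count()

    def __init__(self, date, action):
        self.date = date
        self.action = action
        self._tie = next(Event._counter)  # breaks ties between equal dates
        self.cancelled = False            # lazy-deletion flag

    def __lt__(self, other):
        return (self.date, self._tie) < (other.date, other._tie)

def tick():
    print("tick")

heap = []
# The same callable pushed twice with different dates: heapq only ever
# compares the Event wrappers, never the payloads.
heapq.heappush(heap, Event(2.0, tick))
heapq.heappush(heap, Event(1.0, tick))

order = []
while heap:
    ev = heapq.heappop(heap)
    if not ev.cancelled:   # skip events cancelled in the meantime
        order.append(ev.date)
        ev.action()
print(order)  # dates come out in chronological order
```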
Re: timedelta and float multiplication in python 2
Nagy László Zsolt wrote:
> I ran into a problem today where I had to determine the mean point
> between two datetimes. Here is an example:
>
> >>> start = datetime.datetime(2016,2,11)
> >>> stop = datetime.datetime.now()
> >>> mean = start + (stop-start)*0.5
> Traceback (most recent call last):
>   File "", line 1, in
> TypeError: unsupported operand type(s) for *: 'datetime.timedelta' and
> 'float'
>
> A workaround could be something like this:
>
> >>> start + datetime.timedelta(seconds=0.5*((stop-start).total_seconds()))
> datetime.datetime(2016, 2, 11, 5, 45, 45, 818009)

How about

  mean = start + (stop - start) / 2

?

> Interesting thing is that total_seconds() already returns fractions of
> seconds, so it would be almost trivial to implement timedelta
> multiplication with floats.
>
> I have checked and it does work with Python 3. But it does not work
> with Python 2 - is there a good reason for this?
>
> Thanks,
>
>    Laszlo
--
https://mail.python.org/mailman/listinfo/python-list
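The division form works portably because timedelta supports division by an integer in both Python 2 (where it floors, which is harmless at timedelta's microsecond resolution) and Python 3 (true division) - unlike timedelta * float, which is Python 3 only:

```python
import datetime

start = datetime.datetime(2016, 2, 11)
stop = datetime.datetime(2016, 2, 12)

# timedelta / int works on both Python 2 and Python 3.
mean = start + (stop - start) / 2
print(mean)  # midpoint: 2016-02-11 12:00:00
```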
Re: tarfile : read from a socket?
Antoon Pardon wrote:
>> (How) can I read a tar file from a (tcp) socket?
>> I do not have a pathname but a socket object from
>> socket.create_connection
>
> # First you construct a file object with makefile.
> fo = socket.makefile()
>
> # Then you use the fileobj argument with tarfile.open.
> tarfile.open(mode='r', fileobj=fo)

I have:

  sock = socket.create_connection((server,port))
  bs = kB64
  taro = tarfile.open(fileobj=sock.makefile('w',kB64),mode='w')

Traceback (most recent call last):
  (...)
  File "./fexit.py", line 1838, in sex_send
    taro = tarfile.open(fileobj=sock.makefile('w',kB64),mode='w')
  File "/usr/lib/python2.7/tarfile.py", line 1695, in open
    return cls.taropen(name, mode, fileobj, **kwargs)
  File "/usr/lib/python2.7/tarfile.py", line 1705, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib/python2.7/tarfile.py", line 1566, in __init__
    self.offset = self.fileobj.tell()
AttributeError: '_fileobject' object has no attribute 'tell'
--
https://mail.python.org/mailman/listinfo/python-list
Re: tarfile : read from a socket?
On 2016-02-11 12:53, Ulli Horlacher wrote:
> Antoon Pardon wrote:
>>> (How) can I read a tar file from a (tcp) socket?
>>> I do not have a pathname but a socket object from
>>> socket.create_connection
>>
>> # First you construct a file object with makefile.
>> fo = socket.makefile()
>>
>> # Then you use the fileobj argument with tarfile.open.
>> tarfile.open(mode='r', fileobj=fo)
>
> I have:
>
>   sock = socket.create_connection((server,port))
>   bs = kB64
>   taro = tarfile.open(fileobj=sock.makefile('w',kB64),mode='w')
>
> Traceback (most recent call last):
>   (...)
>   File "./fexit.py", line 1838, in sex_send
>     taro = tarfile.open(fileobj=sock.makefile('w',kB64),mode='w')
>   File "/usr/lib/python2.7/tarfile.py", line 1695, in open
>     return cls.taropen(name, mode, fileobj, **kwargs)
>   File "/usr/lib/python2.7/tarfile.py", line 1705, in taropen
>     return cls(name, mode, fileobj, **kwargs)
>   File "/usr/lib/python2.7/tarfile.py", line 1566, in __init__
>     self.offset = self.fileobj.tell()
> AttributeError: '_fileobject' object has no attribute 'tell'

I suppose you could write your own class to wrap the socket and provide
the required methods.
--
https://mail.python.org/mailman/listinfo/python-list
Re: tarfile : read from a socket?
On Thu, Feb 11, 2016 at 11:53 PM, Ulli Horlacher wrote:
> I have:
>
>   sock = socket.create_connection((server,port))
>   bs = kB64
>   taro = tarfile.open(fileobj=sock.makefile('w',kB64),mode='w')
>
> Traceback (most recent call last):
> (...)
>   File "./fexit.py", line 1838, in sex_send
>     taro = tarfile.open(fileobj=sock.makefile('w',kB64),mode='w')
>   File "/usr/lib/python2.7/tarfile.py", line 1695, in open
>     return cls.taropen(name, mode, fileobj, **kwargs)
>   File "/usr/lib/python2.7/tarfile.py", line 1705, in taropen
>     return cls(name, mode, fileobj, **kwargs)
>   File "/usr/lib/python2.7/tarfile.py", line 1566, in __init__
>     self.offset = self.fileobj.tell()
> AttributeError: '_fileobject' object has no attribute 'tell'

Sounds like tarfile needs a seekable file. How big is this file you're
reading? Can you simply read the whole thing into memory, then use
io.BytesIO? I had a quick glance at help(BytesIO) but didn't find a
simple way to make a buffer that reads from an upstream file when it
needs more content, but it should be possible to build one.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: Cygwin and Python3
On 11/02/2016 07:46, blindanag...@nowhere.net wrote:
> On 10/02/2016 23:05, Mike S wrote:
>> On 2/10/2016 5:05 AM, Mark Lawrence wrote:
>> [snip]
>>> Have you seen this?
>>> http://www.davidbaumgold.com/tutorials/set-up-python-windows/
>> I have now, but I'm perfectly happy with the free versions of Visual
>> Studio.
>> [snip]
>> I don't see any references to VS on that page so I don't know what
>> you're referring to.
> I suspect that Mark is reacting indirectly to the misleading
> implication on that page that it is necessary to install Cygwin if you
> want to develop Python code on Windows.

Absolutely correct, marks out of ten, fifteen :)

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
--
https://mail.python.org/mailman/listinfo/python-list
Re: tarfile : read from a socket?
Ulli Horlacher wrote:
> I have:
>
>   sock = socket.create_connection((server,port))
>   bs = kB64
>   taro = tarfile.open(fileobj=sock.makefile('w',kB64),mode='w')
>
> Traceback (most recent call last):
> (...)
>   File "./fexit.py", line 1838, in sex_send
>     taro = tarfile.open(fileobj=sock.makefile('w',kB64),mode='w')
>   File "/usr/lib/python2.7/tarfile.py", line 1695, in open
>     return cls.taropen(name, mode, fileobj, **kwargs)
>   File "/usr/lib/python2.7/tarfile.py", line 1705, in taropen
>     return cls(name, mode, fileobj, **kwargs)
>   File "/usr/lib/python2.7/tarfile.py", line 1566, in __init__
>     self.offset = self.fileobj.tell()
> AttributeError: '_fileobject' object has no attribute 'tell'

Reading the doc helps :-)
https://docs.python.org/2/library/tarfile.html

  For special purposes, there is a second format for mode:
  'filemode|[compression]'. tarfile.open() will return a TarFile object
  that processes its data as a stream of blocks.

With

  taro = tarfile.open(fileobj=sock.makefile('w',kB64),mode='w|')

I get no more error.
--
https://mail.python.org/mailman/listinfo/python-list
Re: tarfile : read from a socket?
Chris Angelico wrote:
> Sounds like tarfile needs a seekable file. How big is this file you're
> reading?

No limits. It can be many TBs...

The use case is: http://fex.rus.uni-stuttgart.de:8080/
--
https://mail.python.org/mailman/listinfo/python-list
Does anyone here use wxGlade on Linux?
I am trying out wxGlade on Linux, version 0.7.1 of wxGlade on xubuntu
15.10.

I have already written something using wxPython directly, so I have the
basics (of my Python skills and the environment) OK, I think.

I am having a lot of trouble getting beyond the first hurdle of creating
a trivial Python GUI with wxGlade. Some of the problem is no doubt that
I'm unfamiliar with the interface, but I seem to repeatedly get to a
situation where the interface won't respond to mouse clicks (though the
main menu items still work; I can Exit OK, for instance).

Is wxPython still buggy or is it really just down to my lack of
familiarity with it?

--
Chris Green
--
https://mail.python.org/mailman/listinfo/python-list
Unable to insert data into MongoDB.
Hi guys. I am basically transferring the data from a PLC to a PC (where
the Python API runs), but I'm unable to insert into MongoDB thereafter.
When I run the Python script on IDLE, the output is

Hello World!
Traceback (most recent call last):
  File "C:\Users\SRA2LO\Desktop\API.py", line 32, in
    s.bind((IP, PORT))
  File "C:\Python27\lib\socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 10013] An attempt was made to access a socket in a way
forbidden by its access permissions

and when I change the IP of the MongoDB server, it shows

Hello World!
Traceback (most recent call last):
  File "C:\Users\SRA2LO\Desktop\API.py", line 32, in
    s.bind((IP, PORT))
  File "C:\Python27\lib\socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 10049] The requested address is not valid in its context.

Could you please help me out? I have disabled the firewall as well.
Here's the API I have written.

#!/usr/bin/python

import socket
import socket
from pymongo import MongoClient
#from eve import Eve
import datetime

# Connection to server (PLC) on port 27017
server = socket.socket()
host = "10.52.124.135" #IP of PLC
port = 27017
BUFFER_SIZE = 1024
###

server.connect((host, port))
print server.recv(1024)

server.close

#Connection to Client (Mongodb) on port 27017
IP = "127.0.0.1"
PORT = 27017
BUFFER_SIZE = 1024

client = MongoClient('127.0.0.1', 27017)
db = client.test_database

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((IP, PORT))
s.listen(1)

#connections loop
while True:
conn, addr = s.accept()
print 'Connection address:',addr
try:
# read loop
while True:
data = server.recv(BUFFER_SIZE)
if not data: break
conn.sendall(data)

# send to MongoDB
mongodoc = { "data": data, "date" : datetime.datetime.utcnow() }
ABC = db.ABC
ABC_id = ABC.insert_one(mongodoc).inserted_id
finally:
conn.close()
--
https://mail.python.org/mailman/listinfo/python-list
Re: Unable to insert data into MongoDB.
On 2016-02-11 15:12, Arjun Srivatsa wrote:
> Hi guys. I am basically transferring the data from PLC to PC (where the
> Python API runs) but I'm unable to insert into MongoDB thereafter.
> [snip]
>
> server.connect((host, port))
> print server.recv(1024)
>
> server.close
>
> [snip]
>
> #connections loop
> while True:
> conn, addr = s.accept()
> [snip]

I don't know whether it's relevant, but you didn't close the server
socket. You have "server.close" instead of "server.close()".

Also, the code as posted won't compile because the block after the
"while True:" isn't indented.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Unable to insert data into MongoDB.
Hello,

I changed PORT = 27017 to PORT = 5. I am not getting the error anymore.
But I am still unable to insert the data into MongoDB.

On Thursday, February 11, 2016 at 4:12:30 PM UTC+1, Arjun Srivatsa wrote:
> Hi guys. I am basically transferring the data from PLC to PC (where the
> Python API runs) but I'm unable to insert into MongoDB thereafter. When
> I run the Python script on IDLE, the output is
>
> Hello World!
> Traceback (most recent call last):
>   File "C:\Users\SRA2LO\Desktop\API.py", line 32, in
>     s.bind((IP, PORT))
>   File "C:\Python27\lib\socket.py", line 228, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 10013] An attempt was made to access a socket in a way
> forbidden by its access permissions
>
> [snip]
--
https://mail.python.org/mailman/listinfo/python-list
Re: Does anyone here use wxGlade on Linux?
On Thu, 11 Feb 2016 14:29:04 +, cl wrote:
> I am trying out wxGlade on Linux, version 0.7.1 of wxGlade on xubuntu
> 15.10.
>
> I have already written something using wxPython directly so I have the
> basics (of my Python skills and the environment) OK I think.
>
> I am having a lot of trouble getting beyond the first hurdle of
> creating a trivial Python GUI with wxGlade. Some of the problem is no
> doubt that I'm unfamiliar with the interface but I seem to repeatedly
> get to a situation where the interface won't respond to mouse clicks
> (though the main menu items still work, I can Exit OK for instance).
>
> Is wxPython still buggy or is it really just down to my lack of
> familiarity with it?

Sure, there are bugs in wxPython, but they are "minor". I haven't tried
using wxGlade, but if it's anything like the glade I tried using long
ago, there are issues in getting the two working together.
--
https://mail.python.org/mailman/listinfo/python-list
Re: tarfile : read from a socket?
Ulli Horlacher wrote:
> With
>
>   taro = tarfile.open(fileobj=sock.makefile('w',kB64),mode='w|')
>
> I get no more error.

Of course, this is the writing client. Now I have a small problem with
the reading client. This code works so far:

  sfo = sock.makefile('r')
  taro = tarfile.open(fileobj=sfo,mode='r|')
  taro.extractall(path=edir)

But it does not write anything to the terminal to inform the user. When
I use:

  for member in taro.getmembers():
    print('extracting "%s"' % member.name)
    taro.extract(member)

I get the error:

  File "/usr/lib/python2.7/tarfile.py", line 556, in seek
    raise StreamError("seeking backwards is not allowed")

Of course, a stream is not seekable. Any ideas?
--
https://mail.python.org/mailman/listinfo/python-list
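One way around the backwards seek (a sketch, not from the thread): getmembers() has to read to the end of the archive before extract() can start, but in 'r|' mode the TarFile object itself is iterable and yields each member as the stream arrives, so each member can be announced and extracted immediately. An in-memory archive stands in for the socket stream here:

```python
import io
import os
import tarfile
import tempfile

# Build a small archive in memory to stand in for the socket stream.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tw:
    payload = b"data"
    info = tarfile.TarInfo(name="a.txt")
    info.size = len(payload)
    tw.addfile(info, io.BytesIO(payload))
buf.seek(0)

names = []
with tempfile.TemporaryDirectory() as edir:
    with tarfile.open(fileobj=buf, mode="r|") as taro:
        # Iterating the TarFile yields members in archive order as the
        # stream is consumed - no backwards seek is ever required.
        for member in taro:
            print('extracting "%s"' % member.name)
            taro.extract(member, path=edir)
            names.append(member.name)
```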
Re: Handling transactions in Python DBI module
On Feb 10, 2016, at 8:14 PM, Chris Angelico wrote:
> On Thu, Feb 11, 2016 at 4:06 PM, Frank Millman wrote:
>> A connection has 2 possible states - 'in transaction', or 'not in
>> transaction'. When you create the connection it starts off as 'not'.
>>
>> When you call cur.execute(), it checks to see what state it is in. If
>> the state is 'not', it silently issues a 'BEGIN TRANSACTION' before
>> executing your statement. This applies for SELECT as well as other
>> statements.
>>
>> All subsequent statements form part of the transaction, until you
>> issue either conn.commit() or conn.rollback(). This performs the
>> required action, and resets the state to 'not'.
>>
>> I learned the hard way that it is important to use conn.commit() and
>> not cur.execute('commit'). Both succeed in committing, but the second
>> does not reset the state, therefore the next statement does not
>> trigger a 'BEGIN', with possible unfortunate side-effects.
>
> When I advise my students on basic databasing concepts, I recommend
> this structure:
>
> conn = psycopg2.connect(...)
>
> with conn, conn.cursor() as cur:
>     cur.execute(...)

And that is the structure I tend to use in my programs as well. I could,
of course, roll the transaction control into that structure. However,
that is a usage choice of the end user, whereas I am looking at the
design of the connection/cursor itself. If I use psycopg, I get the
transaction - even if I don't use a with block.

> The transaction block should always start at the 'with' block and end
> when it exits. As long as you never nest them (including calling other
> database-using functions from inside that block), it's easy to reason
> about the database units of work - they always correspond perfectly to
> the code blocks.
>
> Personally, I'd much rather the structure were "with
> conn.transaction() as cur:", because I've never been able to
> adequately explain what a cursor is/does. It's also a bit weird that
> "with conn:" doesn't close the connection at the end (just closes the
> transaction within that connection). But I guess we don't need a
> "Python DB API 3.0".

In my mind, cursors are simply query objects containing (potentially)
result sets - so you could have two cursors, and loop through them
something like "for result_1, result_2 in zip(cursor_1, cursor_2): ...".
Personally, I've never had a need for more than one cursor, but if you
are working with large data sets, and need to work with multiple queries
simultaneously without the overhead of loading the results into memory,
I could see them being useful. Of course, someone else might have a
completely different explanation :-)

> ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
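The lockstep-iteration view of cursors described above can be illustrated with sqlite3, which allows several cursors per connection (a small sketch, not from the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x); CREATE TABLE b (y);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (10), (20);
""")

# Two independent cursors on the same connection, each holding its own
# result set; zip() walks them in lockstep without ever materialising
# either result set fully in memory.
cur_a = conn.execute("SELECT x FROM a ORDER BY x")
cur_b = conn.execute("SELECT y FROM b ORDER BY y")
pairs = [(r1[0], r2[0]) for r1, r2 in zip(cur_a, cur_b)]
print(pairs)  # rows paired off in order
```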
Re: Handling transactions in Python DBI module
On Feb 10, 2016, at 8:06 PM, Frank Millman wrote: > > "Israel Brewster" wrote in message > news:92d3c964-0323-46ee-b770-b89e7e7e6...@ravnalaska.net... > >> I am working on implementing a Python DB API module, and am hoping I can get >> some help with figuring out the workflow of handling transactions. In my >> experience (primarily with >> psycopg2) the workflow goes like this: >> >> - When you open a connection (or is it when you get a cursor? I *think* it >> is on opening a connection), a new transaction is started >> - When you close a connection, an implicit ROLLBACK is performed >> - After issuing SQL statements that modify the database, you call commit() >> on the CONNECTION object, not the cursor. >> >> My primary confusion is that at least for the DB I am working on, to >> start/rollback/commit a transaction, you execute the appropriate SQL >> statement (the c library I'm using doesn't >> have any transactional commands, not that it should). However, to execute >> the statement, you need a cursor. So how is this *typically* handled? Does >> the connection object keep an > internal cursor that it uses to manage >> transactions? >> >> I'm assuming, since it is called on the connection, not the cursor, that any >> COMMIT/ROLLBACK commands called affect all cursors on that connection. Is >> that correct? Or is this DB >> specific? >> >> Finally, how do other DB API modules, like psycopg2, ensure that ROLLBACK is >> called if the user never explicitly calls close()? > > Rather than try to answer your questions point-by-point, I will describe the > results of some investigations I carried out into this subject a while ago. > > I currently support 3 databases, so I use 3 DB API modules - > PostgreSQL/psycopg2, Sql Server/pyodbc, and sqlite3/sqlite3. The following > applies specifically to psycopg2, but I applied the lessons learned to the > other 2 as well, and have had no issues. 
> > A connection has 2 possible states - 'in transaction', or 'not in > transaction'. When you create the connection it starts off as 'not'. > > When you call cur.execute(), it checks to see what state it is in. If the > state is 'not', it silently issues a 'BEGIN TRANSACTION' before executing > your statement. This applies for SELECT as well as other statements. > > All subsequent statements form part of the transaction, until you issue > either conn.commit() or conn.rollback(). This performs the required action, > and resets the state to 'not'. > > I learned the hard way that it is important to use conn.commit() and not > cur.execute('commit'). Both succeed in committing, but the second does not > reset the state, therefore the next statement does not trigger a 'BEGIN', > with possible unfortunate side-effects. Thanks - that is actually quite helpful. So the way I am looking at it now is that the connection would have an internal cursor as I suggested. From your response, I'll add a "state" flag as well. If the state flag is not set when execute is called on a cursor, the cursor itself will start a transaction and set the flag (this could happen from any cursor, though, so that could potentially cause a race condition, correct?). In any case, there is now a transaction open, until such a time as commit() or rollback() is called on the connection, or close is called, which executes a rollback(), using the connection's internal cursor. Hopefully that all sounds kosher. > > HTH > > Frank Millman > > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
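The design Israel sketches — a shared state flag on the connection plus a lazily issued BEGIN — can be modelled in a few lines. This is a toy sketch only: `RecordingBackend` and `raw_execute` are invented stand-ins for a real C client library, not any actual driver API:

```python
class RecordingBackend:
    """Stand-in for a C client library: it just records the SQL it is sent."""
    def __init__(self):
        self.log = []

    def raw_execute(self, sql):
        self.log.append(sql)


class Connection:
    """Toy DB-API-style connection that issues BEGIN lazily."""
    def __init__(self, backend):
        self.backend = backend
        self.in_transaction = False      # the shared 'state' flag

    def _ensure_transaction(self):
        # Called by every cursor before executing anything.
        if not self.in_transaction:
            self.backend.raw_execute("BEGIN TRANSACTION")
            self.in_transaction = True

    def cursor(self):
        return Cursor(self)

    def commit(self):
        if self.in_transaction:
            self.backend.raw_execute("COMMIT")
            self.in_transaction = False

    def rollback(self):
        if self.in_transaction:
            self.backend.raw_execute("ROLLBACK")
            self.in_transaction = False

    def close(self):
        self.rollback()                  # implicit rollback, as DB-API modules do


class Cursor:
    def __init__(self, conn):
        self.conn = conn

    def execute(self, sql):
        self.conn._ensure_transaction()  # any cursor can trigger the BEGIN
        self.conn.backend.raw_execute(sql)


# Demo: two executes share one transaction; close() rolls back leftovers.
backend = RecordingBackend()
conn = Connection(backend)
cur = conn.cursor()
cur.execute("SELECT 1")
cur.execute("UPDATE t SET x = 1")
conn.commit()
log_after_commit = list(backend.log)
cur.execute("SELECT 2")                  # silently opens a new transaction
conn.close()                             # rolls it back
log_after_close = list(backend.log)
```

In a single-threaded program the shared flag is fine; if several threads shared one connection, the check-then-BEGIN in `_ensure_transaction` would need a lock — which is exactly the race condition Israel anticipates.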
Re: tarfile : read from a socket?
On 2016-02-11 16:41, Ulli Horlacher wrote:
> Ulli Horlacher wrote:
>> With
>>
>>   taro = tarfile.open(fileobj=sock.makefile('w',kB64),mode='w|')
>>
>> I get no more error.
>
> Of course, this is the writing client. Now I have a small problem with
> the reading client. This code works so far:
>
>   sfo = sock.makefile('r')
>   taro = tarfile.open(fileobj=sfo,mode='r|')
>   taro.extractall(path=edir)
>
> But it does not write anything to the terminal to inform the user.
> When I use:
>
>   for member in taro.getmembers():
>     print('extracting "%s"' % member.name)
>     taro.extract(member)
>
> I get the error:
>
>   File "/usr/lib/python2.7/tarfile.py", line 556, in seek
>     raise StreamError("seeking backwards is not allowed")
>
> Of course, a stream is not seekable. Any ideas?

Try this:

    member = taro.next()
    while member is not None:
        print('extracting "%s"' % member.name)
        taro.extract(member)
        member = taro.next()

--
https://mail.python.org/mailman/listinfo/python-list
modifying a standard module? (was: Re: tarfile : read from a socket?)
Ulli Horlacher wrote:
> This code works so far:
>
>   sfo = sock.makefile('r')
>   taro = tarfile.open(fileobj=sfo,mode='r|')
>   taro.extractall(path=edir)
>
> But it does not write anything to the terminal to inform the user.
>
> When I use:
>
>   for member in taro.getmembers():
>     print('extracting "%s"' % member.name)
>     taro.extract(member)
>
> I get the error:
>
>   File "/usr/lib/python2.7/tarfile.py", line 556, in seek
>     raise StreamError("seeking backwards is not allowed")
>
> Of course, a stream is not seekable.
>
> Any ideas?

As a hack, I modified the standard library module tarfile.py:

root@diaspora:/usr/lib/python2.7# vv -d
--- ./.versions/tarfile.py~1~   2015-06-22 21:59:27.0 +0200
+++ tarfile.py  2016-02-11 18:01:50.18952 +0100
@@ -2045,6 +2045,7 @@
                 directories.append(tarinfo)
                 tarinfo = copy.copy(tarinfo)
                 tarinfo.mode = 0700
+            print('untar "%s"' % tarinfo.name)
             self.extract(tarinfo, path)

         # Reverse sort directories.

This gives me exactly the output I want :-)

BUT I want to distribute my program, and all others will not see the tar
extracting information.

Now my question: how can I substitute the standard module function
tarfile.extractall() with my own function?

--
Ullrich Horlacher           Server und Virtualisierung
Rechenzentrum IZUS/TIK      E-Mail: horlac...@tik.uni-stuttgart.de
Universitaet Stuttgart      Tel:    ++49-711-68565868
Allmandring 30a             Fax:    ++49-711-682357
70550 Stuttgart (Germany)   WWW:    http://www.tik.uni-stuttgart.de/
--
https://mail.python.org/mailman/listinfo/python-list
Re: tarfile : read from a socket?
Ulli Horlacher wrote:
> Ulli Horlacher wrote:
>
>> With
>>
>>   taro = tarfile.open(fileobj=sock.makefile('w',kB64),mode='w|')
>>
>> I get no more error.
>
> Of course, this is the writing client.
>
> Now I have a small problem with the reading client.
>
> This code works so far:
>
>   sfo = sock.makefile('r')
>   taro = tarfile.open(fileobj=sfo,mode='r|')
>   taro.extractall(path=edir)
>
> But it does not write anything to the terminal to inform the user.
>
> When I use:
>
>   for member in taro.getmembers():
>     print('extracting "%s"' % member.name)
>     taro.extract(member)
>
> I get the error:
>
>   File "/usr/lib/python2.7/tarfile.py", line 556, in seek
>     raise StreamError("seeking backwards is not allowed")
>
> Of course, a stream is not seekable.
>
> Any ideas?

A look into the source is often helpful ;)

$ cat extract_from_stream.py
import sys
from tarfile import TarFile

class MyTarFile(TarFile):
    def extract(self, member, path="."):
        print "extracting", member
        return TarFile.extract(self, member, path)

tf = MyTarFile.open(fileobj=sys.stdin, mode="r|")
tf.extractall()

$ touch foo bar
$ tar -cf archive.tar foo bar
$ python extract_from_stream.py < archive.tar
extracting
extracting

--
https://mail.python.org/mailman/listinfo/python-list
Re: tarfile : read from a socket?
On Thu, Feb 11, 2016 at 04:41:43PM +, Ulli Horlacher wrote:
> sfo = sock.makefile('r')
> taro = tarfile.open(fileobj=sfo,mode='r|')
> taro.extractall(path=edir)

What about using an iterator?

    def myiter(tar):
        for t in tar:
            print "extracting", t.name
            yield t

    sfo = sock.makefile('r')
    taro = tarfile.open(fileobj=sfo,mode='r|')
    taro.extractall(members=myiter(taro),path=edir)

Cheers,
--
Lars Gustäbel
l...@gustaebel.de
--
https://mail.python.org/mailman/listinfo/python-list
Re: Does anyone here use wxGlade on Linux?
Frank Miles wrote: > On Thu, 11 Feb 2016 14:29:04 +, cl wrote: > > > I am trying out wxGlade on Linux, version 0.7.1 of wxGlade on xubuntu > > 15.10. > > > > I have already written something using wxPython directly so I have the > > basics (of my Python skills and the environment) OK I think. > > > > I am having a lot of trouble getting beyond the first hurdle of > > creating a trivial Python GUI with wxGlade. Some of the problem is no > > doubt that I'm unfamiliar with the interface but I seem to repeatedly > > get to a situation where the interface won't respond to mouse clicks > > (though the main menu items still work, I can Exit OK for instance). > > > > Is wxPython still buggy or is it really just down to my lack of > > familiarity with it? > > Sure, there are bugs in wxPython, but they are "minor". I haven't tried > using wxGlade, but if it's anything like the glade I tried using long ago > there are issues in getting the two working together. > Oops, I meant to say "Is wxGlade still buggy". -- Chris Green · -- https://mail.python.org/mailman/listinfo/python-list
Suggested datatype for getting latest information from log files
I have timestamped log files I need to read through, keeping track of the most up-to-date information. For example, let's say we had a log file with records of the form:

timeStamp,name,marblesHeld,timeNow,timeSinceLastEaten

I need to keep track of every 'name' in this table. I don't want duplicate values, so if different values come in with a later timestamp, the entry needs to get updated. For example, if a later timestamp showed 'dave' with fewer marbles, that should get updated.

I thought a dictionary would be a good idea because the key restriction ensures no duplicates, so the data would always update. However, because dictionaries are unordered, and I need to do some more processing on the data afterwards, I'm having trouble. For example, let's assume that once I have the most up-to-date values for dave, steve and jenny, I want to do timeNow - timeSinceLastEaten to get an interval, then write all the info together to some other database. Crucially, order is important here.

I don't know whether a particular name will appear in the records or not, so it needs to be created on the first instance and updated from then on.

Could anyone suggest some good approaches or data structures for this? I thought about creating an object for each 'name', then checking if that object exists and updating values within it. However, that seemed like (a) overkill and (b) beyond my Python skills for the timeframe I have.

--
https://mail.python.org/mailman/listinfo/python-list
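The "create on first instance, update from then on" requirement fits a plain dict keyed on name, overwritten only when a newer timestamp arrives. A sketch — the field layout and sample rows are invented to match the description above:

```python
# name -> (timeStamp, marblesHeld, timeNow, timeSinceLastEaten)
latest = {}

def update(row):
    ts, name, marbles, time_now, since_eaten = row
    current = latest.get(name)
    if current is None or ts >= current[0]:
        # First sighting of this name, or a newer record: overwrite.
        latest[name] = (ts, marbles, time_now, since_eaten)

log = [
    (1, 'dave', 10, 100, 5),
    (2, 'jenny', 7, 101, 2),
    (3, 'dave', 8, 102, 6),   # later timestamp: dave now holds 8 marbles
]
for row in log:
    update(row)

# Post-processing once the latest records are known:
intervals = {name: rec[2] - rec[3] for name, rec in latest.items()}
```

If the output order matters afterwards, `sorted(latest)` — or filling an OrderedDict at the end — gives a deterministic order without complicating the update loop.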
Re: Suggested datatype for getting latest information from log files
On 02/11/2016 07:07 PM, ltomassm...@gmail.com wrote: I thought a dictionary would be a good idea because of the key restrictions ensuring no duplicates, so the data would always update - However because they are unordered and I need to do some more processing on the data afterwards I'm having trouble. If it's your only concern about using dictionaries, then you may have a look at https://docs.python.org/2/library/collections.html#collections.OrderedDict JM -- https://mail.python.org/mailman/listinfo/python-list
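A minimal illustration of what OrderedDict adds here — keys keep their insertion order, while duplicate keys still collapse into a single, updated entry:

```python
from collections import OrderedDict

d = OrderedDict()
d['dave'] = 10
d['jenny'] = 7
d['dave'] = 8          # re-assignment updates in place, keeping dave's position

order = list(d)        # iteration order matches first insertion
```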
Re: Cygwin and Python3
Terry Reedy writes:
>>> Since Python runs natively in Windows, why are you trying to run it
>>> with Cygwin? I'm not implying that you shouldn't, just offhand I don't
>>> see a reason for it.
>>
>> I do it because it's easier to install third party packages, those that
>> need an external library to run. Cygwin comes with a lot of lib* and
>> lib*-devel packages that let you just run `pip install xxx' if it is not
>> already packaged. I gave the native Windows version and Anaconda a try,
>> but there is at least one package that I could not run (and I lost
>> a lot of time compiling a bunch of libraries).
>>
>> Examples of packages: pyproj (proj4), openpyxl with lxml (libxml2,
>> libxslt) and pillow (libjpeg, zlib, libtiff, ...), psycopg2 (libpq).
>
> I believe these are all available at
> http://www.lfd.uci.edu/~gohlke/pythonlibs/

How do you know when an upgrade is available?

--
Benoit Izac
--
https://mail.python.org/mailman/listinfo/python-list
Re: Suggested datatype for getting latest information from log files
On Thursday, February 11, 2016 at 6:16:35 PM UTC, jmp wrote:
> On 02/11/2016 07:07 PM, ltomassm...@gmail.com wrote:
>> I thought a dictionary would be a good idea because of the key restrictions
>> ensuring no duplicates, so the data would always update - However because
>> they are unordered and I need to do some more processing on the data
>> afterwards I'm having trouble.
>
> If it's your only concern about using dictionaries, then you may have a
> look at
> https://docs.python.org/2/library/collections.html#collections.OrderedDict
>
> JM

I did look into that but I'm trying to do something like this, which doesn't work - I guess I'm struggling a little with the implementation.

    fillinfo = {}
    fillInfo['name'] = OrderedDict('info1','info2','info3','info4','info5',)

--
https://mail.python.org/mailman/listinfo/python-list
Re: Suggested datatype for getting latest information from log files
On Thursday, February 11, 2016 at 6:16:35 PM UTC, jmp wrote:
> On 02/11/2016 07:07 PM, ltomassm...@gmail.com wrote:
>> I thought a dictionary would be a good idea because of the key restrictions
>> ensuring no duplicates, so the data would always update - However because
>> they are unordered and I need to do some more processing on the data
>> afterwards I'm having trouble.
>
> If it's your only concern about using dictionaries, then you may have a
> look at
> https://docs.python.org/2/library/collections.html#collections.OrderedDict
>
> JM

I did look into this but am struggling a little with the implementation; currently I'm trying something like this, which doesn't work:

    fillInfo = {}
    p = re.compile('PATTERN')

    with (open(path,'r')) as f:
        for row in f:
            m = p.search(row)
            if m == None:
                continue
            else:
                fillInfo[m.group(5)] = OrderedDict(m.group(1),m.group(2),m.group(3),m.group(4),m.group(6))

--
https://mail.python.org/mailman/listinfo/python-list
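The call above fails because OrderedDict accepts at most one positional argument (a mapping, or an iterable of key/value pairs), not a series of bare values. One working alternative is an OrderedDict keyed on the name, with each record stored as a plain tuple of the remaining groups. The pattern and field layout below are invented stand-ins for the real log format:

```python
import re
from collections import OrderedDict

fillInfo = OrderedDict()   # name -> record, in order of first appearance
p = re.compile(r'(\d+),(\w+),(\d+),(\d+),(\d+)')   # hypothetical log pattern

lines = ['1,dave,10,100,5', '2,jenny,7,101,2', '3,dave,8,102,6']
for row in lines:
    m = p.search(row)
    if m is None:
        continue
    # group(2) plays the role of the name key; the other groups are the fields
    fillInfo[m.group(2)] = (m.group(1), m.group(3), m.group(4), m.group(5))
```

Re-assigning an existing key updates the record but keeps the name's original position, so the final iteration order is stable.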
Re: Suggested datatype for getting latest information from log files
Greetings, >I have timestamped log files I need to read through and keep track >of the most upto date information. > >For example lets say we had a log file > >timeStamp,name,marblesHeld,timeNow,timeSinceLastEaten I do not quite understand the distinction between timeStamp and timeNow. >I need to keep track of every 'name' in this table, I don't want >duplicate values so if values come in from a later timestamp that >is different then that needs to get updated. For example if a later >timestamp showed 'dave' with less marbles that should get updated. > >I thought a dictionary would be a good idea because of the key >restrictions ensuring no duplicates, so the data would always >update - Yes. A dictionary seems reasonable. >However because they are unordered and I need to do some more >processing on the data afterwards I'm having trouble. Ordered how? For each name, you need to keep the stream of data ordered? This is what I'm assuming based on your problem description. If the order of names (dave, steve and jenny) is important, then you should look to OrderedDict as JM has suggested. I am inferring from your description that the order of events (along a timeline) is what is important, not the sequence of players to each other(, since that is already in the logfile). >For example lets assume that once I have the most upto date values >from dave,steve,jenny I wanted to do timeNow - timeSinceLastEaten >to get an interval then write all the info together to some other >database. Crucially order is important here. Again, it's not utterly clear what "order" means. If order of events for a single player is important, then see below. >I don't know of a particular name will appear in the records or >not, so it needs to created on the first instance and updated from >then on. Again, a dictionary is great for this. It seems that you could benefit, also from a list (to store an event and the time at which the event occurred). 
But, you don't want to store all of history, so you want to use a bounded-length list. You may find a collections.deque useful here.

>Could anyone suggest some good approaches or suggested data
>structures for this?

First, JM already pointed you to OrderedDict, which may help depending on exactly what you are trying to order. There are two other data structures in the collections module that may be helpful for you.

I perceive the following (from your description). You have a set of names (players). You wish to store, for each name, a value (marblesHeld). You wish to store, for each name, a value (timeSinceLastEaten).

I recommend learning how to use both:

collections.defaultdict [0]: so you can dynamically create entries for new players in the marble game without checking if they already exist in the dictionary (very convenient!)

collections.deque [1]: in this case, I'm suggesting using it as a bounded-length list; you keep adding stuff to it and after it stores X entries, the old ones will "fall off"

Note, I fabricated players and data, but the bit that you are probably interested in is the interaction between the dictionary, whose keys are the names of the players, and whose values contain the deque capturing (the last 10 entries of) the user's marble count and the time at which this occurred.

    mydeque = functools.partial(collections.deque, maxlen=10)
    record = collections.defaultdict(mydeque)

Storing both the marble count and the time will allow you to calculate at any later point the duration since the user's marble count last changed. I don't understand how the eating fits into your problem, but maybe my code (below) will afford you an example of how to approach the problem with a few of Python's wonderfully convenient standard library data structures.

Good luck,
-Martin

P.S. I just read your reply to JM, and it looks like you are also trying to figure out how to read the input data. Is it CSV? Could you simply use the csv module [2]?
[0] https://docs.python.org/3/library/collections.html#collections.defaultdict
[1] https://docs.python.org/3/library/collections.html#collections.deque
[2] https://docs.python.org/3/library/csv.html

#! /usr/bin/python3

import time
import random
import functools
import collections
import pprint

players = ['Steve', 'Jenny', 'Dave', 'Samuel', 'Jerzy', 'Ellen']

mydeque = functools.partial(collections.deque, maxlen=10)


def marblegame(rounds):
    record = collections.defaultdict(mydeque)
    for _ in range(rounds):
        now = time.time()
        who = random.choice(players)
        marbles = random.randint(0, 100)
        record[who].append((marbles, now))
    for whom, marblehistory in record.items():
        print(whom, end=": ")
        pprint.pprint(marblehistory)


if __name__ == '__main__':
    import sys
    if len(sys.argv) > 1:
        count = int(sys.argv[1])
    else:
        count = 30
    marblegame(count)

# -- end of file

--
Martin A
Re: Cygwin and Python3
On Thursday, February 11, 2016 at 10:18:53 AM UTC-8, Benoit Izac wrote:
> Terry Reedy writes:
>
> >>> Since Python runs natively in Windows, why are you trying to run it
> >>> with Cygwin? I'm not implying that you shouldn't, just offhand I don't
> >>> see a reason for it.
> >>
> >> I do it because it's easier to install third party packages, those that
> >> need an external library to run. Cygwin come with a lot of lib* and
> >> lib*-devel that permit to just run `pip install xxx' if not already
> >> packaged. I gave a try on the native Windows version and Anaconda but
> >> there is at least one package that I could not run (and I loosed
> >> a lot of time to compile a bunch of libraries).
> >>
> >> Example of package: pyproj (proj4), openpyxl with lxml (libxml2,
> >> libxslt) and pillow (libjpeg, zlib, libtiff, ...), psycopg2 (libpq).
> >
> > I belive these are all available at
> > http://www.lfd.uci.edu/~gohlke/pythonlibs/
>
> How do you know when an upgrade is available?
>
> --
> Benoit Izac

This thread also seems to illustrate Windows' lack of support for Linux tooling. If only Windows had Linux command-line support instead of just PowerShell. The Mac is not my preferred OS, and it's just a matter of time until Windows is obsolete.

--
https://mail.python.org/mailman/listinfo/python-list
Re: tarfile : read from a socket?
On Thu, Feb 11, 2016, at 11:41, Ulli Horlacher wrote:
> When I use:
>
>   for member in taro.getmembers():
>     print('extracting "%s"' % member.name)
>     taro.extract(member)
>
> I get the error:
>
>   File "/usr/lib/python2.7/tarfile.py", line 556, in seek
>     raise StreamError("seeking backwards is not allowed")
>
> Of course, a stream is not seekable.
>
> Any ideas?

Try this:

    while True:
        member = taro.next()
        if member is None:
            break
        print('extracting "%s"' % member.name)
        taro.extract(member)

--
https://mail.python.org/mailman/listinfo/python-list
tarfile : secure extract?
In https://docs.python.org/2/library/tarfile.html there is a warning:

  Never extract archives from untrusted sources without prior inspection.
  It is possible that files are created outside of path, e.g. members
  that have absolute filenames starting with "/" or filenames with two
  dots "..".

My program has to extract tar archives from untrusted sources :-}
So far, I ignore files with dangerous pathnames:

    for member in taro.getmembers():
        file = member.name
        if match(r'^(?i)([a-z]:)?(\.\.)?[/\\]',file):
            print('ignoring "%s"' % file)
        else:
            print('extracting "%s"' % file)
            taro.extract(member)

A better approach would be to rename such files while extracting.
Is this possible?

--
Ullrich Horlacher           Server und Virtualisierung
Rechenzentrum IZUS/TIK      E-Mail: horlac...@tik.uni-stuttgart.de
Universitaet Stuttgart      Tel:    ++49-711-68565868
Allmandring 30a             Fax:    ++49-711-682357
70550 Stuttgart (Germany)   WWW:    http://www.tik.uni-stuttgart.de/
--
https://mail.python.org/mailman/listinfo/python-list
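Renaming while extracting is possible: tarfile derives the destination from `member.name`, so the name can be rewritten on the TarInfo before calling `extract()`. A sketch of one possible sanitizing policy (dropping, rather than resolving, the dangerous components — a deliberately conservative choice, not the only safe one):

```python
import re

def sanitize(name):
    """Rewrite an untrusted member name so it can only land below the
    extraction directory: strip drive letters, leading slashes, and any
    '.' / '..' path components."""
    name = re.sub(r'(?i)^([a-z]:)?[/\\]+', '', name)    # drop drive/absolute prefix
    parts = [p for p in re.split(r'[/\\]+', name)
             if p not in ('', '.', '..')]               # drop traversal components
    return '/'.join(parts)
```

Used in the loop above as `member.name = sanitize(member.name)` just before `taro.extract(member)`, every member then extracts under its rewritten, relative name instead of being skipped.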
Storing a big amount of path names
Hi!

What is the best (smallest memory usage) way to store lots of pathnames in memory, where:

1. Path names are pathname=(dirname,filename)
2. There are many different dirnames, but far fewer than pathnames
3. dirnames have in general many chars

The idea is to share the common dirnames.

More realistically, not only the pathnames are stored but objects, each object being a MyFile containing self.name, getPathname(self) and other stuff:

    class MyFile:
        __allfiles=[]

        def __init__(self,dirname,filename):
            self.dirname=dirname  # But I want to share this with other files
            self.name=filename
            MyFile.__allfiles.append(self)
            ...

        def getPathname(self):
            return os.path.join(self.dirname,self.name)
        ...

Thanks for any suggestion.
Paulo
--
https://mail.python.org/mailman/listinfo/python-list
Re: Storing a big amount of path names
On Fri, Feb 12, 2016 at 11:31 AM, Paulo da Silva wrote: > What is the best (shortest memory usage) way to store lots of pathnames > in memory where: > > 1. Path names are pathname=(dirname,filename) > 2. There many different dirnames but much less than pathnames > 3. dirnames have in general many chars > > The idea is to share the common dirnames. > > More realistically not only the pathnames are stored but objects each > object being a MyFile containing > self.name - > getPathname(self) - > other stuff Just store them in the most obvious way, and don't worry about memory usage. How many path names are you likely to have? A million? You can still afford to have 1KB pathnames and it'll take up no more than a gigabyte of RAM - and most computers throw around gigs of virtual memory like it's nothing. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Storing a big amount of path names
Paulo da Silva writes: > What is the best (shortest memory usage) way to store lots of > pathnames in memory I challenge the premise. Why is “shortest memory usage” your criterion for “best”, here? How have you determined that factors like “easily understandable when reading”, or “using standard Python idioms”, are less important? As for “lots of pathnames”, how many are you expecting? Python's built-in container types are highly optimised for quite large amounts of data. Have you measured an implementation with normal built-in container types with your expected quantity of items, and confirmed that the performance is unacceptable? > Thanks for any suggestion. I would suggest that the assumption you have too much data for Python's built-in container types, is an assumption that should be rigorously tested because it is likely not true. -- \ “We suffer primarily not from our vices or our weaknesses, but | `\from our illusions.” —Daniel J. Boorstin, historian, 1914–2004 | _o__) | Ben Finney -- https://mail.python.org/mailman/listinfo/python-list
Re: Storing a big amount of path names
On 2016-02-12 00:31, Paulo da Silva wrote: > What is the best (shortest memory usage) way to store lots of > pathnames in memory where: > > 1. Path names are pathname=(dirname,filename) > 2. There many different dirnames but much less than pathnames > 3. dirnames have in general many chars > > The idea is to share the common dirnames. Well, you can create a dict that has dirname->list(filenames) which will reduce the dirname to a single instance. You could store that dict in the class, shared by all of the instances, though that starts to pick up a code-smell. But unless you're talking about an obscenely large number of dirnames & filenames, or a severely resource-limited machine, just use the default built-ins. If you start to push the boundaries of system resources, then I'd try the "anydbm" module or use the "shelve" module to marshal them out to disk. Finally, you *could* create an actual sqlite database on disk if size really does exceed reasonable system specs. -tkc -- https://mail.python.org/mailman/listinfo/python-list
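Tim's dirname -> list(filenames) layout can be sketched with a defaultdict; the paths here are made up for illustration:

```python
import os.path
from collections import defaultdict

tree = defaultdict(list)   # dirname -> filenames under it

def add(pathname):
    dirname, filename = os.path.split(pathname)
    tree[dirname].append(filename)

for p in ['/usr/lib/python2.7/tarfile.py',
          '/usr/lib/python2.7/os.py',
          '/home/user/notes.txt']:
    add(p)

# Each distinct dirname string is stored once as a key,
# however many files share it.
```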
Re: Storing a big amount of path names
Tim Chase wrote:
> On 2016-02-12 00:31, Paulo da Silva wrote:
>> What is the best (shortest memory usage) way to store lots of
>> pathnames in memory where:
>>
>> 1. Path names are pathname=(dirname,filename)
>> 2. There many different dirnames but much less than pathnames
>> 3. dirnames have in general many chars
>>
>> The idea is to share the common dirnames.
>
> Well, you can create a dict that has dirname->list(filenames) which
> will reduce the dirname to a single instance. You could store that
> dict in the class, shared by all of the instances, though that starts
> to pick up a code-smell.
>
> But unless you're talking about an obscenely large number of
> dirnames & filenames, or a severely resource-limited machine, just
> use the default built-ins. If you start to push the boundaries of
> system resources, then I'd try the "anydbm" module or use the
> "shelve" module to marshal them out to disk. Finally, you *could*
> create an actual sqlite database on disk if size really does exceed
> reasonable system specs.
>
> -tkc

Probably more memory efficient to make a list of lists, and just declare that element[0] of each list is the dirname. That way you're not wasting memory on the unused entries of the hashtable. But unless the OP has both a) a million-plus entries and b) let's say at least 20 filenames to each dirname, it's not worth doing.

Now, if you do really have a million entries, one thing that would help with memory is setting __slots__ for MyFile rather than letting it create an instance dictionary for each one.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
--
https://mail.python.org/mailman/listinfo/python-list
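Rob's __slots__ suggestion, applied to the MyFile class from the original post (a sketch keeping only the two attributes): declaring slots suppresses the per-instance `__dict__`, which is typically the bulk of each small object's footprint.

```python
import os.path

class MyFile(object):
    __slots__ = ('dirname', 'name')   # no per-instance __dict__ is created

    def __init__(self, dirname, filename):
        self.dirname = dirname
        self.name = filename

    def getPathname(self):
        return os.path.join(self.dirname, self.name)

f = MyFile('/usr/lib', 'tarfile.py')
```

The trade-off: attributes not listed in `__slots__` can no longer be added to instances, so the class must declare everything it stores up front.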
Re: Storing a big amount of path names
On 2016-02-12 00:31, Paulo da Silva wrote:
> Hi!
>
> What is the best (shortest memory usage) way to store lots of
> pathnames in memory where:
>
> 1. Path names are pathname=(dirname,filename)
> 2. There many different dirnames but much less than pathnames
> 3. dirnames have in general many chars
>
> The idea is to share the common dirnames.
>
> More realistically not only the pathnames are stored but objects, each
> object being a MyFile containing self.name - getPathname(self) - other
> stuff
>
>   class MyFile:
>       __allfiles=[]
>
>       def __init__(self,dirname,filename):
>           self.dirname=dirname  # But I want to share this with other files
>           self.name=filename
>           MyFile.__allfiles.append(self)
>           ...
>
>       def getPathname(self):
>           return os.path.join(self.dirname,self.name)
>       ...

Apart from all of the other answers that have been given:

>>> p1 = 'foo/bar'
>>> p2 = 'foo/bar'
>>> id(p1), id(p2)
(982008930176, 982008930120)
>>> d = {}
>>> id(d.setdefault(p1, p1))
982008930176
>>> id(d.setdefault(p2, p2))
982008930176

The dict maps equal strings (dirnames) to the same string, so you won't have multiple copies.

--
https://mail.python.org/mailman/listinfo/python-list
Re: Storing a big amount of path names
On Fri, Feb 12, 2016 at 2:13 PM, MRAB wrote:
> Apart from all of the other answers that have been given:
>
> >>> p1 = 'foo/bar'
> >>> p2 = 'foo/bar'
> >>> id(p1), id(p2)
> (982008930176, 982008930120)
> >>> d = {}
> >>> id(d.setdefault(p1, p1))
> 982008930176
> >>> id(d.setdefault(p2, p2))
> 982008930176
>
> The dict maps equal strings (dirnames) to the same string, so you won't have
> multiple copies.

Simpler to let the language do that for you:

>>> import sys
>>> p1 = sys.intern('foo/bar')
>>> p2 = sys.intern('foo/bar')
>>> id(p1), id(p2)
(139621017266528, 139621017266528)

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: Storing a big amount of path names
On 12-02-2016 at 03:49, Chris Angelico wrote:
> On Fri, Feb 12, 2016 at 2:13 PM, MRAB wrote:
>> Apart from all of the other answers that have been given:
>> ...
>
> Simpler to let the language do that for you:
>
> >>> import sys
> >>> p1 = sys.intern('foo/bar')
> >>> p2 = sys.intern('foo/bar')
> >>> id(p1), id(p2)
> (139621017266528, 139621017266528)

I didn't know about id or sys.intern :-)
I need to look at them ...

As I understand it, I can do in the MyFile class

    self.dirname=sys.intern(dirname)  # dirname passed as arg to the __init__

and the character string doesn't get repeated.
Is this correct?
--
https://mail.python.org/mailman/listinfo/python-list
Re: Storing a big amount of path names
On Fri, Feb 12, 2016 at 3:15 PM, Paulo da Silva wrote: > Às 03:49 de 12-02-2016, Chris Angelico escreveu: >> On Fri, Feb 12, 2016 at 2:13 PM, MRAB wrote: >>> Apart from all of the other answers that have been given: >>> > ... >> >> Simpler to let the language do that for you: >> > import sys > p1 = sys.intern('foo/bar') > p2 = sys.intern('foo/bar') > id(p1), id(p2) >> (139621017266528, 139621017266528) >> > > I didn't know about id or sys.intern :-) > I need to look at them ... > > As I can understand I can do in MyFile class > > self.dirname=sys.intern(dirname) # dirname passed as arg to the __init__ > > and the character string doesn't get repeated. > Is this correct? Correct. Two equal strings, passed to sys.intern(), will come back as identical strings, which means they use the same memory. You can have a million references to the same string and it takes up no additional memory. But I reiterate: Don't even bother with this unless you know your program is running short of memory. Start by coding things in the simple and obvious way, and then fix problems only when you see them. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
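The guarantee Chris describes is easy to check: two equal strings built by different means intern to the very same object (checked with `is`, not just `==`):

```python
import sys

a = sys.intern('foo/bar')
# Build the second string at runtime so the compiler cannot pre-merge literals.
b = sys.intern(''.join(['foo', '/bar']))
same = a is b   # interning maps equal strings to one shared object
```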
Re: Storing a big amount of path names
Às 04:23 de 12-02-2016, Chris Angelico escreveu:
> On Fri, Feb 12, 2016 at 3:15 PM, Paulo da Silva wrote:
>> Às 03:49 de 12-02-2016, Chris Angelico escreveu:
>>> On Fri, Feb 12, 2016 at 2:13 PM, MRAB wrote:
>>>> Apart from all of the other answers that have been given:
>> ...
>>>
>>> Simpler to let the language do that for you:
>>>
>> import sys
>> p1 = sys.intern('foo/bar')
>> p2 = sys.intern('foo/bar')
>> id(p1), id(p2)
>>> (139621017266528, 139621017266528)
>>
>> I didn't know about id or sys.intern :-)
>> I need to look at them ...
>>
>> As I can understand I can do in MyFile class
>>
>> self.dirname=sys.intern(dirname) # dirname passed as arg to the __init__
>>
>> and the character string doesn't get repeated.
>> Is this correct?
>
> Correct. Two equal strings, passed to sys.intern(), will come back as
> identical strings, which means they use the same memory. You can have
> a million references to the same string and it takes up no additional
> memory.

I have been playing with this and found that it is not always true!
For example:

In [1]: def f(s):
   ...:     print(id(sys.intern(s)))
   ...:

In [2]: import sys

In [3]: f("12345")
139805480756480

In [4]: f("12345")
139805480755640

In [5]: f("12345")
139805480756480

In [6]: f("12345")
139805480756480

In [7]: f("12345")
139805480750864

I think a dict, as MRAB suggested, is needed.
At the end of the store process I may delete the dict.

> But I reiterate: Don't even bother with this unless you know your
> program is running short of memory.

Yes, it is.
This is part of a previous post (sets of equal files) and I need lots of
memory for performance reasons. I only have 2G in this computer.

I already had implemented a solution. I used two dicts. One to map
dirnames to an int handler and the other to map the handler to dirnames.
At the end I deleted the 1st one because I only need to get the dirname
from the handler. But I thought there should be a better choice.
Thanks Paulo -- https://mail.python.org/mailman/listinfo/python-list
Re: Storing a big amount of path names
On Fri, Feb 12, 2016 at 3:45 PM, Paulo da Silva wrote:
>> Correct. Two equal strings, passed to sys.intern(), will come back as
>> identical strings, which means they use the same memory. You can have
>> a million references to the same string and it takes up no additional
>> memory.
>
> I have been playing with this and found that it is not always true!
> For example:
>
> In [1]: def f(s):
>    ...:     print(id(sys.intern(s)))
>    ...:
>
> In [2]: import sys
>
> In [3]: f("12345")
> 139805480756480
>
> In [4]: f("12345")
> 139805480755640
>
> In [5]: f("12345")
> 139805480756480
>
> In [6]: f("12345")
> 139805480756480
>
> In [7]: f("12345")
> 139805480750864
>
> I think a dict, as MRAB suggested, is needed.
> At the end of the store process I may delete the dict.

I'm not 100% sure of what's going on here, but my suspicion is that a
string that isn't being used is allowed to be flushed from the
dictionary. If you retain a reference to the string (not to its id, but
to the string itself), you shouldn't see that change. By doing the dict
yourself, you guarantee that ALL the strings will be retained, which can
never be _less_ memory than interning them all, and can easily be
_more_.

>> But I reiterate: Don't even bother with this unless you know your
>> program is running short of memory.
>
> Yes, it is.
> This is part of a previous post (sets of equal files) and I need lots
> of memory for performance reasons. I only have 2G in this computer.

How many files, roughly? Do you ever look at the contents of the files?
Most likely, you'll be dwarfing the files' names with their contents.
Unless you actually have over two million unique files, each one with
over a thousand characters in the name, you can't use all that 2GB with
file names. If virtual memory is active, all that'll happen is that you
dip into the swapper / page file a bit... and THAT is when you start
looking at reducing memory usage.
Don't bother optimizing until you need to, and even then, measure first
to see which part of the program actually needs to be optimized.

> I already had implemented a solution. I used two dicts. One to map
> dirnames to an int handler and the other to map the handler to
> dirnames. At the end I deleted the 1st one because I only need to get
> the dirname from the handler. But I thought there should be a better
> choice.

If all your dir names are interned, their identities (approximately the
values returned by id(), but not quite) will be those handlers for you,
without any overhead and without any complexity.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
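Measuring first, as advised above, is easy with the standard library's
tracemalloc module. A small sketch (the sample data is invented purely to
have something to measure; real code would wrap the actual load step):

```python
import tracemalloc

tracemalloc.start()

# Invented sample data: 50,000 path names spread over 100 directories.
paths = ['dir%03d/file%05d' % (i % 100, i) for i in range(50000)]

# Report how much memory the traced allocations are using.
current, peak = tracemalloc.get_traced_memory()
print('current: %.2f MiB, peak: %.2f MiB'
      % (current / 2**20, peak / 2**20))
tracemalloc.stop()
```

Only after numbers like these show the file names (rather than, say, the
file contents or the comparison algorithm) dominating memory is a scheme
like interning worth the complexity.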
Re: First Program with Python!
On Tuesday, February 9, 2016 at 5:55:30 AM UTC-8, Anita Goyal wrote: > > Start learning Python from basics to advance levels here... > https://goo.gl/hGzm6o Python experts please translate the webserver "Ghost" Perlmind into Python. http://ai.neocities.org/P6AI_FAQ.html -- https://mail.python.org/mailman/listinfo/python-list
Re: Storing a big amount of path names
Às 05:02 de 12-02-2016, Chris Angelico escreveu:
> On Fri, Feb 12, 2016 at 3:45 PM, Paulo da Silva wrote:
...
>> I think a dict, as MRAB suggested, is needed.
>> At the end of the store process I may delete the dict.
>
> I'm not 100% sure of what's going on here, but my suspicion is that a
> string that isn't being used is allowed to be flushed from the
> dictionary.

You are right. I have tried with a small class and it seems to work.
Thanks.

...
> How many files, roughly? Do you ever look at the contents of the
> files? Most likely, you'll be dwarfing the files' names with their
> contents. Unless you actually have over two million unique files, each
> one with over a thousand characters in the name, you can't use all
> that 2GB with file names.

It's not only the filenames. The more memory I have, the more expensive
but faster an algorithm I can implement.

Thank you very much for your nice suggestion, which also contributed to
my Python knowledge.

Thank you all who responded.
Paulo
--
https://mail.python.org/mailman/listinfo/python-list
Re: Storing a big amount of path names
On Fri, 12 Feb 2016 04:02 pm, Chris Angelico wrote:
> On Fri, Feb 12, 2016 at 3:45 PM, Paulo da Silva wrote:
>>> Correct. Two equal strings, passed to sys.intern(), will come back as
>>> identical strings, which means they use the same memory. You can have
>>> a million references to the same string and it takes up no additional
>>> memory.
>> I have been playing with this and found that it is not always true!

It is true, but only for the lifetime of the string. Once the string is
garbage collected, it is removed from the cache as well. If you then add
the string again, you may not get the same id.

py> mystr = "hello world"
py> str2 = sys.intern(mystr)
py> str3 = "hello world"
py> mystr is str2  # same string object, as str2 is interned
True
py> mystr is str3  # not the same string object
False

But if we delete all references to the string objects, the intern cache
is also flushed, and we may not get the same id:

py> del str2, str3
py> id(mystr)  # remember this ID number
3079482600
py> del mystr
py> id(sys.intern("hello world"))  # a new entry in the cache
3079227624

This is the behaviour you want: if a string is completely deleted, you
don't want it remaining in the intern cache taking up memory.

> I'm not 100% sure of what's going on here, but my suspicion is that a
> string that isn't being used is allowed to be flushed from the
> dictionary. If you retain a reference to the string (not to its id,
> but to the string itself), you shouldn't see that change. By doing the
> dict yourself, you guarantee that ALL the strings will be retained,
> which can never be _less_ memory than interning them all, and can
> easily be _more_.

Yep. Back in the early days, interned strings were immortal and lasted
forever. That wasted memory, and is no longer the case.

--
Steven
--
https://mail.python.org/mailman/listinfo/python-list
Re: Storing a big amount of path names
On Feb 12, 2016 6:05 AM, "Paulo da Silva" wrote:
>
> Hi!
>
> What is the best (shortest memory usage) way to store lots of pathnames
> in memory where:
>
> 1. Path names are pathname=(dirname,filename)
> 2. There are many different dirnames, but far fewer than pathnames
> 3. dirnames have in general many chars
>
> The idea is to share the common dirnames.
>
> More realistically, not only the pathnames are stored but objects, each
> object being a MyFile containing
> self.name -
> getPathname(self) -
> other stuff
>
> class MyFile:
>
>     __allfiles = []
>
>     def __init__(self, dirname, filename):
>         self.dirname = dirname  # But I want to share this with other files
>         self.name = filename
>         MyFile.__allfiles.append(self)
>         ...
>
>     def getPathname(self):
>         return os.path.join(self.dirname, self.name)

What you want is a Trie data structure, which won't use extra memory
when the base paths of your strings are common. Instead of constructing
a character Trie, make it a string Trie, i.e. each directory name is a
node and all the files and folders are its children; each node can be of
two types, a file or a folder. If you come to think about it, this is
the most intuitive way to represent the file structure in your program.
You can extract the directory name from a file object by traversing its
parents.

I hope this helps.

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines, (IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
--
https://mail.python.org/mailman/listinfo/python-list
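The string Trie described above can be sketched with one node per path
component (a minimal sketch only: the Node class and its method names are
illustrative, and it omits the file/folder type flag mentioned in the
post). Shared directory prefixes are stored exactly once, and the full
path is recovered by walking up the parents:

```python
class Node:
    """One path component in a string trie: directories and files are
    nodes, and a node's children are the entries inside that directory."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}  # component name -> Node; prefixes stored once

    def add(self, path):
        """Insert a '/'-separated path, returning its leaf node."""
        node = self
        for part in path.split('/'):
            if part not in node.children:
                node.children[part] = Node(part, node)
            node = node.children[part]
        return node

    def fullpath(self):
        """Rebuild the full path by walking up through the parents."""
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return '/'.join(reversed(parts))

root = Node('')
f1 = root.add('home/paulo/data/a.txt')
f2 = root.add('home/paulo/data/b.txt')
print(f1.fullpath())           # home/paulo/data/a.txt
print(f1.parent is f2.parent)  # True: the 'data' node is shared
```

Because both files hang off the same 'data' node, the common
'home/paulo/data' prefix occupies memory once no matter how many files
share it.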
Re: Best programs written completely in Python
On Sunday, August 5, 2007 at 1:14:38 PM UTC+3, Franz Steinhäusler wrote:
> Hello NG,
>
> What are the best programs, in your opinion, written entirely in
> Python, divided into categories like:
> a) Games
> b) Utilities/System
> c) Office
> d) Web/Newsreader/Mail/Browser
> ...
>
> I don't want to start a long thread; if a site with such a discussion
> already exists, a link will be enough.
>
> Many thanks in advance!
>
> --
> Franz Steinhaeusler

ankiweb.net + ankisrs.net = amazing!
--
https://mail.python.org/mailman/listinfo/python-list
Re: Handling transactions in Python DBI module
Chris Angelico writes:
> On Thu, Feb 11, 2016 at 6:59 PM, dieter wrote:
>> In my context (web applications), I strongly discourage this use - at
>> least when "conn" does not handle subtransactions properly.
>>
>> In a web application, the main transaction should in general be
>> controlled at the request level: a request should either succeed as a
>> whole or have no side effects whatsoever. This prohibits local
>> (somewhere deep in a component) transaction control.
>
> Hmm. I'm not 100% convinced that web apps should behave that way;

In a web application, the user sees either a success or an error
response (for his request). In case of an error, it would be fatal (in
the general case) if persistent data had been inconsistently modified
(e.g. a purchase performed without proper acknowledgement to the user).

> but part of the simplicity comes from requiring that
> database-dependent code should not call other database-dependent code,
> as that would create a nested transaction.

The web applications I am working with are highly modular. Many of
those modules manage persistent data in a database and operate on it
independently. It would be fatal if a successful database interaction in
one module, as a side effect, committed database operations in another
module.

> (That said, though, it's entirely possible that psycopg2 could handle
> a nested "with conn" as "SAVEPOINT" and either "RELEASE SAVEPOINT" or
> "ROLLBACK TO SAVEPOINT". But I wouldn't recommend depending on that
> without confirming it in the docs.)

For "psycopg2", I have done this.

> Bear in mind, the rule I gave was a broad and general rule for
> students to follow, not a hard-and-fast rule for all databasing. It's
> designed such that the student can learn the exceptions later on, but
> meanwhile, his code will be correct (if occasionally a little warped
> to avoid nesting transactions).
I want to stress that in the domain of web applications, local
transaction control is not a good rule of thumb. At least there,
transaction control should in general belong to the request framework
(as this is the atomic user interaction level for web applications) and
not to local components.
--
https://mail.python.org/mailman/listinfo/python-list
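The request-level rule argued for above can be sketched with the standard
library's sqlite3 module (a sketch under assumptions: `handle_request`,
`buy`, and the table are invented for illustration; a real framework would
hook the commit/rollback into its request dispatch, and a real application
would use its own database driver):

```python
import sqlite3

def handle_request(conn, handler):
    """Run one web request inside a single transaction: commit only if
    the whole request succeeds, otherwise roll back every component's
    writes so the request has no side effects at all."""
    try:
        result = handler(conn)
    except Exception:
        conn.rollback()  # undo all of the request's database work
        raise
    else:
        conn.commit()    # all components' work becomes visible at once
        return result

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE purchases (item TEXT)')
conn.commit()

def buy(conn):
    # A component deep inside the request writes to the database...
    conn.execute("INSERT INTO purchases VALUES ('book')")
    # ...and then the request fails further on.
    raise RuntimeError('payment failed after the insert')

try:
    handle_request(conn, buy)
except RuntimeError:
    pass

# The failed request left no trace behind.
print(conn.execute('SELECT COUNT(*) FROM purchases').fetchone()[0])  # 0
```

Note that no component calls commit() itself; only the request wrapper
does, which is exactly the discipline that local transaction control
would break.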
Re: [STORY-TIME] THE BDFL AND HIS PYTHON PETTING ZOO
On Saturday, February 6, 2016 at 12:54:41 PM UTC-8, Rick Johnson wrote:
> On Wednesday, February 3, 2016 at 12:02:35 AM UTC-6, John Ladasky wrote:
>
>> Rick, you don't like Python?
>
> If i didn't like Python, then i would happily let it self-destruct,
> yes? The problem is, i *DO* like Python. Python2 was a great language,
> but python3 has been a disaster. Heck, even the BDFL has warned that
> Python4 cannot introduce as many backwards incompatible changes as
> python3. So obviously, he is admitting that Python3 was a disaster.

I had to wait until my favorite packages were ported (numpy, scipy,
matplotlib, pandas). But once that happened, I moved from Py2 to Py3
years ago with scarcely a bump, bruise, or scratch. I like lazy
evaluation. I think that Unicode handling is vastly improved (and yes,
I'm fully aware that exactly two people in this newsgroup don't agree;
they make sure we all know). I have encountered few surprises, and
nothing that makes my job harder.

To be sure, people who are entrenched in the Py2 way of doing things,
with a lot of legacy code, have some work to do -- on their code, and
possibly on their brains.

Keep Py2 if you want it, then. You still have a few more years before
the PSF stops maintaining it. If you really like it that much, why not
maintain it yourself?
--
https://mail.python.org/mailman/listinfo/python-list