Re: Bi-directional sub-process communication
On 11/23/2015 20:29, Cameron Simpson wrote: On 24Nov2015 16:25, Cameron Simpson wrote: Completely untested example code:

    class ReturnEvent:
        def __init__(self):
            self.event = Event()

With, of course:

    def wait(self):
        return self.event.wait()

Of course :-) Ah, the Event() object comes from the threading module. That makes sense. This should work perfectly. Thanks so much for taking the time to help me out!

- Israel Brewster

Cheers, Cameron Simpson
Maintainer's Motto: If we can't fix it, it ain't broke.
--
https://mail.python.org/mailman/listinfo/python-list
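A fuller, runnable sketch of the pattern being discussed - the queue plumbing and the value handoff are additions for illustration, not part of Cameron's untested snippet:

    from threading import Event, Thread
    from queue import Queue

    class ReturnEvent:
        def __init__(self):
            self.event = Event()
            self.value = None            # filled in by the worker before set()

        def put(self, value):
            self.value = value
            self.event.set()

        def wait(self):
            self.event.wait()
            return self.value

    requests = Queue()

    def worker():
        while True:
            payload, reply = requests.get()
            reply.put(payload.upper())   # stand-in for the real work

    Thread(target=worker, daemon=True).start()

    reply = ReturnEvent()
    requests.put(("hello", reply))
    print(reply.wait())                  # -> HELLO

The requester blocks on wait() until the worker calls put(), which is the bi-directional handshake the thread is about.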
Re: Psycopg2 pool clarification
Since I've gotten no replies to this, I was wondering if someone could at least confirm which behavior (my expected or my observed) is *supposed* to be correct? Should a psycopg2 pool keep connections open when returned to the pool (if close is False), or should it close them as long as there are more than minconn open? i.e. is my observed behavior a bug or a feature?

On 2017-06-02 15:06, Israel Brewster wrote:

I've been using the psycopg2 pool class for a while now, using code similar to the following:

    pool = ThreadedConnectionPool(0, 5, ...)
    conn1 = pool.getconn()
    pool.putconn(conn1)
    (repeat later, or perhaps "simultaneously" in a different thread)

and my understanding was that the pool logic was something like the following:

- create a "pool" of connections, with an initial number of connections equal to the "minconn" argument
- When getconn is called, see if there is an available connection. If so, return it. If not, open a new connection and return that (up to "maxconn" total connections)
- When putconn is called, return the connection to the pool for re-use, but do *not* close it (unless the close argument is specified as True; documentation says the default is False)
- On the next request to getconn, this connection is now available and so no new connection will be made
- perhaps (or perhaps not), after some time, unused connections would be closed and purged from the pool to prevent large numbers of used-only-once connections from lying around

However, in some testing I just did, this doesn't appear to be the case, at least based on the postgresql logs. Running the following code:

    pool = ThreadedConnectionPool(0, 5, ...)
    conn1 = pool.getconn()
    conn2 = pool.getconn()
    pool.putconn(conn1)
    pool.putconn(conn2)
    conn3 = pool.getconn()
    pool.putconn(conn3)

produced the following output in the postgresql log:

    2017-06-02 14:30:26 AKDT LOG: connection received: host=::1 port=64786
    2017-06-02 14:30:26 AKDT LOG: connection authorized: user=logger database=flightlogs
    2017-06-02 14:30:35 AKDT LOG: connection received: host=::1 port=64788
    2017-06-02 14:30:35 AKDT LOG: connection authorized: user=logger database=flightlogs
    2017-06-02 14:30:46 AKDT LOG: disconnection: session time: 0:00:19.293 user=logger database=flightlogs host=::1 port=64786
    2017-06-02 14:30:53 AKDT LOG: disconnection: session time: 0:00:17.822 user=logger database=flightlogs host=::1 port=64788
    2017-06-02 14:31:15 AKDT LOG: connection received: host=::1 port=64790
    2017-06-02 14:31:15 AKDT LOG: connection authorized: user=logger database=flightlogs
    2017-06-02 14:31:20 AKDT LOG: disconnection: session time: 0:00:05.078 user=logger database=flightlogs host=::1 port=64790

Since I set the maxconn parameter to 5, and only used 3 connections, I wasn't expecting to see any disconnects - and yet as soon as I call putconn, I *do* see a disconnection. Additionally, I would have thought that when I pulled connection 3 there would have been two connections available, and so it wouldn't have needed to connect again, yet it did. Even if I explicitly say close=False in the putconn call, it still closes the connection and has to open a new one on the next getconn. What am I missing? From this testing, it looks like I get no benefit at all from having the connection pool, unless you consider an upper limit to the number of simultaneous connections a benefit? :-) Maybe a little code savings from not having to manually call connect and close after each connection, but that's easily gained by simply writing a context manager.
I could get *some* limited benefit by raising the minconn value, but then I risk having connections that are *never* used, yet still taking resources on the DB server. Ideally, it would open as many connections as are needed, and then leave them open for future requests, perhaps with an "idle" timeout. Is there any way to achieve this behavior? --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
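As an aside, the context manager mentioned above might look something like this (a sketch; the connection parameters are placeholders):

    from contextlib import contextmanager
    import psycopg2

    @contextmanager
    def db_connection(dsn="dbname=flightlogs user=logger"):   # placeholder DSN
        conn = psycopg2.connect(dsn)
        try:
            yield conn
            conn.commit()
        except Exception:
            conn.rollback()
            raise
        finally:
            conn.close()

    # usage:
    # with db_connection() as conn:
    #     with conn.cursor() as cur:
    #         cur.execute("SELECT 1")

This still opens and closes a connection per use, which is exactly the overhead being discussed, but it does remove the boilerplate.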
Re: Psycopg2 pool clarification
On 2017-06-06 22:53, dieter wrote: israel writes: Since I've gotten no replies to this, I was wondering if someone could at least confirm which behavior (my expected or my observed) is *supposed* to be correct? Should a psycopg2 pool keep connections open when returned to the pool (if close is False), or should it close them as long as there are more than minconn open? i.e. is my observed behavior a bug or a feature? You should ask the author[s] of "psycopg2" about the supposed behavior. From my point of view, everything depends on the meaning of the "min" and "max" parameters for the pool. You seem to interpret "max" as "keep as many connections as this open". But it can also be a hard limit in the form "never open more than this number of connections". In the latter case, "min" may mean "keep this many connections open at all times". You are right about my interpretation of "max", and also about the actual meaning. Thus the reason I was asking :-). I did post on the bug report forum, and was informed that the observed behavior was the correct behavior. As such, using psycopg2's pool is essentially worthless for me (plenty of use for it, I'm sure, just not for me/my use case). So let me ask a different, but related, question: Is there a Python library available that gives me the behavior I described in my first post, where connections are "cached" for future use for a time? Or should I just write my own? I didn't find anything with some quick googling, other than middleware servers like pgpool which, while they have the behavior I want (at least from my reading), will still require the overhead of making a connection (perhaps less than connecting directly to postgres? Any performance comparisons out there?), not to mention keeping yet another service configured/running. I would prefer to keep the pool internal to my application, if possible, and simply reuse existing connections rather than making new ones. Thanks! -- https://mail.python.org/mailman/listinfo/python-list
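For completeness (this did not come up in the thread, so treat it as a pointer rather than a recommendation): SQLAlchemy's standalone pool module can wrap a plain DBAPI connect function and keeps returned connections open for reuse up to pool_size, opening short-lived overflow connections only under load. A sketch:

    import psycopg2
    from sqlalchemy.pool import QueuePool

    def _connect():
        return psycopg2.connect("dbname=flightlogs user=logger")   # placeholder DSN

    pool = QueuePool(_connect, pool_size=2, max_overflow=3, recycle=3600)

    conn = pool.connect()      # proxied DBAPI connection
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
    finally:
        conn.close()           # returns the underlying connection to the pool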
Re: Psycopg2 pool clarification
On 2017-06-08 19:55, Ian Kelly wrote: On Thu, Jun 8, 2017 at 10:47 AM, Israel Brewster wrote: On Jun 7, 2017, at 10:31 PM, dieter wrote: israel writes: On 2017-06-06 22:53, dieter wrote: ... As such, using psycopg2's pool is essentially worthless for me (plenty of use for it, I'm sure, just not for me/my use case). Could you not simply adjust the value for the "min" parameter? If you want at least "n" open connections, then set "min" to "n". Well, sure, if I didn't care about wasting resources (which, I guess many people don't). I could set "n" to some magic number that would always give "enough" connections, such that my application never has to open additional connections, then adjust that number every few months as usage changes. In fact, now that I know how the logic of the pool works, that's exactly what I'm doing until I am confident that my caching replacement is solid. Of course, in order to avoid having to open/close a bunch of connections during the times when it is most critical - that is, when the server is under heavy load - I have to set that number arbitrarily high. Furthermore, that means that much of the time many, if not most, of those connections would be idle. Each connection uses a certain amount of RAM on the server, not to mention using up limited connection slots, so now I've got to think about whether my server is sized properly to be able to handle that load not just occasionally, but constantly - when reducing server load by reducing the frequency of connections being opened/closed was the goal in the first place. So all I've done is trade dynamic load for static load - increasing performance at the cost of resources, rather than more intelligently using the available resources. All-in-all, not the best solution, though it does work. Maybe if load was fairly constant it would make more sense, though. So, like I said, not the best fit for *my* use case, which is a number of web apps with varying loads, loads that also vary from day-to-day and hour-to-hour. On the other hand, a pool that caches connections using the logic I laid out in my original post would avoid the issue. Under heavy load, it could open additional connections as needed - a performance penalty for the first few users over the min threshold, but only the first few, rather than all the users over a certain threshold ("n"). Those connections would then remain available for the duration of the load, so it doesn't need to open/close numerous connections. Then, during periods of lighter load, the unused connections can drop off, freeing up server resources for other uses. A well-written pool could even do something like see that the available connection pool is running low, and open a few more connections in the background, thus completely avoiding the connection overhead on requests while never having more than a few "extra" connections at any given time. Even if you left off the expiration logic, it would still be an improvement, because while unused connections wouldn't drop, the "n" open connections could scale up dynamically until you have "enough" connections, without having to figure out and hard-code that "magic number" of open connections. Why wouldn't I want something like that? It's not like it's hard to code - took me about an hour and a half to get to a working prototype yesterday. Still need to write tests and add some polish, but it works. Perhaps, though, the common thought is just "throw more hardware at it and keep a lot of connections open at all times?" 
Maybe I was raised too conservatively, or the company I work for is too poor :-D Psycopg is first and foremost a database adapter. To quote from the psycopg2.pool module documentation, "This module offers a few pure Python classes implementing *simple* connection pooling directly in the client application" (emphasis added). The advertised list of features at http://initd.org/psycopg/features/ doesn't even mention connection pooling. In short, you're getting what you paid for. It sounds like your needs are beyond what the psycopg2.pool module provides. Quite possible. Thus the reason I was looking for clarification on how the module was intended to work - if it doesn't work in the way that I want it to, I need to look elsewhere for a solution. My main reason for posting this thread was that I was expecting it to work one way, but testing showed it working another way, so I was trying to find out if that was intentional or user error. Apparently it's intentional, so there we go - in its current form at least, my needs are beyond what the psycopg2 pool provides. Fair enough. I suggest looking into a dedicated connection pooler like PgBouncer. You'll find that it's much
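For concreteness, the "caching pool" described a couple of messages back might look roughly like this - an illustrative sketch with assumed names, not the actual prototype, which was never posted:

    import time
    import threading
    import psycopg2

    class CachingPool:
        """Open connections on demand, reuse returned ones, drop idle ones."""

        def __init__(self, dsn, idle_timeout=300):
            self.dsn = dsn
            self.idle_timeout = idle_timeout
            self._idle = []               # list of (connection, returned_at)
            self._lock = threading.Lock()

        def getconn(self):
            with self._lock:
                self._prune()
                if self._idle:
                    return self._idle.pop()[0]
            return psycopg2.connect(self.dsn)     # open a new one on demand

        def putconn(self, conn):
            with self._lock:
                self._idle.append((conn, time.monotonic()))
                self._prune()

        def _prune(self):
            cutoff = time.monotonic() - self.idle_timeout
            keep = []
            for conn, returned_at in self._idle:
                if returned_at < cutoff:
                    conn.close()          # idle too long: let it drop off
                else:
                    keep.append((conn, returned_at))
            self._idle = keep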
PEP 249 Compliant error handling
I have written and maintain a PEP 249 compliant (hopefully) DB API for the 4D database, and I've run into a situation where corrupted string data from the database can cause the module to error out. Specifically, when decoding the string, I get a "UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 86-87: illegal UTF-16 surrogate" error. This makes sense, given that the string data got corrupted somehow, but the question is "what is the proper way to deal with this in the module?" Should I just throw an error on bad data? Or would it be better to set the errors parameter to something like "replace"? The former feels a bit more "proper" to me (there's an error here, so we throw an error), but leaves the end user dead in the water, with no way to retrieve *any* of the data (from that row at least, and perhaps any rows after it as well). The latter option sort of feels like sweeping the problem under the rug, but does at least leave an error character in the string to let them know there was an error, and will allow retrieval of any good data. Of course, if this was in my own code I could decide on a case-by-case basis what the proper action is, but since this is a module that has to work in any situation, it's a bit more complicated. --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
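To make the two options concrete, a small illustration (the byte string below is fabricated corrupted UTF-16LE, not actual 4D output):

    corrupted = "good data ".encode("utf-16-le") + b"\x00\xd8" + " more".encode("utf-16-le")

    # Option 1: strict decoding - the caller gets an exception and no data at all
    try:
        corrupted.decode("utf-16-le")
    except UnicodeDecodeError as exc:
        print(exc)        # ... illegal UTF-16 surrogate

    # Option 2: errors="replace" - the bad code unit becomes U+FFFD, the rest survives
    print(corrupted.decode("utf-16-le", errors="replace"))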
Re: PEP 249 Compliant error handling
> On Oct 17, 2017, at 10:35 AM, MRAB wrote: > > On 2017-10-17 18:26, Israel Brewster wrote: >> I have written and maintain a PEP 249 compliant (hopefully) DB API for the >> 4D database, and I've run into a situation where corrupted string data from >> the database can cause the module to error out. Specifically, when decoding >> the string, I get a "UnicodeDecodeError: 'utf-16-le' codec can't decode >> bytes in position 86-87: illegal UTF-16 surrogate" error. This makes sense, >> given that the string data got corrupted somehow, but the question is "what >> is the proper way to deal with this in the module?" Should I just throw an >> error on bad data? Or would it be better to set the errors parameter to >> something like "replace"? The former feels a bit more "proper" to me >> (there's an error here, so we throw an error), but leaves the end user dead >> in the water, with no way to retrieve *any* of the data (from that row at >> least, and perhaps any rows after it as well). The latter option sort of >> feels like sweeping the problem under the rug, but does at least leave an >> error character in the string to let them know there was an error, and will allow retrieval of any good data. >> Of course, if this was in my own code I could decide on a case-by-case basis >> what the proper action is, but since this is a module that has to work in any >> situation, it's a bit more complicated. > If a particular text field is corrupted, then raising UnicodeDecodeError when > trying to get the contents of that field as a Unicode string seems reasonable > to me. > > Is there a way to get the contents as a bytestring, or to get the contents > with a different errors parameter, so that the user has the means to fix it > (if it's fixable)? That's certainly a possibility, if that behavior conforms to the DB API "standards". My concern on this front is that in my experience working with other PEP 249 modules (specifically psycopg2), I'm pretty sure that columns designated as type VARCHAR or TEXT are returned as strings (unicode in python 2, although that may have been a setting I used), not bytes. The other complication here is that the 4D database doesn't use the UTF-8 encoding typically found, but rather UTF-16LE, and I don't know how well this is documented. So not only is the bytes representation completely unintelligible for human consumption, I'm not sure the average end-user would know what decoding to use. In the end though, the main thing in my mind is to maintain "standards" compatibility - I don't want to be returning bytes if all other DB API modules return strings, or vice versa for that matter. There may be some flexibility there, but as much as possible I want to conform to the majority/standard/whatever --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP 249 Compliant error handling
> On Oct 18, 2017, at 1:46 AM, Abdur-Rahmaan Janhangeer > wrote: > > all corruption systematically ignored but data piece logged in for analysis Thanks. Can you expound a bit on what you mean by "data piece logged in" in this context? I'm not aware of any logging specifications in PEP 249, and would think that would be more end-user configured rather than module level. --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > > Abdur-Rahmaan Janhangeer, > Mauritius > abdurrahmaanjanhangeer.wordpress.com > > On 17 Oct 2017 21:43, "Israel Brewster" <isr...@ravnalaska.net> wrote: > I have written and maintain a PEP 249 compliant (hopefully) DB API for the 4D > database, and I've run into a situation where corrupted string data from the > database can cause the module to error out. Specifically, when decoding the > string, I get a "UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in > position 86-87: illegal UTF-16 surrogate" error. This makes sense, given that > the string data got corrupted somehow, but the question is "what is the > proper way to deal with this in the module?" Should I just throw an error on > bad data? Or would it be better to set the errors parameter to something like > "replace"? The former feels a bit more "proper" to me (there's an error here, > so we throw an error), but leaves the end user dead in the water, with no way > to retrieve *any* of the data (from that row at least, and perhaps any rows > after it as well). The latter option sort of feels like sweeping the problem > under the rug, but does at least leave an error character in the string to let them know there was an error, and will allow retrieval of any good data. > > Of course, if this was in my own code I could decide on a case-by-case basis > what the proper action is, but since this is a module that has to work in any > situation, it's a bit more complicated. > --- > Israel Brewster > Systems Analyst II > Ravn Alaska > 5245 Airport Industrial Rd > Fairbanks, AK 99709 > (907) 450-7293 > --- > > > > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP 249 Compliant error handling
On Oct 17, 2017, at 12:02 PM, MRAB wrote: > > On 2017-10-17 20:25, Israel Brewster wrote: >> >>> On Oct 17, 2017, at 10:35 AM, MRAB <pyt...@mrabarnett.plus.com> wrote: >>> >>> On 2017-10-17 18:26, Israel Brewster wrote: >>>> I have written and maintain a PEP 249 compliant (hopefully) DB API for the >>>> 4D database, and I've run into a situation where corrupted string data >>>> from the database can cause the module to error out. Specifically, when >>>> decoding the string, I get a "UnicodeDecodeError: 'utf-16-le' codec can't >>>> decode bytes in position 86-87: illegal UTF-16 surrogate" error. This >>>> makes sense, given that the string data got corrupted somehow, but the >>>> question is "what is the proper way to deal with this in the module?" >>>> Should I just throw an error on bad data? Or would it be better to set the >>>> errors parameter to something like "replace"? The former feels a bit more >>>> "proper" to me (there's an error here, so we throw an error), but leaves >>>> the end user dead in the water, with no way to retrieve *any* of the data >>>> (from that row at least, and perhaps any rows after it as well). The >>>> latter option sort of feels like sweeping the problem under the rug, but >>>> does at least leave an error character in the string to let them know there was an error, and will allow retrieval of any good >>>> data. >>>> Of course, if this was in my own code I could decide on a case-by-case >>>> basis what the proper action is, but since this is a module that has to work >>>> in any situation, it's a bit more complicated. >>> If a particular text field is corrupted, then raising UnicodeDecodeError >>> when trying to get the contents of that field as a Unicode string seems >>> reasonable to me. >>> >>> Is there a way to get the contents as a bytestring, or to get the contents >>> with a different errors parameter, so that the user has the means to fix it >>> (if it's fixable)? >> >> That's certainly a possibility, if that behavior conforms to the DB API >> "standards". My concern on this front is that in my experience working with >> other PEP 249 modules (specifically psycopg2), I'm pretty sure that columns >> designated as type VARCHAR or TEXT are returned as strings (unicode in >> python 2, although that may have been a setting I used), not bytes. The >> other complication here is that the 4D database doesn't use the UTF-8 >> encoding typically found, but rather UTF-16LE, and I don't know how well >> this is documented. So not only is the bytes representation completely >> unintelligible for human consumption, I'm not sure the average end-user >> would know what decoding to use. >> >> In the end though, the main thing in my mind is to maintain "standards" >> compatibility - I don't want to be returning bytes if all other DB API >> modules return strings, or vice versa for that matter. There may be some >> flexibility there, but as much as possible I want to conform to the >> majority/standard/whatever >> > The average end-user might not know which encoding is being used, but > providing a way to read the underlying bytes will give a more experienced > user the means to investigate and possibly fix it: get the bytes, figure out > what the string should be, update the field with the correctly decoded string > using normal DB instructions. I agree, and if I was just writing some random module I'd probably go with it, or perhaps with the suggestion offered by Karsten Hilbert. 
However, neither answer addresses my actual question, which is "how does the STANDARD (PEP 249 in this case) say to handle this, or, barring that (since the standard probably doesn't explicitly say), how do the MAJORITY of PEP 249 compliant modules handle this?" Not what is the *best* way to handle it, but rather what is the normal, expected behavior for a Python DB API module when presented with bad data? That is, how does psycopg2 behave? pyodbc? pymssql (I think)? Etc. Or is that portion of the behavior completely arbitrary and different for every module? It may well be that one of the suggestions *IS* the normal, expected behavior, but it sounds more like you are suggesting how you think it would be best to handle it, which is appreciated but not actually what I'm asking :-) Sorry if I am being difficult. > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Efficient counting of results
I am working on developing a report that groups data into a two-dimensional array based on date and time. More specifically, date is grouped into categories: day, week-to-date, month-to-date, and year-to-date. Then, for each of those categories, I need to get a count of records that fall into the following categories: 0 minutes late, 1-5 minutes late, and 6-15 minutes late, where minutes late will be calculated based on a known scheduled time and the time in the record. To further complicate things, there are actually two times in each record, so under the day, week-to-date, month-to-date etc groups, there will be two sets of "late" bins, one for each time. In table form it would look something like this:

                | day | week-to-date | month-to-date | year-to-date |
    t1 0 min    |
    t1 1-5 min  | ...
    t1 6-15 min | ...
    t2 0 min    | ...
    t2 1-5 min  | ...
    t2 6-15 min | ...

So in the extreme scenario of a record that is for the current day, it will be counted into 8 bins: once each for day, week-to-date, month-to-date and year-to-date under the proper "late" bin for the first time in the record, and once each into each of the time groups under the proper "late" bin for the second time in the record. An older record may only be counted twice, under the year-to-date group. A record with no matching schedule is discarded, as is any record that is "late" by more than 15 minutes (those are gathered into a separate report).

My initial approach was to simply make dictionaries for each "row" in the table, like so:

    t10 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
    t15 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
    .
    .
    t25 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
    t215 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}

then loop through the records, find the schedule for that record (if any; if not, move on as mentioned earlier), compare t1 and t2 against the schedule, and increment the appropriate bin counts using a bunch of if statements. Functional, if ugly. But then I got to thinking: I keep hearing about all these fancy numerical analysis tools for python like pandas and numpy - could something like that help? Might there be a way to simply set up a table with "rules" for the columns and rows, and drop my records into the table, having them automatically counted into the proper bins or something? Or am I overthinking this, and the "simple", if ugly, approach is best? --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
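For what it's worth, a sketch of the "loop and increment" approach using plain dicts and Counter; the record fields (date, sched, t1, t2), the helper names, and the Monday-based week are assumptions for illustration, not details from the post:

    from collections import Counter
    from datetime import date, timedelta

    def lateness_bin(minutes):
        if minutes < 1:
            return "0min"
        if minutes <= 5:
            return "1-5min"
        if minutes <= 15:
            return "6-15min"
        return None                        # >15 minutes: handled by a separate report

    def date_groups(d, today):
        if d.year != today.year:
            return []
        groups = ["YTD"]
        if d.month == today.month:
            groups.append("MTD")
        if d >= today - timedelta(days=today.weekday()):   # Monday of this week
            groups.append("WTD")
        if d == today:
            groups.append("daily")
        return groups

    counts = {"t1": Counter(), "t2": Counter()}   # e.g. counts["t1"][("WTD", "1-5min")]

    def tally(records, today=None):
        today = today or date.today()
        for rec in records:                       # rec: dict with date, sched, t1, t2 (assumed)
            for which in ("t1", "t2"):
                minutes = (rec[which] - rec["sched"]).total_seconds() / 60
                late_bin = lateness_bin(minutes)
                if late_bin is None:
                    continue
                for group in date_groups(rec["date"], today):
                    counts[which][(group, late_bin)] += 1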
Re: Efficient counting of results
> On Oct 19, 2017, at 9:40 AM, Israel Brewster wrote: > > I am working on developing a report that groups data into a two-dimensional > array based on date and time. More specifically, date is grouped into > categories: > > day, week-to-date, month-to-date, and year-to-date > > Then, for each of those categories, I need to get a count of records that > fall into the following categories: > > 0 minutes late, 1-5 minutes late, and 6-15 minutes late > > where minutes late will be calculated based on a known scheduled time and the > time in the record. To further complicate things, there are actually two > times in each record, so under the day, week-to-date, month-to-date etc > groups, there will be two sets of "late" bins, one for each time. In table > form it would look something like this: > >| day | week-to-date | month-to-date | year-to-date | > > t1 0min| > t1 1-5 min| ... > t1 6-15 min | ... > t2 0min| ... > t2 1-5 min| ... > t2 6-15 min | ... > > So in the extreme scenario of a record that is for the current day, it will > be counted into 8 bins: once each for day, week-to-date, month-to-date and > year-to-date under the proper "late" bin for the first time in the record, > and once each into each of the time groups under the proper "late" bin for > the second time in the record. An older record may only be counted twice, > under the year-to-date group. A record with no matching schedule is > discarded, as is any record that is "late" by more than 15 minutes (those are > gathered into a separate report) > > My initial approach was to simply make dictionaries for each "row" in the > table, like so: > > t10 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,} > t15 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,} > . > . > t25 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,} > t215 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,} > > then loop through the records, find the schedule for that record (if any, if > not move on as mentioned earlier), compare t1 and t2 against the schedule, > and increment the appropriate bin counts using a bunch of if statements. > Functional, if ugly. But then I got to thinking: I keep hearing about all > these fancy numerical analysis tools for python like pandas and numpy - could > something like that help? Might there be a way to simply set up a table with > "rules" for the columns and rows, and drop my records into the table, having > them automatically counted into the proper bins or something? Or am I over > thinking this, and the "simple", if ugly approach is best? I suppose I should mention: my data source is the results of a psycopg2 query, so a "record" is a tuple or dictionary (depending on how I want to set up the cursor) > > --- > Israel Brewster > Systems Analyst II > Ravn Alaska > 5245 Airport Industrial Rd > Fairbanks, AK 99709 > (907) 450-7293 > --- > > > > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Efficient counting of results
> On Oct 19, 2017, at 10:02 AM, Stefan Ram wrote: > > Israel Brewster writes: >> t10 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,} >> increment the appropriate bin counts using a bunch of if statements. > > I can't really completely comprehend your requirements > specification, you might have perfectly described it all and > it's just too complicated for me to comprehend, but I just > would like to add that there are several ways to implement a > "two-dimensional" matrix. You can also imagine your > dictionary like this:
>
>     example =
>     { 'd10': 0, 'd15': 0, 'd20': 0, 'd215': 0,
>       'w10': 0, 'w15': 0, 'w20': 0, 'w215': 0,
>       'm10': 0, 'm15': 0, 'm20': 0, 'm215': 0,
>       'y10': 0, 'y15': 0, 'y20': 0, 'y215': 0 }
>
> Then, when the categories are already in two variables, say, > »a« (»d«, »w«, »m«, or »y«) and »b« (»10«, »15«, »20«, or > »215«), you can address the appropriate bin as

Oh, I probably was a bit weak on the explanation somewhere. I'm still wrapping *my* head around some of the details. That's what makes it fun :-) If it helps, my data would look something like this:

    [
      (date, key, t1, t2),
      (date, key, t1, t2)
      .
      .
    ]

Where the date and the key are what is used to determine what "on-time" is for the record, and thus which "late" bin to put it in. So if the date of the first record was today, t1 was on-time, and t2 was 5 minutes late, then I would need to increment ALL of the following (using your data structure from above): d10, w10, m10, y10, d25, w25, m25 AND y25. Since this record counts not just for the current day, but also for week-to-date, month-to-date and year-to-date. Basically, as the time categories get larger, the percentage of the total records included in that date group also gets larger. The year-to-date group will include all records, grouped by lateness; the daily group will only include today's records. Maybe that will help clear things up. Or not. :-)

> example[ a + b ]+= 1

Not quite following the logic here. Sorry.

> . (And to not have to initialize the entries to zero, > class collections.defaultdict might come in handy.)

Yep, those are handy in many places. Thanks for the suggestion. --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
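Spelling out the example[ a + b ] idea as a runnable fragment, with collections.defaultdict(int) standing in for pre-initializing all sixteen keys:

    from collections import defaultdict

    counts = defaultdict(int)     # no need to pre-initialize the 16 keys to zero

    a = "d"                       # date group: d(aily), w(eek), m(onth), y(ear)
    b = "25"                      # "t2, 1-5 minutes late" in the naming above
    counts[a + b] += 1            # exactly equivalent to counts["d25"] += 1

    # The example record described above (today's date, t1 on time, t2 five
    # minutes late) would then be tallied like this:
    for group in ("d", "w", "m", "y"):
        counts[group + "10"] += 1     # t1, 0 minutes late
        counts[group + "25"] += 1     # t2, 1-5 minutes late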
Re: Efficient counting of results
on-time, 1-5 minutes late, etc? What about this entire month (including the given date)? What about this year (again, including the given date and month)? How about arrivals - same questions. As you can hopefully see now, if a departure happened this week, it probably also happened this month (although that is not necessarily the case, since weeks can cross month boundaries), and if it happened this date or this month, it *definitely* happened this year. As such, a given departure *likely* will be counted in multiple date "groups", if you will. The end result should be a table like the one I posted in the original question: time frame covered on the horizontal axis (YTD, MTD etc.), and "late" groups for T1 and T2 on the vertical. > You want to process all your records, and decide "as of > now, how late is each record", and then report *cumulative* subtotals for a > number of arbitrary groups: not late yet, five minutes late, one day late, > one year late, etc. Just to clarify, as stated, the late groups are not-late, 1-5 minutes late, and 6-15 minutes late. Also as stated in the original message, anything over 15 minutes late is dealt with separately, and therefore ignored for the purposes of this report. > > Suggestion: > > Start with just the "activation time" and "now", and calculate the difference. > If they are both given in seconds, you can just subtract:
>
>     lateness = now - activation_time
>
> to determine how late that record is. If they are Datetime objects, use a > Timedelta object. > > That *single* computed field, the lateness, is enough to determine which > subtotals need to be incremented. Well, *two* computed fields, one for T1 and one for T2, which are counted separately. > Start by dividing all of time into named > buckets, in numeric order: > > ...
>
>     for record in records:
>         lateness = now - record.activation_date
>         for end, bucket in buckets:
>             if lateness <= end:
>                 bucket.append(record)
>             else:
>                 break
>
> And you're done! > > If you want the *number of records* in a particular bucket, you say: > > len(bucket) > > If you want the total record amount, you say: > > sum(record.total for record in bucket) > > (assuming your records also have a "total" field, if they're invoices say). > > I hope that's even vaguely helpful. In a sense, in that it supports my initial approach. As Stefan Ram pointed out, there is nothing wrong with the solution I have: simply using if statements around the calculated lateness of t1 and t2 to increment the appropriate counters. I was just thinking there might be tools to make the job easier/cleaner/more efficient. From the responses I have gotten, it would seem that that is likely not the case, so I'll just say "thank you all for your time", and let the matter rest. --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > > > > >> Maybe that will help clear things up. Or not. :-) > > > Not even a tiny bit :-( > > > > > > -- > Steve > “Cheer up,” they said, “things could be worse.” So I cheered up, and sure > enough, things got worse. > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Save non-pickleable variable?
tldr: I have an object that can't be pickled. Is there any way to do a "raw" dump of the binary data to a file, and re-load it later? Details: I am using a java (I know, I know - this is a python list. I'm not asking about the java - honest!) library (Jasper Reports) that I access from python using py4j (www.py4j.org). At one point in my code I call a java function which, after churning on some data in a database, returns an object (a jasper report object populated with the final report data) that I can use (via another java call) to display the results in a variety of formats (HTML, PDF, XLS, etc). At the time I get the object back, I use it to display the results in HTML format for quick display, but the user may or may not also want to get a PDF copy in the near future. Since it can take some time to generate this object, and also since the data may change between when I do the HTML display and when the user requests a PDF (if they do at all), I would like to save this object for potential future re-use. Because it might be large, and there is actually a fairly good chance the user won't need it again, I'd like to save it in a temp file (that would be deleted when the user logs out) rather than in memory. Unfortunately, since this is an object created by and returned from a java function, not a native python object, it is not able to be pickled (as the suggestion typically is), at least to my knowledge. Given that, is there any way I can write out the "raw" binary data to a file, and read it back in later? Or some other way to be able to save this object? It is theoretically possible that I could do it on the java side, i.e. the library may have some way of writing out the file, but obviously I wouldn't expect anyone here to know anything about that - I'm just asking about the python side :-) --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: Save non-pickleable variable?
On Oct 20, 2017, at 11:09 AM, Stefan Ram wrote: > > Israel Brewster writes: >> Given that, is there any way I can write out the "raw" binary >> data to a file > > If you can call into the Java SE library, you can try > > docs.oracle.com/javase/9/docs/api/java/io/ObjectOutputStream.html#writeObject-java.lang.Object- > > , e.g.:
>
> public static void save( final java.lang.String path, final java.lang.Object object )
> { try
>   { final java.io.FileOutputStream fileOutputStream
>       = new java.io.FileOutputStream( path );
>
>     final java.io.ObjectOutputStream objectOutputStream
>       = new java.io.ObjectOutputStream( fileOutputStream );
>
>     objectOutputStream.writeObject( object );
>
>     objectOutputStream.close(); }
>
>   catch( final java.io.IOException iOException )
>   { /* application-specific code */ }}
>
>> , and read it back in later? > > There's a corresponding »readObject« method in > »java.io.ObjectInputStream«. E.g.,
>
> public static java.lang.Object load( final java.lang.String path )
> {
>   java.io.FileInputStream fileInputStream = null;
>
>   java.io.ObjectInputStream objectInputStream = null;
>
>   java.lang.Object object = null;
>
>   try
>   { fileInputStream = new java.io.FileInputStream( path );
>
>     objectInputStream = new java.io.ObjectInputStream( fileInputStream );
>
>     object = objectInputStream.readObject();
>
>     objectInputStream.close(); }
>
>   catch( final java.io.IOException iOException )
>   { java.lang.System.out.println( iOException ); }
>
>   catch( final java.lang.ClassNotFoundException classNotFoundException )
>   { java.lang.System.out.println( classNotFoundException ); }
>
>   return object; }
>
> However, it is possible that not all objects can be > meaningfully saved and restored in that way. Thanks for the information. In addition to what you suggested, it may be possible that the Java library itself has methods for saving this object - I seem to recall the methods for displaying the data having options to read from files (rather than from the Java object directly like I'm doing), and it wouldn't make sense to load from a file unless you could first create said file by some method. I'll investigate solutions java-side. --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
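On the Python side, the same Java machinery can be reached through py4j's jvm view. A sketch only, assuming a running JavaGateway and that the report object actually implements java.io.Serializable (which it may not):

    from py4j.java_gateway import JavaGateway

    gateway = JavaGateway()
    jvm = gateway.jvm

    def save_java_object(obj, path):
        # java.io.ObjectOutputStream wrapped around a FileOutputStream, as above
        stream = jvm.java.io.ObjectOutputStream(jvm.java.io.FileOutputStream(path))
        try:
            stream.writeObject(obj)
        finally:
            stream.close()

    def load_java_object(path):
        stream = jvm.java.io.ObjectInputStream(jvm.java.io.FileInputStream(path))
        try:
            return stream.readObject()
        finally:
            stream.close()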
Thread safety issue (I think) with defaultdict
A question that has arisen before (for example, here: https://mail.python.org/pipermail/python-list/2010-January/565497.html) is the question of "is defaultdict thread safe", with the answer generally being a conditional "yes", with the condition being what is used as the default value: apparently default values of python types, such as list, are thread safe, whereas more complicated constructs, such as lambdas, make it not thread safe. In my situation, I'm using a lambda, specifically: lambda: datetime.min So presumably *not* thread safe. My goal is to have a dictionary of aircraft and when they were last "seen", with datetime.min being effectively "never". When a data point comes in for a given aircraft, the data point will be compared with the value in the defaultdict for that aircraft, and if the timestamp on that data point is newer than what is in the defaultdict, the defaultdict will get updated with the value from the datapoint (not necessarily current timestamp, but rather the value from the datapoint). Note that data points do not necessarily arrive in chronological order (for various reasons not applicable here, it's just the way it is), thus the need for the comparison. When the program first starts up, two things happen: 1) a thread is started that watches for incoming data points and updates the dictionary as per above, and 2) the dictionary should get an initial population (in the main thread) from hard storage. The behavior I'm seeing, however, is that when step 2 happens (which generally happens before the thread gets any updates), the dictionary gets populated with 56 entries, as expected. However, none of those entries are visible when the thread runs. It's as though the thread is getting a separate copy of the dictionary, although debugging says that is not the case - printing the variable from each location shows the same address for the object. So my questions are: 1) Is this what it means to NOT be thread safe? I was thinking of race conditions where individual values may get updated wrong, but this apparently is overwriting the entire dictionary. 2) How can I fix this? Note: I really don't care if the "initial" update happens after the thread receives a data point or two, and therefore overwrites one or two values. I just need the dictionary to be fully populated at some point early in execution. In usage, the dictionary is used to see of an aircraft has been seen "recently", so if the most recent datapoint gets overwritten with a slightly older one from disk storage, that's fine - it's just that if it's still showing datetime.min because we haven't gotten in any datapoint since we launched the program, even though we have "recent" data in disk storage, that's a problem. So I don't care about the obvious race condition between the two operations, just that the end result is a populated dictionary. Note also that as datapoints come in, they are being written to disk, so the disk storage doesn't lag significantly anyway. 
The framework of my code is below:

File: watcher.py

    last_points = defaultdict(lambda: datetime.min)

    # This function is launched as a thread using the threading module
    # when the first client connects
    def watch():
        while True:
            pointtime = ...                   # (value expression elided in the original post)
            if last_points[...] < pointtime:  # (key expressions elided in the original post)
                last_points[...] = pointtime
            # DEBUGGING
            print("At update:", len(last_points))

File: main.py

    from .watcher import last_points

    # This function will be triggered by a web call from a client, so could happen at any time
    # Client will call this function immediately after connecting, as well as in response to
    # various user actions.
    def getac():
        for record in aclist:                     # aclist: list of aircraft records (context elided)
            last_points[...] = record_timestamp   # (key elided in the original post)
        # DEBUGGING
        print("At get AC:", len(last_points))

--- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread safety issue (I think) with defaultdict
Let me rephrase the question, see if I can simplify it. I need to be able to access a defaultdict from two different threads - one thread that responds to user requests which will populate the dictionary in response to a user request, and a second thread that will keep the dictionary updated as new data comes in. The value of the dictionary will be a timestamp, with the default value being datetime.min, provided by a lambda: lambda: datetime.min At the moment my code is behaving as though each thread has a *separate* defaultdict, even though debugging shows the same addresses - the background update thread never sees the data populated into the defaultdict by the main thread. I was thinking race conditions or the like might make it so one particular loop of the background thread occurs before the main thread, but even so subsequent loops should pick up on the changes made by the main thread. How can I *properly* share a dictionary like object between two threads, with both threads seeing the updates made by the other? --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > On Oct 31, 2017, at 9:38 AM, Israel Brewster wrote: > > A question that has arisen before (for example, here: > https://mail.python.org/pipermail/python-list/2010-January/565497.html > <https://mail.python.org/pipermail/python-list/2010-January/565497.html>) is > the question of "is defaultdict thread safe", with the answer generally being > a conditional "yes", with the condition being what is used as the default > value: apparently default values of python types, such as list, are thread > safe, whereas more complicated constructs, such as lambdas, make it not > thread safe. In my situation, I'm using a lambda, specifically: > > lambda: datetime.min > > So presumably *not* thread safe. > > My goal is to have a dictionary of aircraft and when they were last "seen", > with datetime.min being effectively "never". When a data point comes in for a > given aircraft, the data point will be compared with the value in the > defaultdict for that aircraft, and if the timestamp on that data point is > newer than what is in the defaultdict, the defaultdict will get updated with > the value from the datapoint (not necessarily current timestamp, but rather > the value from the datapoint). Note that data points do not necessarily > arrive in chronological order (for various reasons not applicable here, it's > just the way it is), thus the need for the comparison. > > When the program first starts up, two things happen: > > 1) a thread is started that watches for incoming data points and updates the > dictionary as per above, and > 2) the dictionary should get an initial population (in the main thread) from > hard storage. > > The behavior I'm seeing, however, is that when step 2 happens (which > generally happens before the thread gets any updates), the dictionary gets > populated with 56 entries, as expected. However, none of those entries are > visible when the thread runs. It's as though the thread is getting a separate > copy of the dictionary, although debugging says that is not the case - > printing the variable from each location shows the same address for the > object. > > So my questions are: > > 1) Is this what it means to NOT be thread safe? I was thinking of race > conditions where individual values may get updated wrong, but this apparently > is overwriting the entire dictionary. > 2) How can I fix this? 
> > Note: I really don't care if the "initial" update happens after the thread > receives a data point or two, and therefore overwrites one or two values. I > just need the dictionary to be fully populated at some point early in > execution. In usage, the dictionary is used to see of an aircraft has been > seen "recently", so if the most recent datapoint gets overwritten with a > slightly older one from disk storage, that's fine - it's just if it's still > showing datetime.min because we haven't gotten in any datapoint since we > launched the program, even though we have "recent" data in disk storage thats > a problem. So I don't care about the obvious race condition between the two > operations, just that the end result is a populated dictionary. Note also > that as datapoint come in, they are being written to disk, so the disk > storage doesn't lag significantly anyway. > > The framework of my code is below: > > File: watcher.py > > last_points = defaultdict(lambda:datetime.min) > > #
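For reference, the usual single-process answer to the question above ("how can I properly share a dictionary-like object between two threads?") is a plain dict guarded by an explicit lock; the key name used here ("tail") is a placeholder. As it turns out later in the thread, the real culprit was separate uWSGI worker *processes*, which locking alone cannot fix.

    import threading
    from datetime import datetime

    _lock = threading.Lock()
    last_points = {}

    def update_point(tail, point_time):
        # compound read-then-write, so hold the lock for the whole step
        with _lock:
            if last_points.get(tail, datetime.min) < point_time:
                last_points[tail] = point_time

    def get_point(tail):
        with _lock:
            return last_points.get(tail, datetime.min)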
Re: Thread safety issue (I think) with defaultdict
On Nov 1, 2017, at 9:04 AM, Israel Brewster wrote: > > Let me rephrase the question, see if I can simplify it. I need to be able to > access a defaultdict from two different threads - one thread that responds to > user requests which will populate the dictionary in response to a user > request, and a second thread that will keep the dictionary updated as new > data comes in. The value of the dictionary will be a timestamp, with the > default value being datetime.min, provided by a lambda: > > lambda: datetime.min > > At the moment my code is behaving as though each thread has a *separate* > defaultdict, even though debugging shows the same addresses - the background > update thread never sees the data populated into the defaultdict by the main > thread. I was thinking race conditions or the like might make it so one > particular loop of the background thread occurs before the main thread, but > even so subsequent loops should pick up on the changes made by the main > thread. > > How can I *properly* share a dictionary like object between two threads, with > both threads seeing the updates made by the other? For what it's worth, if I insert a print statement in both threads (which I am calling "Get AC", since that is the function being called in the first thread, and "update", since that is the purpose of the second thread), I get the following output:

    Length at get AC: 54 ID: 4524152200 Time: 2017-11-01 09:41:24.474788
    Length At update: 1 ID: 4524152200 Time: 2017-11-01 09:41:24.784399
    Length At update: 2 ID: 4524152200 Time: 2017-11-01 09:41:25.228853
    Length At update: 3 ID: 4524152200 Time: 2017-11-01 09:41:25.530434
    Length At update: 4 ID: 4524152200 Time: 2017-11-01 09:41:25.532073
    Length At update: 5 ID: 4524152200 Time: 2017-11-01 09:41:25.682161
    Length At update: 6 ID: 4524152200 Time: 2017-11-01 09:41:26.807127
    ...

So the object ID hasn't changed, as I would expect it to if, in fact, we had created a separate object for the thread. And the first call that populates it with 54 items happens "well" before the first update call - a full .3 seconds, which I would think would be an eternity in code terms. So it doesn't even look like it's a race condition causing the issue. It seems to me this *has* to be something to do with the use of threads, but I'm baffled as to what. > --- > Israel Brewster > Systems Analyst II > Ravn Alaska > 5245 Airport Industrial Rd > Fairbanks, AK 99709 > (907) 450-7293 > --- > > > > >> On Oct 31, 2017, at 9:38 AM, Israel Brewster wrote: >> >> A question that has arisen before (for example, here: >> https://mail.python.org/pipermail/python-list/2010-January/565497.html) is >> the question of "is defaultdict thread safe", with the answer generally >> being a conditional "yes", with the condition being what is used as the >> default value: apparently default values of python types, such as list, are >> thread safe, whereas more complicated constructs, such as lambdas, make it >> not thread safe. In my situation, I'm using a lambda, specifically: >> >> lambda: datetime.min >> >> So presumably *not* thread safe. >> >> My goal is to have a dictionary of aircraft and when they were last "seen", >> with datetime.min being effectively "never". 
When a data point comes in for >> a given aircraft, the data point will be compared with the value in the >> defaultdict for that aircraft, and if the timestamp on that data point is >> newer than what is in the defaultdict, the defaultdict will get updated with >> the value from the datapoint (not necessarily current timestamp, but rather >> the value from the datapoint). Note that data points do not necessarily >> arrive in chronological order (for various reasons not applicable here, it's >> just the way it is), thus the need for the comparison. >> >> When the program first starts up, two things happen: >> >> 1) a thread is started that watches for incoming data points and updates the >> dictionary as per above, and >> 2) the dictionary should get an initial population (in the main thread) from >> hard storage. >> >> The behavior I'm seeing, however, is that when step 2 happens (which >> generally happens before the thread gets any updates), the dictionary gets >> populated with 56 entries, as expected. However, none of those entries are >> visible when the thread runs. It's a
Re: Thread safety issue (I think) with defaultdict
On Nov 1, 2017, at 9:58 AM, Ian Kelly wrote: > > On Tue, Oct 31, 2017 at 11:38 AM, Israel Brewster > wrote: >> A question that has arisen before (for example, here: >> https://mail.python.org/pipermail/python-list/2010-January/565497.html) is >> the question of "is defaultdict thread safe", with the answer generally >> being a conditional "yes", with the condition being what is used as the >> default value: apparently default values of python types, such as list, are >> thread safe, > > I would not rely on this. It might be true for current versions of > CPython, but I don't think there's any general guarantee and you could > run into trouble with other implementations. Right, completely agreed. Kinda feels "dirty" to rely on things like this to me. > >> [...] > > [...] You could use a regular dict and just check if > the key is present, perhaps with the additional argument to .get() to > return a default value. True. Using defaultdict simply saves having to stick the same default in every call to get(). DRY principle and all. That said, see below - I don't think the defaultdict is the issue. > > Individual lookups and updates of ordinary dicts are atomic (at least > in CPython). A lookup followed by an update is not, and this would be > true for defaultdict as well. > >> [...] >> 1) Is this what it means to NOT be thread safe? I was thinking of race >> conditions where individual values may get updated wrong, but this >> apparently is overwriting the entire dictionary. > > No, a thread-safety issue would be something like this: > > account[user] = account[user] + 1 > > where the value of account[user] could potentially change between the > time it is looked up and the time it is set again. That's what I thought - changing values/different values from expected, not missing values. All that said, I just had a bit of an epiphany: the main thread is actually a Flask app, running through UWSGI with multiple *processes*, and using the flask-uwsgi-websocket plugin, which further uses greenlets. So what I was thinking was simply a separate thread was, in reality, a completely separate *process*. I'm sure that makes a difference. So what's actually happening here is the following: 1) the main python process starts, which initializes the dictionary (since it is at a global level) 2) uwsgi launches off a bunch of child worker processes (10 to be exact, each of which is set up with 10 gevent threads) 3a) a client connects (web socket connection to be exact). This connection is handled by an arbitrary worker, and an arbitrary green thread within that worker, based on UWSGI algorithms. 3b) This connection triggers launching of a *true* thread (using the python threading library) which, presumably, is now a child thread of that arbitrary uwsgi worker. <== BAD THING, I would think 4) The client makes a request for the list, which is handled by a DIFFERENT (presumably) arbitrary worker process and green thread. So the end result is that the thread that "updates" the dictionary, and the thread that initially *populates* the dictionary are actually running in different processes. In fact, any given request could be in yet another process, which would seem to indicate that all bets are off as to what data is seen. Now that I've thought through what is really happening, I think I need to re-architect things a bit here. For one thing, the update thread should be launched from the main process, not an arbitrary UWSGI worker. 
I had launched it from the client connection because there is no point in having it running if there is no one connected, but I may need to launch it from the __init__.py file instead. For another thing, since this dictionary will need to be accessed from arbitrary worker processes, I'm thinking I may need to move it to some sort of external storage, such as a redis database. Oy, I made my life complicated :-) > That said it's not > obvious to me what your problem actually is. > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
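A sketch of the "external storage" idea just mentioned, using a redis hash so that every worker process sees the same timestamps; the key names are made up, and the check-and-set would need a WATCH/MULTI transaction to be fully race-free:

    from datetime import datetime
    import redis

    r = redis.StrictRedis(host="localhost", port=6379, db=0)
    HASH_KEY = "last_points"      # made-up redis key

    def update_point(tail, point_time):
        # keep only the newest timestamp for this aircraft
        current = r.hget(HASH_KEY, tail)
        if current is None or float(current.decode()) < point_time.timestamp():
            r.hset(HASH_KEY, tail, point_time.timestamp())

    def get_point(tail):
        value = r.hget(HASH_KEY, tail)
        return datetime.fromtimestamp(float(value.decode())) if value else datetime.min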
Re: Thread safety issue (I think) with defaultdict
> On Nov 1, 2017, at 4:53 PM, Steve D'Aprano wrote: > > On Thu, 2 Nov 2017 05:53 am, Israel Brewster wrote: > > [...] >> So the end result is that the thread that "updates" the dictionary, and the >> thread that initially *populates* the dictionary are actually running in >> different processes. > > If they are in different processes, that would explain why the second > (non)thread sees an empty dict even after the first thread has populated it: > > > # from your previous post >> Length at get AC: 54 ID: 4524152200 Time: 2017-11-01 09:41:24.474788 >> Length At update: 1 ID: 4524152200 Time: 2017-11-01 09:41:24.784399 >> Length At update: 2 ID: 4524152200 Time: 2017-11-01 09:41:25.228853 > > > You cannot rely on IDs being unique across different processes. It's an > unfortunate coincidence(!) that they end up with the same ID. I think it's more than a coincidence, given that it is 100% reproducible. Plus, in an earlier debug test I was calling print() on the defaultdict object, which gives output like "defaultdict(<function <lambda> at 0x1066467f0>, {...})", where presumably the 0x1066467f0 is a memory address (correct me if I am wrong in that). In every case, that address was the same. So still a bit puzzling. > > Or possibly there's some sort of weird side-effect or bug in Flask that, when > it shares the dict between two processes (how?) it clears the dict. Well, it's UWSGI that is creating the processes, not Flask, but that's semantics :-) The real question though is "how does python handle such situations?" because, really, there would be no difference (I wouldn't think) between what is happening here and what is happening if you were to create a new process using the multiprocessing library and reference a variable created outside that process. In fact, I may have to try exactly that, just to see what happens. > > Or... have you considered the simplest option, that your update thread clears > the dict when it is first called? Since you haven't shared your code with us, > I cannot rule out a simple logic error like this:
>
>     def launch_update_thread(dict):
>         dict.clear()
>         # code to start update thread
>
Actually, I did share my code. It's towards the end of my original message. I cut stuff out for readability/length, but nothing having to do with the dictionary in question. So no, clear is never called, nor any other operation that could clear the dict. > > >> In fact, any given request could be in yet another >> process, which would seem to indicate that all bets are off as to what data >> is seen. >> >> Now that I've thought through what is really happening, I think I need to >> re-architect things a bit here. > > Indeed. I've been wondering why you are using threads at all, since there > doesn't seem to be any benefit to initialising the dict and updating it in > different threads. Now I learn that your architecture is even more complex. I > guess some of that is unavoidable, due to it being a web app, but still. What it boils down to is this: I need to update this dictionary in real time as data flows in. Having that update take place in a separate thread enables this update to happen without interfering with the operation of the web app, and offloads the responsibility for deciding when to switch to the OS. There *are* other ways to do this, such as using gevent greenlets or asyncio, but simply spinning off a separate thread is the easiest/simplest option, and since it is a long-running thread the overhead of spinning off the thread (as opposed to a gevent style interlacing) is of no consequence. 
As far as the initialization, that happens in response to a user request, at which point I am querying the data anyway (since the user asked for it). The idea is I already have the data, since the user asked for it, why not save it in this dict rather than waiting to update the dict until new data comes in? I could, of course, do a separate request for the data in the same thread that updates the dict, but there doesn't seem to be any purpose in that, since until someone requests the data, I don't need it for anything. > > >> For one thing, the update thread should be >> launched from the main process, not an arbitrary UWSGI worker. I had >> launched it from the client connection because there is no point in having >> it running if there is no one connected, but I may need to launch it from >> the __init__.py file instead. For another thing, since this dictionary will >> need to be accessed from arbitrary worker processes, I'm thinking I may need >> to move it to some sort of external storage, such as a redis database > > That sounds awful. What if the arbitrary worker
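The experiment suggested earlier in this post (creating a new process with the multiprocessing library and referencing a module-level variable from it) can be sketched in a few lines. This is a minimal, self-contained illustration rather than the web app's actual code, but it shows why each UWSGI worker ends up seeing its own copy of the dict:

from multiprocessing import Process

shared = {}  # module-level, like the defaultdict in the web app

def worker():
    # The child gets its own copy of the module state (or a fresh re-import
    # under the "spawn" start method); either way, changes made here never
    # reach the parent.
    shared["from_child"] = True
    print("child sees:", shared)

if __name__ == "__main__":
    shared["from_parent"] = True
    p = Process(target=worker)
    p.start()
    p.join()
    print("parent sees:", shared)  # no "from_child" key here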
Share unpickleable object across processes
I have a Flask/UWSGI web app that serves up web socket connections. When a web socket connection is created, I want to store a reference to said web socket so I can do things like write messages to every connected socket/disconnect various sockets/etc. UWSGI, however, launches multiple child processes which handle incoming connections, so the data structure that stores the socket connections needs to be shared across all said processes. How can I do this? Tried so far: 1) using a multiprocessing Manager(), from which I have gotten a dict(). This just gives me "BlockingIOError: [Errno 35] Resource temporarily unavailable" errors whenever I try to access the dict object. 2) Using redis/some other third-party store. This fails because it requires you to be able to pickle the object, and the web socket objects I'm getting are not picklable. In C I might do something like store a void pointer to the object, then cast it to the correct object type, but that's not an option in python. So how can I get around this issue? --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: Share unpickleable object across processes
On Nov 2, 2017, at 12:30 PM, Chris Angelico wrote: > > On Fri, Nov 3, 2017 at 5:54 AM, Israel Brewster wrote: >> I have a Flask/UWSGI web app that serves up web socket connections. When a >> web socket connection is created, I want to store a reference to said web >> socket so I can do things like write messages to every connected >> socket/disconnect various sockets/etc. UWSGI, however, launches multiple >> child processes which handle incoming connections, so the data structure >> that stores the socket connections needs to be shared across all said >> processes. How can I do this? >> > > You're basically going to need to have a single process that manages > all the socket connections. Do you actually NEED multiple processes to > do your work? If you can do it with multiple threads in a single > process, you'll be able to share your socket info easily. Otherwise, > you could have one process dedicated to managing the websockets, and > all the others message that process saying "please send this to all > processes". Ok, that makes sense, but again: it's UWSGI that creates the processes, not me. I'm not creating *any* processes or threads. Aside from telling UWSGI to only use a single worker, I have no control over what happens where. But maybe that's what I need to do? > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Share unpickleable object across processes
On Nov 2, 2017, at 11:15 AM, Stefan Ram wrote: > > Israel Brewster writes: >> the data structure that stores the socket connections needs >> to be shared across all said processes. > > IIRC that's the difference between threads and > processes: threads share a common memory. > > You can use the standard module mmap to share > data between processes. > > If it's not pickleable, but if you can write code > to serialize it to a text format yourself, you > can share that text representation via, er, sockets. If I could serialize it to a text format, then I could pickle said text format and store it in redis/some other third party store. :-) > >> In C I might do something like store a void pointer to the >> object, then cast it to the correct object type > > Restrictions of the OS or MMU even apply to > C code. Sure, I was just talking in general "ideas". I'm not saying I tried it or it would work. > >> , but that's not an option in python. So how can I get around >> this issue? > > You can always write parts of a CPython program > in C, for example, using Cython. True, but I just need to be able to share this piece of data - I don't want to reinvent the wheel just to write an app that uses web sockets! I *must* be thinking about this wrong. Take even a basic chat app that uses websockets. Client a, which connected to process 1, sends a message to the server. There are three other clients connected, each of which needs to receive said message. Given that the way UWSGI works said clients could have connected to any one of the worker processes, how can the server push the message out to *all* clients? What am I missing here? > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
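One concrete way to answer the broadcast question above, without ever pickling the socket objects themselves, is to let every worker keep only the sockets it accepted and push the *messages* through a shared channel such as redis pub/sub. The sketch below is an illustration of that pattern, not code from the thread; it assumes a local redis server, the redis-py package, and socket objects that expose a send() method:

import json
import threading
import redis

r = redis.Redis()      # assumes a redis server on localhost
local_sockets = []     # only the websockets accepted by *this* worker process

def publish(message):
    # Any worker can call this; only the JSON text crosses process boundaries.
    r.publish("chat", json.dumps(message))

def relay():
    # Each worker runs one listener thread and writes to the sockets it owns.
    pubsub = r.pubsub()
    pubsub.subscribe("chat")
    for item in pubsub.listen():
        if item["type"] != "message":
            continue
        text = item["data"].decode()
        for ws in list(local_sockets):
            try:
                ws.send(text)          # assumes the socket object has send()
            except Exception:
                local_sockets.remove(ws)

threading.Thread(target=relay, daemon=True).start()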
Re: Share unpickleable object across processes
On Nov 2, 2017, at 12:36 PM, Chris Angelico wrote: > > On Fri, Nov 3, 2017 at 7:35 AM, Israel Brewster <mailto:isr...@ravnalaska.net>> wrote: >> On Nov 2, 2017, at 12:30 PM, Chris Angelico wrote: >>> >>> On Fri, Nov 3, 2017 at 5:54 AM, Israel Brewster >>> wrote: >>>> I have a Flask/UWSGI web app that serves up web socket connections. When a >>>> web socket connection is created, I want to store a reference to said web >>>> socket so I can do things like write messages to every connected >>>> socket/disconnect various sockets/etc. UWSGI, however, launches multiple >>>> child processes which handle incoming connections, so the data structure >>>> that stores the socket connections needs to be shared across all said >>>> processes. How can I do this? >>>> >>> >>> You're basically going to need to have a single process that manages >>> all the socket connections. Do you actually NEED multiple processes to >>> do your work? If you can do it with multiple threads in a single >>> process, you'll be able to share your socket info easily. Otherwise, >>> you could have one process dedicated to managing the websockets, and >>> all the others message that process saying "please send this to all >>> processes". >> >> Ok, that makes sense, but again: it's UWSGI that creates the processes, not >> me. I'm not creating *any* processes or threads. Aside from telling UWSGI to >> only use a single worker, I have no control over what happens where. But >> maybe that's what I need to do? >> > > That's exactly what I mean, yeah. UWSGI should be able to be told to > use threads instead of processes. I don't know it in detail, but a > cursory look at the docos suggests that it's happy to use either (or > even both). Gotcha, thanks. The hesitation I have there is that the UWSGI config is a user setting. Sure, I can set up my install to only run one process, but what if someone else tries to use my code, and they set up UWSGI to run multiple? I hate the idea of my code being so fragile that a simple user setting change which I have no control over can break it. But it is what it is, and if that's the only option, I'll just put a note in the readme to NEVER, under any circumstances, set UWSGI to use multiple processes when running this app and call it good :-) > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list > <https://mail.python.org/mailman/listinfo/python-list> -- https://mail.python.org/mailman/listinfo/python-list
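If uWSGI is configured for a single process with multiple threads, as discussed above, a plain thread-safe registry is enough to share the connections. A minimal sketch follows; the send() method on the socket object is an assumption about whatever websocket wrapper is in use:

import threading

class SocketRegistry:
    """Process-local, lock-protected set of live websocket connections."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sockets = set()

    def add(self, ws):
        with self._lock:
            self._sockets.add(ws)

    def remove(self, ws):
        with self._lock:
            self._sockets.discard(ws)

    def broadcast(self, message):
        with self._lock:
            sockets = list(self._sockets)
        for ws in sockets:             # send outside the lock
            try:
                ws.send(message)
            except Exception:
                self.remove(ws)        # drop sockets that have gone away

registry = SocketRegistry()  # module level: shared by threads, not by processes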
Re: Thread safety issue (I think) with defaultdict
--- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > On Nov 3, 2017, at 7:11 AM, Rhodri James wrote: > > On 03/11/17 14:50, Chris Angelico wrote: >> On Fri, Nov 3, 2017 at 10:26 PM, Rhodri James wrote: >>> On 02/11/17 20:24, Chris Angelico wrote: >>>> >>>> Thank you. I've had this argument with many people, smart people (like >>>> Steven), people who haven't grokked that all concurrency has costs - >>>> that threads aren't magically more dangerous than other options. >>> >>> >>> I'm with Steven. To be fair, the danger with threads is that most people >>> don't understand thread-safety, and in particular don't understand either >>> that they have a responsibility to ensure that shared data access is done >>> properly or what the cost of that is. I've seen far too much thread-based >>> code over the years that would have been markedly less buggy and not much >>> slower if it had been written sequentially. >> Yes, but what you're seeing is that *concurrent* code is more >> complicated than *sequential* code. Would the code in question have >> been less buggy if it had used multiprocessing instead of >> multithreading? What if it used explicit yield points? > > My experience with situations where I can do a reasonable comparison is > limited, but the answer appears to be "Yes". > Multiprocessing >> brings with it a whole lot of extra complications around moving data >> around. > > People generally understand how to move data around, and the mistakes are > usually pretty obvious when they happen. I think the existence of this thread indicates otherwise :-) This mistake was far from obvious, and clearly I didn't understand properly how to move data around *between processes*. Unless you are just saying I am ignorant or something? :-) > People may not understand how to move data around efficiently, but that's a > separate argument. > > Multithreading brings with it a whole lot of extra >> complications around NOT moving data around. > > I think this involves more subtle bugs that are harder to spot. Again, the existence of this thread indicates otherwise. This bug was quite subtle and hard to spot. It was only when I started looking at how many times a given piece of code was called (specifically, the part that handled data coming in for which there wasn't an entry in the dictionary) that I spotted the problem. If I hadn't had logging in place in that code block, I would have never realized it wasn't working as intended. You don't get much more subtle than that. And, furthermore, it only existed because I *wasn't* using threads. This bug simply doesn't exist in a threaded model, only in a multiprocessing model. Yes, the *explanation* of the bug is simple enough - each process "sees" a different value, since memory isn't shared - but the bug in my code was neither obvious nor easy to spot, at least until you knew what was happening. > People seem to find it harder to reason about atomicity and realising that > widely separated pieces of code may interact unexpectedly. > > Yield points bring with >> them the risk of locking another thread out unexpectedly (particularly >> since certain system calls aren't async-friendly on certain OSes). > > I've got to admit I find coroutines straightforward, but I did cut my teeth > on a cooperative OS. It certainly makes the atomicity issues easier to deal > with. I still can't claim to understand them. Threads? No problem.
Obviously I'm still lacking some understanding of how data works in the multiprocessing model, however. > > All >> three models have their pitfalls. > > Assuredly. I just think threads are soggier and hard to light^W^W^W^W^W > prone to subtler and more mysterious-looking bugs. And yet, this thread exists because of a subtle and mysterious-looking bug with multiple *processes* that doesn't exist with multiple *threads*. Thus the point - threads are no *worse* - just different - than any other concurrency model. > > -- > Rhodri James *-* Kynesim Ltd > -- > https://mail.python.org/mailman/listinfo/python-list > <https://mail.python.org/mailman/listinfo/python-list> -- https://mail.python.org/mailman/listinfo/python-list
String matching based on sound?
I am working on a python program that, at one step, takes an input (string), and matches it to songs/artists in a users library. I'm having some difficulty, however, figuring out how to match when the input/library contains numbers/special characters. For example, take the group "All-4-One". In my library it might be listed exactly like that. I need to match this to ANY of the following inputs: • all-4-one (of course) • all 4 one (no dashes) • all 4 1 (all numbers) • all four one (all spelled out) • all for one Or, really, any other combination that sounds the same. The reasoning for this is that the input comes from a speech recognition system, so the user speaking, for example, "4", could be recognized as "for", "four" or "4". I'd imagine that Alexa/Siri/Google all do things like this (since you can ask them to play songs/artists), but I want to implement this in Python. In initial searching, I did find the "fuzzy" library, which at first glance appeared to be what I was looking for, but it, apparently, ignores numbers, with the result that "all 4 one" gave the same output as "all in", but NOT the same output as "all 4 1" - even though "all 4 1" sounds EXACTLY the same, while "all in" is only similar if you ignore the 4. So is there something similar that works with strings containing numbers? And that would only give me a match if the two strings sound identical? That is, even ignoring the numbers, I should NOT get a match between "all one" and "all in" - they are similar, but not identical, while "all one" and "all 1" would be identical. -- https://mail.python.org/mailman/listinfo/python-list
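A rough sketch of one normalize-then-compare approach: spell out digit tokens, split on punctuation, and compare a crude per-word phonetic key. The key() function here is deliberately simplistic and is only a stand-in; a real phonetic algorithm (Soundex or Metaphone, for example from the jellyfish package) could be dropped in instead:

import re

DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def words(text):
    # lowercase, split on anything that isn't a letter/digit, spell out digits
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    return [DIGITS.get(t, t) for t in tokens if t]

def key(word):
    # keep the first letter, drop later vowels, collapse repeated letters
    out = word[0]
    for c in word[1:]:
        if c in "aeiou":
            continue
        if c != out[-1]:
            out += c
    return out

def sounds_identical(a, b):
    return [key(w) for w in words(a)] == [key(w) for w in words(b)]

print(sounds_identical("All-4-One", "all four one"))  # True
print(sounds_identical("all 4 1", "all for one"))     # True
print(sounds_identical("all one", "all in"))          # False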
Packaging uwsgi flask app for non-programmers?
I have been working on writing an Alexa skill which, as part of it, requires a local web server on the end users machine - the Alexa skill sends commands to this server, which runs them on the local machine. I wrote this local server in Flask, and run it using uwsgi, using a command like: "uwsgi serverconfig.ini". The problem is that in order for this to work, the end user must: 1) Install python 3.6 (or thereabouts) 2) Install a number of python modules, and 3) run a command line (from the appropriate directory) Not terribly difficult, but when I think of my target audience (Alexa users), I could easily see even these steps being "too complicated". I was looking at pyinstaller to create a simple double-click application, but it appears that pyinstaller needs a python script as the "base" for the application, whereas my "base" is uwsgi. Also, I do need to leave a config file accessible for the end user to be able to edit. Is there a way to use pyinstaller in this scenario, or perhaps some other option that might work better to package things up? --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: Packaging uwsgi flask app for non-programmers?
On Feb 6, 2018, at 12:12 PM, Israel Brewster wrote: > > I have been working on writing an Alexa skill which, as part of it, requires > a local web server on the end users machine - the Alexa skill sends commands > to this server, which runs them on the local machine. I wrote this local > server in Flask, and run it using uwsgi, using a command like: "uwsgi > serverconfig.ini". > > The problem is that in order for this to work, the end user must: > > 1) Install python 3.6 (or thereabouts) > 2) Install a number of python modules, and > 3) run a command line (from the appropriate directory) > > Not terribly difficult, but when I think of my target audience (Alexa users), > I could easily see even these steps being "too complicated". I was looking at > pyinstaller to create a simple double-click application, but it appears that > pyinstaller needs a python script as the "base" for the application, whereas > my "base" is uwsgi. Also, I do need to leave a config file accessible for the > end user to be able to edit. Is there a way to use pyinstaller in this > scenario, or perhaps some other option that might work better to package > things up? A related question, should a way to create a full package not be available, would be Is there a way to do a "local" (as in, in the same directory) install of Python3.6, and to do it in such a way as I could script it from the shell (or python, whatever)? The idea would then be to basically set up a fully self-contained virtualenv on the users machine, such that they just have to run a "setup.sh" script or the like. BTW, this would be on a Mac - my local skill server works using AppleScript, so it's not actually portable to other OS's :-P > > --- > Israel Brewster > Systems Analyst II > Ravn Alaska > 5245 Airport Industrial Rd > Fairbanks, AK 99709 > (907) 450-7293 > --- > > > > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Packaging uwsgi flask app for non-programmers?
> On Feb 6, 2018, at 8:24 PM, Dennis Lee Bieber wrote: > > On Tue, 6 Feb 2018 12:12:26 -0900, Israel Brewster > declaimed the following: > >> I have been working on writing an Alexa skill which, as part of it, requires >> a local web server on the end users machine - the Alexa skill sends commands >> to this server, which runs them on the local machine. I wrote this local >> server in Flask, and run it using uwsgi, using a command like: "uwsgi >> serverconfig.ini". >> > > > >> Not terribly difficult, but when I think of my target audience (Alexa >> users), I could easily see even these steps being "too complicated". I was >> looking at pyinstaller to create a simple double-click application, but it >> appears that pyinstaller needs a python script as the "base" for the >> application, whereas my "base" is uwsgi. Also, I do need to leave a config >> file accessible for the end user to be able to edit. Is there a way to use >> pyinstaller in this scenario, or perhaps some other option that might work >> better to package things up? > > Not mentioned is getting your end-user to possibly have to open up > fire-wall rules to allow INBOUND connections (even if, somehow, limited to > LAN -- don't want to leave a WAN port open). Not mentioned because it's not needed - I establish a ngrok tunnel to provide external https access to the local server. I just include the ngrok binary with my package, and run it using subprocess.Popen. Since it doesn't even require you to have an account to use it, that bypasses the need to set up port-forwards and firewall rules quite nicely. Also solves the problem of dynamic IP's without having to burden the end user with dyndns or the like - I just "register" the URL you get when connecting. Admittedly though, that was a large concern of mine until I was pointed to ngrok as a solution. Ideally, I'd just access across the local network, Alexa device to local machine, but that's not an option - at least, not yet. > -- > Wulfraed Dennis Lee Bieber AF6VN >wlfr...@ix.netcom.comHTTP://wlfraed.home.netcom.com/ > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
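For reference, a sketch of the ngrok arrangement described above: launch the bundled binary with Popen and then read the public URL back from ngrok's local inspection API (on port 4040 by default). The tunnelled port (5000) and the binary path are assumptions for illustration, not values from the original setup:

import json
import subprocess
import time
import urllib.request

ngrok_proc = subprocess.Popen(["./ngrok", "http", "5000"])  # bundled binary

time.sleep(2)  # give ngrok a moment to establish the tunnel
with urllib.request.urlopen("http://127.0.0.1:4040/api/tunnels") as resp:
    tunnels = json.load(resp)

public_url = tunnels["tunnels"][0]["public_url"]
print("register this URL for the skill:", public_url)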
Re: Packaging uwsgi flask app for non-programmers?
> > On Feb 6, 2018, at 12:12 PM, Israel Brewster wrote: > > I have been working on writing an Alexa skill which, as part of it, requires > a local web server on the end users machine - the Alexa skill sends commands > to this server, which runs them on the local machine. I wrote this local > server in Flask, and run it using uwsgi, using a command like: "uwsgi > serverconfig.ini". > > The problem is that in order for this to work, the end user must: > > 1) Install python 3.6 (or thereabouts) > 2) Install a number of python modules, and > 3) run a command line (from the appropriate directory) > > Not terribly difficult, but when I think of my target audience (Alexa users), > I could easily see even these steps being "too complicated". I was looking at > pyinstaller to create a simple double-click application, but it appears that > pyinstaller needs a python script as the "base" for the application, whereas > my "base" is uwsgi. Also, I do need to leave a config file accessible for the > end user to be able to edit. Is there a way to use pyinstaller in this > scenario, or perhaps some other option that might work better to package > things up? So at the moment, since there have been no suggestions for packaging, I'm getting by with a bash script that: a) Makes sure python 3 is installed, prompting the user to install it if not b) Makes sure pip and virtualenv are installed, and installs them if needed c) Sets up a virtualenv in the distribution directory d) Installs all needed modules in the virtualenv - this step requires that dev tools are installed, a separate install. e) modifies the configuration files to match the user and directory, and f) Installs a launchd script to run the uwsgi application This actually seems to work fairly well, and by giving the script a .command extension, which automatically gets associated with terminal under OS X, the end user can simply double-click setup.command without having to go into terminal themselves. The main stumbling block then is the install of python3 - the user still has to manually download and install it in addition to my code, which I'd prefer to avoid - having to install my code separate from the Alexa skill is already an annoyance. As such, I'm considering three possible solutions: 1) Make some sort of installer package that includes the python3 installer 2) Somehow automate the download and install of Python3, or 3) re-write my code to be python 2 compatible (since python 2 is included with the OS) If anyone has any suggestions on how I could accomplish 1 or 2, I'd appreciate it. Thanks! --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > > --- > Israel Brewster > Systems Analyst II > Ravn Alaska > 5245 Airport Industrial Rd > Fairbanks, AK 99709 > (907) 450-7293 > --- > > > > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Packaging uwsgi flask app for non-programmers?
> On Feb 13, 2018, at 10:02 AM, Dan Stromberg wrote: > > On Tue, Feb 13, 2018 at 9:28 AM, Israel Brewster > wrote: >> As such, I'm considering three possible solutions: >> >> 1) Make some sort of installer package that includes the python3 installer >> 2) Somehow automate the download and install of Python3, or >> 3) re-write my code to be python 2 compatible (since python 2 is included >> with the OS) >> >> If anyone has any suggestions on how I could accomplish 1 or 2, I'd >> appreciate it. Thanks! > > Would using homebrew help? > > http://docs.python-guide.org/en/latest/starting/install3/osx/ > <http://docs.python-guide.org/en/latest/starting/install3/osx/> That's a thought. I could offer the user the option of either a) automatically installing homebrew and then installing python3 via homebrew, or b) manually downloading and running the python3 installer. > > BTW, you might use curl | bash to get the ball rolling. On that note, is there a fixed url that will always get the latest python3 installer? Of course, I might not want to do that, for example after 3.7 or 4.0 (I know, not for a while) is released, just in case something breaks with a newer release. > > I wouldn't recommend moving from 3.x to 2.x, unless perhaps you use a > common subset. Yeah, that idea kinda left a sour taste in my mouth, but I figured I'd throw it out there as it would solve the python install issue. --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Filtering XArray Datasets?
I have some large (>100GB) datasets loaded into memory in a two-dimensional (X and Y) NumPy-array-backed XArray dataset. At one point I want to filter the data using a boolean array created by performing a boolean operation on the dataset; that is, I want to filter the dataset for all points with a longitude value greater than, say, 50 and less than 60, just to give an example (hopefully that all makes sense?). Currently I am doing this by creating a boolean array (data[‘latitude’]>50, for example), and then applying that boolean array to the dataset using .where(), with drop=True. This appears to work, but has two issues: 1) It’s slow. On my large datasets, applying where can take several minutes (vs. just seconds to use a boolean array to index a similarly sized numpy array) 2) It uses large amounts of memory (which is REALLY a problem when the array is already using 100GB+) What it looks like is that values corresponding to True in the boolean array are copied to a new XArray object, thereby potentially doubling memory usage until it is complete, at which point the original object can be dropped, thereby freeing the memory. Is there any solution for these issues? Some way to do an in-place filtering? --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 -- https://mail.python.org/mailman/listinfo/python-list
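For comparison, a small sketch of the two approaches mentioned above, using made-up variable and dimension names: xarray's where(..., drop=True) versus boolean indexing on the backing NumPy arrays, which returns just the selected points and avoids building a second full-size, NaN-padded Dataset:

import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"radiance": (("y", "x"), np.random.rand(800, 1000))},
    coords={"longitude": (("y", "x"), np.random.uniform(40, 70, (800, 1000)))},
)

mask = (ds["longitude"] > 50) & (ds["longitude"] < 60)

# 1) xarray approach: label-aware, but materialises a new (NaN-padded) Dataset
subset = ds.where(mask, drop=True)

# 2) NumPy approach: 1-D arrays of just the matching points, no padded copy
values = ds["radiance"].values[mask.values]
lons = ds["longitude"].values[mask.values]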
Shapely Polygon creating empty polygon
I’m running into an issue with shapely that is baffling me. Perhaps someone here can help out? When running shapely directly from a python 3.8 interpreter, it works as expected: >>> import shapely >>> shapely.__version__ '1.8.0' >>> from shapely.geometry import Polygon >>> bounds = [-164.29635821669632, 54.64251856269729, -163.7631779798799, >>> 54.845450778742546] >>> print(Polygon.from_bounds(*bounds)) POLYGON ((-164.2963582166963 54.64251856269729, -164.2963582166963 54.84545077874255, -163.7631779798799 54.84545077874255, -163.7631779798799 54.64251856269729, -164.2963582166963 54.64251856269729)) However, if I put this exact same code into my Flask app (currently running under the Flask development environment) as part of handling a request, I get an empty polygon: >>>import shapely >>>print(shapely.__version__) >>>from shapely.geometry import Polygon >>>print(Polygon.from_bounds(*bounds)) Output: 1.8.0 POLYGON EMPTY In fact, *any* attempt to create a polygon gives the same result: >>> test = Polygon(((1, 1), (2, 1), (2, 2))) >>> print(test) POLYGON EMPTY What am I missing here? Why doesn’t it work as part of a Flask request call? --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 -- https://mail.python.org/mailman/listinfo/python-list
Re: Shapely Polygon creating empty polygon
Found it! Apparently, it’s an import order issue. This works: >>> from shapely.geometry import Polygon >>> from osgeo import osr >>> bounds = [-164.29635821669632, 54.64251856269729, -163.7631779798799, >>> 54.845450778742546] >>> print(Polygon.from_bounds(*bounds)) POLYGON ((-164.2963582166963 54.64251856269729, -164.2963582166963 54.84545077874255, -163.7631779798799 54.84545077874255, -163.7631779798799 54.64251856269729, -164.2963582166963 54.64251856269729)) But this doesn’t: >>> from osgeo import osr >>> from shapely.geometry import Polygon >>> bounds = [-164.29635821669632, 54.64251856269729, -163.7631779798799, >>> 54.845450778742546] >>> print(Polygon.from_bounds(*bounds)) POLYGON EMPTY …So apparently I have to make sure to import shapely *before* I import anything from osgeo. Why? I have no idea... --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 > On Jan 4, 2022, at 1:57 PM, Israel Brewster wrote: > > I’m running into an issue with shapely that is baffling me. Perhaps someone > here can help out? > > When running shapely directly from a python 3.8 interpreter, it works as > expected: > > >>> import shapely > >>> shapely.__version__ > '1.8.0' > >>> from shapely.geometry import Polygon > >>> bounds = [-164.29635821669632, 54.64251856269729, -163.7631779798799, > >>> 54.845450778742546] > >>> print(Polygon.from_bounds(*bounds)) > POLYGON ((-164.2963582166963 54.64251856269729, -164.2963582166963 > 54.84545077874255, -163.7631779798799 54.84545077874255, -163.7631779798799 > 54.64251856269729, -164.2963582166963 54.64251856269729)) > > However, if I put this exact same code into my Flask app (currently running > under the Flask development environment) as part of handling a request, I get > an empty polygon: > > >>>import shapely > >>>print(shapely.__version__) > >>>from shapely.geometry import Polygon > >>>print(Polygon.from_bounds(*bounds)) > > Output: > > 1.8.0 > POLYGON EMPTY > > In fact, *any* attempt to create a polygon gives the same result: > >>> test = Polygon(((1, 1), (2, 1), (2, 2))) > >>> print(test) > POLYGON EMPTY > > What am I missing here? Why doesn’t it work as part of a Flask request call? > --- > Israel Brewster > Software Engineer > Alaska Volcano Observatory > Geophysical Institute - UAF > 2156 Koyukuk Drive > Fairbanks AK 99775-7320 > Work: 907-474-5172 > cell: 907-328-9145 > -- https://mail.python.org/mailman/listinfo/python-list
Testing POST in cherrypy
When testing CherryPy using a cherrypy.test.helper.CPWebCase subclass, I can test a page request by calling "self.getPage()", and in that call I can specify a method (GET/POST etc). When specifying a POST, how do I pass the parameters? I know for a POST the parameters are in the body of the request, but in what format? Do I just urllib.urlencode() a dictionary and pass that as the body, or is there some other method I should use? Thanks! ------- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
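The usual pattern, to the best of my knowledge, is to form-encode the dictionary, pass it as the body, and set the Content-Type and Content-Length headers yourself. A sketch (the endpoint name and parameters are invented for illustration):

import urllib.parse
import cherrypy
from cherrypy.test import helper

class PostTest(helper.CPWebCase):
    @staticmethod
    def setup_server():
        class Root:
            @cherrypy.expose
            def endpoint(self, name=None):
                return "got %s" % name
        cherrypy.tree.mount(Root())

    def test_post(self):
        # form-encode the parameters and send them as the request body
        body = urllib.parse.urlencode({"name": "value"})
        headers = [("Content-Type", "application/x-www-form-urlencoded"),
                   ("Content-Length", str(len(body)))]
        self.getPage("/endpoint", method="POST", body=body, headers=headers)
        self.assertStatus("200 OK")
        self.assertBody("got value")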
Error handling in context managers
I generally use context managers for my SQL database connections, so I can just write code like: with psql_cursor() as cursor: ... And the context manager takes care of making a connection (or getting a connection from a pool, more likely), and cleaning up after the fact (such as putting the connection back in the pool), even if something goes wrong. Simple, elegant, and works well. The problem is that, from time to time, I can't get a connection, the result being that cursor is None, and attempting to use it results in an AttributeError. So my instinctive reaction is to wrap the potentially offending code in a try block, such that if I get that AttributeError I can decide how I want to handle the "no connection" case. This, of course, results in code like: try: with psql_cursor() as cursor: ... except AttributeError as e: ... I could also wrap the code within the context manager in an if block checking for if cursor is not None, but while perhaps a bit clearer as to the purpose, now I've got an extra check that will not be needed most of the time (albeit a quite inexpensive check). The difficulty I have with either of these solutions, however, is that they feel ugly to me - and wrapping the context manager in a try block almost seems to defeat the purpose of the context manager in the first place - If I'm going to be catching errors anyway, why not just do the cleanup there rather than hiding it in the context manager? Now don't get me wrong - neither of these issues is terribly significant to me. I'll happily wrap all the context manager calls in a try block and move on with life if that is in fact the best option. It's just my gut says "there should be a better way", so I figured I'd ask: *is* there a better way? Perhaps some way I could handle the error internally to the context manager, such that it just dumps me back out? Of course, that might not work, given that I may need to do something different *after* the context manager, depending on if I was able to get a connection, but it's a thought. Options? --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: Error handling in context managers
On Jan 16, 2017, at 1:27 PM, Terry Reedy wrote: > > On 1/16/2017 1:06 PM, Israel Brewster wrote: >> I generally use context managers for my SQL database connections, so I can >> just write code like: >> >> with psql_cursor() as cursor: >> >> >> And the context manager takes care of making a connection (or getting a >> connection from a pool, more likely), and cleaning up after the fact (such >> as putting the connection back in the pool), even if something goes wrong. >> Simple, elegant, and works well. >> >> The problem is that, from time to time, I can't get a connection, the result >> being that cursor is None, > > This would be like open('bad file') returning None instead of raising > FileNotFoundError. > >> and attempting to use it results in an AttributeError. > > Just as None.read would. > > Actually, I have to wonder about your claim. The with statement would look > for cursor.__enter__ and then cursor.__exit__, and None does not have those > methods. In other words, the expression following 'with' must evaluate to a > context manager and None is not a context manager. > > >>> with None: pass > > Traceback (most recent call last): > File "", line 1, in >with None: pass > AttributeError: __enter__ > > Is psql_cursor() returning a fake None object with __enter__ and __exit__ > methods? No, the *context manager*, which I call in the with *does* have __enter__ and __exit__ methods. It's just that the __enter__ method returns None when it can't get a connection. So the expression following with *does* evaluate to a context manager, but the expression following as evaluates to None. --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > > -- > Terry Jan Reedy > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Error handling in context managers
On Jan 16, 2017, at 8:01 PM, Gregory Ewing wrote: > > Israel Brewster wrote: >> The problem is that, from time to time, I can't get a connection, the result >> being that cursor is None, > > That's your problem right there -- you want a better-behaved > version of psql_cursor(). > > def get_psql_cursor(): > c = psql_cursor() > if c is None: > raise CantGetAConnectionError() > return c > > with get_psql_cursor() as c: > ... > Ok, fair enough. So I get a better exception, raised at the proper time. This is, in fact, better - but doesn't actually change how I would *handle* the exception :-) --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > -- > Greg > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Error handling in context managers
On Jan 16, 2017, at 11:34 PM, Peter Otten <__pete...@web.de> wrote: > > Gregory Ewing wrote: > >> Israel Brewster wrote: >>> The problem is that, from time to time, I can't get a connection, the >>> result being that cursor is None, >> >> That's your problem right there -- you want a better-behaved >> version of psql_cursor(). >> >> def get_psql_cursor(): >>c = psql_cursor() >>if c is None: >> raise CantGetAConnectionError() >>return c >> >> with get_psql_cursor() as c: >>... > > You still need to catch the error -- which leads to option (3) in my zoo, > the only one that is actually usable. If one contextmanager cannot achieve > what you want, use two: > > $ cat conditional_context_raise.py > import sys > from contextlib import contextmanager > > class NoConnection(Exception): >pass > > class Cursor: >def execute(self, sql): >print("EXECUTING", sql) > > @contextmanager > def cursor(): >if "--fail" in sys.argv: >raise NoConnection >yield Cursor() > > @contextmanager > def no_connection(): >try: >yield >except NoConnection: >print("no connection") > > with no_connection(), cursor() as cs: >cs.execute("insert into...") > $ python3 conditional_context_raise.py > EXECUTING insert into... > $ python3 conditional_context_raise.py --fail > no connection > > If you want to ignore the no-connection case use > contextlib.suppress(NoConnection) instead of the custom no_connection() > manager. Fun :-) I'll have to play around with that. Thanks! :-) --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
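The suppress() variant mentioned at the end of that reply, reusing the NoConnection and cursor() definitions from the example above, for the case where a missing connection should simply skip the block:

from contextlib import suppress

with suppress(NoConnection), cursor() as cs:
    cs.execute("insert into...")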
AssertionError without traceback?
I have a flask application deployed on CentOS 7 using Python 3.6.7 and uwsgi 2.0.17.1, proxied behind nginx. uwsgi is configured to listen on a socket in /tmp. The app uses gevent and the flask_uwsgi_websockets plugin as well as various other third-party modules, all installed via pip in a virtualenv. The environment was set up using pip just a couple of days ago, so everything should be fully up-to-date. The application *appears* to be running properly (it is in moderate use and there have been no reports of issues, nor has my testing turned up any problems), however I keep getting entries like the following in the error log: AssertionError 2019-01-14T19:16:32Z failed with AssertionError There is no additional information provided, just that. I was running the same app (checked out from a GIT repository, so exact same code) on CentOS 6 for years without issue, it was only since I moved to CentOS 7 that I've seen the errors. I have not so far been able to correlate this error with any specific request. Has anyone seen anything like this before such that you can give me some pointers to fixing this? As the application *appears* to be functioning normally, it may not be a big issue, but it has locked up once since the move (no errors in the log, just not responding on the socket), so I am a bit concerned. --- Israel Brewster Systems Analyst II 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: AssertionError without traceback?
--- Israel Brewster Systems Analyst II 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- On Jan 14, 2019, at 10:40 PM, dieter <die...@handshake.de> wrote: Israel Brewster <ibrews...@flyravn.com> writes: I have a flask application deployed on CentOS 7 using Python 3.6.7 and uwsgi 2.0.17.1, proxied behind nginx. uwsgi is configured to listen on a socket in /tmp. The app uses gevent and the flask_uwsgi_websockets plugin as well as various other third-party modules, all installed via pip in a virtualenv. The environment was set up using pip just a couple of days ago, so everything should be fully up-to-date. The application *appears* to be running properly (it is in moderate use and there have been no reports of issues, nor has my testing turned up any problems), however I keep getting entries like the following in the error log: AssertionError 2019-01-14T19:16:32Z failed with AssertionError I would try to find out where the log message above has been generated and ensure it does not only log the information above but also the associated traceback. Any tips as to how? I tried putting in additional logging at a couple places where I called gevent.spawn() to see if that additional logging lined up with the assertions, but no luck. I guess I could just start peppering my code with logging commands, and hope something pops, but this seems quite...inelegant. I have not been able to reproduce the error, unfortunately. I assume that the log comes from some framework -- maybe "uwsgi" or "gevent". It is a weakness to log exceptions without the associated traceback. Fully agreed on both points. The reference to the callback for some reason puts me in mind of C code, but of course AssertionError is python, so maybe not. For what it's worth, the issue only seems to happen when the server is under relatively heavy load. During the night, when it is mostly idle, I don't get many (if any) errors. And this has only been happening since I upgraded to CentOS7 and the latest versions of all the frameworks. Hopefully it isn't a version incompatibility... There is no additional information provided, just that. I was running the same app (checked out from a GIT repository, so exact same code) on CentOS 6 for years without issue, it was only since I moved to CentOS 7 that I've seen the errors. I have not so far been able to correlate this error with any specific request. Has anyone seen anything like this before such that you can give me some pointers to fixing this? As the application *appears* to be functioning normally, it may not be a big issue, but it has locked up once since the move (no errors in the log, just not responding on the socket), so I am a bit concerned. --- Israel Brewster Systems Analyst II 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: AssertionError without traceback?
> On Jan 14, 2019, at 10:40 PM, dieter wrote: > > Israel Brewster writes: >> I have a flask application deployed on CentOS 7 using Python 3.6.7 and uwsgi >> 2.0.17.1, proxied behind nginx. uwsgi is configured to listen on a socket in >> /tmp. The app uses gevent and the flask_uwsgi_websockets plugin as well as >> various other third-party modules, all installed via pip in a virtualenv. >> The environment was set up using pip just a couple of days ago, so >> everything should be fully up-to-date. The application *appears* to be >> running properly (it is in moderate use and there have been no reports of >> issues, nor has my testing turned up any problems), however I keep getting >> entries like the following in the error log: >> >> AssertionError >> 2019-01-14T19:16:32Z failed with >> AssertionError > > I would try to find out where the log message above has been generated > and ensure it does not only log the information above but also the > associated traceback. > > I assume that the log comes from some framework -- maybe "uwsgi" > or "gevent". It is a weakness to log exceptions without the > associated traceback. After extensive debugging, it would appear the issue arises due to a combination of my use of gevent.spawn to run a certain function, and the portion of that function that sends web socket messages. If I remove either the gevent.spawn and just call the function directly, or keep the gevent.spawn but don't try to send any messages via web sockets, the error goes away. With the combination, I *occasionally* get the message - most of the time it works. So I guess I just run everything synchronously for now, and as long as the performance isn't hurt noticeably, call it good. I still find it strange that this never happened on CentOS 6, but whatever. The gevent.spawn calls were probably premature optimization anyway. > >> There is no additional information provided, just that. I was running the >> same app (checked out from a GIT repository, so exact same code) on CentOS 6 >> for years without issue, it was only since I moved to CentOS 7 that I've >> seen the errors. I have not so far been able to correlate this error with >> any specific request. Has anyone seen anything like this before such that >> you can give me some pointers to fixing this? As the application *appears* >> to be functioning normally, it may not be a big issue, but it has locked up >> once since the move (no errors in the log, just not responding on the >> socket), so I am a bit concerned. >> --- >> Israel Brewster >> Systems Analyst II >> 5245 Airport Industrial Rd >> Fairbanks, AK 99709 >> (907) 450-7293 >> --- > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
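Not from the original code, but one way to make failures inside spawned greenlets surface with a full traceback instead of a bare one-line entry is to link an exception handler to each greenlet; a minimal sketch:

import logging
import gevent

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger(__name__)

def report_failure(greenlet):
    # greenlet.exception is the exception instance raised inside the greenlet;
    # passing it as exc_info makes logging include the traceback
    log.error("background task failed", exc_info=greenlet.exception)

def send_updates():
    assert False, "simulated failure"

task = gevent.spawn(send_updates)
task.link_exception(report_failure)
gevent.joinall([task])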
Multiprocessing performance question
I have the following code running in python 3.7: def create_box(x_y): return geometry.box(x_y[0] - 1, x_y[1], x_y[0], x_y[1] - 1) x_range = range(1, 1001) y_range = range(1, 801) x_y_range = list(itertools.product(x_range, y_range)) grid = list(map(create_box, x_y_range)) Which creates and populates an 800x1000 “grid” (represented as a flat list at this point) of “boxes”, where a box is a shapely.geometry.box(). This takes about 10 seconds to run. Looking at this, I am thinking it would lend itself well to parallelization. Since the box at each “coordinate" is independent of all others, it seems I should be able to simply split the list up into chunks and process each chunk in parallel on a separate core. To that end, I created a multiprocessing pool: pool = multiprocessing.Pool() And then called pool.map() rather than just “map”. Somewhat to my surprise, the execution time was virtually identical. Given the simplicity of my code, and the presumable ease with which it should be able to be parallelized, what could explain why the performance did not improve at all when moving from the single-process map() to the multiprocess map()? I am aware that in python3, the map function doesn’t actually produce a result until needed, but that’s why I wrapped everything in calls to list(), at least for testing. --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 -- https://mail.python.org/mailman/listinfo/python-list
Re: Multiprocessing performance question
> On Feb 18, 2019, at 6:37 PM, Ben Finney wrote: > > I don't have anything to add regarding your experiments with > multiprocessing, but: > > Israel Brewster writes: > >> Which creates and populates an 800x1000 “grid” (represented as a flat >> list at this point) of “boxes”, where a box is a >> shapely.geometry.box(). This takes about 10 seconds to run. > > This seems like the kind of task NumPy http://www.numpy.org/> is > designed to address: Generating and manipulating large-to-huge arrays of > numbers, especially numbers that are representable directly in the > machine's basic number types (such as moderate-scale integers). > > Have you tried using that library and timing the result? Sort of. I am using that library, and in fact once I get the result I am converting it to a NumPy array for further use/processing, however I am still a NumPy newbie and have not been able to find a function that generates a numpy array from a function. There is the numpy.fromfunction() command, of course, but “…the function is called with … each parameter representing the coordinates of the array varying along a specific axis…”, which basically means (if my understanding/inital testing is correct) that my function would need to work with *arrays* of x,y coordinates. But the geometry.box() function needs individual x,y coordinates, not arrays, so I’d have to loop through the arrays and append to a new one or something to produce the output that numpy needs, which puts me back pretty much to the same code I already have. There may be a way to make it work, but so far I haven’t been able to figure it out any better than the code I’ve got followed by converting to a numpy array. You do bring up a good point though: there is quite possibly a better way to do this, and knowing that would be just as good as knowing why multiprocessing doesn’t improve performance. Thanks! --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 > > -- > \ “You don't need a book of any description to help you have some | > `\kind of moral awareness.” —Dr. Francesca Stavrakoloulou, bible | > _o__) scholar, 2011-05-08 | > Ben Finney > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Multiprocessing performance question
Actually not a ’toy example’ at all. It is simply the first step in gridding some data I am working with - a problem that is solved by tools like SatPy, but unfortunately I can’t use SatPy because it doesn’t recognize my file format, and you can’t load data directly. Writing a custom file importer for SatPy is probably my next step. That said, the entire process took around 60 seconds to run. As this step was taking 10, I figured it would be low-hanging fruit for speeding up the process. Obviously I was wrong. For what it’s worth, I did manage to re-factor the code, so instead of generating the entire grid up-front, I generate the boxes as needed to calculate the overlap with the data grid. This brought the processing time down to around 40 seconds, so a definite improvement there. --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 > On Feb 20, 2019, at 4:30 PM, DL Neil wrote: > > George > > On 21/02/19 1:15 PM, george trojan wrote: >> def create_box(x_y): >> return geometry.box(x_y[0] - 1, x_y[1], x_y[0], x_y[1] - 1) >> x_range = range(1, 1001) >> y_range = range(1, 801) >> x_y_range = list(itertools.product(x_range, y_range)) >> grid = list(map(create_box, x_y_range)) >> Which creates and populates an 800x1000 “grid” (represented as a flat list >> at this point) of “boxes”, where a box is a shapely.geometry.box(). This >> takes about 10 seconds to run. >> Looking at this, I am thinking it would lend itself well to >> parallelization. Since the box at each “coordinate" is independent of all >> others, it seems I should be able to simply split the list up into chunks >> and process each chunk in parallel on a separate core. To that end, I >> created a multiprocessing pool: > > > I recall a similar discussion when folk were being encouraged to move away > from monolithic and straight-line processing to modular functions - it is > more (CPU-time) efficient to run in a straight line; than it is to repeatedly > call, set-up, execute, and return-from a function or sub-routine! ie there is > an over-head to many/all constructs! > > Isn't the 'problem' that it is a 'toy example'? That the amount of computing > within each parallel process is small in relation to the inherent 'overhead'. > > Thus, if the code performed a reasonable analytical task within each box > after it had been defined (increased CPU load), would you then notice the > expected difference between the single- and multi-process implementations? > > > > From AKL to AK > -- > Regards =dn > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
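For what it's worth, one standard knob for exactly this kind of per-item overhead is the chunksize argument to Pool.map/imap, which hands each worker large batches instead of one coordinate pair at a time. A sketch of the original example with a large chunksize; note that the 800,000 returned geometries still have to be pickled back to the parent, so this may still not beat the plain map():

import itertools
import multiprocessing

from shapely import geometry

def create_box(x_y):
    return geometry.box(x_y[0] - 1, x_y[1], x_y[0], x_y[1] - 1)

if __name__ == "__main__":
    x_y_range = list(itertools.product(range(1, 1001), range(1, 801)))
    with multiprocessing.Pool() as pool:
        # batching cuts the per-task pickling/queueing overhead
        grid = pool.map(create_box, x_y_range, chunksize=20000)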
uWSGI with Qt for Python
I’m working on a Qt for Python app that needs to run a local web server. For the web server portion I’m using Flask and uWSGI, and at the moment I have my application launching uWSGI using subprocess before firing off the Qt QApplication instance and entering the Qt event loop. Some sample code to illustrate the process: if __name__ == “__main__”: CUR_DIRECTORY = os.path.dirname(__file__) UWSGI_CONFIG = os.path.realpath(os.path.join(CUR_DIRECTORY, 'Other Files/TROPOMI.ini')) UWSGI_EXE = os.path.realpath(os.path.join(CUR_DIRECTORY, 'bin/uwsgi')) uwsgi_proc = subprocess.Popen([UWSGI_EXE, UWSGI_CONFIG]) qt_app = QApplication(sys.argv) …. res = qt_app.exec_() Now this works, but it strikes me as kinda kludgy, as the uWSGI server is effectively a separate application that has to be managed alongside my own. More to the point, however, it’s a bit fragile, in that if the main application crashes (really, ANY sort of unclean exit), you get stray uWSGI processes hanging around that prevent proper functioning of the app the next time you try to launch it. Unfortunately as the app is still in early days, this happens occasionally. So I have two questions: 1) Is there a “better way”? This GitHub repo: https://github.com/unbit/uwsgi-qtloop seems to indicate that it should be possible to run a Qt event loop from within a uWSGI app, thus eliminating the extra “subprocess” spinoff, but it hasn’t been updated in 5 years and I have been unable to get it to work with my current Qt/Python/OS setup 2) Barring any “better way”, is there a way to at least ensure that the subprocess is killed in the event of parent death, or alternately to look for and kill any such lingering processes on application startup? P.S. The purpose of running the web server is to be able to load and use Plotly charts in my app (via a QWebEngineView). So a “better way” may be using a different plotting library that can essentially “cut out” the middle man. I’ve tried Matplotlib, but I found its performance to be worse than Plotly - given the size of my data sets, performance matters. Also I had some glitches with it when using a lasso selector (plot going black). Still, with some work, it may be an option. --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 -- https://mail.python.org/mailman/listinfo/python-list
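For question 2, a best-effort sketch (reusing the UWSGI_EXE and UWSGI_CONFIG names from the snippet above): register an atexit hook that terminates the spawned uWSGI process. This covers normal interpreter exits but not hard crashes, so it only narrows the window:

import atexit
import subprocess

uwsgi_proc = subprocess.Popen([UWSGI_EXE, UWSGI_CONFIG])

def _stop_uwsgi():
    if uwsgi_proc.poll() is None:          # still running
        uwsgi_proc.terminate()
        try:
            uwsgi_proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            uwsgi_proc.kill()

atexit.register(_stop_uwsgi)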
Re: uWSGI with Qt for Python
Never mind this request. I realized that for what I am doing, the web server was unnecessary. I could just load local HTML files directly into the QWebEngineView with no need of an intermediate server. Thanks anyway, and sorry for the noise! --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 > On Mar 13, 2019, at 1:42 PM, Israel Brewster wrote: > > I’m working on a Qt for python app that needs to run a local web server. For > the web server portion I’m using flask and uWISGI, and at the moment I have > my application launching uWISGI using subprocess before firing off the Qt > QApplication instance and entering the Qt event loop. Some sample code to > illustrate the process: > > If __name__ ==“__main__”: > CUR_DIRECTORY = os.path.dirname(__file__) > > UWSGI_CONFIG = os.path.realpath(os.path.join(CUR_DIRECTORY, 'Other > Files/TROPOMI.ini')) > UWSGI_EXE = os.path.realpath(os.path.join(CUR_DIRECTORY, 'bin/uwsgi')) > uwsgi_proc = subprocess.Popen([UWSGI_EXE, UWSGI_CONFIG]) > > qt_app = QApplication(sys.argv) > …. > res = qt_app.exec_() > > > Now this works, but it strikes me as kinda kludgy, as the uWISGI is > effectively a separate application needed. More to the point, however, it’s a > bit fragile, in that if the main application crashes (really, ANY sort of > unclean exit), you get stray uWISGI processes hanging around that prevent > proper functioning of the app the next time you try to launch it. > Unfortunately as the app is still in early days, this happens occasionally. > So I have two questions: > > 1) Is there a “better way”? This GitHub repo: > https://github.com/unbit/uwsgi-qtloop <https://github.com/unbit/uwsgi-qtloop> > seems to indicate that it should be possible to run a Qt event loop from > within a uWSGI app, thus eliminating the extra “subprocess” spinoff, but it > hasn’t been updated in 5 years and I have been unable to get it to work with > my current Qt/Python/OS setup > > 2) Baring any “better way”, is there a way to at least ensure that the > subprocess is killed in the event of parent death, or alternately to look for > and kill any such lingering processes on application startup? > > P.S. The purpose of running the web server is to be able to load and use > Plotly charts in my app (via a QWebEngineView). So a “better way” may be > using a different plotting library that can essentially “cut out” the middle > man. I’ve tried Matplotlib, but I found its performance to be worse than > Plotly - given the size of my data sets, performance matters. Also I had some > glitches with it when using a lasso selector (plot going black). Still, with > some work, it may be an option. > > --- > Israel Brewster > Software Engineer > Alaska Volcano Observatory > Geophysical Institute - UAF > 2156 Koyukuk Drive > Fairbanks AK 99775-7320 > Work: 907-474-5172 > cell: 907-328-9145 > -- https://mail.python.org/mailman/listinfo/python-list
Multiprocessing and memory management
I have a script that benefits greatly from multiprocessing (it’s generating a bunch of images from data). Of course, as expected each process uses a chunk of memory, and the more processes there are, the more memory used. The amount used per process can vary from around 3 GB (yes, gigabytes) to over 40 or 50 GB, depending on the amount of data being processed (usually closer to 10GB, the 40/50 is fairly rare). This puts me in a position of needing to balance the number of processes with memory usage, such that I maximize resource utilization (running one process at a time would simply take WAY too long) while not overloading RAM (which at best would slow things down due to swap).

Obviously this process will be run on a machine with lots of RAM, but as I don’t know how large the datasets that will be fed to it are, I wanted to see if I could build some intelligence into the program such that it doesn’t overload the memory. A couple of approaches I thought of:

1) Determine the total amount of RAM in the machine (how?), assume an average of 10GB per process, and only launch as many processes as calculated to fit. Easy, but would run the risk of under-utilizing the processing capabilities and taking longer to run if most of the processes were using significantly less than 10GB.

2) Somehow monitor the memory usage of the various processes, and if one process needs a lot, pause the others until that one is complete. Of course, I’m not sure if this is even possible.

3) Other approaches?

--- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 -- https://mail.python.org/mailman/listinfo/python-list
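For approach 1, a minimal sketch of sizing the pool from physical RAM. It assumes psutil (a third-party package) for the memory query and uses the 10 GB per-process estimate from the post; the numbers and names are illustrative:

import multiprocessing as mp

import psutil  # third-party; assumed available

PER_PROCESS_ESTIMATE = 10 * 1024 ** 3  # rough 10 GB per worker, per the estimate above

def worker_count():
    # Pick a worker count that should fit in physical RAM, capped at the CPU count
    total = psutil.virtual_memory().total
    by_memory = max(1, total // PER_PROCESS_ESTIMATE)
    return int(min(by_memory, mp.cpu_count()))

# pool = mp.Pool(processes=worker_count())

For approach 2, psutil.Process(pid).memory_info().rss can report per-worker usage, but pausing and resuming workers safely is a larger design problem than a few lines can show.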
Proper way to pass Queue to process when using multiprocessing.imap()?
When using pool.imap to apply a function over a list of values, what is the proper way to pass additional arguments to the function, specifically in my case a Queue that the process can use to communicate back to the main thread (for the purpose of reporting progress)? I have seen suggestions of using starmap, but this doesn’t appear to have a “lazy” variant, which I have found to be very beneficial in my use case. The Queue is the same one for all processes, if that makes a difference. I could just make the Queue global, but I have always been told not to. Perhaps this is an exception? --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 -- https://mail.python.org/mailman/listinfo/python-list
Re: Proper way to pass Queue to process when using multiprocessing.imap()?
> > On Sep 3, 2019, at 10:49 AM, Peter Otten <__pete...@web.de> wrote: > > Israel Brewster wrote: > >> When using pool.imap to apply a function over a list of values, what is >> the proper way to pass additional arguments to the function, specifically >> in my case a Queue that the process can use to communicate back to the >> main thread (for the purpose of reporting progress)? I have seen >> suggestions of using starmap, but this doesn’t appear to have a “lazy” >> variant, which I have found to be very beneficial in my use case. The >> Queue is the same one for all processes, if that makes a difference. >> >> I could just make the Queue global, but I have always been told not too. >> Perhaps this is an exception? > > How about wrapping the function into another function that takes only one > argument? A concise way is to do that with functools.partial(): > > def f(value, queue): ... > > pool.imap(partial(f, queue=...), values) That looks like exactly what I was looking for. I’ll give it a shot. Thanks! --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 > > > >> >> --- >> Israel Brewster >> Software Engineer >> Alaska Volcano Observatory >> Geophysical Institute - UAF >> 2156 Koyukuk Drive >> Fairbanks AK 99775-7320 >> Work: 907-474-5172 >> cell: 907-328-9145 >> > > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
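A runnable sketch of the partial() technique Peter describes, using an ordinary picklable extra argument (a prefix string) so that imap stays lazy. Note that, as a follow-up later in this thread shows, a multiprocessing.Queue specifically cannot be passed this way:

import multiprocessing as mp
from functools import partial

def render(value, prefix):
    # 'prefix' is the extra argument bound by partial()
    return f"{prefix}{value * 2}"

if __name__ == "__main__":
    with mp.Pool() as pool:
        # partial() turns the two-argument function into the one-argument
        # function that imap expects, without losing laziness
        for result in pool.imap(partial(render, prefix="result: "), range(10)):
            print(result)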
Re: Proper way to pass Queue to process when using multiprocessing.imap()?
> > On Sep 3, 2019, at 9:27 AM, Rob Gaddi > wrote: > > On 9/3/19 10:17 AM, Israel Brewster wrote: >> When using pool.imap to apply a function over a list of values, what is the >> proper way to pass additional arguments to the function, specifically in my >> case a Queue that the process can use to communicate back to the main thread >> (for the purpose of reporting progress)? I have seen suggestions of using >> starmap, but this doesn’t appear to have a “lazy” variant, which I have >> found to be very beneficial in my use case. The Queue is the same one for >> all processes, if that makes a difference. >> I could just make the Queue global, but I have always been told not too. >> Perhaps this is an exception? >> --- >> Israel Brewster >> Software Engineer >> Alaska Volcano Observatory >> Geophysical Institute - UAF >> 2156 Koyukuk Drive >> Fairbanks AK 99775-7320 >> Work: 907-474-5172 >> cell: 907-328-9145 > > The first rule is to never use global variables. The second is to never put > too much stock in sweeping generalizations. So long as you can keep that > Queue's usage pattern fairly well constrained, go ahead and make it global. > > One thing to think about that might make this all easier though; have you > looked at the concurrent.futures module? I find it does a fantastic job of > handling this sort of parallelization in a straightforward way. I’ve only briefly looked at it in other situations. I’ll go ahead and take another look for this one. Thanks for the suggestion! --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 > > -- > Rob Gaddi, Highland Technology -- www.highlandtechnology.com > Email address domain is currently out of order. See above to fix. > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Proper way to pass Queue to process when using multiprocessing.imap()?
> > On Sep 3, 2019, at 11:09 AM, Israel Brewster wrote: > >> >> On Sep 3, 2019, at 10:49 AM, Peter Otten <__pete...@web.de> wrote: >> >> Israel Brewster wrote: >> >>> When using pool.imap to apply a function over a list of values, what is >>> the proper way to pass additional arguments to the function, specifically >>> in my case a Queue that the process can use to communicate back to the >>> main thread (for the purpose of reporting progress)? I have seen >>> suggestions of using starmap, but this doesn’t appear to have a “lazy” >>> variant, which I have found to be very beneficial in my use case. The >>> Queue is the same one for all processes, if that makes a difference. >>> >>> I could just make the Queue global, but I have always been told not too. >>> Perhaps this is an exception? >> >> How about wrapping the function into another function that takes only one >> argument? A concise way is to do that with functools.partial(): >> >> def f(value, queue): ... >> >> pool.imap(partial(f, queue=...), values) > > That looks like exactly what I was looking for. I’ll give it a shot. Thanks! So as it turns out, this doesn’t work after all. I get an error stating that “Queue objects should only be shared between processes through inheritance”. Still a good technique to know though! --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 > > --- > Israel Brewster > Software Engineer > Alaska Volcano Observatory > Geophysical Institute - UAF > 2156 Koyukuk Drive > Fairbanks AK 99775-7320 > Work: 907-474-5172 > cell: 907-328-9145 > >> >> >> >>> >>> --- >>> Israel Brewster >>> Software Engineer >>> Alaska Volcano Observatory >>> Geophysical Institute - UAF >>> 2156 Koyukuk Drive >>> Fairbanks AK 99775-7320 >>> Work: 907-474-5172 >>> cell: 907-328-9145 >>> >> >> >> -- >> https://mail.python.org/mailman/listinfo/python-list >> <https://mail.python.org/mailman/listinfo/python-list> -- https://mail.python.org/mailman/listinfo/python-list
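For completeness, one workaround that the thread does not discuss: a manager-backed queue is a picklable proxy object, so it can be passed as a normal argument (at the cost of routing traffic through the manager process). A sketch under that assumption:

import multiprocessing as mp
from functools import partial

def work(value, queue):
    queue.put(f"starting {value}")  # progress report back to the main process
    return value * 2

if __name__ == "__main__":
    with mp.Manager() as manager:
        progress_queue = manager.Queue()  # proxy object; safe to pickle, unlike mp.Queue()
        with mp.Pool() as pool:
            for result in pool.imap(partial(work, queue=progress_queue), range(10)):
                while not progress_queue.empty():
                    print(progress_queue.get())
                print(f"**{result}**")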
Make warning an exception?
I was running some code and I saw this pop up in the console: 2019-12-06 11:53:54.087 Python[85524:39651849] WARNING: nextEventMatchingMask should only be called from the Main Thread! This will throw an exception in the future. The only problem is, I have no idea what is generating that warning - I never call nextEventMatchingMask directly, so it must be getting called from one of the libraries I’m calling. Is there some way I can force python to throw an exception now, so my debugger can catch it and let me know where in my code the originating call is? I’ve tried stepping through the obvious options, with no luck so far. --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 -- https://mail.python.org/mailman/listinfo/python-list
Multiprocessing, join(), and crashed processes
In a number of places I have constructs where I launch several processes using the multiprocessing library, then loop through said processes calling join() on each one to wait until they are all complete. In general, this works well, with the *apparent* exception of if something causes one of the child processes to crash (not throw an exception, actually crash). In that event, it appears that the call to join() hangs indefinitely. How can I best handle this? Should I put a timeout on the join, and put it in a loop, such that every 5 seconds or so it breaks, checks to see if the process is still actually running, and if so goes back and calls join again? Or is there a better option to say “wait until this process is done, however long that may be, unless it crashes”? --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 -- https://mail.python.org/mailman/listinfo/python-list
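A sketch of the timeout-and-check loop proposed above. exitcode is None while a child is alive, and a negative value means the child died from that signal number (for example -11 for SIGSEGV); the names are illustrative:

import multiprocessing as mp

def wait_for(processes, poll_interval=5.0):
    # join() each process, but notice children that crashed rather than exited
    crashed = []
    for proc in processes:
        while True:
            proc.join(timeout=poll_interval)
            if not proc.is_alive():
                break  # finished normally, or died and join() simply never returned
        if proc.exitcode != 0:
            # negative exitcode: killed by that signal; positive: sys.exit(n)
            crashed.append((proc.name, proc.exitcode))
    return crashed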
Multiprocessing queue sharing and python3.8
Under python 3.7 (and all previous versions I have used), the following code works properly, and produces the expected output:

import multiprocessing as mp

mp_comm_queue = None #Will be initialized in the main function
mp_comm_queue2=mp.Queue() #Test pre-initialized as well

def some_complex_function(x):
    print(id(mp_comm_queue2))
    assert(mp_comm_queue is not None)
    print(x)
    return x*2

def main():
    global mp_comm_queue
    #initialize the Queue
    mp_comm_queue=mp.Queue()

    #Set up a pool to process a bunch of stuff in parallel
    pool=mp.Pool()
    values=range(20)
    data=pool.imap(some_complex_function,values)

    for val in data:
        print(f"**{val}**")

if __name__=="__main__":
    main()

- mp_comm_queue2 has the same ID for all iterations of some_complex_function, and the assert passes (mp_comm_queue is not None). However, under python 3.8, it fails - mp_comm_queue2 is a *different* object for each iteration, and the assert fails.

So what am I doing wrong with the above example block? Assuming that it broke in 3.8 because I wasn’t sharing the Queue properly, what is the proper way to share a Queue object among multiple processes for the purposes of inter-process communication? The documentation (https://docs.python.org/3.8/library/multiprocessing.html#exchanging-objects-between-processes) appears to indicate that I should pass the queue as an argument to the function to be executed in parallel, however that fails as well (on ALL versions of python I have tried) with the error:

Traceback (most recent call last):
  File "test_multi.py", line 32, in <module>
    main()
  File "test_multi.py", line 28, in main
    for val in data:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/queues.py", line 58, in __getstate__
    context.assert_spawning(self)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py", line 356, in assert_spawning
    ' through inheritance' % type(obj).__name__
RuntimeError: Queue objects should only be shared between processes through inheritance

after I add the following to the code to try passing the queue rather than having it global:

#Try by passing queue
values=[(x,mp_comm_queue) for x in range(20)]
data=pool.imap(some_complex_function,values)

for val in data:
    print(f"**{val}**")

So if I can’t pass it as an argument, and having it global is incorrect (at least starting with 3.8), what is the proper method of getting multiprocessing queues to child processes?

--- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 -- https://mail.python.org/mailman/listinfo/python-list
Re: Multiprocessing queue sharing and python3.8
> On Apr 6, 2020, at 12:27 PM, David Raymond wrote: > > Looks like this will get what you need. > > > def some_complex_function(x): >global q >#stuff using q > > def pool_init(q2): >global q >q = q2 > > def main(): >#initalize the Queue >mp_comm_queue = mp.Queue() > >#Set up a pool to process a bunch of stuff in parallel >pool = mp.Pool(initializer = pool_init, initargs = (mp_comm_queue,)) >... > > Gotcha, thanks. I’ll look more into that initializer argument and see how I can leverage it to do multiprocessing using spawn rather than fork in the future. Looks straight-forward enough. Thanks again! --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 > > -Original Message- > From: David Raymond > Sent: Monday, April 6, 2020 4:19 PM > To: python-list@python.org > Subject: RE: Multiprocessing queue sharing and python3.8 > > Attempting reply as much for my own understanding. > > Are you on Mac? I think this is the pertinent bit for you: > Changed in version 3.8: On macOS, the spawn start method is now the default. > The fork start method should be considered unsafe as it can lead to crashes > of the subprocess. See bpo-33725. > > When you start a new process (with the spawn method) it runs the module just > like it's being imported. So your global " mp_comm_queue2=mp.Queue()" creates > a new Queue in each process. Your initialization of mp_comm_queue is also > done inside the main() function, which doesn't get run in each process. So > each process in the Pool is going to have mp_comm_queue as None, and have its > own version of mp_comm_queue2. The ID being the same or different is the > result of one or more processes in the Pool being used repeatedly for the > multiple steps in imap, probably because the function that the Pool is > executing finishes so quickly. > > Add a little extra info to the print calls (and/or set up logging to stdout > with the process name/id included) and you can see some of this. Here's the > hacked together changes I did for that. > > import multiprocessing as mp > import os > > mp_comm_queue = None #Will be initalized in the main function > mp_comm_queue2 = mp.Queue() #Test pre-initalized as well > > def some_complex_function(x): >print("proc id", os.getpid()) >print("mp_comm_queue", mp_comm_queue) >print("queue2 id", id(mp_comm_queue2)) >mp_comm_queue2.put(x) >print("queue size", mp_comm_queue2.qsize()) >print("x", x) >return x * 2 > > def main(): >global mp_comm_queue >#initalize the Queue >mp_comm_queue = mp.Queue() > >#Set up a pool to process a bunch of stuff in parallel >pool = mp.Pool() >values = range(20) >data = pool.imap(some_complex_function, values) > >for val in data: >print(f"**{val}**") >print("final queue2 size", mp_comm_queue2.qsize()) > > if __name__ == "__main__": >main() > > > > When making your own Process object and stating it then the Queue should be > passed into the function as an argument, yes. The error text seems to be part > of the Pool implementation, which I'm not as familiar with enough to know the > best way to handle it. 
(Probably something using the "initializer" and > "initargs" arguments for Pool)(maybe) > > > > -Original Message- > From: Python-list > On Behalf Of Israel Brewster > Sent: Monday, April 6, 2020 1:24 PM > To: Python > Subject: Multiprocessing queue sharing and python3.8 > > Under python 3.7 (and all previous versions I have used), the following code > works properly, and produces the expected output: > > import multiprocessing as mp > > mp_comm_queue = None #Will be initalized in the main function > mp_comm_queue2=mp.Queue() #Test pre-initalized as well > > def some_complex_function(x): >print(id(mp_comm_queue2)) >assert(mp_comm_queue is not None) >print(x) >return x*2 > > def main(): >global mp_comm_queue >#initalize the Queue >mp_comm_queue=mp.Queue() > >#Set up a pool to process a bunch of stuff in parallel >pool=mp.Pool() >values=range(20) >data=pool.imap(some_complex_function,values) > >for val in data: >print(f"**{val}**") > > if __name__=="__main__": >main() > > - mp_comm_queue2 has the same ID for all iter
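Since the archive flattens the quoted code, here is a cleaned-up, runnable version of the initializer approach suggested above. The worker receives the queue once, at pool start-up, so it works with both the fork and spawn start methods:

import multiprocessing as mp

queue = None  # set in each worker process by pool_init()

def pool_init(q):
    # Runs once in every worker process; stashes the shared queue in a global
    global queue
    queue = q

def some_complex_function(x):
    queue.put(x)  # report progress back to the main process
    return x * 2

def main():
    mp_comm_queue = mp.Queue()
    pool = mp.Pool(initializer=pool_init, initargs=(mp_comm_queue,))
    for val in pool.imap(some_complex_function, range(20)):
        print(f"**{val}**")
    pool.close()
    pool.join()
    while not mp_comm_queue.empty():
        print("progress:", mp_comm_queue.get())

if __name__ == "__main__":
    main()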
Re: Multiprocessing queue sharing and python3.8
> On Apr 6, 2020, at 12:19 PM, David Raymond wrote: > > Attempting reply as much for my own understanding. > > Are you on Mac? I think this is the pertinent bit for you: > Changed in version 3.8: On macOS, the spawn start method is now the default. > The fork start method should be considered unsafe as it can lead to crashes > of the subprocess. See bpo-33725. Ahhh, yep, that would do it! Using spawn rather than fork completely explains all the issues I was suddenly seeing. Didn’t even occur to me that the os I was running might make a difference. And yes, forcing it back to using fork does indeed “fix” the issue. Of course, as is noted there, the fork start method should be considered unsafe, so I guess I get to re-architect everything I do using multiprocessing that relies on data-sharing between processes. The Queue example was just a minimum working example that illustrated the behavioral differences I was seeing :-) Thanks for the pointer! --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 > > When you start a new process (with the spawn method) it runs the module just > like it's being imported. So your global " mp_comm_queue2=mp.Queue()" creates > a new Queue in each process. Your initialization of mp_comm_queue is also > done inside the main() function, which doesn't get run in each process. So > each process in the Pool is going to have mp_comm_queue as None, and have its > own version of mp_comm_queue2. The ID being the same or different is the > result of one or more processes in the Pool being used repeatedly for the > multiple steps in imap, probably because the function that the Pool is > executing finishes so quickly. > > Add a little extra info to the print calls (and/or set up logging to stdout > with the process name/id included) and you can see some of this. Here's the > hacked together changes I did for that. > > import multiprocessing as mp > import os > > mp_comm_queue = None #Will be initalized in the main function > mp_comm_queue2 = mp.Queue() #Test pre-initalized as well > > def some_complex_function(x): >print("proc id", os.getpid()) >print("mp_comm_queue", mp_comm_queue) >print("queue2 id", id(mp_comm_queue2)) >mp_comm_queue2.put(x) >print("queue size", mp_comm_queue2.qsize()) >print("x", x) >return x * 2 > > def main(): >global mp_comm_queue >#initalize the Queue >mp_comm_queue = mp.Queue() > >#Set up a pool to process a bunch of stuff in parallel >pool = mp.Pool() >values = range(20) >data = pool.imap(some_complex_function, values) > >for val in data: >print(f"**{val}**") >print("final queue2 size", mp_comm_queue2.qsize()) > > if __name__ == "__main__": >main() > > > > When making your own Process object and stating it then the Queue should be > passed into the function as an argument, yes. The error text seems to be part > of the Pool implementation, which I'm not as familiar with enough to know the > best way to handle it. 
(Probably something using the "initializer" and > "initargs" arguments for Pool)(maybe) > > > > -Original Message- > From: Python-list <mailto:python-list-bounces+david.raymond=tomtom@python.org>> On Behalf > Of Israel Brewster > Sent: Monday, April 6, 2020 1:24 PM > To: Python mailto:python-list@python.org>> > Subject: Multiprocessing queue sharing and python3.8 > > Under python 3.7 (and all previous versions I have used), the following code > works properly, and produces the expected output: > > import multiprocessing as mp > > mp_comm_queue = None #Will be initalized in the main function > mp_comm_queue2=mp.Queue() #Test pre-initalized as well > > def some_complex_function(x): >print(id(mp_comm_queue2)) >assert(mp_comm_queue is not None) >print(x) >return x*2 > > def main(): >global mp_comm_queue >#initalize the Queue >mp_comm_queue=mp.Queue() > >#Set up a pool to process a bunch of stuff in parallel >pool=mp.Pool() >values=range(20) >data=pool.imap(some_complex_function,values) > >for val in data: >print(f"**{val}**") > > if __name__=="__main__": >main() > > - mp_comm_queue2 has the same ID for all iterations of some_complex_function, > and the assert passes (mp_comm_queue is not None). However, under python 3.8, > it fails - mp_comm_queue2 is a *diff
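If the old behavior is needed as a stopgap, the start method can also be chosen explicitly, with the caveat already noted that fork is considered unsafe on macOS from 3.8 on. A sketch using get_context(), which keeps the choice local to one pool instead of changing the process-wide default:

import multiprocessing as mp

def work(x):
    return x * 2

if __name__ == "__main__":
    # Option 1: per-pool context (does not affect the rest of the program)
    ctx = mp.get_context("fork")   # or "spawn"
    with ctx.Pool() as pool:
        print(list(pool.imap(work, range(5))))

    # Option 2: process-wide default; may only be called once per program
    # mp.set_start_method("fork")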
Re: Python concurrent.futures.ProcessPoolExecutor
> On Dec 16, 2020, at 7:04 AM, Rob Rosengard wrote: > > Warning: I am new to this group > Warning: I am not an expert at Python, I've written a few small programs, > and spend 20 hours of online classes, and maybe a book or two. > Warning: I am new to trying to use concurrent.futures.ProcessPoolExecutor > - Prior to writing this question I updated to Python 3.9 and PyCharm 2020.3. > And confirmed the problem still exists. > - Running on Windows 10 Professional > - I've been trying to run a simple piece of code to exactly match what I have > seen done in various training videos. By I am getting a different and > unexpected set of results. I.e. the instructor got different results than I > did on my computer. My code is very simple: > > import concurrent.futures > import time > > > start = time.perf_counter() > > > def task(myarg): >print(f'Sleeping one second...{myarg}') >time.sleep(1) >return 'Done sleeping...' > > > if __name__ == '__main__': >with concurrent.futures.ProcessPoolExecutor() as executor: >future1 = executor.submit(task, 1) >future2 = executor.submit(task, 2) > finish = time.perf_counter() > print(f'Finished in {round(finish-start,2)} seconds') > > And the output is: > Finished in 0.0 seconds > Finished in 0.0 seconds > Sleeping one second...1 > Sleeping one second...2 > Finished in 1.14 seconds > > Process finished with exit code 0 > > --- > QUESTIONS and CONCERNS that I have... > It seems that both calls to task not only runs that function, but then keeps > executing the rest of the main line code. I only expected it to run the > function and then immediately quit/disappear. That is, I expect the output > to look like (i.e. not having the three lines of "Finished in x.x seconds", > rather, just one line like that): > Sleeping one second...1 > Sleeping one second...2 > Finished in 1.14 seconds > > Goal: I need the executor tasks to only run that one function, and then > completely go away and stop. Not keep executing more code that doesn't > belong to the task function. > > I've tried many iterations of this issue, and placed PRINT statements all > over to try to track what is going on. And I used if/else statements in the > main code, which caused even more problems. I.e. both the IF and the ELSE > was executed each time through the code. Which completely blows my mind. > > Any thoughts on this? Thanks for your time and help! Assuming the code above is indented exactly as you run it, you have an indentation error. That is, the finish and print() are not indented to be part of the if __name__… call. As such, they run on import. When you launch a new process, it imports the module, which then runs those lines, since they are not guarded by the if statement. Indent those last two lines to be under the if (they don’t need to be indented to be under the with, just the if), and it should work as intended. --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 > > R > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
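A sketch of the corrected layout described in the reply above: the timing and the final print sit under the if __name__ == '__main__' guard, and the start timestamp is moved inside it as well so worker processes don't re-run it on import (names follow the quoted example):

import concurrent.futures
import time

def task(myarg):
    print(f'Sleeping one second...{myarg}')
    time.sleep(1)
    return 'Done sleeping...'

if __name__ == '__main__':
    start = time.perf_counter()
    # The with-block waits for both futures to finish before exiting
    with concurrent.futures.ProcessPoolExecutor() as executor:
        future1 = executor.submit(task, 1)
        future2 = executor.submit(task, 2)
    finish = time.perf_counter()
    print(f'Finished in {round(finish - start, 2)} seconds')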
Handling transactions in Python DBI module
I am working on implementing a Python DB API module, and am hoping I can get some help with figuring out the workflow of handling transactions. In my experience (primarily with psycopg2) the workflow goes like this: - When you open a connection (or is it when you get a cursor? I *think* it is on opening a connection), a new transaction is started - When you close a connection, an implicit ROLLBACK is performed - After issuing SQL statements that modify the database, you call commit() on the CONNECTION object, not the cursor. My primary confusion is that at least for the DB I am working on, to start/rollback/commit a transaction, you execute the appropriate SQL statement (the c library I'm using doesn't have any transactional commands, not that it should). However, to execute the statement, you need a cursor. So how is this *typically* handled? Does the connection object keep an internal cursor that it uses to manage transactions? I'm assuming, since it is called on the connection, not the cursor, that any COMMIT/ROLLBACK commands called affect all cursors on that connection. Is that correct? Or is this DB specific? Finally, how do other DB API modules, like psycopg2, ensure that ROLLBACK is called if the user never explicitly calls close()? Thanks for any assistance that can be provided. --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: Handling transactions in Python DBI module
On Feb 10, 2016, at 8:14 PM, Chris Angelico wrote: > > On Thu, Feb 11, 2016 at 4:06 PM, Frank Millman wrote: >> A connection has 2 possible states - 'in transaction', or 'not in >> transaction'. When you create the connection it starts off as 'not'. >> >> When you call cur.execute(), it checks to see what state it is in. If the >> state is 'not', it silently issues a 'BEGIN TRANSACTION' before executing >> your statement. This applies for SELECT as well as other statements. >> >> All subsequent statements form part of the transaction, until you issue >> either conn.commit() or conn.rollback(). This performs the required action, >> and resets the state to 'not'. >> >> I learned the hard way that it is important to use conn.commit() and not >> cur.execute('commit'). Both succeed in committing, but the second does not >> reset the state, therefore the next statement does not trigger a 'BEGIN', >> with possible unfortunate side-effects. > > When I advise my students on basic databasing concepts, I recommend > this structure: > > conn = psycopg2.connect(...) > > with conn, conn.cursor() as cur: >cur.execute(...) And that is the structure I tend to use in my programs as well. I could, of course, roll the transaction control into that structure. However, that is a usage choice of the end user, whereas I am looking at the design of the connection/cursor itself. If I use psycopg, I get the transaction - even if I don't use a with block. > > The transaction block should always start at the 'with' block and end > when it exits. As long as you never nest them (including calling other > database-using functions from inside that block), it's easy to reason > about the database units of work - they always correspond perfectly to > the code blocks. > > Personally, I'd much rather the structure were "with > conn.transaction() as cur:", because I've never been able to > adequately explain what a cursor is/does. It's also a bit weird that > "with conn:" doesn't close the connection at the end (just closes the > transaction within that connection). But I guess we don't need a > "Python DB API 3.0". In my mind, cursors are simply query objects containing (potentially) result sets - so you could have two cursors, and loop through them something like "for result_1,result_2 in zip(cursor_1,cursor_2): ". Personally, I've never had a need for more than one cursor, but if you are working with large data sets, and need to work with multiple queries simultaneously without the overhead of loading the results into memory, I could see them being useful. Of course, someone else might have a completely different explanation :-) > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Handling transactions in Python DBI module
On Feb 10, 2016, at 8:06 PM, Frank Millman wrote: > > "Israel Brewster" wrote in message > news:92d3c964-0323-46ee-b770-b89e7e7e6...@ravnalaska.net... > >> I am working on implementing a Python DB API module, and am hoping I can get >> some help with figuring out the workflow of handling transactions. In my >> experience (primarily with >> psycopg2) the workflow goes like this: >> >> - When you open a connection (or is it when you get a cursor? I *think* it >> is on opening a connection), a new transaction is started >> - When you close a connection, an implicit ROLLBACK is performed >> - After issuing SQL statements that modify the database, you call commit() >> on the CONNECTION object, not the cursor. >> >> My primary confusion is that at least for the DB I am working on, to >> start/rollback/commit a transaction, you execute the appropriate SQL >> statement (the c library I'm using doesn't >> have any transactional commands, not that it should). However, to execute >> the statement, you need a cursor. So how is this *typically* handled? Does >> the connection object keep an > internal cursor that it uses to manage >> transactions? >> >> I'm assuming, since it is called on the connection, not the cursor, that any >> COMMIT/ROLLBACK commands called affect all cursors on that connection. Is >> that correct? Or is this DB >> specific? >> >> Finally, how do other DB API modules, like psycopg2, ensure that ROLLBACK is >> called if the user never explicitly calls close()? > > Rather than try to answer your questions point-by-point, I will describe the > results of some investigations I carried out into this subject a while ago. > > I currently support 3 databases, so I use 3 DB API modules - > PostgreSQL/psycopg2, Sql Server/pyodbc, and sqlite3/sqlite3. The following > applies specifically to psycopg2, but I applied the lessons learned to the > other 2 as well, and have had no issues. > > A connection has 2 possible states - 'in transaction', or 'not in > transaction'. When you create the connection it starts off as 'not'. > > When you call cur.execute(), it checks to see what state it is in. If the > state is 'not', it silently issues a 'BEGIN TRANSACTION' before executing > your statement. This applies for SELECT as well as other statements. > > All subsequent statements form part of the transaction, until you issue > either conn.commit() or conn.rollback(). This performs the required action, > and resets the state to 'not'. > > I learned the hard way that it is important to use conn.commit() and not > cur.execute('commit'). Both succeed in committing, but the second does not > reset the state, therefore the next statement does not trigger a 'BEGIN', > with possible unfortunate side-effects. Thanks - that is actually quite helpful. So the way I am looking at it now is that the connection would have an internal cursor as I suggested. From your response, I'll add a "state" flag as well. If the state flag is not set when execute is called on a cursor, the cursor itself will start a transaction and set the flag (this could happen from any cursor, though, so that could potentially cause a race condition, correct?). In any case, there is now a transaction open, until such a time as commit() or rollback() is called on the connection, or close is called, which executes a rollback(), using the connection's internal cursor. Hopefully that all sounds kosher. > > HTH > > Frank Millman > > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
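A minimal sketch of the internal-cursor-plus-state-flag design described above. The _execute_raw() and _close_raw() methods are invented placeholders standing in for the real C-library calls:

class Connection:
    def __init__(self):
        self._in_transaction = False
        self._tx_cursor = Cursor(self)  # internal cursor used only for COMMIT/ROLLBACK

    def cursor(self):
        return Cursor(self)

    def commit(self):
        if self._in_transaction:
            self._tx_cursor._execute_raw("COMMIT")
            self._in_transaction = False

    def rollback(self):
        if self._in_transaction:
            self._tx_cursor._execute_raw("ROLLBACK")
            self._in_transaction = False

    def close(self):
        self.rollback()   # implicit rollback of any open transaction
        self._close_raw()

    def _close_raw(self):
        pass  # placeholder for the real library's disconnect call


class Cursor:
    def __init__(self, connection):
        self.connection = connection

    def execute(self, sql, params=None):
        # Lazily open a transaction the first time any cursor executes a statement
        if not self.connection._in_transaction:
            self._execute_raw("BEGIN TRANSACTION")
            self.connection._in_transaction = True
        return self._execute_raw(sql, params)

    def _execute_raw(self, sql, params=None):
        pass  # placeholder for the real library's execute call

If cursors can be used from multiple threads, wrapping the state check and the BEGIN in a threading.Lock would address the race condition raised above.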
CFFI distribution questions
So I have a python module that I have written which uses CFFI to link against a C library I have compiled. Specifically, it is a database driver for the 4th Dimension database, using an open-source C library distributed by the 4D company. I have tested the module and C code on a couple of different platforms, but I have a few questions regarding where to go from here.

1) Currently, I am manually compiling the C library, and placing it in a subfolder of the module. So the top level of the module directory contains the python code and the __init__.py file, and then there is a sub-directory (lib4d_sql) containing the C code and compiled C library. I point CFFI to the library using a construct like this:

_CWD = os.path.dirname(os.path.realpath(__file__))
ffi.verifier(..., library_dirs=["{}/lib4d_sql/".format(_CWD)], ...)

Obviously I have left out a lot of code there. Is this sort of construct considered kosher? Or is there a better way to say "this directory relative to your location"? I can't just use relative paths, because that would be relative to the execution directory (I think).

2) What is the proper way to compile and distribute the C library with the python module? The examples I've found about distributing a CFFI module all assume you are using some brain-dead-simple built-in C command where you don't have to worry about compiling or installing a library. I found some documentation related to building C and C++ extensions with distutils, and following that managed to get the library to compile, but I can't figure out what it does with the compiled library, or how to get it into the proper location relative to my module. I also found some more "raw" distutil code here: https://coderwall.com/p/mjrepq/easy-static-shared-libraries-with-distutils that I managed to use by overriding the finalize_options function of the setuptools install class (using the cmdclass option of setup), and this allowed me to build the library in the proper location in the tmp install directory, but it doesn't seem to keep the library when installing the module in the final location. So far, the only way I have found to work around that is by including a "dummy" copy of the library in the package_data option to setup, such that when the library is built it replaces this dummy file. Is there a better/more canonical way to do this?

3) The majority of the setup/packaging procedure I pulled from here: https://caremad.io/2014/11/distributing-a-cffi-project/ and it seems to work - mostly. The one problem I am running into is with the implicit compile that CFFI does. Some of the code (most, really) on that page is designed to work around this by doing the compile on install so it doesn't have to be done at runtime, however this doesn't appear to be working for me. I see signs that it is doing the compile at install time, however when I try to run the module it still tries to compile at that time, an operation that fails due to lack of permissions. Might anyone have some thoughts on how I can fix this?

Finally, let me ask you guys this: is distributing this as a CFFI module even the right approach? Or should I be looking at something else entirely? C code isn't needed for speed - the database itself will be way slower than the interface code. It's just that the driver is distributed as C code and I didn't want to have to reverse engineer their code and re-write it all.

I've read over https://docs.python.org/2/extending/extending.html, but I think I'm missing something when it comes to actually interfacing with python, and dealing with the non-python types (pointers to a struct, for example) that the C library uses. Would I have to put 100% of my code, including things like the cursor and connection classes, in C code? When I call the connect function from python (which should return an instance of a connection class), what exactly should my C function return? All the examples on that page show basic C types, not pointers to class instances or the like. Maybe I just need to read over the documentation a few more times :-)

Thanks for any help anyone can provide on any of these questions! :-) Sorry for being so long-winded.

--- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Cherrypy - prevent browser "prefetch"?
I don't know if this is a cherrypy specific question (although it will be implemented in cherrypy for sure), or more of a general http protocol question, but when using cherrypy to serve a web app, is there any way to prevent browser prefetch? I'm running into a problem, specifically from Safari on the Mac, where I start to type a URL, and Safari auto-fills the rest of a random URL matching what I started to type, and simultaneously sends a request for that URL to my server, occasionally causing unwanted effects.

For example, I have a URL on my Cherrypy app that updates some local caches. It is accessed at http:///admin/updatecaches. So if I start typing http:///a, for example, safari may auto-fill the "dmin/updatecaches", and trigger a cache refresh on the server - even though I was just trying to get to the main admin page at /admin. Or, it might auto-fill "uth/logout" instead (http:///auth/logout), and log me out of my session. While the former may be acceptable (after all, a cache update, even if not strictly needed, is at least non-harmful), the latter could cause serious issues with usability. So how can cherrypy tell the difference between the "prefetch" and an actual request, and not respond to the prefetch?

--- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: Cherrypy - prevent browser "prefetch"?
On Dec 1, 2014, at 12:50 PM, Ned Batchelder wrote: > On 12/1/14 4:26 PM, Tim Chase wrote: >> On 2014-12-01 11:28, Israel Brewster wrote: >>> I don't know if this is a cherrypy specific question (although it >>> will be implemented in cherrypy for sure), or more of a general >>> http protocol question, but when using cherrypy to serve a web app, >>> is there anyway to prevent browser prefetch? I'm running to a >>> problem, specifically from Safari on the Mac, where I start to type >>> a URL, and Safari auto-fills the rest of a random URL matching what >>> I started to type, and simultaneously sends a request for that URL >>> to my server, occasionally causing unwanted effects. >> >> All this to also say that performing non-idempotent actions on a GET >> request is just begging for trouble. ;-) >> > > This is the key point: your web application shouldn't be doing these kinds of > actions in response to a GET request. Make them POST requests, and Safari > won't give you any trouble. > > Trying to stop Safari from making the GET requests might work for Safari, but > then you will find another browser, or a proxy server, or an edge-caching > accelerator, etc, that makes the GET requests when you don't want them. > > The way to indicate to a browser that it shouldn't pre-fetch a URL is to make > it a POST request. Ok, that makes sense. The only difficulty I have with that answer is that to the best of my knowledge the only way to make a HTML link do a POST is to use the onclick function to run a javascript, while having the "link" itself point to nothing. Just feels a bit ugly to me, but if that's the Right Way™ to do it, then that's fine. Thanks! > > -- > Ned Batchelder, http://nedbatchelder.com > > -- > https://mail.python.org/mailman/listinfo/python-list --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: Cherrypy - prevent browser "prefetch"?
On Dec 1, 2014, at 1:12 PM, Tim Chase wrote: > On 2014-12-01 16:50, Ned Batchelder wrote: >> On 12/1/14 4:26 PM, Tim Chase wrote: >>> All this to also say that performing non-idempotent actions on a >>> GET request is just begging for trouble. ;-) >> >> This is the key point: your web application shouldn't be doing >> these kinds of actions in response to a GET request. Make them >> POST requests, and Safari won't give you any trouble. > > Though to be fair, based on the reading I did, Safari also pulls in > the various JS and executes it too, meaning that merely > (pre)"viewing" the page triggers any Google Analytics (or other > analytics) code you have on that page, sending "page views" with a > high bounce rate (looks like you only hit one page and never browsed > elsewhere on the site). > > Additionally, if the target GET URL involves high processing load on > the server, it might be worthwhile to put a caching proxy in front of > it to serve (semi)stale data for any preview request rather than > impose additional load on the server just so a preview can be updated. Right, and there are probably some URL's in my app where this may be the case - I still need to go back and audit the code now that I'm aware of this going on. In general, though, it does sound as though changing things to POST requests, and disallowing GET requests for those URLS in my CherryPy app is the way to go. Thanks! > > So I can see at least two cases in which you might want to sniff the > "are you just previewing, or do you actually want the page" > information. Perhaps there are more. > > -tkc > > > > -- > https://mail.python.org/mailman/listinfo/python-list --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: Cherrypy - prevent browser "prefetch"?
> On Dec 2, 2014, at 4:33 AM, random...@fastmail.us wrote: > > On Mon, Dec 1, 2014, at 15:28, Israel Brewster wrote: >> For example, I have a URL on my Cherrypy app that updates some local >> caches. It is accessed at http:///admin/updatecaches So if I >> start typing http:///a, for example, safari may auto-fill the >> "dmin/updatecaches", and trigger a cache refresh on the server - even >> though I was just trying to get to the main admin page at /admin. Or, it >> might auto-fill "uth/logout" instead (http:///auth/logout), and >> log me out of my session. While the former may be acceptable (after all, >> a cache update, even if not strictly needed, is at least non-harmfull), >> the latter could cause serious issues with usability. So how can cherrypy >> tell the difference between the "prefetch" and an actual request, and not >> respond to the prefetch? > > Why is your logout form - or, your update caches form, etc - a GET > instead of a POST? Primarily because they aren’t forms, they are links. And links are, by definition, GETs. That said, as I mentioned in earlier replies, if using a form for a simple link is the Right Way to do things like this, then I can change it. Thanks! — Israel Brewster > The key problem is that a GET request is assumed by > browser designers to not have any harmful side effects. > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
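A sketch of what the form-based approach can look like on the CherryPy side, with the logout handler refusing anything but POST. The class, paths, and session call are illustrative, not taken from the application under discussion:

import cherrypy
from cherrypy.lib import sessions

class Auth(object):
    @cherrypy.expose
    def index(self):
        # A real form/button that POSTs, instead of a plain GET link
        return """
        <form method="post" action="/auth/logout">
            <button type="submit">Log out</button>
        </form>
        """

    @cherrypy.expose
    def logout(self):
        if cherrypy.request.method != "POST":
            # Prefetchers and address-bar guesses arrive as GETs; refuse them
            raise cherrypy.HTTPError(405, "Log out via the button (POST), not a link")
        sessions.expire()  # assumes the sessions tool is enabled for this app
        return "Logged out"

# Hypothetical mounting; adjust to the real app's config:
# cherrypy.quickstart(Auth(), '/auth', {'/': {'tools.sessions.on': True}})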
Re: Cherrypy - prevent browser "prefetch"?
Ah, I see. That makes sense. Thanks. --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- On Dec 2, 2014, at 9:17 PM, Gregory Ewing wrote: > Israel Brewster wrote: >> Primary because they aren’t forms, they are links. And links are, by >> definition, GET’s. That said, as I mentioned in earlier replies, if using a >> form for a simple link is the Right Way to do things like this, then I can >> change it. > > I'd look at it another way and say that an action with side > effects shouldn't appear as a simple link to the user. Links > are for requesting information; buttons are for triggering > actions. > > -- > Greg > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Track down SIGABRT
I have a long-running python/CherryPy Web App server process that I am running on Mac OS X 10.8.5. Python 2.7.2 running in 32-bit mode (for now, I have the code in place to change over to 64 bit, but need to schedule the downtime to do it). On the 6th of this month, during normal operation from what I can tell, and after around 33 days of trouble-free uptime, the python process crashed with a SIGABRT. I restarted the process, and everything looked good again until yesterday, when it again crashed with a SIGABRT. The crash dump the system gave me doesn't tell me much, other than that it looks like python is calling some C function when it crashes. I've attached the crash report, in case it can mean something more to someone else.

Can anyone give me some hints as to how to track down the cause of this crash? It's especially problematic since I can't mess with the live server for testing, and it is quite a while between crashes, making it difficult, if not impossible, to reproduce in testing. Thanks.

--- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 ---

[Attachment: Python_2015-01-08-152219_minilogger.crash]

-- https://mail.python.org/mailman/listinfo/python-list
Re: Track down SIGABRT
On Jan 12, 2015, at 5:51 PM, Jason Friedman wrote: >> I have a long-running python/CherryPy Web App server process that I am >> running on Mac OS X 10.8.5. Python 2.7.2 running in 32-bit mode (for now, I >> have the code in place to change over to 64 bit, but need to schedule the >> downtime to do it). On the 6th of this month, during normal operation from >> what I can tell, and after around 33 days of trouble-free uptime, the python >> process crashed with a SIGABRT. I restarted the process, and everything >> looked good again until yesterday, when it again crashed with a SIGABRT. > > Can you monitor disk and memory on the host? Perhaps it is climbing > towards an unacceptable value right before crashing. Good thought. I'm pretty sure that the system monitor still showed a couple of gigs free memory before the last crash, but the process could still be using unacceptable amounts of resources > > Do you have the option of stopping and starting your process every > night or every week? Yes, that's an option, and as a work-around I'll consider it. Of course, I'd much rather not have the thing crash in the first place :-) > -- > https://mail.python.org/mailman/listinfo/python-list --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: Track down SIGABRT
On Jan 13, 2015, at 6:27 AM, William Ray Wing wrote: > >> On Jan 9, 2015, at 12:40 PM, Israel Brewster wrote: >> >> I have a long-running python/CherryPy Web App server process that I am >> running on Mac OS X 10.8.5. Python 2.7.2 running in 32-bit mode (for now, I >> have the code in place to change over to 64 bit, but need to schedule the >> downtime to do it). On the 6th of this month, during normal operation from >> what I can tell, and after around 33 days of trouble-free uptime, the python >> process crashed with a SIGABRT. I restarted the process, and everything >> looked good again until yesterday, when it again crashed with a SIGABRT. The >> crash dump the system gave me doesn't tell me much, other than that it looks >> like python is calling some C function when it crashes. I've attached the >> crash report, in case it can mean something more to someone else. >> >> Can anyone give me some hints as to how to track down the cause of this >> crash? It's especially problematic since I can't mess with the live server >> for testing, and it is quite a while between crashes, making it difficult, >> if not impossible, to reproduce in testing. Thanks. >> --- >> Israel Brewster >> Systems Analyst II >> Ravn Alaska >> 5245 Airport Industrial Rd >> Fairbanks, AK 99709 >> (907) 450-7293 >> --- > > > Can you run the application in an IDE? Yes - I run it through Wing during development. I don't think that would be such a good option for my production machine, however. If it gets really bad I'll consider it though - that should at least tell me where it is crashing. > > -Bill --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: Track down SIGABRT
On Jan 13, 2015, at 8:26 AM, Skip Montanaro wrote: > Assuming you have gdb available, you should be able to attach to the > running process, then set a breakpoint in relevant functions (like > exit() or abort()). Once there, you can pick through the C stack > manually (kind of tedious) or use the gdbinit file which comes with > Python to get a Python stack trace (much less tedious, once you've > made sure any version dependencies have been eliminated). Or, with the > latest versions of gdb (7.x I think), you get more stuff built into > gdb itself. > > More details here: > > https://wiki.python.org/moin/DebuggingWithGdb Thanks, I'll look into that. Hopefully running with the debugger attached won't slow things down too much. The main thing I think will be getting the python extensions installed - the instructions only talk about doing this for linux packages. > > Skip -- https://mail.python.org/mailman/listinfo/python-list
Best way to do background calculations?
tl;dr: I've been using the multiprocessing module to run some calculations in the background of my CherryPy web app, but apparently this process sometimes gets stuck, causing problems with open sockets piling up and blocking the app. Is there a better way?

The (rather wordy) details:

I have a moderately busy web app written in python using the CherryPy framework (CherryPy v 3.8.0 with ws4py v 0.3.4 and Python 2.7.6). One of the primary purposes of this web app is to track user-entered flight logs, and keep a running tally of hours/cycles/landings for each aircraft. To that end, whenever a user enters or modifies a log, I "recalculate" the totals for that aircraft, and update all records with the new totals. There are probably ways to optimize this process, but so far I haven't seen a need to spend the time.

Ideally, this recalculation process would happen in the background. There is no need for the user to wait around while the system crunches numbers - they should be able to move on with entering another log or whatever else they need to do. To that end, I implemented the call to the recalc function using the multiprocessing module, so it could start in the background and the main process move on.

Lately, though, I've been running into a problem where, when looking at the process list on my server (Mac OS X 10.10.5), I'll see two or more "copies" of my server process running - one master and one or more child processes. As the above-described process is the only place I am using the multiprocessing module, I am making the assumption that this is what these additional processes are. If they were only there for a few minutes I would think this is normal, and it wouldn't be a problem.

However, what I am seeing is that from time to time (once or twice every couple of days) these additional processes will get "stuck", and when that happens sockets opened by the web app don't get properly closed and start piling up. Looking at a list of open sockets on the server when I have one of these "hung" processes shows a steadily increasing number of sockets in a "CLOSE_WAIT" state (normally I see none in that state). Killing off the hung process(es) clears out these sockets, but if I don't catch it quickly enough these sockets can build up to the point that I am unable to open any more, and the server starts rejecting connections.

I'm told this happens because the process retains a reference to all open files/sockets from the parent process, thus preventing the sockets from closing until the process terminates. Regardless of the reason, it can cause a loss of service if I don't catch it quickly enough. As such, I'm wondering if there is a better way. Should I be looking at using the threading library rather than the multiprocessing library? My understanding is that the GIL would prevent that approach from being of any real benefit for a calculation-intensive task, but maybe since the rest of the application is CherryPy threads, it would still work well? Or perhaps there is a way to not give the child process any references to the parent's files/sockets - although that may not help with the process hanging? Maybe there is a way to "monitor" the process, and automatically kill it if it stops responding? Or am I totally barking up the wrong tree here?

Thanks for any insight anyone can provide!

--- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: Best way to do background calculations?
On Oct 25, 2015, at 4:05 PM, MRAB wrote: > > On 2015-10-23 17:35, Israel Brewster wrote: >> tl;dr: I've been using the multiprocessing module to run some >> calculations in the background of my CherryPy web app, but apparently >> this process sometimes gets stuck, causing problems with open sockets >> piling up and blocking the app. Is there a better way? >> >> The (rather wordy) details: >> >> I have a moderately busy web app written in python using the CherryPy >> framework (CherryPy v 3.8.0 with ws4py v 0.3.4 and Python 2.7.6) . One >> of the primary purposes of this web app is to track user-entered flight >> logs, and keep a running tally of hours/cycles/landings for each >> aircraft. To that end, whenever a user enters or modifies a log, I >> "recalculate" the totals for that aircraft, and update all records with >> the new totals. There are probably ways to optimize this process, but so >> far I haven't seen a need to spend the time. >> >> Ideally, this recalculation process would happen in the background. >> There is no need for the user to wait around while the system crunches >> numbers - they should be able to move on with entering another log or >> whatever else they need to do. To that end, I implemented the call to >> the recalc function using the multiprocessing module, so it could start >> in the background and the main process move on. >> >> Lately, though, I've been running into a problem where, when looking at >> the process list on my server (Mac OS X 10.10.5), I'll see two or more >> "copies" of my server process running - one master and one or more child >> processes. As the above described process is the only place I am using >> the multiprocessing module, I am making the assumption that this is what >> these additional processes are. If they were only there for a few >> minutes I would think this is normal, and it wouldn't be a problem. >> >> However, what I am seeing is that from time to time (once or twice every >> couple of days) these additional processes will get "stuck", and when >> that happens sockets opened by the web app don't get properly closed and >> start piling up. Looking at a list of open sockets on the server when I >> have one of these "hung" processes shows a steadily increasing number of >> sockets in a "CLOSE_WAIT" state (normally I see none in that state). >> Killing off the hung process(es) clears out these sockets, but if I >> don't catch it quickly enough these sockets can build up to the point >> that I am unable to open any more, and the server starts rejecting >> connections. >> >> I'm told this happens because the process retains a reference to all >> open files/sockets from the parent process, thus preventing the sockets >> from closing until the process terminates. Regardless of the reason, it >> can cause a loss of service if I don't catch it quickly enough. As such, >> I'm wondering if there is a better way. Should I be looking at using the >> threading library rather than the multiprocessing library? My >> understanding is that the GIL would prevent that approach from being of >> any real benefit for a calculation intensive type task, but maybe since >> the rest of the application is CherryPy threads, it would still work >> well?. Or perhaps there is a way to not give the child process any >> references to the parent's files/sockets - although that may not help >> with the process hanging? Maybe there is a way to "monitor" the process, >> and automatically kill it if it stops responding? 
Or am I totally >> barking up the wrong tree here? >> > It sounds like the multiprocessing module is forking the new process, > which inherits the handles. > > Python 3.4 added the ability to spawn the new process, which won't inherit > the handles. Well, that might be a reason to look at moving to 3 then. It's been on my to-do list :-) > > It's unfortunate that you're using Python 2.7.6! > > Could you start the background process early, before any of those > sockets have been opened, and then communicate with it via queues? Possibly. Simply have the process always running, and tell it to kick off calculations as needed via queues. It's worth investigating for sure. > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
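A minimal, untested sketch of the suggestion above: start one long-lived worker process before CherryPy opens any sockets, then feed it work over a queue, so nothing spawned per-request ever inherits the server's descriptors. The recalc_totals function and the aircraft identifier are placeholders, not the real code.

# Minimal sketch (not the original app): one long-lived worker process is
# started *before* the web server opens any sockets, then fed work over a queue.
import multiprocessing

def recalc_totals(aircraft_id):
    # placeholder for the real recalculation
    print("recalculating totals for %s" % aircraft_id)

def worker(job_queue):
    # runs in the child process; loops forever pulling jobs off the queue
    while True:
        job = job_queue.get()
        if job is None:          # sentinel tells the worker to exit
            break
        recalc_totals(job)

if __name__ == '__main__':
    # On Python 3.4+ you could also force the 'spawn' start method so the
    # child never inherits the parent's handles:
    # multiprocessing.set_start_method('spawn')
    job_queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=worker, args=(job_queue,))
    proc.start()                 # started before any sockets exist

    # ...later, from a request handler, instead of spawning a new process:
    job_queue.put('N123AB')

    job_queue.put(None)          # shut the worker down cleanly
    proc.join()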
Re: Best way to do background calculations?
On Oct 25, 2015, at 3:40 PM, Dennis Lee Bieber wrote: > > On Fri, 23 Oct 2015 08:35:06 -0800, Israel Brewster > declaimed the following: > >> tl;dr: I've been using the multiprocessing module to run some calculations >> in the background of my CherryPy web app, but apparently this process >> sometimes gets stuck, causing problems with open sockets piling up and >> blocking the app. Is there a better way? >> >> The (rather wordy) details: >> > The less wordy first impression... > >> I have a moderately busy web app written in python using the CherryPy >> framework (CherryPy v 3.8.0 with ws4py v 0.3.4 and Python 2.7.6) . One of >> the primary purposes of this web app is to track user-entered flight logs, >> and keep a running tally of hours/cycles/landings for each aircraft. To that >> end, whenever a user enters or modifies a log, I "recalculate" the totals >> for that aircraft, and update all records with the new totals. There are >> probably ways to optimize this process, but so far I haven't seen a need to >> spend the time. >> > Off-hand -- this sounds like something that should be in a database... > Unless your calculations are really nasty, rather than just aggregates, a > database engine should be able to apply them in SQL queries or stored > procedures. Sounds like a potentially valid approach. Would require some significant re-tooling, but could work. I'll look into it. --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > -- > Wulfraed Dennis Lee Bieber AF6VN >wlfr...@ix.netcom.comHTTP://wlfraed.home.netcom.com/ > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
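A rough sketch of the database-side idea, assuming a flightlogs table with hours/cycles/landings columns (invented names; the real schema will differ). sqlite3 is used here only so the snippet runs standalone; the aggregate SQL itself carries over to PostgreSQL (psycopg2 just uses %s placeholders instead of ?).

# Rough sketch of letting the database do the aggregation.  Table and column
# names (flightlogs, hours, cycles, landings) are invented for illustration.
import sqlite3   # stand-in engine so the snippet runs standalone

def running_totals(conn, aircraft):
    # one aggregate query instead of recalculating in Python
    cur = conn.cursor()
    cur.execute(
        "SELECT SUM(hours), SUM(cycles), SUM(landings) "
        "FROM flightlogs WHERE aircraft = ?",
        (aircraft,))
    return cur.fetchone()

if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    conn.execute("CREATE TABLE flightlogs "
                 "(aircraft TEXT, hours REAL, cycles INTEGER, landings INTEGER)")
    conn.executemany("INSERT INTO flightlogs VALUES (?,?,?,?)",
                     [('N123AB', 1.5, 1, 1), ('N123AB', 2.0, 2, 2)])
    print(running_totals(conn, 'N123AB'))   # -> (3.5, 3, 3)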
Re: Best way to do background calculations?
On Oct 25, 2015, at 6:48 PM, Chris Angelico wrote: > > On Sat, Oct 24, 2015 at 3:35 AM, Israel Brewster > wrote: >> >> Ideally, this recalculation process would happen in the background. There is >> no need for the user to wait around while the system crunches numbers - they >> should be able to move on with entering another log or whatever else they >> need to do. To that end, I implemented the call to the recalc function using >> the multiprocessing module, so it could start in the background and the main >> process move on. > > One way to get around this would be to separate the processes > completely, and simply alert the other process (maybe via a socket) to > ask it to do the recalculation. That way, the background process would > never have any of the main process's sockets, and can't affect them in > any way. Sounds similar to MRAB's suggestion of starting the process before any sockets have been opened. Certainly worth investigating, and I think it should be doable. Thanks! --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
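A toy illustration of the fully separate process approach: the recalculation worker runs as its own script listening on a local port, and the web app just pokes it with the aircraft identifier. The port number and message format are made up for the sketch.

# Toy version of the "separate process, poke it over a socket" idea.  The
# worker below would run as its own script and never shares the web app's
# file descriptors.
import socket

def serve(port=9999):
    # background process: wait for one-line requests and recalculate
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(('127.0.0.1', port))
    srv.listen(5)
    while True:
        conn, _ = srv.accept()
        aircraft = conn.recv(1024).decode().strip()
        conn.close()
        print("would recalculate totals for %r here" % aircraft)

def request_recalc(aircraft, port=9999):
    # web app side: fire-and-forget notification to the background process
    s = socket.create_connection(('127.0.0.1', port), timeout=1)
    s.sendall((aircraft + '\n').encode())
    s.close()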
CherryPy cpstats and ws4py
I posted this to the CherryPy and ws4py mailing lists, but in the week since I did that I've only gotten two or three views on each list, and no responses, so as a last-ditch effort I thought I'd post here. Maybe someone with more general python knowledge than me can figure out the traceback and from there a solution. Is it possible to use ws4py in conjunction with the cpstats CherryPy tool? I have a CherryPy (3.8.0) web app that uses web sockets via ws4py. Tested and working. I am now trying to get a little more visibility into the functioning of the server, so to that end I enabled the cpstats tool by adding the following line to my '/' configuration: tools.cpstats.on=True Unfortunately, as soon as I do that, attempts to connect a web socket start failing with the following traceback: [28/Oct/2015:08:18:48] Traceback (most recent call last): File "/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py", line 104, in run hook() File "/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py", line 63, in __call__ return self.callback(**self.kwargs) File "build/bdist.macosx-10.10-intel/egg/ws4py/server/cherrypyserver.py", line 200, in upgrade ws_conn = get_connection(request.rfile.rfile) File "build/bdist.macosx-10.10-intel/egg/ws4py/compat.py", line 43, in get_connection return fileobj._sock AttributeError: 'KnownLengthRFile' object has no attribute '_sock' [28/Oct/2015:08:18:48] HTTP Request Headers: PRAGMA: no-cache COOKIE: autoTabEnabled=true; fleetStatusFilterCompany=7H; fleetStatusFilterLocation=ALL; fleetStatusRefreshInterval=5; inputNumLegs=5; session_id=5c8303896aff419c175c79dfadbfdc9d75e6c45a UPGRADE: websocket HOST: flbubble.ravnalaska.net:8088 ORIGIN: http://flbubble.ravnalaska.net CONNECTION: Upgrade CACHE-CONTROL: no-cache SEC-WEBSOCKET-VERSION: 13 SEC-WEBSOCKET-EXTENSIONS: x-webkit-deflate-frame USER-AGENT: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.7 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.7 SEC-WEBSOCKET-KEY: Szh6Uoe+WzqKR1DgW8JcXA== Remote-Addr: 10.9.1.59 [28/Oct/2015:08:18:48] HTTP Traceback (most recent call last): File "/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py", line 661, in respond self.hooks.run('before_request_body') File "/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py", line 114, in run raise exc AttributeError: 'KnownLengthRFile' object has no attribute '_sock' Disable tools.cpstats.on, and the sockets start working again. Is there some way I can fix this so I can use sockets as well as gather stats from my application? Thanks. --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
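No definite fix to offer, but one untested workaround that might be worth trying is to leave cpstats enabled everywhere except the WebSocket mount point, using CherryPy's per-path config, so the stats hook never wraps the rfile that ws4py inspects. The '/ws' path below is an assumption; use whatever path the WebSocket handler is actually mounted on.

# Untested workaround sketch: stats stay on for normal pages, off for the
# socket upgrade path only.
config = {
    '/': {
        'tools.cpstats.on': True,      # stats everywhere else
    },
    '/ws': {
        'tools.cpstats.on': False,     # ...but not on the WebSocket path
        # the existing ws4py settings (tools.websocket.on, handler_cls, ...)
        # stay exactly as they already are in the working configuration
    },
}
# cherrypy.quickstart(Root(), '/', config=config)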
Re: CherryPy cpstats and ws4py
Ok, let me ask a different question: the impression I have gotten when trying to find help with CherryPy in general and ws4py specifically is that these frameworks are not widely used or well supported. Is that a fair assessment, or do I just have issues that are outside the realm of experience for other users? If it is a fair assessment, should I be looking at a different product for my next project? I know there are a number of options, CherryPy was simply the first one suggested to me, and ws4py is what is listed in their docs as the framework to use for Web Sockets. Thanks for any feedback that can be provided. --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > On Nov 3, 2015, at 8:05 AM, Israel Brewster wrote: > > I posted this to the CherryPy and ws4py mailing lists, but in the week since > I did that I've only gotten two or three views on each list, and no > responses, so as a last-ditch effort I thought I'd post here. Maybe someone > with more general python knowledge than me can figure out the traceback and > from there a solution. > > Is it possible to use ws4py in conjunction with the cpstats CherryPy tool? I > have a CherryPy (3.8.0) web app that uses web sockets via ws4py. Tested and > working. I am now trying to get a little more visibility into the functioning > of the server, so to that end I enabled the cpstats tool by adding the > following line to my '/' configuration: > > tools.cpstats.on=True > > Unfortunately, as soon as I do that, attempts to connect a web socket start > failing with the following traceback: > > [28/Oct/2015:08:18:48] > Traceback (most recent call last): > File > "/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py", > line 104, in run >hook() > File > "/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py", > line 63, in __call__ >return self.callback(**self.kwargs) > File "build/bdist.macosx-10.10-intel/egg/ws4py/server/cherrypyserver.py", > line 200, in upgrade >ws_conn = get_connection(request.rfile.rfile) > File "build/bdist.macosx-10.10-intel/egg/ws4py/compat.py", line 43, in > get_connection >return fileobj._sock > AttributeError: 'KnownLengthRFile' object has no attribute '_sock' > [28/Oct/2015:08:18:48] HTTP > Request Headers: > PRAGMA: no-cache > COOKIE: autoTabEnabled=true; fleetStatusFilterCompany=7H; > fleetStatusFilterLocation=ALL; fleetStatusRefreshInterval=5; inputNumLegs=5; > session_id=5c8303896aff419c175c79dfadbfdc9d75e6c45a > UPGRADE: websocket > HOST: flbubble.ravnalaska.net:8088 > ORIGIN: http://flbubble.ravnalaska.net > CONNECTION: Upgrade > CACHE-CONTROL: no-cache > SEC-WEBSOCKET-VERSION: 13 > SEC-WEBSOCKET-EXTENSIONS: x-webkit-deflate-frame > USER-AGENT: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) > AppleWebKit/601.2.7 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.7 > SEC-WEBSOCKET-KEY: Szh6Uoe+WzqKR1DgW8JcXA== > Remote-Addr: 10.9.1.59 > [28/Oct/2015:08:18:48] HTTP > Traceback (most recent call last): > File > "/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py", > line 661, in respond >self.hooks.run('before_request_body') > File > "/Library/Python/2.7/site-packages/CherryPy-3.8.0-py2.7.egg/cherrypy/_cprequest.py", > line 114, in run >raise exc > AttributeError: 'KnownLengthRFile' object has no attribute '_sock' > > Disable tools.cpstats.on, and the sockets start working again. 
Is there some > way I can fix this so I can use sockets as well as gather stats from my > application? Thanks. > > --- > Israel Brewster > Systems Analyst II > Ravn Alaska > 5245 Airport Industrial Rd > Fairbanks, AK 99709 > (907) 450-7293 > --- > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Designing DBI compliant SQL parameters for module
My company uses a database (4th dimension) for which there was no python DBI compliant driver available (I had to use ODBC, which I felt was kludgy). However, I did discover that the company had a C driver available, so I went ahead and used CFFI to wrap this driver into a DBI compliant python module (https://pypi.python.org/pypi/p4d). This works well (I still need to make it python 3.x compatible), but since the underlying C library uses "qmark" style parameter markers, that's all I implemented in my module. I would like to expand the module to be able to use the more common (or at least easier for me) "format" and "pyformat" parameter markers, as indicated in the footnote to PEP-249 (https://www.python.org/dev/peps/pep-0249/#id2 at least for the pyformat markers). Now I am fairly confident that I can write code to convert such placeholders into the qmark style markers that the underlying library provides, but before I go and re-invent the wheel, is there already code that does this which I can simply use, or modify? --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
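For what it's worth, a naive sketch of the conversion is short; it does not handle placeholder-looking text inside SQL string literals or anything beyond the simplest '%%' escape, so a real implementation would want a small tokenizer. The function name is arbitrary.

# Naive placeholder translation: "format"/"pyformat" -> "qmark".
import re

_PYFORMAT = re.compile(r'%\((\w+)\)s')   # %(name)s
_FORMAT = re.compile(r'%s')              # %s

def to_qmark(query, params):
    # return (query, params) rewritten to use '?' placeholders
    if isinstance(params, dict):
        names = _PYFORMAT.findall(query)          # order of appearance
        query = _PYFORMAT.sub('?', query)
        params = tuple(params[name] for name in names)
    else:
        query = _FORMAT.sub('?', query)
        params = tuple(params)
    return query.replace('%%', '%'), params

# to_qmark("SELECT * FROM t WHERE a=%(a)s AND b=%(b)s", {'a': 1, 'b': 2})
#   -> ("SELECT * FROM t WHERE a=? AND b=?", (1, 2))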
Bi-directional sub-process communication
I have a multi-threaded python app (CherryPy WebApp to be exact) that launches a child process that it then needs to communicate with bi-directionally. To implement this, I have used a pair of Queues: a child_queue which I use for master->child communication, and a master_queue which is used for child->master communication. The way I have the system set up, the child process runs a loop in a thread that waits for messages on child_queue, and when one is received it responds appropriately depending on the message, which sometimes involves posting a message to master_queue. On the master side, when it needs to communicate with the child process, it posts a message to child_queue, and if the request requires a response it will then immediately start waiting for a message on master_queue, typically with a timeout. While this process works well in testing, I do have one concern (maybe unfounded) and a real-world issue.

Concern: Since the master process is multi-threaded, it seems likely enough that multiple threads on the master side would make requests at the same time. I understand that the Queue class has locks that make this fine (one thread will complete posting the message before the next is allowed to start), and since the child process only has a single thread processing messages from the queue, it should process them in order and post the responses (if any) to the master_queue in order. But now I have multiple master threads all trying to read master_queue at the same time. Again, the locks will take care of this and prevent any overlapping reads, but am I guaranteed that the threads will obtain the lock and therefore read the responses in the right order? Or is there a possibility that, say, thread three will get the response that should have been for thread one? Is this something I need to take into consideration, and if so, how?

Real-world problem: While, as I said, this system worked well in testing, now that I have gotten it out into production I've occasionally run into a problem where the master thread waiting for a response on master_queue times out while waiting. This causes a (potentially) two-fold problem, in that first off the master process doesn't get the information it had requested, and secondly I *could* end up with an "orphaned" message on the queue that could cause problems the next time I try to read something from it. I currently have the timeout set to 3 seconds. I can, of course, increase that, but that could lead to a bad user experience - and might not even help the situation if something else is going on.

The actual exchange is quite simple. On the master side, I have this code:

config.socket_queue.put('GET_PORT')
try:
    port = config.master_queue.get(timeout=3)  # wait up to three seconds for a response
except Empty:
    port = 5000  # default. Can't hurt to try.

Which, as you might have been able to guess, tries to ask the child process (an instance of a tornado server, btw) what port it is listening on. The child process then, on getting this message from the queue, runs the following code:

elif item == 'GET_PORT':
    port = utils.config.getint('global', 'tornado.port')
    master_queue.put(port)

So nothing that should take any significant time.
Of course, since this is a single thread handling any number of requests, it is possible that the thread is tied up responding to a different request (or that the GIL is preventing the thread from running at all, since another thread might be commandeering the processor), but I find it hard to believe that it could be tied up for more than three seconds. So is there a better way to do sub-process bi-directional communication that would avoid these issues? Or do I just need to increase the timeout (or remove it altogether, at the risk of potentially causing the thread to hang if no message is posted)? And is my concern justified, or just paranoid? Thanks for any information that can be provided! --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
Re: Bi-directional sub-process communication
On Nov 23, 2015, at 11:51 AM, Ian Kelly wrote: > > On Mon, Nov 23, 2015 at 12:55 PM, Ian Kelly wrote: >> On Mon, Nov 23, 2015 at 10:54 AM, Israel Brewster >> wrote: >>> Concern: Since the master process is multi-threaded, it seems likely enough >>> that multiple threads on the master side would make requests at the same >>> time. I understand that the Queue class has locks that make this fine (one >>> thread will complete posting the message before the next is allowed to >>> start), and since the child process only has a single thread processing >>> messages from the queue, it should process them in order and post the >>> responses (if any) to the master_queue in order. But now I have multiple >>> master processes all trying to read master_queue at the same time. Again, >>> the locks will take care of this and prevent any overlapping reads, but am >>> I guaranteed that the threads will obtain the lock and therefore read the >>> responses in the right order? Or is there a possibility that, say, thread >>> three will get the response that should have been for thread one? Is this >>> something I need to take into consideration, and if so, how? >> >> Yes, if multiple master threads are waiting on the queue, it's >> possible that a master thread could get a response that was not >> intended for it. As far as I know there's no guarantee that the >> waiting threads will be woken up in the order that they called get(), >> but even if there are, consider this case: >> >> Thread A enqueues a request. >> Thread B preempts A and enqueues a request. >> Thread B calls get on the response queue. >> Thread A calls get on the response queue. >> The response from A's request arrives and is given to B. >> >> Instead of having the master threads pull objects off the response >> queue directly, you might create another thread whose sole purpose is >> to handle the response queue. That could look like this: >> >> >> request_condition = threading.Condition() >> response_global = None >> >> def master_thread(): >>global response_global >>with request_condition: >>request_queue.put(request) >>request_condition.wait() >># Note: the Condition should remain acquired until >> response_global is reset. >>response = response_global >>response_global = None >>if wrong_response(response): >>raise RuntimeError("got a response for the wrong request") >>handle_response(response) >> >> def response_thread(): >>global response_global >>while True: >>response = response_queue.get() >>with request_condition: >>response_global = response >>request_condition.notify() > > Actually I realized that this fails because if two threads get > notified at about the same time, they could reacquire the Condition in > the wrong order and so get the wrong responses. > > Concurrency, ugh. > > It's probably better just to have a Condition/Event per thread and > have the response thread identify the correct one to notify, rather > than just notify a single shared Condition and hope the threads wake > up in the right order. Tell me about it :-) I've actually never worked with conditions or notifications (actually even this bi-drectional type of communication is new to me), so I'll have to look into that and figure it out. Thanks for the information! > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
Re: Bi-directional sub-process communication
On Nov 23, 2015, at 12:45 PM, Cameron Simpson wrote: > > On 23Nov2015 12:22, Israel Brewster wrote: >> On Nov 23, 2015, at 11:51 AM, Ian Kelly wrote: >>> Concurrency, ugh. > > I'm a big concurrency fan myself. > >>> It's probably better just to have a Condition/Event per thread and >>> have the response thread identify the correct one to notify, rather >>> than just notify a single shared Condition and hope the threads wake >>> up in the right order. >> >> Tell me about it :-) I've actually never worked with conditions or >> notifications (actually even this bi-drectional type of communication is new >> to me), so I'll have to look into that and figure it out. Thanks for the >> information! > > I include a tag with every request, and have the responses include the tag; > the request submission function records the response hander in a mapping by > tag and the response handing thread looks up the mapping and passes the > response to the right handler. > > Works just fine and avoids all the worrying about ordering etc. > > Israel, do you have control over the protocol between you and your > subprocess? If so, adding tags is easy and effective. I do, and the basic concept makes sense. The one difficulty I am seeing is getting back to the thread that requested the data. Let me know if this makes sense or I am thinking about it wrong: - When a thread requests some data, it sends the request as a dictionary containing a tag (unique to the thread) as well as the request - When the child processes the request, it encodes the response as a dictionary containing the tag and the response data - A single, separate thread on the "master" side parses out responses as they come in and puts them into a dictionary keyed by tag - The requesting threads, after putting the request into the Queue, would then block waiting for data to appear under their key in the dictionary Of course, that last step could be interesting - implementing the block in such a way as to not tie up the processor, while still getting the data "as soon" as it is available. Unless there is some sort of built-in notification system I could use for that? I.e. the thread would "subscribe" to a notification based on its tag, and then wait for notification. When the master processing thread receives data with said tag, it adds it to the dictionary and "publishes" a notification to that tag. Or perhaps the notification itself could contain the payload? Thanks for the information! > > Cheers, > Cameron Simpson -- https://mail.python.org/mailman/listinfo/python-list
Re: Bi-directional sub-process communication
On Nov 23, 2015, at 1:43 PM, Chris Kaynor wrote: > > On Mon, Nov 23, 2015 at 2:18 PM, Israel Brewster > wrote: > >> Of course, that last step could be interesting - implementing the block in >> such a way as to not tie up the processor, while still getting the data "as >> soon" as it is available. Unless there is some sort of built-in >> notification system I could use for that? I.e. the thread would "subscribe" >> to a notification based on its tag, and then wait for notification. When >> the master processing thread receives data with said tag, it adds it to the >> dictionary and "publishes" a notification to that tag. Or perhaps the >> notification itself could contain the payload? > > > There are a few ways I could see handling this, without having the threads > spinning and consuming CPU: > > 1. Don't worry about having the follow-up code run in the same thread, > and use a simple callback. This callback could be dispatched to a thread > via a work queue, however you may not get the same thread as the one that > made the request. This is probably the most efficient method to use, as the > threads can continue doing other work while waiting for a reply, rather > than blocking. It does make it harder to maintain state between the pre- > and post-request functions, however. > 2. Have a single, global, event variable that wakes all threads waiting > on a reply, each of which then checks to see if the reply is for it, or > goes back to sleep. This is good if most of the time, only a few threads > will be waiting for a reply, and checking if the correct reply came in is > cheap. This is probably good enough, unless you have a LOT of threads > (hundreds). > 3. Have an event per thread. This will use less CPU than the second > option, however does require more memory and OS resources, and so will not > be viable for huge numbers of threads, though if you hit the limit, you are > probably using threads wrong. > 4. Have an event per request. This is only better than #3 if a single > thread may make multiple requests at once, and can do useful work when any > of them get a reply back (if they need all, it will make no difference). > > Generally, I would use option #1 or #2. Option 2 has the advantage of > making it easy to write the functions that use the functionality, while > option 1 will generally use fewer resources, and allows threads to continue > to be used while waiting for replies. How much of a benefit that is depends > on exactly what you are doing. While I would agree with #1 in general, the threads, in this case, are CherryPy threads, so I need to get the data and return it to the client in the same function call, which of course means the thread needs to block until the data is ready - it can't return and let the result be processed "later". Essentially there are times that the web client needs some information that only the Child process has. So the web client requests the data from the master process, and the master process then turns around and requests the data from the child, but it needs to get the data back before it can return it to the web client. So it has to block waiting for the data. Thus we come to option #2 (or 3), which sounds good but I have no clue how to implement :-) Maybe something like http://pubsub.sourceforge.net ? I'll dig into that. > > Option #4 would probably be better implemented using option #1 in all cases > to avoid problems with running out of OS memory - threading features > generally require more limited OS resources than memory. 
Option #3 will > also often run into the same issues as option #4 in the cases it will > provide any benefit over option #2. > > Chris > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
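A minimal sketch of the per-waiter Event idea (option #3 above), tying it to the tagged-request scheme from earlier in the thread: one dispatcher thread owns master_queue and wakes whichever CherryPy thread asked. Queue names and the message format are assumptions.

# Sketch: one Event per outstanding request, keyed by tag; a single
# dispatcher thread owns master_queue and wakes the thread that asked.
import threading
import uuid

_waiters = {}                    # tag -> (Event, one-slot list for the reply)
_waiters_lock = threading.Lock()

def ask_child(request, child_queue, timeout=3):
    # called from any CherryPy thread: send a tagged request and block
    tag = uuid.uuid4().hex
    event, slot = threading.Event(), [None]
    with _waiters_lock:
        _waiters[tag] = (event, slot)
    child_queue.put({'tag': tag, 'request': request})
    if not event.wait(timeout):
        with _waiters_lock:
            _waiters.pop(tag, None)      # give up; nobody will wake us now
        raise RuntimeError('timed out waiting for child response')
    return slot[0]

def response_dispatcher(master_queue):
    # single thread: route each tagged response to the thread that asked
    while True:
        msg = master_queue.get()         # expects {'tag': ..., 'response': ...}
        with _waiters_lock:
            waiter = _waiters.pop(msg['tag'], None)
        if waiter is not None:
            event, slot = waiter
            slot[0] = msg['response']
            event.set()
        # else: the requester already timed out; drop the late reply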
Re: Bi-directional sub-process communication
On Nov 23, 2015, at 3:05 PM, Dennis Lee Bieber wrote: > > On Mon, 23 Nov 2015 08:54:38 -0900, Israel Brewster > declaimed the following: > >> Concern: Since the master process is multi-threaded, it seems likely enough >> that multiple threads on the master side would make requests at the same >> time. I understand that the Queue class has locks that make > > Multiple "master" threads, to me, means you do NOT have a "master > process". But I do: the CherryPy "application", which has multiple threads - one per request (and perhaps a few more) to be exact. It's these request threads that generate the calls to the child process. > > Let there be a Queue for EVERY LISTENER. > > Send the Queue as part of the request packet. No luck: "RuntimeError: Queue objects should only be shared between processes through inheritance" This IS a master process, with multiple threads, trying to communicate with a child process. That said, with some modifications this sort of approach could still work. --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > > Let the subthread reply to the queue that was provided via the packet > > Voila! No intermixing of "master/slave" interaction; each slave only > replies to the master that sent it a command; each master only receives > replies from slaves it has commanded. Slaves can still be shared, as they > are given the information of which master they need to speak with. > > > > -- > Wulfraed Dennis Lee Bieber AF6VN >wlfr...@ix.netcom.comHTTP://wlfraed.home.netcom.com/ > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
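Untested, but one possible modification of the queue-per-listener idea that sidesteps the RuntimeError: reply queues created by a multiprocessing.Manager are proxies, and proxies (unlike plain Queue objects) can be pickled and sent inside the request packet itself.

# Untested sketch: reply queues come from a Manager, so each request can
# carry its own private reply channel to the child.
import multiprocessing

def child(request_queue):
    while True:
        req = request_queue.get()
        if req is None:
            break
        req['reply_to'].put('answer to %r' % req['request'])

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    request_queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=child, args=(request_queue,))
    proc.start()

    # any master-side thread can do this with its own reply queue:
    reply_q = manager.Queue()
    request_queue.put({'request': 'GET_PORT', 'reply_to': reply_q})
    print(reply_q.get(timeout=3))

    request_queue.put(None)
    proc.join()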
Re: [Python] Matrix convergence
Zhengzheng Pan <[EMAIL PROTECTED]> writes: > Hi all, > > I'm trying to check whether a (stochastic/transition) matrix > converges, i.e. a function/method that will return True if the input > matrix sequence shows convergence and False otherwise. The background > is a Markov progress, so the special thing about the transition matrix > is the values of elements in each row add up to be 1. Fail to find any > relevant build-in methods in Python... > > Currently the standard theorem on convergence in Markov chain > literature is involved with the properties of aperiodic and > connectivity, which I'm not able to implement with Python either... > Therefore I'm actually also looking for alternative conditions to > check for convergence, and am willing to sacrifice a little bit of > precision. > > If you have any ideas/suggestions on how to examine the convergence of > a matrix in general or specifically about this kind of matrices, and > are willing to share, that'd be really great. > Do you mean you have an n x n stochastic matrix T and you're trying to find out whether the sequence T^j converges? Let A = (I + T)^(n-1) and B = T^((n-1)^2+1). Powers of matrices are easy to compute using repeated squaring. i and j are in the same class iff A_{ij} > 0 and A_{ji} > 0. A class containing state i is recurrent iff A_{ij} = 0 for all j not in the class. A recurrent class containing state i is aperiodic iff B_{ij} > 0 for all j in the class. T^j converges iff all recurrent classes are aperiodic. Thus the test can be written as follows: For each i, either there is some j for which A_{ij} > 0 but A_{ji} = 0, or there is no j for which A_{ij} > 0 but B_{ij} = 0. -- Robert Israel [EMAIL PROTECTED] Department of Mathematics http://www.math.ubc.ca/~israel University of British Columbia Vancouver, BC, Canada -- http://mail.python.org/mailman/listinfo/python-list
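A direct numpy transcription of that test, assuming T is an n x n row-stochastic matrix given as an array; a small tolerance stands in for the exact "> 0" comparisons because of floating point error.

# numpy transcription of the test above; T is an n x n stochastic matrix.
import numpy as np

def powers_converge(T, tol=1e-12):
    # True iff the sequence T, T^2, T^3, ... converges
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    A = np.linalg.matrix_power(np.eye(n) + T, n - 1)
    B = np.linalg.matrix_power(T, (n - 1) ** 2 + 1)
    for i in range(n):
        reach = A[i] > tol                     # states j with A_ij > 0
        if np.any(reach & (A[:, i] <= tol)):
            continue                           # i is transient; nothing to check
        if np.any(reach & (B[i] <= tol)):
            return False                       # i's recurrent class is periodic
    return True

# powers_converge([[0, 1], [1, 0]])          -> False (period 2)
# powers_converge([[0.5, 0.5], [0.2, 0.8]])  -> True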
database query - logic question
Thanks to anyone who takes the time to read this. If I posted to the wrong list, I apologize and you can disregard. I need help with a script to pull data from a postgres database. I'm ok with the database connection, just not sure how to parse the data to get the results I need. I'm running Python 2.4.4. For what it's worth, once I can get my logic correct I'll be publishing the reports mentioned below via zope for web clients. Here is a small sample of the records in the table:

name      date        time      status
machine1  01/01/2008  13:00:00  system ok
machine1  01/01/2008  13:05:00  system ok
machine1  01/01/2008  13:10:00  status1
machine1  01/01/2008  13:10:30  status1
machine1  01/01/2008  13:11:00  system ok
machine1  01/01/2008  13:16:30  status2
machine1  01/01/2008  13:17:00  status2
machine1  01/01/2008  13:17:30  status2
machine1  01/01/2008  13:18:00  status2
machine1  01/01/2008  13:18:30  status2
machine1  01/01/2008  13:19:00  system ok
machine1  01/01/2008  13:24:00  status2
machine1  01/01/2008  13:24:30  status2
machine1  01/01/2008  13:25:00  system ok

I need to report from this data. The detail report needs to be something like:

machine1  01/01/2008 13:10:00  status1  00:01:30
machine1  01/01/2008 13:16:30  status2  00:02:30
machine1  01/01/2008 13:24:00  status2  00:01:00

and the summary needs to be:

machine1  01/01/2008  total 'status1' time = 00:01:30
machine1  01/01/2008  total 'status2' time = 00:03:30
____________________________________________________
machine1  01/01/2008  total 'non-OK' time  = 00:05:00   # this is the sum of status1 and status2 times

The 'machine1' system is periodically checked and the system status is written to the database table with the machine name/date/time/status. Everything that isn't a 'system ok' status is bad. To determine the amount of time a machine was in a bad status, I'm taking the first time a machine has a 'system ok' status after a bad status and subtracting from that time the time that the machine first went into that bad status. From my table above: machine1 went into 'status2' status at 13:16:30 and came out of 'status2' to a 'system ok' status at 13:19:00. So the downtime would be 13:19:00 - 13:16:30 = 00:02:30. I'm not sure how to query, when a 'bad' status is found, for the next 'good' status and calculate based on the times. Essentially, I need help creating the reports mentioned above. Your questions may also help clarify my fuzzy description. Thanks for any help. Reply with questions. Israel -- http://mail.python.org/mailman/listinfo/python-list
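One rough way to do the grouping in plain Python, after fetching the rows ordered by name, date and time; it assumes each row comes back as a (name, date, time, status) tuple and would need adapting to the real cursor output.

# Sketch of the grouping logic over already-fetched rows (ordered by name,
# date, time); each row is assumed to be (name, date, time, status).
from datetime import datetime

def downtime_report(rows):
    # yield (name, start, status, duration) for each bad-status interval
    current = None                       # (name, start, status) while "down"
    for name, date_s, time_s, status in rows:
        stamp = datetime.strptime(date_s + ' ' + time_s, '%m/%d/%Y %H:%M:%S')
        if status != 'system ok':
            if current is None:
                current = (name, stamp, status)          # entering a bad status
            elif current[2] != status:
                # one bad status flows straight into a different one
                yield current[0], current[1], current[2], stamp - current[1]
                current = (name, stamp, status)
        elif current is not None:
            # first good reading after a bad stretch closes the interval
            yield current[0], current[1], current[2], stamp - current[1]
            current = None

rows = [
    ('machine1', '01/01/2008', '13:16:30', 'status2'),
    ('machine1', '01/01/2008', '13:17:00', 'status2'),
    ('machine1', '01/01/2008', '13:19:00', 'system ok'),
]
for name, start, status, length in downtime_report(rows):
    print(name, start, status, length)   # machine1 ... status2 0:02:30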
__eq__ problem with subclasses
I am very confused by the following behavior. I have a base class which defines __eq__. I then have a subclass which does not. When I evaluate the expression a==b, where a and b are elements of these classes, __eq__ is always called with the subclass as the first argument, regardless of the order I write my expression. I can't see why this would be desired behavior. Sample code for clarity:

class c1(object):
    def __eq__(self, op):
        print "Calling c1.__eq__("+str(type(self))+","+str(type(op))+")"
        return False

class c2(c1):
    pass

a1=c1()
a2=c2()
a1==a2
a2==a1

The output is:

Calling c1.__eq__(<class '__main__.c2'>,<class '__main__.c1'>)
Calling c1.__eq__(<class '__main__.c2'>,<class '__main__.c1'>)

Why does a1==a2 generate a call to c1.__eq__(a2, a1) instead of c1.__eq__(a1, a2)? This is important because I am writing a math library in which '==' is being used for assignment, i.e., 'a==b+c' sets 'a' equal to the sum of 'b' and 'c'. So 'a==b' is very different from 'b==a'. -- Daniel M. Israel [EMAIL PROTECTED] -- http://mail.python.org/mailman/listinfo/python-list
Re: __eq__ problem with subclasses
Scott David Daniels wrote: Daniel Israel wrote: I am very confused by the following behavior. I have a base class which defines __eq__. I then have a subclass which does not. When I evaluate the expression a==b, where a and b are elements of these classes, __eq__ is always called with the subclass as the first argument, regardless of the order I write my expression. I can't see why this would be desired behavior. This is the quickest way to make sure that an over-ridden __eq__ gets called ... The cost of allowing the expression order to determine the call made when no comparison override is provided would be more computation before finally dispatching on the method. Would you want to slow down the comparison to get the behavior you seem to want? Well, yes, actually. But then I am trying to get my program to work. :) Seriously, thank you, now I at least understand the rationale. There is an implementation philosophy here, I think. What is the purpose of allowing over-rides of these operators? If it is purely to allow code to mimic numeric types, then it is justified to demand that over-ride code respect certain properties of these operators. (Although, for example, * is not commutative for general matrices. So even then it is not totally obvious what properties should be required.) But the reality is that many programmers will use these operators for other purposes. If the language designer's intent is to allow for this, then there shouldn't be surprising exceptional behavior like this, and you have to live with the cost penalty. Personally, I think it is worth the cost to allow the programmer flexibility. But if that is not the decision of the language designers, then I would say this is a documentation bug. This behavior needs to be clear to anyone reading the manual. -- Daniel M. Israel [EMAIL PROTECTED] -- http://mail.python.org/mailman/listinfo/python-list
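A small toy example of the case the rule is designed for: when the subclass does override __eq__, its version wins no matter which side of '==' it appears on.

# Toy example: the subclass override is preferred regardless of operand order.
class Base(object):
    def __eq__(self, other):
        return "base comparison"

class Fancy(Base):
    def __eq__(self, other):
        return "fancy comparison"

print(Base() == Fancy())   # 'fancy comparison' - the override is preferred
print(Fancy() == Base())   # 'fancy comparison'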
Proper way to run CherryPy app as a daemon?
I want to run a CherryPy app as a daemon on CentOS 6 using an init.d script. By subscribing to the "Daemonizer" and PIDFile cherrypy plugins, I have been able to write an init.d script that starts and stops my CherryPy application. There's only one problem: it would appear that the program daemonizes, thus allowing the init.d script to return a good start, as soon as I call cherrypy.engine.start(), but *before* the cherrypy app has actually started. In particular, this occurs before cherrypy has bound to the desired port. The end result is that running "service start" returns OK, indicating that the app is now running, even when it cannot bind to the port, thus preventing it from actually starting. This in turn causes issues with my clustering software, which thinks it started just fine, when in fact it never *really* started. As such, is there a way to delay the daemonization until I call cherrypy.engine.block()? Or some other way to prevent the init.d script from indicating a successful start until the process has actually bound to the needed port and fully started? What is the proper way of doing this? Thanks! --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
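One possible workaround (a sketch, not necessarily the CherryPy-blessed answer): have the init.d script launch the daemon and then run a tiny checker that only exits 0 once the port actually accepts connections. Host, port and timeout below are placeholders.

# Sketch: a tiny readiness check the init.d script can run after launching
# the daemon; it exits 0 only once the port accepts connections.
import socket
import sys
import time

def wait_for_port(host='127.0.0.1', port=8080, timeout=10.0):
    # True once host:port accepts a TCP connection, False on timeout
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            socket.create_connection((host, port), timeout=1).close()
            return True
        except socket.error:
            time.sleep(0.2)
    return False

if __name__ == '__main__':
    sys.exit(0 if wait_for_port() else 1)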
CherryPy Session object creation logic
I have a CherryPy app, for which I am using a PostgreSQL session. To be more exact, I modified a MySQL session class I found to work with PostgreSQL instead, and then I put this line in my code: cherrypy.lib.sessions.PostgresqlSession = PostgreSQLSession And this works fine. One thing about its behavior is bugging me, however: accessing a page instantiates (and deletes) *many* instances of this class, all for the same session. Doing some debugging, I counted 21 calls to the __init__ function when loading a single page. Logging in and displaying the next page hit it an additional 8 times. My theory is that essentially every time I try to read from or write to the session, CherryPy is instantiating a new PostgreSQLSession object, performing the request, and deleting the session object. In that simple test, that means 29 connections to the database, 29 instantiations, etc - quite a bit of overhead, not to mention the load on my database server making/breaking those connections (although it handles it fine). Is this "normal" behavior? Or did I mess something up with my session class? I'm thinking that ideally CherryPy would only create one object - and therefore, one DB connection - for a given session, and then simply hold on to that object until that session expired. But perhaps not? --- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
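Without seeing the PostgreSQLSession class it's hard to say, but if each __init__ currently opens its own psycopg2 connection, one common pattern is to borrow from a single module-level pool instead, so the per-request instantiations stop translating into per-request connects. A very rough sketch, with the DSN and class shape assumed:

# Sketch only: one module-level pool shared by every session instance, so
# the constant instantiation stops meaning constant reconnecting.
import psycopg2.pool

_pool = psycopg2.pool.ThreadedConnectionPool(1, 10, 'dbname=sessions')  # DSN assumed

class PostgreSQLSession(object):          # stand-in for the real session class
    def __init__(self, id=None, **kwargs):
        self.id = id
        self._conn = _pool.getconn()      # borrow an already-open connection

    def _exec(self, sql, params=()):
        cur = self._conn.cursor()
        cur.execute(sql, params)
        return cur

    def __del__(self):
        _pool.putconn(self._conn)         # hand it back instead of closing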
Psycopg2 pool clarification
I've been using the psycopg2 pool class for a while now, using code similar to the following: >>> pool=ThreadedConnectionPool(0,5,) >>> conn1=pool.getconn() >>> >>> pool.putconn(conn1) repeat later, or perhaps "simultaneously" in a different thread. and my understanding was that the pool logic was something like the following: - create a "pool" of connections, with an initial number of connections equal to the "minconn" argument - When getconn is called, see if there is an available connection. If so, return it. If not, open a new connection and return that (up to "maxconn" total connections) - When putconn is called, return the connection to the pool for re-use, but do *not* close it (unless the close argument is specified as True, documentation says default is False) - On the next request to getconn, this connection is now available and so no new connection will be made - perhaps (or perhaps not), after some time, unused connections would be closed and purged from the pool to prevent large numbers of only used once connections from laying around. However, in some testing I just did, this doesn't appear to be the case, at least based on the postgresql logs. Running the following code: >>> pool=ThreadedConnectionPool(0,5,) >>> conn1=pool.getconn() >>> conn2=pool.getconn() >>> pool.putconn(conn1) >>> pool.putconn(conn2) >>> conn3=pool.getconn() >>> pool.putconn(conn3) produced the following output in the postgresql log: 2017-06-02 14:30:26 AKDT LOG: connection received: host=::1 port=64786 2017-06-02 14:30:26 AKDT LOG: connection authorized: user=logger database=flightlogs 2017-06-02 14:30:35 AKDT LOG: connection received: host=::1 port=64788 2017-06-02 14:30:35 AKDT LOG: connection authorized: user=logger database=flightlogs 2017-06-02 14:30:46 AKDT LOG: disconnection: session time: 0:00:19.293 user=logger database=flightlogs host=::1 port=64786 2017-06-02 14:30:53 AKDT LOG: disconnection: session time: 0:00:17.822 user=logger database=flightlogs host=::1 port=64788 2017-06-02 14:31:15 AKDT LOG: connection received: host=::1 port=64790 2017-06-02 14:31:15 AKDT LOG: connection authorized: user=logger database=flightlogs 2017-06-02 14:31:20 AKDT LOG: disconnection: session time: 0:00:05.078 user=logger database=flightlogs host=::1 port=64790 Since I set the maxconn parameter to 5, and only used 3 connections, I wasn't expecting to see any disconnects - and yet as soon as I do putconn, I *do* see a disconnection. Additionally, I would have thought that when I pulled connection 3, there would have been two connections available, and so it wouldn't have needed to connect again, yet it did. Even if I explicitly say close=False in the putconn call, it still closes the connection and has to open What am I missing? From this testing, it looks like I get no benefit at all from having the connection pool, unless you consider an upper limit to the number of simultaneous connections a benefit? :-) Maybe a little code savings from not having to manually call connect and close after each connection, but that's easily gained by simply writing a context manager. I could get *some* limited benefit by raising the minconn value, but then I risk having connections that are *never* used, yet still taking resources on the DB server. Ideally, it would open as many connections as are needed, and then leave them open for future requests, perhaps with an "idle" timeout. Is there any way to achieve this behavior? 
--- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- -- https://mail.python.org/mailman/listinfo/python-list
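For reference, the context-manager alternative mentioned at the end is only a few lines; connection parameters here are stand-ins.

# Context-manager version of getconn/putconn; connection params are stand-ins.
from contextlib import contextmanager
from psycopg2.pool import ThreadedConnectionPool

pool = ThreadedConnectionPool(0, 5, 'dbname=flightlogs')

@contextmanager
def db_connection():
    # borrow a pooled connection for the duration of a with-block
    conn = pool.getconn()
    try:
        yield conn
    finally:
        pool.putconn(conn)

# with db_connection() as conn:
#     cur = conn.cursor()
#     cur.execute("SELECT 1")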
Re: Psycopg2 pool clarification
--- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 --- > On Jun 7, 2017, at 10:31 PM, dieter wrote: > > israel writes: >> On 2017-06-06 22:53, dieter wrote: >> ... >> As such, using psycopg2's pool is essentially >> worthless for me (plenty of use for it, i'm sure, just not for me/my >> use case). > > Could you not simply adjust the value for the "min" parameter? > If you want at least "n" open connections, then set "min" to "n". Well, sure, if I didn't care about wasting resources (which, I guess many people don't). I could set "n" to some magic number that would always give "enough" connections, such that my application never has to open additional connections, then adjust that number every few months as usage changes. In fact, now that I know how the logic of the pool works, that's exactly what I'm doing until I am confident that my caching replacement is solid. Of course, in order to avoid having to open/close a bunch of connections during the times when it is most critical - that is, when the server is under heavy load - I have to set that number arbitrarily high. Furthermore, that means that much of the time many, if not most, of those connections would be idle. Each connection uses a certain amount of RAM on the server, not to mention using up limited connection slots, so now I've got to think about whether my server is sized properly to be able to handle that load not just occasionally, but constantly - when reducing server load by reducing the frequency of connections being opened/closed was the goal in the first place. So all I've done is trade dynamic load for static load - increasing performance at the cost of resources, rather than more intelligently using the available resources. All-in-all, not the best solution, though it does work. Maybe if load was fairly constant it would make more sense though. So, like I said, that's *my* use case: a number of web apps with varying loads, loads that also vary from day-to-day and hour-to-hour. On the other hand, a pool that caches connections using the logic I laid out in my original post would avoid the issue. Under heavy load, it could open additional connections as needed - a performance penalty for the first few users over the min threshold, but only the first few, rather than all the users over a certain threshold ("n"). Those connections would then remain available for the duration of the load, so it doesn't need to open/close numerous connections. Then, during periods of lighter load, the unused connections can drop off, freeing up server resources for other uses. A well-written pool could even do something like see that the available connection pool is running low, and open a few more connections in the background, thus completely avoiding the connection overhead on requests while never having more than a few "extra" connections at any given time. Even if you left off the expiration logic, it would still be an improvement, because while unused connections wouldn't drop, the "n" open connections could scale up dynamically until you have "enough" connections, without having to figure out and hard-code that "magic number" of open connections. Why wouldn't I want something like that? It's not like it's hard to code - took me about an hour and a half to get to a working prototype yesterday. Still need to write tests and add some polish, but it works. Perhaps, though, the common thought is just "throw more hardware at it and keep a lot of connections open at all times?"
Maybe I was raised too conservatively, or the company I work for is too poor :-D > > -- > https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list
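The prototype described above isn't shown here, but a rough illustration of the behaviour being argued for (grow on demand, keep returned connections for reuse, quietly drop any that sit idle too long) might look something like this:

# Illustration only (not the prototype mentioned above): grow on demand,
# reuse returned connections, drop ones idle longer than idle_timeout.
import threading
import time
import psycopg2

class CachingPool(object):
    def __init__(self, dsn, idle_timeout=300):
        self.dsn = dsn
        self.idle_timeout = idle_timeout
        self._idle = []                  # list of (connection, returned_at)
        self._lock = threading.Lock()

    def getconn(self):
        with self._lock:
            self._expire_idle()
            if self._idle:
                return self._idle.pop()[0]       # reuse the newest idle conn
        return psycopg2.connect(self.dsn)        # otherwise grow on demand

    def putconn(self, conn, close=False):
        if close or conn.closed:
            conn.close()
            return
        with self._lock:
            self._idle.append((conn, time.time()))
            self._expire_idle()

    def _expire_idle(self):
        # called with the lock held: close connections idle past the timeout
        cutoff = time.time() - self.idle_timeout
        keep = []
        for conn, returned_at in self._idle:
            if returned_at < cutoff:
                conn.close()
            else:
                keep.append((conn, returned_at))
        self._idle = keep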
Re-running unittest
Hi, I'm writing some code that automatically executes some registered unit tests, as a way to automate the process. Sample code follows to illustrate what I'm doing:

import unittest

class PruebasDePrueba(unittest.TestCase):
    def testUnTest(self):
        a = 2
        b = 1
        self.assertEquals(a, b)

def runTests():
    loader = unittest.TestLoader()
    result = unittest.TestResult()
    suite = loader.loadTestsFromName("import_tests.PruebasDePrueba")
    suite.run(result)
    print "Errores: ", len(result.errors)
    print "Fallos: ", len(result.failures)

if __name__ == "__main__":
    runTests()
    raw_input("Modify [fix] the test and press ENTER to continue")
    runTests()

The code executes the tests from the class PruebasDePrueba, asks the user to "fix" the failing test, and then executes the tests again after ENTER is pressed. The test's initial state is "fail", so after making the values of a and b equal before the second execution I expect the test not to fail again, but it does. I've changed the original code in many different ways trying to figure out what is wrong with it, but with no success. The problem does not occur if, instead of loading the tests from the TestCase (import_tests.PruebasDePrueba), they are loaded by referring to the containing module. That works because I wrote a class that inherits from unittest.TestLoader and re-defines loadTestsFromModule(module) so that every time the method is called, the module is reloaded via Python's "reload" function. I would like to do the same with TestCases. I have written about this problem to several other python lists but have not received any answer; hope this time is different. Thanks in advance, regards -- Israel Fdez. Cabrera [EMAIL PROTECTED] . 0 . . . 0 0 0 0 -- http://mail.python.org/mailman/listinfo/python-list
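A sketch of extending the same reload trick to loadTestsFromName: re-import the module and pull the TestCase class off the reloaded module object. Python 2 style to match the original code; it assumes the name is exactly "module.TestCaseName".

# Reload-aware loader for TestCase names of the form "module.TestCaseName".
import unittest

class ReloadingTestLoader(unittest.TestLoader):
    def loadTestsFromName(self, name, module=None):
        module_name, class_name = name.rsplit('.', 1)
        mod = __import__(module_name, fromlist=[class_name])
        mod = reload(mod)                      # pick up the edited source
        return self.loadTestsFromTestCase(getattr(mod, class_name))

# loader = ReloadingTestLoader()
# suite = loader.loadTestsFromName("import_tests.PruebasDePrueba")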
Re: Re-running unittest
Hi Gabriel > Perhaps the best option is to run the tests in another process. Use any of > the available IPC mechanisms to gather the test results. This has the > added advantage of isolating the tested code from your GUI testing > framework; all requested resources are released to the OS; the tests run > in a predictable environment; etc. This could be a solution. It will complicate the GUI design a bit - for example, getting the current test number to drive a progress bar or something like that - but I think a well-designed IPC mechanism may help there as well. Thanks again for the tips, regards -- ____ Israel Fdez. Cabrera [EMAIL PROTECTED] Linux registered user No.: 270292 [http://counter.li.org] . 0 . . . 0 0 0 0 -- http://mail.python.org/mailman/listinfo/python-list
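A minimal sketch of the run-it-in-another-process idea using the multiprocessing module (Python 2.6+; on older versions the same shape works with os.fork or subprocess): the child imports the module fresh each run, executes the suite, and sends just the counts back over a queue. A GUI could stream per-test progress events the same way.

# Run the suite in a child process; only the summary comes back over a queue.
import multiprocessing
import unittest

def _run_tests(test_name, result_queue):
    loader = unittest.TestLoader()
    suite = loader.loadTestsFromName(test_name)
    result = unittest.TestResult()
    suite.run(result)
    result_queue.put({'run': result.testsRun,
                      'errors': len(result.errors),
                      'failures': len(result.failures)})

def run_in_subprocess(test_name):
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=_run_tests, args=(test_name, queue))
    proc.start()
    summary = queue.get()        # blocks until the child finishes the suite
    proc.join()
    return summary

# print(run_in_subprocess("import_tests.PruebasDePrueba"))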