On Nov 1, 2017, at 9:04 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
>
> Let me rephrase the question, see if I can simplify it. I need to be able to
> access a defaultdict from two different threads - one thread that responds to
> user requests which will populate the dictionary in response to a user
> request, and a second thread that will keep the dictionary updated as new
> data comes in. The value of the dictionary will be a timestamp, with the
> default value being datetime.min, provided by a lambda:
>
>     lambda: datetime.min
>
> At the moment my code is behaving as though each thread has a *separate*
> defaultdict, even though debugging shows the same addresses - the background
> update thread never sees the data populated into the defaultdict by the main
> thread. I was thinking race conditions or the like might make it so one
> particular loop of the background thread occurs before the main thread, but
> even so subsequent loops should pick up on the changes made by the main
> thread.
>
> How can I *properly* share a dictionary like object between two threads, with
> both threads seeing the updates made by the other?

For what it's worth, if I insert a print statement in both threads (which I am calling "Get AC", since that is the function being called in the first thread, and "update", since that is the purpose of the second thread), I get the following output:

Length at get AC: 54 ID: 4524152200 Time: 2017-11-01 09:41:24.474788
Length At update: 1 ID: 4524152200 Time: 2017-11-01 09:41:24.784399
Length At update: 2 ID: 4524152200 Time: 2017-11-01 09:41:25.228853
Length At update: 3 ID: 4524152200 Time: 2017-11-01 09:41:25.530434
Length At update: 4 ID: 4524152200 Time: 2017-11-01 09:41:25.532073
Length At update: 5 ID: 4524152200 Time: 2017-11-01 09:41:25.682161
Length At update: 6 ID: 4524152200 Time: 2017-11-01 09:41:26.807127
...

So the object ID hasn't changed as I would expect it to if, in fact, we have created a separate object for the thread. And the first call that populates it with 54 items happens "well" before the first update call - a full .3 seconds, which I would think would be an eternity in code terms. So it doesn't even look like it's a race condition causing the issue.

It seems to me this *has* to be something to do with the use of threads, but I'm baffled as to what.
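
One way to narrow this down is sketched below. It is a minimal, standard-library-only example, not the actual application: `populate` and `update` are hypothetical stand-ins for the "Get AC" and "update" roles, and the `PID` and `Thread` fields are additions to the debug line that the output above does not have. Within a single process the second thread does see the entries added by the first. An id() value is only meaningful inside one interpreter process, so a matching ID on its own does not show that both prints ran in the same process.

```python
import os
import threading
from collections import defaultdict
from datetime import datetime

last_points = defaultdict(lambda: datetime.min)
populated = threading.Event()


def debug(label):
    # Tag each line with the process ID and thread name in addition to id().
    line = "Length at {}: {} ID: {} PID: {} Thread: {}".format(
        label, len(last_points), id(last_points), os.getpid(),
        threading.current_thread().name)
    print(line)
    return line


def populate(count=54):
    # Hypothetical stand-in for getac(): initial population "from disk".
    for n in range(count):
        last_points["AC{}".format(n)] = datetime(2017, 11, 1)
    populated.set()
    return debug("get AC")


def update(results):
    # Hypothetical stand-in for watch(): waits for the initial population,
    # then adds one entry from the second thread.
    populated.wait()
    last_points["NEW"] = datetime(2017, 11, 1, 9, 41)
    results.append(len(last_points))
    debug("update")


def run_demo():
    results = []
    worker = threading.Thread(target=update, args=(results,), name="update")
    worker.start()
    populate()
    worker.join()
    return results[0]


if __name__ == "__main__":
    run_demo()
```

If the PID differs between the "get AC" and "update" lines in the real application, the two prints are coming from different processes rather than from two threads sharing one dictionary.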

> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------
>
>> On Oct 31, 2017, at 9:38 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
>>
>> A question that has arisen before (for example, here:
>> https://mail.python.org/pipermail/python-list/2010-January/565497.html) is
>> the question of "is defaultdict thread safe", with the answer generally
>> being a conditional "yes", with the condition being what is used as the
>> default value: apparently default values of python types, such as list, are
>> thread safe, whereas more complicated constructs, such as lambdas, make it
>> not thread safe. In my situation, I'm using a lambda, specifically:
>>
>>     lambda: datetime.min
>>
>> So presumably *not* thread safe.
>>
>> My goal is to have a dictionary of aircraft and when they were last "seen",
>> with datetime.min being effectively "never". When a data point comes in for
>> a given aircraft, the data point will be compared with the value in the
>> defaultdict for that aircraft, and if the timestamp on that data point is
>> newer than what is in the defaultdict, the defaultdict will get updated with
>> the value from the datapoint (not necessarily current timestamp, but rather
>> the value from the datapoint). Note that data points do not necessarily
>> arrive in chronological order (for various reasons not applicable here, it's
>> just the way it is), thus the need for the comparison.
>>
>> When the program first starts up, two things happen:
>>
>> 1) a thread is started that watches for incoming data points and updates the
>>    dictionary as per above, and
>> 2) the dictionary should get an initial population (in the main thread) from
>>    hard storage.
>>
>> The behavior I'm seeing, however, is that when step 2 happens (which
>> generally happens before the thread gets any updates), the dictionary gets
>> populated with 56 entries, as expected. However, none of those entries are
>> visible when the thread runs. It's as though the thread is getting a
>> separate copy of the dictionary, although debugging says that is not the
>> case - printing the variable from each location shows the same address for
>> the object.
>>
>> So my questions are:
>>
>> 1) Is this what it means to NOT be thread safe? I was thinking of race
>>    conditions where individual values may get updated wrong, but this
>>    apparently is overwriting the entire dictionary.
>> 2) How can I fix this?
>>
>> Note: I really don't care if the "initial" update happens after the thread
>> receives a data point or two, and therefore overwrites one or two values. I
>> just need the dictionary to be fully populated at some point early in
>> execution. In usage, the dictionary is used to see if an aircraft has been
>> seen "recently", so if the most recent datapoint gets overwritten with a
>> slightly older one from disk storage, that's fine - it's just if it's still
>> showing datetime.min because we haven't gotten in any datapoint since we
>> launched the program, even though we have "recent" data in disk storage,
>> that's a problem. So I don't care about the obvious race condition between
>> the two operations, just that the end result is a populated dictionary. Note
>> also that as datapoints come in, they are being written to disk, so the disk
>> storage doesn't lag significantly anyway.
>>
>> The framework of my code is below:
>>
>> File: watcher.py
>>
>> from collections import defaultdict
>> from datetime import datetime
>>
>> last_points = defaultdict(lambda: datetime.min)
>>
>> # This function is launched as a thread using the threading module when the
>> # first client connects
>> def watch():
>>     while True:
>>         <wait for datapoint>
>>         pointtime = <extract/parse timestamp from datapoint>
>>         if last_points[<aircraft_identifier>] < pointtime:
>>             <do stuff>
>>             last_points[<aircraft_identifier>] = pointtime
>>             #DEBUGGING
>>             print("At update:", len(last_points))
>>
>>
>> File: main.py:
>>
>> from .watcher import last_points
>>
>> # This function will be triggered by a web call from a client, so could
>> # happen at any time
>> # Client will call this function immediately after connecting, as well as in
>> # response to various user actions.
>> def getac():
>>     <load list of aircraft and times from disk>
>>     <do stuff to send the list to the client>
>>     for record in aclist:
>>         last_points[<aircraft_identifier>] = record_timestamp
>>     #DEBUGGING
>>     print("At get AC:", len(last_points))

--
https://mail.python.org/mailman/listinfo/python-list
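
For comparison, here is a minimal runnable sketch of the framework quoted above, under stated assumptions. The `queue.Queue` named `incoming` is a hypothetical stand-in for the real data feed, and `points_lock` and `record_point` are additions that do not appear in the original code. The lock makes the compare-and-update step atomic, so one thread cannot write between another thread's read and write. Unlike the original getac(), which overwrites unconditionally, this version routes both writers through the same comparison.

```python
import queue
import threading
from collections import defaultdict
from datetime import datetime

last_points = defaultdict(lambda: datetime.min)
points_lock = threading.Lock()   # assumption: not in the original framework
incoming = queue.Queue()         # hypothetical stand-in for the data feed


def record_point(aircraft, pointtime):
    """Store pointtime only if it is newer than what is already recorded."""
    with points_lock:
        if last_points[aircraft] < pointtime:
            last_points[aircraft] = pointtime
            return True
        return False


def watch():
    # Background thread: consume datapoints until a None sentinel arrives.
    while True:
        item = incoming.get()
        if item is None:
            break
        record_point(*item)


def getac(records):
    # Main thread: initial population from records loaded from disk.
    for aircraft, timestamp in records:
        record_point(aircraft, timestamp)
    return len(last_points)


def run_example():
    watcher = threading.Thread(target=watch, daemon=True)
    watcher.start()
    getac([("N123", datetime(2017, 11, 1, 9, 0)),
           ("N456", datetime(2017, 11, 1, 9, 5))])
    incoming.put(("N123", datetime(2017, 11, 1, 9, 30)))  # newer: kept
    incoming.put(("N456", datetime(2017, 11, 1, 8, 0)))   # older: ignored
    incoming.put(None)  # sentinel: stop the watcher
    watcher.join()
    return dict(last_points)
```

Calling `run_example()` leaves both aircraft in the shared dictionary, with the newer timestamp kept for each. This only holds when both functions run as threads of the same process, since a module-level dictionary is not shared across separate processes.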