On Nov 1, 2017, at 9:04 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
> 
> Let me rephrase the question, see if I can simplify it. I need to be able to 
> access a defaultdict from two different threads - one thread that responds to 
> user requests which will populate the dictionary in response to a user 
> request, and a second thread that will keep the dictionary updated as new 
> data comes in. The value of the dictionary will be a timestamp, with the 
> default value being datetime.min, provided by a lambda:
> 
> lambda: datetime.min
> 
> At the moment my code is behaving as though each thread has a *separate* 
> defaultdict, even though debugging shows the same addresses - the background 
> update thread never sees the data populated into the defaultdict by the main 
> thread. I was thinking race conditions or the like might make it so one 
> particular loop of the background thread occurs before the main thread, but 
> even so subsequent loops should pick up on the changes made by the main 
> thread.
> 
> How can I *properly* share a dictionary like object between two threads, with 
> both threads seeing the updates made by the other?

For what it's worth, if I insert a print statement in both threads (which I am 
calling "Get AC", since that is the function being called in the first thread, 
and "update", since that is the purpose of the second thread), I get the 
following output:

Length at get AC:  54 ID: 4524152200  Time: 2017-11-01 09:41:24.474788
Length At update:  1 ID: 4524152200  Time: 2017-11-01 09:41:24.784399
Length At update:  2 ID: 4524152200  Time: 2017-11-01 09:41:25.228853
Length At update:  3 ID: 4524152200  Time: 2017-11-01 09:41:25.530434
Length At update:  4 ID: 4524152200  Time: 2017-11-01 09:41:25.532073
Length At update:  5 ID: 4524152200  Time: 2017-11-01 09:41:25.682161
Length At update:  6 ID: 4524152200  Time: 2017-11-01 09:41:26.807127
...

So the object ID hasn't changed as I would expect it to if, in fact, we have 
created a separate object for the thread. And the first call that populates it 
with 54 items happens "well" before the first update call - a full .3 seconds, 
which I would think would be an eternity in code terms. So it doesn't even look 
like it's a race condition causing the issue.
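
One thing the output above cannot show is whether both prints come from the 
same process. id() is only unique within a single process (on CPython it is the 
object's memory address), and a forked child starts with a copy of the parent's 
memory, so the same ID can appear in two processes. Below is a minimal sketch 
of a more telling debug line. The helper name debug_line is hypothetical and 
not part of my actual code:

```python
import os
import threading
from collections import defaultdict
from datetime import datetime

last_points = defaultdict(lambda: datetime.min)


def debug_line(label, mapping):
    """Build a debug string that also identifies the process and thread."""
    return "{}: len={} id={} pid={} thread={} time={}".format(
        label,
        len(mapping),
        id(mapping),
        os.getpid(),                      # differs between processes
        threading.current_thread().name,  # differs between threads
        datetime.now(),
    )


print(debug_line("At get AC", last_points))
```

If the pid differs between the "get AC" and "update" lines, the two functions 
are running in separate processes, each with its own copy of the dictionary. 
If the pid is the same, that explanation can be ruled out.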

It seems to me this *has* to be something to do with the use of threads, but 
I'm baffled as to what.
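
For comparison, here is a minimal, self-contained sketch of what I would 
expect to work: two threads in one process sharing a single defaultdict, with a 
lock around the compare-and-update step. The names last_points_lock, 
record_point, load_initial and worker are stand-ins made up for this example, 
and the 54 records are dummy data rather than anything loaded from disk:

```python
import threading
from collections import defaultdict
from datetime import datetime

last_points = defaultdict(lambda: datetime.min)
last_points_lock = threading.Lock()  # hypothetical name, not in the original code


def record_point(aircraft_id, pointtime):
    """Store pointtime only if it is newer than the stored value."""
    with last_points_lock:
        if last_points[aircraft_id] < pointtime:
            last_points[aircraft_id] = pointtime
            return True
        return False


def load_initial():
    # Stand-in for the initial population from disk (dummy data).
    for i in range(54):
        record_point("AC{}".format(i), datetime(2017, 11, 1, 9, 0, 0))


seen_by_worker = []


def worker():
    # Stand-in for the watcher thread: one newer data point arrives.
    record_point("AC0", datetime(2017, 11, 1, 9, 41, 25))
    with last_points_lock:
        seen_by_worker.append(len(last_points))


load_initial()
t = threading.Thread(target=worker)
t.start()
t.join()
print(seen_by_worker[0])  # 54: the worker sees the main thread's entries
```

In this sketch the worker thread sees all 54 entries added by the main thread, 
which is the behavior I am *not* getting in the real program.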

> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------
> 
> 
> 
> 
>> On Oct 31, 2017, at 9:38 AM, Israel Brewster <isr...@ravnalaska.net> wrote:
>> 
>> A question that has arisen before (for example, here: 
>> https://mail.python.org/pipermail/python-list/2010-January/565497.html) is 
>> the question of "is defaultdict thread safe", with the answer generally 
>> being a conditional "yes", with the condition being what is used as the 
>> default value: apparently default values of python types, such as list, are 
>> thread safe, whereas more complicated constructs, such as lambdas, make it 
>> not thread safe. In my situation, I'm using a lambda, specifically:
>> 
>> lambda: datetime.min
>> 
>> So presumably *not* thread safe.
>> 
>> My goal is to have a dictionary of aircraft and when they were last "seen", 
>> with datetime.min being effectively "never". When a data point comes in for 
>> a given aircraft, the data point will be compared with the value in the 
>> defaultdict for that aircraft, and if the timestamp on that data point is 
>> newer than what is in the defaultdict, the defaultdict will get updated with 
>> the value from the datapoint (not necessarily current timestamp, but rather 
>> the value from the datapoint). Note that data points do not necessarily 
>> arrive in chronological order (for various reasons not applicable here, it's 
>> just the way it is), thus the need for the comparison.
>> 
>> When the program first starts up, two things happen:
>> 
>> 1) a thread is started that watches for incoming data points and updates the 
>> dictionary as per above, and
>> 2) the dictionary should get an initial population (in the main thread) from 
>> hard storage.
>> 
>> The behavior I'm seeing, however, is that when step 2 happens (which 
>> generally happens before the thread gets any updates), the dictionary gets 
>> populated with 56 entries, as expected. However, none of those entries are 
>> visible when the thread runs. It's as though the thread is getting a 
>> separate copy of the dictionary, although debugging says that is not the 
>> case - printing the variable from each location shows the same address for 
>> the object.
>> 
>> So my questions are:
>> 
>> 1) Is this what it means to NOT be thread safe? I was thinking of race 
>> conditions where individual values may get updated wrong, but this 
>> apparently is overwriting the entire dictionary.
>> 2) How can I fix this?
>> 
>> Note: I really don't care if the "initial" update happens after the thread 
>> receives a data point or two, and therefore overwrites one or two values. I 
>> just need the dictionary to be fully populated at some point early in 
>> execution. In usage, the dictionary is used to see if an aircraft has been 
>> seen "recently", so if the most recent datapoint gets overwritten with a 
>> slightly older one from disk storage, that's fine - it's just if it's still 
>> showing datetime.min because we haven't gotten in any datapoint since we 
>> launched the program, even though we have "recent" data in disk storage 
>> that's a problem. So I don't care about the obvious race condition between 
>> the two operations, just that the end result is a populated dictionary. Note 
>> also that as datapoints come in, they are being written to disk, so the disk 
>> storage doesn't lag significantly anyway.
>> 
>> The framework of my code is below:
>> 
>> File: watcher.py
>> 
>> last_points = defaultdict(lambda:datetime.min)
>> 
>> # This function is launched as a thread using the threading module when the 
>> # first client connects
>> def watch():
>>      while True:
>>              <wait for datapoint>
>>              pointtime= <extract/parse timestamp from datapoint>
>>              if last_points[<aircraft_identifier>] < pointtime:
>>                      <do stuff>
>>                      last_points[<aircraft_identifier>]=pointtime
>>                      #DEBUGGING
>>                      print("At update:", len(last_points))
>> 
>> 
>> File: main.py:
>> 
>> from .watcher import last_points
>> 
>> # This function will be triggered by a web call from a client, so could 
>> # happen at any time.
>> # Client will call this function immediately after connecting, as well as in 
>> # response to various user actions.
>> def getac():
>>      <load list of aircraft and times from disk>
>>      <do stuff to send the list to the client>
>>      for record in aclist:
>>              last_points[<aircraft_identifier>]=record_timestamp
>>      #DEBUGGING
>>      print("At get AC:", len(last_points))
>> 
>> 
>> -----------------------------------------------
>> Israel Brewster
>> Systems Analyst II
>> Ravn Alaska
>> 5245 Airport Industrial Rd
>> Fairbanks, AK 99709
>> (907) 450-7293
>> -----------------------------------------------
>> 
>> 
>> 
>> 
>> -- 
>> https://mail.python.org/mailman/listinfo/python-list
> 