On Thu, 30 Mar 2023 at 01:52, Jack Dangler <tdl...@gmail.com> wrote: > > > On 3/29/23 02:08, Chris Angelico wrote: > > On Wed, 29 Mar 2023 at 16:56, Greg Ewing via Python-list > > <python-list@python.org> wrote: > >> On 28/03/23 2:25 pm, Travis Griggs wrote: > >>> Interestingly the error also only started showing up when I switched from > >>> running a statistics.mean() on one of these, instead of what I had been > >>> using, a statistics.median(). Apparently the kind of iteration done in a > >>> mean, is more conflict prone than a median? > >> It may be a matter of whether the GIL is held or not. I had a look > >> at the source for deque, and it doesn't seem to explicitly do > >> anything about locking, it just relies on the GIL. > >> > >> So maybe statistics.median() is implemented in C and statistics.mean() > >> in Python, or something like that? > >> > > Both functions are implemented in Python, but median() starts out with > > this notable line: > > > > data = sorted(data) > > > > which gives back a copy, iterated over rapidly in C. All subsequent > > work is done on that copy. > > > > The same effect could be had with mean() by taking a snapshot using > > list(q) and, I believe, would have the same effect (the source code > > for the sorted() function begins by calling PySequence_List). > > > > In any case, it makes *conceptual* sense to do your analysis on a copy > > of the queue, thus ensuring that your stats are stable. The other > > threads can keep going while you do your calculations, even if that > > means changing the queue. > > > > ChrisA > Sorry for any injected confusion here, but that line "data = > sorted(data)" appears as though it takes the value of the variable named > _data_, sorts it and returns it to the same variable store, so no copy > would be created. Am I missing something there?
The variable name "data" is the parameter to median(), so it's whatever you ask for the median of. (I didn't make that obvious in my previous post - an excess of brevity on my part.) The sorted() function, UNlike list.sort(), returns a sorted copy of what it's given. I delved into the CPython source code for that, and it begins with the PySequence_List call to (effectively) call list(data) to get a copy of it. It ought to be a thread-safe copy due to holding the GIL the entire time. I'm not sure what would happen in a GIL-free world but most likely the lock on the input object would still ensure thread safety. ChrisA -- https://mail.python.org/mailman/listinfo/python-list