On 20Mar2020 00:37, duncan smith <duncan@invalid.invalid> wrote:
On 19/03/2020 20:40, MRAB wrote:
On 2020-03-19 20:08, duncan smith wrote:
I have generator code along the following lines,
Q = queue.Queue()
Q.put(x)
while not Q.empty():
x = Q.get()
if <some condition that depends on x>:
yield x
else:
Q.put(<some value that depends on x>)
If I change it to,
Q = []
Q.append(x)
for x in Q:
if <some condition that depends on x>:
yield x
else:
Q.append(<some value that depends on x>)
then it runs approximately 3 times more quickly.
A list is a very simple data structure, and therefore fast (for things
that are inherently fast, like append; insert is expensive).
A queue takes a lock (cheap but an overhead) and might be expensive for
get or put (this dependings on the data structure storing the queue
items). A circular list (essentially a list using a base offset) can be
fast until it needs resizing, a doubly linked list of nodes is always
O(1) but is more expensive to add/remove (allocate node, adjust
pointers); makye 3 times as slow :-) I haven't looked to see how Queue
stores its items.
I seem to remember
(from somewhere) that the language specification doesn't guarantee that
the latter will continue to work, but that it's unlikely that future
implementations will break it.
[...snip...]
[MRAB] 1. I'm not sure it's a good idea to mutate a list while
iterating over it.
AFAICR that's the bit that depends on the implementation of lists. As
long as iteration involves incrementing an index and indexing into the
list it will work.
If you're the only person using the list, as you are only _appending_ to
it then the for loop is probably safe.
If you were inserting elements into the list there would be trouble (an
iterator basicly needs keep an index as you suggest below, and an insert
at or before the current index invalidates that, causing the iterator to
issue the same element again (for example).
Apart from that, is there any good reason
why I shouldn't use the list? Is there another approach that might avoid
the overhead of using a queue? Results from profiling the two
implementations below in case it makes anything clearer. TIA.
Using a list seems fine for me if your code is as simple in reality as
listed above.
2. The example with the list retains all of the results, whereas the
one with the queue consumes them.
3. Queues are thread-safe because they're used for communication between
threads, but lists aren't thread-safe; that might be something to do
with the difference.
4. If it doesn't need to be thread-safe, why not try deques instead?
Bingo. Performance is indistinguishable from that of the list.
A deque is implement using a list.
Thread
safety is unimportant (for my purposes), but the docs at
https://docs.python.org/2/library/collections.html#collections.deque
state "Deques support thread-safe, memory efficient appends and pops
from either side of the deque with approximately the same O(1)
performance in either direction". Looking at the output from the
profiler for my implementation with the queue it looks like a deque is
being used but there's various other stuff happening relating to thread
safety. Is the lesson from this that if all you're doing is appending
and popping, then use a deque (rather than a queue) even if you want
thread safety? (It makes quite a difference in performance for my use
case.) Cheers.
A deque will change the ordering of the elements you process. Your list
loop behaves in FIFO order, as does the queue. A dequeue (you you're
doing a heappush and a heappop) will issue you the lowest current
element on each iteration. If you merely need to process all the
elements you're good. If the order matters you need to consider that.
A deque, being thread safe, will be using a lock. There will be a (quite
small) overhead for that.
Cheers,
Cameron Simpson <c...@cskk.id.au>
--
https://mail.python.org/mailman/listinfo/python-list