New submission from Raymond Hettinger:

Dummy entries created by deletions are currently handled in two places.  The 
code in set_add_entry() tracks a freeslot to allow dummy slots to be reused 
when new elements are inserted. The dummies are also eliminated during resizing.

The freeslot checks are made in the inner-loop of set_add_entry() and will only 
eliminate a few of the dummies (ones that happen to be in a slot that we want 
to use again).  In contrast, the resize routine eliminates 100% of the dummy 
entries -- this is where most dummies get reclaimed.

This is a preliminary draft of an idea to only use resizing for dummy 
reclamation.  It proposes eliminating the freeslot reuse step in 
set_add_entry().  One advantage is that the patch nicely simplifies the code in 
set_add_entry and makes its inner-loop tighter (eliminating one memory access 
out of the two on each iteration).  With the advent of linear probing and the 
60% set load factor, set resizing has become very cheap.

One issue with the current design is the while a few dummies get reclaimed 
earlier, others are left alive longer, causing more collisions during lookups. 
The reuse of a few dummies defers the resizing step that would actually clear 
out all of the dummies.

Another thought is that the proposal gives a better separation of concerns.  
Insertion focuses on adding entries without making inner loop checks for an 
uncommon case (the relevant note in dictobject.c says this is uncommon by a 
factor of hundreds).  The patch leaves the dummy cleanup to the fast, efficient 
resizing step.

Some areas for exploration:

* We know that the benefit to set_add_entry() will reduce the cost of building 
a set (which is the common case).  It simply does less work by using less code 
overall and by having less code in the inner loop (see the disassembly below 
(1) to see that we're now down to one memory access per iteration).  That said, 
it would be nice to quantify whether this is a small improvement or a big one 
(I expect the answer will be different for CLang vs GCC vs MSC, for 32-bit 
builds versus 64-builds, and for ARM versus x86, etc).

* It would also be nice see if there is an observable small benefit or small 
detriment to the uncommon case of building a set followed by cycling back and 
forth between removing elements and adding new elements.

In any case, the simplification of the code in set_add_entry() will be nice and 
it would be good to have a separation of concerns with only set_resize being 
responsible for clearing out dummies in a single high-speed pass.

(1) Inner-loop for the new set_add_entry() featuring only one memory access per 
iteration:

    L383:
        cmpq    %r9, %rbx       ; entry < start + LINEAR_PROBES
        je  L406                
    L388:
        addq    $16, %rbx       ; entry++ 
        movq    8(%rbx), %rax   ; load entry->hash from memory
        testq   %rax, %rax      ; entry->hash == 0
        jne L382                ; until zero found skip next test  
        cmpq    $0, (%rbx)      ; entry->key == NULL
        je  L374                ; return entry
    L382:
        cmpq    %rax, %r15      ; entry->hash == hash
        jne L383                ; loop back
    <... process to equality test >

----------
assignee: rhettinger
components: Interpreter Core
files: set_no_dummy_reuse.diff
keywords: patch
messages: 287272
nosy: rhettinger, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Simplify set_add_entry()
versions: Python 3.7
Added file: http://bugs.python.org/file46568/set_no_dummy_reuse.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29476>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to