This is actually a very good issue you have raised. As far as I can tell
from reading the code for Dictionary and HashedCollection, the
implementation is not well suited for deletions at all.

First, as you have discovered, the remove operation has an implementation
that is unsound, performance-wise. Second, HashedCollections grow when you
add to them, but they do not shrink when you remove from them.

That is, in scenarios where your dictionaries grow huge and later shrink,
there is a potential “memory leak” in that the underlying array is kept at
its largest size.
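
You can see this directly in a playground. A minimal sketch, assuming the
stock capacity message on HashedCollection (which answers the size of the
underlying array), and removing in reverse order to sidestep the slow
removal path Peter described:

| dic before after |
dic := Dictionary new.
1 to: 10000 do: [ :i | dic at: i put: i printString ].
before := dic capacity.
10000 to: 1 by: -1 do: [ :i | dic removeKey: i ].
after := dic capacity.
before = after "answers true: the array stays at its grown size"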

Btw, thanks for the shuffle function using sort. I do believe the built-in
shuffle in SequenceableCollection is faster though (O(n), versus
O(n·log n) for a sort-based shuffle).
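
For reference, a minimal sketch of the two approaches, assuming the stock
shuffled message (a Fisher-Yates shuffle that answers a copy):

| random viaSort builtIn |
random := Random new.
viaSort := (1 to: 10000) asArray.
"sort-based shuffle: O(n log n) comparisons; a random sort block also
does not produce a uniformly random permutation"
viaSort sort: [ :a :b | random next < 0.5 ].
"built-in shuffle: a single O(n) Fisher-Yates pass"
builtIn := (1 to: 10000) asArray shuffled.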

It seems from the Wikipedia page on hash tables that there are algorithms
with efficient removal for the “open addressing with linear probing”
implementation used in HashedCollection as well. They only seem to work,
though, if the hash table also implements a “shrink” function, which is
where the re-hashing is done.

Interesting stuff. It would be a good programming exercise to get that to
work.
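
As a starting point for that exercise, here is a rough sketch of the
removal algorithm from the Wikipedia page, written over a plain Array with
nil marking empty slots. All the names (table, slotFor, removeKey) are
mine, and I have not checked how this maps onto HashedCollection's actual
internals:

| capacity table slotFor removeKey |
capacity := 16.
table := Array new: capacity. "slots hold key -> value associations"
slotFor := [ :key | key hash \\ capacity + 1 ].
removeKey := [ :key |
    | i j k |
    "probe for the slot holding key"
    i := slotFor value: key.
    [ (table at: i) notNil and: [ (table at: i) key ~= key ] ]
        whileTrue: [ i := i \\ capacity + 1 ].
    (table at: i) ifNotNil: [
        table at: i put: nil.
        j := i.
        [ j := j \\ capacity + 1.
          (table at: j) notNil ] whileTrue: [
            k := slotFor value: (table at: j) key.
            "leave table[j] in place only when its home slot k
            lies cyclically in the interval (i, j]"
            (i < j
                ifTrue: [ k > i and: [ k <= j ] ]
                ifFalse: [ k > i or: [ k <= j ] ])
                ifFalse: [
                    table at: i put: (table at: j).
                    table at: j put: nil.
                    i := j ] ] ] ].

The nice property is that a removal only shifts elements within the
cluster around the vacated slot and leaves no tombstones behind, so probe
sequences stay short without a global re-hash. Adding the shrink step on
top of that would be the exercise.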

— Kasper


On 7 February 2020 at 11.17.27, LABORDE Pierre (
pierre.labo...@fr.thalesgroup.com) wrote:

Thanks Peter.



Here is some information about execution times:



dic := Dictionary new: 10000.

Time microsecondsToRun: [
    1 to: 10000 do: [ :i |
        dic at: i put: i printString.
    ].
]. "6000µs"

Time microsecondsToRun: [
    1 to: 10000 do: [ :i |
        dic removeKey: i.
    ].
]. "2683000µs, up to 447x"



Regarding your comment, I wrote an example that was too simple.

Here is a new test that better represents my needs and data (randomized
key order):



dic := Dictionary new: 10000.

pool := SortedCollection new: 10000.
random := Random new.
pool sort: [ :a :b | random next < 0.5 ].
1 to: 10000 do: [ :i | pool add: i ].

Time microsecondsToRun: [
    pool do: [ :i |
        dic at: i put: i printString.
    ].
]. "6000µs => same time, good point"

Time microsecondsToRun: [
    pool do: [ :i |
        dic removeKey: i.
    ].
]. "17000µs, up to 2.8x => huge change!"



That appears much more reasonable!

I need to test it with real data; to be continued.



Thanks for all the help.



Pierre



From: Pharo-users [mailto:pharo-users-boun...@lists.pharo.org] On Behalf Of PBKResearch
Sent: Monday 3 February 2020 12:25
To: 'Any question about pharo is welcome'
Subject: Re: [Pharo-users] Dictionary removeKey: very low



Pierre



It’s all to do with rehashing. A dictionary is stored as an Array of
Associations. After a key and its corresponding association are removed,
the remaining entries are rehashed from the removal point to the end of
the array. By doing the removals in natural order, you rehash the whole of
the remaining array after each removal. Try doing the removals in reverse
order; on my machine it reduces the time from 6 seconds to 6 milliseconds.
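
Concretely, the only change to your snippet is the direction of the
removal loop; a sketch:

dic := Dictionary new: 10000.
1 to: 10000 do: [ :i | dic at: i put: i printString ].

Time microsecondsToRun: [
    10000 to: 1 by: -1 do: [ :i | dic removeKey: i ].
]. "about 6 milliseconds here, against 6 seconds in natural order"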



HTH



Peter Kenny



From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of LABORDE Pierre
Sent: 03 February 2020 10:18
To: pharo-users@lists.pharo.org
Subject: [Pharo-users] Dictionary removeKey: very low



Hi all,



I have a problem with Dictionaries:



dic := Dictionary new: 10000.

1 to: 10000 do: [ :i |
    dic at: i put: i printString.
].

1 to: 10000 do: [ :i |
    dic removeKey: i.
]



Removing each key is very slooow; the execution-time difference between
adding and removing is crazy.

Any idea how to fix that? Can I use another Dictionary implementation or
different settings?



Thanks.



Pierre
