On Wed, May 25, 2011 at 16:08, C. Michael Pilato <cmpil...@collab.net> wrote:
> On 05/25/2011 04:05 PM, C. Michael Pilato wrote:
>> On 05/25/2011 03:49 PM, Greg Stein wrote:
>>> On Wed, May 25, 2011 at 15:33, <cmpil...@apache.org> wrote:
>>>> ...
>>>> +  /* A mapping of svn_revnum_t * dump stream revisions to their
>>>> +     corresponding svn_revnum_t * target repository revisions. */
>>>> +  apr_hash_t *rev_map;
>>>
>>> How big can this grow? ie. what happens when there are several million
>>> revisions.
>>
>> It gets big. (This logic and approach are copied from 'svnadmin load',
>> which doesn't excuse it, but might explain it.)
>
> Actually, I don't really know for sure how big it gets. It's a mapping
> of sizeof(svn_revnum_t) to sizeof(svn_revnum_t), plus all the hash
> internals. Anybody have any guesses?
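For reference, the pattern in question stores each mapping roughly like
this (a from-memory sketch; the helper names are illustrative, not the
actual svnrdump code):

    #include <apr_pools.h>
    #include <apr_hash.h>
    #include "svn_types.h"   /* svn_revnum_t, SVN_INVALID_REVNUM */

    /* Record that dump-stream revision DUMP_REV landed as TARGET_REV.
       Key and value must outlive this call, so both are pool-allocated;
       those are the "two revnums" on top of each hash entry. */
    static void
    set_revision_mapping(apr_hash_t *rev_map,
                         svn_revnum_t dump_rev,
                         svn_revnum_t target_rev,
                         apr_pool_t *pool)
    {
      svn_revnum_t *key = apr_palloc(pool, sizeof(*key));
      svn_revnum_t *val = apr_palloc(pool, sizeof(*val));
      *key = dump_rev;
      *val = target_rev;
      apr_hash_set(rev_map, key, sizeof(*key), val);
    }

    /* And the lookup side: */
    static svn_revnum_t
    get_revision_mapping(apr_hash_t *rev_map,
                         svn_revnum_t dump_rev)
    {
      svn_revnum_t *val = apr_hash_get(rev_map, &dump_rev,
                                       sizeof(dump_rev));
      return val ? *val : SVN_INVALID_REVNUM;
    }

Each call pool-allocates two 4-byte revnums on top of the hash entry
itself, which is where the per-entry arithmetic below starts.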
struct apr_hash_entry_t is generally 20 bytes. Add in the two revnums
(4 bytes each), and you get 28 bytes for each *used* entry.

Now we also have to account for unused entries. APR has a pretty poor
hash table implementation: it allocates *upwards* to the nearest power
of two, so the internal size will grow like 1048576, 2097152, 4194304,
and so on.

One saving grace is that APR only grows the table when the entry count
matches the internal table size. It chains entries in a linked list at
each bucket, so the actual load on the buckets is not possible to
compute. The hand-wave means that you can put in 4 million mappings
before it grows the table up to 8 million buckets.

So... 4 million buckets (pointers) at 4 bytes each is 16 megabytes.
Each mapping adds another 28 bytes, so 4 million mappings is about 128
megabytes. But also recognize that *reaching* that point will use and
toss approximately the same amount of memory. So: about 250 meg total.

On a 64-bit architecture, all these values are likely to be doubled.

Not a machine crusher, in retrospect. But not exactly a winner either.

Cheers,
-g
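P.S. For concreteness, here is the back-of-envelope arithmetic as a tiny
standalone C program (a sketch; the 20-byte entry size and 4-byte
pointers assume a 32-bit build, as above):

    #include <stdio.h>

    int main(void)
    {
      /* 32-bit sizes, per the estimate above. */
      const unsigned long long entry_sz = 20;   /* ~apr_hash_entry_t */
      const unsigned long long revnum_sz = 4;   /* svn_revnum_t */
      const unsigned long long ptr_sz = 4;      /* one bucket pointer */
      const unsigned long long mappings = 4000000ULL;
      unsigned long long buckets = 1;
      unsigned long long bucket_bytes, mapping_bytes;

      /* Hand-wave APR's growth as doubling to the next power of two. */
      while (buckets < mappings)
        buckets *= 2;                           /* 4194304 for 4M */

      bucket_bytes = buckets * ptr_sz;          /* ~16 MB */
      mapping_bytes = mappings * (entry_sz + 2 * revnum_sz); /* ~112 MB */

      printf("buckets:  %llu MB\n", bucket_bytes / 1000000);
      printf("mappings: %llu MB\n", mapping_bytes / 1000000);
      printf("total:   ~%llu MB (roughly double that, counting the "
             "memory used and tossed while growing)\n",
             (bucket_bytes + mapping_bytes) / 1000000);
      return 0;
    }

It prints 16 MB of buckets plus 112 MB of mappings, about 128 MB total,
matching the figures above.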