On Tue, Apr 29, 2014 at 4:29 PM, Ivan Zhakov <i...@visualsvn.com> wrote:
> On 29 April 2014 17:54, Stefan Fuhrmann <stefan.fuhrm...@wandisco.com> > wrote: > > On Mon, Apr 28, 2014 at 8:11 AM, Ivan Zhakov <i...@visualsvn.com> wrote: > >> > >> eOn 27 April 2014 19:27, <stef...@apache.org> wrote: > >> > Author: stefan2 > >> > Date: Sun Apr 27 15:27:46 2014 > >> > New Revision: 1590405 > >> > > >> > URL: http://svn.apache.org/r1590405 > >> > Log: > >> > More 'svn log -g' memory usage reduction. We use a hash to keep track > >> > of all revisions reported so far, i.e. easily a million. > >> > > > * Some system provided APR (1.5+ in particular) uses mmap > > to allocate memory. I.e. for every block, e.g. 8k, there is a > > separate mmap call. The Linux default is 65530 (sic!) mmap > > regions per process. Slowly allocating pools can trigger OOM > > errors after only 512MB actual memory usage (sum across > > all threads). I already prepared a patch for that. > > > Ouch, I didn't know that. I was thinking that MMAP APR pool allocator > is experimental and is not enabled by default. > It is not enabled by default, I guess but the package responsible decided to enable it anyway. > > >> > We introduce a simple packed bit array data structure to replace > >> > the hash. For repos < 100M revs, the initialization overhead is less > >> > than 1ms and will amortize as soon as more than 1% of all revs are > >> > reported. > >> > > >> > >> It may be worth implement the same trick like we done with > >> membuffer_cache: use array of bit arrays for every 100k of revisions > >> for example and initialize them lazy. I mean: > >> [0...99999] - bit array 0 > >> [100000....199999] -- bit array 1 > >> ... > >> > >> It should be easy to implement. > > > > > > I gave it a try and it turned out not too horribly complex. > > See r1590982. > Great! > > But it may be worth to keep original svn_bit_array and add new > svn_sparse_bit_array() with array of svn_bit_array() objects So things > will be separated in two micro layers. > I think the new implementation is flexible enough for a wide range of future usages. I'd prefer having a single implementation for now as it is simply less code that could break. -- Stefan^2.