On 16 May 2015 at 07:48, Stefan Fuhrmann <stefan.fuhrm...@wandisco.com> wrote: > On Fri, May 15, 2015 at 7:25 PM, Philip Martin <philip.mar...@wandisco.com> > wrote: >> >> Philip Martin <philip.mar...@wandisco.com> writes: >> >> > Philip Martin <philip.mar...@wandisco.com> writes: >> > >> >> Prompted by the warnings I think there are some issues to fix. For >> >> APR_HASH_KEY_STRING keys there is no protection against abnormally long >> >> keys. combined_long_key() will allocate strlen() memory even if it is >> >> many GB. The item will not get cached if key+data length is more than >> >> 4GB but the memory for the key, which could be more than 4GB, will be >> >> permanently allocated in the cache. There is also a problem with >> >> overflow in membuffer_cache_set_internal() when calculating key+data >> >> length, although in practice a key large enough to trigger this will >> >> probably fail memory allocation first. >> > >> > Another issue: >> > >> > static svn_boolean_t >> > entry_keys_match(const entry_key_t *lhs, >> > const entry_key_t *rhs) >> > { >> > return (lhs->fingerprint[0] == rhs->fingerprint[0]) >> > && (lhs->fingerprint[1] == rhs->fingerprint[1]) >> > && (lhs->key_len == rhs->key_len); >> > } >> > >> > I think the key_len comparison is wrong and should be removed. If two >> > keys have fingerprints that collide it does not mean the key lengths are >> > the same. When attempting to use two keys with colliding fingerprints >> > the behaviour when the key lengths vary should be same as when the key >> > lengths are the same and the key data varies. > > > Not necessary. It is true that we could decide to *only* use > the fingerprint as discriminator. But the implementation here > includes the key length; it reduces the number of conflicts > that result in entries that evict each other. > > Most fixed-length keys are 16 bytes long and for them, the > fingerprint is almost identical to the key and conflicts are > extremely unlikely. So, varible-length keys are the one that > may have fingerprint conflicts now due to the weakness of > FNV1. For them, however, the key_len (often derived from > paths) has a good chance of being different. > > Note that if the key_len differs, it *cannot* be the same key. > Hence, including the key_len does not create false negatives. > +1.
>> Another issue: find_entry() now calls drop_entry() in more cases and can >> now call it when find_empty==FALSE during read operations. On Unix when >> using the read-write lock this means the cache gets modified while only >> holding a read lock, not a write lock, and that can corrupt the cache. >> As far I understand find_entry(find_empty == TRUE) requires write-lock for cache, while only read-lock is required for find_empty == FALSE. Is correct? It would be nice to document this in docstring. >> If we use read-write locks then when read detects a fingerprint >> collision it cannot modify the cache to clear the collision, it will >> have to return NULL and leave the entry in the cache. > > > You are absolutely right. Fixed in r1679681. > PS: Just to make things clear: branch has been changed so my +1 doesn't apply to it. -- Ivan Zhakov