On Wed, Oct 17, 2012 at 2:01 AM, Daniel Shahaf <d...@daniel.shahaf.name>wrote:
> stef...@apache.org wrote on Tue, Oct 02, 2012 at 22:15:28 -0000: > > Author: stefan2 > > Date: Tue Oct 2 22:15:28 2012 > > New Revision: 1393211 > > > > URL: http://svn.apache.org/viewvc?rev=1393211&view=rev > > Log: > > Empty representations are relatively frequent (1..2%) in FSFS. > > However, we can't easily distinghish them from non-empty plain > > representations because for the latter the expanded size value > > will be given as 0. > > > > This patch explicitly detects the case that a delta rep is > > actually an empty rep and should be cached as such. > > > > * subversion/libsvn_fs_fs/fs_fs.c > > (build_rep_list): determine and correct the expanded size > > (rep_read_get_baton): let the above determine the fulltext length > > > > Modified: > > subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c > > > > Modified: subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c > > URL: > http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?rev=1393211&r1=1393210&r2=1393211&view=diff > > > ============================================================================== > > --- subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c (original) > > +++ subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c Tue Oct 2 22:15:28 > 2012 > > @@ -4553,11 +4553,15 @@ set_cached_combined_window(svn_stringbuf > > ID, and representation REP. > > Also, set *WINDOW_P to the base window content for *LIST, if it > > could be found in cache. Otherwise, *LIST will contain the base > > - representation for the whole delta chain. */ > > + representation for the whole delta chain. > > + Finally, return the expanded size of the representation in > > + *EXPANDED_SIZE. It will take care of cases where only the on-disk > > + size is known. */ > > static svn_error_t * > > build_rep_list(apr_array_header_t **list, > > svn_stringbuf_t **window_p, > > struct rep_state **src_state, > > + svn_filesize_t *expanded_size, > > svn_fs_t *fs, > > representation_t *first_rep, > > apr_pool_t *pool) > > @@ -4572,10 +4576,24 @@ build_rep_list(apr_array_header_t **list > > *list = apr_array_make(pool, 1, sizeof(struct rep_state *)); > > rep = *first_rep; > > > > + /* The value as stored in the data struct. > > + 0 is either for unknown length or actually zero length. */ > > + *expanded_size = first_rep->expanded_size; > > while (1) > > { > > SVN_ERR(create_rep_state(&rs, &rep_args, &last_file, > > &last_revision, &rep, fs, pool)); > > + > > + /* Unknown size or empty representation? > > + That implies the this being the first iteration. > > + Usually size equals on-disk size, except for empty, > > + compressed representations (delta, size = 4). > > + Please note that for all non-empty deltas have > > + a 4-byte header _plus_ some data. */ > > + if (*expanded_size == 0) > > + if (! rep_args->is_delta || first_rep->size != 4) > > + *expanded_size = first_rep->size; > > Is it correct to perform this assignment for delta reps with size!=4 ? > first_rep is a representation instead of a just a single delta window, i.e. it represents the full content of a node. If the <size> is 0, it is assumed to be equal to <length> and that seems to have been the case ever since but the structure description does not mention it. r1400426 add that info. So, it is *always* safe to set a 0 <size> to <length> *unless* the expanded content size is actually 0. For plain data, <length> would be 0, too making the assignment correct. A compressed empty representation is exactly 4 bytes long ("SVN\0x1"). Any other representation is longer. Since <length> can never be negative, the only way <size> may get the wrong value is if the representation content is empty but uses a longer physical representation on disk and we end up with the expectation of a longer data stream than what we actually get. That has been the norm in the past but the stream would simply end after 0 bytes and the content would never be cached in the fulltext cache. The patch should now correct the <size> info in (almost?) all cases to make fulltext cache effective for them as well. > It seems odd that this check is in the loop but uses first_rep. Should > this check move out of the loop, or use rep.size instead of > first_rep->size? > We need to read rep_args, i.e. the rep header, to detect whether the representation is plain or deltified. r1400436 moved that out of the loop - at the expense of some extra code. -- Stefan^2. -- * Join us this October at Subversion Live 2012<http://www.wandisco.com/svn-live-2012> for two days of best practice SVN training, networking, live demos, committer meet and greet, and more! Space is limited, so get signed up today<http://www.wandisco.com/svn-live-2012> ! *