On 10/19/2010 01:31 AM, Greg Stein wrote:
On Mon, Oct 18, 2010 at 23:51, Blair Zajac<bl...@orcaware.com>  wrote:
On 10/04/2010 06:45 AM, C. Michael Pilato wrote:

There, you can learn more about what the Meetups tend to look like, what
other Meetups are planned for this year's conference, and so on.  You'll
also find a link to the Subversion Meetup wiki page:

        http://subversion.open.collab.net/wiki/ApacheConNA2010Meetup

That's the first mention I've seen of FSv2.  What ideas are going into it?
What problems is it primarily meant to solve?

FSv2 is a hand-wave.

Personally, I see it as a broad swath of API changes to align our
needs with the underlying storage. Trowbridge noted that our current
API makes it *really* difficult to implement an effective backend. I'd
also like to see a backend that allows for parallel PUTs during the
commit process. Hyrum sees FSv2 as some kind of super-key-value
storage with layers on top, allowing for various types of high-scaling
mechanisms.

How would that API look?  The API as it stands is pretty clear.

Background for my wish list.

We use Subversion as the backend for a versioned asset management system. We get up to 5 commits per second from render processes generating new assets and from artists saving assets. We also have interactive GUI users doing asset lookups all the time.

While the immutability of svn has allowed us to cache revision data, and our servers can push 4,000 lookups per second to render farm clients that look up a particular revision, interactive users doing HEAD lookups suffer because of the high commit rate. We cache data by node-id in memcached, but because the root node always gets a new node-id on every commit, and because the first thing interactive users do is list the folders in the root node, we always get cache misses. I don't really want svn to change the way new node-ids are assigned to parent nodes all the way up to the root.

1) Scalability to 30,000 child nodes in a single directory.

Currently, a single change to a node in a directory with 20,000 child nodes causes the new revision file in fsfs to use around 960 kB. With a commit rate of 1.5 commits per second in a repository, the disk usage is very high. We introduced a hidden layer of "hash:DD" directories, 30 in our case, that our internal Subversion server hashes path elements into. This makes the revision files much smaller, but now when getting the list of nodes in a directory, we have to index up to 30 child directories, increasing lookup times.

If we could remove the need for hash directories, the lookup on the root node would be much faster and interactive users would be happier.

2) I would like to ensure that the new backend supports multiple modifications to the same node within one transaction. I don't know whether this was designed into the current backend, but since I expose svn_fs.h over RPC, clients can make one or many modifications to the tree, so the new backend should support this.

And while we're discussing wants.

3) Pools are painful to use. We have repository, revision and transaction C++ objects stored in an LRU cache; they cache revision and transaction roots for improved performance. Using the wrong pool in an RPC method can cause memory leaks (we just found one on Monday that caused a backend server to run out of memory), and constructing and destroying pools in the wrong order can crash the process. This is hard to get right, so a different memory-management model would be very useful. I haven't had the cycles to look at Hyrum's new C++ objects to see how they would help.

Blair
