On Thu, Sep 22, 2011 at 08:43:14PM +0200, Stefan Fuhrmann wrote:
> >>>This looks very interesting.
> >>>
> >>>What about FSFS-specific requirements?
> >>See assumptions above, this may require a different
> >>data structure. But I think that noderev dependencies
> >>will turn out to be redundant if you have a log cache
> >>and access to the skip-delta forwards dependencies.
> >>>It sounds like you avoid those by storing data in semantics of the repos
> >>>layer (path@revision) instead of the FS layer (node-revision-id)?
> >>Yes.
> >>>In this case separate implementations for FSFS and BDB aren't needed.
> >>>This could be an advantage (e.g. third party FS implementations
> >>>wouldn't need to change to support this).
> >>It also eliminates one of the performance weaknesses
> >>of SVN today: A log on some old / seldom changed
> >>path can take a very long time.
> >>>I'll think about this some more, thanks.
> >>>
> >>Welcome ;)
Reviving this thread.

Your concerns about a node-rev-based approach seem to revolve largely around performance, not correctness. I.e. you agree that a node-rev-based solution, as currently being worked on within the fs-successor-ids branch, will work correctly but won't perform as well as your proposal for certain queries, right?

Now, I don't feel comfortable trying to implement your design. The reason is that you could do a much better job of it yourself. However, I do feel very comfortable continuing the work we've started on the fs-successor-ids branch.

I also think that the two approaches can complement each other. We are not in an either/or situation. We will get correct answers either way, and the only real difference is performance.

Note also that with our plan for putting successor-IDs into the filesystem layer, we will also solve the problem of creating the successor data during an upgrade from SVN 1.7 to 1.8. Both approaches need to solve this somehow, and we'd have that part sorted out for you.

So, what about this: We implement successor-IDs in the filesystem as planned on the fs-successor-ids branch. Once we have that, and when you have time, you adapt your log cache proposal to create a runtime cache that sits on top of the new successor-ID filesystem data and caches the results of certain log queries in memory for quick access. It should even be possible to pre-populate this cache when the server starts up.

This way, we have some redundancy in the system, but Daniel and I can continue trying to deliver a working solution for 1.8 based on what we've started. And we can worry about performance issues later, because you already have a plan for that.

Frankly, I think people are wasting a huge amount of time today resolving trivial tree conflicts. No matter how poorly an automated solution to this problem performs, it will still be faster than a human being.
Our users will get a huge benefit either way, because we will be reducing their load of manual labour. Performance of the solution doesn't need to be perfect by the time we release 1.8, and nothing stands in the way of improving performance later.

Do you agree? If not, I hope that you'll find time to help us implement your pure caching solution for 1.8. I would really like to see some solution to this problem in the 1.8 release.