On 30.09.2011 18:19, Stefan Sperling wrote:
On Thu, Sep 22, 2011 at 08:43:14PM +0200, Stefan Fuhrmann wrote:
This looks very interesting.
What about FSFS-specific requirements?
See assumptions above, this may require a different
data structure. But I think that noderev dependencies
will turn out to be redundant if you have a log cache
and access to the skip-delta forwards dependencies.
It sounds like you avoid those by storing data in terms of the repos
layer (path@revision) instead of the FS layer (node-revision-id)?
Yes.
In this case, separate implementations for FSFS and BDB aren't needed.
This could be an advantage (e.g. third-party FS implementations
wouldn't need to change to support this).
It also eliminates one of the performance weaknesses
of SVN today: a log on some old or seldom-changed
path can take a very long time.
I'll think about this some more, thanks.
Welcome ;)
Reviving this thread.
Your concerns about a node-rev-based approach seem to revolve largely
around performance, not correctness.
Effectiveness, to be precise.
I.e. you agree that a
node-rev-based solution as currently being worked on within the
fs-successor-ids branch will work correctly, but won't perform
as well as your proposal for certain queries, right?
Since it does not add any information, the node-rev-based
approach will not *cause* incorrect behavior. In that sense,
I agree.
But I still fail to see how it will be effective except for a
very, very specific use-case. I probably just haven't understood
your use-case. Could you give a short description of the problem
that you are trying to solve and how the node-rev cache will help?
Now, I don't feel comfortable trying to implement your design.
The reason for this is that you could do a much better job at this yourself.
That's fine with me. I won't have the time to do that this year,
though.
However, I do feel very comfortable continuing the work we've started on
the fs-successor-ids branch.
Having that implementation available will certainly do no harm.
I also think that the two approaches can complement each other.
We are not in an either/or situation. We will get correct answers either
way and the only real difference is performance.
Note also that with our plan for putting successor-IDs into the filesystem
layer, we will also solve the problem of creating the successor data
during an upgrade from SVN 1.7 to 1.8.
Both approaches need to solve this somehow, and we'd have that part
sorted out for you.
I don't see a problem here. If necessary, we could extend the FS
layer API with version check methods etc.
So, what about this: We implement successor-IDs in the filesystem
as planned on the fs-successor-ids branch.
Once we have that, and when you have time, you adapt your log cache
proposal to create a runtime cache that sits on top of the new
successor-ID filesystem data, and caches results for certain log queries
in memory for quick access. It should even be possible to pre-populate
this cache when the server starts up.
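To make the proposed layering concrete, here is a minimal sketch of such a runtime cache, assuming a hypothetical in-memory map `successors` from node-revision IDs to the IDs of their direct successors (standing in for whatever the fs-successor-ids branch actually stores). The class name, method names and ID strings are all illustrative, not Subversion APIs. The cache memoizes the transitive set of successors per node-rev, which is the kind of repeated query it would speed up; it does not address how copy-target path names would be recovered from node-rev info.

```python
class SuccessorCache:
    """Hypothetical runtime cache layered over successor-ID data.

    `successors` is an assumed stand-in for the FS successor store:
    node-rev ID -> list of direct successor node-rev IDs.
    """

    def __init__(self, successors):
        self._successors = successors
        self._closure = {}  # memoized: node-rev ID -> frozenset of all successors

    def all_successors(self, node_id):
        """Return every node-rev reachable from node_id via successor links."""
        cached = self._closure.get(node_id)
        if cached is not None:
            return cached
        seen = set()
        stack = list(self._successors.get(node_id, ()))
        while stack:  # iterative depth-first walk of the successor graph
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self._successors.get(current, ()))
        result = frozenset(seen)
        self._closure[node_id] = result
        return result

    def prepopulate(self):
        """Warm the cache for every known node-rev, e.g. at server start-up."""
        for node_id in list(self._successors):
            self.all_successors(node_id)


# Illustrative data only: "a.1" was modified to "a.2" and copied to "b.2".
cache = SuccessorCache({"a.1": ["a.2", "b.2"], "a.2": ["a.3"], "b.2": []})
cache.prepopulate()
print(sorted(cache.all_successors("a.1")))  # ['a.2', 'a.3', 'b.2']
```

Memoizing the full closure per node keeps repeated lookups cheap, at the cost of memory that grows with history depth; a real cache would need an eviction policy.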
How would I reconstruct copy target path names from node-rev info?
This way, we have some amount of redundancy in the system, but Daniel
and I can continue trying to deliver a working solution for 1.8 based
on what we've started. And we can worry about performance issues later,
because you already have a plan for that.
Frankly, I think the time people spend today resolving trivial
tree conflicts is a huge waste. No matter how bad the
performance of an automated solution to this problem will be, it will
be faster than a human being. Our users will get a huge benefit either
way because we will be reducing their load of manual labour. Performance
of the solution doesn't need to be perfect by the time we release 1.8
and nothing stands in the way of improving performance later.
Do you agree?
This depends entirely on your use-case (see above). My experience
with navigating these change graphs indicates that better-than-O(n^2)
performance requires completely different algorithms, data
structures and APIs than a merely correct path-by-path approach.
BTW, n > 10,000,000 for certain repositories.
If not, I hope that you'll find time to help us implement your pure
caching solution for 1.8. I would really like to see some solution
to this problem in the 1.8 release.
I'm currently working on other SVN-related projects.
From April on, I'm available for hire.
-- Stefan^2.