On Wed, Apr 28, 2010 at 10:05:58AM -0500, Hyrum K. Wright wrote: > I'm looking at adding support for the patch client API to JavaHL, but in > doing so, I have a few questions about the API itself. I'm pretty late the > game, so these questions might already be answered. If so, please point me > to the relevant thread(s). > > For reference, here's the public API: > > svn_error_t * > svn_client_patch(const char *abs_patch_path, > const char *local_abspath, > svn_boolean_t dry_run, > int strip_count, > svn_boolean_t reverse, > const apr_array_header_t *include_patterns, > const apr_array_header_t *exclude_patterns, > apr_hash_t **patched_tempfiles, > apr_hash_t **reject_tempfiles, > svn_boolean_t ignore_whitespaces, > svn_client_ctx_t *ctx, > apr_pool_t *result_pool, > apr_pool_t *scratch_pool); > > Firstly, the first parameter should probably be 'patch_abspath', which > follows convention with other variable names.
Sure. > That being said, is there a > reason why the patch needs to be a file, instead of a stream? Using a > stream seems much more flexible for the long-term evolution of the API. Be prepared. Such a change will probably go deep into the diff parser. The current svn patch and diff parsing code is optimized around never allocating more memory than needed to process a single line. Otherwise svn patch would run out of memory quickly for large files. We currently support lines as long as memory can hold, rather than limiting the size of the entire file by memory. Furthermore the diff parsing code provides the patch application code with streams mapped to portions of the patch file via the svn_stream_from_aprfile_range_readonly() API. We could now do something similar via mark / seek support in streams. When I wrote the diff parser we did not yet have mark / seek support in streams yet. The mark / seek support is currently only available for streams backed by APR files or stringbufs. IOW, making the patch file a stream will not provide a lot of abstraction -- you'll either use a file anyway and wrap a stream around it, or load the entire patch into memory and hog tons of RAM for large patches (which is exactly what we want to avoid). While I would not object to making the patch file a stream if it doesn't hurt memory usage, I don't think it's worth re-implementing a lot of the svn patch and diff parsing code just to provide callers with the option of passing a stream. Why would they want to use a stream? I have no problem with requiring callers of this API to create a tempfile if they get the patch data from something other than a file. If callers pass a stream we have no idea how much memory we'll end up using. Granted, we could say that's the caller's problem, but even then I still don't see enough benefit to justify the effort involved in changing this so I won't do it myself. > Despite my best efforts, I still haven't been able to fully grok the > patched_tempfiles and reject_tempfiles parameters. However, if they are > truly output parameters, they should be at the front of the parameter list, > not hidden in the middle somewhere. They are output parameters, so yes, we might want to move them. They're primarily for use by TortoiseSVN and similar 3rd party clients, not really a primary feature of the API. Hence it didn't look right to me to put them at the front. But I don't mind moving them. > Also, I know that the API uses > notifications, but it might also be useful to return the _tempfile > parameters through a callback mechanism, rather than as (arbitrarily large) > hashes. That would work as well, yes. Might look nicer than having the output parameters at the front. > Anyway, those are my thoughts. I'd go ahead and make these changes myself, > but as I mentioned before, I'm not sure if there are Deep reasons for the > current API. Feel free to go ahead and make these changes. If you change the patch file to a stream, please try not to make 'svn patch' use more memory than it currently does. Stefan

