On 3/24/2017 12:35 PM, Jonathan Nieder wrote:
g...@jeffhostetler.com wrote:
From: Jeff Hostetler <jeffh...@microsoft.com>
Teach do_read_index() in read-cache.c to call verify_hdr()
Nice. Do you have example commands I can run to reproduce
that benchmark? (Even better if you can phrase that as a
patch against t/perf/.)
I debated doing a t/perf and/or t/helper to demonstrate this
like I did for the lazy-init-name-hash changes the other day,
but decided against it. I'll put together something and
include it in the next version.
--- a/read-cache.c
+++ b/read-cache.c
@@ -1564,6 +1564,83 @@ static void post_read_index_from(struct index_state
+struct verify_hdr_thread_data {
+ struct cache_header *hdr;
+ size_t size;
'size' appears to always be cast to an unsigned long when it's
used. Why not use unsigned long consistently?
+ * Non-threaded version does all the work immediately.
+ * Returns < 0 on error or bad signature.
+ */
+static int verify_hdr_start(struct verify_hdr_thread_data *d)
+ return verify_hdr(d->hdr, (unsigned long)d->size);
+static int verify_hdr_finish(struct verify_hdr_thread_data *d)
+ return 0;
+#include <pthread.h>
Please put this at the top of the file with other #includes. One
simple way to do that is to #include "thread-utils.h" at the top of
the file unconditionally.
+ * Require index file to be larger than this threshold before
+ * we bother using a background thread to verify the SHA.
+ */
+#define VERIFY_HDR_THRESHOLD (1024)
nits: (1) no need for parens for a numerical macro like this
(2) comment can be made briefer and more explicit:
* Index size threshold in bytes before it's worth bothering to
* use a background thread to verify the index file.
How was this value chosen?
This was somewhat at random. I'll update with the t/perf stuff.
+struct verify_hdr_thread_data {
+ pthread_t thread_id;
+ struct cache_header *hdr;
+ size_t size;
+ int result;
All structs are data. Other parts of git seem to name this kind of
callback cookie *_cb_data, so perhaps verify_hdr_cb_data?
On the other hand this seems to also be used by the caller as a handle
to the async verify_hdr process. Maybe verify_hdr_state?
This seems to be doing something similar to the existing 'struct
async' interface. Could it use that instead, or does it incur too
much overhead? An advantage would be avoiding having to handle the
NO_PTHREADS ifdef-ery.
+ * Thread proc to run verify_hdr() computation in a background thread.
+ */
+static void *verify_hdr_thread_proc(void *_data)
Please don't name identifiers with a leading underscore --- those are
reserved names.
+ struct verify_hdr_thread_data *d = _data;
+ d->result = verify_hdr(d->hdr, (unsigned long)d->size);
+ return NULL;
I was just modeling the code on what I saw in preload-index.c.
There the #ifdef side defines the trivial functions. Then the #else
and the #include, the struct thread_data, and then the
preload_thread(void *_data) function declaration. But blame reports
that code dating back to 2008. Is there an example of a newer
preferred style somewhere?
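For illustration only, here is a minimal standalone sketch of a thread procedure whose parameter has no leading underscore. It is not taken from git: the names checksum_cb_data, checksum_thread_proc and run_checksum_in_thread are hypothetical, and a simple byte sum stands in for verify_hdr().

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical callback cookie; stands in for verify_hdr_thread_data. */
struct checksum_cb_data {
	const unsigned char *buf;
	size_t size;
	unsigned long result;
};

/* The parameter is plain "data" rather than "_data". */
static void *checksum_thread_proc(void *data)
{
	struct checksum_cb_data *d = data;
	unsigned long sum = 0;
	size_t i;

	/* Placeholder work: sum the bytes instead of hashing them. */
	for (i = 0; i < d->size; i++)
		sum += d->buf[i];
	d->result = sum;
	return NULL;
}

/* Run the computation in a thread and wait for it; returns 0 on success. */
static int run_checksum_in_thread(struct checksum_cb_data *d)
{
	pthread_t tid;

	if (pthread_create(&tid, NULL, checksum_thread_proc, d))
		return -1;
	return pthread_join(tid, NULL) ? -1 : 0;
}
```

The void pointer converts implicitly to the cookie type in C, so no cast and no second name for the same pointer are needed.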
+ * Threaded version starts background thread and returns zero
+ * to indicate that we don't know the hash is bad yet. If the
+ * index is too small, we just do the work immediately.
+ */
+static int verify_hdr_start(struct verify_hdr_thread_data *d)
This comment restates what the code says. Is there background or
something about the intent behind the code that could be said instead
to help the reader? Otherwise I'd suggest removing the comment.
What happens if there is an error before the code reaches the end of
the function? I think there needs to be a verify_hdr_finish call in
the 'unmap:' cleanup section.
But the "unmap" section calls die(). Do we need to join first?
(It's OK if we do, just asking.)
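One way to picture the concern is a standalone sketch of a caller that returns an error instead of calling die(), so the buffer could be released while the thread is still reading it. Everything here is an assumption made for illustration: verify_state, check_buffer, verify_start, verify_finish, load_and_verify and BG_THRESHOLD are hypothetical names, check_buffer() is a stand-in for verify_hdr(), and falling back to inline verification when pthread_create() fails is a choice of this sketch, not something the patch is known to do.

```c
#include <pthread.h>
#include <stddef.h>

#define BG_THRESHOLD 1024 /* bytes; arbitrary value for the sketch */

struct verify_state {
	pthread_t thread_id;
	int thread_started;
	const unsigned char *buf;
	size_t size;
	int result;
};

/* Stand-in for verify_hdr(): fails if the buffer contains a zero byte. */
static int check_buffer(const unsigned char *buf, size_t size)
{
	size_t i;

	for (i = 0; i < size; i++)
		if (!buf[i])
			return -1;
	return 0;
}

static void *verify_thread_proc(void *data)
{
	struct verify_state *s = data;

	s->result = check_buffer(s->buf, s->size);
	return NULL;
}

/* Small inputs are checked inline; larger ones in a background thread. */
static int verify_start(struct verify_state *s)
{
	s->thread_started = 0;
	if (s->size <= BG_THRESHOLD ||
	    pthread_create(&s->thread_id, NULL, verify_thread_proc, s)) {
		s->result = check_buffer(s->buf, s->size);
		return s->result;
	}
	s->thread_started = 1;
	return 0; /* result not known yet */
}

/* Safe to call on every exit path, including after an earlier error. */
static int verify_finish(struct verify_state *s)
{
	if (s->thread_started) {
		if (pthread_join(s->thread_id, NULL))
			return -1;
		s->thread_started = 0;
	}
	return s->result;
}

/* parse_fails simulates an error found while the thread may still run. */
static int load_and_verify(const unsigned char *buf, size_t size,
			   int parse_fails)
{
	struct verify_state s = { .buf = buf, .size = size };
	int ret = -1;

	if (verify_start(&s) < 0)
		goto cleanup;
	if (parse_fails)
		goto cleanup;
	ret = 0;
cleanup:
	/* Join before the caller is allowed to release buf. */
	if (verify_finish(&s) < 0)
		ret = -1;
	return ret;
}
```

If the error path ends the whole process, as die() does, the join matters less because no one unmaps the buffer underneath the thread; the sketch only covers the case where control returns to a caller.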
The rest looks reasonable.
Thanks and hope that helps,