On Wed, Mar 21, 2018 at 09:24:06AM -0400, Derrick Stolee wrote:
> On 3/20/2018 6:25 PM, Jonathan Tan wrote:
> > On Tue, 20 Mar 2018 16:03:25 -0400
> > Derrick Stolee <dsto...@microsoft.com> wrote:
> > > One caveat about the patch: there is a place where I cast a sha1 hash
> > > into a struct object_id pointer. This is because the abbreviation code
> > > still uses 'const unsigned char *' instead of structs. I wanted to avoid
> > > a hashcpy() in these calls, but perhaps that is not too heavy a cost.
> > I recall a discussion that there were alignment issues with doing this,
> > but I might be remembering wrongly - in my limited knowledge of C
> > alignment, both "unsigned char *" and "struct object_id *" have the same
> > constraints, but I'm not sure.
> 
> Adding Brian M. Carlson in the CC line for advice on how to do this
> translation between old sha1's and new object_ids. If it isn't safe, then we
> could do a hashcpy() until the translation makes it unnecessary.
> 
> I should have compared the two methods before sending the patch, but running
> the "git log --oneline --parents" test with a hashcpy() versus a cast has no
> measurable difference in performance for Linux. Probably best to do the
> safest thing here if there is no cost to perf.

There is no alignment difference.  The alignment of struct object_id is
going to be the same as the underlying hash.  My concern in the past has
been strict aliasing violations, which compilers can sometimes exploit
to generate incorrect code.

However, the bigger concern tends to be that when we switch to a new
hash function, we may extend struct object_id with a hash type byte.
The current hash function transition plan certainly makes this a likely
scenario.  In such a case, a cast would end up reading past the end of
the underlying array should we read the type byte.
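
To make the overread concern concrete, here is a minimal sketch.  The
struct object_id_v2 layout, its "type" member, and both helper names are
assumptions made up for illustration, not git's actual definitions, and
plain memcpy() stands in for hashcpy().  It shows why a pointer to a bare
20-byte hash cannot safely be treated as a pointer to the larger struct,
and what the copying alternative looks like.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RAWSZ 20 /* raw size of a SHA-1 hash in bytes */

/* Hypothetical future layout (an assumption, not git's real struct). */
struct object_id_v2 {
    unsigned char hash[RAWSZ];
    unsigned char type; /* assumed hash-algorithm tag */
};

/*
 * Safe conversion: copy the raw bytes into a real struct and set the
 * tag explicitly, instead of casting "raw" to a struct pointer.
 */
static void oid_from_raw(struct object_id_v2 *dst,
                         const unsigned char *raw,
                         unsigned char type)
{
    assert(dst && raw);
    memcpy(dst->hash, raw, RAWSZ);
    dst->type = type;
}

/*
 * How many bytes of the struct lie beyond a bare RAWSZ-byte buffer.
 * Reading any of them through a cast pointer would be out of bounds.
 */
static size_t bytes_past_raw_buffer(void)
{
    return sizeof(struct object_id_v2) - RAWSZ;
}
```

With this layout, the "type" member sits at or beyond offset RAWSZ, so
reading it through a cast from a bare RAWSZ-byte buffer would touch
memory outside that buffer.  Copying also sidesteps the strict aliasing
question, at the cost of one small memcpy().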

If this isn't a performance critical path, I'd recommend simply making a
copy.  I can clean up the definition of struct min_abbrev_data in a
future series, or I can do something like the following on top of the
last series I sent, which is in next (only compile tested).  If you're
willing to wait until it hits master, you can just drop the patch in.

-- >8 --
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: "brian m. carlson" <sand...@crustytoothpaste.net>
Date: Wed, 21 Mar 2018 22:38:09 +0000
Subject: [PATCH] sha1_name: convert struct min_abbrev_data to object_id

This structure is only written to in one place, where we already have a
struct object_id.  Convert the struct to use a struct object_id instead.

Signed-off-by: brian m. carlson <sand...@crustytoothpaste.net>
---
 sha1_name.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 39e911c8ba..16e0003396 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -480,7 +480,7 @@ struct min_abbrev_data {
        unsigned int init_len;
        unsigned int cur_len;
        char *hex;
-       const unsigned char *hash;
+       const struct object_id *oid;
 };
 
 static inline char get_hex_char_from_oid(const struct object_id *oid,
@@ -526,7 +526,7 @@ static void find_abbrev_len_for_pack(struct packed_git *p,
                int cmp;
 
                current = nth_packed_object_sha1(p, mid);
-               cmp = hashcmp(mad->hash, current);
+               cmp = hashcmp(mad->oid->hash, current);
                if (!cmp) {
                        match = 1;
                        first = mid;
@@ -603,7 +603,7 @@ int find_unique_abbrev_r(char *hex, const struct object_id *oid, int len)
        mad.init_len = len;
        mad.cur_len = len;
        mad.hex = hex;
-       mad.hash = oid->hash;
+       mad.oid = oid;
 
        find_abbrev_len_packed(&mad);
 
-- >8 --
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204
