Hi Christoph,

Sorry for my delayed reply.

Christoph Buck <d...@icepic.de> writes:

> Hi Guix!
>
> So I looked into the Guile source code and, as expected, the `scm_hash`
> function (see hash.c in Guile) uses `unsigned long`, which is 8 bytes
> on x64 and 4 bytes on arm32/i686.  If `string-hash` is called with the
> size parameter `n`, the hash value is limited to that size by taking it
> modulo `n`, see scm_ihash in hash.c:440, namely
>
>> (unsigned long) scm_raw_ihash (obj, 10) % n
>
> (The `10` can be ignored as far as I can tell.)  Since the hash values
> are different on different platforms, the modulo is different as well.
>
> However, if one steps through the call stack of `string-hash`, you can
> see that the actual hash value is calculated by the
> `JENKINS_LOOKUP3_HASHWORD2` macro, which contains a rather interesting
> comment and a possible workaround for the ABI problem, namely
>
>   /* Scheme can access symbol-hash, which exposes this value.  For  \
>      cross-compilation reasons, we ensure that the high 32 bits of  \
>      the hash on a 64-bit system are equal to the hash on a 32-bit  \
>      system.  The low 32 bits just add more entropy.  */            \
>   if (sizeof (ret) == 8)                                            \
>     ret = (((unsigned long) c) << 32) | b;                          \
>   else                                                              \
>     ret = c;                                                        \
>
> in hash.c:82.
>
> Meaning, if executed on an x64 platform, the high 32 bits of the
> resulting 64-bit hash are equal to the hash value on a 32-bit platform.
> A simple test case in C++ looks like this:
>
> #include <libguile.h>
> #include <climits>
> #include <iostream>
>
> int main(int argc, char** argv)
> {
>   scm_init_guile();
>   auto strToHash = scm_from_locale_string ("((device) (mount-point))");
>   auto maxULong = scm_from_ulong(ULONG_MAX);
>   auto hashResult = scm_hash(strToHash, maxULong);
>   auto hashResultUL = scm_to_ulong(hashResult);
>   std::cout << "Max ULONG_MAX: " << ULONG_MAX << std::endl;
>   std::cout << "Original hashResult ulong: " << hashResultUL << std::endl;
>
>   if (sizeof(hashResultUL) == 8)
>     {
>       std::cout << "Corrected for 32bit: " << (hashResultUL >> 32)
>                 << std::endl;
>     }
> }
>
> which results on x64 in
>
>> Max ULONG_MAX: 18446744073709551615
>> Original hashResult ulong: 10454028974864831
>> Corrected for 32bit: 2434018
>
> and on arm32 in
>
>> Max ULONG_MAX: 4294967295
>> Original hashResult ulong: 2434018
>
> This suggests the following workaround: always limit the hash to
> 32 bits, even when running on a 64-bit platform (or, to be more
> specific, a platform where unsigned long is 8 bytes).  Do this by
> right-shifting the hash value by 32 bits instead of relying on the size
> parameter of the `string-hash` function.
>
> In code it could look something like this:
>
> (define (compute-abi-cookie field-specs)
>   ;; Compute an "ABI cookie" for the given FIELD-SPECS.  We use
>   ;; 'string-hash' because that's a better hash function than 'hash' on
>   ;; a list of symbols.
>   (let ((hash
>          (syntax-case field-specs ()
>            (((field get properties ...) ...)
>             (let ((hash-value
>                    (string-hash
>                     (object->string
>                      (syntax->datum #'((field properties ...) ...))))))
>               (if (= (native-word-size) 8)
>                   (ash hash-value -32)
>                   hash-value)))))
>         (fd (syntax-case field-specs ()
>               (((field get properties ...) ...)
>                (object->string
>                 (syntax->datum #'((field properties ...) ...)))))))
>     (format #t "Compute-abi-cookie: ~a~%" hash)
>     hash))
>
> where `native-word-size` is defined by
>
> (define (native-word-size)
>   ((@ (system foreign) sizeof) '*))

This is a thorough investigation, and the above fix/workaround that can
be applied to Guix is the cherry on the cake!  Thank you for producing
it.

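For the record, here's a stripped-down sketch of the same idea outside
of the record macrology.  Note that 'portable-string-hash' is a made-up
name for illustration; it exists neither in Guile nor in Guix, and the
expected result below assumes that 'string-hash' called without a bound
covers the full unsigned-long range, as your snippet does:

  (use-modules ((system foreign) #:select (sizeof)))

  (define (native-word-size)
    ;; Size in bytes of a pointer on the platform running this code.
    (sizeof '*))

  (define (portable-string-hash str)
    ;; Hypothetical helper: hash STR with 'string-hash' (no bound, as in
    ;; 'compute-abi-cookie' above) and, where unsigned long is 8 bytes,
    ;; keep only the high 32 bits, which hash.c's comment says equal the
    ;; hash a 32-bit platform would compute.
    (let ((h (string-hash str)))
      (if (= (native-word-size) 8)
          (ash h -32)
          h)))

  (display (portable-string-hash "((device) (mount-point))"))
  ;; Should print 2434018 on x86_64, matching the arm32 value from your
  ;; C++ test.

Also, I suppose the 'format' call in 'compute-abi-cookie' above is just
debug output and would be dropped from an actual patch.
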
> (taken from `cross-compilation.test`).  There might be a cleaner way
> to formulate this, but you get the point.
>
> This seems to work for all combinations on my machine.  I tested
> x64 -> arm, x64 -> i686, i686 -> x64...
>
> I can only think of two drawbacks:
>
> 1) Lost entropy on 64-bit machines.
> 2) An ABI break, because after recompilation the hash values on 64-bit
>    platforms will change.
>
> 1) is imho irrelevant, because it is not cryptographically important.
> For 2) I am not sure how important this is.
>
> Any thoughts on this?

I don't think 1 or 2 are too important; regarding 2, for example, we
already break the ABI every time we add or remove a record field.

> Might this be something worth fixing and sending a patch in?

Totally, if you haven't done so already.

--
Thanks,
Maxim