Hi, Rob Browning <r...@defaultvalue.org> skribis:
> Noticed while investigating a migration to utf-8 strings. After making > changes that routed non-ascii symbol hashing through this function, > encoding-iso88597.test began intermittently failing because it would > traverse trailing garbage when u8_strnlen reported 8 chars instead of 4. > > Change the scm_i_str2symbol internal hash type to unsigned long to > explicitly match the hashing result type. Oh, good catch. For the final patch please add a ChangeLog-style entry. > + // Make sure a utf-8 symbol has the expected hash. In addition to > + // catching algorithmic regressions, this would have caught a > + // long-standing buffer overflow. > + > + // περί > + char about_u8[] = {0xce, 0xa0, 0xce, 0xb5, 0xcf, 0x81, 0xce, 0xaf, 0}; > + SCM sym = scm_from_utf8_symbol (about_u8); > + > + const unsigned long expect = 4029223418961680680; > + const unsigned long actual = scm_to_ulong (scm_symbol_hash (sym)); Is this a documented example of Jenkins? Or did you use a reference implementation? > Hmm. I suppose the current test could be handled on the scheme side > instead. (I'd started off attempting some more direct, elaborate tests > that didn't pan out.) Happy to rework that if desired. Yes, it may be nicer to have it in ‘test-suite/tests/hash.test’. AFAICS this will only change the hash of UTF-8 symbols and won’t have any effect on the output of ‘string-hash’, right? If not that would be an incompatibility. Thanks and sorry for the delay! Ludo’.