Hi folks,

We recently had a Jira and PR created around allowing sha256 to be
used instead of (or as well as) md5 for script caching/naming
purposes. The relevant issue is:

https://issues.apache.org/jira/browse/GROOVY-11459

By way of background, md5 is generally a poor choice wherever an
attacker could deliberately craft collisions. That's not really our
scenario: we only use hashing to generate a unique name for scripts
without a filename, or as a caching index. The sha256 algorithm does
offer slightly better collision avoidance, which would be a good thing
for the caching scenario. Having said that, we wouldn't want to take a
big performance hit to gain a little collision avoidance.
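
For anyone who wants something concrete to poke at, here is a minimal
Java sketch of the naming idea: hash the script text and embed the hex
digest in a generated name. The class and method names are made up for
illustration, and the "Script_<hash>.groovy" pattern is my assumption
for the example rather than a statement of what GroovyClassLoader does.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Groovy: derives a name from the script text.
final class ScriptNameSketch {

    // Hex-encode the digest of the text using the named JCA algorithm
    // ("MD5" and "SHA-256" are standard names every JDK must support).
    static String hexDigest(String text, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unsupported algorithm: " + algorithm, e);
        }
    }

    // Assumed naming pattern, for illustration only.
    static String nameFor(String text, String algorithm) {
        return "Script_" + hexDigest(text, algorithm) + ".groovy";
    }

    public static void main(String[] args) {
        String script = "println 'hello'";
        System.out.println(nameFor(script, "MD5"));     // 32 hex characters in the name
        System.out.println(nameFor(script, "SHA-256")); // 64 hex characters in the name
    }
}
```

The visible difference is only the length of the generated name (32 vs
64 hex characters); the same digest could equally serve as a cache key.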

I'll also point out that this caching isn't applicable to normally
compiled sources, just when using GroovyClassLoader directly (or
indirectly via GroovyScriptEngine), so most users wouldn't be impacted
anyway.

My initial thinking was that we'd make the algorithm configurable and
allow sha256 to be swapped in for folks who needed it. But in my
tests, sha256 performs very close to md5, so whether md5 should remain
the default comes into question. I also tested the murmur3 and xxHash
algorithms, but at least for the all-Java implementations that I
tried, they didn't offer any useful improvement for us.
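
If you want a quick feel for the md5 vs sha256 difference on your own
machine, a rough sketch like the following may help. It is not the
harness I used, the names and the stand-in input are made up, and a
naive System.nanoTime() loop is only indicative; a proper comparison
would use something like JMH and real script sources.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical, deliberately naive timing loop; treat the numbers as indicative only.
final class HashTimingSketch {

    // Returns elapsed nanoseconds for hashing `data` `iterations` times.
    static long nanosFor(String algorithm, byte[] data, int iterations) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unsupported algorithm: " + algorithm, e);
        }
        int sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // digest(byte[]) hashes the input and resets the instance for reuse
            sink += md.digest(data)[0];
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated value so the JIT cannot discard the loop body
        if (sink == Integer.MIN_VALUE) {
            System.out.println(sink);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Stand-in for a script source of a few KB (an assumption, not a real sample file)
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("println 'line ").append(i).append("'\n");
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);

        for (String algorithm : new String[] {"MD5", "SHA-256"}) {
            nanosFor(algorithm, data, 20_000); // warm-up pass
            long nanos = nanosFor(algorithm, data, 20_000);
            System.out.println(algorithm + ": " + (nanos / 20_000) + " ns per hash");
        }
    }
}
```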

An option for Groovy 5 is to make sha256 the default, with the ability
to swap back to md5 via a system property. We could also backport the
configurability part to Groovy 4 but keep the default as is (md5). But
for us to consider this, I think we'd need to do some more testing. My
current testing is based on just the 12 sample files that the
performance testing uses.
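
To make the system property idea concrete, here is a small Java sketch
of how the algorithm could be selected. The property name and class are
purely hypothetical placeholders; the real name and accepted values
would be settled in the Jira/PR.

```java
import java.util.Locale;

// Hypothetical sketch of system-property based selection; not the actual Groovy code.
final class HashAlgorithmConfigSketch {

    // Placeholder property name, invented for this example.
    static final String PROPERTY = "example.script.hash.algorithm";

    // Returns the JCA algorithm name to use, falling back to the supplied
    // default when the property is unset or blank.
    static String configuredAlgorithm(String defaultAlgorithm) {
        String value = System.getProperty(PROPERTY);
        if (value == null || value.trim().isEmpty()) {
            return defaultAlgorithm;
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "md5":
                return "MD5";
            case "sha256":
            case "sha-256":
                return "SHA-256";
            default:
                throw new IllegalArgumentException("Unsupported hash algorithm: " + value);
        }
    }

    public static void main(String[] args) {
        // Groovy 5 style default (sha256); run with
        // -Dexample.script.hash.algorithm=md5 to swap back.
        System.out.println(configuredAlgorithm("SHA-256"));
    }
}
```

A Groovy 4 backport would then only differ in passing "MD5" as the
default.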

Does anyone else have thoughts, or the ability to help out with
further testing?
(There is more information in the Jira.)

Thanks, Paul.