XXH3 (by the xxhash author) was recently presented, though it's still
experimental for now:
https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html
It is claimed to be significantly faster than xxhash, on all message sizes.
Regards
Antoine.
On 06/03/2019 at 07:06, Micah Kornfield wrote:
Hi Wes,
Thanks for the response. I was thinking of being able to checksum
everything. I agree it should be off by default. I'll put this on the
back burner for now. If I can find some spare time (which likely won't be
any time soon), I'll submit a PR for further discussion.
Cheers,
Micah
On Wed
hi Micah,
It seems like the checksum could be included in the Message flatbuffer
table instead of having to add things to the protocol
https://github.com/apache/arrow/blob/master/format/Message.fbs#L94
Am I correct that computing a checksum on the message body is what is
mainly of interest? Beyo
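
A minimal sketch of the idea discussed above: an optional checksum over the message body, stored next to the metadata and off by default. The `FramedMessage` class, its `body_crc32` field, and the two helper functions are hypothetical names for illustration only; they are not part of Message.fbs or any Arrow API, and CRC32 from Python's `zlib` is used only because it ships with the standard library.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class FramedMessage:
    """Hypothetical stand-in for a message: metadata plus a body buffer."""
    metadata: bytes
    body: bytes
    body_crc32: Optional[int] = None  # None means "no checksum was written"


def make_message(metadata: bytes, body: bytes, checksum: bool = False) -> FramedMessage:
    # Checksumming is opt-in, matching the "off by default" suggestion.
    crc = zlib.crc32(body) if checksum else None
    return FramedMessage(metadata, body, crc)


def verify_message(msg: FramedMessage) -> bool:
    # A message without a checksum cannot be verified, so it is accepted as-is.
    if msg.body_crc32 is None:
        return True
    return zlib.crc32(msg.body) == msg.body_crc32


msg = make_message(b"schema", b"record batch bytes", checksum=True)
corrupted = FramedMessage(msg.metadata, b"record batch bytez", msg.body_crc32)
```

Here `verify_message(msg)` succeeds while `verify_message(corrupted)` fails, because a single changed byte alters the CRC32 of the body.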
Doing some light research, it looks like xxhash has better cross-platform support
and is faster than a vanilla implementation of crc32 [1]. However, crc32c
(a slightly different crc32 algorithm) is hardware accelerated on newer
(circa 2016) Intel CPUs [2] and is potentially faster.
[1] https://cyan4973.
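
To illustrate how crc32c is "a slightly different crc32 algorithm": the two differ only in the generator polynomial. The sketch below is a deliberately slow, bit-at-a-time reference implementation, not the hardware-accelerated path mentioned above; the function name `crc_reflected` is an assumption made for this example. The check values for the input `b"123456789"` are the standard published ones for each variant.

```python
import zlib

CRC32_POLY = 0xEDB88320   # reflected form of 0x04C11DB7 (zlib, gzip, PNG)
CRC32C_POLY = 0x82F63B78  # reflected form of 0x1EDC6F41 (Castagnoli)


def crc_reflected(data: bytes, poly: int) -> int:
    """Bitwise reflected CRC-32 with initial value and final XOR of 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; apply the polynomial if that bit was set.
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


check = b"123456789"
crc32_value = crc_reflected(check, CRC32_POLY)    # 0xCBF43926
crc32c_value = crc_reflected(check, CRC32C_POLY)  # 0xE3069283
```

With the first polynomial the result matches `zlib.crc32`, which gives a quick sanity check on the loop.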
Thanks Philipp,
Yeah, I probably shouldn't have said SHA1 either :) I'm not too
concerned with a particular hash/checksum implementation. It would be good
to have at least 1 or 2 well-supported ones, and a migration path to
support more if necessary without breaking file/streaming formats for
Hey Micah,
in plasma, we are using xxhash to compute a hash/checksum [1] (it is
computed in parallel using multiple threads) and have good experience with
it -- all data in Ray is checksummed this way. Initially there were
problems with uninitialized bits in the arrow representation, but that has
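
A rough sketch of computing a checksum in parallel over chunks, as described above. This is not Plasma's actual code: the function name, chunk size, worker count, and the way per-chunk results are combined are all assumptions, and `zlib.crc32` stands in for xxhash only because it is in the Python standard library.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def parallel_checksum(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> int:
    """Checksum fixed-size chunks on a thread pool, then checksum the results."""
    view = memoryview(data)  # slicing a memoryview does not copy the buffer
    chunks = [view[i:i + chunk_size] for i in range(0, len(view), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so the result is deterministic.
        partials = list(pool.map(zlib.crc32, chunks))
    combined = b"".join(p.to_bytes(4, "little") for p in partials)
    return zlib.crc32(combined)


payload = bytes(range(256)) * 4096  # 1 MiB of sample data
digest = parallel_checksum(payload, chunk_size=64 * 1024)
```

Note that the result depends on the chunk size and is not equal to the CRC32 of the whole buffer, so writer and reader would have to agree on the chunking.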
(I meant to say SHA256 instead of SHA1)
On Tue, Mar 5, 2019 at 9:45 PM Philipp Moritz wrote:
> Hey Micah,
>
> in plasma, we are using xxhash to compute a hash/checksum [1] (it is
> computed in parallel using multiple threads) and have good experience with
> it -- all data in Ray is checksummed t
Hi Arrow Dev,
As we expand the use-cases for Arrow to move it more across system
boundaries (Flight) and make it live longer (e.g. in the file format), it
seems to make sense to build in a mechanism for data integrity verification
(e.g. a checksum like CRC32 or in some cases a cryptographic hash li