Re: New datatype: Huge integers & decimals

Antoine Pitrou Wed, 24 May 2023 01:55:58 -0700


Hi Will,

I'll also note that, while float16 is a first-class datatype, I'm not sure any Arrow implementation is currently able to do anything other than just transport it.

You're right that we'd probably want extension number types to be based on fixed-size-binary. A complication is endianness, though. Currently, we have logic (for example in Arrow C++) to optionally byte-swap number data at the edge (when receiving non-native-endian data). How would it work with extension types based on fixed-size-binary? There is a risk that implementations recognizing the bfloat16 extension type would byte-swap, but others would not, leading to corrupt data streams.
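To make the risk concrete, here is a minimal sketch in Python (standard library only), not taken from any Arrow implementation. The helper names are hypothetical, and the encoder simply truncates a float32 to its upper 16 bits instead of rounding. It shows the same two bytes decoding to different values depending on whether the reader byte-swaps:

```python
import struct

def bfloat16_bytes(value: float, byteorder: str) -> bytes:
    """Encode a float as 2 raw bytes of bfloat16 (hypothetical helper).

    bfloat16 is the upper 16 bits of an IEEE-754 float32; this sketch
    truncates the lower bits instead of rounding.
    """
    bits32 = struct.unpack(">I", struct.pack(">f", value))[0]
    return (bits32 >> 16).to_bytes(2, byteorder)

def bfloat16_value(raw: bytes, byteorder: str) -> float:
    """Decode 2 raw bytes as bfloat16, assuming the given byte order."""
    bits16 = int.from_bytes(raw, byteorder)
    return struct.unpack(">f", struct.pack(">I", bits16 << 16))[0]

# A big-endian producer writes 1.0 into a 2-byte fixed-size-binary slot.
raw = bfloat16_bytes(1.0, "big")

# A consumer that knows the extension type and byte-swaps reads it correctly.
swapped = bfloat16_value(raw, "big")

# A consumer that treats the slot as opaque bytes and never swaps does not.
unswapped = bfloat16_value(raw, "little")

print(swapped, unswapped)
```

Both consumers see identical bytes on the wire; only the one that knows to swap recovers 1.0.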

The bfloat16 extension type would then have to be parametrized with its endianness, or mandate a fixed endianness (probably little endian).

For bigints, I think the situation is simpler. Little-endian is, I think, a much more convenient representation for bigints (at the cost of some potential runtime byte-shuffling on big-endian systems).
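As a rough illustration of why little-endian is convenient, here is a sketch in Python (standard library only). It assumes an unsigned 256-bit integer stored in a 32-byte fixed-size-binary slot; the function names are hypothetical and nothing here is an agreed Arrow layout. With little-endian storage, the 64-bit limbs come out least-significant first, which is the order carry propagation wants:

```python
import struct

U256_BYTES = 32  # assumed width of the fixed-size-binary slot

def u256_to_fixed_binary(value: int) -> bytes:
    """Store an unsigned 256-bit integer as 32 little-endian bytes."""
    if not 0 <= value < (1 << 256):
        raise ValueError("value does not fit in 256 bits")
    return value.to_bytes(U256_BYTES, "little")

def u256_limbs(raw: bytes) -> tuple:
    """Split the 32 bytes into four 64-bit limbs, least significant first."""
    return struct.unpack("<4Q", raw)

def u256_from_limbs(limbs) -> int:
    """Rebuild the integer; limb i carries weight 2**(64 * i)."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

value = (1 << 200) + 5
raw = u256_to_fixed_binary(value)
limbs = u256_limbs(raw)
print(limbs)
```

On a big-endian machine the `"<4Q"` unpack is where the byte-shuffling cost mentioned above would be paid.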


Regards

Antoine.


Le 23/05/2023 à 23:47, Will Jones a écrit :


I'm just starting to look at this, so not yet sure what the pros and cons
are of implementing it as an extension type versus a native Arrow type. My
initial ideas:

Pros of an extension type:
* It can be moved through Arrow-native systems that don't implement it, as
long as they preserve extension type information.

Pros of a native type:
* We have established patterns for writing compute kernels for natively
supported types.

If we were to implement these as extension types, I think bfloat16 and the
number types Ian Joiner mentions would be best implemented as extension
types based on fixed-size binary. We have a native float16 type already,
but I think that if we made bfloat16 an extension type based on that, it
could get accidentally manipulated as a float16, which IIUC would be invalid.
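A small Python sketch (standard library only, hypothetical helper names) of why treating bfloat16 storage as float16 would be invalid: the two formats split the same 16 bits differently, so one bit pattern yields two different numbers.

```python
import struct

BITS = 0x3F80  # one 16-bit pattern, two possible meanings

def as_float16(bits: int) -> float:
    # struct's "e" format is IEEE-754 binary16:
    # 1 sign bit, 5 exponent bits, 10 mantissa bits.
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def as_bfloat16(bits: int) -> float:
    # bfloat16 has 1 sign bit, 8 exponent bits, 7 mantissa bits,
    # i.e. it is the upper half of a float32.
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

print(as_float16(BITS), as_bfloat16(BITS))
```

Any float16 kernel applied to such storage would compute on the first interpretation, not the second.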

If anyone has any advice from our work thus far on extension types, I'd
welcome your input.

Best,

Will Jones

[1] https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus
[2] https://en.wikipedia.org/wiki/Bfloat16_floating-point_format

On Tue, May 23, 2023 at 10:49 AM Antoine Pitrou <anto...@python.org> wrote:


Your question seems unspecific, but we now have the possibility of
standardizing canonical extension types (which are, of course, optional
to implement and support):

https://arrow.apache.org/docs/format/CanonicalExtensions.html


Le 23/05/2023 à 19:45, Ian Joiner a écrit :

That’s a possibility. Do we consider officially supporting them?


On Tuesday, May 23, 2023, Antoine Pitrou <anto...@python.org> wrote:


I'm not sure what you're actually proposing here. A new extension type
perhaps?


Le 23/05/2023 à 19:13, Ian Joiner a écrit :

Hi,

We need to have really large integers (with 128, 256 and 512 bits) as well
as decimals (up to at least decimal1024) because they do actually exist in
crypto / web3 space.

See https://docs.rs/primitive-types/latest/primitive_types/ for an example
of what needs to be supported.

If accepted we can implement the types for C++/Python and Rust.

Thanks,
Ian
