It also seems that two variations of the variant encoding are being
discussed. The original spec, currently housed in Spark, creates a variant
array in row-major order; that is, each element in the array is stored
contiguously. So, if you have objects like `{"a": 7, "b": 3}` then the
values f
On 22/08/2024 at 17:08, Curt Hagenlocher wrote:
(I also happen to want a canonical Arrow representation for variant data,
as this type occurs in many databases but doesn't have a great
representation today in ADBC results. That's why I filed [Format] Consider
adding an official variant type
This seems to straddle that line, in that you can also view this as a way
to represent semi-structured data in a manner that allows for more
efficient querying and computation by breaking out some of its components
into a more structured form.
(I also happen to want a canonical Arrow representatio
Ah, thanks. I've tried to find a rationale and ended up on
https://lists.apache.org/thread/xnyo1k66dxh0ffpg7j9f04xgos0kwc34 . Is it
a good description of what you're after?
If so, then I don't think Arrow is a good match. This seems mostly to be
a marshalling format for semi-structured data
Sorry for the inconvenience.
This is the permalink for the discussion:
https://lists.apache.org/thread/hopkr2f0ftoywwt9zo3jxb7n0ob5s5bw
On Thu, Aug 22, 2024 at 3:51 PM Antoine Pitrou wrote:
>
> Hi Gang,
>
> Sorry, but can you give a pointer to the start of this discussion thread
> in a readable
Hi Gang,
Sorry, but can you give a pointer to the start of this discussion thread
in a readable format (for example a mailing-list archive)? It appears
that dev@arrow wasn't cc'ed from the start and that can make it
difficult to understand what this is about.
Regards
Antoine.
On 22/08/2
I personally believe Arrow is a better choice since we will eventually have the
same memory layout but different physical layouts in Parquet, ORC, or other
file formats.
One concern I have about this option is whether the Arrow community is willing
to make this happen and maintain this specific
It seems that we have reached a consensus to some extent that there
should be a new home for the variant spec. The pending question
is whether Parquet or Arrow is a better choice. As a committer in the Arrow,
Parquet and ORC communities, I have no preference between them and am happy to
help with the movement
>
> That being said, I think the most important consideration for now is where
> the current maintainers / contributors to the variant type are. If most of
> them are already PMC members / committers on a project, it becomes a bit
> easier. Otherwise if there isn't much overlap with a project's exi
In terms of being more engine- and format-agnostic, I agree the Arrow project might
be a good host for such a specification. It seems like we want to move away
from hosting in Spark to make it engine-agnostic. But moving into Iceberg
might make it less format-agnostic, as I understand multiple formats might
+ dev@arrow
Thanks for all the valuable suggestions! I am inclined toward Micah's idea that
Arrow might be a better host compared to Parquet.
To give more context, I am taking the initiative to add the geometry type
to both Parquet and ORC. I'd like to do the same thing for variant type in
that varia