Hi folks,
Neal Richardson suggested on the rOpenSci slack I might pose this question
to this list.
As an observer of both communities, I'm interested in whether there is, or might
be, more communication between the Pangeo community's focus on Zarr
serialization and what the Arrow team has done with Parquet
Thank you for the updates so far.
However, we still need to add content for the following subprojects. The
report is due this Wednesday, July 10 (the board meeting is on July 17).
- Arrow Flight SQL adapter for PostgreSQL
- Nanoarrow
- C++
- Dataset & Parquet
On Mon, Jul 8, 2024 at 3:57 PM Weston Pace wrote:
> > user-facing API documentation someone would need to practically form and/or
> > process data when integrating a library into their code.
>
> If we are thinking API contract / programmatic access then I'd offer yet
> another alternative. At
Thanks for the links. That's very helpful context.
It's a shame the flatbuffer <-> json conversion isn't more widely
available, though I do see the complexity now.
It sounds like our best path forward for now will be to generate a pair of
assets for each of our types:
- A binary fbs-encoded IPC
Based on the response to using an empty IPC stream/file, it sounds to me like
something substrait-like is ideal. Maybe an interface that can go between the
equivalent of relational schemas and (generated) arrow code as you have shown.
Then, there could be straightforward integration points with
> but it doesn't address questions of the kind of
> user-facing API documentation someone would need to practically form and/or
> process data when integrating a library into their code.
Agreed that IPC / flatbuffers / proto are not useful here. JSON might help
and YAML would be more pleasantly c
This has come up a few times in the past [1][2]. The main concern has been
about cross-version compatibility guarantees.
[1] https://github.com/apache/arrow/issues/25078
[2] https://lists.apache.org/thread/02p37yxksxccsqfn9l6j4ryno404ttnl
On Mon, Jul 8, 2024 at 3:10 PM Lee, David (PAG) wrote:
Gah found a bug with my code.. Here's a corrected python version..
# iterate through possible nested columns
def _convert_to_arrow_type(field, obj):
    """
    :param field:
    :param obj:
    :returns: pyarrow datatype
    """
    if isinstance(obj, list):
        for child_obj in obj:
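For comparison, here is a minimal sketch (not the corrected code from the
message above) of how a recursive conversion could handle nested columns,
assuming a convention where a list in the config means "list of" and a dict
means a struct; the helper name to_arrow_type and the scalar lookup table are
assumptions:

import pyarrow as pa

# scalar type names used in the config, mapped to pyarrow types
_SCALARS = {"string": pa.string(), "date32": pa.date32(), "int64": pa.int64()}

def to_arrow_type(obj):
    """Convert a nested config value into a pyarrow DataType."""
    if isinstance(obj, list):
        # a one-element list denotes a list of the inner type
        return pa.list_(to_arrow_type(obj[0]))
    if isinstance(obj, dict):
        # a mapping denotes a struct with named children
        return pa.struct([(name, to_arrow_type(child))
                          for name, child in obj.items()])
    # otherwise obj is a scalar type name such as "string"
    return _SCALARS[obj]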
I came up with my own json representation that I could put into json / yaml
config files with some python code to convert this into a pyarrow schema
object..
- yaml flat example -
fields:
  cusip: string
  start_date: date32
  end_date: date32
  purpose: string
  source:
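For the flat case above, a minimal sketch of turning such a {column: type-name}
mapping into a pyarrow schema; the lookup table and the helper name
schema_from_mapping are illustrative assumptions, not the code from the
original message:

import pyarrow as pa

# type names used in the YAML config, mapped to pyarrow types
_TYPE_LOOKUP = {
    "string": pa.string(),
    "date32": pa.date32(),
    "int64": pa.int64(),
    "float64": pa.float64(),
}

def schema_from_mapping(fields):
    """Build a pyarrow.Schema from a {column_name: type_name} mapping."""
    return pa.schema([(name, _TYPE_LOOKUP[type_name])
                      for name, type_name in fields.items()])

# e.g. with PyYAML: schema = schema_from_mapping(yaml.safe_load(text)["fields"])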
Hi,
So, something like a human- and computer-readable standard for Arrow
schemas, e.g. via YAML or a JSON schema.
We kind of do this in our integration tests / golden tests, where we have
a non-official json representation of an arrow schema.
The ask here is to standardize such a format in some
That handles questions of machine-to-machine coordination, and lets me do
things like validation, but it doesn't address questions of the kind of
user-facing API documentation someone would need to practically form and/or
process data when integrating a library into their code.
I want to be able
+1 for empty stream/file as schema serialization. I have used this
approach myself on more than one occasion and it works well. It can even
be useful for transmitting schemas between different arrow-native libraries
in the same language (e.g. rust->rust) since it allows the different
libraries to
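As a concrete illustration of the empty-stream approach (a minimal pyarrow
sketch; the field names are placeholders):

import pyarrow as pa
from pyarrow import ipc

schema = pa.schema([("cusip", pa.string()), ("start_date", pa.date32())])

# write a stream containing only the Schema message, no record batches
sink = pa.BufferOutputStream()
with ipc.new_stream(sink, schema):
    pass
buf = sink.getvalue()

# any Arrow IPC reader can recover the schema from those bytes
assert ipc.open_stream(buf).schema.equals(schema)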
Hey Jeremy,
Currently the first message of an IPC stream is a Schema message, which
consists solely of a flatbuffer message and is defined in the Schema.fbs file
of the Arrow repo. All of the libraries that can read Arrow IPC should be
able to also handle converting a single IPC schema message back in
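For instance, with pyarrow a schema can round-trip through a single
encapsulated IPC Schema message (a minimal sketch; field names are
placeholders):

import pyarrow as pa

schema = pa.schema([("purpose", pa.string()), ("end_date", pa.date32())])
buf = schema.serialize()  # the encapsulated IPC Schema message bytes
assert pa.ipc.read_schema(buf).equals(schema)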
I'm looking for any advice folks may have on a generic way to document and
represent expected arrow schemas as part of an interface definition.
For context, our library provides a cross-language (python, c++, rust) SDK
for logging semantic multi-modal data (point clouds, images, geometric
transfor