I think spinning up a new repository while this exploratory work
progresses is a fine idea — perhaps apache/arrow-dbc / arrow-adbc or
similar (the name can always be changed later). That would bubble up
discussions in a way that's easier for people to follow (watching your
fork isn't ideal!). If it
> I don't think replacing Scalar compute paths with dedicated paths for
> RLE-encoded data would ever be a simplification. Also, when a kernel
> hasn't been upgraded with a native path for RLE data, former Scalar
> Datums would now be expanded to the full RLE-decoded version before
> running the ke
I don't think you are missing anything. The parquet encoding is baked
into the data on the disk so re-encoding at some stage is inevitable.
Re-encoding in Python like you are doing is going to be inefficient.
I think you will want to do the re-encoding in C++. Unfortunately, I
don't think we have
I haven't had a chance to look at the branch in detail, but if you can
provide a pointer to a specification or other details about the
proposed memory format for RLE (basically: what would be added to the
columnar documentation as well as the Flatbuffers schema files), it
would be helpful so it can
I'm also supportive of having a small vendorable C/C++ "Arrow
middleware" that provides:
* Schemas and types
* Columnar data structures and minimal APIs to build them and iterate over them
* C data interface
* Minimal validation (at the level of Validate but not ValidateFull)
I don't think it's g
Hi,
There are no objections. I've merged this:
https://github.com/apache/arrow/pull/13184
Thanks,
--
kou
In <20220525.061541.194737838528371525@clear-code.com>
"Re: Merge a pull request with GitHub API" on Wed, 25 May 2022 06:15:41 +0900
(JST),
Sutou Kouhei wrote:
> Hi,
>
> Do you
Hi,
On Tuesday, 31.05.2022 at 21:12 +0200, Antoine Pitrou wrote:
>
> Hi,
>
> On 31/05/2022 at 20:24, Tobias Zagorni wrote:
> > Hi, I'm currently working on adding Run-Length encoding to arrow. I
> > created a function to dictionary-encode arrays here (currently only
> > for
> > fixed le
On 31/05/2022 at 21:41, Micah Kornfield wrote:
I'm currently working on adding Run-Length encoding to arrow.
Nice
What are the intended use cases for this:
- external engines want to provide run-length encoded data to work on
using arrow?
It is more than just external engines. Many p
>
> I'm currently working on adding Run-Length encoding to arrow.
Nice
> What are the intended use cases for this:
> - external engines want to provide run-length encoded data to work on
> using arrow?
>
It is more than just external engines. Many popular file formats support
RLE encoding. Bei
Hi,
On 31/05/2022 at 20:24, Tobias Zagorni wrote:
Hi, I'm currently working on adding Run-Length encoding to arrow. I
created a function to dictionary-encode arrays here (currently only for
fixed length types):
https://github.com/apache/arrow/compare/master...zagto:rle?expand=1
The general
Hi,
Background:
I have a need to optimize read speed for few-column lookups in large
datasets. Currently I have the data in Plasma to have fast reading of it,
but Plasma is cumbersome to manage when the data frequently changes (and
“locks” the RAM). Instead I’m trying to figure out a fast-enough a
Hi, I'm currently working on adding Run-Length encoding to arrow. I
created a function to dictionary-encode arrays here (currently only for
fixed length types):
https://github.com/apache/arrow/compare/master...zagto:rle?expand=1
The general idea is that RLE data will be a nested data type, with a
Some updates:
The proposal is being updated based on feedback from contributors to DuckDB and
DBI. We've been using GitHub issues on the fork to discuss the API design and
how to implement data ingestion/bound parameters:
https://github.com/lidavidm/arrow/issues
If anyone has suggestions/idea
For those interested, the PR for this new API is ready for review here:
https://github.com/apache/arrow/pull/12775
On Wed, Apr 6, 2022 at 11:17 AM Will Jones wrote:
> Hello,
>
> I've fleshed out the ideas in the doc in this draft PR:
> https://github.com/apache/arrow/pull/12775
>
> Feedback on t
For the record, https://github.com/apache/arrow/pull/13115 was merged
with the proposed change.
Regards
Antoine.
On Fri, 13 May 2022 17:48:21 +0200
Antoine Pitrou wrote:
> I don't think this needs a vote, there is no functional change in the
> spec, it's just an additional technical recomm