My patch for this is finally up
https://github.com/apache/arrow/pull/4316
It was kind of a bloodbath, but I think this puts us on a sustainable
path and unlocks a lot of efforts that we've been blocked on.
On Mon, May 13, 2019 at 10:01 AM Wes McKinney wrote:
>
> As I've ventured further in work
As I've ventured further in working on this I've realized that it's
not practical (or even a good idea) to continue to maintain the "fixed
dictionary" path. Since the IPC protocol can have evolving
dictionaries, nearly all code paths in the codebase have to change to
work for the variable case, whi
I have started working on this some to assess what is involved.
My present plan is to have
* FixedDictionaryType and FixedDictionaryArray
* VariableDictionaryType and VariableDictionaryArray
* deprecate (?) current DictionaryType/DictionaryArray names, for
  clarity (thoughts about this would be welcome
Thanks Wes, your proposed additional data type makes more sense to me.
> As a first use case for this I would be personally looking to address
> reads of encoded data from
> Parquet format without an intermediate pass through dense format
> (which can be slow and wasteful for heavily
hi Hatem,
Thanks for commenting.
I am not sure your solution will work reliably because code is written
against arrow::DictionaryType with the presumption that the dictionary
is known and static, and can be obtained by invoking
DictionaryType::dictionary. In the variable dictionary case, the
dict
Hi Wes,
Thanks for the detailed writeup; I think this is an important problem to solve.
I spent some time thinking about this when working on ARROW-3769 and came to a
similar conclusion: the current dictionary type is limiting when doing
partial reads of Parquet files.
I'm not sure if th
On Mon, Apr 29, 2019 at 2:59 PM Micah Kornfield wrote:
>
> >
> > > * The _actual_ dictionary values for a particular Array must be stored
> > > somewhere and lifetime managed. I propose to put these as a single
> > > entry in ArrayData::child_data [4]. An alternative to this would be to
> > > modi
>
> > * The _actual_ dictionary values for a particular Array must be stored
> > somewhere and lifetime managed. I propose to put these as a single
> > entry in ArrayData::child_data [4]. An alternative to this would be to
> > modify ArrayData to have a dictionary field that would be unused
> > exc
Hi Wes,
On 29/04/2019 at 20:10, Wes McKinney wrote:
>
> * Receiving a record batch schema without the dictionaries attached
> (e.g. in Arrow Flight), see also experimental patch [2]
Note that this was finally done in a separate PR, and only required
changes in the IPC implementation.
> Here
hi all,
There have been many discussions in passing on various issues and JIRA
tickets over the last months and years about how to manage
dictionary-encoded columnar arrays in-memory in C++. Here's a list of
some problems we have encountered:
* Dictionaries that may differ from one record batch t