Hi Micah, thanks for your suggestion.
You are right, the main difference between FixedSizeListVector and ListVector
is the offsetBuffer, but I think this could be avoided by
overriding allocateNewSafe(), which calls allocateOffsetBuffer() in
BaseRepeatedValueVector; in this way, offsetBuffe
Hi Ji Liu,
I think having a common interface/base class for the two makes sense from a
reading-data perspective (but I don't have the historical context).
I think the change would need to be something above
BaseRepeatedValueVector, since the FixedSizeListVector doesn't contain an
offset buffer, and that f
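The idea that readers only need element boundaries, not an offset buffer, can be illustrated with a minimal plain-Java sketch. `ListLayout` and its two implementations are hypothetical names for illustration, not part of the Arrow Java API; the sketch assumes a variable-size list keeps `valueCount + 1` offsets, while a fixed-size list derives its bounds from a single list size.

```java
import java.util.Arrays;

public class ListLayoutDemo {
    // Hypothetical read-side abstraction shared by both list kinds;
    // not part of the Arrow Java API.
    interface ListLayout {
        int getElementStart(int index);

        int getElementEnd(int index);

        default int getListLength(int index) {
            return getElementEnd(index) - getElementStart(index);
        }
    }

    // Variable-size lists (ListVector-style): bounds are read from an
    // offset buffer holding valueCount + 1 entries.
    static final class OffsetListLayout implements ListLayout {
        private final int[] offsets;

        OffsetListLayout(int[] offsets) {
            this.offsets = offsets.clone();
        }

        @Override
        public int getElementStart(int index) {
            return offsets[index];
        }

        @Override
        public int getElementEnd(int index) {
            return offsets[index + 1];
        }
    }

    // Fixed-size lists (FixedSizeListVector-style): no offset buffer,
    // bounds are computed from the list size.
    static final class FixedSizeListLayout implements ListLayout {
        private final int listSize;

        FixedSizeListLayout(int listSize) {
            this.listSize = listSize;
        }

        @Override
        public int getElementStart(int index) {
            return index * listSize;
        }

        @Override
        public int getElementEnd(int index) {
            return (index + 1) * listSize;
        }
    }

    // Copies list `index` out of a flat child array using only the interface.
    static int[] readList(ListLayout layout, int[] child, int index) {
        return Arrays.copyOfRange(child, layout.getElementStart(index), layout.getElementEnd(index));
    }

    public static void main(String[] args) {
        int[] child = {1, 2, 3, 4, 5, 6};
        ListLayout variable = new OffsetListLayout(new int[] {0, 1, 4, 6});
        ListLayout fixed = new FixedSizeListLayout(2);
        System.out.println(Arrays.toString(readList(variable, child, 1))); // [2, 3, 4]
        System.out.println(Arrays.toString(readList(fixed, child, 1)));    // [3, 4]
    }
}
```

With a shared read-side abstraction like this, allocation (where only the variable-size case needs an offset buffer) would stay in each concrete class.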
Hi Jacques,
I definitely understand these concerns and this change is risky because it
is so large. Perhaps creating a new hierarchy might be the cleanest way
of dealing with this. This could have other benefits like cleaning up some
cruft around dictionary encode and "orphaned" method. Per p
Hi Jacques, thanks for your valuable feedback.
Sorry for the lack of discussion. Some of these PRs are small changes/bug fixes that
did not seem to warrant a discussion. You are right that some PRs turned out to be more complex than we
thought during the review process, and a discussion on the ML/JIRA would
actually help. This si
Hi, all
While working on the issue to implement dictionary-encoded subfields [1] [2], I
found that FixedSizeListVector does not extend ListVector (thanks, Micah, for pointing this out;
I was curious why FixedSizeListVector was implemented this way
before). Since FixedSizeListVector is a specific case of ListVector, sh
Hey Micah, I didn't have a particular path in mind. I was thinking more along
the lines of extra methods as opposed to separate classes.
Arrow hasn't historically been a place where we write algorithms in
Java, so the fact that they aren't there doesn't mean they don't exist. We
have a large amo
Sutou Kouhei created ARROW-6197:
---
Summary: [GLib] Add garrow_decimal128_rescale()
Key: ARROW-6197
URL: https://issues.apache.org/jira/browse/ARROW-6197
Project: Apache Arrow
Issue Type: New Fea
Reading data from two different Parquet files sequentially, with different
dictionaries for the same column. This could be handled by re-encoding the
data, but that seems potentially suboptimal.
On Sat, Aug 10, 2019 at 12:38 PM Jacques Nadeau wrote:
> What situation are anticipating where you're goi
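Re-encoding against a single unified dictionary, the alternative mentioned above, could look roughly like the following minimal sketch. `DictionaryUnifier` is a hypothetical helper that works on plain Java arrays rather than Arrow vectors; it assumes that values not seen so far are appended to the end of the unified dictionary, so ids already handed out stay valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DictionaryUnifier {
    // Unified dictionary: a value's position in `values` is its id.
    private final List<String> values = new ArrayList<>();
    private final Map<String, Integer> ids = new HashMap<>();

    // Rewrites indices that point into `localDictionary` (e.g. one file's
    // dictionary) so they point into the unified dictionary instead.
    // Values not seen before are appended, so earlier ids never change.
    public int[] reencode(String[] localDictionary, int[] localIndices) {
        int[] remap = new int[localDictionary.length];
        for (int i = 0; i < localDictionary.length; i++) {
            remap[i] = ids.computeIfAbsent(localDictionary[i], v -> {
                values.add(v);
                return values.size() - 1;
            });
        }
        int[] unified = new int[localIndices.length];
        for (int i = 0; i < localIndices.length; i++) {
            unified[i] = remap[localIndices[i]];
        }
        return unified;
    }

    // Returns a copy of the unified dictionary in id order.
    public List<String> dictionary() {
        return new ArrayList<>(values);
    }

    public static void main(String[] args) {
        DictionaryUnifier unifier = new DictionaryUnifier();
        // Column from a first file, dictionary [a, b].
        int[] first = unifier.reencode(new String[] {"a", "b"}, new int[] {0, 1, 1});
        // Same column from a second file, dictionary [b, c].
        int[] second = unifier.reencode(new String[] {"b", "c"}, new int[] {1, 0});
        System.out.println(Arrays.toString(first));  // [0, 1, 1]
        System.out.println(Arrays.toString(second)); // [2, 1]
        System.out.println(unifier.dictionary());    // [a, b, c]
    }
}
```

The cost this sketch makes visible is the per-row index rewrite for every batch, which is the overhead that avoiding re-encoding would save.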
Hi Jacques,
What avenue were you thinking of for supporting both paths? I didn't want to
pursue a different class hierarchy, because I felt like that would
effectively fork the code base, but that is potentially an option that
would allow us to have a complete reference implementation in Java that c
Sutou Kouhei created ARROW-6196:
---
Summary: [Ruby] Add support for building Arrow::TimeNNArray by .new
Key: ARROW-6196
URL: https://issues.apache.org/jira/browse/ARROW-6196
Project: Apache Arrow
This is a pretty massive change to the APIs. I wonder how nasty it would be
to just support both paths. Have you evaluated how complex that would be?
On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield
wrote:
> After more investigation, it looks like Float8Benchmarks at least on my
> machine are wit
What situation are you anticipating where you're going to be restating ids
mid-stream?
On Sat, Aug 10, 2019 at 12:13 AM Micah Kornfield
wrote:
> The IPC specification [1] defines behavior when isDelta on a
> DictionaryBatch [2] is "true". I might have missed it in the
> specification, but I couldn'
I think one of the issues here is that there is no upfront discussion about
most of the changes that are being proposed. In most cases, a pull request
just appears without any prior discussion. This makes the reviews much more intensive and
time-consuming, as frequently there are questions about the validity, nature or
Omer Ozarslan created ARROW-6195:
Summary: [C++] CMake fails with file not found error while
bundling thrift if python is not installed
Key: ARROW-6195
URL: https://issues.apache.org/jira/browse/ARROW-6195
Ji Liu created ARROW-6194:
---
Summary: [Java] Make DictionaryEncoder non-static making it easy
to extend and reuse
Key: ARROW-6194
URL: https://issues.apache.org/jira/browse/ARROW-6194
Project: Apache Arrow
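As a rough illustration of what a non-static encoder could make possible, here is a minimal plain-Java sketch. `ValueEncoder` and `CaseInsensitiveEncoder` are hypothetical classes, not the Arrow `DictionaryEncoder` API: an instance builds its lookup table once and reuses it across `encode()` calls, and a subclass changes how values are matched by overriding a protected hook, both of which are awkward with static methods.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class EncoderDemo {
    // Hypothetical instance-based encoder; not the Arrow DictionaryEncoder API.
    static class ValueEncoder {
        private final String[] dictionary;
        // Built on first use, then reused by every encode() call on this instance.
        private Map<String, Integer> lookup;

        ValueEncoder(String[] dictionary) {
            this.dictionary = dictionary.clone();
        }

        // Extension point: subclasses can change how values are matched.
        protected String key(String value) {
            return value;
        }

        private Map<String, Integer> lookupTable() {
            if (lookup == null) {
                lookup = new HashMap<>();
                for (int i = 0; i < dictionary.length; i++) {
                    lookup.putIfAbsent(key(dictionary[i]), i);
                }
            }
            return lookup;
        }

        int[] encode(String[] data) {
            int[] indices = new int[data.length];
            for (int i = 0; i < data.length; i++) {
                Integer id = lookupTable().get(key(data[i]));
                if (id == null) {
                    throw new IllegalArgumentException("Value not in dictionary: " + data[i]);
                }
                indices[i] = id;
            }
            return indices;
        }
    }

    // Example extension: match values case-insensitively.
    static class CaseInsensitiveEncoder extends ValueEncoder {
        CaseInsensitiveEncoder(String[] dictionary) {
            super(dictionary);
        }

        @Override
        protected String key(String value) {
            return value.toLowerCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        String[] dictionary = {"red", "green"};
        ValueEncoder plain = new ValueEncoder(dictionary);
        ValueEncoder relaxed = new CaseInsensitiveEncoder(dictionary);
        System.out.println(Arrays.toString(plain.encode(new String[] {"green", "red"})));   // [1, 0]
        System.out.println(Arrays.toString(relaxed.encode(new String[] {"RED", "Green"}))); // [0, 1]
    }
}
```

The lookup table is built lazily rather than in the constructor so that an overridden `key()` is never called before the subclass is fully constructed.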
I should add that Option #1 above would be my preference, even though it
adds some complications (especially for the file format).
On Sat, Aug 10, 2019 at 12:12 AM Micah Kornfield
wrote:
> The IPC specification [1] defines behavior when isDelta on a
> DictionaryBatch [2] is "true". I might have
The IPC specification [1] defines behavior when isDelta on a
DictionaryBatch [2] is "true". I might have missed it in the
specification, but I couldn't find what the expected
behavior is when isDelta=false and two dictionary batches with the
same ID are sent.
It seems
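The two cases can be made concrete with a minimal reader-side sketch. `DictionaryMemo` and `RedefinitionPolicy` are hypothetical names; the sketch appends entries for isDelta=true and handles the open case (isDelta=false for an id that was already seen) either as a replacement or as an error, chosen by a flag, since that is exactly the behavior in question. Rejecting a delta for an id that has no dictionary yet is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DictionaryMemo {
    // Hypothetical choices for a non-delta batch whose id was already seen.
    enum RedefinitionPolicy { REPLACE, ERROR }

    private final Map<Long, List<String>> dictionaries = new HashMap<>();
    private final RedefinitionPolicy policy;

    DictionaryMemo(RedefinitionPolicy policy) {
        this.policy = policy;
    }

    void onDictionaryBatch(long id, List<String> values, boolean isDelta) {
        List<String> existing = dictionaries.get(id);
        if (isDelta) {
            // Assumption of this sketch: a delta needs an existing dictionary.
            if (existing == null) {
                throw new IllegalStateException("Delta for unknown dictionary id " + id);
            }
            existing.addAll(values); // appended entries take the next ids
        } else if (existing == null || policy == RedefinitionPolicy.REPLACE) {
            dictionaries.put(id, new ArrayList<>(values)); // first definition or replacement
        } else {
            throw new IllegalStateException("Dictionary id " + id + " redefined without isDelta");
        }
    }

    List<String> get(long id) {
        return dictionaries.get(id);
    }

    public static void main(String[] args) {
        DictionaryMemo memo = new DictionaryMemo(RedefinitionPolicy.REPLACE);
        memo.onDictionaryBatch(0, Arrays.asList("a", "b"), false);
        memo.onDictionaryBatch(0, Arrays.asList("c"), true);
        System.out.println(memo.get(0)); // [a, b, c]
        memo.onDictionaryBatch(0, Arrays.asList("x"), false);
        System.out.println(memo.get(0)); // [x]
    }
}
```

Under the REPLACE policy, record batches read after a redefinition would have to be decoded against the new dictionary, which is where a random-access file format becomes harder to support than a stream.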