Responding to this comment from GitHub[1]:
If we had to make a bet about what % of dictionaries empirically are
between 128 and 255 elements, I would bet that the percentage is
small. If it turned out that 40% of dictionaries fell in that range
then I would agree that this makes sense.
I agree
Excellent, thank you:
https://github.com/grafana/grafana/pull/25871/files#diff-ad1ca51923ba5a2e01652c1686ae6797R169
On Fri, Jun 26, 2020 at 7:54 AM Brian Hulette wrote:
> Hi Ryan,
> Here or user@arrow.apache.org is a fine place to ask :)
>
> The metadata on Table/Column/Field objects are all immutable
The vote carries with 5 binding +1 votes and 1 non-binding +1. I will
merge the change and open some JIRAs for the reference
implementations to add forward-compatibility checks that the bit width
they receive is either null or 128.
On Thu, Jun 25, 2020 at 4:02 PM Sutou Kouhei wrote:
>
> +1 (binding)
I think that situations where you need uint64 indices are likely to be
exceedingly esoteric. I would recommend that the specification advise
against use of 64-bit indices at all unless they are actually needed
to represent the data (i.e. dictionaries with more than INT32_MAX /
UINT32_MAX elements).
If positive integers are expected, I'm in favor of supporting unsigned
index types. I was surprised by the Arrow C++ restriction to signed indices
in the RAPIDS thread; perhaps it's newer than when I ported the logic to JS.
Based on the flatbuffer schemas, dictionary indices could technically be
a
On Fri, 26 Jun 2020 13:56:26 -0400
Radu Teodorescu wrote:
> Looks like Concatenate is my best bet if I am looking at putting together
> ranges, certainly doesn’t look as neatly packaged as Take, but this might be
> the right tool for this job.
Yes, you could Slice the array and then Concatenate the slices.
I think in the interest of not having the spec fork we should probably do
this. It is partially our fault for not providing better documentation in
Schema.fbs (and potentially more thorough integration tests).
Maybe we should explicitly disallow uint64, which provides the biggest
headache for the
I agree I think we have to do this given the number of changes in flight
(especially union types).
On Fri, Jun 26, 2020 at 7:29 AM Wes McKinney wrote:
> I created a JIRA about this
>
> https://issues.apache.org/jira/browse/ARROW-9231
>
> This issue is quite important so please take a look.
>
> O
You can also use the `Field.prototype.clone()` method[1] like this to
further reduce the boilerplate:
function renameColumn(col, new_name) {
  return Column.new(col.field.clone(new_name), col.chunks);
}
1. https://github.com/apache/arrow/blob/master/js/src/schema.ts#L139-L146
On 6/26/20 7:54
Looks like Concatenate is my best bet if I am looking at putting together
ranges, certainly doesn’t look as neatly packaged as Take, but this might be
the right tool for this job.
> On Jun 26, 2020, at 1:01 PM, Radu Teodorescu
> wrote:
>
> That is fabulous and pretty much it!
> Follow-up questions:
That is fabulous and pretty much it!
Follow-up questions:
1. Is there any efficient way to refer to ranges: say I want to take rows
1000-2000 and 4000-5000; it feels unwieldy to have to create an index array of
2000 elements, and then also the underlying implementation would be less
efficient having
This sounds like the Take kernel?
On Friday, June 26, 2020, Radu Teodorescu
wrote:
> (Lightweight topic this time)
> Are there any existing functions for deep copying Array, ArrayData or Table
> objects in the C++ API?
> Ultimately, I am trying to get a bunch of sparse row ranges
>
(Lightweight topic this time)
Are there any existing functions for deep copying Array, ArrayData or Table
objects in the C++ API?
Ultimately, I am trying to get a bunch of sparse row ranges into
a contiguous new Table - I can see how I can copy Buffer and I can implement it
all myself
Also, let me clarify so there is no confusion - there should be no problem
creating static / read-only Arrow data files with a 'date to batch' index
in the manner I described. The problem I am referring to only becomes an
issue if you need to append a new batch on a daily basis.
-Anthony
On Fri
+1 to this.
There is a logical way to do this now - if you create a batch per day you
can maintain a separate Arrow file (an index) to map each date to its batch. We
do this for indexing via other keys, and I can say it works well for
'large' files - 25 GB+. I think, unfortunately, doing this via the c
+1
Is the dataset the model for that?
On Fri, Jun 26, 2020 at 11:42 AM Kirill Lykov
wrote:
> Hi,
>
> I wonder what the best way is to represent time series in Arrow.
> Maybe someone has already researched different ways of
> representing these data? Or is there a ready-to-use solution
Hi,
I wonder what the best way is to represent time series in Arrow.
Maybe someone has already researched different ways of
representing these data? Or is there a ready-to-use solution inside
the library. Basically, I need a third dimension to the table, which is
time. One of the solutions
Hi Ryan,
Here or user@arrow.apache.org is a fine place to ask :)
The metadata on Table/Column/Field objects are all immutable, so doing this
right now would require creating a new instance of Table with the field
renamed, which takes quite a lot of boilerplate. A helper for renaming a
column (or ev
I created a JIRA about this
https://issues.apache.org/jira/browse/ARROW-9231
This issue is quite important so please take a look.
On Thu, Jun 25, 2020 at 8:53 AM Wes McKinney wrote:
>
> On Thu, Jun 25, 2020 at 5:31 AM Antoine Pitrou wrote:
> >
> >
> > On 25/06/2020 at 12:18, Antoine Pitrou wrote:
Yes, it would be quite feasible to preallocate a region large enough for
several thousand rows for each column, assuming I read from that region while
it's still filling in. When that region is full, I could either allocate a new
big chunk or loop around if I no longer need the data. I'm now doi
Hi folks,
At the moment, using unsigned integers for dictionary indices/codes
isn't exactly forbidden by the metadata [1], which says that the
indices must be "positive integers". Meanwhile, the columnar format
specification says
"When a field is dictionary encoded, the values are represented by
Arrow Build Report for Job nightly-2020-06-26-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-26-0
Failed Tasks:
- centos-7-aarch64:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-26-0-travis-centos-7-aarch64
- debian-bust