Hi Micah,
We have run into some of these issues on Impala in various guises,
including hash tables and min/max stats in Parquet. Treating +0/-0 as
indistinguishable for purposes of equality and grouping makes the most
sense and avoids most pitfalls.
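To illustrate the +0/-0 point, here is a minimal Python sketch (not Impala's or Arrow's actual code). It assumes a hash or grouping key built from the raw IEEE 754 bytes of a double; the helper name canonical_bits is hypothetical. The two zeros compare equal but have different bit patterns, so the key collapses -0.0 to +0.0 before encoding.

```python
import struct


def canonical_bits(x: float) -> bytes:
    """Return the little-endian IEEE 754 bytes of x, with -0.0 collapsed to +0.0.

    Hypothetical helper for illustration only.
    """
    if x == 0.0:  # true for both +0.0 and -0.0
        x = 0.0   # replace with positive zero
    return struct.pack("<d", x)


# The raw encodings differ even though the two values compare equal.
raw_differs = struct.pack("<d", 0.0) != struct.pack("<d", -0.0)

# Group values by their canonical bytes: both zeros land in one group.
groups = {}
for v in [0.0, -0.0, 1.5, -0.0]:
    groups.setdefault(canonical_bits(v), []).append(v)
```

NaN is deliberately left out of this sketch; it would need a separate policy.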
NaN is messier. I don't think there's necessar
hi Peng,
Here is a minimal reproduction of the issue you're having:
In [38]: arr = np.empty(2, dtype=object)
In [39]: arr[0] = np.array([1, 2])
In [40]: arr[1] = np.array([2, 3])
In [41]: arr2 = np.empty(2, dtype=object)
In [42]: arr2[0] = arr
In [43]: arr2[1] = arr
In [45]: pa.array(arr2)
Hi Java Arrow-Developers,
I've been looking more into the Java code base and I was wondering if
people think any of the following might be worthwhile (or are strictly
against them). My Java infrastructure knowledge is a little stale, so if a
suggestion I make is absolutely ridiculous, I apologize.
Andy Grove created ARROW-4681:
-
Summary: [Rust] [DataFusion] Implement parallel query execution
using threads
Key: ARROW-4681
URL: https://issues.apache.org/jira/browse/ARROW-4681
Project: Apache Arrow
I apologize for being a little late in chiming in on Flight, but I have some
questions/comments that a quick search of the mailing list didn't answer,
and I didn't see them discussed on the initial pull request [1]:
1. What is meant by "sidecar patterns" [2] on the data buffer bytes?
2. Was usi
Neville Dipale created ARROW-4680:
-
Summary: [CI] [Rust] Travis CI builds fail with latest Rust
1.34.0-nightly (2019-02-25)
Key: ARROW-4680
URL: https://issues.apache.org/jira/browse/ARROW-4680
Projec
Implementing compute kernels that depend on hashing has raised a couple of
edge cases that are worth discussing. The following points need to be
resolved (I opened a JIRA [1] to track the fixes):
1. How to handle -0.0 and 0.0?
- Option 1: Collapse to a single value
The issue I'm blocked on is getting Boost installed properly. I've
included all of the steps I've run below. If anyone has some thoughts or
the magical script to build and install the Boost libraries
appropriate for the Static_Crt_Build, I would greatly appreciate it.
With a Windows 10
Andy Grove created ARROW-4679:
-
Summary: [Rust] [DataFusion] Implement in-memory DataSource
Key: ARROW-4679
URL: https://issues.apache.org/jira/browse/ARROW-4679
Project: Apache Arrow
Issue Type:
Steven Fackler created ARROW-4678:
-
Summary: [Rust] Minimize unstable feature usage
Key: ARROW-4678
URL: https://issues.apache.org/jira/browse/ARROW-4678
Project: Apache Arrow
Issue Type: Imp
Gabe Joseph created ARROW-4677:
--
Summary: [Python] serialization does not consider ndarray
endianness
Key: ARROW-4677
URL: https://issues.apache.org/jira/browse/ARROW-4677
Project: Apache Arrow
On Mon, Feb 25, 2019 at 5:36 PM Antoine Pitrou wrote:
>
> Does it also roundtrip through e.g. Pandas conversion?
No. Any Arrow metadata is lost when you call to_pandas() (because
pandas objects don't have the ability to preserve any column-level
metadata, only the physical data type). The metadat
Le 26/02/2019 à 00:32, Wes McKinney a écrit :
> hi folks,
>
> I recently wrote a patch to propose a C++ API for user-defined "extension"
> types
>
> https://github.com/apache/arrow/pull/3694
>
> The idea is that an extension type wraps a pre-existing Arrow type.
> For example a UUIDType can b
hi folks,
I recently wrote a patch to propose a C++ API for user-defined "extension" types
https://github.com/apache/arrow/pull/3694
The idea is that an extension type wraps a pre-existing Arrow type.
For example a UUIDType can be represented as FixedSizeBinary(16). The
intent is that Arrow cons
Hi,
Currently we have nightly package builds under my
GitHub account, which is not really visible. It would be great to
make them available for developer purposes, and additionally
it would test the binary scripts too.
The nightly packages are produced the same way as is
documented in
Kouhei Sutou created ARROW-4676:
---
Summary: [C++] Add support for debug build with MinGW
Key: ARROW-4676
URL: https://issues.apache.org/jira/browse/ARROW-4676
Project: Apache Arrow
Issue Type: I
Gabe Joseph created ARROW-4675:
--
Summary: [Python] Error serializing bool ndarray in py2 and
deserializing in py3
Key: ARROW-4675
URL: https://issues.apache.org/jira/browse/ARROW-4675
Project: Apache Arr
Paul Taylor created ARROW-4674:
--
Summary: [JS] Update arrow2csv to new Row API
Key: ARROW-4674
URL: https://issues.apache.org/jira/browse/ARROW-4674
Project: Apache Arrow
Issue Type: Bug
Hi all,
I’d like to discuss the versioning of the parquet shared libs that are built
when you use -DARROW_PARQUET=ON. My observation is that back when parquet-cpp
was a separate project the shared libs were versioned using the parquet-cpp
version number (e.g. 1.4.0). Since moving to a single r
It might be nice to do this as a Gmock matcher instead of a separate macro
On Monday, February 25, 2019, Francois Saint-Jacques (JIRA)
wrote:
> Francois Saint-Jacques created ARROW-4673:
> -
>
> Summary: [C++] Implement AssertDatumEquals
>
Francois Saint-Jacques created ARROW-4673:
-
Summary: [C++] Implement AssertDatumEquals
Key: ARROW-4673
URL: https://issues.apache.org/jira/browse/ARROW-4673
Project: Apache Arrow
Issu
Thanks for the quick response. I'll update the discussion if there is progress.
On Mon, Feb 25, 2019 at 6:01 PM Wes McKinney wrote:
>
> hi Igor,
>
> We have Map as a top-level logical data type in the columnar metadata:
>
> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L55
>
> There is
hi Joel and Uwe,
yes, feedback from the Iceberg community would be useful about what
kinds of APIs are required to be able to interact well with table
formats like Iceberg. As Uwe says, the objective of the C++ code I am
proposing to develop is to have appropriate C++ APIs for interacting
with dif
> On Feb 25, 2019, at 8:02 PM, Ihor Huzenko wrote:
>
> Hello Arrow Team,
>
> My name is Igor Guzenko. I'm currently working on a task related to
> complex types in Apache Drill [1], and bumped into an issue: Drill
> doesn't have an appropriate vector for representing the canonical
> (Java-like) Map data
hi Igor,
We have Map as a top-level logical data type in the columnar metadata:
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L55
There isn't anything more than this right now. We have not yet implemented
container types for the Map type in Java or C++, but I don't view
it to be
Hello Arrow Team,
My name is Igor Guzenko. I'm currently working on a task related to
complex types in Apache Drill [1], and bumped into an issue: Drill
doesn't have an appropriate vector for representing the canonical
(Java-like) Map datatype [2]. So I'm looking for inspiration on how the
efficient
columnar ma
With +6 (+4 binding) the vote passes. I will upload the artifacts soon.
On Mon, Feb 25, 2019, at 11:28 AM, Hatem Helal wrote:
> +1 (non-binding)
>
> Built on macOS 10.13 and ran unittests.
>
>
> On 2/24/19, 1:43 PM, "Wes McKinney" wrote:
>
> +1 (binding)
>
> Verified release can
+1 (non-binding)
Built on macOS 10.13 and ran unittests.
On 2/24/19, 1:43 PM, "Wes McKinney" wrote:
+1 (binding)
Verified release candidate with Windows 10 MSVC 2015
On Fri, Feb 22, 2019 at 4:14 PM Kouhei Sutou wrote:
>
> +1 (binding)
>
> I ran the follo
Hello,
this should definitely be shared with the Apache Iceberg community (cc'ed). The
title of the document may be a bit confusing. What is proposed in there is
actually constructing the building blocks in C++ that are required for
supporting Python/C++/.. implementations for things like Icebe
Hello,
Thanks for the write-up.
Have you considered sharing this document with the Apache Iceberg community?
My feeling is that there are some shared goals here between the two
projects.
And while their implementation is in Java, their spec is language agnostic.
Regards, Joel
On Sun, Feb 24,