date:20190429

Re: Use arrow as a general data serialization framework in distributed stream data processing

2019-04-29 Thread Shawn Yang

Hi Micah, Thank you for the information about the in-memory row-oriented standard. After days of work, I find that it is exactly what we need now. I looked into the discussion you mentioned. It seems no one has taken up the work. Is there anything I can do to speed up us having in-memory row-oriented st

Re: How about inet4/inet6/macaddr data types?

2019-04-29 Thread Kohei KaiGai

Hello, It is a proposition to add new logical types to the Apache Arrow data format. As Melik-Adamyan said, it is quite easy to convert a 5-byte FixedSizeBinary to PostgreSQL's inet data type with the Arrow_Fdw module (a PostgreSQL extension responsible for data conversion), however, it is not

Re: How about inet4/inet6/macaddr data types?

2019-04-29 Thread Micah Kornfield

Hi KaiGai Kohei, Can you clarify if you are looking for advice on modelling these types or proposing to add new logical types to the Arrow specification? Thanks, Micah On Monday, April 29, 2019, Kohei KaiGai wrote: > Hello folks, > > How about your opinions about network address types support i

RE: How about inet4/inet6/macaddr data types?

2019-04-29 Thread Melik-Adamyan, Areg

If you want to store and manipulate it, the best format is integers (or binary): it allows all the fast operations of masking, subnet querying, etc., whereas a text representation requires conversion. It highly depends on the use case, but conversion to pgSQL's inet or cidr from integer is ver
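To illustrate the point about integer storage, here is a minimal Python sketch (standard library only, not Arrow or PostgreSQL code; the helper names are hypothetical). Once an IPv4 address is held as an unsigned 32-bit integer, a subnet query reduces to one bitwise AND with the netmask and a comparison, with no text parsing per row.

```python
import ipaddress

def ipv4_to_int(addr: str) -> int:
    # Parse once, then keep the address as an unsigned 32-bit integer.
    return int(ipaddress.IPv4Address(addr))

def in_subnet(addr_int: int, network: str) -> bool:
    # Subnet membership: mask off the host bits and compare with the network address.
    net = ipaddress.IPv4Network(network)
    return (addr_int & int(net.netmask)) == int(net.network_address)

addrs = [ipv4_to_int(a) for a in ["192.168.1.10", "10.0.0.5", "192.168.2.1"]]
matches = [a for a in addrs if in_subnet(a, "192.168.1.0/24")]
print([str(ipaddress.IPv4Address(a)) for a in matches])  # ['192.168.1.10']
```

The same masking trick works for IPv6 with 128-bit integers, which is one reason a fixed-width binary or integer layout is attractive for these types.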

[jira] [Created] (ARROW-5242) Arrow doesn't compile cleanly with Visual Studio 2017 Update 9 or later due to narrowing

2019-04-29 Thread Billy Robert O'Neal III (JIRA)

Billy Robert O'Neal III created ARROW-5242: -- Summary: Arrow doesn't compile cleanly with Visual Studio 2017 Update 9 or later due to narrowing Key: ARROW-5242 URL: https://issues.apache.org/jira/browse/AR

How about inet4/inet6/macaddr data types?

2019-04-29 Thread Kohei KaiGai

Hello folks, What are your opinions about network address type support in the Apache Arrow data format? Network addresses appear in the network logs massively generated by all kinds of network facilities, and they are significant information when people analyze their historical logs. I'm working on Apache Ar

Re: [Contribution][Proposal] Use Contributors file and Signed-Off-By Process for Arrow

2019-04-29 Thread Wes McKinney

AFAIK no one has been employing systematic IP scanning tools; generally when there is code reuse in a pull request it is fairly obvious. It would be interesting to know how large, mature Apache projects (Apache Hadoop, Apache Spark, etc.) have approached this problem. On Mon, Apr 29, 2019 at 5:13

RE: [Contribution][Proposal] Use Contributors file and Signed-Off-By Process for Arrow

2019-04-29 Thread Melik-Adamyan, Areg

Hi Wes, thanks for the reply. How do the committers and PMC check the IP currently? Is there any standard tool for it that you use? > -Original Message- > From: Wes McKinney [mailto:wesmck...@gmail.com] > Sent: Monday, April 29, 2019 4:39 PM > To: dev@arrow.apache.org > Subject: Re: [Cont

[jira] [Created] (ARROW-5241) [Python] Add option to disable writing statistics

2019-04-29 Thread Deepak Majeti (JIRA)

Deepak Majeti created ARROW-5241: Summary: [Python] Add option to disable writing statistics Key: ARROW-5241 URL: https://issues.apache.org/jira/browse/ARROW-5241 Project: Apache Arrow Issue

Re: [Contribution][Proposal] Use Contributors file and Signed-Off-By Process for Arrow

2019-04-29 Thread Wes McKinney

hi Areg, I think this is a question for ASF Legal and not Apache Arrow directly. Some contributors submit an ICLA or CCLA to the project, but broadly it is the responsibility of the Committers and PMC members to steward IP in the project, and one of the parts of the release process is to verify tha

[jira] [Created] (ARROW-5240) [C++][CI] cmake_format 0.5.0 appears to fail the build

2019-04-29 Thread Micah Kornfield (JIRA)

Micah Kornfield created ARROW-5240: -- Summary: [C++][CI] cmake_format 0.5.0 appears to fail the build Key: ARROW-5240 URL: https://issues.apache.org/jira/browse/ARROW-5240 Project: Apache Arrow

[Contribution][Proposal] Use Contributors file and Signed-Off-By Process for Arrow

2019-04-29 Thread Melik-Adamyan, Areg

To avoid contaminating the Arrow code with wrongly licensed code (including GPL code) that can be accidentally included into Arrow, and to track contributions, maintainers need to check whether the committer has signed the ICLA or CCLA and is listed in the contributors file - which we do n

Re: [DISCUSS][C++] Static versus variable Arrow dictionary encoding

2019-04-29 Thread Wes McKinney

On Mon, Apr 29, 2019 at 2:59 PM Micah Kornfield wrote: > > > > > > * The _actual_ dictionary values for a particular Array must be stored > > > somewhere and lifetime managed. I propose to put these as a single > > > entry in ArrayData::child_data [4]. An alternative to this would be to > > > modi

Re: [DISCUSS][C++] Static versus variable Arrow dictionary encoding

2019-04-29 Thread Micah Kornfield

> > > * The _actual_ dictionary values for a particular Array must be stored > > somewhere and lifetime managed. I propose to put these as a single > > entry in ArrayData::child_data [4]. An alternative to this would be to > > modify ArrayData to have a dictionary field that would be unused > > exc

Re: [DISCUSS][C++] Static versus variable Arrow dictionary encoding

2019-04-29 Thread Antoine Pitrou

Hi Wes, On 29/04/2019 at 20:10, Wes McKinney wrote: > > * Receiving a record batch schema without the dictionaries attached > (e.g. in Arrow Flight), see also experimental patch [2] Note that this was finally done in a separate PR, and only required changes in the IPC implementation. > Here

[jira] [Created] (ARROW-5239) Add support for interval types in javascript

2019-04-29 Thread Micah Kornfield (JIRA)

Micah Kornfield created ARROW-5239: -- Summary: Add support for interval types in javascript Key: ARROW-5239 URL: https://issues.apache.org/jira/browse/ARROW-5239 Project: Apache Arrow Issue T

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-04-29 Thread Wes McKinney

I'm also curious which APIs are particularly problematic for performance. In ARROW-1833 [1] and some related discussions there was the suggestion of adding methods like getUnsafe, so this would be like get(i) [2] but without checking the validity bitmap [1] : https://issues.apache.org/jira/browse/
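To make the get(i) versus getUnsafe distinction concrete, here is a minimal Python sketch. It is not the Arrow Java API; the class and method names are hypothetical. Values are stored next to a validity bitmap laid out as in the Arrow columnar format (one bit per slot, least-significant-bit order), and a checked accessor is contrasted with one that skips the bitmap lookup.

```python
class NullableIntVector:
    """Hypothetical stand-in for a nullable fixed-width vector."""

    def __init__(self, values):
        # Null slots still occupy space in the value buffer (filled with 0 here).
        self.values = [0 if v is None else v for v in values]
        # Validity bitmap: bit (i % 8) of byte (i // 8) is 1 when slot i is non-null.
        self.validity = bytearray((len(values) + 7) // 8)
        for i, v in enumerate(values):
            if v is not None:
                self.validity[i // 8] |= 1 << (i % 8)

    def is_null(self, i):
        return ((self.validity[i // 8] >> (i % 8)) & 1) == 0

    def get(self, i):
        # Checked accessor: consult the validity bitmap before reading the value.
        if self.is_null(i):
            return None
        return self.values[i]

    def get_unsafe(self, i):
        # Unchecked accessor: skips the bitmap; the caller must know slot i is valid,
        # otherwise it silently returns the placeholder stored for a null.
        return self.values[i]


v = NullableIntVector([7, None, 9])
print(v.get(1), v.get_unsafe(1))  # None 0
```

The saving comes from dropping one bitmap load and branch per element, which matters mostly in tight loops where nulls are already known to be absent.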

[DISCUSS][C++] Static versus variable Arrow dictionary encoding

2019-04-29 Thread Wes McKinney

hi all, There have been many discussions in passing on various issues and JIRA tickets over the last months and years about how to manage dictionary-encoded columnar arrays in-memory in C++. Here's a list of some problems we have encountered: * Dictionaries that may differ from one record batch t
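The first problem listed, dictionaries that may differ from one record batch to the next, can be illustrated with a small Python sketch. This is plain Python, not Arrow's C++ API, and the function names are hypothetical. Each batch is encoded as (indices, dictionary); to combine batches, the dictionaries are merged into one and each batch's indices are remapped ("transposed") into the unified dictionary.

```python
def dictionary_encode(values):
    """Encode a list of values as (indices, dictionary)."""
    dictionary, positions, indices = [], {}, []
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        indices.append(positions[v])
    return indices, dictionary


def unify(dictionaries):
    """Merge dictionaries; return the unified dictionary and, per input,
    a transpose map from old index to new index."""
    unified, positions, transposes = [], {}, []
    for d in dictionaries:
        transpose = []
        for v in d:
            if v not in positions:
                positions[v] = len(unified)
                unified.append(v)
            transpose.append(positions[v])
        transposes.append(transpose)
    return unified, transposes


batch1 = dictionary_encode(["a", "b", "a"])  # ([0, 1, 0], ['a', 'b'])
batch2 = dictionary_encode(["c", "a"])       # ([0, 1], ['c', 'a'])
unified, (t1, t2) = unify([batch1[1], batch2[1]])
new1 = [t1[i] for i in batch1[0]]
new2 = [t2[i] for i in batch2[0]]
print(unified, new1, new2)  # ['a', 'b', 'c'] [0, 1, 0] [2, 0]
```

Note that index 0 means "a" in the first batch but "c" in the second, which is why arrays with different dictionaries cannot simply be concatenated.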

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-04-29 Thread Micah Kornfield

Thanks for the design. Personally, I'm not a huge fan of creating parallel classes for every vector type; this ends up being confusing for developers and adds a lot of boilerplate. I wonder if you could use an approach similar to the one the memory module uses for turning bounds checking on/off [1].
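A minimal Python sketch of the alternative suggested here: a single vector class whose bounds check is governed by one global switch, rather than a parallel "unsafe" class per vector type. The environment variable name and the class are hypothetical; this is not how the Arrow memory module is actually configured. Python lists still check bounds internally, so this only shows the shape of the design. In Java, a `static final` boolean flag is typically treated as a constant by the JIT, so the disabled branch can be eliminated.

```python
import os

# Hypothetical switch, read once at import time.
BOUNDS_CHECKING_ENABLED = os.environ.get("EXAMPLE_DISABLE_BOUNDS_CHECKS", "0") != "1"


class IntVector:
    def __init__(self, values):
        self._values = list(values)

    def get(self, i):
        # One accessor for all callers; the explicit check is skipped
        # when the global switch is turned off.
        if BOUNDS_CHECKING_ENABLED and not 0 <= i < len(self._values):
            raise IndexError(f"index {i} out of range for vector of length {len(self._values)}")
        return self._values[i]


v = IntVector([10, 20, 30])
print(v.get(2))  # 30
```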

[jira] [Created] (ARROW-5238) [Python] Improve usability of pyarrow.dictionary function

2019-04-29 Thread Wes McKinney (JIRA)

Wes McKinney created ARROW-5238: --- Summary: [Python] Improve usability of pyarrow.dictionary function Key: ARROW-5238 URL: https://issues.apache.org/jira/browse/ARROW-5238 Project: Apache Arrow

Re: [DISCUSS] C++ Filesystem abstraction

2019-04-29 Thread Wes McKinney

hi Antoine, Thank you for starting this discussion. I left some comments on the PR. I had been looking previously at TensorFlow's file system APIs ([1], and various implementations) for some possible guidance around this, though since Arrow is intended as a development platform / reusable set of li

[DISCUSS] C++ Filesystem abstraction

2019-04-29 Thread Antoine Pitrou

Hello, For the datasets project (*), one requirement is for Arrow to grow a filesystem abstraction. The aim is to access various kinds of storage systems (local filesystem, S3, HadoopFS...) with a single API. Hopefully, the API can be made good enough to avoid inefficiencies. I've pushed a dra
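The idea of accessing several storage systems through a single API can be sketched as an abstract base class with interchangeable backends. This is a minimal Python illustration under stated assumptions: the class and method names are hypothetical and do not reflect the draft C++ API in the PR, and the in-memory backend merely stands in for an object store such as S3.

```python
import abc
import os
import tempfile


class FileSystem(abc.ABC):
    """Hypothetical minimal interface; method names are illustrative only."""

    @abc.abstractmethod
    def read_bytes(self, path: str) -> bytes: ...

    @abc.abstractmethod
    def write_bytes(self, path: str, data: bytes) -> None: ...

    @abc.abstractmethod
    def list_dir(self, path: str) -> list: ...


class LocalFileSystem(FileSystem):
    def read_bytes(self, path):
        with open(path, "rb") as f:
            return f.read()

    def write_bytes(self, path, data):
        with open(path, "wb") as f:
            f.write(data)

    def list_dir(self, path):
        return sorted(os.listdir(path))


class InMemoryFileSystem(FileSystem):
    """Stand-in for an object store: a flat map from key to bytes."""

    def __init__(self):
        self._objects = {}

    def read_bytes(self, path):
        return self._objects[path]

    def write_bytes(self, path, data):
        self._objects[path] = bytes(data)

    def list_dir(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(k[len(prefix):] for k in self._objects if k.startswith(prefix))


def copy_file(src_fs, src, dst_fs, dst):
    # Caller code depends only on the abstract interface, not on a backend.
    dst_fs.write_bytes(dst, src_fs.read_bytes(src))


with tempfile.TemporaryDirectory() as d:
    local = LocalFileSystem()
    path = os.path.join(d, "x.bin")
    local.write_bytes(path, b"hello")
    mem = InMemoryFileSystem()
    copy_file(local, path, mem, "bucket/x.bin")
    print(mem.list_dir("bucket"))  # ['x.bin']
```

A real abstraction would also need streaming reads and writes, file metadata, and directory semantics that differ between hierarchical filesystems and flat object stores, which is where most of the design effort lies.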

[jira] [Created] (ARROW-5237) [Python] pandas_version key in pandas metadata no longer populated

2019-04-29 Thread Joris Van den Bossche (JIRA)

Joris Van den Bossche created ARROW-5237: Summary: [Python] pandas_version key in pandas metadata no longer populated Key: ARROW-5237 URL: https://issues.apache.org/jira/browse/ARROW-5237 Proj

[jira] [Created] (ARROW-5236) hdfs.connect() is trying to load libjvm in windows

2019-04-29 Thread Kamaraju (JIRA)

Kamaraju created ARROW-5236: --- Summary: hdfs.connect() is trying to load libjvm in windows Key: ARROW-5236 URL: https://issues.apache.org/jira/browse/ARROW-5236 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-5235) [C++] RAPIDJSON_INCLUDE_DIR not set (Windows + Anaconda)

2019-04-29 Thread Antoine Pitrou (JIRA)

Antoine Pitrou created ARROW-5235: - Summary: [C++] RAPIDJSON_INCLUDE_DIR not set (Windows + Anaconda) Key: ARROW-5235 URL: https://issues.apache.org/jira/browse/ARROW-5235 Project: Apache Arrow

[jira] [Created] (ARROW-5234) [Rust] [DataFusion] Create Python bindings for DataFusion

2019-04-29 Thread Andy Grove (JIRA)

Andy Grove created ARROW-5234: - Summary: [Rust] [DataFusion] Create Python bindings for DataFusion Key: ARROW-5234 URL: https://issues.apache.org/jira/browse/ARROW-5234 Project: Apache Arrow Issu

[jira] [Created] (ARROW-5233) [Go] migrate to new flatbuffers-v0.11.0

2019-04-29 Thread Sebastien Binet (JIRA)

Sebastien Binet created ARROW-5233: -- Summary: [Go] migrate to new flatbuffers-v0.11.0 Key: ARROW-5233 URL: https://issues.apache.org/jira/browse/ARROW-5233 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-5232) [Java] value vector size increases rapidly in case of clear/setSafe loop

2019-04-29 Thread Pindikura Ravindra (JIRA)

Pindikura Ravindra created ARROW-5232: - Summary: [Java] value vector size increases rapidly in case of clear/setSafe loop Key: ARROW-5232 URL: https://issues.apache.org/jira/browse/ARROW-5232 Proj
