Re: Trying to build to build pyarrow for python 2.7

2018-01-10 Thread simba nyatsanga
Hi Wes, Thanks for the response. I was following the development instructions on Github here: https://github.com/apache/arrow/blob/master/python/doc/source/development.rst I took the MacOS option and installed my virtual env via conda. I must've missed an instruction when trying the 2.7 install, beca

[jira] [Created] (ARROW-1987) [Website] Enable Docker-based documentation generator to build at a specific Arrow commit

2018-01-10 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1987: Summary: [Website] Enable Docker-based documentation generator to build at a specific Arrow commit Key: ARROW-1987 URL: https://issues.apache.org/jira/browse/ARROW-1987

Re: Trying to figure out how to use multiprocessing with HdfsClient

2018-01-10 Thread Wes McKinney
The HadoopFileSystem appears not to be safe for multiprocessing. I'm opening a bug report here: https://issues.apache.org/jira/browse/ARROW-1986 In general, when things don't work as you expect and it seems like it may be a bug, please open a JIRA. Thanks, Wes On Wed, Jan 10, 2018 at 3:42 AM, Eli
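A common workaround when a client object cannot be pickled is to send each worker only plain, picklable arguments and open the connection inside the worker. The sketch below illustrates that pattern only. `FakeHdfsClient`, `read_column_task`, and `is_picklable` are hypothetical names invented for this example; they are not pyarrow APIs, and the lock merely stands in for a native connection handle.

```python
import pickle
import threading


class FakeHdfsClient:
    """Hypothetical stand-in for a client that wraps a native handle."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        # A lock stands in for the native connection handle; like such a
        # handle, it cannot be pickled and sent to another process.
        self._handle = threading.Lock()

    def read_column(self, path, column):
        # Placeholder for real I/O: just describe what would be read.
        return "{}:{}@{}:{}".format(path, column, self.host, self.port)


def read_column_task(task):
    """Worker entry point that receives only picklable arguments."""
    host, port, path, column = task
    # Connect inside the worker instead of receiving a client object.
    client = FakeHdfsClient(host, port)
    return client.read_column(path, column)


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
```

With this shape, a list of `(host, port, path, column)` tuples can be passed to `multiprocessing.Pool.map` together with `read_column_task`, ideally under an `if __name__ == "__main__":` guard, so that no client object ever crosses a process boundary.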

[jira] [Created] (ARROW-1986) [Python] HadoopFileSystem is not picklable and cannot currently be used with multiprocessing

2018-01-10 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1986: Summary: [Python] HadoopFileSystem is not picklable and cannot currently be used with multiprocessing Key: ARROW-1986 URL: https://issues.apache.org/jira/browse/ARROW-1986

Re: Trying to build to build pyarrow for python 2.7

2018-01-10 Thread Wes McKinney
hi Simba, Are you following development instructions in http://arrow.apache.org/docs/python/development.html#developing-on-linux-and-macos or something else? - Wes On Wed, Jan 10, 2018 at 11:20 AM, simba nyatsanga wrote: > Hi, > > I've created a python 2.7 virtualenv in my attempt to build the

[jira] [Created] (ARROW-1985) [JS] Schema custom_metadata is not exposed

2018-01-10 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-1985: Summary: [JS] Schema custom_metadata is not exposed Key: ARROW-1985 URL: https://issues.apache.org/jira/browse/ARROW-1985 Project: Apache Arrow Issue Type: B

[jira] [Created] (ARROW-1984) NullableDateMilliVector.getObject() should return a LocalDate, not a LocalDateTime

2018-01-10 Thread Vanco Buca (JIRA)
Vanco Buca created ARROW-1984: Summary: NullableDateMilliVector.getObject() should return a LocalDate, not a LocalDateTime Key: ARROW-1984 URL: https://issues.apache.org/jira/browse/ARROW-1984 Project: A

[jira] [Created] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file

2018-01-10 Thread Jim Crist (JIRA)
Jim Crist created ARROW-1983: Summary: [Python] Add ability to write parquet `_metadata` file Key: ARROW-1983 URL: https://issues.apache.org/jira/browse/ARROW-1983 Project: Apache Arrow Issue Typ

[jira] [Created] (ARROW-1982) [Python] Return parquet statistics min/max as values instead of strings

2018-01-10 Thread Jim Crist (JIRA)
Jim Crist created ARROW-1982: Summary: [Python] Return parquet statistics min/max as values instead of strings Key: ARROW-1982 URL: https://issues.apache.org/jira/browse/ARROW-1982 Project: Apache Arrow

Re: How to get "standard" binary columns out of a pyarrow table

2018-01-10 Thread Wes McKinney
hi Eli, I am not aware of any standards for binary columns (or at least, I don't know what "regular" means in this context) -- part of the purpose of the Apache Arrow project is to define a columnar standard in the absence of any existing one. Most database systems define their own custom wire pro

Re: Next Arrow sync: 10 January at 12:00 US/Eastern

2018-01-10 Thread Li Jin
Awesome, I am in now. Thanks! On Wed, Jan 10, 2018 at 12:00 PM, Li Jin wrote: > Hey, > > Is this the correct link? https://meet.google.com/vtm-teks-phx > > I am unable to join (Waiting on someone to let me in) > > On Fri, Dec 29, 2017 at 2:56 PM, Wes McKinney wrote: > >> hi folks, >> >> We did

Re: Next Arrow sync: 10 January at 12:00 US/Eastern

2018-01-10 Thread Li Jin
Hey, Is this the correct link? https://meet.google.com/vtm-teks-phx I am unable to join (Waiting on someone to let me in) On Fri, Dec 29, 2017 at 2:56 PM, Wes McKinney wrote: > hi folks, > > We did not have a sync call this past Wednesday. The next Arrow > hangout will be on Wednesday, Januar

Trying to build to build pyarrow for python 2.7

2018-01-10 Thread simba nyatsanga
Hi, I've created a python 2.7 virtualenv in my attempt to build the pyarrow project. But I'm having trouble running one of the commands as specified in the development docs on Github, specifically this command: cd arrow/python python setup.py build_ext --build-type=$ARROW_BUILD_TYPE \ --with-p

[jira] [Created] (ARROW-1981) UnicodeEncodeError in column name

2018-01-10 Thread Shubham Chaudhary (JIRA)
Shubham Chaudhary created ARROW-1981: Summary: UnicodeEncodeError in column name Key: ARROW-1981 URL: https://issues.apache.org/jira/browse/ARROW-1981 Project: Apache Arrow Issue Type: Bu

Trying to figure out how to use multiprocessing with HdfsClient

2018-01-10 Thread Eli
Hi, I'm trying to parallelize reading columns from a parquet file, to serialize it back to standard format in the quickest way. I'm currently using pool.map(), such that a reader/binarizer toplevel accepts the filepath, column to be read and some other data as a tuple. It looks something like