Wes McKinney created ARROW-2628:
---
Summary: [Python] parquet.write_to_dataset is memory-hungry on
large DataFrames
Key: ARROW-2628
URL: https://issues.apache.org/jira/browse/ARROW-2628
Project: Apache Arrow
Wes McKinney created ARROW-2627:
---
Summary: [Python] Add option (or some equivalent) to toggle memory
mapping functionality when using parquet.ParquetFile or other read entry points
Key: ARROW-2627
URL: https://issue
Louis Potok created ARROW-2626:
--
Summary: pandas ArrowInvalid message should include failing column
name
Key: ARROW-2626
URL: https://issues.apache.org/jira/browse/ARROW-2626
Project: Apache Arrow
Wes McKinney created ARROW-2625:
---
Summary: [Python] Serialize timedelta64 values from pandas to
Arrow interval types
Key: ARROW-2625
URL: https://issues.apache.org/jira/browse/ARROW-2625
Project: Apache Arrow
Wes McKinney created ARROW-2624:
---
Summary: [Python] Random schema and data generator for Arrow
conversion and Parquet testing
Key: ARROW-2624
URL: https://issues.apache.org/jira/browse/ARROW-2624
Project: Apache Arrow
Wes McKinney created ARROW-2623:
---
Summary: [Doc] Add example of List with nested child type in
format specification documents
Key: ARROW-2623
URL: https://issues.apache.org/jira/browse/ARROW-2623
Project: Apache Arrow
Sorry, I realized I was a bit inarticulate in my reply. I meant the
data page HEADERS (the metadata). The actual encoded structure of the
data pages should be the same in V2 files. But if the Thrift header is,
say, 16 bytes in V1, it's at least 32 bytes in V2
On Mon, May 21, 2018 at 7:10 PM, Wes McK
hi Feras,
Given the very high compression ratio with your data, it's completely
possible that the difference in size is coming from the larger V2 data
pages. Compare DataPageHeader with DataPageHeaderV2 in parquet.thrift
https://github.com/apache/parquet-cpp/blob/master/src/parquet/parquet.thrift#
Among other things, the columnar format specification files should
probably make their way into this new documentation project.
On Mon, May 21, 2018 at 5:19 PM, Wes McKinney wrote:
> I don't think we should attempt to create a documentation "super
> project" that includes the generated API reference
I don't think we should attempt to create a documentation "super
project" that includes the generated API reference for all the
libraries in Apache Arrow. I do think that creating a documentation
"hub" project (with the low-level API docs being the "spokes") is a
good idea. Currently, the Jekyll pr
hi Josh,
Yes, the standard process for importing externally-developed code is
the Incubator IP clearance: http://incubator.apache.org/ip-clearance/.
As an example, we recently received a Go codebase donation from
InfluxData where there was a combination of ICLAs from the
contributors and a softwar
Hi Wes,
I'm sure we're going to run into this with libgdf/pygdf as well. Is there a
systematic way we could do a transfer of IP?
On 5/20/18, 7:05 PM, "Wes McKinney" wrote:
hi Paul,
This is a great discussion to get started. I will review the patch in
some more detail and send
Thomas Buhrmann created ARROW-2622:
--
Summary: [C++] Array methods IsNull and IsValid are not
complementary
Key: ARROW-2622
URL: https://issues.apache.org/jira/browse/ARROW-2622
Project: Apache Arrow
Uwe L. Korn created ARROW-2621:
--
Summary: [Python/CI] Use pep8speaks for
Key: ARROW-2621
URL: https://issues.apache.org/jira/browse/ARROW-2621
Project: Apache Arrow
Issue Type: Task