Re: [DISCUSS][C++][Python]Switch default mmap behaviour to off

2022-05-11 Thread Alessandro Molina
As far as I understood, the idea is not to fully remove memory mapping, just to turn the current mmap=True default arguments to mmap=False. The goal is mostly to provide consistent behaviour for end users. At the moment users might face very different performance when they read locally or on a networ

Re: [DISCUSS][C++][Python]Switch default mmap behaviour to off

2022-05-11 Thread Antoine Pitrou
On 11/05/2022 at 10:19, Alessandro Molina wrote: As far as I understood, the idea is not to fully remove memory mapping, just turn the current mmap=True default arguments to mmap=False. The goal is mostly to provide consistent behaviour for end users. At the moment users might face very diff

Datafusion's Java binding is available in Maven Central

2022-05-11 Thread Jiayu Liu
Hi dev@arrow, Recently I've created and published a Java binding[1] to datafusion[2], as part of datafusion-contrib projects[3]. I've updated the README.md[4] so people can pick it up via maven[5] or gradle. Any feedback or contributions are welcome! [1]: https://github.com/datafusion-contrib/da

Re: Datafusion's Java binding is available in Maven Central

2022-05-11 Thread Antoine Pitrou
Hi! Can you elaborate on how the binding transfers data between Datafusion and Java Arrow? If I'm reading the code correctly, it seems to be writing an IPC stream? On 11/05/2022 at 11:20, Jiayu Liu wrote: Hi dev@arrow, Recently I've created and published a Java binding[1] to datafusion[2

Re: [DISC][Release] More control on Release Candidates commits

2022-05-11 Thread Krisztián Szűcs
On Wed, May 11, 2022 at 6:01 AM Sutou Kouhei wrote: > > Hi, > > In > "Re: [DISC][Release] More control on Release Candidates commits" on Tue, 10 > May 2022 13:27:09 +0200, > Raul Cumplido wrote: > > > I still think there is some value in standardising the "feature freeze" on > > new release

Arrow sync call May 11 at 12:00 US/Eastern, 16:00 UTC

2022-05-11 Thread Ian Cook
Hi all, Our biweekly sync call is today at 12:00 noon Eastern time. The Zoom meeting URL for this and other biweekly Arrow sync calls is: https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09 Alternatively, enter this information into the Zoom website or app to join the call: Meetin

[Rust] Proposal to move Ballista to a top-level arrow-ballista repository

2022-05-11 Thread Andy Grove
I would like to propose that we move the Ballista project to a new top-level *arrow-ballista* repository. The rationale for this (copied from the GitHub issue [1]) is: - Decouple release process for DataFusion and Ballista - Allow each project to have top-level documentation and user guides

Re: RFC: Out of Process Python UDFs in Arrow Compute

2022-05-11 Thread Li Jin
@Vibhatha > Are these computations computationally intensive? To quantify it, in general > do these workloads occupy majority of the time compared to the overall > dataflow problem's execution time? It varies a lot depending on what the user is doing, and can be anywhere between 5% (e.g., a

Re: [C++] Code style and lint question

2022-05-11 Thread Li Jin
Thanks Weston. This resolved issue 1 for me. As for issue 2, I am now running "ninja format lint clang-tidy lint_cpp_cli" and it seems to still take a while (over 30min now), and the console shows " [2/4] cd /home/icexelloss/workspace/arrow/cpp/build && /usr/bin/python3.10 /home/icexelloss/worksp

Re: [C++] Control flow and scheduling in C++ Engine operators / exec nodes

2022-05-11 Thread Wes McKinney
I talked about these problems with my colleague Michal Nowakiewicz who has been developing some of the C++ engine implementation over the last year and a half, and he wrote up this document with some ideas about task scheduling and control flow in the query engine for everyone to look at and commen

Re: Arrow sync call May 11 at 12:00 US/Eastern, 16:00 UTC

2022-05-11 Thread Ian Cook
Attendees: Joris Van den Bossche, Ian Cook, Nic Crane, Raul Cumplido, Ian Joiner, David Li, Rok Mihevc, Dragoș Moldovan-Grünfeld, Aldrin Montana, Weston Pace, Eduardo Ponce, Matthew Topol, Jacob Wujciak. Discussion: Eduardo: Draft PR with a guide showing how to create a new Arrow C++ compute kernel [1] - R