Re: [Rust] Arrow SQL Adapters/Connectors

2020-09-27 Thread Andy Grove
I didn't get a chance yet to really read this thread in detail but I am definitely very interested in this conversation and will make time this week to add my thoughts. Thanks, Andy. On Sun, Sep 27, 2020, 4:01 PM Adam Lippai wrote: > Hi Neville, > > yes, my concerns against common row based DB

Re: [Rust] Arrow SQL Adapters/Connectors

2020-09-27 Thread Adam Lippai
Hi Neville, yes, my concern with common row-based DB APIs is that I use Arrow/Parquet for OLAP too. Libraries such as https://turbodbc.readthedocs.io/en/latest/ (Python) or https://github.com/pacman82/odbc-api#state (Rust) read large blocks of data instead of processing rows one by one, b
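
To illustrate the block-wise reading described in this message, here is a minimal sketch in Python. It uses the standard-library `sqlite3` driver only as a stand-in; it is not how turbodbc or odbc-api are implemented. The helper `fetch_column_batches`, the table `t`, and the batch size are hypothetical names chosen for the example. Each call to `fetchmany` pulls a block of rows, which is then transposed into per-column lists, roughly the shape an Arrow record batch would need.

```python
import sqlite3


def fetch_column_batches(cursor, batch_size):
    """Yield each block of rows as a dict mapping column name -> list of values.

    Hypothetical helper: it only shows the shape of a block-oriented API.
    """
    names = [d[0] for d in cursor.description]
    while True:
        rows = cursor.fetchmany(batch_size)  # one block of rows per call
        if not rows:
            break
        # Transpose the row tuples into one sequence per column.
        columns = list(zip(*rows))
        yield {name: list(col) for name, col in zip(names, columns)}


# Small in-memory table so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)", [(i, f"row{i}") for i in range(5)]
)

cur = conn.execute("SELECT id, name FROM t ORDER BY id")
batches = list(fetch_column_batches(cur, batch_size=2))
conn.close()
```

With five rows and a batch size of two, `batches` holds three column-oriented blocks, the last one containing a single row. A real connector would fill typed column buffers directly in the driver rather than transposing Python tuples.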

Re: [Rust] Arrow SQL Adapters/Connectors

2020-09-27 Thread Neville Dipale
Thanks for the feedback. My interest is mainly in the narrow use case of reading and writing batch data, so I wouldn't want to deal with producing and consuming rows per se. Andy has worked on RDBC (https://github.com/tokio-rs/rdbc) for the row-based or OLTP case, and I'm considering something more

Re: [Python/C-Glib] writing IPC file format column-by-column

2020-09-27 Thread Ishan Anand
Hi, Updating the thread for people with a similar use case. A new project called [duckdb](https://github.com/cwida/duckdb) allows usage of Arrow memory-mapped files as virtual tables, so a lot of pandas functionality can be covered using their SQL equivalents. DuckDB works equally well with chun

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-09-27-0

2020-09-27 Thread Uwe L. Korn
I'm working on a fix for the conda failures in https://github.com/apache/arrow/pull/8282 On Sun, Sep 27, 2020, at 12:20 PM, Crossbow wrote: > > Arrow Build Report for Job nightly-2020-09-27-0 > > All tasks: > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-27-0 > > Fa

Re: [Rust] Arrow SQL Adapters/Connectors

2020-09-27 Thread Adam Lippai
One more universal approach is to use ODBC; this is a recent Rust conversation (with example) on the topic: https://github.com/Koka/odbc-rs/issues/140 Honestly, I find the Python DB API too simple; all it provides is a row-by-row API. I miss four things: - Batched or bulk processing both for da

Re: [DISCUSS] Rethinking our approach to scheduling CPU and IO work in C++?

2020-09-27 Thread Wes McKinney
Hi Weston -- this is a really interesting analysis. 1. I have been under the assumption that the current libraries work poorly on high latency file systems, and your analysis provides the proof, so thank you. 2. This shows that we have a lot of work to do to retool many of our IO libraries (Parqu

Re: [Rust] Arrow SQL Adapters/Connectors

2020-09-27 Thread Jorge Cardoso Leitão
That would be awesome! I agree with this, and it would be really useful, as it would leverage all the goodies that RDBMSs have with respect to transactions, etc. I would probably go for having database-specifics outside of the arrow project, so that they can be used by other folks beyond arrow, and keep the arro

Re: [Rust] Arrow SQL Adapters/Connectors

2020-09-27 Thread Wes McKinney
Hi Neville, In Python we have something called the DB API 2.0 (PEP 249) that defines an API standard for SQL databases in Python, including an expectation around the data format of result sets. It sounds like you need to create the equivalent of that in Rust with Arrow as the API / format returned
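
For readers unfamiliar with DB API 2.0, the following is a minimal sketch of the connection, cursor, and result-set conventions that PEP 249 standardizes, shown with the standard-library `sqlite3` driver. The table `trades` and its contents are made up for the example. Note that the result set comes back as a list of row tuples; the proposal in this message is for a Rust equivalent that would return Arrow data instead.

```python
import sqlite3

# connect() -> Connection, Connection.cursor() -> Cursor: both are part of PEP 249.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table used only for illustration.
cur.execute("CREATE TABLE trades (symbol TEXT, qty INTEGER)")
cur.executemany(
    "INSERT INTO trades VALUES (?, ?)",  # sqlite3 uses the "qmark" paramstyle
    [("AAA", 10), ("BBB", 20)],
)
conn.commit()

cur.execute("SELECT symbol, qty FROM trades ORDER BY symbol")

# cursor.description holds one 7-item sequence per column; item 0 is the name.
column_names = [d[0] for d in cur.description]

# fetchall() returns the result set as a list of row tuples.
rows = cur.fetchall()
conn.close()
```

Here `column_names` is `["symbol", "qty"]` and `rows` is a list of two tuples, one per inserted row.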

Re: [DISCUSS] Plasma appears to have been forked, consider deprecating pyarrow.serialization

2020-09-27 Thread Wes McKinney
To be clear, if someone wants to step up as the Plasma maintainer in Apache Arrow, that's completely fine -- that would be a good outcome. Many of us had already been concerned for a while about Plasma's maintenance status -- lots of stale PRs and low engagement on JIRA issues and mailing list disc

Re: [DISCUSS] Plasma appears to have been forked, consider deprecating pyarrow.serialization

2020-09-27 Thread Niklas B
We too rely heavily on Plasma (we use Ray as well, but also Plasma independent of Ray). I’ve started a thread on the Ray dev list to see if Ray’s Plasma can be used standalone outside of Ray as well. That would allow those of us who use Plasma to move to a standalone “ray plasma” when/if it’s removed from Arro

[NIGHTLY] Arrow Build Report for Job nightly-2020-09-27-0

2020-09-27 Thread Crossbow
Arrow Build Report for Job nightly-2020-09-27-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-27-0 Failed Tasks: - conda-linux-gcc-py36-aarch64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-27-0-drone-conda-linux-gcc-py3