I haven't had a chance yet to read this thread in detail, but I am
definitely very interested in this conversation and will make time this
week to add my thoughts.
Thanks,
Andy.
On Sun, Sep 27, 2020, 4:01 PM Adam Lippai wrote:
> Hi Neville,
>
> yes, my concern with common row-based DB…
Hi Neville,
yes, my concern with common row-based DB APIs is that I use
Arrow/Parquet for OLAP too.
What https://turbodbc.readthedocs.io/en/latest/ (Python) or
https://github.com/pacman82/odbc-api#state (Rust) does is read
large blocks of data instead of processing rows one-by-one, …
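(To make the block-wise idea concrete, here is a minimal sketch using
turbodbc's Arrow support; the DSN and query are made-up placeholders.)

    import turbodbc

    # Connect through an ODBC data source; "my_dsn" is a placeholder.
    connection = turbodbc.connect(dsn="my_dsn")
    cursor = connection.cursor()
    cursor.execute("SELECT id, value FROM measurements")

    # turbodbc can hand the whole result set back as a pyarrow.Table,
    # transferring data in large blocks instead of row by row.
    table = cursor.fetchallarrow()
    print(table.schema)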
Thanks for the feedback.
My interest is mainly in the narrow use case of reading and writing batch
data, so I wouldn't want to deal with producing and consuming rows per se.
Andy has worked on RDBC (https://github.com/tokio-rs/rdbc) for the
row-based or OLTP case, and I'm considering something more…
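(For context, the batch-oriented style of reading and writing with Arrow
looks roughly like this pyarrow IPC sketch; the file name is made up.)

    import pyarrow as pa
    import pyarrow.ipc as ipc

    # Write a table as a stream of record batches.
    table = pa.table({"id": [1, 2, 3]})
    with ipc.new_stream("data.arrows", table.schema) as writer:
        for batch in table.to_batches():
            writer.write_batch(batch)

    # Read it back batch by batch rather than row by row.
    reader = ipc.open_stream("data.arrows")
    for batch in reader:
        print(batch.num_rows)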
Hi,
Updating the thread for people with a similar use case. A new project called
[duckdb](https://github.com/cwida/duckdb) allows using Arrow memory-mapped
files as virtual tables, so a lot of pandas functionality can be covered using
their SQL equivalents. DuckDB works equally well with chunked…
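(A minimal sketch of that pattern, assuming a recent DuckDB build with
Arrow support; the table and column names are invented.)

    import duckdb
    import pyarrow as pa

    # A small in-memory Arrow table standing in for a memory-mapped file.
    table = pa.table({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]})

    con = duckdb.connect()
    # Expose the Arrow table to DuckDB as a virtual table, query with SQL.
    con.register("events", table)
    print(con.execute("SELECT id, SUM(value) FROM events GROUP BY id").fetchdf())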
I'm working on a fix for the conda failures in
https://github.com/apache/arrow/pull/8282
On Sun, Sep 27, 2020, at 12:20 PM, Crossbow wrote:
>
> Arrow Build Report for Job nightly-2020-09-27-0
>
> All tasks:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-27-0
>
> Failed Tasks: …
One more universal approach is to use ODBC; this is a recent Rust
conversation (with an example) on the topic:
https://github.com/Koka/odbc-rs/issues/140
Honestly, I find the Python DB API too simple; all it provides is a
row-by-row API. I miss four things:
- Batched or bulk processing, both for data…
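(For reference, the row-by-row shape of the standard API, sketched with
the stdlib sqlite3 module as the PEP 249 reference implementation;
fetchmany() is the closest thing to bulk transfer the spec defines.)

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, value REAL)")
    con.executemany("INSERT INTO t VALUES (?, ?)", [(1, 1.5), (2, 2.5)])

    cur = con.execute("SELECT id, value FROM t")
    # The DB API hands rows back as plain Python tuples; fetchmany()
    # batches the calls, but the unit of transfer is still a row.
    while True:
        rows = cur.fetchmany(1000)
        if not rows:
            break
        for row in rows:
            pass  # each row is a tuple, not a columnar batch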
Hi Weston -- this is a really interesting analysis.
1. I have been under the assumption that the current libraries work
poorly on high-latency file systems, and your analysis provides the
proof, so thank you.
2. This shows that we have a lot of work to do to retool many of our
IO libraries (Parquet…
That would be awesome! I agree with this, and it would be really useful, as
it would leverage all the goodies that RDBMSs have with respect to
transactions, etc.
I would probably go for having database specifics outside of the Arrow
project, so that they can be used by other folks beyond Arrow, and keep the
Arrow…
Hi Neville,
In Python we have something called the DB API 2.0 (PEP 249), which
defines an API standard for SQL databases in Python, including an
expectation around the data format of result sets. It sounds like you
need to create the equivalent of that in Rust with Arrow as the API /
format returned.
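(Concretely, the result-set contract PEP 249 standardizes looks like
this, again sketched with the stdlib sqlite3 module.)

    import sqlite3

    cur = sqlite3.connect(":memory:").execute("SELECT 1 AS id, 'a' AS name")

    # cursor.description standardizes per-column metadata (name first),
    # and rows come back as sequences -- tuples here.
    print([col[0] for col in cur.description])  # ['id', 'name']
    print(cur.fetchall())                       # [(1, 'a')]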
To be clear, if someone wants to step up as the Plasma maintainer in
Apache Arrow, that's completely fine -- that would be a good outcome.
Many of us had already been concerned for a while about Plasma's
maintenance status -- lots of stale PRs and low engagement on JIRA
issues and mailing list discussions…
We too rely heavily on Plasma (we use Ray as well, but also Plasma
independent of Ray). I've started a thread on the Ray dev list to see if
Ray's Plasma can be used standalone outside of Ray as well. That would allow
those of us who use Plasma to move to a standalone "Ray Plasma" when/if it's
removed from Arrow…
Arrow Build Report for Job nightly-2020-09-27-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-27-0
Failed Tasks:
- conda-linux-gcc-py36-aarch64:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-27-0-drone-conda-linux-gcc-py3