> > > DataFusion.
> > > > It will be fantastic to have an opportunity to communicate with
> community
> > > > members "face to face".
> > > >
> > > > Best,
> > > > Yijie
> > > >
> >
I am also very interested in reinstating these events, at least
occasionally.
I do think that sharing some higher level goals and ideas in more *informal*
discussions could help us understand each other better in our asynchronous
work (design documents, issues, PRs).
I also agree that no decisio
Thanks Andrew for bringing this PR forward. I would just like to give the
big picture that led to this modification.
We would like to make DataFusion integrate more efficiently with table
formats. I have recently written a design document to that end [1] that
goes through the various ways we
d fit
> your architecture nicely and I think shouldn't be too hard to create the
> query from the filters/projection in the datasource scan method to spend
> less time in Lambda.
>
> On Wed, Feb 10, 2021, 18:44 Rémi Dettai wrote:
>
> > Thanks for the notes Andy. Here is
Thanks for the notes Andy. Here is the slide deck I presented, for further
reference:
https://docs.google.com/presentation/d/1uZ5PbazC1zCX24k0Hh-UItddIh9BRvD5GL7NUDgc9eQ/edit?usp=sharing
If anyone wants to see how it works in practice and does not have an AWS
account to try it out, feel free to re
Hi Andrew!
The book "How query engines work" (
https://leanpub.com/how-query-engines-work) that Andy wrote is pretty
great! It documents query engine APIs in Kotlin and not Rust, as it was
written during earlier Ballista experimentations, but almost all items
still apply to DataFusion (feel free t
mplementation of Arrow.
> >
> > On Tue, Jan 26, 2021 at 10:18 AM Rémi Dettai wrote:
> >
> > > Hi all,
> > >
> > > I have been following this community for nearly a year now, trying to
> > > contribute whenever I could. It was really a great experi
thanks Andy!!
On Wed, Jan 27, 2021 at 18:41, Andy Grove wrote:
> Attendees
>
> - Mahmut Bulut
> - Remi Dettai
> - Andy Grove
> - Fernando Herrera
> - Jorn Horstmann
> - Andrew Lamb
> - Jorge Leitao
> - Mike Seddon
Hi all,
I have been following this community for nearly a year now, trying to
contribute whenever I could. It was really a great experience and I sure
learned a lot.
Today, it's my turn to give back to the community by open sourcing
the project I started developing a few months ago.
Great topics Andrew, to my knowledge nothing has been decided on these
topics.
We also agreed last time that it would be nice to go round the table so
that each of us has an opportunity to briefly present their use case for the
Rust Arrow implementation.
Remi
On Sun, Jan 24, 2021 at 13:16, Andrew
Hi Andy! I am stuck in the waiting room!
On Wed, Jan 13, 2021 at 17:58, Andy Grove wrote:
> The first of these calls will be starting shortly. I will try and remember
> to send reminders in advance for future calls.
>
> On Mon, Jan 11, 2021 at 4:40 PM Andy Grove wrote:
>
> > As discussed at
Congratulations to all contributors!
Rémi
On Sat, Jan 9, 2021 at 19:48, David Li wrote:
> Congrats to all involved, this is indeed a big milestone!
>
> Best,
> David
>
> On Sat, Jan 9, 2021, at 13:13, Chao Sun wrote:
> > Congrats! this is awesome work!
> >
> > On Sat, Jan 9, 2021 at 4:28 AM
> >> two thread pools: one for synchronous tasks and one for async tasks
> >>
> >> I am fairly sure there can be only one global Runtime (because when I
> >> tried to explicitly create one when an existing one is present, tokio
> >> panic!'
mplementation,
> to give you a sense of the kinds of issues we are hoping to avoid in
> DataFusion with using async
>
> Andrew
>
>
> On Fri, Oct 30, 2020 at 4:28 AM Rémi Dettai wrote:
>
> > Hi everyone!
> >
> > If you are reading this, it means that you f
e way that you've implemented. I just need to
> understand the surface impact for the team.
>
> Best,
> Mahmut
>
> On Wed, Nov 11, 2020 at 19:06, Rémi Dettai wrote:
>
> > Hi Mahmut,
> >
> > The way of implementing sources for Parqu
Hi Mahmut,
The way of implementing sources for Parquet has changed. The new way is to
implement the ChunkReader trait. This is simpler (fewer methods to
implement) and more efficient (you have more information about the upcoming
bytes that will be read). The ParquetReader has been made private as i
Hi Jason!
I guess this question would be better suited to the Parquet mailing list
https://parquet.apache.org/community/
Very interesting remark though. I looked into it and didn't find any
obvious explanation. The entire size of the file is taken up by the "data"
column as storing df[['data']] yields
Hi everyone!
If you are reading this, it means that you fell into the trap of my catchy
(but meaningless) title!
This discussion somewhat relates to [1].
DataFusion has recently made its top level "actions" (collect, write...)
async. The problem is that most of the codebase is not async (in partic
b.com/apache/arrow/blob/6c721c579f7d279aa006bfff9b701f8a2a6fe50d/cpp/src/arrow/array/builder_binary.h#L253
>
> On Tue, Jun 16, 2020 at 8:07 AM Rémi Dettai wrote:
>
> > Hi Antoine and all!
> >
> > Sorry for the delay, I wanted to understand things a bit better before
>
ion or not.
Hope this all makes sense. Took me a while to understand how the decoding
works ;-)
Remi
On Fri, Jun 5, 2020 at 17:20, Antoine Pitrou wrote:
>
> On 05/06/2020 at 17:09, Rémi Dettai wrote:
> > I looked into the details of why the decoder could not estimate the
> target
25, Uwe L. Korn wrote:
>
> On Fri, Jun 5, 2020, at 3:13 PM, Rémi Dettai wrote:
> > Hi Antoine!
> > > I would indeed have expected jemalloc to do that (remap the pages)
> > I have no idea about the performance gain this would provide (if any).
> > Could be interest
e Pitrou wrote:
>
> On 05/06/2020 at 14:25, Rémi Dettai wrote:
> > Hi Uwe!
> >
> >> As your suggestions don't seem to be specific to Arrow, why not
> > contribute them directly to jemalloc? They are much better at reviewing
> > allocator code than w
.
>
> Still, when we read a column, we should be able to determine its final
> size from the Parquet metadata. Maybe we're not passing some information
> along there?
>
> Best,
> Uwe
>
> On Thu, Jun 4, 2020, at 5:48 PM, Rémi Dettai wrote:
> > When creating large arr
r allocations?
On Thu, Jun 4, 2020 at 17:58, Antoine Pitrou wrote:
> On Thu, 4 Jun 2020 17:48:16 +0200
> Rémi Dettai wrote:
> > When creating large arrays, Arrow uses realloc quite intensively.
> >
> > I have an example where I read a gzipped Parquet column (strings) tha
When creating large arrays, Arrow uses realloc quite intensively.
I have an example where I read a gzipped Parquet column (strings) that
expands from 8MB to 100+MB when loaded into Arrow. Of course jemalloc
cannot anticipate this and every reallocate call above 1MB (the most
critical ones) ends up
It makes sense to me that the default behaviour of such a low-level API as
kernel does not do any automagic promotion, but shouldn't this kind of
promotion still be requestable by the so-called "system developer" user?
Otherwise they would need to materialize a promoted version of each original
arra
t
> Projjal Chanda
> Rémi Dettai
> Laurent Goujon
> Andy Grove
> Uwe Korn
> Micah Kornfield
> Wes McKinney
> Rok Mihevc
> Neal Richardson
> François Saint-Jacques
>
> Discussion:
> * patch queue is growing, please review things
> * 1.0
> * Timeline: ta
Hi!
Does your point 1 also apply to the AWS SDK dependency? Currently it seems
that it cannot be built in BUNDLED mode. As stated in
https://issues.apache.org/jira/browse/ARROW-8565 I struggled a lot to make
a static build with the S3 dependency activated! I would really like to
help with this bec
elieved that they were being hindered
> by being a part of monorepo, we could create a new repository under
> apache/ on GitHub for the part that wants to split into a standalone
> GitHub repository. That wouldn't change the governance of that code.
>
> - Wes
>
> On Tue, Ap
This is a follow up on https://issues.apache.org/jira/browse/ARROW-8451.
First thanks for your answer!
It's true that I was also surprised to see all implementations of Arrow
mixed up in a single repository!
I was really considering the separation of the repositories as a means to
separate concer