Here is an example:
https://gist.github.com/rdettai/950b1ed3e8e2f0fc416a6e8f3659b7e6

Rusoto is kind of annoying because it forces you to use async... This is
not the solution I ended up with, because I'm calling this from DataFusion
and *sync* was not playing very well with *async*. But it gives you a
pretty good idea!
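
To make the sync/async friction concrete, here is a rough, hypothetical
sketch (not the code from the gist; the helper name, hard-coded region and
error handling are just placeholders) of blocking on a ranged Rusoto
get_object call so that synchronous reader code can drive it:

use futures::TryStreamExt;
use rusoto_core::Region;
use rusoto_s3::{GetObjectRequest, S3, S3Client};

// Hypothetical helper: fetch `length` bytes starting at `start` from an S3
// object and return them to a synchronous caller.
fn read_range_blocking(
    bucket: &str,
    key: &str,
    start: u64,
    length: usize,
) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
    let request = GetObjectRequest {
        bucket: bucket.to_owned(),
        key: key.to_owned(),
        // HTTP Range header: only fetch the bytes that were asked for.
        range: Some(format!("bytes={}-{}", start, start + length as u64 - 1)),
        ..Default::default()
    };
    // Bridge async -> sync: spin up a runtime and block on the request.
    let mut runtime = tokio::runtime::Runtime::new()?;
    runtime.block_on(async move {
        let client = S3Client::new(Region::UsEast1);
        let output = client.get_object(request).await?;
        // The body comes back as a stream of chunks; collect and concatenate.
        let chunks: Vec<_> = output
            .body
            .ok_or("object has no body")?
            .try_collect()
            .await?;
        let mut buf = Vec::with_capacity(length);
        for chunk in chunks {
            buf.extend_from_slice(&chunk);
        }
        Ok::<Vec<u8>, Box<dyn std::error::Error>>(buf)
    })
}

In practice you would reuse the client and the runtime instead of creating
them on every call, but this shows the shape of the problem.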

On Thu, Nov 12, 2020 at 12:03, vertexclique vertexclique
<vertexcli...@gmail.com> wrote:

> Hi Remi;
>
> I see. I am unsure how much needs to change on our side, since I haven't
> estimated the adaptation/refactoring needed for it yet.
> If possible, could you share the S3 implementation that you've worked on?
> It would guide us in making the estimate, and if possible we would like
> to adopt the approach.
> After our talk and the use case you've already implemented, I am pretty
> much convinced by the way you've implemented it. I just need to
> understand the impact on the team.
>
> Best,
> Mahmut
>
> On Wed, Nov 11, 2020 at 19:06, Rémi Dettai <rdet...@gmail.com> wrote:
>
> > Hi Mahmut,
> >
> > The way of implementing sources for Parquet has changed. The new way is
> > to implement the ChunkReader trait. This is simpler (fewer methods to
> > implement) and more efficient (you have more information about the
> > upcoming bytes that will be read). The ParquetReader has been made
> > private as it is mostly relevant in combination with FileSource, which
> > is private
> > (https://github.com/apache/arrow/pull/8300#issuecomment-707712589).
> > I guess we could even have removed it and made FileSource specific to
> > File.
> >
> > Are you sure making just the ParquetReader public would be sufficient
> > to make your current code compatible? SerializedFileReader does not
> > work with that trait any more, so I doubt this would solve your
> > problem. You would also need to expose FileSource and then implement
> > ChunkReader for it (similar to the impl of ChunkReader for File), or
> > make the impl of ChunkReader for File generic over the ParquetReader
> > trait instead (impl<T: ParquetReader> ChunkReader for T). I find this
> > brings in quite a bit of complexity! Is there a use case where you are
> > not reading from the file system and really benefit from going through
> > FileSource?
> >
> > Before opening this PR, can you quickly look at how complex it would
> > be to change your custom sources to implement ChunkReader? I think it
> > might be a lot easier than you think! :-)
> >
> > Remi
> >
> > On Wed, Nov 11, 2020 at 14:14, vertexclique vertexclique
> > <vertexcli...@gmail.com> wrote:
> >
> > > Hi All;
> > >
> > > I have implemented different data sources for the ParquetReader
> > > before (privately), but with the latest changes (esp.
> > > https://github.com/apache/arrow/pull/8300/files#diff-0b220b2d327afc583fd75b2d3c52901e628026a11cfa694ffc252ffd45fb6db0L20 )
> > > the ParquetReader trait has been orphaned. Is this intentional or is
> > > it temporary? It prevents layering traits on top of it to implement
> > > different data sources.
> > >
> > > This change kind of blocks us at Signavio from moving to the latest
> > > Arrow nightly. It would be nice to resolve this together so we can
> > > adapt to the latest parquet and arrow.
> > >
> > > Best,
> > > Mahmut Bulut (vertexclique)
> > >
> >
>
