Here is what I had to do to adapt our code in IOx to the new Parquet
interfaces; perhaps it will be helpful to you too:
https://github.com/influxdata/influxdb_iox/pull/395
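
For a rough idea of what a custom source looks like under the ChunkReader
approach described in the quoted messages below, here is a minimal sketch.
The two traits are local stand-ins that I believe mirror the shape of the
parquet crate's Length and ChunkReader traits; they are not imported from the
crate, and the real signatures (in particular the error type) may differ.
InMemorySource and read_chunk are hypothetical names used only for
illustration.

```rust
use std::io::{self, Cursor, Read};

// Local stand-in (assumption): reports the total size of the source in bytes.
trait Length {
    fn len(&self) -> u64;
}

// Local stand-in (assumption): hands out a reader over one byte range.
// Because the caller states the range up front, the source knows exactly
// which bytes are about to be read.
trait ChunkReader: Length {
    type T: Read;
    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T>;
}

// Hypothetical custom source: bytes already held in memory, for example
// after being downloaded from an object store.
struct InMemorySource {
    data: Vec<u8>,
}

impl Length for InMemorySource {
    fn len(&self) -> u64 {
        self.data.len() as u64
    }
}

impl ChunkReader for InMemorySource {
    type T = Cursor<Vec<u8>>;

    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T> {
        let total = self.data.len() as u64;
        // Reject ranges that overflow or run past the end of the source.
        let end = start
            .checked_add(length as u64)
            .filter(|&e| e <= total)
            .ok_or_else(|| {
                io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "requested range exceeds source length",
                )
            })?;
        // Copy the requested range into an owned, independent reader.
        Ok(Cursor::new(self.data[start as usize..end as usize].to_vec()))
    }
}

// Helper that works with any source implementing the stand-in trait.
fn read_chunk<R: ChunkReader>(source: &R, start: u64, length: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(length);
    source.get_read(start, length)?.read_to_end(&mut buf)?;
    Ok(buf)
}
```

Calling read_chunk(&source, 0, 4) on such a source returns its first four
bytes, and a range that runs past the end comes back as an error instead of a
short read. A remote source could presumably follow the same pattern by
turning (start, length) into a ranged request.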

On Thu, Nov 12, 2020 at 8:46 AM Rémi Dettai <rdet...@gmail.com> wrote:

> Here is an example:
> https://gist.github.com/rdettai/950b1ed3e8e2f0fc416a6e8f3659b7e6
>
> Rusoto is kind of annoying because it forces you to use async. This is
> not the solution I ended up with, because I'm calling this from DataFusion
> and *sync* was not playing very well with *async*. But it gives you a
> pretty good idea!
>
> On Thu, Nov 12, 2020 at 12:03 PM, vertexclique vertexclique <
> vertexcli...@gmail.com> wrote:
>
> > Hi Remi;
> >
> > I see. I am unsure how much needs to change on our side, since I
> > haven't yet estimated the adaptation/refactoring needed.
> > If possible, can you share the S3 implementation that you've
> > worked on? It will guide our estimate, and if possible we want to
> > adopt the approach.
> > After our talk and the use case that you've already implemented, I am
> > pretty much convinced by the way you've implemented it. I just need to
> > understand the surface impact for the team.
> >
> > Best,
> > Mahmut
> >
> > On Wed, Nov 11, 2020 at 7:06 PM, Rémi Dettai <rdet...@gmail.com>
> > wrote:
> >
> > > Hi Mahmut,
> > >
> > > The way of implementing sources for Parquet has changed. The new way is
> > > to implement the ChunkReader trait. This is simpler (fewer methods to
> > > implement) and more efficient (you have more information about the
> > > upcoming bytes that will be read). The ParquetReader has been made
> > > private as it is mostly relevant in combination with FileSource, which
> > > is private
> > > (https://github.com/apache/arrow/pull/8300#issuecomment-707712589). I
> > > guess we could even have removed it and made FileSource specific to File.
> > >
> > > Are you sure making just the ParquetReader public would be sufficient to
> > > make your current code compatible? SerializedFileReader does not work
> > > with that trait any more, so I doubt this would solve your problem. You
> > > would also need to expose FileSource, and then implement ChunkReader for
> > > that (similar to the implementation of ChunkReader for File), or make the
> > > implementation of ChunkReader for File generic on the ParquetReader trait
> > > instead (impl<T: ParquetReader> ChunkReader for T)? I find this brings in
> > > quite a bit of complexity! Is there a use case where you are not reading
> > > from the file system and you really benefit from going through
> > > FileSource?
> > >
> > > Before opening this PR, can you quickly look at how complex it would be
> > > to change your custom sources to implement ChunkReader? I think it might
> > > be a lot easier than you think! :-)
> > >
> > > Remi
> > >
> > > On Wed, Nov 11, 2020 at 2:14 PM, vertexclique vertexclique <
> > > vertexcli...@gmail.com> wrote:
> > >
> > > > Hi All;
> > > >
> > > > I have implemented different data sources for the ParquetReader
> > > > (privately) before, but with the latest changes (esp.
> > > > https://github.com/apache/arrow/pull/8300/files#diff-0b220b2d327afc583fd75b2d3c52901e628026a11cfa694ffc252ffd45fb6db0L20
> > > > ) the ParquetReader trait has been left orphaned. Is this intentional,
> > > > or is it temporary? It prevents layering traits on top of it to
> > > > implement different data sources.
> > > >
> > > > This change kind of blocks us at Signavio from moving to the latest
> > > > Arrow nightly. It would be nice to resolve this together so we can
> > > > adapt to the new parquet and arrow.
> > > >
> > > > Best,
> > > > Mahmut Bulut (vertexclique)
> > > >
> > >
> >
>
