Here is what I had to do to our code in IOx to adapt it to the new Parquet interfaces; perhaps it will be helpful to you too: https://github.com/influxdata/influxdb_iox/pull/395
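
As a rough illustration of the ChunkReader approach Rémi describes in the quoted thread, here is a minimal sketch of a custom in-memory source. The `Length` and `ChunkReader` traits are declared locally as stand-ins so the snippet compiles on its own. Their shape is my assumption of what the parquet crate exposes (the real `get_read` returns the crate's own result type, not `io::Result`), and `InMemorySource` and `read_range` are hypothetical names, not taken from IOx or from the gist.

```rust
use std::io::{self, Cursor, Read};

// Local stand-ins for the parquet crate's `Length` and `ChunkReader` traits.
// ASSUMPTION: these mirror the crate's trait shapes; the real `get_read`
// returns the crate's own error type rather than `io::Error`.
pub trait Length {
    /// Total size of the underlying source in bytes.
    fn len(&self) -> u64;
}

pub trait ChunkReader: Length {
    type T: Read;
    /// Returns a reader over `length` bytes starting at offset `start`.
    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T>;
}

// Hypothetical custom source: bytes that were already fetched into memory
// (for example from an object store).
pub struct InMemorySource {
    data: Vec<u8>,
}

impl InMemorySource {
    pub fn new(data: Vec<u8>) -> Self {
        Self { data }
    }
}

impl Length for InMemorySource {
    fn len(&self) -> u64 {
        self.data.len() as u64
    }
}

impl ChunkReader for InMemorySource {
    type T = Cursor<Vec<u8>>;

    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T> {
        // Reject ranges that overflow or run past the end of the data.
        let end = start
            .checked_add(length as u64)
            .filter(|&e| e <= self.len())
            .ok_or_else(|| {
                io::Error::new(io::ErrorKind::UnexpectedEof, "range past end of source")
            })?;
        // `end` is bounded by `data.len()`, so both casts are lossless.
        let (start, end) = (start as usize, end as usize);
        Ok(Cursor::new(self.data[start..end].to_vec()))
    }
}

// Helper that works with any `ChunkReader`: reads one byte range fully.
pub fn read_range<R: ChunkReader>(source: &R, start: u64, length: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(length);
    source.get_read(start, length)?.read_to_end(&mut buf)?;
    Ok(buf)
}
```

With a source built from `b"PAR1hello-worldPAR1"`, `read_range(&source, 4, 11)` yields the bytes of `hello-world`. Because the caller passes the length up front, a remote-backed source could presumably turn each `get_read` call into one ranged request, which is the efficiency gain mentioned in the thread.
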

On Thu, Nov 12, 2020 at 8:46 AM Rémi Dettai <rdet...@gmail.com> wrote:

> Here is an example:
> https://gist.github.com/rdettai/950b1ed3e8e2f0fc416a6e8f3659b7e6
>
> Rusoto is kind of annoying because it forces you to use async. This is
> not the solution I ended up with, because I'm calling this from DataFusion
> and *sync* was not playing very well with *async*. But it gives you a
> pretty good idea!
>
> On Thu, Nov 12, 2020 at 12:03, vertexclique vertexclique
> <vertexcli...@gmail.com> wrote:
>
> > Hi Remi;
> >
> > I see. I am unsure how much needs to change on our side, since I
> > haven't yet estimated the adaptation/refactoring needed.
> > If possible, can you share the S3 implementation that you've
> > worked on? It will guide our estimate, and if possible we want to
> > adopt the approach.
> > After our talk and the use case that you've already implemented, I am
> > pretty much convinced by the way you've implemented it. I just need to
> > understand the surface impact for the team.
> >
> > Best,
> > Mahmut
> >
> > On Wed, Nov 11, 2020 at 19:06, Rémi Dettai <rdet...@gmail.com> wrote:
> >
> > > Hi Mahmut,
> > >
> > > The way of implementing sources for Parquet has changed. The new way
> > > is to implement the ChunkReader trait. This is simpler (fewer methods
> > > to implement) and more efficient (you have more information about the
> > > upcoming bytes that will be read). The ParquetReader has been made
> > > private as it is mostly relevant in combination with FileSource,
> > > which is private
> > > (https://github.com/apache/arrow/pull/8300#issuecomment-707712589).
> > > I guess we could have even removed it and made FileSource specific to
> > > File.
> > >
> > > Are you sure making just the ParquetReader public would be sufficient
> > > to make your current code compatible? SerializedFileReader does not
> > > work with that trait any more, so I doubt this would solve your
> > > problem. You would also need to expose FileSource, and then implement
> > > ChunkReader for that (similar to the implementation of ChunkReader
> > > for File), or make the implementation of ChunkReader for File generic
> > > on the ParquetReader trait instead
> > > (impl<T: ParquetReader> ChunkReader for T)? I find this brings in
> > > quite a bit of complexity! Is there a use case where you are not
> > > reading from the file system and you really benefit from going
> > > through FileSource?
> > >
> > > Before opening this PR, can you quickly look at how complex it would
> > > be to change your custom sources to implement ChunkReader? I think it
> > > might be a lot easier than you think! :-)
> > >
> > > Remi
> > >
> > > On Wed, Nov 11, 2020 at 14:14, vertexclique vertexclique
> > > <vertexcli...@gmail.com> wrote:
> > >
> > > > Hi All;
> > > >
> > > > I have implemented different data sources before for the
> > > > ParquetReader (privately), but with the latest changes (esp.
> > > > https://github.com/apache/arrow/pull/8300/files#diff-0b220b2d327afc583fd75b2d3c52901e628026a11cfa694ffc252ffd45fb6db0L20)
> > > > the ParquetReader trait has been orphaned. Is this intentional or
> > > > is it temporary? It prevents layering traits on top of it to
> > > > implement different data sources.
> > > >
> > > > This change kind of blocks us at Signavio from moving to the latest
> > > > Arrow nightly. It would be nice to resolve this together so we can
> > > > adapt the parquet and arrow.
> > > >
> > > > Best,
> > > > Mahmut Bulut (vertexclique)