Here is an example: https://gist.github.com/rdettai/950b1ed3e8e2f0fc416a6e8f3659b7e6
Rusoto is kind of annoying because it forces you to use async... This is not the solution I ended up with, because I'm calling this from DataFusion and *sync* was not playing very well with *async*. But it gives you a pretty good idea!

On Thu, Nov 12, 2020 at 12:03, vertexclique vertexclique <vertexcli...@gmail.com> wrote:

> Hi Remi,
>
> I see. I am unsure how much needs to change on our side, since I haven't
> yet estimated the adaptation/refactoring required. If possible, can you
> share the S3 implementation that you've worked on? It would guide our
> estimate, and if possible we would like to adopt the approach. After our
> talk and the use case you've already implemented, I am pretty much
> convinced by your approach; I just need to understand the impact on the
> team.
>
> Best,
> Mahmut
>
> On Wed, Nov 11, 2020 at 19:06, Rémi Dettai <rdet...@gmail.com> wrote:
>
> > Hi Mahmut,
> >
> > The way of implementing sources for Parquet has changed. The new way is
> > to implement the ChunkReader trait. This is simpler (fewer methods to
> > implement) and more efficient (you have more information about the
> > upcoming bytes that will be read). The ParquetReader trait has been made
> > private because it is mostly relevant in combination with FileSource,
> > which is private
> > (https://github.com/apache/arrow/pull/8300#issuecomment-707712589). I
> > guess we could even have removed it and made FileSource specific to File.
> >
> > Are you sure that making just ParquetReader public would be enough to
> > make your current code compatible? SerializedFileReader no longer works
> > with that trait, so I doubt this would solve your problem. You would also
> > need to expose FileSource and then implement ChunkReader for it (similar
> > to the implementation of ChunkReader for File), or make the
> > implementation of ChunkReader for File generic over the ParquetReader
> > trait instead (impl<T: ParquetReader> ChunkReader for T). I find this
> > brings in quite a bit of complexity! Is there a use case where you are
> > not reading from the file system and really benefit from going through
> > FileSource?
> >
> > Before opening this PR, can you quickly look at how complex it would be
> > to change your custom sources to implement ChunkReader? I think it might
> > be a lot easier than you think! :-)
> >
> > Remi
> >
> > On Wed, Nov 11, 2020 at 14:14, vertexclique vertexclique
> > <vertexcli...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I have previously implemented different data sources for the
> > > ParquetReader (privately), but the latest changes (esp.
> > > https://github.com/apache/arrow/pull/8300/files#diff-0b220b2d327afc583fd75b2d3c52901e628026a11cfa694ffc252ffd45fb6db0L20)
> > > seem to have orphaned the ParquetReader trait. Is this intentional or
> > > temporary? It prevents layering traits on top of it for implementing
> > > different data sources.
> > >
> > > This change kind of blocks us at Signavio from moving to the latest
> > > Arrow nightly. It would be nice to resolve this together so we can
> > > adapt parquet and arrow.
> > >
> > > Best,
> > > Mahmut Bulut (vertexclique)
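
A minimal sketch of the ChunkReader approach Remi describes above, assuming the trait shape the parquet crate had at the time of this thread (a `Length::len` supertrait plus `ChunkReader::get_read(&self, start: u64, length: usize)`). `InMemorySource` is a hypothetical stand-in for an object-store source such as the S3 reader in the gist; a real remote source would issue a ranged GET for the requested bytes (blocking on it when called from sync code, which is where the sync/async friction mentioned above comes in) instead of slicing a buffer.

```rust
use std::io::Cursor;
use std::sync::Arc;

use parquet::errors::{ParquetError, Result};
use parquet::file::reader::{ChunkReader, Length};

/// Hypothetical byte-range source backed by an in-memory buffer.
struct InMemorySource {
    data: Arc<Vec<u8>>,
}

impl Length for InMemorySource {
    /// Total size of the source, used e.g. to locate the footer metadata.
    fn len(&self) -> u64 {
        self.data.len() as u64
    }
}

impl ChunkReader for InMemorySource {
    type T = Cursor<Vec<u8>>;

    /// Return a reader over `length` bytes starting at offset `start`.
    fn get_read(&self, start: u64, length: usize) -> Result<Self::T> {
        let start = start as usize;
        let end = start
            .checked_add(length)
            .filter(|end| *end <= self.data.len())
            .ok_or_else(|| ParquetError::EOF("read past end of source".to_string()))?;
        // A remote source would fetch bytes [start, end) here instead of copying.
        Ok(Cursor::new(self.data[start..end].to_vec()))
    }
}
```

With such an impl in place, constructing the reader with `SerializedFileReader::new(source)` should work the same way it does for a plain `File`, which is essentially what the gist does against S3.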