Hi Mahmut,

The way sources are implemented for Parquet has changed. The new way is to implement the ChunkReader trait. This is simpler (fewer methods to implement) and more efficient (the implementation is told up front which byte range will be read next). The ParquetReader trait has been made private because it is mostly relevant in combination with FileSource, which is also private (https://github.com/apache/arrow/pull/8300#issuecomment-707712589). I guess we could even have removed it and made FileSource specific to File.
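The sketch below shows why a custom source built on ChunkReader can be simpler than one that goes through ParquetReader and FileSource. It assumes the trait shape is roughly `ChunkReader: Length`, with an associated `type T: Read` and a `get_read(&self, start: u64, length: usize)` method. The traits are redefined locally so the snippet compiles on its own, and it uses `std::io::Result` where the real crate uses its own error type. `InMemorySource` is a hypothetical name. Because `get_read` receives the exact byte range, a remote-backed source could map each call to a single range request instead of seeking and buffering.

```rust
use std::io::{self, Cursor, Read};

// Local stand-ins that mirror (as an assumption) the shape of the parquet
// crate's `Length` and `ChunkReader` traits. The real `get_read` returns the
// crate's own `Result` type rather than `io::Result`.
pub trait Length {
    /// Total number of bytes in the source (used to locate the footer).
    fn len(&self) -> u64;
}

pub trait ChunkReader: Length {
    type T: Read;
    /// Returns a reader over `length` bytes starting at offset `start`.
    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T>;
}

/// Hypothetical custom source: bytes already fetched into memory,
/// e.g. downloaded from an object store.
pub struct InMemorySource {
    pub data: Vec<u8>,
}

impl Length for InMemorySource {
    fn len(&self) -> u64 {
        self.data.len() as u64
    }
}

impl ChunkReader for InMemorySource {
    // Each chunk is handed out as an owned cursor over a copy of the range.
    type T = Cursor<Vec<u8>>;

    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T> {
        let start = start as usize;
        // Reject ranges that run past the end of the source.
        let end = start
            .checked_add(length)
            .filter(|&end| end <= self.data.len())
            .ok_or_else(|| {
                io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds")
            })?;
        Ok(Cursor::new(self.data[start..end].to_vec()))
    }
}
```

For example, with `data` set to `b"PAR1 column bytes PAR1"`, `get_read(5, 6)` yields a reader over the bytes `column`, while a range that extends past the end returns an error. Copying each range into its own `Vec` keeps the sketch short; a real implementation might avoid that copy.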
Are you sure that making only ParquetReader public would be enough to make your current code compatible? SerializedFileReader no longer works with that trait, so I doubt this alone would solve your problem. You would also need to expose FileSource and then implement ChunkReader for it (similar to the implementation of ChunkReader for File), or make the implementation of ChunkReader for File generic over the ParquetReader trait instead (impl<T: ParquetReader> ChunkReader for T). I find this brings in quite a bit of complexity! Is there a use case where you are not reading from the file system and really benefit from going through FileSource?

Before opening this PR, can you quickly look at how complex it would be to change your custom sources to implement ChunkReader? I think it might be a lot easier than you think! :-)

Remi

On Wed, Nov 11, 2020 at 14:14, vertexclique vertexclique <vertexcli...@gmail.com> wrote:

> Hi All,
>
> I have implemented different data sources for the ParquetReader (privately) before, but with the latest changes (esp.
> https://github.com/apache/arrow/pull/8300/files#diff-0b220b2d327afc583fd75b2d3c52901e628026a11cfa694ffc252ffd45fb6db0L20)
> the ParquetReader trait has been orphaned. Is this intentional or is it temporary? It prevents layering traits on top of it to implement different data sources.
>
> This change kind of blocks us at Signavio from moving to the latest Arrow nightly. It would be nice to resolve this together so we can adapt parquet and arrow.
>
> Best,
> Mahmut Bulut (vertexclique)
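The generic alternative mentioned above, `impl<T: ParquetReader> ChunkReader for T`, could look roughly like the following sketch. The traits (`Length`, `TryClone`, `ParquetReader`, `ChunkReader`) are local stand-ins whose shape is assumed rather than copied from the crate, and the example source is a plain `Cursor<Vec<u8>>`. Each call clones the underlying reader, seeks to `start`, and limits the read to `length` bytes.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom, Take};

// Local stand-ins mirroring (as an assumption) the parquet crate's traits.
pub trait Length {
    fn len(&self) -> u64;
}

pub trait TryClone: Sized {
    fn try_clone(&self) -> io::Result<Self>;
}

// Any type with these capabilities counts as a ParquetReader.
pub trait ParquetReader: Read + Seek + Length + TryClone {}
impl<R: Read + Seek + Length + TryClone> ParquetReader for R {}

pub trait ChunkReader: Length {
    type T: Read;
    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T>;
}

// The generic option: every ParquetReader becomes a ChunkReader by cloning
// the handle, seeking to `start`, and capping the read at `length` bytes.
impl<R: ParquetReader> ChunkReader for R {
    type T = Take<R>;

    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T> {
        let mut reader = self.try_clone()?;
        reader.seek(SeekFrom::Start(start))?;
        Ok(reader.take(length as u64))
    }
}

// Example source: an in-memory cursor. Cloning it copies the buffer, so each
// chunk reader gets its own independent position.
impl Length for Cursor<Vec<u8>> {
    fn len(&self) -> u64 {
        self.get_ref().len() as u64
    }
}

impl TryClone for Cursor<Vec<u8>> {
    fn try_clone(&self) -> io::Result<Self> {
        Ok(self.clone())
    }
}
```

With a `Cursor`, the original reader is left untouched by `get_read`. A `std::fs::File` behaves differently: `File::try_clone` returns a handle that shares the same underlying file offset, so two chunk readers reading in an interleaved fashion would move each other's position. Guarding against that (for example by seeking again before every read, as FileSource appears to do) is part of the extra complexity discussed above.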