Hi Mahmut,

The way to implement sources for Parquet has changed: you now implement
the ChunkReader trait. This is simpler (fewer methods to implement) and
more efficient (you have more information about the upcoming bytes that
will be read). ParquetReader has been made private because it is mostly
relevant in combination with FileSource, which is also private (
https://github.com/apache/arrow/pull/8300#issuecomment-707712589). I guess
we could even have removed it and made FileSource specific to File.
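
To give an idea of what implementing ChunkReader for a custom source could
look like, here is a minimal sketch. It does not depend on the parquet
crate: Length and ChunkReader below are local stand-ins modeled on the
crate's traits (an assumption, so check the real definitions, which use the
crate's own Result type), and InMemorySource and read_chunk are
hypothetical names used only for illustration.

```rust
use std::io::{self, Cursor, Read};

// Local stand-ins modeled on the parquet crate's `Length` and `ChunkReader`
// traits. Assumption: the real definitions may differ in detail.
pub trait Length {
    /// Total size of the source in bytes.
    fn len(&self) -> u64;
}

pub trait ChunkReader: Length {
    type T: Read;
    /// Returns a reader over `length` bytes starting at offset `start`.
    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T>;
}

// Hypothetical custom source: a Parquet file already held in memory.
pub struct InMemorySource {
    data: Vec<u8>,
}

impl InMemorySource {
    pub fn new(data: Vec<u8>) -> Self {
        InMemorySource { data }
    }
}

impl Length for InMemorySource {
    fn len(&self) -> u64 {
        self.data.len() as u64
    }
}

impl ChunkReader for InMemorySource {
    type T = Cursor<Vec<u8>>;

    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T> {
        let start = start as usize;
        // Reject chunks that would run past the end of the buffer.
        let end = start
            .checked_add(length)
            .filter(|&e| e <= self.data.len())
            .ok_or_else(|| {
                io::Error::new(io::ErrorKind::UnexpectedEof, "chunk out of bounds")
            })?;
        // The caller states up front exactly which bytes it needs, so the
        // source can hand back just that slice.
        Ok(Cursor::new(self.data[start..end].to_vec()))
    }
}

// Hypothetical helper: read one chunk fully into a Vec.
pub fn read_chunk<R: ChunkReader>(
    source: &R,
    start: u64,
    length: usize,
) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(length);
    source.get_read(start, length)?.read_to_end(&mut buf)?;
    Ok(buf)
}
```

With this shape, the source only has to report its length and serve byte
ranges. For example, read_chunk(&source, 0, 4) returns the first four bytes
of the buffer.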

Are you sure that making just ParquetReader public would be sufficient to
make your current code compatible? SerializedFileReader does not work with
that trait any more, so I doubt this would solve your problem. You would
also need to expose FileSource and then implement ChunkReader for it
(similar to the implementation of ChunkReader for File), or make the
implementation of ChunkReader for File generic over the ParquetReader trait
instead (impl<T: ParquetReader> ChunkReader for T). I find this brings in
quite a bit of complexity! Is there a use case where you are not reading
from the file system and you really benefit from going through FileSource?
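
For reference, the generic option could look roughly like the sketch below.
All traits here are local stand-ins modeled on the parquet crate (an
assumption, the real definitions may differ), and the sketch skips
FileSource and its buffering entirely: it simply clones the reader, seeks,
and limits the read.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

// Local stand-ins modeled on the parquet crate's traits. Assumption: the
// real definitions may differ (for example, in the Result type they use).
pub trait Length {
    fn len(&self) -> u64;
}

pub trait TryClone: Sized {
    fn try_clone(&self) -> io::Result<Self>;
}

pub trait ParquetReader: Read + Seek + Length + TryClone {}
impl<T: Read + Seek + Length + TryClone> ParquetReader for T {}

pub trait ChunkReader: Length {
    type T: Read;
    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T>;
}

// The generic implementation: anything that is a ParquetReader becomes a
// ChunkReader. Each chunk gets its own clone of the underlying reader.
impl<R: ParquetReader> ChunkReader for R {
    type T = io::Take<R>;

    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T> {
        let mut reader = self.try_clone()?;
        reader.seek(SeekFrom::Start(start))?;
        // Limit the reader to the requested number of bytes.
        Ok(reader.take(length as u64))
    }
}

// Example source for illustration only: an in-memory cursor standing in
// for a custom, non-file data source.
impl Length for Cursor<Vec<u8>> {
    fn len(&self) -> u64 {
        self.get_ref().len() as u64
    }
}

impl TryClone for Cursor<Vec<u8>> {
    fn try_clone(&self) -> io::Result<Self> {
        Ok(self.clone())
    }
}
```

Note that a blanket implementation like this has to replace the direct
implementation for File rather than sit next to it, since Rust rejects
overlapping implementations, and every custom source still has to provide
Read, Seek, Length and TryClone.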

Before opening this PR, can you quickly look at how complex it would be to
change your custom sources to implement ChunkReader? I think it might be a
lot easier than you think! :-)

Remi

On Wed, 11 Nov 2020 at 14:14, vertexclique vertexclique <
vertexcli...@gmail.com> wrote:

> Hi All;
>
> I have previously implemented different data sources (privately) on top of
> ParquetReader, but with the latest changes (esp.
> https://github.com/apache/arrow/pull/8300/files#diff-0b220b2d327afc583fd75b2d3c52901e628026a11cfa694ffc252ffd45fb6db0L20
> ) the ParquetReader trait has been orphaned. Is this intentional or is it
> temporary? It prevents layering traits on top of it to implement
> different data sources.
>
> This change kind of blocks us at Signavio from moving to the latest Arrow
> nightly. It would be nice to resolve this together so we can adapt the
> parquet and arrow.
>
> Best,
> Mahmut Bulut (vertexclique)
>
