Hi,
traditionally, you would do a join, but that would mean reading all Parquet
files that might contain relevant data, which might be too expensive.
If you want to read data from within a user function (like GroupReduce),
you are pretty much on your own.
You could create a HadoopInputFormat wrapping
Hello,
I will briefly describe my use case in steps for easier understanding:
1) Currently my job loads data from Parquet files using HadoopInputFormat
along with AvroParquetInputFormat. The current approach:

AvroParquetInputFormat inputFormat = new AvroParquetInputFormat();
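
For reference, a minimal sketch of the setup described in step 1) might look roughly like the following, assuming the flink-hadoop-compatibility module and parquet-avro are on the classpath; the input path is a placeholder, not from the original mail:

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class ParquetReadSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Configure the Hadoop job with the Parquet input path
        // ("hdfs:///data/events" is a placeholder).
        Job job = Job.getInstance();
        FileInputFormat.addInputPath(job, new Path("hdfs:///data/events"));

        // Wrap parquet-avro's AvroParquetInputFormat in Flink's
        // HadoopInputFormat; the key type is Void, the value is the
        // deserialized Avro GenericRecord.
        HadoopInputFormat<Void, GenericRecord> hadoopInput =
                new HadoopInputFormat<>(
                        new AvroParquetInputFormat<GenericRecord>(),
                        Void.class, GenericRecord.class, job);

        DataSet<Tuple2<Void, GenericRecord>> records = env.createInput(hadoopInput);
        records.first(10).print();
    }
}
```

This reads each Parquet file split through the Hadoop input format bridge, so all files matched by the input path are scanned, which is exactly the cost the reply above warns about when the relevant data is only a small subset.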