Re: How to use org.apache.hadoop.mapreduce.lib.input.MultipleInputs in Flink

Fabian Hueske Sat, 17 Jan 2015 11:47:04 -0800

Why don't you just create two data sources that each wrap the ParquetFormat
in a HadoopInputFormat, and join them, as is done for example in the TPC-H Q3
example [1]?
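In Flink terms, the suggestion above amounts to two `env.createInput(...)` sources, each backed by a `HadoopInputFormat` that wraps the Parquet input format, followed by `left.join(right).where(...).equalTo(...)`. The optimizer then picks a physical strategy, such as a hash or a sort-merge join. The Java below is a dependency-free sketch of the join semantics only. It is not Flink's API and not the code of the TPC-H Q3 example. It joins two independently read inputs with a simple hash join, and the `Row` type, the class name and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Dependency-free illustration of an inner equi-join between two inputs that
// were read independently (e.g. two Parquet tables, each through its own input
// format). NOT Flink code: Row, TwoSourceJoin and the sample data are hypothetical.
class TwoSourceJoin {

    // Hypothetical row type: a join key plus an opaque payload.
    static final class Row {
        final long key;
        final String payload;

        Row(long key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // Inner equi-join on Row.key as a simple hash join: build a hash table over
    // the left input, then probe it with every row of the right input.
    // Each matching pair is emitted as "leftPayload|rightPayload".
    static List<String> join(List<Row> left, List<Row> right) {
        // Build phase: group left rows by key (keys may repeat).
        Map<Long, List<Row>> table = new HashMap<>();
        for (Row l : left) {
            table.computeIfAbsent(l.key, k -> new ArrayList<>()).add(l);
        }
        // Probe phase: one output record per matching (left, right) pair.
        List<String> out = new ArrayList<>();
        for (Row r : right) {
            List<Row> matches = table.get(r.key);
            if (matches == null) {
                continue; // no partner on the left: dropped, as in an inner join
            }
            for (Row l : matches) {
                out.add(l.payload + "|" + r.payload);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-ins for the two separately loaded tables.
        List<Row> orders = Arrays.asList(new Row(1, "order-1"), new Row(2, "order-2"));
        List<Row> items = Arrays.asList(
                new Row(1, "item-a"), new Row(1, "item-b"), new Row(3, "item-c"));
        System.out.println(join(orders, items)); // prints [order-1|item-a, order-1|item-b]
    }
}
```

Running `main` prints the two matching pairs for key 1. The rows with keys 2 and 3 have no partner on the other side and are dropped. Neither input needs to know about the other until the join, which is the point of using two separate sources instead of a combined multiple-inputs format.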


I have always found the MultipleInputFormat to be an ugly workaround for
Hadoop's inability to read data from multiple sources.
AFAIK, Hadoop's MultipleInputFormat does not provide any data colocation that a
join could exploit. Or is there some other beneficial property that I am not
aware of?

[1]
https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/TPCHQuery3.java



2015-01-17 20:15 GMT+01:00 Felix Neutatz <neut...@googlemail.com>:

> Hi,
>
> is there any example which shows how I can load several files with
> different Hadoop input formats at once? My use case is that I want to load
> two tables (in Parquet format) via Hadoop and join them within Flink.
>
> Best regards,
>
> Felix
>
