Hi Felipe, The examples are pretty old (6 years), hence they still use DataSet.
You should be fine by mostly replacing sources with file sources (no need to write your own source, except you want to generators) and using global windows for joining. However, why not use SQL for TPC-H? We have an e2e test [1], where some TPC-H queries are used (in slightly modified form) [2]. We also have TPC-DS queries as e2e tests [3]. [1] https://github.com/apache/flink/tree/master/flink-end-to-end-tests/flink-tpch-test [2] https://github.com/apache/flink/tree/master/flink-end-to-end-tests/test-scripts/test-data/tpch/modified-query [3] https://github.com/apache/flink/tree/master/flink-end-to-end-tests/flink-tpcds-test On Mon, Jun 22, 2020 at 12:35 PM Felipe Gutierrez < felipe.o.gutier...@gmail.com> wrote: > Hi all, > > I would like to create some data stream queries tests using the TPC-H > benchmark. I saw that there are some examples of TPC Q3[1] and Q10[2], > however, they are using DataSet. If I consider creating these queries > but using DataStream what are the caveats that I have to ensure when > implementing the source function? I mean, the frequency of emitting > items is certainly the first. I suppose that I would change the > frequency of the workload globally for all data sources. Is only it or > do you have other things to consider? > > [1] > https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/relational/TPCHQuery3.java > [2] > https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/relational/TPCHQuery10.java > > Thanks, > Felipe > -- > -- Felipe Gutierrez > -- skype: felipe.o.gutierrez > -- https://felipeogutierrez.blogspot.com > -- Arvid Heise | Senior Java Developer <https://www.ververica.com/> Follow us @VervericaData -- Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference Stream Processing | Event Driven | Real Time -- Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany -- Ververica GmbH Registered at Amtsgericht Charlottenburg: HRB 158244 B Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng