Flink Table Duplicate Evaluation

Niklas Teichmann Tue, 20 Nov 2018 08:13:37 -0800

Hi everybody,

I have a question concerning the Flink Table API, more precisely the way the results of table statements are evaluated. In the following code example, the statement defining the table t1 is evaluated twice, an effect that leads to both performance and logic issues in the program I am trying to write.


List<Long> longList = Arrays.asList(1L, 2L, 3L, 4L, 5L);

DataSet<Long> longDataSet = getExecutionEnvironment().fromCollection(longList);


tenv.registerDataSet("longs", longDataSet, "l");

tenv.registerFunction("time", new Time()); // an example UDF that evaluates the current time


Table t1 = tenv.scan("longs");
t1 = t1.select("l, time() as t");

Table t2 = t1.as("l1, id1");
Table t3 = t1.as("l2, id2");

Table t4 = t2.join(t3).where("l1 == l2");

t4.writeToSink(new PrintTableSink()); // a sink that prints the content of the table

I realize that this behaviour is defined in the documentation ("A registered Table is treated similarly to a VIEW ...") and probably stems from the DataStream API. But is there a preferred way to avoid this?

Currently I'm using a workaround that defines a TableSink which in turn registers its output as a new table. That seems extremely hacky though.
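
The effect can be illustrated without Flink. The following is a minimal plain-Java sketch, not Flink code: the class ViewVsMaterialized and the helpers fakeTime, evaluate, join, joinAsView and joinMaterialized are hypothetical names made up for this illustration. It assumes that "treated like a VIEW" means the definition of t1 is re-run for every reference to it, and contrasts that with evaluating the definition once and reusing the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ViewVsMaterialized {

    // Counts invocations of the stand-in UDF.
    public static final AtomicInteger udfCalls = new AtomicInteger();

    // Hypothetical stand-in for time(): yields a different value on every call.
    public static long fakeTime() {
        return udfCalls.incrementAndGet();
    }

    // Stand-in for "select l, time() as t": one {l, t} row per input value.
    public static List<long[]> evaluate(List<Long> longs) {
        List<long[]> rows = new ArrayList<>();
        for (long l : longs) {
            rows.add(new long[] {l, fakeTime()});
        }
        return rows;
    }

    // Nested-loop equi-join on the first column; emits {l, tLeft, tRight}.
    public static List<long[]> join(List<long[]> left, List<long[]> right) {
        List<long[]> out = new ArrayList<>();
        for (long[] a : left) {
            for (long[] b : right) {
                if (a[0] == b[0]) {
                    out.add(new long[] {a[0], a[1], b[1]});
                }
            }
        }
        return out;
    }

    // View-like: the definition is re-run for each side of the join.
    public static List<long[]> joinAsView(List<Long> longs) {
        Supplier<List<long[]>> t1 = () -> evaluate(longs);
        return join(t1.get(), t1.get());
    }

    // Materialized: the definition runs once and the result is reused.
    public static List<long[]> joinMaterialized(List<Long> longs) {
        List<long[]> t1 = evaluate(longs);
        return join(t1, t1);
    }

    public static void main(String[] args) {
        List<Long> longs = Arrays.asList(1L, 2L, 3L, 4L, 5L);

        joinAsView(longs);
        System.out.println("view-like UDF calls: " + udfCalls.getAndSet(0));

        joinMaterialized(longs);
        System.out.println("materialized UDF calls: " + udfCalls.getAndSet(0));
    }
}
```

In this sketch the view-like variant invokes the stand-in UDF ten times for five input rows, and the two sides of each joined row carry different values. The materialized variant invokes it five times, and both sides agree. That is the logic issue in miniature: a non-deterministic function evaluated once per reference produces results that no longer match across the join.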


Sorry if I missed something obvious!

All the best,
Niklas

