Re: Union of streams performance issue (10x)

2019-07-23 Thread Fabian Hueske
Hi Peter, the performance drop is probably due to de/serialization. When tasks are chained, records are simply forwarded as Java objects via method calls. When a task chain is broken into multiple operators, the records (Java objects) are serialized by the sending task, possibly shipped over the
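
To make the difference between the two hand-over paths concrete, here is a minimal plain-JDK sketch. It is not Flink code: the `ChainingSketch` class, the `Event` type, and both `forward...` methods are hypothetical names invented for illustration. It also uses standard Java serialization as a stand-in, whereas Flink uses its own type serializers, so it only shows the shape of the extra work, not its actual cost.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ChainingSketch {

    // Hypothetical record type standing in for whatever the pipeline processes.
    static final class Event implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        final String payload;

        Event(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // "Chained" hand-over: the downstream function receives the same object
    // reference through a plain method call. No bytes are produced.
    static Event forwardChained(Event e) {
        return e;
    }

    // "Unchained" hand-over: the sender writes the record to bytes and the
    // receiver rebuilds a new object from them.
    // Assumption: Java serialization is used here only as a stand-in.
    static Event forwardSerialized(Event e) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(e);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Event) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Event e = new Event(1L, "hello");
        System.out.println(forwardChained(e) == e);    // prints "true": same instance
        System.out.println(forwardSerialized(e) == e); // prints "false": a rebuilt copy
    }
}
```

The second path allocates a buffer, encodes every field, and constructs a fresh object for each record, which is the kind of per-record overhead the reply attributes the slowdown to.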

Union of streams performance issue (10x)

2019-07-13 Thread Peter Zende
Hi all, we have a pipeline (runs on YARN, Flink v1.7.1) which consumes a union of Kafka and HDFS sources. We noticed that the throughput is 10 times higher if only one of these sources is consumed. While trying to identify the problem, I implemented a no-op source which was unioned with one of the