Re: Disk full problem faced due to the Flink tmp directory contents

2019-07-10 Thread Fabian Hueske
Hi, AFAIK Flink should remove temporary files automatically once they are no longer needed. However, I'm not 100% sure that there are no corner cases in which files are left behind, for example when a TaskManager (TM) crashes. In general, it is a good idea to explicitly configure the directories that Flink uses for spilling, logging, blob storage, etc.
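
As a rough illustration of the directory settings mentioned above, a flink-conf.yaml fragment might look like the following. The option keys are existing Flink configuration options; the paths are hypothetical placeholders and would need to point at volumes with enough free space for your jobs.

```yaml
# flink-conf.yaml (fragment). All paths are hypothetical placeholders.

# Temporary files, including data spilled to disk by batch operators
io.tmp.dirs: /data/flink/tmp

# Local storage directory used by the blob server and blob caches
blob.storage.directory: /data/flink/blobs

# Directory where Flink writes its log files
env.log.dir: /var/log/flink
```

Keeping these directories off the root or /tmp volume should at least confine a runaway spill to a dedicated disk instead of filling the system disk.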

Re: Disk full problem faced due to the Flink tmp directory contents

2019-07-10 Thread Ken Krugler
Hi Konstantinos, Typically the data that you are seeing is from records being spilled to disk during groupBy/join operations, where the size of one data set (or several, in the join case) exceeds what will fit in memory. And yes, these files can get big, e.g. as big as the sum of your input

Disk full problem faced due to the Flink tmp directory contents

2019-07-10 Thread Papadopoulos, Konstantinos
Hi all, We are developing several batch processing applications using the DataSet API of Apache Flink. Currently, we are facing an issue in one of our production environments, where disk usage has increased enormously. After a quick investigation, we concluded that the /tmp/flink-i