Hi,
AFAIK Flink should remove temporary files automatically when they are no
longer needed.
However, I'm not 100% sure that there are no corner cases, for example when
a TM crashes.
In general it is a good idea to properly configure the directories that
Flink uses for spilling, logging, blob storage, etc.
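As a rough sketch, the relevant entries in flink-conf.yaml could look like the following. The paths are only placeholders I made up, and the key names assume a reasonably recent Flink version (older releases used taskmanager.tmp.dirs instead of io.tmp.dirs), so please check them against the configuration docs for your version.

```yaml
# Sketch only: all paths below are hypothetical placeholders.
# Directories for temporary files, including spilled records
# (assumed key name; older releases used taskmanager.tmp.dirs).
io.tmp.dirs: /data/flink/tmp
# Local directory used by the blob server and blob caches.
blob.storage.directory: /data/flink/blobs
# Directory where Flink writes its log files.
env.log.dir: /var/log/flink
```

Pointing these at a dedicated volume instead of /tmp makes it easier to monitor disk usage and to clean up leftovers after a crash.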
Hi Konstantinos,
Typically, the data you are seeing comes from records being spilled to disk
during groupBy/join operations, when one data set (or several, in the join
case) is larger than what fits in memory.
And yes, these files can get big, e.g. as big as the sum of your input
Hi all,
We are developing several batch processing applications using the DataSet API
of Apache Flink.
We are currently facing an issue with one of our production environments,
whose disk usage has increased enormously. After a quick
investigation, we concluded that the /tmp/flink-i