Hi Yang, you are right. Since then, I have looked for open files and found *.out/*.err files on that partition, and, as you mentioned, they don't roll. I could implement a workaround that restarts the streaming job every week or so, but I really don't want to go that way.
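For what it's worth, the "open but deleted file still reserves space" behaviour described in this thread is easy to demonstrate on Linux. This is only an illustrative sketch (the temp file and descriptor number are arbitrary); in practice `lsof +L1` on the affected mount point lists such unlinked-but-open files directly.

```shell
# Sketch (Linux only): a deleted file keeps using disk space while a
# file descriptor on it remains open.
tmp=$(mktemp)
exec 3>"$tmp"                        # hold file descriptor 3 open on the file
rm "$tmp"                            # unlink it: the name disappears...
ls -l "/proc/$$/fd" | grep deleted   # ...but the fd is still listed as "(deleted)"
exec 3>&-                            # closing the descriptor finally frees the space
```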
I tried to forward the logs to files, which I could then roll, but then I don't see the logs in the GUI. So my question would be: how do I make them roll?

Regards,
Maxim.

On Tue, Aug 4, 2020 at 4:48 AM Yang Wang <danrtsey...@gmail.com> wrote:
> Hi Maxim,
>
> First, I want to confirm: have you checked all the
> "yarn.nodemanager.log-dirs"? If you can access the logs in the Flink web UI,
> the log files (e.g. taskmanager.log, taskmanager.out, taskmanager.err)
> should exist. I suggest double-checking all of the configured log-dirs.
>
> Since the *.out/*.err files do not roll, if you print any user logs to
> stdout/stderr, those two files will grow over time.
>
> When you stop the Flink application, Yarn cleans up all the jars and
> logs, which is why you see the disk space come back.
>
>
> Best,
> Yang
>
> Maxim Parkachov <lazy.gop...@gmail.com> wrote on Thu, Jul 30, 2020 at 10:00 PM:
>
>> Hi everyone,
>>
>> I have a strange issue with Flink logging. I use a pretty much standard
>> log4j config, which writes to standard output so that the logs show up in
>> the Flink GUI. Deployment is on YARN in job mode. I can see the logs in the
>> UI, no problem. On the servers where the Flink YARN containers run, there is
>> a disk quota on the partition where YARN normally creates logs. I see no
>> specific files in the application_xx directory, but free space on the disk
>> is actually decreasing over time. After several weeks we eventually hit the
>> quota. It seems like some file or pipe is created but not closed, yet still
>> reserves the space. After I restart the Flink job, the space is
>> immediately returned. I'm sure the Flink job is the problem; I have
>> reproduced the issue on a cluster where only one Flink job was running.
>> Below is my log4j config. Any help or idea is appreciated.
>>
>> Thanks in advance,
>> Maxim.
>> -------------------------------------------
>> # This affects logging for both user code and Flink
>> log4j.rootLogger=INFO, file, stderr
>>
>> # Uncomment this if you want to _only_ change Flink's logging
>> #log4j.logger.org.apache.flink=INFO
>>
>> # The following lines keep the log level of common libraries/connectors on
>> # log level INFO. The root logger does not override this. You have to manually
>> # change the log levels here.
>> log4j.logger.akka=INFO
>> log4j.logger.org.apache.kafka=INFO
>> log4j.logger.org.apache.hadoop=INFO
>> log4j.logger.org.apache.zookeeper=INFO
>>
>> # Log all infos in the given file
>> log4j.appender.file=org.apache.log4j.FileAppender
>> log4j.appender.file.file=${log.file}
>> log4j.appender.file.append=false
>> log4j.appender.file.layout=org.apache.log4j.PatternLayout
>> log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
>>
>> # Suppress the irrelevant (wrong) warnings from the Netty channel handler
>> log4j.logger.org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline=ERROR, file
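One way to make the file logs roll, as a sketch only (the thread does not confirm this fix): replace the plain FileAppender in the config above with log4j 1.x's RollingFileAppender. The MaxFileSize and MaxBackupIndex values below are illustrative placeholders. Note that this only rolls the appender's own file (taskmanager.log); it cannot roll the *.out/*.err files, which YARN produces by redirecting the container's stdout/stderr.

```properties
# Hedged sketch: roll the file appender instead of writing one unbounded file.
# MaxFileSize / MaxBackupIndex are placeholders; tune them to your disk quota.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.file=${log.file}
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```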