AFAIK, there is no way to roll the *.out/*.err files unless we hijack stdout/stderr in the Flink code. However, that is only a temporary hack.
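
For illustration only, hijacking stdout/stderr as mentioned above might look roughly like the sketch below. It is not Flink code: the names StdStreamRedirect, LineForwardingStream, printStreamTo and install are hypothetical, and the sink is a plain Consumer<String> standing in for a logger that writes to a rolling file. Such a sink must not write back to the console, or the redirect would feed itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.util.function.Consumer;

/** Hypothetical helper: forwards whatever is printed to a stream, line by line, to a sink. */
final class StdStreamRedirect {

    /** Buffers bytes until a newline arrives, then hands the completed line to the sink. */
    static final class LineForwardingStream extends OutputStream {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        private final Consumer<String> sink;

        LineForwardingStream(Consumer<String> sink) {
            this.sink = sink;
        }

        @Override
        public synchronized void write(int b) {
            if (b == '\n') {
                emit();
            } else if (b != '\r') { // drop carriage returns from "\r\n" line endings
                buffer.write(b);
            }
        }

        @Override
        public synchronized void close() {
            // Forward a trailing partial line, if any.
            if (buffer.size() > 0) {
                emit();
            }
        }

        private void emit() {
            // Decode with the default charset, matching the PrintStream created below.
            String line = new String(buffer.toByteArray(), Charset.defaultCharset());
            buffer.reset();
            sink.accept(line);
        }
    }

    /** Wraps a sink in an auto-flushing PrintStream usable with System.setOut/System.setErr. */
    static PrintStream printStreamTo(Consumer<String> sink) {
        return new PrintStream(new LineForwardingStream(sink), true);
    }

    /** Replaces the JVM-wide stdout and stderr; call once, early in the process. */
    static void install(Consumer<String> outSink, Consumer<String> errSink) {
        System.setOut(printStreamTo(outSink));
        System.setErr(printStreamTo(errSink));
    }
}
```

With a logger whose appender rolls, the call would be something like StdStreamRedirect.install(log::info, log::error), where log is whatever logger object the job already has (an assumption here). After that, nothing new should reach the *.out/*.err files from Java code, though output written by native code or before the call would still land there.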
A better approach is to write your logs to separate files that log4j can roll. If you want to access them in the Flink web UI, upgrade to version 1.11. You will then find a "Log List" tab under the JobManager sidebar.

Best,
Yang

Maxim Parkachov <lazy.gop...@gmail.com> wrote on Tue, Aug 4, 2020 at 2:52 PM:

> Hi Yang,
>
> you are right. Since then, I looked for open files and found *.out/*.err
> files on that partition and, as you mentioned, they don't roll.
> I could implement a workaround to restart the streaming job every week or
> so, but I really don't want to go this way.
>
> I tried to forward logs to files and then I could roll them, but then I
> don't see logs in the GUI.
>
> So my question would be: how do I make them roll?
>
> Regards,
> Maxim.
>
> On Tue, Aug 4, 2020 at 4:48 AM Yang Wang <danrtsey...@gmail.com> wrote:
>
>> Hi Maxim,
>>
>> First, I want to confirm with you whether you have checked all the
>> "yarn.nodemanager.log-dirs". If you can access the logs in the Flink
>> web UI, the log files (e.g. taskmanager.log, taskmanager.out,
>> taskmanager.err) should exist. I suggest double-checking the multiple
>> log-dirs.
>>
>> Since the *.out/*.err files do not roll, if you print some user logs to
>> stdout/stderr, the two files will grow over time.
>>
>> When you stop the Flink application, YARN cleans up all the jars and
>> logs, which is why you see the disk space come back.
>>
>> Best,
>> Yang
>>
>> Maxim Parkachov <lazy.gop...@gmail.com> wrote on Thu, Jul 30, 2020 at 10:00 PM:
>>
>>> Hi everyone,
>>>
>>> I have a strange issue with Flink logging. I use a pretty much standard
>>> log4j config, which writes to standard output in order to see it in the
>>> Flink GUI. Deployment is on YARN in job mode. I can see logs in the UI,
>>> no problem. On the servers where the Flink YARN containers are running,
>>> there is a disk quota on the partition where YARN normally creates logs.
>>> I see no specific files in the application_xx directory, but space on
>>> the disk is actually decreasing with time. After several weeks we
>>> eventually hit the quota. It seems like some file or pipe is created but
>>> not closed, and still reserves the space. After I restart the Flink job,
>>> the space is immediately returned. I'm sure that the Flink job is the
>>> problem; I have reproduced the issue on a cluster where only 1 Flink job
>>> was running. Below is my log4j config. Any help or idea is appreciated.
>>>
>>> Thanks in advance,
>>> Maxim.
>>> -------------------------------------------
>>> # This affects logging for both user code and Flink
>>> log4j.rootLogger=INFO, file, stderr
>>>
>>> # Uncomment this if you want to _only_ change Flink's logging
>>> #log4j.logger.org.apache.flink=INFO
>>>
>>> # The following lines keep the log level of common libraries/connectors on
>>> # log level INFO. The root logger does not override this. You have to manually
>>> # change the log levels here.
>>> log4j.logger.akka=INFO
>>> log4j.logger.org.apache.kafka=INFO
>>> log4j.logger.org.apache.hadoop=INFO
>>> log4j.logger.org.apache.zookeeper=INFO
>>>
>>> # Log all infos in the given file
>>> log4j.appender.file=org.apache.log4j.FileAppender
>>> log4j.appender.file.file=${log.file}
>>> log4j.appender.file.append=false
>>> log4j.appender.file.layout=org.apache.log4j.PatternLayout
>>> log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
>>>
>>> # Suppress the irrelevant (wrong) warnings from the Netty channel handler
>>> log4j.logger.org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline=ERROR, file