[ https://issues.apache.org/jira/browse/FLINK-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
SUBRAMANYA SURESH updated FLINK-9471: ------------------------------------- Description: We are using Flink SQL, I see job ending logs that are logged at info level, that makes it very hard for me to tune out the Info messages in the configuration. Note: If I do end up using Info, the same executionGraph logs the entire query for the operationGraph for every info statement, and this fills up the logs easily if we have say 100-200 queries. Note the "-" below indicate an entire line of execution graph for this query (redacted for privacy). 018-03-30 03:32:09,942 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> - - - - - (208/725).} at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:948) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator Source: Custom Source -> (- - - - - ) (208/725). ... 6 more Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Could not flush and close the file system output stream to hdfs://security-temp/savedSearches/checkpoint/561eb649376bef2f2d8daa1e3a0fa6db/chk-1/067924e4-c861-4de1-823e-b255a0bf9998 in order to obtain the stream state handle at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:892) was: We are using Flink SQL, I see job ending logs that are logged at info level, that makes it very hard for me to tune out the Info messages in the configuration. Note: If I do end up using Info, the same executionGraph logs the entire query for the operationGraph for every info statement, and this fills up the logs easily if we have say 100-200 queries. Note the "-" below indicate an entire line of execution graph for this query (redacted for privacy). 2018-03-30 03:32:09,943 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Discarding checkpoint 1 because: java.lang.Exception: Could not materialize checkpoint 1 for operator Source: Custom Source -> (Map -> where: (AND(=- - - - -.} at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:948) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator Source: Custom Source -> (Map -> where: (AND(=(Environment, _UTF-16LE'SFDC-IT'), =(RuleMatch, _UTF-16LE'SFA'), =(LogType, _UTF-16LE'SAML-AUTH'), =(Outcome, _UTF-16LE'DENY'))), select: (proctime, CAST(_UTF-16LE'SFDC-IT') AS Environment, CollectedTimestamp, EventTimestamp, _raw, Aggregator), Map -> where: (AND(=(Environment, _UTF-16LE'SFDC-IT'), =(RuleMatch, _UTF-16LE'SFA'), =(LogType, _UTF-16LE'SAML-AUTH'), =(Outcome, _UTF-16LE'DENY'))), select: (proctime, CAST(_UTF-16LE'SFDC-IT') AS Environment, CollectedTimestamp, EventTimestamp, _raw, Aggregator)) (353/725). ... 6 more Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Could not flush and close the file system output stream to hdfs://security-temp/savedSearches/checkpoint/561eb649376bef2f2d8daa1e3a0fa6db/chk-1/31b94717-9e6d-49b8-b64d-2a1a8ba04425 in order to obtain the stream state handle at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:892) ... 5 more Suppressed: java.lang.Exception: Could not properly cancel managed operator state future. at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:99) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:976) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939) ... 5 more > Job ending exceptions being logged at Info level > ------------------------------------------------ > > Key: FLINK-9471 > URL: https://issues.apache.org/jira/browse/FLINK-9471 > Project: Flink > Issue Type: Bug > Affects Versions: 1.4.2 > Reporter: SUBRAMANYA SURESH > Priority: Major > > We are using Flink SQL, I see job ending logs that are logged at info level, > that makes it very hard for me to tune out the Info messages in the > configuration. Note: If I do end up using Info, the same executionGraph logs > the entire query for the operationGraph for every info statement, and this > fills up the logs easily if we have say 100-200 queries. > Note the "-" below indicate an entire line of execution graph for this query > (redacted for privacy). > 018-03-30 03:32:09,942 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: > Custom Source -> - > - > - > - > - (208/725).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:948) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 1 for > operator Source: Custom Source -> (- > - > - > - > - > ) (208/725). > ... 6 more > Caused by: java.util.concurrent.ExecutionException: java.io.IOException: > Could not flush and close the file system output stream to > hdfs://security-temp/savedSearches/checkpoint/561eb649376bef2f2d8daa1e3a0fa6db/chk-1/067924e4-c861-4de1-823e-b255a0bf9998 > in order to obtain the stream state handle > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:892) > -- This message was sent by Atlassian JIRA (v7.6.3#76005)