Some additional information: the jobmanager log shows the following exceptions:

2021-02-01 08:54:43,639 ERROR
org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception
occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:44,642 ERROR
org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception
occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:45,644 ERROR
org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception
occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:46,647 ERROR
org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception
occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:47,649 ERROR
org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception
occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:48,652 ERROR
org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception
occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:49,655 ERROR
org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception
occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:50,658 ERROR
org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception
occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:50,921 INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering
checkpoint 8697 (type=CHECKPOINT) @ 1612169690917 for job
1299f2f27e56ec36a4e0ffd3472ad399.
2021-02-01 08:54:50,999 INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Decline
checkpoint 8697 by task 320d2c162f17265435777bb65e1a8934 of job
1299f2f27e56ec36a4e0ffd3472ad399 at
container_e21_1596002540781_1159_01_000134 @
ip-10-120-83-22.ap-northeast-1.compute.internal (dataPort=42984).
2021-02-01 08:54:51,661 ERROR
org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception
occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:52,654 INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
GroupWindowAggregate(window=[SlidingGroupWindow('w$, requestDateTime,
1800000, 600000)], properties=[w$start, w$end, w$rowtime, w$proctime],
select=[COUNT(DISTINCT $f1) AS totalCount, start('w$) AS w$start, end('w$)
AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS w$proctime]) ->
Calc(select=[(UNIX_TIMESTAMP((w$start DATE_FORMAT _UTF-16LE'yyyy-MM-dd
HH:mm:ss')) * 1000) AS requestTime, totalCount]) (1/1)
(6beee54a923323c369b046e199f572c4) switched from RUNNING to FAILED on
org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@379a8f9c.
java.io.IOException: Could not perform checkpoint 8697 for operator
GroupWindowAggregate(window=[SlidingGroupWindow('w$, requestDateTime,
1800000, 600000)], properties=[w$start, w$end, w$rowtime, w$proctime],
select=[COUNT(DISTINCT $f1) AS totalCount, start('w$) AS w$start, end('w$)
AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS w$proctime]) ->
Calc(select=[(UNIX_TIMESTAMP((w$start DATE_FORMAT _UTF-16LE'yyyy-MM-dd
HH:mm:ss')) * 1000) AS requestTime, totalCount]) (1/1).
        at
org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:897)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:113)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.io.CheckpointBarrierAligner.processBarrier(CheckpointBarrierAligner.java:137)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:93)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:158)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:351)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:191)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:567)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:536)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_181]
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could
not complete snapshot 8697 for operator
GroupWindowAggregate(window=[SlidingGroupWindow('w$, requestDateTime,
1800000, 600000)], properties=[w$start, w$end, w$rowtime, w$proctime],
select=[COUNT(DISTINCT $f1) AS totalCount, start('w$) AS w$start, end('w$)
AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS w$proctime]) ->
Calc(select=[(UNIX_TIMESTAMP((w$start DATE_FORMAT _UTF-16LE'yyyy-MM-dd
HH:mm:ss')) * 1000) AS requestTime, totalCount]) (1/1). Failure reason:
Checkpoint was declined.
        at
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:215)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:156)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:314)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointStreamOperator(SubtaskCheckpointCoordinatorImpl.java:614)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.buildOperatorSnapshotFutures(SubtaskCheckpointCoordinatorImpl.java:540)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:507)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:266)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$8(StreamTask.java:926)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:916)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:884)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        ... 13 more
Caused by: org.apache.flink.util.SerializedThrowable: While open a file for
appending:
/server/yarn/nm/usercache/yarn/appcache/application_1596002540781_1159/flink-io-1ad6bdc6-aea8-4dc5-a133-7c7b5e2361fe/job_1299f2f27e56ec36a4e0ffd3472ad399_op_AggregateWindowOperator_fa157648fdadffa65122f5b4200f4fda__1_1__uuid_9744ef17-bf12-471c-b486-19140201517f/db/038968.sst:
Too many open files
        at org.rocksdb.Checkpoint.createCheckpoint(Native Method)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at org.rocksdb.Checkpoint.createCheckpoint(Checkpoint.java:51)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.takeDBNativeCheckpoint(RocksIncrementalSnapshotStrategy.java:255)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.doSnapshot(RocksIncrementalSnapshotStrategy.java:159)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.contrib.streaming.state.snapshot.RocksDBSnapshotStrategyBase.snapshot(RocksDBSnapshotStrategyBase.java:126)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:459)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:198)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:156)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:314)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointStreamOperator(SubtaskCheckpointCoordinatorImpl.java:614)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.buildOperatorSnapshotFutures(SubtaskCheckpointCoordinatorImpl.java:540)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:507)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:266)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$8(StreamTask.java:926)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:916)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        at
org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:884)
~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
        ... 13 more
2021-02-01 08:54:52,654 INFO 
org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy
[] - Calculating tasks to restart to recover the failed task
fa157648fdadffa65122f5b4200f4fda_0.
2021-02-01 08:54:52,654 INFO 
org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy
[] - 7 tasks should be restarted to recover the failed task
fa157648fdadffa65122f5b4200f4fda_0. 
2021-02-01 08:54:52,654 INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job
insert-into_default_catalog.default_database.risk_final_accept_sink,default_catalog.default_database.risk_final_accept_grafana_sink
(1299f2f27e56ec36a4e0ffd3472ad399) switched from state RUNNING to
RESTARTING.
2021-02-01 08:54:52,654 INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
GroupWindowAggregate(window=[SlidingGroupWindow('w$, requestDateTime,
1800000, 600000)], properties=[w$start, w$end, w$rowtime, w$proctime],
select=[COUNT(DISTINCT merchantReferenceCode) AS acceptCount, start('w$) AS
w$start, end('w$) AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS
w$proctime]) -> Calc(select=[_UTF-16LE'risk_final_accept_hop10min30min' AS
eventCode, (w$start DATE_FORMAT _UTF-16LE'yyyy-MM-dd HH:mm:ss') AS
timeStart, (w$end DATE_FORMAT _UTF-16LE'yyyy-MM-dd HH:mm:ss') AS timeEnd,
(UNIX_TIMESTAMP((w$start DATE_FORMAT _UTF-16LE'yyyy-MM-dd HH:mm:ss')) *
1000) AS requestTime, _UTF-16LE'0' AS userId, acceptCount]) (1/1)
(52f55328f6bf756dd1c63bb0d149e55b) switched from RUNNING to CANCELING.



