[ https://issues.apache.org/jira/browse/FLINK-32535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weijie Guo updated FLINK-32535:
-------------------------------
    Fix Version/s: 2.1.0
                       (was: 2.0.0)

> CheckpointingStatisticsHandler periodically returns NullArgumentException after job restarts
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-32535
>                 URL: https://issues.apache.org/jira/browse/FLINK-32535
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / REST
>    Affects Versions: 2.1.0
>            Reporter: Hong Liang Teoh
>            Priority: Major
>             Fix For: 2.1.0
>
>
> *What*
> When making requests to /checkpoints REST API after a job restart, we see 500 for a short period of time. We should handle this gracefully in the CheckpointingStatisticsHandler.
>
> *How to replicate*
> * Checkpointing interval 1s
> * Job is constantly restarting
> * Make constant requests to /checkpoints REST API.
> See [here|https://github.com/apache/flink/pull/22901#issuecomment-1617830035] for more info
>
> Stack trace:
> {{org.apache.commons.math3.exception.NullArgumentException: input array}}
> {{ at org.apache.commons.math3.util.MathArrays.verifyValues(MathArrays.java:1753)}}
> {{ at org.apache.commons.math3.stat.descriptive.AbstractUnivariateStatistic.test(AbstractUnivariateStatistic.java:158)}}
> {{ at org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:272)}}
> {{ at org.apache.commons.math3.stat.descriptive.rank.Percentile.evaluate(Percentile.java:241)}}
> {{ at org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics$CommonMetricsSnapshot.getPercentile(DescriptiveStatisticsHistogramStatistics.java:159)}}
> {{ at org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogramStatistics.getQuantile(DescriptiveStatisticsHistogramStatistics.java:53)}}
> {{ at org.apache.flink.runtime.checkpoint.StatsSummarySnapshot.getQuantile(StatsSummarySnapshot.java:108)}}
> {{ at org.apache.flink.runtime.rest.messages.checkpoints.StatsSummaryDto.valueOf(StatsSummaryDto.java:81)}}
> {{ at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.createCheckpointingStatistics(CheckpointingStatisticsHandler.java:133)}}
> {{ at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleCheckpointStatsRequest(CheckpointingStatisticsHandler.java:85)}}
> {{ at org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler.handleCheckpointStatsRequest(CheckpointingStatisticsHandler.java:59)}}
> {{ at org.apache.flink.runtime.rest.handler.job.checkpoints.AbstractCheckpointStatsHandler.lambda$handleRequest$1(AbstractCheckpointStatsHandler.java:62)}}
> {{ at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)}}
> {{ at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)}}
> {{ at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)}}
> {{ at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)}}
> {{ at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)}}
> {{ at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)}}
> {{ at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)}}
> {{ at java.base/java.lang.Thread.run(Thread.java:829)}}
>
> See graphs here for tests. The dips in the green line correspond to the failures immediately after a job restart.
> !https://user-images.githubusercontent.com/35062175/250529297-908a6714-ea15-4aac-a7fc-332589da2582.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
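
For illustration only, here is a minimal sketch of what "handle this gracefully" could look like. It assumes the exception comes from evaluating a quantile on a statistics snapshot that has no samples yet, which the stack trace suggests but the report does not confirm. The class `SafeQuantile` and its method are hypothetical and are not part of Flink or Commons Math. The sketch uses plain linear interpolation between order statistics, which is not necessarily the estimation method `Percentile` uses.

```java
import java.util.Arrays;

/** Hypothetical helper: a quantile lookup that tolerates missing samples. */
final class SafeQuantile {

    private SafeQuantile() {}

    /**
     * Returns the q-quantile (0 <= q <= 1) of the samples using linear
     * interpolation, or NaN when no samples are available yet.
     */
    static double quantile(double[] samples, double q) {
        if (Double.isNaN(q) || q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0, 1], got " + q);
        }
        // Guard: an absent or empty sample array yields NaN instead of an exception.
        if (samples == null || samples.length == 0) {
            return Double.NaN;
        }
        // Sort a copy so the caller's array is left untouched.
        double[] sorted = samples.clone();
        Arrays.sort(sorted);

        // Fractional position of the quantile within the sorted samples.
        double position = q * (sorted.length - 1);
        int lower = (int) Math.floor(position);
        int upper = (int) Math.ceil(position);
        double fraction = position - lower;
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }

    public static void main(String[] args) {
        // No samples yet, e.g. right after a restart (assumed scenario).
        System.out.println(quantile(null, 0.5));
        System.out.println(quantile(new double[] {3.0, 1.0, 2.0}, 0.5));
    }
}
```

With a guard of this kind, a caller building the response could report NaN (or omit the field) for a summary with no data rather than failing the whole request with a 500. Whether NaN is the right placeholder for the REST response is a design decision for the actual fix.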