[ https://issues.apache.org/jira/browse/FLINK-8856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stephan Ewen resolved FLINK-8856.
---------------------------------
    Resolution: Fixed

Fixed in
  - 1.5.0 via 36dccdc520c676ca2c53046cad4ec80c351068b7
  - 1.6.0 via 1e1237628a6bf05cdcdde6f9f3a236d961f05b5d

> Move all interrupt() calls to TaskCanceler
> ------------------------------------------
>
>                 Key: FLINK-8856
>                 URL: https://issues.apache.org/jira/browse/FLINK-8856
>             Project: Flink
>          Issue Type: Bug
>          Components: TaskManager
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>            Priority: Blocker
>             Fix For: 1.5.0, 1.6.0
>
>
> We need this to work around the following JVM bug:
> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8138622
> To circumvent this problem, the {{TaskCancelerWatchDog}} must not call
> {{interrupt()}} at all, but only join on the executing thread (with timeout)
> and cause a hard exit once cancellation takes too long.
> A user affected by this problem reported it in FLINK-8834.
> Personal note: The {{Thread.join(...)}} method is unfortunately not 100% reliable
> either, because it uses {{System.currentTimeMillis()}} rather than
> {{System.nanoTime()}}. Because of that, the wait can take overly long when the
> system clock is adjusted. I wonder why the JDK authors do not follow their own
> recommendation and use {{System.nanoTime()}} for all relative time
> measurements...
> EDIT: I am not the only one wondering why:
> https://stackoverflow.com/questions/42544387/why-does-thread-join-use-currenttimemillis
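
For illustration only, here is a minimal sketch of the watchdog behaviour described in the issue: wait for the executing thread with a timeout, never call interrupt() on it, and escalate once cancellation takes too long. This is not the code from the commits listed above. The class name CancellationWatchdogSketch, the methods awaitTermination and watch, the onTimeout callback, and the 100 ms slice length are all assumptions made up for this example. The deadline is tracked with System.nanoTime(), and each Thread.join(...) call is kept short, which is meant to reduce (not necessarily eliminate) the dependence on the wall clock.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; names and structure are not taken from Flink. */
final class CancellationWatchdogSketch {

    /** Assumed upper bound for a single join() call, in milliseconds. */
    private static final long MAX_SLICE_MILLIS = 100L;

    /**
     * Waits until the target thread has terminated or the timeout has elapsed.
     * The target thread is never interrupted by this method.
     *
     * @return true if the target thread is no longer alive when the method returns
     */
    static boolean awaitTermination(Thread target, long timeoutMillis) {
        // Relative deadline based on the monotonic clock, not the wall clock.
        final long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);

        while (target.isAlive()) {
            long remainingNanos = deadlineNanos - System.nanoTime();
            if (remainingNanos <= 0) {
                break; // timeout reached
            }
            // join(0) would wait forever, so always wait for at least 1 ms.
            long sliceMillis = Math.min(
                    MAX_SLICE_MILLIS,
                    Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
            try {
                target.join(sliceMillis);
            } catch (InterruptedException e) {
                // The watchdog itself was interrupted: restore the flag and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return !target.isAlive();
    }

    /** Runs onTimeout (for example a hard-exit action) if the target does not stop in time. */
    static void watch(Thread target, long timeoutMillis, Runnable onTimeout) {
        if (!awaitTermination(target, timeoutMillis)) {
            onTimeout.run();
        }
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(50); // stands in for a task that finishes its cancellation
            } catch (InterruptedException ignored) {
                // not expected here, since the watchdog never interrupts
            }
        });
        worker.start();

        // A real watchdog might call Runtime.getRuntime().halt(...) in the callback;
        // the demo only prints so that it is safe to run.
        watch(worker, 2_000, () -> System.err.println("cancellation took too long"));
        System.out.println("worker alive after watch: " + worker.isAlive());
    }
}
```

Passing the escalation step in as a Runnable keeps the sketch testable; whether the real watchdog halts the JVM directly or reports a fatal error to some other component is not stated in this message.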