stateStoreCoordinator uses runId to deal with a small chance that Spark cannot turn a bad task down. Please see https://github.com/apache/spark/pull/18355
On Fri, Oct 27, 2017 at 3:40 AM, Jacek Laskowski <ja...@japila.pl> wrote: > Hi, > > I'm wondering why StreamingQueryManager.notifyQueryTermination [1] use a > query id to remove it from the activeQueries internal registry [2] while > notifies stateStoreCoordinator using runId [3]? > > My understanding is that id is the same across different runs of a query > so once StreamingQueryManager removes the query (by its id) it effectively > knows nothing about the query yet stateStoreCoordinator may have other > instances running (since we only deactivated a single run). > > Why is the "inconsistency"? > > [1] https://github.com/apache/spark/blob/master/sql/core/ > src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala# > L325 > > [2] https://github.com/apache/spark/blob/master/sql/core/ > src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala# > L327 > > [3] https://github.com/apache/spark/blob/master/sql/core/ > src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala# > L335 > > Pozdrawiam, > Jacek Laskowski > ---- > https://about.me/JacekLaskowski > Spark Structured Streaming https://bit.ly/spark-structured-streaming > Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski >