It looks like https://issues.apache.org/jira/browse/SPARK-13346
is the same issue. It seems that for some people persist() doesn't work,
and they have to convert to RDDs and back.

On Fri, Apr 14, 2017 at 1:39 PM, Everett Anderson <ever...@nuna.com> wrote:

> Hi,
>
> We keep hitting a situation on Spark 2.0.2 (haven't tested later versions
> yet) where the driver spins forever, seemingly in query plan optimization,
> for moderate queries such as the union of a few (~5) other DataFrames.
>
> We can see the driver spinning with one core in the nioEventLoopGroup-2-2
> thread in a deep stack trace like the attached.
>
> Throwing in a MEMORY_AND_DISK persist() so the query plan is collapsed
> works around this, but it's a little surprising how often we encounter
> the problem, forcing us to manage persisting/unpersisting tables and
> potentially suffer unnecessary disk I/O.
>
> I've looked through JIRA but don't see open issues about this -- I might
> just not have found them.
>
> Has anyone else encountered this?
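
For anyone who wants to try it, here's a minimal sketch of both
workarounds mentioned above (persist() to collapse the plan, and the
RDD round-trip to truncate the lineage). The helper name and the `dfs`
input are hypothetical, just for illustration:

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.storage.StorageLevel

    // Hypothetical helper: dfs is the handful of DataFrames being unioned.
    def unionWithCollapsedPlan(spark: SparkSession,
                               dfs: Seq[DataFrame]): DataFrame = {
      val unioned = dfs.reduce(_ union _)

      // Workaround 1: persist() so downstream queries see a collapsed
      // plan rather than re-optimizing the full union lineage.
      val persisted = unioned.persist(StorageLevel.MEMORY_AND_DISK)

      // Workaround 2 (if persist() alone doesn't help): round-trip
      // through an RDD, which drops the logical plan entirely.
      spark.createDataFrame(persisted.rdd, persisted.schema)
    }

Note that persist() is lazy, so the plan only actually collapses once
the data is materialized by an action (e.g. a count()).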