Hi Stan,

This looks like the same issue we are working to solve. The related PRs are:

https://github.com/apache/spark/pull/16998
https://github.com/apache/spark/pull/16785

You can take a look at those PRs and help review them too. Thanks.
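
As a rough illustration of the plan growth Cheng describes in the quoted thread below, here is a back-of-the-envelope sketch in plain Python (no Spark involved). It assumes each iteration adds one new root node on top of 4 copies of the previous plan, starting from a single-node plan. That recurrence and the helper name plan_size are assumptions made for illustration, not anything taken from Spark itself.

```python
def plan_size(iterations, fan_in=4, base_size=1):
    """Node count of a plan tree after the given number of iterations.

    Assumed model (hypothetical, for illustration only): every iteration
    builds one new root node over `fan_in` copies of the previous plan.
    """
    size = base_size
    for _ in range(iterations):
        size = fan_in * size + 1
    return size


if __name__ == "__main__":
    # Print the assumed node count at a few iteration depths.
    for n in (5, 10, 15):
        print(n, plan_size(n))
```

Under this assumed model the node count passes one billion at 15 iterations, which is in line with the "billions of nodes" figure quoted below.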


StanZhai wrote
> Thanks for Cheng's help.
> 
> 
> Something must be wrong with InferFiltersFromConstraints. I just
> removed InferFiltersFromConstraints from
> org/apache/spark/sql/catalyst/optimizer/Optimizer.scala to work around
> this issue. I will analyze the issue with the method you provided.
> 
> ------------------ Original ------------------
> From: "Cheng Lian [via Apache Spark Developers List]" <ml-node+s1001551n21069...@n3.nabble.com>
> Send time: Friday, Feb 24, 2017 2:28 AM
> To: "Stan Zhai" <m...@zhaishidan.cn>
> Subject: Re: The driver hangs at DataFrame.rdd in Spark 2.1.0
> 
> This one seems to be relevant, but it's already fixed in 2.1.0.
> 
> One way to debug is to turn on trace logging and check how the
> analyzer/optimizer behaves.
> 
> On 2/22/17 11:11 PM, StanZhai wrote:
> 
> Could this be related to
> https://issues.apache.org/jira/browse/SPARK-17733 ?
> 
> ------------------ Original ------------------
> From: "Cheng Lian-3 [via Apache Spark Developers List]" <[hidden email]>
> Send time: Thursday, Feb 23, 2017 9:43 AM
> To: "Stan Zhai" <[hidden email]>
> Subject: Re: The driver hangs at DataFrame.rdd in Spark 2.1.0
> 
> Just from the thread dump you provided, it seems that this
> particular query plan jams our optimizer. However, it's also
> possible that the driver just happened to be running optimizer
> rules at that particular point in time.
> 
> Since query planning doesn't touch any actual data, could you
> please try to minimize this query by replacing the actual
> relations with temporary views derived from Scala local
> collections? That way, it would be much easier for others to
> reproduce the issue.
> 
> Cheng
> 
> On 2/22/17 5:16 PM, Stan Zhai wrote:
> 
> Thanks for Lian's reply.
> 
> Here is the query plan generated by Spark 1.6.2 (I can't
> get it in Spark 2.1.0):
> ...
> 
> ------------------ Original ------------------
> Subject: Re: The driver hangs at DataFrame.rdd in Spark 2.1.0
> 
> What is the query plan? We once observed query plans
> that grow exponentially in iterative ML workloads, causing the
> query planner to hang forever. For example, each iteration
> combines 4 plan trees from the previous iteration to form a
> larger plan tree. The size of the plan tree can easily reach
> billions of nodes after 15 iterations.
> 
> On 2/22/17 9:29 AM, Stan Zhai wrote:
> 
> Hi all,
> 
> The driver hangs at DataFrame.rdd in Spark 2.1.0 when
> the DataFrame (SQL) is complex. The following is a thread dump of my
> driver:
> ...





-----
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 
--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Re-The-driver-hangs-at-DataFrame-rdd-in-Spark-2-1-0-tp21052p21084.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
