[
https://issues.apache.org/jira/browse/SPARK-33620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vladislav Sterkhov updated SPARK-33620:
---------------------------------------
Description:
Hello, I have a problem with high memory usage: the input is a ~2000 GB HDFS dataset. With a 300 GB input the task starts and completes, but we need to process input of unlimited size. Please help.
!VlwWJ.png|width=644,height=150!
!mgg1s.png|width=651,height=182!
This is my code:
var allTrafficRDD = sparkContext.emptyRDD[String]
for (traffic <- trafficBuffer) {
  logger.info("Load traffic path - " + traffic)
  val trafficRDD = sparkContext.textFile(traffic)
  if (isValidTraffic(trafficRDD, isMasterData)) {
    allTrafficRDD = allTrafficRDD ++ filterTraffic(trafficRDD)
  }
}
hiveService.insertTrafficRDD(allTrafficRDD.repartition(beforeInsertPartitionsNum), outTable, isMasterData)
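One possible cause worth checking: unioning RDDs one by one inside a loop builds a deeply nested lineage, and the driver-side overhead of that lineage grows with the number of input paths. A sketch of an alternative, assuming the same `trafficBuffer`, `isValidTraffic`, and `filterTraffic` as in the snippet above, would build all the per-path RDDs first and combine them in a single call to `SparkContext.union`:

```scala
// Hypothetical rework of the loop above; assumes trafficBuffer, isValidTraffic
// and filterTraffic behave as in the original snippet.
val validRDDs = trafficBuffer.toSeq
  .map(path => sparkContext.textFile(path))
  .filter(rdd => isValidTraffic(rdd, isMasterData))
  .map(filterTraffic)

// SparkContext.union builds one flat UnionRDD instead of the deeply
// nested lineage produced by repeated `++` in a loop.
val allTrafficRDD =
  if (validRDDs.isEmpty) sparkContext.emptyRDD[String]
  else sparkContext.union(validRDDs)
```

This does not change what data is read, only how the union is expressed, so it is a cheap thing to try before looking at executor memory settings.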
> Task not started after filtering
> --------------------------------
>
> Key: SPARK-33620
> URL: https://issues.apache.org/jira/browse/SPARK-33620
> Project: Spark
> Issue Type: Question
> Components: Spark Core
> Affects Versions: 2.4.7
> Reporter: Vladislav Sterkhov
> Priority: Major
> Attachments: VlwWJ.png, mgg1s.png
>
>
> Hello, I have a problem with high memory usage: the input is a ~2000 GB HDFS
> dataset. With a 300 GB input the task starts and completes, but we need to
> process input of unlimited size. Please help.
>
> !VlwWJ.png|width=644,height=150!
>
> !mgg1s.png|width=651,height=182!
>
> This is my code:
> var allTrafficRDD = sparkContext.emptyRDD[String]
> for (traffic <- trafficBuffer) {
>   logger.info("Load traffic path - " + traffic)
>   val trafficRDD = sparkContext.textFile(traffic)
>   if (isValidTraffic(trafficRDD, isMasterData)) {
>     allTrafficRDD = allTrafficRDD ++ filterTraffic(trafficRDD)
>   }
> }
>
> hiveService.insertTrafficRDD(allTrafficRDD.repartition(beforeInsertPartitionsNum), outTable, isMasterData)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]