Vladislav Sterkhov created SPARK-33620:
------------------------------------------
Summary: Task not started after filtering
Key: SPARK-33620
URL: https://issues.apache.org/jira/browse/SPARK-33620
Project: Spark
Issue Type: Question
Components: Spark Core
Affects Versions: 2.4.7
Reporter: Vladislav Sterkhov
Hello, I have a problem with a large input: ~2000 GB of data on HDFS. With 300 GB of input the task starts and completes, but we need to process the full dataset without that limit. Please help.
!image-2020-12-01-13-34-17-283.png!
!image-2020-12-01-13-34-31-288.png!
This is my code:
{code:scala}
var allTrafficRDD = sparkContext.emptyRDD[String]
for (traffic <- trafficBuffer) {
  logger.info("Load traffic path - " + traffic)
  val trafficRDD = sparkContext.textFile(traffic)
  if (isValidTraffic(trafficRDD, isMasterData)) {
    allTrafficRDD = allTrafficRDD.++(filterTraffic(trafficRDD))
  }
}
hiveService.insertTrafficRDD(allTrafficRDD.repartition(beforeInsertPartitionsNum),
  outTable, isMasterData)
{code}
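One possible cause worth checking: unioning RDDs one at a time in a loop builds a nested lineage (one union per iteration), which with many input paths can overload the driver during scheduling even though the executors never start. A minimal sketch of the same logic using a single {{SparkContext.union}} call, which keeps the lineage flat; it assumes the same {{trafficBuffer}}, {{isValidTraffic}}, {{filterTraffic}}, and {{hiveService}} from the snippet above:

{code:scala}
// Collect the filtered RDDs first, then union them in one step.
// A single union over N RDDs produces one UnionRDD instead of a
// chain of N nested unions.
val filteredRDDs =
  for {
    traffic <- trafficBuffer
    trafficRDD = sparkContext.textFile(traffic)
    if isValidTraffic(trafficRDD, isMasterData)
  } yield filterTraffic(trafficRDD)

val allTrafficRDD = sparkContext.union(filteredRDDs)
hiveService.insertTrafficRDD(
  allTrafficRDD.repartition(beforeInsertPartitionsNum),
  outTable, isMasterData)
{code}

If the lineage is still too deep, periodically calling {{RDD.checkpoint()}} (with {{sparkContext.setCheckpointDir}} configured) can truncate it.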
--
This message was sent by Atlassian Jira
(v8.3.4#803005)