getStageInfo (called as self._jtracker.getStageInfo below) seems not to be
implemented/included in the current Python library:
def getStageInfo(self, stageId):
"""
Returns a :class:`SparkStageInfo` object, or None if the stage
info could not be found or was garbage collected.
"""
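For reference, a minimal sketch of how the documented method could wrap the JVM-side tracker. The `_jtracker` attribute and the `SparkStageInfo` field names follow the docstring above, but this is an assumption about the intended implementation, not the library's actual code:

```python
from collections import namedtuple

# Assumed shape of the stage-info record; field names are illustrative.
SparkStageInfo = namedtuple(
    "SparkStageInfo",
    "stageId currentAttemptId name numTasks numActiveTasks "
    "numCompletedTasks numFailedTasks")


class StatusTracker(object):
    def __init__(self, jtracker):
        # jtracker is the JVM-side status tracker (hypothetical here).
        self._jtracker = jtracker

    def getStageInfo(self, stageId):
        """
        Returns a :class:`SparkStageInfo` object, or None if the stage
        info could not be found or was garbage collected.
        """
        stage = self._jtracker.getStageInfo(stageId)
        if stage is not None:
            # Copy the JVM object's fields into a plain Python record.
            return SparkStageInfo(
                stageId, stage.currentAttemptId(), stage.name(),
                stage.numTasks(), stage.numActiveTasks(),
                stage.numCompletedTasks(), stage.numFailedTasks())
        return None
```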
We have a large file that we used to read in chunks, calling the parallelize
method on each chunk (distData = sc.parallelize(chunk)) and then running the
map/reduce chunk by chunk. Recently we read the whole file with the textFile
method instead and found the map/reduce job is much faster. Can anybody help
us understand why? When we compared the performance, we already excluded
this part of the time difference.
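One likely factor: with parallelize, the driver reads the file and every record must be serialized in the driver and shipped to the executors, chunk by chunk; with textFile, only the path and split metadata are shipped, and each executor reads its own input split in parallel (with data locality on HDFS). A pyspark-free sketch of the asymmetry, using pickle as a stand-in for Spark's serializer (file name and record contents are illustrative):

```python
import pickle

# Pretend this is one chunk the driver read from the large file.
lines = ["record-%d\n" % i for i in range(100000)]

# parallelize-style: the whole chunk is serialized in the driver
# and sent over the network to the executors.
shipped_parallelize = len(pickle.dumps(lines))

# textFile-style: only the path (plus split offsets) is shipped;
# executors open the file themselves.
shipped_textfile = len(pickle.dumps("hdfs:///data/big.txt"))

print(shipped_parallelize, shipped_textfile)
```

The ratio between the two payloads grows linearly with the file size, which is consistent with the slowdown being worse for large files even after excluding the local read time.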
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/parallelize-method-v-s-textFile-method-tp12871p12873.html
Sent from the Apache Spark Developers List mailing list archive