Re: Error: 'SparkContext' object has no attribute 'getActiveStageIds'
The getStageInfo used in self._jtracker.getStageInfo below does not seem to be implemented/included in the current Python library.

    def getStageInfo(self, stageId):
        """
        Returns a :class:`SparkStageInfo` object, or None if the stage
        info could not be found or was garbage collected.
        """
        stage = self._jtracker.getStageInfo(stageId)
        if stage is not None:
            # TODO: fetch them in batch for better performance
            attrs = [getattr(stage, f)() for f in SparkStageInfo._fields[1:]]
            return SparkStageInfo(stageId, *attrs)

View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Error-SparkContext-object-has-no-attribute-getActiveStageIds-tp11136p11140.html
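For what it's worth, a minimal sketch of how this API is usually reached from PySpark (the app name and the toy job are made up for illustration): as far as I can tell, getActiveStageIds and getStageInfo live on the StatusTracker returned by sc.statusTracker(), not on SparkContext itself, which would explain the AttributeError when calling sc.getActiveStageIds() directly.

    from pyspark import SparkContext

    # Hypothetical app name and toy job, just so the tracker has something
    # to report on.
    sc = SparkContext(appName="status-tracker-demo")
    sc.parallelize(range(1000)).map(lambda x: x * x).count()

    tracker = sc.statusTracker()  # StatusTracker, not SparkContext
    # Once the job above has finished, this list may well be empty.
    for stage_id in tracker.getActiveStageIds():
        info = tracker.getStageInfo(stage_id)
        # getStageInfo may return None if the stage info was garbage collected.
        if info is not None:
            print(stage_id, info.name, info.numTasks)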
parallelize method vs. textFile method
We have a large file. We used to read it in chunks and use the parallelize method (distData = sc.parallelize(chunk)), then run the map/reduce chunk by chunk. Recently we read the whole file with the textFile method instead and found the map/reduce job is much faster. Can anybody help us understand why? We have verified that reading the file is NOT the bottleneck.

View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/parallelize-method-v-s-textFile-method-tp12871.html
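For context, a rough sketch of the two approaches being compared (the path, chunk size, and the trivial map/reduce are placeholders, not the real job):

    import itertools
    from pyspark import SparkContext

    sc = SparkContext(appName="parallelize-vs-textFile")

    # Approach 1: the driver reads the file in chunks and ships each chunk
    # to the executors with parallelize(); every chunk is serialized from
    # the driver and processed as its own small job.
    def count_in_chunks(path, chunk_size=100000):
        total = 0
        with open(path) as f:
            while True:
                chunk = list(itertools.islice(f, chunk_size))
                if not chunk:
                    break
                distData = sc.parallelize(chunk)
                total += distData.map(lambda line: 1).reduce(lambda a, b: a + b)
        return total

    # Approach 2: the executors read the file themselves; a single job over
    # all partitions, with no driver-side shipping of the data.
    def count_with_textfile(path):
        return sc.textFile(path).map(lambda line: 1).reduce(lambda a, b: a + b)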
Re: parallelize method vs. textFile method
When comparing performance, we already excluded that part of the time difference.

View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/parallelize-method-v-s-textFile-method-tp12871p12873.html