Re: Error: 'SparkContext' object has no attribute 'getActiveStageIds'

2015-03-20 Thread xing
getStageInfo in self._jtracker.getStageInfo below seems not implemented/included in the current python library. def getStageInfo(self, stageId): """ Returns a :class:`SparkStageInfo` object, or None if the stage info could not be found or was garbage collected. "

parallelize method v.s. textFile method

2015-06-24 Thread xing
We have a large file and we used to read chunks and then use parallelize method (distData = sc.parallelize(chunk)) and then do the map/reduce chunk by chunk. Recently we read the whole file using textFile method and found the map/reduce job is much faster. Anybody can help us to understand why? We

Re: parallelize method v.s. textFile method

2015-06-24 Thread xing
When we compare the performance, we already excluded this part of time difference. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/parallelize-method-v-s-textFile-method-tp12871p12873.html Sent from the Apache Spark Developers List mailing list archive