Re: Error: 'SparkContext' object has no attribute 'getActiveStageIds'

2015-03-20 Thread xing
The getStageInfo method called in self._jtracker.getStageInfo below does not
seem to be implemented or included in the current Python library.

    def getStageInfo(self, stageId):
        """
        Returns a :class:`SparkStageInfo` object, or None if the stage
        info could not be found or was garbage collected.
        """
        stage = self._jtracker.getStageInfo(stageId)
        if stage is not None:
            # TODO: fetch them in batch for better performance
            attrs = [getattr(stage, f)() for f in SparkStageInfo._fields[1:]]
            return SparkStageInfo(stageId, *attrs)
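
To make the lookup pattern in the snippet above easier to follow, here is a minimal sketch that runs without Spark. FakeJavaStageInfo, FakeJavaTracker and get_stage_info are hypothetical stand-ins invented for this example, and the field names of the namedtuple are assumed to match what PySpark uses. The point is only to show how each field after stageId is fetched by calling a zero-argument method of the same name on the tracker's result.

```python
from collections import namedtuple

# Field names are an assumption for this sketch; stageId comes first so that
# _fields[1:] lists exactly the attributes fetched from the tracker result.
SparkStageInfo = namedtuple(
    "SparkStageInfo",
    "stageId currentAttemptId name numTasks numActiveTasks "
    "numCompletedTasks numFailedTasks",
)


class FakeJavaStageInfo:
    """Hypothetical stand-in for the JVM-side object: every field is a method."""

    def currentAttemptId(self):
        return 0

    def name(self):
        return "example stage"

    def numTasks(self):
        return 4

    def numActiveTasks(self):
        return 1

    def numCompletedTasks(self):
        return 3

    def numFailedTasks(self):
        return 0


class FakeJavaTracker:
    """Hypothetical stand-in for self._jtracker; returns None for unknown ids."""

    def __init__(self, stages):
        self._stages = stages

    def getStageInfo(self, stage_id):
        return self._stages.get(stage_id)


def get_stage_info(jtracker, stage_id):
    """Same logic as the quoted method, written as a free function."""
    stage = jtracker.getStageInfo(stage_id)
    if stage is None:
        return None
    # Call the method named after each field, skipping stageId itself.
    attrs = [getattr(stage, f)() for f in SparkStageInfo._fields[1:]]
    return SparkStageInfo(stage_id, *attrs)


tracker = FakeJavaTracker({3: FakeJavaStageInfo()})
info = get_stage_info(tracker, 3)
missing = get_stage_info(tracker, 99)
```

If the object behind self._jtracker lacks getStageInfo, the first line of the function is where the attribute error would surface. As far as I know, in PySpark these methods are defined on the status tracker object obtained from sc.statusTracker(), not on SparkContext itself, but please check that against the version you are running.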



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Error-SparkContext-object-has-no-attribute-getActiveStageIds-tp11136p11140.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



parallelize method v.s. textFile method

2015-06-24 Thread xing
We have a large file. We used to read it in chunks, pass each chunk to the
parallelize method (distData = sc.parallelize(chunk)), and run the map/reduce
chunk by chunk. Recently we switched to reading the whole file with the
textFile method and found that the map/reduce job is much faster. Can anybody
help us understand why? We have verified that reading the file is NOT the
bottleneck.
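
For concreteness, the two approaches being compared could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the chunk size, and the map/reduce step (summing line lengths) are placeholders, not our actual job. Both functions expect an existing SparkContext passed in as sc.

```python
from itertools import islice


def read_chunks(path, chunk_lines):
    """Yield successive lists of at most chunk_lines lines from a text file."""
    with open(path) as f:
        while True:
            chunk = [line.rstrip("\n") for line in islice(f, chunk_lines)]
            if not chunk:
                return
            yield chunk


def total_length_chunked(sc, path, chunk_lines=100000):
    """Old approach: the driver reads each chunk and runs one job per chunk."""
    total = 0
    for chunk in read_chunks(path, chunk_lines):
        # The chunk lives in driver memory and is shipped out by parallelize.
        dist_data = sc.parallelize(chunk)
        total += dist_data.map(len).sum()
    return total


def total_length_textfile(sc, path):
    """New approach: one RDD over the whole file, one job for all lines."""
    return sc.textFile(path).map(len).sum()
```

The structural difference is that the first version launches a separate Spark job per chunk from data held on the driver, while the second builds a single RDD whose partitions are read by the workers. Whether that accounts for the speedup we measured is exactly what we are asking about.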



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/parallelize-method-v-s-textFile-method-tp12871.html



Re: parallelize method v.s. textFile method

2015-06-24 Thread xing
When we compared the performance, we had already excluded this part of the
time difference.



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/parallelize-method-v-s-textFile-method-tp12871p12873.html