Re: When will spark support "push" style shuffle?

2015-01-07 Thread Xuelin Cao.2015
Got it. The explanation makes sense. Thank you. On Thu, Jan 8, 2015 at 1:06 PM, Patrick Wendell [via Apache Spark Developers List] wrote: > This question is conflating a few different concepts. I think the main > question is whether Spark will have a shuffle implementation that > streams data rathe

Need Help to display output of the command on UI

2015-01-07 Thread Indu Chaube
Hi folks, I need to print the output of the code below on the Web UI: val conf=new SparkConf().setMaster("local") val sc=new SparkContext(conf) val file1=sc.textFile("/var/log/dpkg.log") //Applying filter onto the data val data1=file1.filter(line => line.contains("installed")) {data
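
For readers following the question above, here is a minimal sketch of the quoted pipeline laid out as a standalone program. It is an illustration, not the poster's full code, which is cut off in the preview. Assumptions: the object name `InstalledLines`, the helper `mentionsInstalled`, the app name "installed-lines", and the limit of 20 lines are all hypothetical. The sketch only brings the filtered lines back to the driver and prints them there; how they would then be rendered on a web page is not covered by the message and is not shown.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object InstalledLines {
  // Hypothetical helper: the same predicate the quoted filter uses.
  def mentionsInstalled(line: String): Boolean = line.contains("installed")

  def main(args: Array[String]): Unit = {
    // SparkContext requires an application name; "installed-lines" is a placeholder.
    val conf = new SparkConf().setMaster("local").setAppName("installed-lines")
    val sc = new SparkContext(conf)
    try {
      val file1 = sc.textFile("/var/log/dpkg.log")
      // Keep only the lines that mention "installed".
      val data1 = file1.filter(line => mentionsInstalled(line))
      // take(n) returns at most n lines to the driver as a local Array;
      // 20 is an arbitrary cap chosen for this sketch.
      data1.take(20).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because `take` returns an ordinary local array, whatever serves the UI could consume that array directly instead of printing it.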

Re: When will spark support "push" style shuffle?

2015-01-07 Thread Patrick Wendell
This question is conflating a few different concepts. I think the main question is whether Spark will have a shuffle implementation that streams data rather than persisting it to disk/cache as a buffer. Spark currently decouples the shuffle write from the read using disk/OS cache as a buffer. The t

Fwd: When will spark support "push" style shuffle?

2015-01-07 Thread 曹雪林
Hi, I've heard a lot of complaints about Spark's "pull" style shuffle. Is there any plan to support "push" style shuffle in the near future? Currently, the shuffle phase must be completed before the next stage starts. In Impala, by contrast, it is said that the shuffled data is "streamed" to the

Re: Missing Catalyst API docs

2015-01-07 Thread Reynold Xin
I'm in the middle of revamping the SchemaRDD public API and in 1.3, we will have a public, documented version of the expression library. The Catalyst expression library will remain hidden. You can track it with this ticket: https://issues.apache.org/jira/browse/SPARK-5097 On Wed, Jan 7, 2015 at

Missing Catalyst API docs

2015-01-07 Thread Alessandro Baretta
Gents, it looks like some of the Catalyst classes' API docs are missing. For instance, the Expression class, referred to by the SchemaRDD docs, seems to be missing. (See here: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.SchemaRDD ) Is this intended or is it due to

Spark on teradata?

2015-01-07 Thread gen tang
Hi, I have a stupid question: is it possible to use Spark on a Teradata data warehouse? I read some news on the internet that says yes. However, I didn't find any example of this. Thanks in advance. Cheers, Gen

Re: Hang on Executor classloader lookup for the remote REPL URL classloader

2015-01-07 Thread Patrick Wendell
Hey Andrew, executors in Spark fetch classes defined in the REPL from an HTTP server on the driver node. Is this happening in the context of a REPL session? Also, is it deterministic, or does it happen only intermittently? The reason all of the other threads a

Re: spark-yarn_2.10 1.2.0 artifacts

2015-01-07 Thread Jong Wook Kim
I don't think that's the case. spark-yarn contains the `org.apache.spark.deploy.yarn` package, whereas spark-network-yarn contains `org.apache.spark.network.yarn`, and they do different things. The former contains code for deploying Spark applications to a YARN cluster, and is called when running `spark