Got it. The explanation makes sense. Thank you.
On Thu, Jan 8, 2015 at 1:06 PM, Patrick Wendell [via Apache Spark
Developers List] wrote:
> This question is conflating a few different concepts. I think the main
> question is whether Spark will have a shuffle implementation that
> streams data rather than persisting it to disk/cache as a buffer.
Hi Folks,
I need to print the output of the code below on the Web UI.
import org.apache.spark.{SparkConf, SparkContext}

// An application name is required, or SparkContext refuses to start.
val conf = new SparkConf().setMaster("local").setAppName("InstalledPackages")
val sc = new SparkContext(conf)
val file1 = sc.textFile("/var/log/dpkg.log")
// Keep only the lines for completed installs
val data1 = file1.filter(line => line.contains("installed"))
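The snippet above still needs an action to run; a minimal, untested sketch, assuming the goal is just to print the matching lines on the driver (collect() is only safe for small results):

// Materialize the filtered lines on the driver and print them.
data1.collect().foreach(println)

Note that Spark's own Web UI (port 4040) only shows job/stage metrics, not RDD contents, so displaying the results on a web page means rendering the collected array in a web framework of your choice.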
This question is conflating a few different concepts. I think the main
question is whether Spark will have a shuffle implementation that
streams data rather than persisting it to disk/cache as a buffer.
Spark currently decouples the shuffle write from the read using
disk/OS cache as a buffer.
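To make the stage boundary concrete, a small illustrative example, assuming an existing SparkContext sc (any shuffle operator would do; reduceByKey is used here):

// reduceByKey forces a shuffle: the map-side tasks write their output
// (to disk/OS cache) before any reduce-side task starts reading it.
val counts = sc.textFile("/var/log/dpkg.log")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.take(10).foreach(println)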
Hi,
I've heard a lot of complaints about Spark's "pull" style shuffle. Is
there any plan to support a "push" style shuffle in the near future?
Currently, the shuffle phase must be completed before the next stage
starts, while in Impala, it is said, the shuffled data is "streamed" to
the next stage as it is produced.
I'm in the middle of revamping the SchemaRDD public API and in 1.3, we will
have a public, documented version of the expression library. The Catalyst
expression library will remain hidden.
You can track it with this ticket:
https://issues.apache.org/jira/browse/SPARK-5097
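For reference, the current (1.2) language-integrated DSL looks roughly like this; a sketch, assuming `import sqlContext._` is what brings the Symbol-based expression conversions into scope, and a hypothetical people.json input:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext._ // implicit conversions, including Symbol -> attribute

val people = sqlContext.jsonFile("people.json") // hypothetical input file
// Equivalent to: SELECT name FROM people WHERE age >= 13 AND age <= 19
val teenagers = people.where('age >= 13).where('age <= 19).select('name)
teenagers.collect().foreach(println)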
Gents,
It looks like some of the Catalyst classes' API docs are missing. For
instance, the Expression class, referred to by the SchemaRDD docs seems to
be missing. (See here:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.SchemaRDD
)
Is this intended, or is it due to a problem with the doc generation?
Hi,
I have a stupid question:
Is it possible to use Spark on a Teradata data warehouse? I read some
articles on the internet that say yes; however, I didn't find any examples
of this.
Thanks in advance.
Cheers
Gen
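One approach that should work with any JDBC source, Teradata included, is JdbcRDD. A rough, untested sketch; the host, database, credentials, and query are all made-up, and the Teradata JDBC jars (terajdbc4.jar, tdgssconfig.jar) must be on the classpath:

import java.sql.DriverManager
import org.apache.spark.rdd.JdbcRDD

val url = "jdbc:teradata://td-host/database=mydb" // hypothetical host/db

val rows = new JdbcRDD(
  sc,
  () => {
    Class.forName("com.teradata.jdbc.TeraDriver")
    DriverManager.getConnection(url, "user", "password")
  },
  // The two '?' placeholders are bound to each partition's id range.
  "SELECT id, name FROM customers WHERE id >= ? AND id <= ?",
  1L, 1000000L, 10, // lower bound, upper bound, number of partitions
  rs => (rs.getLong(1), rs.getString(2)))

rows.take(5).foreach(println)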
Hey Andrew,
So the executors in Spark fetch classes defined in the repl from an HTTP
server running on the driver node. Is this happening in the context of a
repl session? Also, is it deterministic, or does it happen only
periodically?
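For background: anything compiled in spark-shell, like the case class and the closure below, is served to executors through that class server (its address is advertised via spark.repl.class.uri). A toy example:

// Executors download the REPL-generated classes for LogLine and the
// filter closure from the driver's HTTP class server before running tasks.
case class LogLine(text: String)

sc.textFile("/var/log/dpkg.log")
  .map(LogLine(_))
  .filter(_.text.contains("installed"))
  .count()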
I don't think that's the case.
spark-yarn contains the `org.apache.spark.deploy.yarn` package, whereas
spark-network-yarn contains `org.apache.spark.network.yarn`, and they do
different things.
The former contains the code for deploying Spark applications to a YARN
cluster, and is invoked when running `spark-submit`.
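If I remember right, the latter provides the YARN external shuffle service
(`org.apache.spark.network.yarn.YarnShuffleService`), which runs inside the
YARN NodeManager rather than in the Spark application itself.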