Can someone please help me on this?
bit1...@163.com
From: bit1...@163.com
Date: 2015-05-24 13:53
To: user
Subject: How to use zookeeper in Spark Streaming
Hi,
In my Spark Streaming application, when the application starts and gets running,
the tasks running on the worker nodes need to be
Can someone help take a look at my questions? Thanks.
bit1...@163.com
From: bit1...@163.com
Date: 2015-05-29 18:57
To: user
Subject: How Broadcast variable works
Hi,
I have a Spark Streaming application. SparkContext uses broadcast variables to
broadcast configuration information that each
still get good response times, without waiting for the
long job to finish. This mode is best for multi-user settings
bit1...@163.com
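The original question above is truncated; as a rough sketch, not taken from the thread, of how a broadcast variable is typically used to ship configuration to tasks (the names and values here are made up):

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastConfigSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BroadcastConfigSketch").setMaster("local[*]"))

    // Hypothetical configuration; in a real job this might be loaded from a file.
    val config = Map("separator" -> ",", "skipHeader" -> "true")

    // Broadcast once from the driver; every executor gets a read-only copy
    // instead of the map being re-shipped inside each task closure.
    val configBc = sc.broadcast(config)

    val lines  = sc.parallelize(Seq("a,1", "b,2"))
    val fields = lines.map(_.split(configBc.value("separator")).toList)
    fields.collect().foreach(println)

    sc.stop()
  }
}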
Hi,
I am looking for some articles/blogs on the topic of how Spark handles the
various failures, such as Driver, Worker, Executor, Task, etc.
Are there some articles/blogs on this topic? Details that go into the source
code would be best.
Thanks very much!
bit1...@163.com
Thanks.
BTW, BlockManagerMaster is still there, so it makes no sense that BlockManagerWorker is
gone.
bit1...@163.com
ns, so in my opinion it
should be about 600M * 2. Looks like some compression happens under the scenes, or
something else?
Thanks!
bit1...@163.com
Could someone help explain what happens that leads to the Task not serializable
issue?
Thanks.
bit1...@163.com
From: bit1...@163.com
Date: 2015-06-08 19:08
To: user
Subject: Wired Problem: Task not serializable[Spark Streaming]
Hi,
With the following simple code, I got an exception that
ta.
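The code from the original message is cut off above, so the following is only a generic sketch of one common cause of "Task not serializable" (referencing a field of a non-serializable enclosing class inside a closure) and the usual fix; the names are made up:

import org.apache.spark.{SparkConf, SparkContext}

class Multiplier(sc: SparkContext) {   // not Serializable
  val factor = 3

  // Fails: using `factor` inside the closure captures `this`, and Spark
  // cannot serialize the whole Multiplier instance to ship with the task.
  def bad(): Array[Int] =
    sc.parallelize(1 to 5).map(_ * factor).collect()

  // Works: copy the field into a local val so only the Int is captured.
  def good(): Array[Int] = {
    val localFactor = factor
    sc.parallelize(1 to 5).map(_ * localFactor).collect()
  }
}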
2. From the user end, since tasks may process already-processed data, the user end
should detect that some data has already been processed, e.g. by
using some unique ID.
Not sure if I have understood correctly.
bit1...@163.com
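As an illustration of the "use some unique ID" point above, a minimal sketch (the record type and field names are assumptions) that collapses duplicates produced by re-processed data down to one record per ID:

import org.apache.spark.rdd.RDD

// Hypothetical record carrying a unique id.
case class Event(id: String, payload: String)

// Keep a single copy per id, so records replayed by retried tasks collapse to one.
def dedup(events: RDD[Event]): RDD[Event] =
  events.map(e => (e.id, e)).reduceByKey((first, _) => first).values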
From: prajod.vettiyat...@wipro.com
Date: 2015-06-18 16:56
To:
!
bit1...@163.com
Thank you for the reply.
"Run the application locally" means that I run the application in my IDE with
master as local[*].
When spark stuff is marked as provided, then I can't run it because the spark
stuff is missing.
So, how do you work around this? Thanks!
bit1...
(Maven profile snippet truncated: a "ClusterRun" profile with the Spark dependencies scoped as "provided")
bit1...@163.com
From: prajod.vettiyat...@wipro.com
Date: 2015-06-19 15:22
To: bit1...@163.com; ak...@sigmoidanalytics.com
CC: user@spark.apache.org
Subject: RE: Re: Build spark application into uber jar
Hi,
When running inside Eclipse IDE, I use another maven target
Sure, thanks Prajod for the detailed steps!
bit1...@163.com
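The detailed steps in this thread use Maven profiles; as a rough sbt-flavored sketch of the same idea (the versions and module list are assumptions), Spark is excluded from the uber jar but still has to be put back on the classpath for IDE runs:

// build.sbt (sketch): "provided" keeps Spark out of the assembled uber jar for
// cluster deployment; an IDE run configuration (or a separate profile) must
// supply the Spark jars again for local[*] runs.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.3.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.3.1" % "provided"
)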
From: prajod.vettiyat...@wipro.com
Date: 2015-06-19 16:56
To: bit1...@163.com; ak...@sigmoidanalytics.com
CC: user@spark.apache.org
Subject: RE: RE: Build spark application into uber jar
Multiple maven profiles may be the ideal way
tics; if it is set to "largest", then will it be at-most-once semantics?
bit1...@163.com
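For reference, a minimal sketch of where that setting goes with the receiver-based Kafka stream (the ZooKeeper address, group id and topic are assumptions):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object OffsetResetSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("OffsetResetSketch"), Seconds(10))

    // "smallest" starts from the earliest available offset when no offset is
    // stored for the group; "largest" (the default) starts from the newest.
    val kafkaParams = Map(
      "zookeeper.connect" -> "localhost:2181",
      "group.id"          -> "mygroup",
      "auto.offset.reset" -> "smallest")

    val lines = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Map("mytopic" -> 1), StorageLevel.MEMORY_AND_DISK_SER).map(_._2)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}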
From: Haopu Wang
Date: 2015-06-19 18:47
To: Enno Shioji; Tathagata Das
CC: prajod.vettiyat...@wipro.com; Cody Koeninger; bit1...@163.com; Jordan
Pilat; Will Briggs; Ashish Soni; ayan guha; user@spark
how to suppress it
bit1...@163.com
Hi, Akhil,
Thank you for the explanation!
bit1...@163.com
From: Akhil Das
Date: 2015-06-23 16:29
To: bit1...@163.com
CC: user
Subject: Re: What does [Stage 0:> (0 + 2) / 2] mean on the console
Well, you could say that (the Stage information) is an ASCII representation of the
WebUI (running on p
Hi,
I am using Spark 1.3.1 and have 2 receivers.
On the web UI, I can only see the total records received by these 2 receivers,
but I can't figure out the records received by each individual receiver.
Not sure whether that information is shown on the UI in Spark 1.4.
bit1...@163.com
I am kind of confused about when a cached RDD will unpersist its data. I know we
can explicitly unpersist it with RDD.unpersist, but can it be unpersisted
automatically by the Spark framework?
Thanks.
bit1...@163.com
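A small sketch of the explicit route; as far as I understand, without unpersist the cached blocks stay until executors evict them for space or until the RDD object is garbage-collected on the driver and the ContextCleaner removes them:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object UnpersistSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("UnpersistSketch").setMaster("local[*]"))

    val cached = sc.parallelize(1 to 1000000).map(_ * 2).persist(StorageLevel.MEMORY_ONLY)
    println(cached.count())          // first action materializes and caches the data

    // Explicit release of the cached blocks (non-blocking).
    cached.unpersist(blocking = false)

    sc.stop()
  }
}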
What do you mean by "new file", do you upload an already existing file onto
HDFS or create a new one locally and then upload it to HDFS?
bit1...@163.com
From: ravi tella
Date: 2015-06-30 09:59
To: user
Subject: spark streaming HDFS file issue
I am running a spark streaming ex
received records are many more than processed records; I can't understand why
the total delay or scheduling delay is not obvious (5 secs) here.
Can someone help explain what clues can be drawn from this UI?
Thanks.
bit1...@163.com
1.run(DriverRunner.scala:72)
bit1...@163.com
Thanks Shixiong for the reply.
Yes, I confirm that the file exists there; I simply checked with ls -l
/data/software/spark-1.3.1-bin-2.4.0/applications/pss.am.core-1.0-SNAPSHOT-shaded.jar
bit1...@163.com
From: Shixiong Zhu
Date: 2015-07-06 18:41
To: bit1...@163.com
CC: user
Subject: Re
with fewer cores, but I didn't get a chance to
try/test it.
Thanks.
bit1...@163.com
val rdd = sc.parallelize(List(1, 2, 3, 4, 5))
val heavyOpRDD = rdd.map(squareWithHeavyOp)
// checkpoint so the second job can reuse the checkpointed data instead of recomputing squareWithHeavyOp
heavyOpRDD.checkpoint()
heavyOpRDD.foreach(println)
println("Job 0 has been finished, press ENTER to do job 1")
readLine()
heavyOpRDD.foreach(println)
}
}
bit1...@163.com
Hi,
I don't have a good understanding of how RDD lineage works, so I would like to ask whether
Spark provides a unit test in the code base to illustrate how RDD lineage works.
If there is one, what's its class name?
Thanks!
bit1...@163.com
Thanks TD and Zhihong for the guide. I will check it
bit1...@163.com
From: Tathagata Das
Date: 2015-07-31 12:27
To: Ted Yu
CC: bit1...@163.com; user
Subject: Re: How RDD lineage works
You have to read the original Spark paper to understand how RDD lineage works.
https://www.cs.berkeley.edu
that partition. Thus, lost data can be recovered, often quite quickly,
without requiring costly replication.
bit1...@163.com
From: bit1...@163.com
Date: 2015-07-31 13:11
To: Tathagata Das; yuzhihong
CC: user
Subject: Re: Re: How RDD lineage works
Thanks TD and Zhihong for the guide. I will
Thanks TD, I have got some understanding now.
bit1...@163.com
From: Tathagata Das
Date: 2015-07-31 13:45
To: bit1...@163.com
CC: yuzhihong; user
Subject: Re: Re: How RDD lineage works
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/FailureSuite.scala
This may
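Besides FailureSuite, a quick way to look at lineage directly is RDD.toDebugString; a minimal sketch:

import org.apache.spark.{SparkConf, SparkContext}

object LineageSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LineageSketch").setMaster("local[*]"))

    val base    = sc.parallelize(1 to 10, 2)
    val derived = base.map(_ * 2).filter(_ % 4 == 0)

    // Prints the chain of parent RDDs (the lineage) that Spark would replay
    // to rebuild a lost partition of `derived`.
    println(derived.toDebugString)

    sc.stop()
  }
}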
I had also thought that the Hadoop mapper output is saved on HDFS, at least
when the job only has a mapper but no reducer.
If there is a reducer, then is the map output saved on local disk?
From: Shao, Saisai
Date: 2015-01-26 15:23
To: Larry Liu
CC: u...@spark.incubator.apache.o
You can use the prebuilt version that is built against Hadoop 2.4.
From: Siddharth Ubale
Date: 2015-01-30 15:50
To: user@spark.apache.org
Subject: Hi: hadoop 2.5 for spark
Hi,
I am a beginner with Apache Spark.
Can anyone let me know if it is mandatory to build Spark with the Hadoop
version I am us
Hi,
I am trying Spark Streaming + Flume example:
1. Code
object SparkFlumeNGExample {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("SparkFlumeNGExample")
    val ssc = new StreamingContext(conf, Seconds(10))
    val lines = FlumeUtils.createStream(ssc, "localhost"
14:31
To: bit1...@163.com
CC: user
Subject: Re: Question about spark streaming+Flume
Hi
Can you try this
val lines = FlumeUtils.createStream(ssc,"localhost",)
// Print out the count of events received from this server in each batch
lines.count().map(cnt => "Received
Ok, you are missing a letter in foreachRDD.. let me proceed..
bit1...@163.com
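For context, a minimal self-contained version of the Flume example being discussed (the host and port are assumptions, since the original snippet is truncated):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object SparkFlumeCountSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("SparkFlumeCountSketch"), Seconds(10))

    // Spark starts an Avro receiver on this host/port and the Flume sink pushes to it.
    val lines = FlumeUtils.createStream(ssc, "localhost", 9999)

    // One count per batch; print() forces output on the driver console.
    lines.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}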
From: Arush Kharbanda
Date: 2015-02-17 14:31
To: bit1...@163.com
CC: user
Subject: Re: Question about spark streaming+Flume
Hi
Can you try this
val lines = FlumeUtils.createStream(ssc,"localhost&
Hi Arush,
With your code, I still didn't see the output "Received X flume events"..
bit1...@163.com
From: bit1...@163.com
Date: 2015-02-17 14:08
To: Arush Kharbanda
CC: user
Subject: Re: Re: Question about spark streaming+Flume
Ok, you are missing a letter in foreachRDD..
But I am able to run the SparkPi example:
./run-example SparkPi 1000 --master spark://192.168.26.131:7077
Result: Pi is roughly 3.14173708
bit1...@163.com
From: bit1...@163.com
Date: 2015-02-18 16:29
To: user
Subject: Problem with 1 master + 2 slaves cluster
Hi sparkers,
I set up a Spark (1.2.1
Sure, thanks Akhil.
A further question: is the local file system (file:///) not supported in a standalone
cluster?
bit1...@163.com
From: Akhil Das
Date: 2015-02-18 17:35
To: bit1...@163.com
CC: user
Subject: Re: Problem with 1 master + 2 slaves cluster
Since the cluster is standalone, you are
I am using Spark 1.2.0 (prebuilt with Hadoop 2.4) on Windows 7.
I found the same bug here: https://issues.apache.org/jira/browse/SPARK-4208, but it
is still open. Is there a workaround for this? Thanks!
The stack trace:
StackOverflow Exception occurs
Exception in thread "main" java.lang.StackOverflowEr
Hi,
I am trying the spark streaming log analysis reference application provided by
Databricks at
https://github.com/databricks/reference-apps/tree/master/logs_analyzer
When I deploy the code to the standalone cluster, there is no output at all
with the following shell script. Which means, the w
only be allocated one processor.
This leads me to another question:
I have only one core, but I have specified the master and executor as
--master local[3] --executor-memory 512M --total-executor-cores 3. Since I have
only one core, why does this work?
bit1...@163.com
From: Akhil
Thanks Akhil.
From: Akhil Das
Date: 2015-02-20 16:29
To: bit1...@163.com
CC: user
Subject: Re: Re: Spark streaming doesn't print output when working with
standalone master
local[3] spawns 3 threads on 1 core :)
Thanks
Best Regards
On Fri, Feb 20, 2015 at 12:50 PM, bit1...@163.com
Hi,
In the Spark Streaming application, I write the code
FlumeUtils.createStream(ssc,"localhost",), which means Spark will listen on
the port and wait for the Flume sink to write to it.
My question is: when I submit the application to the Spark Standalone cluster,
will be opened onl
"main" java.net.ConnectException: Call From
hadoop.master/192.168.26.137 to hadoop.master:9000 failed on connection
exception.
From the exception, it tries to connect to port 9000, which is for Hadoop/HDFS, and I
don't use Hadoop at all in my code (such as saving to HDFS).
bit1...@163.com
:1462)
at org.apache.hadoop.ipc.Client.call(Client.java:1381)
... 32 more
bit1...@163.com
From: Ted Yu
Date: 2015-02-24 10:24
To: bit1...@163.com
CC: user
Subject: Re: Does Spark Streaming depend on Hadoop?
Can you pastebin the whole stack trace ?
Thanks
On Feb 23, 2015, at 6:14 PM, "
// Kafka has 6 partitions, so create 6 receivers here
val streams = (1 to 6).map(_ =>
  KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
)
// repartition to 18, 3 times the number of receivers
val partitions = ssc.union(streams).repartition(18).map("DataReceived: " + _)
partitions.print()
ssc.start()
ssc.awaitTermination()
}
}
bit1...@163.com
guys
elaborate on this. Thank
bit1...@163.com
Thanks Tathagata! You are right, I have packaged the contents of the Spark-shipped
example jar into my jar, which contains several HDFS configuration
files like hdfs-default.xml, etc. Thanks!
bit1...@163.com
From: Tathagata Das
Date: 2015-02-24 12:04
To: bit1...@163.com
CC: yuzhihong
Thanks both of you guys on this!
bit1...@163.com
From: Akhil Das
Date: 2015-02-24 12:58
To: Tathagata Das
CC: user; bit1129
Subject: Re: About FlumeUtils.createStream
I see, thanks for the clarification TD.
On 24 Feb 2015 09:56, "Tathagata Das" wrote:
Akhil, that is incorrect.
will stay on one cluster node, or will they be distributed
among the cluster nodes?
bit1...@163.com
From: Akhil Das
Date: 2015-02-24 12:58
To: Tathagata Das
CC: user; bit1129
Subject: Re: About FlumeUtils.createStream
I see, thanks for the clarification TD.
On 24 Feb 2015 09:56, "Tatha
The behavior is exactly what I expected. Thanks Akhil and Tathagata!
bit1...@163.com
From: Akhil Das
Date: 2015-02-24 13:32
To: bit1129
CC: Tathagata Das; user
Subject: Re: Re: About FlumeUtils.createStream
That depends on how many machines you have in your cluster. Say you have 6
workers and
Thanks Akhil.
Not sure whether the low-level consumer will be officially supported by Spark
Streaming. So far, I don't see it mentioned/documented in the Spark Streaming
programming guide.
bit1...@163.com
From: Akhil Das
Date: 2015-02-24 16:21
To: bit1...@163.com
CC: user
Subject: Re:
My understanding is that if you run multiple applications on the worker node, then each
application will have an executor backend process and an executor as well.
bit1...@163.com
From: Judy Nash
Date: 2015-02-26 09:58
To: user@spark.apache.org
Subject: spark standalone with multiple executors in one
Sure, Thanks Tathagata!
bit1...@163.com
From: Tathagata Das
Date: 2015-02-26 14:47
To: bit1...@163.com
CC: Akhil Das; user
Subject: Re: Re: Many Receiver vs. Many threads per Receiver
Spark Streaming has a new Kafka direct stream, to be released as an experimental
feature in 1.3. That uses a
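The reply is cut off here; a minimal sketch of that direct stream API as I understand it (the broker address and topic are assumptions):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("DirectStreamSketch"), Seconds(10))

    // No long-running receivers: each Kafka partition maps to an RDD partition
    // and the stream tracks offsets itself rather than via ZooKeeper.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))

    stream.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}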
Hi,
I know that Spark on YARN has a configuration parameter (executor-cores NUM) to
specify the number of cores per executor.
How about Spark standalone? I can specify the total cores, but how could I know
how many cores each executor will take (presuming one node, one executor)?
bit1...@163
and Hive on
Hadoop?
2. Does Hive in the Spark assembly use the Spark execution engine or the Hadoop MR
engine?
Thanks.
bit1...@163.com
Can anyone have a look on this question? Thanks.
bit1...@163.com
From: bit1...@163.com
Date: 2015-03-13 16:24
To: user
Subject: Explanation on the Hive in the Spark assembly
Hi, sparkers,
I am kind of confused about Hive in the Spark assembly. I think Hive in the
Spark assembly is not the
as enough resources for
the application.
My question is: assume that the data the application will process is spread across
all the worker nodes; is the data locality then lost when using the above policy?
Not sure whether I have understood correctly or have missed something.
bit1...@163.com
Thanks Daoyuan.
What do you mean by running some native command? I never thought that Hive would
run without a computing engine like Hadoop MR or Spark. Thanks.
bit1...@163.com
From: Wang, Daoyuan
Date: 2015-03-13 16:39
To: bit1...@163.com; user
Subject: RE: Explanation on the Hive in the
Thanks Cheng for the great explanation!
bit1...@163.com
From: Cheng Lian
Date: 2015-03-16 00:53
To: bit1...@163.com; Wang, Daoyuan; user
Subject: Re: Explanation on the Hive in the Spark assembly
Spark SQL supports most commonly used features of HiveQL. However, different
HiveQL statements
se).
Thanks.
bit1...@163.com
From: eric wong
Date: 2015-03-14 22:36
To: bit1...@163.com; user
Subject: Re: How does Spark honor data locality when allocating computing
resources for an application
You seem not to have noticed the configuration variable "spreadOutApps".
And it's comme
Please make sure that you have given more cores than the number of receivers.
From: James King
Date: 2015-04-01 15:21
To: user
Subject: Spark + Kafka
I have a simple setup/runtime of Kafka and Spark.
I have a command-line consumer displaying arrivals to the Kafka topic, so I know
messages are being rec
tches: 23
Waiting batches: 1
Received records: 0
Processed records: 0
bit1...@163.com
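Zero processed records like the numbers above are commonly the symptom of all available cores being taken by receivers; a minimal sketch of the usual advice (at least one more core than the number of receivers):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReceiverCoresSketch {
  def main(args: Array[String]): Unit = {
    // One receiver permanently occupies one core, so with a single receiver at
    // least two cores are needed: one to receive, one to process the batches.
    val conf = new SparkConf().setAppName("ReceiverCoresSketch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))
    // ... create one receiver-based input stream here, then ssc.start()
  }
}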
Thanks Tathagata for the explanation!
bit1...@163.com
From: Tathagata Das
Date: 2015-04-04 01:28
To: Ted Yu
CC: bit1129; user
Subject: Re: About Waiting batches on the spark streaming UI
Maybe that should be marked as waiting as well. Will keep that in mind. We plan
to update the ui soon, so
Looks like the message is consumed by another console? (I can see messages typed
on this port from another console.)
bit1...@163.com
From: Shushant Arora
Date: 2015-04-15 17:11
To: Akhil Das
CC: user@spark.apache.org
Subject: Re: spark streaming printing no output
When I launched spark-shell
Hi,
I am frequently asked why Spark is also much faster than Hadoop MapReduce on
disk (without the use of the memory cache). I have no convincing answer to this
question; could you guys elaborate on this? Thanks!
Is it? I learned somewhere else that Spark's speed is 5~10 times faster than
Hadoop MapReduce.
bit1...@163.com
From: Ilya Ganelin
Date: 2015-04-28 10:55
To: bit1...@163.com; user
Subject: Re: Why Spark is much faster than Hadoop MapReduce even on disk
I believe the typical answer is
es used is 5. I think
the memory used should be executor-memory * numOfWorkers = 3G * 3 = 9G, and the VCores
used should be executor-cores * numOfWorkers = 6.
Can you please explain the result? Thanks.
bit1...@163.com
It looks to me like the same thing also applies to SparkContext.textFile or
SparkContext.wholeTextFile: there is no way in the RDD to figure out which file
the data in the RDD came from.
bit1...@163.com
From: Saisai Shao
Date: 2015-04-29 10:10
To: lokeshkumar
CC: spark users
Subject
For SparkContext#textFile, if a directory is given as the path parameter,
then it will pick up the files in the directory, so the same thing will occur.
bit1...@163.com
From: Saisai Shao
Date: 2015-04-29 10:54
To: Vadim Bichutskiy
CC: bit1...@163.com; lokeshkumar; user
Subject: Re: Re
Correct myself:
For SparkContext#wholeTextFile, the RDD's elements are kv pairs: the key is
the file path and the value is the file content.
So, for SparkContext#wholeTextFile, the RDD already carries the file
information.
bit1...@163.com
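A small sketch of that, using the plural wholeTextFiles method name from the API (the directory path is an assumption):

import org.apache.spark.{SparkConf, SparkContext}

object WholeTextFilesSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WholeTextFilesSketch").setMaster("local[*]"))

    // Each element is (filePath, fileContent), so the originating file is
    // visible to every downstream transformation.
    val files = sc.wholeTextFiles("/tmp/input")
    files.map { case (path, content) => path + " -> " + content.length + " chars" }
      .collect()
      .foreach(println)

    sc.stop()
  }
}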
From: Saisai Shao
Date: 2015-04-29
Thanks Sandy, it is very useful!
bit1...@163.com
From: Sandy Ryza
Date: 2015-04-29 15:24
To: bit1...@163.com
CC: user
Subject: Re: Question about Memory Used and VCores Used
Hi,
Good question. The extra memory comes from spark.yarn.executor.memoryOverhead,
the space used for the
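The reply is truncated here; as a hedged sketch, the per-container memory YARN reports is roughly the executor memory plus this overhead, and the overhead can be set explicitly (the value below is just an example):

import org.apache.spark.SparkConf

object MemoryOverheadSketch {
  // Per-executor YARN container is roughly spark.executor.memory plus
  // spark.yarn.executor.memoryOverhead; in Spark 1.x the overhead is in MB.
  val conf = new SparkConf()
    .set("spark.executor.memory", "3g")
    .set("spark.yarn.executor.memoryOverhead", "512")
}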
The error hints that the Maven module scala-compiler can't be fetched from
repo1.maven.org. Should some repository URLs be added to Maven's settings
file?
bit1...@163.com
From: Manoj Kumar
Date: 2015-01-03 18:46
To: user
Subject: Unable to build spark from source
Hello,
Is there something missing? I am using Spark 1.2.0.
Thanks.
bit1...@163.com
This is noise, please ignore.
I figured out what happened...
bit1...@163.com
From: bit1...@163.com
Date: 2015-01-03 19:03
To: user
Subject: sqlContext is undefined in the Spark Shell
Hi,
In the spark shell, I do the following two things:
1. scala> val cxt =
Thank you, Tobias. I will look into the Spark paper. But it looks like the
paper has been moved:
http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf.
A web page is returned (Resource not found) when I access it.
bit1...@163.com
From: Tobias Pfeiffer
Date: 2015-01-07 09:24
To: Todd
Thanks Eric. Yes, I am Chinese :-). I will read through the articles, thank
you!
bit1...@163.com
From: eric wong
Date: 2015-01-07 10:46
To: bit1...@163.com
CC: user
Subject: Re: Re: I think I am almost lost in the internals of Spark
A good beginning if you are chinese.
https://github.com
Hi,
When I fetch the Spark code base and import it into IntelliJ IDEA as an SBT project,
then build it with SBT, there are compile errors in the examples
module, complaining about EventBatch and SparkFlumeProtocol; looks like they should
be in the org.apache.spark.streaming.flume.sink package.
Not su
When I run the following Spark SQL example within IDEA, I get a
StackOverflowError; it looks like the scala.util.parsing.combinator.Parsers are
calling themselves recursively and infinitely.
Has anyone encountered this?
package spark.examples
import org.apache.spark.{SparkContext, SparkConf}
import org.apa