Karen,
It looks like the Kafka version is incorrect. You mention Kafka 0.10,
but the classpath references Kafka 0.9.
Thanks,
David
On July 10, 2017 at 1:44:06 PM, karan alang (karan.al...@gmail.com) wrote:
Hi All,
I'm running Spark Streaming - Kafka integration using Spark 2.x & Kafka 10.
&
Hi Ben,
This seems more like a question for community.cloudera.com. However, I believe
it would fall under HBase rather than Spark.
https://repository.cloudera.com/artifactory/webapp/#/artifacts/browse/tree/General/cloudera-release-repo/org/apache/hbase/hbase-spark
David Newberger
-Original Message
DataFrame is a collection of data which is organized into named columns.
DataFrame.write is an interface for saving the contents of a DataFrame to
external storage.
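As a quick sketch (assuming a SparkSession named spark; the column names and output path below are just placeholders):

// Build a small DataFrame with named columns (example data only).
val df = spark.createDataFrame(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")

// DataFrame.write returns a DataFrameWriter for saving to external storage,
// here as Parquet at a placeholder path.
df.write.mode("overwrite").parquet("/tmp/example_output")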
Hope this helps
David Newberger
From: pseudo oduesp [mailto:pseudo20...@gmail.com]
Sent: Thursday, June 16, 2016 9:43 AM
To
Try adding wordCounts.print() before ssc.start()
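Roughly, the ordering should look like this (a sketch based on the standard network word count example; the host and port are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))

val lines = ssc.socketTextStream("localhost", 9999)
val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

wordCounts.print()   // register the output operation first...
ssc.start()          // ...then start the streaming computation
ssc.awaitTermination()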
David Newberger
From: Lee Ho Yeung [mailto:jobmatt...@gmail.com]
Sent: Wednesday, June 15, 2016 9:16 PM
To: David Newberger
Cc: user@spark.apache.org
Subject: Re: streaming example has error
got another error StreamingContext: Error starting the
the maximum amount of CPU cores to request for the application from across the
cluster (not from each machine). If not set, the default will
be spark.deploy.defaultCores on Spark's standalone cluster manager, or infinite
(all available cores) on Mesos.”
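That passage is from the description of spark.cores.max; as a sketch, it can be set on the SparkConf before the context is created (the value 4 is only an example):

import org.apache.spark.SparkConf

// Example only: cap the total number of cores requested across the cluster.
val conf = new SparkConf()
  .setAppName("ExampleApp")
  .set("spark.cores.max", "4")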
David Newberger
From: agateaaa [mail
'll process the RDD further.
David Newberger
-Original Message-
From: Yogesh Vyas [mailto:informy...@gmail.com]
Sent: Wednesday, June 15, 2016 8:30 AM
To: David Newberger
Subject: Re: Handle empty kafka in Spark Streaming
I am looking for something which checks the JavaPairReceiverI
If you're asking how to handle no messages in a batch window, then I would add
an isEmpty check like:
dStream.foreachRDD(rdd => {
  if (!rdd.isEmpty()) {
    // process the non-empty RDD here
  }
})
Or something like that.
David Newberger
-Original Message-
From: Yogesh Vyas [mailto:informy...@gmail.com]
Sent: W
Have you tried setting “spark.driver.allowMultipleContexts = true”?
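For example, a minimal sketch of setting it on the SparkConf before any contexts are created (it's a workaround; one SparkContext per JVM is still the normal pattern):

import org.apache.spark.SparkConf

// Example only: tolerate more than one SparkContext in the same JVM.
val conf = new SparkConf()
  .setAppName("StreamingExample")
  .set("spark.driver.allowMultipleContexts", "true")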
David Newberger
From: Lee Ho Yeung [mailto:jobmatt...@gmail.com]
Sent: Tuesday, June 14, 2016 8:34 PM
To: user@spark.apache.org
Subject: streaming example has error
when I simulate streaming with nc -lk,
I get the error below,
then
Could two jobs be trying to use the same file, with one getting to it before the
other and then removing it?
David Newberger
From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Wednesday, June 8, 2016 1:33 PM
To: user; user @spark
Subject: Creating a Hive table through
Hi Mich,
My gut says you are correct that each application should have its own
checkpoint directory. Though honestly, I'm still a bit fuzzy on checkpointing, as
I haven't worked with it much yet.
Cheers,
David Newberger
From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Friday, June
I was going to ask if you had 2 jobs running. If the checkpointing for both is
set up to look at the same location, I could see an error like this happening. Do
both Spark jobs have a reference to a checkpointing dir?
David Newberger
From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent
Have you tried UseG1GC in place of UseConcMarkSweepGC? This article really
helped me with GC a few weeks ago:
https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
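One way to try it is through the executor JVM options, something like this sketch (the same flag can also be passed on the spark-submit command line):

import org.apache.spark.SparkConf

// Example only: use G1 instead of CMS for executor garbage collection.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions", "-XX:+UseG1GC")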
David Newberger
-Original Message-
From: Marco1982 [mailto:marco.plata
rk, it is cloned and can no
longer be modified by the user. Spark does not support modifying the
configuration at runtime.
“
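In other words, any settings need to be on the SparkConf before the context is created, roughly like this sketch (the app name and serializer setting are just examples):

import org.apache.spark.{SparkConf, SparkContext}

// Example only: configure everything up front...
val conf = new SparkConf()
  .setAppName("ExampleApp")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

// ...because once it is passed to the SparkContext it is cloned, and
// later changes to the conf are not picked up at runtime.
val sc = new SparkContext(conf)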
David Newberger
From: Alonso Isidoro Roman [mailto:alons...@gmail.com]
Sent: Friday, June 3, 2016 10:37 AM
To: David Newberger
Cc: user@spark.apache.org
Subject: Re: About
What does your processing time look like? Is it consistently within that 20-second
micro-batch window?
David Newberger
From: Adrian Tanase [mailto:atan...@adobe.com]
Sent: Friday, June 3, 2016 8:14 AM
To: user@spark.apache.org
Cc: Cosmin Ciobanu
Subject: [REPOST] Severe Spark Streaming performance
Alonso,
The CDH VM uses YARN and the default deploy mode is client. I’ve been able to
use the CDH VM for many learning scenarios.
http://www.cloudera.com/documentation/enterprise/latest.html
http://www.cloudera.com/documentation/enterprise/latest/topics/spark.html
David Newberger
From
Have you tried it without either of the setMaster lines?
Also, CDH 5.7 uses Spark 1.6.0 with some patches. I would recommend using the
Cloudera repo for the Spark dependencies in build.sbt (see the sketch below). I'd
also check the other sbt build files to see if there are CDH-specific versions.
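Something along these lines in build.sbt is what I mean (a sketch; the repository URL and the CDH version strings should be checked against the Cloudera repo):

// Example only: resolve Spark artifacts from the Cloudera repository.
resolvers += "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.0-cdh5.7.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.6.0-cdh5.7.0" % "provided"
)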
David Newberger
From
Is
https://github.com/alonsoir/awesome-recommendation-engine/blob/master/build.sbt
the build.sbt you are using?
David Newberger
QA Analyst
WAND - The Future of Restaurant Technology
(W) www.wandcorp.com
(E) david.newber...@wandcorp.com
Hi All,
The error you are seeing looks really similar to SPARK-13514 to me. I could be
wrong though:
https://issues.apache.org/jira/browse/SPARK-13514
Can you check yarn.nodemanager.local-dirs in your YARN configuration for
"file://"?
Cheers!
David Newberger
-Original Message
Can we assume your question is “Will Spark replace Hadoop MapReduce?” or do you
literally mean replacing the whole of Hadoop?
David
From: Ashok Kumar [mailto:ashok34...@yahoo.com.INVALID]
Sent: Thursday, April 14, 2016 2:13 PM
To: User
Subject: Spark replacing Hadoop
Hi,
I hear that some sayin
Hi Natu,
I believe you are correct one RDD would be created for each file.
Cheers,
David
From: Natu Lauchande [mailto:nlaucha...@gmail.com]
Sent: Tuesday, April 12, 2016 1:48 PM
To: David Newberger
Cc: user@spark.apache.org
Subject: Re: DStream how many RDD's are created by batch
Hi
Hi,
Time is usually the criterion, if I'm understanding your question. An RDD is
created for each batch interval. If your interval is 500ms then an RDD would be
created every 500ms. If it’s 2 seconds then an RDD is created every 2 seconds.
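For example (a sketch, assuming an existing SparkConf named conf):

import org.apache.spark.streaming.{Milliseconds, Seconds, StreamingContext}

// The batch interval controls how often each DStream produces an RDD.
val ssc = new StreamingContext(conf, Milliseconds(500)) // one RDD every 500ms
// val ssc = new StreamingContext(conf, Seconds(2))     // one RDD every 2 seconds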
Cheers,
David
From: Natu Lauchande [mailto:nlaucha...@g
used this approach yet
and if so, what has your experience been with using it? If it helps, we'd be
looking to implement it using Scala. Secondly, in general, what has people's
experience been with using experimental features in Spark?
Cheers,
David Newberger
Hi Eran,
Based on the limited information, the first things that come to mind are
processor, RAM, and disk speed.
David Newberger
QA Analyst
WAND - The Future of Restaurant Technology
(W) www.wandcorp.com
(E) david.newber...@wandcorp.com