I am fairly new to Spark Streaming and I have a basic question about how Spark
Streaming works with an S3 bucket that periodically receives new files, once
every 10 minutes.
When I use Spark Streaming to process the files in this S3 path, will it
process all the files in the path (old + new) every batch?
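For context, the kind of setup I mean is something like this (the bucket path
and batch interval below are only illustrative):
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("s3-file-stream")
val ssc = new StreamingContext(conf, Seconds(600)) // 10-minute batches, matching the arrival rate
// Does each batch see only the newly added files, or everything under the prefix?
val lines = ssc.textFileStream("s3n://my-bucket/incoming/") // hypothetical bucket/prefix
lines.foreachRDD(rdd => println(s"records this batch: ${rdd.count()}"))
ssc.start()
ssc.awaitTermination()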
There should be a spark-defaults.conf file somewhere on your machine;
that's where the config is. You can try to change it, but if you're
using some tool to manage configuration for you, your changes might
end up being overwritten, so be careful with that.
You can also try "--properties-file /blah
Thank you Marcelo. I don't know how to remove it. Could you please tell me
how I can remove that configuration?
On Mon, Jun 6, 2016 at 5:04 PM, Marcelo Vanzin wrote:
> This sounds like your default Spark configuration has an
> "enabledAlgorithms" config in the SSL settings, and that is listing
This sounds like your default Spark configuration has an
"enabledAlgorithms" config in the SSL settings, and that is listing an
algorithm name that is not available in jdk8. Either remove that
configuration (to use the JDK's default algorithm list), or change it
so that it lists algorithms supported by jdk8.
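For example, if spark-defaults.conf contains a line like the one below (the
cipher suites here are purely illustrative), deleting it falls back to the
JDK defaults, or you can replace the list with suites that JDK 8 actually
ships:
spark.ssl.enabledAlgorithms TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA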
Thank you Ted for the reference. I am going through it in detail.
Thank you Marco for your suggestion.
I created a properties file with these two lines
spark.driver.extraJavaOptions -Djsse.enableSNIExtension=false
spark.executor.extraJavaOptions -Djsse.enableSNIExtension=false
and gave this file to spark-submit.
Hi,
have you tried adding this flag?
-Djsse.enableSNIExtension=false
I had similar issues in another standalone application when I switched from
Java 7 to Java 8.
hth
marco
On Mon, Jun 6, 2016 at 9:58 PM, Koert Kuipers wrote:
> mhh i would not be very happy if the implication is that i have to s
Mhh, I would not be very happy if the implication is that I have to start
maintaining separate Spark builds for client clusters that use Java 8...
On Mon, Jun 6, 2016 at 4:34 PM, Ted Yu wrote:
> Please see:
> https://spark.apache.org/docs/latest/security.html
>
> w.r.t. Java 8, probably you need
Please see:
https://spark.apache.org/docs/latest/security.html
w.r.t. Java 8, probably you need to rebuild 1.5.2 using Java 8.
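For example, something like this from the Spark source tree, with JAVA_HOME
pointing at a JDK 8 install (the profiles are illustrative and should match
your Hadoop distro):
build/mvn -Pyarn -Phadoop-2.6 -DskipTests clean package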
Cheers
On Mon, Jun 6, 2016 at 1:19 PM, verylucky...@gmail.com <
verylucky...@gmail.com> wrote:
> Thank you for your response.
>
> I have seen this and couple of other s
Thank you for your response.
I have seen this and a couple of other similar ones about Java SSL in
general. However, I am not sure how they apply to Spark and specifically to
my case.
The error I mention above occurs when I switch from Java 7 to Java 8 by
changing the env variable JAVA_HOME.
The
Have you seen this?
http://stackoverflow.com/questions/22423063/java-exception-on-sslsocket-creation
On Mon, Jun 6, 2016 at 12:31 PM, verylucky Man
wrote:
> Hi,
>
> I have a cluster (Hortonworks supported system) running Apache spark on
> 1.5.2 on Java 7, installed by admin. Java 8 is also ins
Hi,
I have a cluster (a Hortonworks-supported system) running Apache Spark
1.5.2 on Java 7, installed by the admin. Java 8 is also installed.
I don't have admin access to this cluster and would like to run Spark
(1.5.2 and later versions) on Java 8.
I come from an HPC/MPI background, so I naively copi
I'm not an expert on YARN, so anyone please correct me if I'm wrong, but I
believe the ResourceManager will schedule the ApplicationMaster on any node
that has a NodeManager, depending on available resources.
So you would normally query the RM via the REST API to determine that. You
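For example (the host, port and application id here are placeholders), the
application report includes an amHostHttpAddress field that tells you which
node is hosting the AM:
curl http://rm-host:8088/ws/v1/cluster/apps/application_1465200000000_0001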
Can you give us a bit more information?
how you packaged the code into the jar
command you used for execution
version of Spark
related log snippet
Thanks
On Mon, Jun 6, 2016 at 10:43 AM, Daniel Haviv <
daniel.ha...@veracity-group.com> wrote:
> Hi,
> I'm wrapped the following code into a jar:
>
> v
Hi,
I've wrapped the following code into a jar:
val test = sc.parallelize(Seq(("daniel", "a"), ("daniel", "b"), ("test", "1)")))
val agg = test.groupByKey()
agg.collect.foreach(r=>{println(r._1)})
The result of groupByKey is an empty RDD; when I try the same
code using the spark-shell it's
That kind of stuff is likely fixed in 2.0. If you can get a reproduction
working there it would be very helpful if you could open a JIRA.
On Mon, Jun 6, 2016 at 7:37 AM, Richard Marscher
wrote:
> A quick unit test attempt didn't get far replacing map with as[], I'm only
> working against 1.6.1
On Mon, Jun 6, 2016 at 4:22 AM, shengzhixia wrote:
> In my previous Java project I can change class loader without problem. Could
> I know why the above method couldn't change class loader in spark shell?
> Any way I can achieve it?
The spark-shell for Scala 2.10 will reset the context class loader
How can I specify, in the yarn conf, the node where the application master
should run? I haven't found any useful information regarding that.
Thanks.
On Mon, Jun 6, 2016 at 4:52 PM, Bryan Cutler wrote:
> In that mode, it will run on the application master, whichever node that
> is as specified in you
In that mode, it will run on the application master, whichever node that is
as specified in your yarn conf.
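(I can't confirm this for every setup, but if the cluster has YARN node
labels configured and the Spark version supports it, one way to steer the AM,
and therefore the driver in cluster mode, is the
spark.yarn.am.nodeLabelExpression setting; the label name, class and jar
below are made up:)
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.yarn.am.nodeLabelExpression=driver-nodes \
  --class com.example.Main my-app.jar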
On Jun 5, 2016 4:54 PM, "Saiph Kappa" wrote:
> Hi,
>
> In yarn-cluster mode, is there any way to specify on which node I want the
> driver to run?
>
> Thanks.
>
Hi, just to update the thread: I have just submitted a simple wordcount job
to YARN using this command:
[cloudera@quickstart simple-word-count]$ spark-submit --class
com.example.Hello --master yarn --deploy-mode cluster --driver-memory
1024Mb --executor-memory 1G --executor-cores 1
target/scala-
A quick unit test attempt didn't get far replacing map with as[]. I'm only
working against 1.6.1 at the moment though; I was going to try 2.0, but I'm
having a hard time building a working spark-sql jar from source, as the only
ones I've managed to make are intended for the full assembly fat jar.
Exa
Hi,
I'd like to send some performance metrics from some of the transformations
to StatsD.
I understood that I would have to create a new connection to StatsD from each
transformation, which I'm afraid would harm performance.
I've also read that there is a workaround for this in Scala by defining an
object
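For reference, a minimal sketch of that object-based workaround, assuming
plain UDP StatsD and a made-up host name; the object is initialized lazily in
each executor JVM, so you get one socket per executor rather than one per
record:

import java.net.{DatagramPacket, DatagramSocket, InetAddress}

object StatsDSink {
  // Created once per executor JVM, on first use
  private lazy val socket = new DatagramSocket()
  private lazy val address = InetAddress.getByName("statsd.example.com") // hypothetical host
  private val port = 8125

  def increment(metric: String, count: Long = 1): Unit = {
    val payload = s"$metric:$count|c".getBytes("UTF-8")
    socket.send(new DatagramPacket(payload, payload.length, address, port))
  }
}

// `rdd` stands for whatever RDD the metrics-producing transformation runs over.
// The object is resolved on the executor, so no connection is serialized from the driver.
val instrumented = rdd.mapPartitions { iter =>
  iter.map { record =>
    StatsDSink.increment("myapp.records.processed")
    record
  }
}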
Hi,
Thanks for the quick replies. I've tried those suggestions but Eclipse is
showing:
Unable to find encoder for type stored in a Dataset. Primitive
types (Int, String, etc) and Product types (case classes) are supported by
importing sqlContext.implicits._ Support for serializing other types will be
added in future releases.
Hi,
I think encoders for case classes are already provided in spark. You'll
just need to import them.
val sql = new SQLContext(sc)
import sql.implicits._
And then do the cast to Dataset.
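For example, continuing from the lines above, a rough sketch against the
1.6-style API, assuming a hypothetical case class that mirrors the Hive
columns:
case class Table1(fooBar: String)
val ds = sql.sql("select foo_bar as `fooBar` from table1").as[Table1]
ds.show()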
2016-06-06 14:13 GMT+02:00 Dave Maughan :
> Hi,
>
> I've figured out how to select data from a remo
Hi,
I've figured out how to select data from a remote Hive instance and encode
the DataFrame -> Dataset using a Java POJO class:
TestHive.sql("select foo_bar as `fooBar` from table1"
).as(Encoders.bean(classOf[Table1])).show()
However, I'm struggling to find out to do the equivalent in Scala
It's not that clear what you are trying to achieve: what type is myRDD, and
where do k and v come from?
Anyway, it seems you want to end up with a map or a dictionary, which is what
a PairRDD is for, e.g.:
val rdd = sc.makeRDD(Array("1","2","3"))
val pairRDD = rdd.map(el => (el.toInt, el))
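and if you then need an actual Scala map on the driver side, something like:
val asMap = pairRDD.collectAsMap() // Map(1 -> "1", 2 -> "2", 3 -> "3")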
Hello guys!
I am using the spark-shell, which uses TranslatingClassLoader.
scala> Thread.currentThread().getContextClassLoader
res13: ClassLoader =
org.apache.spark.repl.SparkIMain$TranslatingClassLoader@23c767e6
For some reason I want to use another class loader, but when I do
val myclassloader =
Have you tried master local? That should work. This works as a test:
${SPARK_HOME}/bin/spark-submit \
--driver-memory 2G \
--num-executors 1 \
--executor-memory 2G \
--master local[2] \
--executor-cores 2 \
Hi guys, I finally understand that I cannot use sbt-pack to run the
spark-streaming jobs programmatically as unix commands; I have to use
YARN or Mesos in order to run the jobs.
I have some doubts: if I run the spark streaming jobs in yarn-client mode,
I am receiving this exception:
[cloudera@qu
Hi,
I've set spark.broadcast.factory to
org.apache.spark.broadcast.HttpBroadcastFactory and it indeed resolved my
issue.
I was creating a DataFrame, which creates a broadcast variable internally,
and it was failing in the torrent broadcast with the following stacktrace:
Caused by: org.apache.spark.Spa
Hi Ashok,
this is not really a Spark-related question, so I would not use this
mailing list for it.
Anyway, my 2 cents here:
as outlined by earlier replies, if the class you are referencing is in a
different jar, at compile time you will need to add that dependency to your
build.sbt (for example, a line like the one sketched below),
I'd personally lea
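For example, a hypothetical dependency line in build.sbt (organization, name
and version are made up):
libraryDependencies += "org.example" %% "my-utils" % "1.0.0"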
I don't think that quote is true in general. Given a map-only task, or a
map+shuffle+reduce, I'd expect MapReduce to be the same speed or a little
faster. It is the simpler, lower-level, narrower mechanism. It's all
limited by how fast you can read/write data and execute the user code.
There's a b
Can anyone help me with this, please?
On Sunday, 5 June 2016, 11:06, Ashok Kumar wrote:
Hi all,
I'd appreciate any advice on this. It is about Scala.
I have created a very basic Utilities.scala that contains a test class and
method. I intend to add my own classes and methods as I expand and ma