I'd just like to add that we do the very same thing at Webtrends;
updateStateByKey has been a life-saver for our sessionization use cases.
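Roughly, our update function has this shape (a simplified sketch; Event, SessionState, and eventsByKey are placeholders, not our real types):

case class Event(timestamp: Long)
case class SessionState(count: Long, lastSeen: Long)

def updateSession(events: Seq[Event],
                  state: Option[SessionState]): Option[SessionState] = {
  val prev = state.getOrElse(SessionState(0L, 0L))
  val last = (prev.lastSeen +: events.map(_.timestamp)).max
  Some(SessionState(prev.count + events.size, last)) // return None to expire the key
}

// eventsByKey: DStream[(String, Event)], keyed by session/user id.
// Note: updateStateByKey requires ssc.checkpoint(...) to be set.
val sessions = eventsByKey.updateStateByKey(updateSession)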
Cheers,
Sean
On Jul 15, 2015, at 11:20 AM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
Hi Cody,
I’ve had success using updateStateByKey
.map is just a transformation, so no work will actually be performed until
something takes action against it. Try adding a .count(), like so:
inputRDD.map { x =>
  counter += 1
}.count()
In case it is helpful, here are the docs on exactly which operations are
transformations and which are actions:
htt
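One caveat worth flagging: on a cluster, incrementing a plain local variable inside the closure only updates each executor's copy, so the driver never sees the result. If you actually need the count back on the driver, an accumulator is the usual tool. A rough sketch:

val counter = sc.accumulator(0)
inputRDD.map { x =>
  counter += 1 // accumulator updates flow back to the driver
  x            // keep map a pass-through transformation
}.count()
println(counter.value)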
We have gone down a similar path at Webtrends; Spark has worked amazingly well
for us in this use case. Our solution goes from REST directly into Spark and
back out to the UI instantly.
Here is the resulting product in case you are curious (and please pardon the
self-promotion):
https://www
Hi Kane-
http://spark.apache.org/docs/latest/tuning.html has excellent information that
may be helpful. In particular, increasing the number of tasks may help, as may
confirming that you don't have more data than you're expecting landing on a
single key.
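To make those suggestions concrete, something like this (pairRdd is a placeholder for your keyed RDD, and the partition count is illustrative):

// Spot-check for hot keys: records per key, top 10 by count.
pairRdd.mapValues(_ => 1L)
  .reduceByKey(_ + _)
  .map(_.swap) // (count, key) so the natural ordering sorts by count
  .top(10)
  .foreach(println)

// Increase the number of reduce-side tasks via the second argument:
val counts = pairRdd.reduceByKey(_ + _, 200)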
Also, if you are using Spark < 1.2.0, set
> Can the 0.8.1.1 client still talk to 0.8.0 versions of Kafka
Yes it can.
"0.8.1 is fully compatible with 0.8." It is buried on this page:
http://kafka.apache.org/documentation.html
In addition to the pom version bump, SPARK-2492 would bring in the Kafka
streaming receiver (which was originally
Are you using Spark Streaming?
On Oct 14, 2014, at 10:35 AM, Salman Haq wrote:
> Hi,
>
> In my application, I am successfully using foreachPartition to write large
> amounts of data into a Cassandra database.
>
> What is the recommended practice if the application wants to know that the
> ta
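The pattern in question looks roughly like this (a sketch; CassandraClient is a hypothetical stand-in for the actual driver/session API):

rdd.foreachPartition { rows =>
  val session = CassandraClient.connect("cassandra-host") // hypothetical helper
  try {
    rows.foreach(row => session.insert("keyspace.table", row)) // hypothetical API
  } finally {
    session.close() // one connection per partition, closed when done
  }
}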
I’ve run into this as well. I haven’t had a chance to troubleshoot what
exactly was going on, but I got around it by building my app as a single
uberjar.
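In case it's useful to anyone hitting the same thing, the sbt-assembly route looks roughly like this (plugin and Spark versions are illustrative):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt: mark Spark as "provided" so it isn't bundled into the uberjar
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"

// then build with: sbt assembly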
Sean
On Oct 13, 2014, at 6:40 PM, HARIPRIYA AYYALASOMAYAJULA <aharipriy...@gmail.com> wrote:
Hello,
Can you check if the jar file
https://issues.apache.org/jira/browse/SPARK-2492).
Thanks
Jerry
From: Abraham Jacob [mailto:abe.jac...@gmail.com]
Sent: Saturday, October 11, 2014 6:57 AM
To: Sean McNamara
Cc: user@spark.apache.org
Subject: Re: Spark Streaming KafkaUtils I
      public Tuple2<String, String> call(Tuple2<String, String> tuple2) throws Exception { // identity mapper (types assumed)
        return tuple2;
      }
    }
  )
);
}
JavaPairDStream<String, String> unifiedStream;
if (kafkaStreams.size() > 1) {
  // Union the per-receiver streams into a single DStream.
  unifiedStream = jssc.union(kafkaStreams.get(0),
      kafkaStreams.subList(1, kafkaStreams.size()));
} else {
  unifiedStream = kafkaStreams.get(0);
}
unifiedStream.print();
jssc.start();
jssc.awaitTermination();
Would you mind sharing the code leading to your createStream? Are you also
setting group.id?
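For reference, the receiver-based createStream takes the consumer group directly; roughly (all values below are placeholders):

import org.apache.spark.streaming.kafka.KafkaUtils

val stream = KafkaUtils.createStream(
  ssc,                  // StreamingContext
  "zk1:2181,zk2:2181",  // ZooKeeper quorum
  "my-consumer-group",  // becomes the consumer's group.id
  Map("mytopic" -> 1))  // topic -> number of receiver threads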
Thanks,
Sean
On Oct 10, 2014, at 4:31 PM, Abraham Jacob wrote:
> Hi Folks,
>
> I am seeing some strange behavior when using the Spark Kafka connector in
> Spark Streaming.
>
> I have a Kafka top
unless task scheduling time
> beats spark.locality.wait. Can cause overall low performance for all
> subsequent tasks.
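(For reference, the setting in question; the value below is illustrative and, in Spark of this era, given in milliseconds:)

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.locality.wait", "500") // default is 3000 (3s)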
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
We have a use case where we'd like something to execute once on each node, and I
thought it would be good to ask here.
Currently we achieve this by setting the parallelism to the number of nodes and
using a mod partitioner:
val balancedRdd = sc.parallelize(
(0 until Settings.parallelism)
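In full, the idea looks roughly like this (a sketch rather than our exact code; ModPartitioner and runOncePerNode are placeholder names):

import org.apache.spark.Partitioner

// Key i goes to partition i % n, so keys 0..n-1 land on distinct partitions.
class ModPartitioner(n: Int) extends Partitioner {
  def numPartitions: Int = n
  def getPartition(key: Any): Int = key.asInstanceOf[Int] % n
}

val nodes = Settings.parallelism // assumed: one partition per node
val balancedRdd = sc
  .parallelize(0 until nodes)
  .map(i => (i, i))
  .partitionBy(new ModPartitioner(nodes))

// One task per partition, so the body runs (roughly) once per node:
balancedRdd.foreachPartition(_ => runOncePerNode()) // hypothetical hook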