Tim,
>
> On Sat, Jan 30, 2016 at 8:19 AM, PhuDuc Nguyen
> wrote:
>
>> I have a Spark job running on Mesos in multi-master and supervise mode.
>> If I kill it, it is resilient as expected and respawns on another node.
>> However, I cannot kill it when I need to.
I have a Spark job running on Mesos in multi-master and supervise mode. If
I kill it, it is resilient as expected and respawns on another node.
However, I cannot kill it when I need to. I have tried 2 methods:
1) ./bin/spark-class org.apache.spark.deploy.Client kill ...
2) ./bin/spark-submit --master ...
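For reference, the general shapes of those two commands are roughly as
follows (a hedged sketch; <master-url> and <driver-id> are placeholders,
and whether --kill is honored depends on the cluster manager and deploy
mode):

./bin/spark-class org.apache.spark.deploy.Client kill <master-url> <driver-id>
./bin/spark-submit --master mesos://<host>:<port> --kill <driver-id>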
Vivek,
Did you say you have 8 Spark jobs that are consuming from the same topic
and all jobs are using the same consumer group name? If so, each job would
get only a subset of the messages from that Kafka topic, i.e. each job
would get roughly 1 out of every 8 messages. Is that your intent?
regards,
Duc
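If instead each job should see every message, a minimal sketch (receiver-based
API; the ZooKeeper address, topic, and ssc in scope are assumptions) gives
each job its own consumer group:

import org.apache.spark.streaming.kafka.KafkaUtils

// hedged sketch: one distinct consumer group per job so each job sees all messages
val groupId = "analytics-job-1"  // assumed: a unique name per job
val stream = KafkaUtils.createStream(ssc, "zk1:2181", groupId, Map("my-topic" -> 1))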
There is a way and it's called a "map-side join". To be clear, there is no
explicit function call/API to execute a map-side join. You have to code it
using a local/broadcast value combined with the map() function. A caveat
for this to work is that one side of the join must be small-ish enough to
exist as a local collection that can be broadcast to every executor.
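A minimal sketch of the idea (names and types are assumptions; sc and
bigRdd are presumed in scope):

// hedged sketch of a map-side join
val small: Map[Int, String] = Map(1 -> "a", 2 -> "b")  // small side, fits in memory
val smallBc = sc.broadcast(small)
val joined = bigRdd.map { case (k, v) =>               // bigRdd: RDD[(Int, V)], assumed
  (k, v, smallBc.value.get(k))                         // Option is None when no match
}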
Have you tried setting the spark.mesos.uris property, like
val conf = new SparkConf().set("spark.mesos.uris", ...)
val sc = new SparkContext(conf)
...
http://spark.apache.org/docs/latest/running-on-mesos.html
HTH,
Duc
On Thu, Dec 10, 2015 at 1:04 PM, PHELIPOT, REMY
wrote:
> Hello!
>
> I'm using ...
Kafka Receiver-based approach:
This will maintain the consumer offsets in ZK for you.
Kafka Direct approach:
You can use checkpointing and that will maintain consumer offsets for you.
You'll want to checkpoint to a highly available file system like HDFS or S3.
http://spark.apache.org/docs/latest/s
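A minimal sketch of the direct approach with checkpointing (the checkpoint
path, broker address, topic, and processing step are all assumptions):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val checkpointDir = "hdfs:///checkpoints/my-job"   // assumed HDFS path
def createContext(): StreamingContext = {
  val ssc = new StreamingContext(new SparkConf().setAppName("kafka-direct"), Seconds(10))
  ssc.checkpoint(checkpointDir)                    // offsets are stored with the checkpoint
  val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Set("my-topic"))
    .foreachRDD(rdd => println(rdd.count()))       // placeholder processing
  ssc
}
// recover from the checkpoint if one exists, otherwise build a fresh context
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()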
I believe the "Streaming" tab is dynamic: it appears once you have a
streaming job running, not when the cluster is simply up. It does not
depend on 1.6 and has been there since at least 1.0.
HTH,
Duc
On Fri, Dec 4, 2015 at 7:28 AM, patcharee wrote:
> Hi,
>
> We tried to get the streaming tab
Kafka only guarantees ordering within a single partition of a topic, not
across an entire topic. Unless you're creating topics in Kafka with only a
single partition (you probably shouldn't be doing this), messages won't be
served to consumers as FIFO. As for Spark, there are many operations that
will shuffle data, and a shuffle does not preserve ordering.
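For example (a hedged sketch; the stream name is an assumption):

// ordering holds within each Kafka partition of `stream`
val repartitioned = stream.transform(_.repartition(16))
// after the shuffle above, no ordering guarantee remains across records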
You should try doing your Solr writes inside rdd.foreachPartition() for
maximum parallelism: the function you pass in is executed once for each
partition on each executor.
HTH,
Duc
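A minimal sketch of that pattern (SolrWriter and zkHost are hypothetical
placeholders for your own client):

rdd.foreachPartition { docs =>
  // construct the writer here so it is created on the executor,
  // not serialized from the driver
  val writer = new SolrWriter(zkHost)   // hypothetical client
  docs.foreach(writer.add)
  writer.close()                        // flush and release the connection
}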
On Tue, Nov 17, 2015 at 7:36 AM, Susheel Kumar
wrote:
> Any input/suggestions on parallelizing the below operations using Spark over ...
You can try this for G1GC:
.../spark-submit --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC
-XX:+UseCompressedOops -XX:-UseGCOverheadLimit" ...
However, I would suggest ensuring your job is properly tuned. If you're
experiencing 60% GC time in a task, it's likely garbage collection is not
the problem ...
> ... to get the scheduling delay and processing times,
> and use that to request or kill executors.
>
> TD
>
> On Wed, Nov 11, 2015 at 9:48 AM, PhuDuc Nguyen
> wrote:
>
>> Dean,
>>
>> Thanks for the reply. I'm searching (via Spark mailing list archive and
>> g
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition
> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
> On Wed, Nov 11, 2015 at 8:09 AM, PhuDuc Nguyen
I'm trying to get Spark Streaming to scale up/down its number of executors
within Mesos based on workload. It's not scaling down. I'm using Spark
1.5.1 reading from Kafka using the direct (receiver-less) approach.
Based on this ticket https://issues.apache.org/jira/browse/SPARK-6287 with
the right
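For context, executor scaling is governed by settings along these lines (a
hedged sketch; the values are assumptions, and the external shuffle service
must be running on each node):

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")      // requires the external shuffle service
  .set("spark.dynamicAllocation.minExecutors", "1")  // assumed bounds
  .set("spark.dynamicAllocation.maxExecutors", "10")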