Tim,
>
> On Sat, Jan 30, 2016 at 8:19 AM, PhuDuc Nguyen
> wrote:
>
>> I have a Spark job running on Mesos in multi-master and supervise mode.
>> If I kill it, it is resilient as expected and respawns on another node.
>> However, I cannot kill it when I need to.
I have a Spark job running on Mesos in multi-master and supervise mode. If
I kill it, it is resilient as expected and respawns on another node.
However, I cannot kill it when I need to. I have tried 2 methods:
1) ./bin/spark-class org.apache.spark.deploy.Client kill ...
2) ./bin/spark-submit --master ...
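For reference, the general shapes of those two commands are roughly as
follows (a hedged sketch; <master-url> and <driver-id> are placeholders,
and whether --kill is honored depends on the cluster manager and deploy
mode):

./bin/spark-class org.apache.spark.deploy.Client kill <master-url> <driver-id>
./bin/spark-submit --master mesos://<host>:<port> --kill <driver-id>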
Vivek,
Did you say you have 8 Spark jobs that are consuming from the same topic
and all jobs are using the same consumer group name? If so, each job would
get only a subset of the messages from that Kafka topic, i.e. each job
would get roughly 1 out of every 8 messages. Is that your intent?
regards,
Duc
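If instead each job should see every message, a minimal sketch (receiver-based
API; the ZooKeeper address, topic, and ssc in scope are assumptions) gives
each job its own consumer group:

import org.apache.spark.streaming.kafka.KafkaUtils

// hedged sketch: one distinct consumer group per job so each job sees all messages
val groupId = "analytics-job-1"  // assumed: a unique name per job
val stream = KafkaUtils.createStream(ssc, "zk1:2181", groupId, Map("my-topic" -> 1))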
There is a way and it's called a "map-side join". To be clear, there is no
explicit function call/API to execute a map-side join. You have to code it
using a local/broadcast value combined with the map() function. A caveat
for this to work is that one side of the join must be small-ish enough to
exist as a local collection that can be broadcast to every executor.
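A minimal sketch of the idea (names and types are assumptions; sc and
bigRdd are presumed in scope):

// hedged sketch of a map-side join
val small: Map[Int, String] = Map(1 -> "a", 2 -> "b")  // small side, fits in memory
val smallBc = sc.broadcast(small)
val joined = bigRdd.map { case (k, v) =>               // bigRdd: RDD[(Int, V)], assumed
  (k, v, smallBc.value.get(k))                         // Option is None when no match
}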
Have you tried setting the spark.mesos.uris property, like
val conf = new SparkConf().set("spark.mesos.uris", ...)
val sc = new SparkContext(conf)
...
http://spark.apache.org/docs/latest/running-on-mesos.html
HTH,
Duc
On Thu, Dec 10, 2015 at 1:04 PM, PHELIPOT, REMY
wrote:
> Hello!
>
> I'm using ...
Kafka Receiver-based approach:
This will maintain the consumer offsets in ZK for you.
Kafka Direct approach:
You can use checkpointing and that will maintain consumer offsets for you.
You'll want to checkpoint to a highly available file system like HDFS or S3.
http://spark.apache.org/docs/latest/s
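A minimal sketch of the direct approach with checkpointing (the checkpoint
path, broker address, topic, and processing step are all assumptions):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val checkpointDir = "hdfs:///checkpoints/my-job"   // assumed HDFS path
def createContext(): StreamingContext = {
  val ssc = new StreamingContext(new SparkConf().setAppName("kafka-direct"), Seconds(10))
  ssc.checkpoint(checkpointDir)                    // offsets are stored with the checkpoint
  val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Set("my-topic"))
    .foreachRDD(rdd => println(rdd.count()))       // placeholder processing
  ssc
}
// recover from the checkpoint if one exists, otherwise build a fresh context
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()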
I believe the "Streaming" tab is dynamic: it appears once you have a
streaming job running, not when the cluster is simply up. It does not
depend on 1.6 and has been there since at least 1.0.
HTH,
Duc
On Fri, Dec 4, 2015 at 7:28 AM, patcharee wrote:
> Hi,
>
> We tried to get the streaming tab
Kafka only guarantees ordering within a single partition of a topic, not
across an entire topic. Unless you're creating topics in Kafka with only a
single partition (you probably shouldn't be doing this), messages won't be
served to consumers as FIFO. As for Spark, there are many operations that
will shuffle data, and a shuffle does not preserve ordering.
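For example (a hedged sketch; the stream name is an assumption):

// ordering holds within each Kafka partition of `stream`
val repartitioned = stream.transform(_.repartition(16))
// after the shuffle above, no ordering guarantee remains across records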
You should try doing your Solr writes inside rdd.foreachPartition() for
maximum parallelism: the function you pass in is executed once for each
partition on each executor.
HTH,
Duc
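A minimal sketch of that pattern (SolrWriter and zkHost are hypothetical
placeholders for your own client):

rdd.foreachPartition { docs =>
  // construct the writer here so it is created on the executor,
  // not serialized from the driver
  val writer = new SolrWriter(zkHost)   // hypothetical client
  docs.foreach(writer.add)
  writer.close()                        // flush and release the connection
}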
On Tue, Nov 17, 2015 at 7:36 AM, Susheel Kumar
wrote:
> Any input/suggestions on parallelizing the below operations using Spark over ...
You can try this for G1GC:
.../spark-submit --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC
-XX:+UseCompressedOops -XX:-UseGCOverheadLimit" ...
However, I would suggest ensuring your job is properly tuned. If you're
experiencing 60% GC time in a task, it's likely garbage collection is not
the problem ...
> ... to get the scheduling delay and processing times,
> and use that to request or kill executors.
>
> TD
>
> On Wed, Nov 11, 2015 at 9:48 AM, PhuDuc Nguyen
> wrote:
>
>> Dean,
>>
>> Thanks for the reply. I'm searching (via Spark mailing list archive and
>> g
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition
> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
> On Wed, Nov 11, 2015 at 8:09 AM, PhuDuc Nguyen
I'm trying to get Spark Streaming to scale up/down its number of executors
within Mesos based on workload. It's not scaling down. I'm using Spark
1.5.1 reading from Kafka using the direct (receiver-less) approach.
Based on this ticket https://issues.apache.org/jira/browse/SPARK-6287 with
the right
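For context, executor scaling is governed by settings along these lines (a
hedged sketch; the values are assumptions, and the external shuffle service
must be running on each node):

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")      // requires the external shuffle service
  .set("spark.dynamicAllocation.minExecutors", "1")  // assumed bounds
  .set("spark.dynamicAllocation.maxExecutors", "10")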