Re: ClassLoader problem - java.io.InvalidClassException: scala.Option; local class incompatible

2017-02-20 Thread Kohki Nishio
Created a JIRA; I believe SBT is a valid use case, but it's resolved as Not a Problem: https://issues.apache.org/jira/browse/SPARK-19675 On Mon, Feb 20, 2017 at 10:36 PM, Kohki Nishio wrote: > Hello, I'm writing a Play Framework application which uses Spark, however > I'm getting below > > j

ClassLoader problem - java.io.InvalidClassException: scala.Option; local class incompatible

2017-02-20 Thread Kohki Nishio
Hello, I'm writing a Play Framework application which uses Spark; however, I'm getting the exception below: java.io.InvalidClassException: scala.Option; local class incompatible: stream classdesc serialVersionUID = -114498752079829388, local class serialVersionUID = 5081326844987135632 at java.io.ObjectStream
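
A hedged sketch of the usual fix (the versions below are examples, not from the thread): a serialVersionUID mismatch on scala.Option generally means two different scala-library versions end up on the classpath, so the Play application's Scala version has to match the one the Spark artifacts were built against, e.g. in build.sbt:

scalaVersion := "2.11.8"

// keep all Spark artifacts on the same version and the same Scala binary version
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0",
  "org.apache.spark" %% "spark-sql"  % "2.1.0"
)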

[SparkSQL] pre-check syntax before running spark job?

2017-02-20 Thread Linyuxin
Hi All, Is there any tool/API to check the SQL syntax without actually running a Spark job? Like SiddhiQL on Storm here: SiddhiManagerService.validateExecutionPlan https://github.com/wso2/siddhi/blob/master/modules/siddhi-core/src/main/java/org/wso2/siddhi/core/SiddhiManagerService.java it can
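
One possible approach (not mentioned in the thread): Spark's own Catalyst parser can be invoked without running a job. It catches syntax errors but not missing tables or columns; a minimal sketch, assuming Spark 2.x:

import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParseException}

def isSyntaxValid(sql: String): Boolean =
  try { CatalystSqlParser.parsePlan(sql); true }
  catch { case _: ParseException => false }

isSyntaxValid("SELECT name FROM users WHERE age > 21")  // true
isSyntaxValid("SELEC name FROM users")                  // false, ParseException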

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Cody Koeninger
So there's no reason to use checkpointing at all, right? Eliminate that as a possible source of problems. Probably unrelated, but this also isn't a very good way to benchmark. Kafka producers are thread-safe; there's no reason to create one for each partition. On Mon, Feb 20, 2017 at 4:43 PM, Muh
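
A minimal sketch of the shared-producer point (broker address and topic are placeholders, not from the thread): one lazily created KafkaProducer per JVM can be reused across partitions because the producer is thread-safe.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SharedProducer {
  lazy val producer: KafkaProducer[String, String] = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")  // placeholder broker list
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    new KafkaProducer[String, String](props)
  }
}

// inside foreachRDD / foreachPartition, reuse it instead of constructing a new one:
// SharedProducer.producer.send(new ProducerRecord("report-topic", key, value))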

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Muhammad Haseeb Javed
This is the code that I have been trying, and it is giving me this error. No complicated operation is being performed on the topics as far as I can see. class Identity() extends BenchBase { override def process(lines: DStream[(Long, String)], config: SparkBenchConfig): Unit = { val reportTopic = con

Fwd: Will Spark ever run the same task at the same time

2017-02-20 Thread Mark Hamstra
First, the word you are looking for is "straggler", not "strangler" -- very different words. Second, "idempotent" doesn't mean "only happens once", but rather "if it does happen more than once, the effect is no different than if it only happened once". It is possible to insert a nearly limitless v
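
A toy illustration of that definition (plain Scala, not Spark code): a keyed overwrite is idempotent because re-running it leaves the state unchanged, while an append-style write is not.

import scala.collection.mutable

val store = mutable.Map.empty[String, Int]

def idempotentWrite(id: String, value: Int): Unit = store(id) = value    // same key => same row
def appendWrite(value: Int): Unit = store(s"row-${store.size}") = value  // every run adds a row

idempotentWrite("task-7", 42)
idempotentWrite("task-7", 42)   // a retried or speculative re-run changes nothing
assert(store.size == 1 && store("task-7") == 42)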

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Cody Koeninger
That's an indication that the beginning offset for a given batch is higher than the ending offset, i.e. something is seriously wrong. Are you doing anything at all odd with topics, i.e. deleting and recreating them, using compacted topics, etc? Start off with a very basic stream over the same kaf
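
A rough sketch of the "very basic stream" suggestion (Spark 2.0.x with spark-streaming-kafka-0-8; the broker list and topic name are placeholders). Logging the offset ranges per batch makes it obvious if an ending offset ever comes back lower than the starting one:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

object BasicOffsetCheck {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("BasicOffsetCheck"), Seconds(10))
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "broker1:9092"), Set("identity-topic"))

    stream.foreachRDD { rdd =>
      // print the Kafka offset range consumed for each partition in this batch
      rdd.asInstanceOf[HasOffsetRanges].offsetRanges.foreach { o =>
        println(s"${o.topic}-${o.partition}: ${o.fromOffset} -> ${o.untilOffset}")
      }
    }
    ssc.start()
    ssc.awaitTermination()
  }
}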

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Muhammad Haseeb Javed
Update: I am using Spark 2.0.2 and Kafka 0.8.2 with Scala 2.10 On Mon, Feb 20, 2017 at 1:06 PM, Muhammad Haseeb Javed < 11besemja...@seecs.edu.pk> wrote: > I am a PhD student at Ohio State working on a study to evaluate streaming > frameworks (Spark Streaming, Storm, Flink) using the Intel HiB

Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Muhammad Haseeb Javed
I am a PhD student at Ohio State working on a study to evaluate streaming frameworks (Spark Streaming, Storm, Flink) using the Intel HiBench benchmarks, but I think I am having a problem with Spark. I have a Spark Streaming application which I am trying to run on a 5-node cluster (including master

Re: Will Spark ever run the same task at the same time

2017-02-20 Thread Steve Loughran
> On 16 Feb 2017, at 18:34, Ji Yan wrote: > > Dear spark users, > > Is there any mechanism in Spark that does not guarantee the idempotent > nature? For example, for stranglers, the framework might start another task > assuming the strangler is slow while the strangler is still running. This
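
A side note not in the thread excerpt: the re-launching of slow tasks the question describes is Spark's speculative execution, which is off by default; a sketch of the relevant settings (values shown are the defaults):

val conf = new org.apache.spark.SparkConf()
  .set("spark.speculation", "false")           // enable/disable speculative re-launch of slow tasks
  .set("spark.speculation.interval", "100ms")  // how often to check for stragglers
  .set("spark.speculation.multiplier", "1.5")  // how much slower than the median counts as slow
  .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking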

Re: Basic Grouping Question

2017-02-20 Thread ayan guha
Hi, Once you specify the aggregates on the groupBy function (I am assuming you mean DataFrame here?), grouping and aggregation both work in a distributed fashion (you may want to look into how reduceByKey and/or aggregateByKey work). On Mon, Feb 20, 2017 at 10:23 PM, Marco Mans wrote: > Hi! > > I'm ne
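
A small sketch of the idea (the sample rows mirror Marco's data; the SparkSession setup is boilerplate, not from the thread): both the DataFrame groupBy/agg path and the RDD reduceByKey path do partial aggregation per partition before shuffling by key.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("grouping-example").getOrCreate()
import spark.implicits._

val df = Seq(("A", "2017-01-01", 123), ("A", "2017-01-02", 124),
             ("B", "2017-01-01", 127), ("B", "2017-01-02", 126)).toDF("Code", "timestamp", "value")

// DataFrame API: partial aggregates per partition, then a shuffle on Code
df.groupBy("Code").agg(avg("value"), max("value")).show()

// RDD equivalent: reduceByKey also combines map-side before the shuffle
df.rdd.map(r => (r.getString(0), r.getInt(2))).reduceByKey(_ + _).collect()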

Message loss in streaming even with graceful shutdown

2017-02-20 Thread Noorul Islam K M
Hi all, I have a streaming application with a batch interval of 10 seconds. val sparkConf = new SparkConf().setAppName("RMQWordCount") .set("spark.streaming.stopGracefullyOnShutdown", "true") val ssc = new StreamingContext(sparkConf, Seconds(10)) I also use the reduceByKeyAndWindow() API f
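
A hedged reconstruction of the shape described (the socket source, word-count logic, and checkpoint path are placeholders, not the poster's actual code); note that the inverse-reduce form of reduceByKeyAndWindow requires a checkpoint directory:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf().setAppName("RMQWordCount")
  .set("spark.streaming.stopGracefullyOnShutdown", "true")
val ssc = new StreamingContext(sparkConf, Seconds(10))
ssc.checkpoint("/tmp/rmq-checkpoint")   // required by the inverse-reduce window below

val words  = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))  // stand-in source
val counts = words.map(w => (w, 1)).reduceByKeyAndWindow(
  _ + _,          // add counts entering the window
  _ - _,          // subtract counts leaving the window
  Seconds(30),    // window length
  Seconds(10))    // slide interval
counts.print()

ssc.start()
ssc.awaitTermination()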

Basic Grouping Question

2017-02-20 Thread Marco Mans
Hi! I'm new to Spark and trying to write my first Spark job on some data I have. The data is in this (parquet) format:

Code, timestamp, value
A, 2017-01-01, 123
A, 2017-01-02, 124
A, 2017-01-03, 126
B, 2017-01-01, 127
B, 2017-01-02, 126
B, 2017-01-03, 123

I want to write a little map-reduce appli

Spark Streaming on AWS EC2 error. Please help

2017-02-20 Thread shyla deshpande
I am running Spark Streaming on AWS EC2 in standalone mode. When I do a spark-submit, I get the following message. I am subscribing to 3 Kafka topics but it is reading and processing just 2 of them. Works fine in local mode. Appreciate your help. Thanks Exception in thread "pool-26-thread-132" jav

Re: Spark Worker can't find jar submitted programmatically

2017-02-20 Thread Cosmin Posteuca
Hi Zoran, I think you are looking for the --jars parameter/argument to spark-submit. When using spark-submit, the application jar along with any jars included > with the --jars option will be automatically transferred to the cluster. > URLs supplied after --jars must be separated by commas. ( > http:/
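
A sketch of the programmatic counterpart (master URL and jar paths are placeholders): when the SparkContext is built inside the application rather than launched via spark-submit, the jar has to be shipped to the workers explicitly.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("programmatic-submit")
  .setMaster("spark://master:7077")                      // placeholder master URL
  .setJars(Seq("/path/to/my-application-assembly.jar"))  // distributed to executors at startup

val sc = new SparkContext(conf)
// or, once the context exists: sc.addJar("/path/to/extra-dependency.jar")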