If you haven't looked at the offset ranges in the logs for the time period in question, I'd start there.
On Jan 24, 2017 2:51 PM, "Hakan İlter" <hakanil...@gmail.com> wrote:

Sorry for the misunderstanding. When I said that, I meant there is no lag in the consumer. Kafka Manager shows each consumer's coverage and lag status.

On Tue, Jan 24, 2017 at 10:45 PM, Cody Koeninger <c...@koeninger.org> wrote:
> When you said "I check the offset ranges from Kafka Manager and don't
> see any significant deltas.", what were you comparing it against? The
> offset ranges printed in the Spark logs?
>
> On Tue, Jan 24, 2017 at 2:11 PM, Hakan İlter <hakanil...@gmail.com> wrote:
> > First of all, I can see both the "Input Rate" on the Spark job's
> > statistics page and the Kafka producer messages/sec in Kafka Manager.
> > The numbers differ when I have the problem; normally they are very close.
> >
> > Besides, the job is an ETL job; it writes its results to Elasticsearch.
> > Another legacy app also writes the same results to a database. There is
> > a huge difference between the DB and ES. I know how many records we
> > process daily.
> >
> > Everything works fine if I run a separate job instance for each topic.
> >
> > On Tue, Jan 24, 2017 at 5:26 PM, Cody Koeninger <c...@koeninger.org> wrote:
> >> I'm confused: if you don't see any difference between the offsets the
> >> job is processing and the offsets available in Kafka, then how do you
> >> know it's processing less than all of the data?
> >>
> >> On Tue, Jan 24, 2017 at 12:35 AM, Hakan İlter <hakanil...@gmail.com> wrote:
> >> > I'm using DirectStream as one stream for all topics. I check the
> >> > offset ranges from Kafka Manager and don't see any significant deltas.
> >> >
> >> > On Tue, Jan 24, 2017 at 4:42 AM, Cody Koeninger <c...@koeninger.org> wrote:
> >> >> Are you using a receiver-based or direct stream?
> >> >>
> >> >> Are you doing one stream per topic, or one stream for all topics?
> >> >>
> >> >> If you're using the direct stream, the actual topics and offset
> >> >> ranges should be visible in the logs, so you should be able to see
> >> >> more detail about what's happening (e.g. all topics are still being
> >> >> processed but offsets are significantly behind, vs. only certain
> >> >> topics being processed but keeping up with the latest offsets).
> >> >>
> >> >> On Mon, Jan 23, 2017 at 3:14 PM, hakanilter <hakanil...@gmail.com> wrote:
> >> >> > Hi everyone,
> >> >> >
> >> >> > I have a Spark (1.6.0-cdh5.7.1) streaming job which receives data
> >> >> > from multiple Kafka topics. After starting the job, everything
> >> >> > works fine at first (around 700 req/sec), but after a while (a
> >> >> > couple of days or a week) it starts processing only part of the
> >> >> > data (around 350 req/sec). When I check the Kafka topics, I can
> >> >> > see that there are still 700 req/sec coming into the topics. I
> >> >> > don't see any errors, exceptions or any other problem. The job
> >> >> > works fine when I start the same code with just a single Kafka topic.
> >> >> >
> >> >> > Do you have any idea or a clue to understand the problem?
> >> >> >
> >> >> > Thanks.
> >> >> >
> >> >> > --
> >> >> > View this message in context:
> >> >> > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-multiple-kafka-topic-doesn-t-work-at-least-once-tp28334.html
> >> >> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
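The advice in this thread boils down to comparing two sets of numbers per topic-partition: the untilOffset values the direct stream prints in the Spark logs for each batch, and the latest offsets Kafka itself reports. A minimal sketch of that comparison in plain Python (the dictionaries, sample numbers, and the `per_partition_lag` helper are hypothetical; real values would come from your logs and from Kafka Manager or the broker):

```python
# Compare the offsets the Spark direct stream has processed (from its logs)
# against the latest offsets available in Kafka, per topic-partition.
# All sample numbers below are made up for illustration.

def per_partition_lag(latest_offsets, processed_offsets):
    """Return {(topic, partition): lag} for every partition Kafka reports.

    latest_offsets:    {(topic, partition): latest offset in Kafka}
    processed_offsets: {(topic, partition): untilOffset from the Spark logs}
    """
    lag = {}
    for tp, latest in latest_offsets.items():
        # A partition absent from the Spark logs counts as fully lagged.
        processed = processed_offsets.get(tp, 0)
        lag[tp] = latest - processed
    return lag

latest = {("topicA", 0): 1000, ("topicA", 1): 1200, ("topicB", 0): 900}
processed = {("topicA", 0): 1000, ("topicA", 1): 1195}  # topicB never logged
print(per_partition_lag(latest, processed))
# {('topicA', 0): 0, ('topicA', 1): 5, ('topicB', 0): 900}
```

A partition that appears in Kafka but never in the Spark logs would point at a topic being silently dropped from the stream, which would match the symptom here: throughput roughly halving while the topics still receive the full 700 req/sec.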