Re: Kafka Streams vs Spark Streaming

2017-02-27 Thread Sachin Mittal
I had a question regarding http://docs.confluent.io/3.1.2/streams/developer-guide.html#enable-disable-state-store-changelogs When disabling this in 0.10.2, what exactly does this mean? Does this mean that no RocksDB state store would get created any longer? On this subject we had started with spark st

Re: Kafka offset being reset

2017-02-27 Thread Bhavi C
What version of Kafka are you running? From: Vishnu Gopal Sent: Monday, February 27, 2017 5:50:06 PM To: users@kafka.apache.org Subject: Kafka offset being reset Team, I came across an issue where the offset position is being reset and the consumer app is rece

Kafka offset being reset

2017-02-27 Thread Vishnu Gopal
Team, I came across an issue where the offset position is being reset and the consumer app is receiving the same message at the same offset repeatedly. This happens every 3 or 4 days. I have purged the topic for now and restarted the app, but can anybody give me a direction on how t

Re: Kafka Streams vs Spark Streaming

2017-02-27 Thread Guozhang Wang
Kohki, Thanks for the explanation, it's very helpful. As we discussed in another email thread you started, originally I thought the motivation to use "explicit triggers" (i.e. what it achieves with your watermark) was due to application logic, i.e. whenever you have received a record that trigg

Re: Kafka Streams vs Spark Streaming

2017-02-27 Thread Guozhang Wang
Tianji, To add to what Michael already mentioned: Setting commit interval higher can help with the producer batching changelog records sent to Kafka broker's changelog topic and hence better leverage bandwidth, however the traffic would still be the same in your case as you mentioned that each key

Re: Large state directory with Kafka Streams

2017-02-27 Thread Eno Thereska
Hi Ian, As discussed in KAFKA-3775, proper memory management is better than throttling and we've made some steps towards that in 0.10.1 and 0.10.2 (reduce the memory RocksDb uses, provide a global memory limit for buffers within streams). The scenario you mention is possible, and needs to be ad

Re: Overriding JAAS config for connector

2017-02-27 Thread Stephen Durfey
Ah, I figured out the issue. It was user error. I needed to add that config to my connector's ConfigDef, and after that everything was working just fine. On Fri, Feb 24, 2017 at 1:41 PM, Ismael Juma wrote: > I suggest filing a JIRA with the details to reproduce and the stacktrace. > The way it wo

Re: Kafka Connect

2017-02-27 Thread Hans Jespersen
Maybe look at this Kafka source connector for salesforce https://github.com/jcustenborder/kafka-connect-salesforce -hans Sent from my iPhone > On Feb 27, 2017, at 4:06 PM, VIVEK KUMAR MISHRA 13BIT0066 > wrote: > > Actually my data sources are salesforce and mailchimp. i have developed an > a

Kafka MirrorMaker issues

2017-02-27 Thread Le Cyberian
Hi Kafka Gurus :) I am facing issues with Kafka MirrorMaker. I am using Kafka 0.10.1.1 and trying to use mirroring to create a backup of the Kafka logs. Perhaps this is a great way to do it; please let me know if it's not. My consumer.properties: bootstrap.servers=localhost:9092 group.id=mirror pro

Re: Kafka Streams vs Spark Streaming

2017-02-27 Thread Kohki Nishio
Guozhang, It's a bit difficult to explain, but let me try ... the basic idea is that we can assume most messages have the same clock (per partition at least); then, if an offset carries metadata about the target time of the offset, fail-over works. Offset = 1 Metadata Time = 2/
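
The preview above is cut off, so the following is only one possible reading of the idea: a minimal sketch in which the metadata committed alongside an offset carries an event-time watermark, so that a consumer taking over the partition can restore it. CommitStore, encode_watermark, decode_watermark, resume, and the "watermark_ms=" format are all hypothetical names invented for illustration; none of them comes from the thread or from the Kafka client API.

```python
# Sketch: keep an event-time watermark in the metadata committed with an offset,
# so a replacement consumer can restore it after fail-over.
# CommitStore is a hypothetical in-memory stand-in for real offset storage.


class CommitStore:
    def __init__(self):
        # partition -> (offset, metadata string)
        self._commits = {}

    def commit(self, partition, offset, metadata):
        self._commits[partition] = (offset, metadata)

    def committed(self, partition):
        return self._commits.get(partition)


def encode_watermark(timestamp_ms):
    """Serialize the watermark into a metadata string (assumed format)."""
    return "watermark_ms=%d" % timestamp_ms


def decode_watermark(metadata):
    """Parse a metadata string produced by encode_watermark."""
    key, _, value = metadata.partition("=")
    if key != "watermark_ms":
        raise ValueError("unexpected metadata: %r" % metadata)
    return int(value)


def resume(store, partition):
    """Return (next_offset, watermark_ms) for a consumer taking over a partition.

    The committed offset is treated here as the next offset to read.
    With no prior commit, start at offset 0 with no watermark.
    """
    entry = store.committed(partition)
    if entry is None:
        return (0, None)
    offset, metadata = entry
    return (offset, decode_watermark(metadata))


if __name__ == "__main__":
    store = CommitStore()
    store.commit(partition=0, offset=1, metadata=encode_watermark(1488153600000))
    print(resume(store, 0))
```

The point of the sketch is only that the offset and its time information are committed together, so a restarted instance does not have to re-derive the watermark from the messages themselves.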

Re: Large state directory with Kafka Streams

2017-02-27 Thread Ian Duffy
> Yes, the partitions reflect those of the input topic. You could try to create the topic manually before streams start, however, that might not be an ideal operational way of doing things (it's best if streams continues to do these things automatically). I'd suggest the scaling out approach first.

Re: Kafka Connect

2017-02-27 Thread VIVEK KUMAR MISHRA 13BIT0066
Actually my data sources are Salesforce and Mailchimp. I have developed an API that fetches data from them, but now I want any changes in the Salesforce and Mailchimp data to be reflected in my topic data. On Mon, Feb 27, 2017 at 8:54 PM, Tauze

Re: Large state directory with Kafka Streams

2017-02-27 Thread Eno Thereska
Hi Ian, Yes, the partitions reflect those of the input topic. You could try to create the topic manually before streams start, however, that might not be an ideal operational way of doing things (it's best if streams continues to do these things automatically). I'd suggest the scaling out appro

RE: Kafka Connect

2017-02-27 Thread Tauzell, Dave
Also, see this article on streaming changes from MySQL to kafka: https://wecode.wepay.com/posts/streaming-databases-in-realtime-with-mysql-debezium-kafka -Original Message- From: Tauzell, Dave Sent: Monday, February 27, 2017 9:07 AM To: users@kafka.apache.org Subject: RE: Kafka Connect

RE: Kafka Connect

2017-02-27 Thread Tauzell, Dave
Are you specifically talking about relational databases? Kafka Connect has a JDBC source (http://docs.confluent.io/3.1.1/connect/connect-jdbc/docs/source_connector.html) which can push data changes to Kafka. It can only run SQL queries, though, so out of the box it will just get you update

zookeeper.set.acl=false not considered by ZookeeperLeaderElector

2017-02-27 Thread Stevo Slavić
Hello Apache Kafka community, There's nice documentation on enabling ZooKeeper security on an existing Apache Kafka cluster at https://kafka.apache.org/documentation/#zk_authz_migration For your convenience here are the first two steps of migration: 1. Perform a rolling restart setting the J

Re: Large state directory with Kafka Streams

2017-02-27 Thread Ian Duffy
Hi Eno, Thanks for the fast response. > It looks like you have a lot of partitions for the count store. I believe this isn't configurable? They were auto-created by the stream. I'm assuming it's mirrored based on the number of partitions our input topic has. > The locking part was supposed to h

Updation of data in kafka topic based on changes in data sources.

2017-02-27 Thread VIVEK KUMAR MISHRA 13BIT0066
Hi All, Is it possible to update Kafka topic data based on changes in data sources using Python?

Re: Stream topology with multiple Kaka clusters

2017-02-27 Thread Mahendra Kariya
Thanks a lot Eno! On Mon, Feb 27, 2017 at 7:26 PM, Eno Thereska wrote: > Hi Mahendra, > > The short answer is "not yet", but see this link for more: > https://groups.google.com/forum/?pli=1#!msg/confluent- > platform/LC88ijQaEMM/sa96OfK9AgAJ;context-place=forum/confluent-platform > > Thanks > En

Re: Kafka Streams vs Spark Streaming

2017-02-27 Thread Michael Noll
> Also, is it possible to stop the syncing between state stores to brokers, if I am fine with failures? Yes, you can disable the syncing (or the "changelog" feature) of state stores: http://docs.confluent.io/current/streams/developer-guide.html#enable-disable-state-store-changelogs > I do have a

Re: Large state directory with Kafka Streams

2017-02-27 Thread Eno Thereska
Hi Ian, It looks like you have a lot of partitions for the count store. Each RocksDB database uses off-heap memory (around 60-70MB in 0.10.2), which will add up if you have this many stores in one instance. One solution would be to scale out your streams application by using another Kafka Strea
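
To make the "will add up" point concrete, here is a rough back-of-the-envelope sketch. It uses only the 60-70MB per-store figure quoted above for 0.10.2; the store count of 32 and the assumption that stores spread evenly across instances are hypothetical inputs for illustration, not values from the thread.

```python
# Rough estimate of RocksDB off-heap memory per Kafka Streams instance.
# The 60-70 MB per-store range is the figure quoted for 0.10.2 in the message;
# store counts and the even spread across instances are assumptions.

MB_PER_STORE_LOW = 60
MB_PER_STORE_HIGH = 70


def estimate_offheap_mb(num_stores, num_instances=1):
    """Return (low, high) off-heap MB for the busiest instance."""
    if num_stores < 0 or num_instances < 1:
        raise ValueError("num_stores must be >= 0 and num_instances >= 1")
    # Ceiling division: the busiest instance holds at most this many stores.
    stores_per_instance = -(-num_stores // num_instances)
    return (
        stores_per_instance * MB_PER_STORE_LOW,
        stores_per_instance * MB_PER_STORE_HIGH,
    )


if __name__ == "__main__":
    for instances in (1, 2, 4):
        low, high = estimate_offheap_mb(num_stores=32, num_instances=instances)
        print("%d instance(s): %d-%d MB off-heap each" % (instances, low, high))
```

Under these assumptions, doubling the number of instances roughly halves the off-heap memory each one needs, which is the effect the scaling-out suggestion relies on.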

Large state directory with Kafka Streams

2017-02-27 Thread Ian Duffy
Hi All, I'm using Kafka Client 10.2 with Kafka Streams. I'm performing a groupByKey on a stream and seeing large files appear within my state directory. Is this expected? 90M 1_0/rocksdb/content-count-store 82M 1_1/rocksdb/content-count-store 102M 1_10/rocksdb/content-count-store 86M 1_11/rocksd

Re: Stream topology with multiple Kaka clusters

2017-02-27 Thread Eno Thereska
Hi Mahendra, The short answer is "not yet", but see this link for more: https://groups.google.com/forum/?pli=1#!msg/confluent-platform/LC88ijQaEMM/sa96OfK9AgAJ;context-place=forum/confluent-platform Thanks Eno > On 27 Feb 2017, at 13:37, Mahendra Kariya wrote: > > Hi, > > I have a couple of q

Re: Kafka Streams vs Spark Streaming

2017-02-27 Thread Tianji Li
Hi Guozhang and Kohki, Thanks for your replies. I think I know how to deal with partitioning now, but I am still not sure how to deal with the traffic between the hidden state store sizes and Kafka Brokers (same as Kohki). I feel like the easiest thing to do is to set a larger commit window, s

Stream topology with multiple Kaka clusters

2017-02-27 Thread Mahendra Kariya
Hi, I have a couple of questions regarding Kafka streams. 1. Can we merge two streams from two different Kafka clusters? 2. Can my sink topic be in Kafka cluster different from source topic? Thanks!

Kafka Connect

2017-02-27 Thread VIVEK KUMAR MISHRA 13BIT0066
How can I use Kafka Connect from Python to get information about updates, deletions, and insertions of data at various data sources?