How to use multiple Kerberos principals for a Kafka installation?

2019-07-17 Thread Srinivas, Kaushik (Nokia - IN/Bangalore)
Hi Kafka users, we have the below scenario and need inputs on it. We installed Kafka as a pod in a Kubernetes cluster and created a service (kind: k8s Service) for it. Kafka was installed using the hostname of the Docker container as the principal, with Kerberos enabled. So while tryi…
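
For context, a broker's SASL/GSSAPI identity is pinned in its JAAS file, which is why the container-hostname principal matters here. A minimal sketch of a broker-side entry (keytab path, hostname, and realm are placeholder assumptions):

  KafkaServer {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/kafka.keytab"
    principal="kafka/broker-0.kafka-svc.example.com@EXAMPLE.COM";
  };

With one principal per advertised hostname, each broker pod needs its own keytab entry of this shape.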

RE: Kafka with Kerberos - works even after renewal lifetime expires

2019-07-17 Thread Srinivas, Kaushik (Nokia - IN/Bangalore)
Hello folks, has anybody seen this behaviour or have any information? -kaushik From: Srinivas, Kaushik (Nokia - IN/Bangalore) Sent: Tuesday, July 16, 2019 3:42 PM To: users@kafka.apache.org Subject: kafka with kerberos - works even after renewal lifetime expires Hi Kafka users, we are seeing the…
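
Worth noting for this thread: Kafka's GSSAPI login thread refreshes credentials on its own, and when the JAAS entry uses a keytab it obtains a fresh ticket rather than renewing the old one, which is one plausible explanation for clients surviving past the renewal lifetime. The client/broker settings that govern this (shown with their defaults) are:

  sasl.kerberos.ticket.renew.window.factor=0.80
  sasl.kerberos.ticket.renew.jitter=0.05
  sasl.kerberos.min.time.before.relogin=60000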

How to start more than one ZooKeeper server

2019-07-17 Thread Gagan Sabharwal
Hi team, when I look at config\zookeeper.properties in the Kafka folder, I can't see an option to start more than one ZooKeeper server, either on the same machine or on different servers. This is to ensure that the Kafka broker stays managed even if one of the zk se…
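
For reference, a ZooKeeper ensemble is built by listing every member in each node's properties file; a minimal three-node sketch (hostnames are placeholders):

  tickTime=2000
  dataDir=/var/lib/zookeeper
  clientPort=2181
  initLimit=5
  syncLimit=2
  server.1=zk1.example.com:2888:3888
  server.2=zk2.example.com:2888:3888
  server.3=zk3.example.com:2888:3888

Each node additionally needs a dataDir/myid file containing its own id (1, 2, or 3), and the brokers then list all three hosts in zookeeper.connect.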

Re: Kafka logs are getting deleted too soon

2019-07-17 Thread Peter Bukowinski
Indeed, something seems wrong. I have a Kafka (2.0.1) cluster that aggregates data from multiple locations. It has so much data moving through it that I can’t afford to keep more than 24 hours on disk, and retention is working correctly. I don’t restrict topics by size, only by time. What version of…
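
For reference, time-only retention of the kind described above is just the broker defaults with a shorter window:

  log.retention.hours=24
  log.retention.bytes=-1                    # -1 (the default) disables size-based retention
  log.retention.check.interval.ms=300000    # how often expired segments are looked for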

Re: Kafka logs are getting deleted too soon

2019-07-17 Thread Sachin Nikumbh
I am not setting the group id for the console consumer. When I say the .log files are all 0 bytes long, that is after the producer has gone through 96 GB worth of data. Apart from this topic where I am dumping 96 GB of data, I have some test topics where I am publishing a very small amount of data.

Re: Kafka logs are getting deleted too soon

2019-07-17 Thread Peter Bukowinski
Are you setting a group.id for your console consumer, perhaps, and keeping it static? That would explain the inability to reconsume the data. As to why your logs look empty, Kafka holds data in the page cache and leaves it to the OS to flush it to disk. On a non-busy broker, the interv…
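
The flush behaviour referred to here is controlled by broker settings that are effectively disabled by default, leaving fsync timing to the OS. If you wanted to force early flushes just to verify on-disk sizes, these (real) settings accept explicit values; the numbers below are illustrative, not defaults:

  log.flush.interval.messages=10000
  log.flush.interval.ms=1000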

Re: Kafka logs are getting deleted too soon

2019-07-17 Thread Sachin Nikumbh
Hi Jamie, I have 3 brokers and the replication factor for my topic is set to 3. I know for sure that the producer is producing data successfully because I am running a console consumer at the same time and it shows me the messages. After the producer produces all the data, I have /var/log/kafka…

Best practices for compacting topics with tombstones

2019-07-17 Thread Chris Baumgartner
Hello, I'm wondering if anyone has advice on configuring compaction. Here is my scenario: A producer writes raw data to topic #1. A stream app reads the data from topic #1, processes it, writes the processed data to topic #2, and then writes a tombstone record to topic #1. So, I don't intend for…
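
For a pipeline like this, tombstone handling is governed by per-topic compaction settings; a hedged sketch of creating such a topic (Kafka 2.2+ CLI syntax; names and values are placeholders):

  kafka-topics.sh --bootstrap-server localhost:9092 --create --topic raw-input \
    --partitions 6 --replication-factor 3 \
    --config cleanup.policy=compact \
    --config delete.retention.ms=86400000 \
    --config min.cleanable.dirty.ratio=0.1

delete.retention.ms bounds how long tombstones survive after compaction, which is the knob most relevant to this scenario.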

Re: Window aggregation skipping some data

2019-07-17 Thread Alessandro Tagliapietra
Just for future reference, John Roesler confirmed to me that the grace period is global within each partition, so if the partition receives older data it is discarded because the window is already closed. I'll try to remove the grace period and use a custom transformer to act as a custom suppressor to make su…
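
For reference, the grace period under discussion is declared on the window definition itself; a minimal Scala sketch (Streams 2.1+ API, names assumed):

  import java.time.Duration
  import org.apache.kafka.streams.kstream.TimeWindows

  // grace(Duration.ZERO) closes each window as soon as stream time passes its end,
  // so any later-arriving record for that window is dropped as "late"
  val windows = TimeWindows.of(Duration.ofMinutes(1)).grace(Duration.ZERO)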

Re: Kafka Streams - unbounded memory growth

2019-07-17 Thread Sophie Blee-Goldman
Hm. These directories shouldn't be created if you are using only an in-memory store. Can you print your topology? On Wed, Jul 17, 2019 at 11:02 AM Muhammed Ashik wrote: > Hi, I just did `du -mh` on `/tmp/kafka-streams`; below are the folders listed > with some .lock files inside. > Not sure if these are c…
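
Printing the topology is a one-liner against the builder; a minimal Scala sketch (builder name assumed):

  import org.apache.kafka.streams.StreamsBuilder

  val builder = new StreamsBuilder()
  // ... topology definition ...
  println(builder.build().describe())  // lists every processor and state store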

Re: Kafka logs are getting deleted too soon

2019-07-17 Thread Jamie
Hi Sachin, my understanding is that the active segment is never deleted, which means you should have at least 1GB of data in your partition if the data is indeed being produced to Kafka. Are there any errors in your broker logs? How many brokers do you have and what is the replication fact…
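
The "at least 1GB" figure comes from the default segment size; a segment only becomes eligible for deletion after it rolls, which these broker settings (shown with their defaults) control:

  log.segment.bytes=1073741824   # roll the active segment at 1 GiB
  log.roll.hours=168             # or after 7 days, whichever comes first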

Re: Kafka logs are getting deleted too soon

2019-07-17 Thread Sachin Nikumbh
Broker configs:
  broker.id=36
  num.network.threads=3
  num.io.threads=8
  socket.send.buffer.bytes=102400
  socket.receive.buffer.bytes=102400
  socket.request.max.bytes=104857600
  log.dirs=/var/log/kafka
  num.partitions=1
  num.recovery.threads.per.data.dir=1
  offsets.topic.replication.factor=1
  transaction.sta…

Re: Kafka Streams - unbounded memory growth

2019-07-17 Thread Muhammed Ashik
Hi, I just did `du -mh` on `/tmp/kafka-streams`; below are the folders listed, with some .lock files inside. Not sure if these are coming from RocksDB, and it looks like the sizes of these files are small:
  4.0K ./buzzard.MonitoringSeWlanStatsAggregator/0_5
  4.0K ./buzzard.MonitoringSeWlanStatsAggregator/…

Kafka Rolling Upgrade to 2.2.1

2019-07-17 Thread Jose Manuel Vega Monroy
Hi there, I'm trying to do a rolling upgrade from 2.1.1, with inter-broker protocol version 2.0, to broker version 2.2.1. So which inter-broker protocol version should I set: 2.2, or keep the current 2.0? Thanks
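
For reference, the documented rolling-upgrade procedure keeps the old protocol pinned until every broker runs the new binaries, then bumps it in a second rolling restart:

  # phase 1: roll out the 2.2.1 binaries with the protocol pinned
  inter.broker.protocol.version=2.0
  # phase 2: once all brokers are on 2.2.1, bump and do one more rolling restart
  inter.broker.protocol.version=2.2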

Re: Kafka Streams - unbounded memory growth

2019-07-17 Thread Sophie Blee-Goldman
Hm. Just to be absolutely sure, could you try throwing an exception or something in your RocksDBConfigSetter? On Wed, Jul 17, 2019 at 10:43 AM Muhammed Ashik wrote: > I can confirm the /tmp/kafka-streams doesn't have any data related to > rocksdb. > > On Wed, Jul 17, 2019 at 11:11 PM Sophie Blee…
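
The suggestion above is a quick probe: the config setter is only invoked when a RocksDB store is actually created, so an exception proves such a store exists. A minimal Scala sketch (class name assumed):

  import java.util
  import org.apache.kafka.streams.state.RocksDBConfigSetter
  import org.rocksdb.Options

  class ProbeConfigSetter extends RocksDBConfigSetter {
    override def setConfig(storeName: String, options: Options, configs: util.Map[String, AnyRef]): Unit =
      // reaching this line means some store in the topology is RocksDB-backed
      throw new IllegalStateException(s"RocksDB store in use: $storeName")
  }

Register it via StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG to run the probe.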

Re: Kafka logs are getting deleted too soon

2019-07-17 Thread Peter Bukowinski
Can you share your broker and topic config here? > On Jul 17, 2019, at 10:09 AM, Sachin Nikumbh > wrote: > > Thanks for the quick response, Tom. > I should have mentioned in my original post that I am always using > --from-beginning with my console consumer. Even then I don't get any data. > …

Re: Kafka Streams - unbounded memory growth

2019-07-17 Thread Muhammed Ashik
I can confirm the /tmp/kafka-streams doesn't have any data related to RocksDB. On Wed, Jul 17, 2019 at 11:11 PM Sophie Blee-Goldman wrote: > You can describe your topology to see if there are any state stores in it > that you aren't aware of. Alternatively you could check out the state > directo…

Re: Kafka Streams - unbounded memory growth

2019-07-17 Thread Sophie Blee-Goldman
You can describe your topology to see if there are any state stores in it that you aren't aware of. Alternatively you could check out the state directory (/tmp/kafka-streams by default) and see if there is any data in there. On Wed, Jul 17, 2019 at 10:36 AM Muhammed Ashik wrote: > Thanks, how can…

Re: Kafka Streams - unbounded memory growth

2019-07-17 Thread Muhammed Ashik
Thanks. How can I verify if there is data really going into RocksDB? I tried printing the statistics, with no success:
  class CustomRocksDBConfig extends RocksDBConfigSetter {
    override def setConfig(storeName: String, options: Options, configs: util.Map[String, AnyRef]): Unit = {
      val stats =…
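
For reference, a completed version of that setter would attach a Statistics object and have RocksDB dump it periodically; a hedged sketch (rocksdbjni 5.x API, class name assumed). Note the dump goes to RocksDB's own LOG file inside each store's state directory, not to the application log, which may be why no output was observed:

  import java.util
  import org.apache.kafka.streams.state.RocksDBConfigSetter
  import org.rocksdb.{InfoLogLevel, Options, Statistics, StatsLevel}

  class StatsDumpingConfigSetter extends RocksDBConfigSetter {
    override def setConfig(storeName: String, options: Options, configs: util.Map[String, AnyRef]): Unit = {
      val stats = new Statistics
      stats.setStatsLevel(StatsLevel.ALL)
      options.setStatistics(stats)           // collect stats for this store
      options.setStatsDumpPeriodSec(60)      // dump them to RocksDB's LOG every 60s
      options.setInfoLogLevel(InfoLogLevel.INFO_LEVEL)
    }
  }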

Re: Kafka Streams - unbounded memory growth

2019-07-17 Thread Sophie Blee-Goldman
Sorry, didn't see the "off-heap" part of the email. Are you using any stateful DSL operators? The default stores are persistent, so you may have a RocksDB store in your topology without explicitly using one. On Wed, Jul 17, 2019 at 10:12 AM Sophie Blee-Goldman wrote: > If you are using inMemoryK…

Re: Kafka Streams - unbounded memory growth

2019-07-17 Thread Sophie Blee-Goldman
If you are using inMemoryKeyValueStore, the records are stored by definition in memory. RocksDB is not used at all. This store will continue to grow proportionally to your keyspace. If you do not have sufficient memory to hold your entire dataset in memory, consider adding another instance or switc…
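
To make the choice of store explicit either way, the DSL accepts a store supplier at materialization time; a minimal Scala sketch (store name and key/value types are assumptions):

  import org.apache.kafka.streams.kstream.Materialized
  import org.apache.kafka.streams.state.Stores

  // in-memory: grows with the keyspace, nothing written to the state directory
  val inMem = Materialized.as[String, java.lang.Long](Stores.inMemoryKeyValueStore("counts-store"))
  // RocksDB-backed (the DSL default): bounded heap use, data on disk
  val onDisk = Materialized.as[String, java.lang.Long](Stores.persistentKeyValueStore("counts-store"))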

Re: Kafka logs are getting deleted too soon

2019-07-17 Thread Sachin Nikumbh
Thanks for the quick response, Tom. I should have mentioned in my original post that I am always using --from-beginning with my console consumer. Even then I don't get any data. And as mentioned, the .log files are of size 0 bytes. On Wednesday, July 17, 2019, 11:09:22 AM EDT, Thomas Aley…

Re: Window aggregation skipping some data

2019-07-17 Thread Alessandro Tagliapietra
Seems that completely removing the grace period fixes the problem; is that expected? Is the grace period per key or global? -- Alessandro Tagliapietra On Wed, Jul 17, 2019 at 12:07 AM Alessandro Tagliapietra < tagliapietra.alessan...@gmail.com> wrote: > I've added a reproduction repo here if some…

Re: Justin Trudeau: support Justin Trudeau to postpone the decision of banning HUAWEI

2019-07-17 Thread Peter Bukowinski
I’m not even Canadian. No. -- Peter (from phone) > On Jul 17, 2019, at 7:33 AM, jiang0...@gmail.com wrote: > > Hey, > > I just signed the petition "Justin Trudeau: support Justin Trudeau to > postpone the decision of banning HUAWEI" and wanted to see if you could > help by adding your name. >

Re: Kafka logs are getting deleted too soon

2019-07-17 Thread Peter Bukowinski
Are you running the console consumer with the ‘--from-beginning’ option? It defaults to reading from the tail of the log, so if there is nothing being produced it will be idle. -- Peter (from phone) > On Jul 17, 2019, at 8:00 AM, Sachin Nikumbh > wrote: > > Hi all, > I have ~ 96GB of data in fil…

Re: Kafka logs are getting deleted too soon

2019-07-17 Thread Thomas Aley
Hi Sachin, Try adding --from-beginning to your console consumer to view the historically produced data. By default the console consumer starts from the last offset. Tom Aley thomas.a...@ibm.com From: Sachin Nikumbh To: Kafka Users Date: 17/07/2019 16:01 Subject: [EXTERNAL] K…
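
For completeness, the suggested invocation looks like this (broker address and topic name are placeholders):

  kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic my-topic --from-beginning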

Kafka logs are getting deleted too soon

2019-07-17 Thread Sachin Nikumbh
Hi all, I have ~ 96GB of data in files that I am trying to get into a Kafka cluster. I have ~ 11000 keys for the data and I have created 15 partitions for my topic. While my producer is dumping data in Kafka, I have a console consumer that shows me that Kafka is getting the data. The producer ru…

Justin Trudeau: support Justin Trudeau to postpone the decision of banning HUAWEI

2019-07-17 Thread jiang01yi
Hey, I just signed the petition "Justin Trudeau: support Justin Trudeau to postpone the decision of banning HUAWEI" and wanted to see if you could help by adding your name. Our goal is to reach 100 signatures and we need more support. You can read more and sign the petition here: http://chng.it/

Print RocksDB Stats

2019-07-17 Thread Muhammed Ashik
Hi, I'm trying to log the RocksDB stats with the code below, but I'm not observing any logs. I'm enabling this because the off-heap memory grows indefinitely over time. We were using inMemoryKeyValueStore only; I was not sure whether Kafka Streams uses RocksDB as the default store. Kafka Streams v…

Kafka Streams - unbounded memory growth

2019-07-17 Thread Muhammed Ashik
Kafka Streams version - 2.0.0. Hi, in our streaming instance we are observing steady growth in off-heap memory (out of 2 GB of allocated memory, 1.3 GB is reserved for the heap; the remaining ~700 MB is used up over ~6 hrs and the process is eventually OOM-killed). We are using only the…

Re: Doubts

2019-07-17 Thread Omar Al-Safi
ii) If you want to move in-house data that lives in a DB (let's say a relational DB like MySQL, etc.), I strongly advise you to look at https://debezium.io/, a CDC (Change Data Capture) Kafka Connect plugin that records change events from your DB and propagates them directly to Kafka. If you have a Ka…
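
As a sketch of what that looks like in practice, a Debezium MySQL source connector is registered with Kafka Connect via a JSON config roughly like the following (every hostname, credential, and name below is a placeholder):

  {
    "name": "inventory-connector",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "database.hostname": "mysql.example.com",
      "database.port": "3306",
      "database.user": "debezium",
      "database.password": "secret",
      "database.server.id": "184054",
      "database.server.name": "dbserver1",
      "database.history.kafka.bootstrap.servers": "kafka:9092",
      "database.history.kafka.topic": "schema-changes.inventory"
    }
  }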

Re: Window aggregation skipping some data

2019-07-17 Thread Alessandro Tagliapietra
I've added a reproduction repo here if someone wants to have a look at a full working example: https://github.com/alex88/kafka-error-repro. You can see at the top of WindowTest.java the messages it sends, and underneath the part where it generates the sequence and the window. I've also incl…