Each empty partition has 21MB by default

2019-02-19 Thread Soheil Pourbafrani
Recently I noticed that when I create a new empty topic, each partition will have 21MB of data! I even changed the Kafka version from 0.11.3 to 2.1.0, but it shows the same behavior. For example, if I create a new topic using the command: kafka-topics.sh --zookeeper zoo1:2181 --create --topic test --r

Re: Kafka streams exactly_once auto commit timeout transaction issue

2019-02-19 Thread Xander Uiterlinden
Thanks for your reply. I figured out what was wrong, and it turned out to be a stupid mistake at my end as I did not use a consumer with isolation level "read_committed" to verify. Xander On Fri, Feb 8, 2019 at 8:58 PM Guozhang Wang wrote: > Hello Xander, > > Upon committing the state with `exa
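For anyone hitting the same pitfall: a quick way to check what a transactional pipeline actually committed is the console consumer with the isolation level flag. A minimal sketch — topic name and broker address below are placeholders, not taken from the thread:

```shell
# Read only committed transactional records; aborted and in-flight
# transactions are filtered out. The default is read_uncommitted,
# which is why an unconfigured consumer can "see" uncommitted data.
kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic output-topic \
  --isolation-level read_committed \
  --from-beginning
```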

Re: Each empty partition has 21MB by default

2019-02-19 Thread Christopher Shannon
Kafka preallocates the index files so this is normal. This size is configurable using the property: segment.index.bytes You can find more information by searching that property in the documentation: https://kafka.apache.org/documentation/ On Tue, Feb 19, 2019 at 6:39 AM Soheil Pourbafrani wrote:
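The 21MB figure lines up with preallocation arithmetic. A back-of-envelope sketch (not from the thread), assuming the default segment.index.bytes of 10485760 bytes and two index files per log segment:

```shell
# Each empty log segment preallocates two index files:
# the offset index and the time index. The log file itself starts empty.
index_bytes=10485760               # default segment.index.bytes (10 MiB)
total=$((2 * index_bytes))         # offset index + time index
echo "$total bytes"                # 20971520 bytes, i.e. ~21 MB in decimal units
```

Lowering segment.index.bytes (per topic via --config, or broker-wide) shrinks that preallocated footprint accordingly.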

Questions on Exactly Once Semantics

2019-02-19 Thread Greenhorn Techie
Hi, Our data getting into Kafka is transactional in nature and hence I am trying to understand EOS better. My present understanding is as below: It is mentioned that when producer starts, it will have a new PID, but only valid till the session. Does that mean, is it a pre-requisite to have the sa

Re: Each empty partition has 21MB by default

2019-02-19 Thread Soheil Pourbafrani
Why did it happen suddenly? In about two years I never observed such behavior from Kafka! On Tue, Feb 19, 2019 at 4:20 PM Christopher Shannon < christopher.l.shan...@gmail.com> wrote: > Kafka preallocates the index files so this is normal. This size is > configurable using the property: segment.ind

Lag checking from producer

2019-02-19 Thread Filipp Zhinkin
Hi! I'm trying to implement a backpressure mechanism that asks producers to stop doing any work when consumers are not able to process all messages in time (producers require statistics calculated by consumers in order to answer client requests; when consumers are lagging behind we have to stop prod

Questions on Kafka Exactly Once Semantics

2019-02-19 Thread bhoomireddy . vijay
Hi, Our data getting into Kafka is transactional in nature and hence I am trying to understand EOS better. My present understanding is as below: It is mentioned that when producer starts, it will have a new PID, but only valid till the session. Does that mean, is it a pre-requisite to have the

Questions - Exactly Once Semantics

2019-02-19 Thread bhoomireddy . vijay
Hi, Our data getting into Kafka is transactional in nature and hence I am trying to understand EOS better. My present understanding is as below: It is mentioned that when producer starts, it will have a new PID, but only valid till the session. Does that mean, is it a pre-requisite to have the

Questions regarding Exactly-Once semantics

2019-02-19 Thread bhoomireddy . vijay
Hi, Our data getting into Kafka is transactional in nature and hence I am trying to understand EOS better. My present understanding is as below: It is mentioned that when producer starts, it will have a new PID, but only valid till the session. Does that mean, is it a pre-requisite to have the

Re: Lag checking from producer

2019-02-19 Thread Javier Arias Losada
Hi, could you please be more specific about your use case? One of the theoretical advantages of a system like Kafka is that you can decouple producers and consumers, so you don't need to do backpressure. A different topic is how to handle lagging consumers; in that scenario you could scale up your
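On the observability side, consumer lag can be read without coupling producers to consumers at the protocol level, e.g. via the consumer groups tool. A sketch — group name and broker address are placeholders:

```shell
# Describe a consumer group; the LAG column per partition is the
# difference between the log-end offset and the group's committed offset.
kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe \
  --group stats-consumers
```

A producer-side process could poll this (or the underlying AdminClient API) and throttle itself, keeping the decision outside Kafka's data path.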

Re: DSL - Deliver through a table and then to a stream?

2019-02-19 Thread Trey Hutcheson
Ok, I have a solution - I implemented a custom Transformer implementation that simply accepts a key/value and writes it to a state store, then returns the input values. It's a "write-through" transformer. It's basically like a peek operation, but saves it to the backing state store. But since it's

Re: Lag checking from producer

2019-02-19 Thread Filipp Zhinkin
Hi, thank you for the reply! I'm developing a system where producers are spending money every time a request arrives. Consumers account for the money spent using the data from producers as well as a few other sources. Consumers are also responsible for calculating statistics that affect policies used by produ

Re: Lag checking from producer

2019-02-19 Thread Peter Bukowinski
From your description, it sounds like kafka may be ill-suited for your project. A backpressure mechanism essentially requires producers to be aware of consumers and that is counter to Kafka’s design. Also, it sounds like your producers are logical (if not actual) consumers of data generated by t

Re: Questions on Exactly Once Semantics

2019-02-19 Thread Matthias J. Sax
Even if the question was sent 4 times to the mailing list, I am only answering it exactly-once (sorry for the bad joke -- could not resist...) You have to distinguish between "idempotent producer" and "transactional producer". If you enable idempotent writes (config `enable.idempotence`), your p
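As a rough illustration of the first of the two configs Matthias distinguishes (topic name and broker address are placeholders): idempotent writes only need `enable.idempotence`, while transactions additionally require a `transactional.id` and the transactional producer API, which the console producer does not drive.

```shell
# Idempotent producer: the broker deduplicates retried batches within
# a producer session using the PID plus per-partition sequence numbers.
# Idempotence requires acks=all.
kafka-console-producer.sh \
  --broker-list localhost:9092 \
  --topic test \
  --producer-property enable.idempotence=true \
  --producer-property acks=all
```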

[ANNOUNCE] Apache Kafka 2.1.1

2019-02-19 Thread Colin McCabe
The Apache Kafka community is pleased to announce the release for Apache Kafka 2.1.1. This is a bugfix release for Kafka 2.1.0. All of the changes in this release can be found in the release notes: https://www.apache.org/dist/kafka/2.1.1/RELEASE_NOTES.html You can download the source and binar

Re: [ANNOUNCE] Apache Kafka 2.1.1

2019-02-19 Thread Gwen Shapira
Yay! Thanks for running the release Colin, and to everyone who reported and fixed bugs :) On Tue, Feb 19, 2019, 3:37 PM Colin McCabe wrote: > The Apache Kafka community is pleased to announce the release for Apache > Kafka 2.1.1. > > This is a bugfix release for Kafka 2.1.0. All of the changes

Re: Kafka Topic Volume and (possibly ACL) question

2019-02-19 Thread Evelyn Bayes
Hi, I would use ACLs or something similar. For instance, you might assign the records which are limited to a subset of clients to a specific topic with an associated ACL. I expect you’ll find having 8k extra topics very problematic in a range of ways, such as: * Replication issues; * Poor bat
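A sketch of the ACL route suggested above, assuming a ZooKeeper-backed authorizer and placeholder principal/topic names:

```shell
# Allow only the restricted client principal to read the
# limited-distribution topic; other principals get no Read grant.
kafka-acls.sh \
  --authorizer-properties zookeeper.connect=zoo1:2181 \
  --add \
  --allow-principal User:restricted-client \
  --operation Read \
  --topic limited-records
```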