Re: kafka streams re-partitioning on incoming events

2023-07-24 Thread Pushkar Deole
Thanks a lot Bruno! I am trying the Processor API as you mentioned above, so the processor will write each record to another Kafka topic with a new key. I am having difficulty reading from that Kafka topic in another processor, and I am wondering if I need to create another stream with source as inte

Re: kafka streams re-partitioning on incoming events

2023-07-14 Thread Bruno Cadonna
Hi Pushkar, The events after repartitioning are processed by a different task than the task that read the events from the source topic. The task assignor assigns those tasks to stream threads. So events with the same key will be processed by the same task. As far as I understood from your earl

Re: kafka streams re-partitioning on incoming events

2023-07-14 Thread Pushkar Deole
Thanks Bruno. What do you mean exactly with "...and then process them in that order"? By this, I mean to ask whether the events in a partition will be processed in order after the repartition. I probably don't need to go through the internal details, but are the partitions of the topic again assigned to stream

Re: kafka streams re-partitioning on incoming events

2023-07-14 Thread Bruno Cadonna
Hi Pushkar, you can use repartition() to repartition your data. Method through() is actually deprecated in favor of repartition(). Before you repartition you need to specify the new key with selectKey(). What do you mean exactly with "...and then process them in that order"? The order of th

Re: kafka streams re-partitioning on incoming events

2023-07-13 Thread Pushkar Deole
Hello, *Kafka dev community, @matthiasJsax* Can you comment on below question? It is very important for us since we are getting inconsistencies due to current design On Sun, Jul 9, 2023 at 6:15 PM Pushkar Deole wrote: > Hi, > > We have a kafka streams application that consumes from multiple top

kafka streams re-partitioning on incoming events

2023-07-09 Thread Pushkar Deole
Hi, We have a Kafka Streams application that consumes from multiple topics with different keys. Before processing these events in the application, we want to repartition those events on a single key that will ensure related events are processed by the same application instance. e.g. the events on multi

Re: Kafka Streams And Partitioning

2021-03-20 Thread Gareth Collins
A --> > A:B > .groupByKey() > .aggregate(...); > .to("output-topic"); // keys are of the form A:B, but should be > partitioned by A > > If that looks about right, then there are two things to watch out for: > 1) To keep partitioning on A instead of A:B,

Re: Kafka Streams And Partitioning

2021-03-19 Thread Sophie Blee-Goldman
are two things to watch out for: 1) To keep partitioning on A instead of A:B, you'll need to provide a custom Partitioner when writing to the output topic. See Produced#streamPartitioner 2) Since you have a key-changing operation (selectKey) upstream of a stateful operation (aggregate),
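The idea of partitioning on A while the key is A:B can be sketched in plain Java. This is only an illustration under stated assumptions: the class name, the ':' separator and the use of String.hashCode are hypothetical choices, not Kafka's own hashing. In a real topology the same logic would live inside the custom partitioner passed via Produced#streamPartitioner, as mentioned in the reply.

```java
// Sketch: choose a partition from the prefix A of a composite key "A:B".
// Hypothetical helper; the hash here is String.hashCode, not Kafka's hash.
final class PrefixPartitioner {
    private PrefixPartitioner() {}

    // Returns the part of the key before the first ':' (the whole key if none).
    static String prefixOf(String compositeKey) {
        int idx = compositeKey.indexOf(':');
        return idx < 0 ? compositeKey : compositeKey.substring(0, idx);
    }

    // Hashes only the prefix, so "A:1" and "A:2" land on the same partition.
    static int partitionFor(String compositeKey, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int hash = prefixOf(compositeKey).hashCode();
        // Clear the sign bit so the remainder is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }
}
```

With this sketch, PrefixPartitioner.partitionFor("A:1", 8) and PrefixPartitioner.partitionFor("A:2", 8) return the same partition number because both keys share the prefix A.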

Re: Kafka Streams And Partitioning

2021-03-17 Thread Gareth Collins
? So WindowedStreamPartitioner is a special case for avoiding the repartition topic? thanks in advance, Gareth On Wed, Mar 17, 2021 at 7:59 PM Sophie Blee-Goldman wrote: > Hey Gareth, > > Kafka Streams state store partitioning is based on the partitioning of the > upstream input topi

Re: Kafka Streams And Partitioning

2021-03-17 Thread Sophie Blee-Goldman
Hey Gareth, Kafka Streams state store partitioning is based on the partitioning of the upstream input topics. If you want your RocksDB stores to be partitioned based on the prefix of a key, then you should make sure the input topic feeding into it uses whatever partitioning strategy you had in

Kafka Streams And Partitioning

2021-03-15 Thread Gareth Collins
Hi, This may be a newbie question but is it possible to control the partitioning of a RocksDB KeyValueStore in Kafka Streams? For example, I perhaps only want to partition based on a prefix of a key rather than the full key. I assume something similar must be done for the WindowStore to

Re: Partitioning per team

2020-10-30 Thread Guozhang Wang
Hello Jan, One alternative approach you can consider is to use combo as the key, hence it achieves the small aggregation, while customizing your partitioner for the repartition topic such that keys with the same prefix always go to the same partition. Then when cleaning up data, similarly within

Partitioning per team

2020-10-26 Thread Jan Bols
For a kafka-streams application, we keep data per team. Data from 2 teams never meet but within a team, data is highly integrated. A team has team members but also has several types of equipment. A team has a lifespan of about 1-3 days after which the team is removed and all data relating to that t

Re: Partitioning issue when a broker is going down

2020-05-17 Thread Victoria Zuberman
Regarding the number of partitions: I still don't understand it fully. I revisited the Java default partitioner. I see that the available partitions are used only when a key is not provided (effectively when it is round-robin). When a key is provided, it uses the total number of partitions (regardless of availability). Thi

Re: Partitioning issue when a broker is going down

2020-05-17 Thread Peter Bukowinski
> On May 17, 2020, at 11:45 AM, Victoria Zuberman > wrote: > > Regarding acks=all: > Interesting point. Will check acks and min.insync.replicas values. > If I understand the root cause that you are suggesting correctly, given my > RF=2 and 3 brokers in cluster: > m

Re: Partitioning issue when a broker is going down

2020-05-17 Thread Victoria Zuberman
Regarding acks=all: Interesting point. Will check acks and min.insync.replicas values. If I understand the root cause that you are suggesting correctly, given my RF=2 and 3 brokers in cluster: min.insync.replicas > 1 and acks=all, removing one broker ---> partition th

Re: Partitioning issue when a broker is going down

2020-05-17 Thread Peter Bukowinski
If your producer is set to use acks=all, then it won’t be able to produce to the topic partitions that had replicas on the missing broker until the replacement broker has finished catching up to be included in the ISR. What method are you using that reports on the number of topic partition

Partitioning issue when a broker is going down

2020-05-17 Thread Victoria Zuberman
Hi, Kafka cluster with 3 brokers, version 1.0.1. Topic with 15 partitions, replication factor 2. All replicas in sync. Bringing down one of the brokers (ungracefully), then adding a broker in version 1.0.1. During this process, should we expect either of the following to happen: 1. Some of the

RE: EXTERNAL: Re: Separate Kafka partitioning from key compaction

2020-05-12 Thread Young, Ben
I'm not sure that's feasible in this case, but I'll have a look! Thanks, Ben -Original Message- From: Liam Clarke-Hutchinson Sent: 06 May 2020 19:47 To: users@kafka.apache.org Subject: EXTERNAL: Re: Separate Kafka partitioning from key compaction Could you deploy a Ka

Re: Separate Kafka partitioning from key compaction

2020-05-06 Thread Liam Clarke-Hutchinson
Could you deploy a Kafka Streams app that implemented your desired partitioning? Obviously this would require a duplication in topics between those produced to initially, and those partitioned the way you'd like, but it would solve the issue you're having. On Wed, 6 May 2020, 10:2

Separate Kafka partitioning from key compaction

2020-05-06 Thread Young, Ben
use cases where we'd like compaction to keep more than we'd use for partitioning. Obviously we could write our own hashing etc but that's hard when our main producers and consumers are in C# and we want to integrate with KSQL etc. My desert island solution would be to have the

Re: Does Merging two kafka-streams preserve co-partitioning

2020-01-20 Thread Yair Halberstadt
sked this question on stack-overflow and was wondering if anyone here > > could answer it: > > > https://stackoverflow.com/questions/59820243/does-merging-two-kafka-streams-preserve-co-partitioning > > > > > > I have 2 co-partitioned kafka topics. One contains aut

Re: Does Merging two kafka-streams preserve co-partitioning

2020-01-20 Thread John Roesler
answer it: > https://stackoverflow.com/questions/59820243/does-merging-two-kafka-streams-preserve-co-partitioning > > > I have 2 co-partitioned kafka topics. One contains automatically generated > data, and the other manual overrides. > > I want to merge them and filter out an

Does Merging two kafka-streams preserve co-partitioning

2020-01-20 Thread Yair Halberstadt
Hi I asked this question on stack-overflow and was wondering if anyone here could answer it: https://stackoverflow.com/questions/59820243/does-merging-two-kafka-streams-preserve-co-partitioning I have 2 co-partitioned kafka topics. One contains automatically generated data, and the other manual

Re: Kafka partitioning and auto-scaling in k8s

2019-02-21 Thread Ali Nazemian
Hi Peter, Yes, I meant the data rate. The only issue is that our application traffic fluctuates a lot, so if I size the partition count for the high rate, it doesn't perform very well at the low data rate, as it brings unnecessary network latency. I have found that sometimes the latency becomes hi

Re: Kafka partitioning and auto-scaling in k8s

2019-02-21 Thread Peter Bukowinski
I’ll assume when you say load, you mean data rate flowing into your kafka topic(s). One instance can consume from multiple partitions, so on a variable load workflow, it’s a good idea to have more partitions than your average workload will require. When the data rate is low, fewer consumers wil
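The relationship between partitions and consumer instances described here can be sketched as a simple round-robin assignment. This is a simplification for illustration only: the class and method names are hypothetical, and Kafka's actual group assignors are more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: spread partitions over consumer instances round-robin.
// Not Kafka's real assignor; it only shows that one instance may own
// several partitions while each partition has exactly one owner.
final class PartitionAssignmentSketch {
    private PartitionAssignmentSketch() {}

    // Returns, for each consumer index, the list of partitions it owns.
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        if (numConsumers <= 0) {
            throw new IllegalArgumentException("numConsumers must be positive");
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int c = 0; c < numConsumers; c++) {
            assignment.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(p % numConsumers).add(p);
        }
        return assignment;
    }
}
```

With 6 partitions and 2 consumers, each consumer in this sketch owns 3 partitions; with 2 partitions and 3 consumers, the third consumer owns none, which is why having more partitions than the average load needs leaves room to scale out.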

Kafka partitioning and auto-scaling in k8s

2019-02-21 Thread Ali Nazemian
Hi All, I was wondering how an application can be auto-scalable if only a single instance can read from the single Kafka partition and two instances cannot read from the single partition at the same time with the same consumer group. Suppose there is an application that has 10 instances running o

Re: Dynamic Partitioning

2019-01-09 Thread Dimitry Lvovsky
ition, regardless > of how many partitions there are. > > Ryanne > > On Tue, Jan 8, 2019, 8:57 AM marimuthu eee wrote: > > > Hi Ryanne Dolan, > > > > My requirement is to process notification messages in parallel.So i have > > an idea to pick kafka as a mes

Re: Dynamic Partitioning

2019-01-08 Thread Ryanne Dolan
otification messages in parallel.So i have > an idea to pick kafka as a messaging system.But i need dynamic partitioning* > to handle individual user notification messages in serial manner*.Is it > possible to create dynamic partition creation for a particular topic. > > > > On Tu

Re: Dynamic Partitioning

2019-01-08 Thread marimuthu eee
Hi Ryanne Dolan, My requirement is to process notification messages in parallel, so I plan to pick Kafka as the messaging system. But I need dynamic partitioning *to handle individual user notification messages in a serial manner*. Is it possible to create partitions dynamically for a

Re: Dynamic Partitioning

2019-01-08 Thread Ryanne Dolan
Can you explain why you think you need dynamic partitioning? Maybe we can help you design without it. Ryanne On Tue, Jan 8, 2019, 4:04 AM marimuthu eee Hi, > > My requirement is to process notification messages in parallel.So i have an > idea to pick kafka as a messaging system.B

Re: Dynamic Partitioning

2019-01-08 Thread Jonathan Santilli
wrote: > Hi, > > My requirement is to process notification messages in parallel.So i have an > idea to pick kafka as a messaging system.But i need dynamic partitioning to > handle individual user notification messages.Is it possible to create > dynamic partition creation for

Dynamic Partitioning

2019-01-08 Thread marimuthu eee
Hi, My requirement is to process notification messages in parallel, so I plan to pick Kafka as the messaging system. But I need dynamic partitioning to handle individual user notification messages. Is it possible to create partitions dynamically for a particular topic?

Re: Disk-size aware partitioning

2018-10-09 Thread Brett Rann
ing a cluster balanced. On Tue, Oct 9, 2018 at 8:11 PM Vincent Bernardi wrote: > Hello everyone, > I couldn't find an answer to this on the web so: > If I have a Kafka cluster where nodes have different disk size, is there a > way to have an automatic partitioning aware of

Disk-size aware partitioning

2018-10-09 Thread Vincent Bernardi
Hello everyone, I couldn't find an answer to this on the web so: If I have a Kafka cluster where nodes have different disk sizes, is there a way to have automatic partitioning that is aware of these disk sizes (i.e. allocating 4 partitions to the 4TB node and 1 partition to the 1TB node)? I know

Re: Re-partitioning topic with through (Kafka Streams)

2018-07-31 Thread Guozhang Wang
Hello Francesco, The num.partitions of Streams' auto-created repartition topics is determined by the num.tasks of the writing sub-topology, which in turn is determined by the source topic's num.partitions. There are some proposals about extending this coupling, but they are not yet implemented: https://cwik

Re-partitioning topic with through (Kafka Streams)

2018-07-31 Thread Francesco Frontera
Hi, I have a question about topic repartitioning in Kafka Streams using the `through` function. I'll try to explain the context briefly: I have a single topic A with two partitions: A:1:9 A:0:0. I try to create a repartitioned topic using Kafka Streams API: builder.stream("A").map<>((key, val) => KeyV

Re: Joins in Kafka Streams and partitioning of the topics

2017-11-30 Thread Artur Mrozowski
Yes you are probably right. So I was inspired by the KIP 150 blog post, so the entire statement would be like this: KTable customerGrouped= kStreamBuilder.stream(stringSerde, customerMessageSerde, CUSTOMER_TOPIC) .groupBy((key,value) -> Integer.parseInt(value.customer.replaceFirs

Re: Joins in Kafka Streams and partitioning of the topics

2017-11-30 Thread Jan Filipiak
There are some oddities in your topology that make me wonder if they are the true drivers of your question. https://github.com/afuyo/KStreamsDemo/blob/master/src/main/java/kstream.demo/CustomerStreamPipelineHDI.java#L300 Feels like it should be a KTable to begin with for example otherwise i

Re: Joins in Kafka Streams and partitioning of the topics

2017-11-30 Thread Artur Mrozowski
what if I start two instances of that application? Does the state migrate between the applications? Is it then I have to use a global table? BR Artur On Thu, Nov 30, 2017 at 7:40 PM, Jan Filipiak wrote: > Hi, > > Haven't checked your code. But from what you describe you should be fine. > Upgra

Re: Joins in Kafka Streams and partitioning of the topics

2017-11-30 Thread Jan Filipiak
Hi, Haven't checked your code. But from what you describe you should be fine. Upgrading the version might help here and there but should still work with 0.10 I guess. Best Jan On 30.11.2017 19:16, Artur Mrozowski wrote: Thank you Damian, it was very helpful. I have implemented my solution i

Re: Joins in Kafka Streams and partitioning of the topics

2017-11-30 Thread Artur Mrozowski
Thank you Damian, it was very helpful. I have implemented my solution in version 0.11.0.2 but there is one thing I still wonder. So what I try to do is what is described in KIP 150. Since it didn't make it into the 1.0 release I do it the old fashioned way. https://cwiki.apache.org/confluence/displa

Re: Joins in Kafka Streams and partitioning of the topics

2017-11-22 Thread Damian Guy
Hi Artur, KafkaStreams 0.10.0.0 is quite old and a lot has changed and been fixed since then. If possible i'd recommend upgrading to at least 0.11.0.2 or 1.0. For joins you need to ensure that the topics have the same number of partitions (which they do) and that they are keyed the same. Thanks,

Joins in Kafka Streams and partitioning of the topics

2017-11-22 Thread Artur Mrozowski
Hi, I am joining 4 different topics with 4 partitions each using version 0.10.0.0 of Kafka Streams. The joins are KTable to KTable. Is there anything I should be aware of considering partitions or version of Kafka Streams? In other words should I be expecting consistent results or do I need to for

Re: Finding StreamsMetadata with value-dependent partitioning

2017-06-07 Thread Steven Schlansker
o understand the use case better. >>>> >>>> Rephrase to make sure I understood it correctly: >>>> >>>> 1) you are providing a custom partitioner to Streams that is base on >> one >>>> field in your value (that's fine with regar

Re: Finding StreamsMetadata with value-dependent partitioning

2017-06-06 Thread Guozhang Wang
> > > > > > Rephrase to make sure I understood it correctly: > > > > > > 1) you are providing a custom partitioner to Streams that is base on > one > > > field in your value (that's fine with regard to fault-tolerance :)) > > > 2) you want to

Re: Finding StreamsMetadata with value-dependent partitioning

2017-06-06 Thread Michael Noll
case better. > > > > Rephrase to make sure I understood it correctly: > > > > 1) you are providing a custom partitioner to Streams that is base on one > > field in your value (that's fine with regard to fault-tolerance :)) > > 2) you want to use interactive qu

Re: Finding StreamsMetadata with value-dependent partitioning

2017-06-02 Thread Steven Schlansker
(that's fine with regard to fault-tolerance :)) > 2) you want to use interactive queries to query the store > 3) because of your custom partitioning schema, you need to manually > figure out the right application instance that hosts a key > 4) thus, you use a GlobalKTable to mainta

Re: Finding StreamsMetadata with value-dependent partitioning

2017-06-02 Thread Matthias J. Sax
y the store 3) because of your custom partitioning schema, you need to manually figure out the right application instance that hosts a key 4) thus, you use a GlobalKTable to maintain the information from K to D and thus to the partition ie, streams instance that hosts K If this is correct, tha

Re: Finding StreamsMetadata with value-dependent partitioning

2017-06-02 Thread Steven Schlansker
> On Jun 2, 2017, at 2:11 PM, Matthias J. Sax wrote: > > I am not sure if I understand the use case correctly. Could you give > some more context? Happily, thanks for thinking about this! > >> backing store whose partitioning is value dependent > > I infer that y

Re: Finding StreamsMetadata with value-dependent partitioning

2017-06-02 Thread Matthias J. Sax
I am not sure if I understand the use case correctly. Could you give some more context? > backing store whose partitioning is value dependent I infer that you are using a custom store and not default RocksDB? If yes, what do you use? What does "value dependent" mean in this context

Finding StreamsMetadata with value-dependent partitioning

2017-06-02 Thread Steven Schlansker
I have a KTable and backing store whose partitioning is value dependent. I want certain groups of messages to be ordered and that grouping is determined by one field (D) of the (possibly large) value. When I lookup by only K, obviously you don't know the partition it should be on. So I will

Re: Partitioning behavior of Kafka Streams without explicit StreamPartitioner

2017-02-10 Thread Matthias J. Sax
What Mathieu says makes a lot of sense. Just some background info: A KStream is always created from a Kafka topic. > KStream stream = builder.stream("topicName"); A #through() is just syntactic sugar to stream.to(XXX) followed by builder.stream(XXX). So StreamPartitioner problem solved :) Abou

Re: Partitioning behavior of Kafka Streams without explicit StreamPartitioner

2017-02-10 Thread Mathieu Fenniak
Well, I think what you're doing is unusual for sure. The Streams API is really about transforming streams of data from input to output... so therefore the API doesn't have an injection point like you're looking for. I'd say it's intentional (I'm just a user though). If I were in your shoes, I'd p

Re: Partitioning behavior of Kafka Streams without explicit StreamPartitioner

2017-02-10 Thread Steven Schlansker
> On Feb 10, 2017, at 1:09 PM, Mathieu Fenniak > wrote: > > Hey Steven, > > If you have one KStream, and you want to produce to a topic that is read by > another KStream, you'd use the ".through" method of the first KStream. > ".through" both outputs to a topic and returns a KStream that reads

Re: Partitioning behavior of Kafka Streams without explicit StreamPartitioner

2017-02-10 Thread Mathieu Fenniak
Hey Steven, If you have one KStream, and you want to produce to a topic that is read by another KStream, you'd use the ".through" method of the first KStream. ".through" both outputs to a topic and returns a KStream that reads from that topic. (".to" just outputs to a topic) If you want to prod

Re: Partitioning behavior of Kafka Streams without explicit StreamPartitioner

2017-02-10 Thread Steven Schlansker
So then I guess my problem really is that I am operating at two different levels of abstraction. How do I produce to a KStream? I could imagine a method: public void KStream.put(K, V, Callback?); but I don't see anything like that. Nor do the "QueryableStoreTypes" really seem like what I want

Re: Partitioning behavior of Kafka Streams without explicit StreamPartitioner

2017-02-09 Thread Matthias J. Sax
It's by design. The reason is that Streams uses a single producer to write to different output topics. As different output topics might have different key and/or value types, the producer is instantiated with byte[] as key and value type, and Streams serializes the data before handing it to the pr

Partitioning behavior of Kafka Streams without explicit StreamPartitioner

2017-02-09 Thread Steven Schlansker
Hi, I discovered what I consider to be really confusing behavior -- wondering if this is by design or a bug. The Kafka Partitioner interface: public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster); has both "Object value" and "byte[] va

Re: Kafka Streams dynamic partitioning

2016-10-06 Thread Michael Noll
> I think this should be ' pick number of partitions that matches max number > of possible keys in stream to be partitioned '. > At least in my usecase , in which I am trying to partition stream by key > and make windowed aggregations, if there are less number of topic > partitions than possible ke

Re: Kafka Streams dynamic partitioning

2016-10-05 Thread Adrienne Kole
Thanks, I got the point. That solves my problem. On Wed, Oct 5, 2016 at 10:58 PM, Matthias J. Sax wrote: > Hi, > > even if you have more distinct keys than partitions (ie, different key > go to the same partition), if you do "aggregate by k

Re: Kafka Streams dynamic partitioning

2016-10-05 Thread Matthias J. Sax
Hi, even if you have more distinct keys than partitions (i.e., different keys go to the same partition), if you do "aggregate by key" Streams will automatically separate the keys and compute an aggregate per key. Thus, you do not need to worry about wh
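The point that several keys can share one partition while each key still gets its own aggregate can be illustrated with a minimal sketch. It uses a plain map and a count as the aggregate; the class and method names are hypothetical and this is not how Streams implements its state stores.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: records of one partition may carry different keys, yet the
// aggregate (here a simple count) is kept separately for every key.
final class PerKeyAggregation {
    private PerKeyAggregation() {}

    // Takes the keys of the records seen in a single partition, in order,
    // and returns one running count per distinct key.
    static Map<String, Long> countByKey(List<String> keysInPartition) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String key : keysInPartition) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }
}
```

For a partition that received records with keys a, b, a, the sketch yields a count of 2 for a and 1 for b, even though all three records sat in the same partition.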

Re: Kafka Streams dynamic partitioning

2016-10-05 Thread Adrienne Kole
Hi, @Ali IMO, Yes. That is the job of kafka server to assign kafka instances partition(s) to process. Each instance can process more than one partition but one partition cannot be processed by more than one instance. @Michael, Thanks for reply. >Rather, pick the number of partitions in a way

Re: Kafka Streams dynamic partitioning

2016-10-05 Thread Ali Akhtar
> It's often a good idea to over-partition your topics. For example, even if today 10 machines (and thus 10 partitions) would be sufficient, pick a higher number of partitions (say, 50) so you have some wiggle room to add more machines (11...50) later if need be. If you create e.g 30 partitions,

Re: Kafka Streams dynamic partitioning

2016-10-05 Thread Michael Noll
> So, in this case I should know the max number of possible keys so that > I can create that number of partitions. Assuming I understand your original question correctly, then you would not need to do/know this. Rather, pick the number of partitions in a way that matches your needs to process the

Re: Kafka Streams dynamic partitioning

2016-10-05 Thread Adrienne Kole
Hi Guozhang, So, in this case I should know the max number of possible keys so that I can create that number of partitions. Thanks Adrienne On Wed, Oct 5, 2016 at 1:00 AM, Guozhang Wang wrote: > By default the partitioner will use murmur hash on the key and mode on > current num.partitions to

Re: Kafka Streams dynamic partitioning

2016-10-04 Thread Guozhang Wang
By default the partitioner will apply a murmur hash to the key and take it mod the current num.partitions to determine which partition a record goes to, so records with the same key will be assigned to the same partition. Would that be OK for your case? Guozhang On Tue, Oct 4, 2016 at 3:00 PM, Adrienne Kole wrot
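The hash-then-modulo scheme can be sketched as follows. Note the assumption: the hash below is java.util.Arrays.hashCode as a stand-in, not the murmur hash the default partitioner uses, so the partition numbers will differ from what Kafka computes; only the structure is the same. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of "hash the key, then take it modulo the partition count".
// Stand-in hash (Arrays.hashCode), NOT Kafka's murmur hash.
final class HashModPartitioner {
    private HashModPartitioner() {}

    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the remainder is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }
}
```

Because the result depends only on the key bytes and the partition count, the same key always maps to the same partition as long as the number of partitions stays unchanged; changing the partition count can move a key to a different partition.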

Kafka Streams dynamic partitioning

2016-10-04 Thread Adrienne Kole
Hi, From the Streams documentation, I can see that each Streams instance is processing data independently (from other instances), reads from topic partition(s) and writes to specified topic. So here, the partitions of topic should be determined beforehand and should remain static. In my usecase I w

Re: Partitioning at the edges

2016-09-07 Thread Andy Chambers
Looks like re-partitioning is probably the way to go. I've seen reference to this pattern a couple of times but wanted to make sure I wasn't missing something obvious. Looks like kafka streams makes this kind of thing a bit easier than samza. Thanks for sharing your wisdom folks :-)

Re: Partitioning at the edges

2016-09-07 Thread David Garcia
Obviously for the keys you don’t have, you would have to look them up…sorry, I kinda missed that part. That is indeed a pain. The job that looks those keys up would probably have to batch queries to the external system. Maybe you could use kafka-connect-jdbc to stream in updates to that syste

Re: Partitioning at the edges

2016-09-07 Thread David Garcia
The “simplest” way to solve this is to “repartition” your data (i.e. the streams you wish to join) with the partition key you wish to join on. This obviously introduces redundancy, but it will solve your problem. For example.. suppose you want to join topic T1 and topic T2…but they aren’t part

Re: Partitioning at the edges

2016-09-06 Thread Eno Thereska
given that the transactions do not have the customer ID). It's worth mentioning that in Kafka trunk the repartitioning happens automatically (while in 0.10.0.0 the user needs to manually repartition topics). Eno Begin forwarded message: *From: *Andy Chambers *Subject: **Re: Partitioning at

Re: Partitioning at the edges

2016-09-03 Thread Andy Chambers
Hi Eno, I'll try. We have a feed of transaction data from the bank. Each of which we must try to associate with a customer in our system. Unfortunately the transaction data doesn't include the customer-id itself but rather a variety of other identifiers that we can use to lookup the customer-id in

Re: Partitioning at the edges

2016-09-03 Thread Eno Thereska
Hi Andy, Could you share a bit more info or pseudocode so that we can understand the scenario a bit better? Especially around the streams at the edges. How are they created and what is the join meant to do? Thanks Eno > On 3 Sep 2016, at 02:43, Andy Chambers wrote: > > Hey Folks, > > We are

Partitioning at the edges

2016-09-02 Thread Andy Chambers
Hey Folks, We are having quite a bit of trouble modelling the flow of data through a very kafka centric system. As I understand it, every stream you might want to join with another must be partitioned the same way. But often streams at the edges of a system *cannot* be partitioned the same way becaus

Re: Questions about Kafka Streams Partitioning & Deployment

2016-07-29 Thread Michael Noll
Michael, > Guozhang, in (2) above did you mean "some keys *may be* hashed to different > partitions and the existing local state stores will not be valid?" > That fits with our understanding. Yes, that's what Guozhang meant. Corrected version: When you increase the number of input partition

Re: Questions about Kafka Streams Partitioning & Deployment

2016-07-20 Thread Michael Ruberry
Thank you both for your replies. This is incredibly helpful. Guozhang, in (2) above did you mean "some keys* may be* hashed to different partitions and the existing local state stores will not be valid?" That fits with our understanding. As to your caveats in (3) and (4), we are trying to be sure

Re: Questions about Kafka Streams Partitioning & Deployment

2016-07-20 Thread Guozhang Wang
Hi Michael, 1. Kafka Streams always tries to colocate the local stores with the processing nodes based on the partition key. For example, if you want to do an aggregation based on key K1, but the input topic is not keyed on K1 and hence not partitioned on it, the library will then auto-repartiti

Re: Questions about Kafka Streams Partitioning & Deployment

2016-07-20 Thread Eno Thereska
Hi Michael, These are good questions and I can confirm that the system works in the way you hope it works, if you use the DSL and don't make up keys arbitrarily. In other words, there is nothing currently that prevents you from shooting yourself in the foot e.g., by making up keys and using the

Questions about Kafka Streams Partitioning & Deployment

2016-07-18 Thread Michael Ruberry
Hi all, My company, Taboola, has been looking at Kafka Streams and we are still confused about some details of partitions and store persistence. So let's say we have an app consuming a single topic and we're using an embedded and persisted key-value store: 1. If we restart the app, will each

Re: How do I achieve round robin based partitioning for topic?

2015-09-21 Thread Ewen Cheslack-Postava
cer API send without the key and that would result in a round robin > based partitioning, but I only see one particular partition getting all the > data. > > Bijay -- Thanks, Ewen

How do I achieve round robin based partitioning for topic?

2015-09-21 Thread Bijay Bisht
Hi, I am using kafka_2.10-0.8.2.0. Per documentation I just need to invoke the Producer API send without the key and that would result in a round robin based partitioning, but I only see one particular partition getting all the data. Bijay

Mirrormaker Re-partitioning Keyed Messages

2015-08-05 Thread Johnny Sheeley
Hi, Based on the description of https://issues.apache.org/jira/browse/KAFKA-957 (along with a number of random sources in the documentation), I expected keyed messages to remain in the same partition when using MirrorMaker. It looks like there's been a lot of change since then. I've been doing som

Re: Kafka partitioning is pretty much broken

2015-07-15 Thread Ewen Cheslack-Postava
.8.0 ( > https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning > key is not specified or null, a producer will pick a random partition and > stick to it for some time (default is 10 mins) before switching to another > one. So, if there are fewer producers tha

Re: Kafka partitioning is pretty much broken

2015-07-15 Thread Lance Laursen
From the FAQ: "To reduce # of open sockets, in 0.8.0 ( https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning key is not specified or null, a producer will pick a random partition and stick to it for some time (default is 10 mins) before switching to another one. So,
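The behaviour quoted from the FAQ (pick a random partition for keyless messages and stick to it for a while) can be sketched as below. This is an illustration only: the class, its constructor parameters and the injected clock value are hypothetical, and it does not reproduce the old producer's actual code.

```java
import java.util.Random;

// Sketch: for messages without a key, choose a random partition and keep
// using it until a refresh interval has elapsed, then choose again.
final class StickyRandomPartitionSketch {
    private final int numPartitions;
    private final long refreshIntervalMs;
    private final Random random;
    private int current = -1;   // -1 means no partition has been chosen yet
    private long chosenAtMs;

    StickyRandomPartitionSketch(int numPartitions, long refreshIntervalMs, long seed) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
        this.refreshIntervalMs = refreshIntervalMs;
        this.random = new Random(seed);
    }

    // nowMs is passed in by the caller so the behaviour is easy to test.
    int partitionAt(long nowMs) {
        if (current < 0 || nowMs - chosenAtMs >= refreshIntervalMs) {
            current = random.nextInt(numPartitions);
            chosenAtMs = nowMs;
        }
        return current;
    }
}
```

With a single producer and a 10 minute interval, every keyless message sent within that window goes to one partition in this sketch, which matches the symptom of only one partition receiving data in a short test run.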

Re: Kafka partitioning is pretty much broken

2015-07-15 Thread Stefan Miklosovic
Maybe there is some reason why the producer sticks with a partition for some period of time - mostly performance related. I can imagine that constant switching between partitions can be kind of slow in the sense that the producer has to "refocus" on another partition to send a message to and this switching

Re: Kafka partitioning is pretty much broken

2015-07-15 Thread Stefan Miklosovic
Nice one! That might be it as well. Do you have an idea what is that configuration parameter called? On Thu, Jul 16, 2015 at 12:53 AM, JIEFU GONG wrote: > This is a total shot in the dark here so please ignore this if it fails to > make sense, but I remember that on some previous implementation o

Re: Kafka partitioning is pretty much broken

2015-07-15 Thread Stefan Miklosovic
I think I figured it out. I had to use a custom partitioner which basically does nothing. Even though I used it before, it was not taken into consideration because I was sending KeyedMessage without any key, just partition and payload. Now I am doing it like this: producer.send(new KeyedMessage("topic"

Re: Kafka partitioning is pretty much broken

2015-07-15 Thread JIEFU GONG
This is a total shot in the dark here so please ignore this if it fails to make sense, but I remember that on some previous implementation of the producer prior to when round-robin was enabled, producers would send messages to only one of the partitions for a set period of time (configurable, I bel

Re: Kafka partitioning is pretty much broken

2015-07-15 Thread Jagbir Hooda
Hi Stefan, Have you looked at the following output for message distribution across the topic-partitions and which topic-partition is consumed by which consumer thread? kafaka-server/bin>./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect localhost:2181 --group Jagbir On Wed, Jul

Kafka partitioning is pretty much broken

2015-07-15 Thread Stefan Miklosovic
I have the following problem. I tried almost everything I could, but without any luck. All I want to do is to have 1 producer, 1 topic, 10 partitions and 10 consumers. All I want is to send 1M messages via the producer to these 10 consumers. I am using Kafka 0.8.3 built from current upstream so I have

MirrorMaker producer partitioning

2015-04-22 Thread Devon Crouse
Hello, We're running 0.8.1.1, and having an issue with the MirrorMaker not evenly distributing production over all topic partitions ( http://i.imgur.com/03u7bA9.png). Each producer seems to be using the default behavior of sending to one partition for a period of time before switching. Over time,

Re: Different partitioning between new producer and old producer

2014-09-18 Thread Jay Kreps
than 30MB/sec. > start.timeend.timecompressionmessage.sizebatch.sizetotal.data.sent.in.MB > MB.sectotal.data.sent.in.nMsgnMsg.sec2014-09-16 20:50:31:7202014-09-16 > 20:50:41:24123000200286.1030.04961010503.098 > I was excited about new producer's performance but its partitioning logic > is different.

Re: Different partitioning between new producer and old producer

2014-09-17 Thread Bae, Jae Hyeon
ner. > > The new producer is much more flexible and allows the user to use custom > partitioning logic and provide the partition number in the ProducerRecord. > That way it is broadly applicable to a variety of applications that require > different partitioning logic. > > Thanks,

Re: Different partitioning between new producer and old producer

2014-09-17 Thread Neha Narkhede
Could you make them same logic? Otherwise, I have to change implementation of kafka producer container. The new producer is much more flexible and allows the user to use custom partitioning logic and provide the partition number in the ProducerRecord. That way it is broadly applicable to a

Different partitioning between new producer and old producer

2014-09-17 Thread Bae, Jae Hyeon
10503.098 I was excited about new producer's performance but its partitioning logic is different. Without partition number in ProducerRecord, its partitioning logic is based on murmur2 hash key. But in the old partitioner, partitioning logic is based on key.hashCode. Could you make them same
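
The difference described above (murmur2 over the key bytes versus `key.hashCode`) can be sketched in Python. Everything here is an illustration under assumptions: the keys are assumed to be strings serialized as UTF-8, the `murmur2` function is modeled on the 32-bit MurmurHash2 variant used by Kafka's Java client and should be checked against the Kafka source before being relied on, and the sign handling in each partition function is an assumption about the respective producer. All function names are hypothetical.

```python
def java_string_hashcode(s):
    """Java's String.hashCode(): h = 31*h + code_unit over UTF-16 units, as signed 32-bit."""
    h = 0
    data = s.encode("utf-16-be")
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def murmur2(data):
    """32-bit MurmurHash2 sketch (seed and constants assumed to match Kafka's client)."""
    length = len(data)
    m = 0x5BD1E995
    r = 24
    h = (0x9747B28C ^ length) & 0xFFFFFFFF
    for i in range(length // 4):
        i4 = i * 4
        # Read four bytes as a little-endian 32-bit word.
        k = data[i4] | (data[i4 + 1] << 8) | (data[i4 + 2] << 16) | (data[i4 + 3] << 24)
        k = (k * m) & 0xFFFFFFFF
        k ^= k >> r
        k = (k * m) & 0xFFFFFFFF
        h = (h * m) & 0xFFFFFFFF
        h ^= k
    tail = length & ~3
    extra = length % 4
    if extra == 3:
        h ^= data[tail + 2] << 16
    if extra >= 2:
        h ^= data[tail + 1] << 8
    if extra >= 1:
        h ^= data[tail]
        h = (h * m) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * m) & 0xFFFFFFFF
    h ^= h >> 15
    return h  # unsigned 32-bit value


def old_style_partition(key, num_partitions):
    """Assumed old-producer rule: abs(key.hashCode) % numPartitions."""
    h = java_string_hashcode(key)
    h = 0 if h == -(1 << 31) else abs(h)  # assumption: MIN_VALUE maps to 0
    return h % num_partitions


def new_style_partition(key, num_partitions):
    """Assumed new-producer rule: (murmur2(keyBytes) & 0x7fffffff) % numPartitions."""
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    for key in ["user-1", "user-2", "user-3"]:
        print(key, old_style_partition(key, 10), new_style_partition(key, 10))
```

Because the two rules hash different representations of the key with different functions, the same key generally lands on different partitions under each. A consumer that relies on key-to-partition affinity would see that mapping change when switching producers, unless the partition number is set explicitly in the record.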

Re: Dynamic partitioning

2014-09-12 Thread Joe Stein
That command will change how many partitions the topic has. What you are looking for I think is https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-6.ReassignPartitionsTool which allows you to change what partitions are running on which replicas and which replicas

Dynamic partitioning

2014-09-12 Thread István
Hi all, My understanding is that with 0.8.1.x you can manually change the number of partitions on the broker, and this change is going to be picked up by the producers and consumers (high level). kafka-topics.sh --alter --zookeeper zk.net:2181/stream --topic test --partitions 3 Is that the case?

Re: Strange partitioning behavior with 0.8.1.1

2014-06-11 Thread Guozhang Wang
< > > prakash.shan...@gmail.com> wrote: > > > > > Can you please tell me how to set this property ? > > > topic.metadata.refresh.interval.ms > > > Is a value of 100 low enough to solve this issue ? > > > Im guessing I can set it to 100 and restart

Re: Strange partitioning behavior with 0.8.1.1

2014-06-11 Thread Prakash Gowri Shankor
ough to solve this issue ? > > Im guessing I can set it to 100 and restart the command line producer and > > the partitioning should work ? Please confirm. > > > > Thanks > > > > > > On Mon, Jun 9, 2014 at 5:09 PM, Prakash Gowri Shankor < > > prakash.shan.
