Hi,

As Evelyn has already said, I would recommend that you think about your topic and data modeling. From your email:

> We have a requirement where, based on business requirements, we need to
> publish data only for a specific set of clients. For example, an invoice
> update shouldn't go to all clients, only the specific client. But a company
> remittance info should be published to all clients. Also, in some cases, a
> specific client changes some contract info which is published in a P2P
> fashion. We have about 8k clients

From this paragraph I understand you're thinking of having a topic per client, right? What happens when your client base grows further?

My recommendation would be to think about action-based topics instead, for example one topic for each kind of notification you send.

Hope this helps,

-- Pere

On Wed, 20 Feb 2019 at 7:53, Evelyn Bayes <eve...@confluent.io> wrote:

> Hi,
>
> I would use ACLs or something similar.
>
> For instance, you might assign the records which are limited to a subset
> of clients to a specific topic with an associated ACL.
>
> I expect you’ll find having 8k extra topics very problematic in a range of
> ways, such as:
>
> * Replication issues;
> * Poor batching;
> * Memory issues due to the additional buffers.
> And so many more.
>
> Currently the upper limit for partitions is generally considered ~100,000
> and that’s on a LARGE cluster.
> I usually see a lot of issues coming up long before that limit is hit and
> on smaller clusters even more so.
>
> If you had a replication factor of 3 and a single partition per topic,
> this adds 24,000 partitions to your cluster.
>
> The end solution really depends on how the clients get the data.
>
> Do you have a consumer read it, do some preprocessing and send it to them?
> Then you can handle this in the business logic.
>
> Do they have direct consumption rights to the cluster?
> Then you NEED to have ACLs, because there won’t be anything stopping them
> from simply subscribing to another client's topic.
>
> Cheers,
> Eevee.
>
> On 19 Feb 2019, at 1:04 am, M. Manna <manme...@gmail.com> wrote:
> >
> > Hello,
> >
> > We have a requirement where, based on business requirements, we need to
> > publish data only for a specific set of clients. For example, an invoice
> > update shouldn't go to all clients, only the specific client. But a company
> > remittance info should be published to all clients. Also, in some cases, a
> > specific client changes some contract info which is published in a P2P
> > fashion. We have about 8k clients.
> >
> > What is the ideal way to control this flow?
> >
> > 1) specific topic per client
> > 2) Some form of ACL?
> >
> > For option 1, we are not 100% sure if Kafka can handle 8k topics (or, the
> > resource issues for that matter). Has anyone solved a similar business
> > problem? If so, would you mind sharing your solution?
> >
> > Btw, we are not using stream platform, it's simply pub-sub. Because we
> > don't need real-time aggregation of various items. For us, it's key that
> > the synchronisation occurs, and has "exactly-once" semantics.
> >
> > Thanks,

--
Pere Urbon-Bayes
Software Architect
http://www.purbon.com
https://twitter.com/purbon
https://www.linkedin.com/in/purbon/
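
As a rough illustration of the numbers and the action-based layout discussed in this thread, here is a minimal Python sketch. Only the 8k clients, the single partition per topic, and the replication factor of 3 come from the thread. The topic names, the 12-partition count, and the event fields are assumptions made up for the example, and no Kafka client library is involved.

```python
# Hypothetical action-based topics: one per kind of notification.
# These names are illustrative assumptions, not taken from the thread.
ACTION_TOPICS = {
    "invoice_update": "invoice-updates",
    "remittance_info": "remittance-info",
    "contract_change": "contract-changes",
}


def partition_replicas(topics, partitions_per_topic, replication_factor):
    """Total partition replicas the cluster has to host."""
    return topics * partitions_per_topic * replication_factor


def route(event):
    """Map an event dict to a (topic, key) pair.

    Client-specific events are keyed by client id so that one client's
    records stay in a single partition; broadcast events carry no key.
    """
    topic = ACTION_TOPICS[event["type"]]
    key = event.get("client_id")  # None for broadcast events
    return topic, key


# Topic per client: 8k topics, 1 partition each, replication factor 3.
per_client = partition_replicas(8_000, 1, 3)

# Action-based: 3 topics with an assumed 12 partitions each, same replication.
action_based = partition_replicas(len(ACTION_TOPICS), 12, 3)

print("topic per client:", per_client)
print("action-based:    ", action_based)
print(route({"type": "invoice_update", "client_id": "client-42"}))
print(route({"type": "remittance_info"}))
```

Routing by key only keeps the topic count down. It does not restrict who can read what, so it fits the case where a consumer service of yours reads the topics and forwards records to each client. If clients consume from the cluster directly, the ACL point Evelyn raises still applies.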