Ola,

Let's review the Apache Kafka ecosystem briefly, and then I will try to address your concerns.
In the Kafka ecosystem, we have the following components:

- Brokers: store events in logical containers called Topics. Topics are analogous to tables in relational databases like MySQL or PostgreSQL.
- Producers: generate events and send them to the brokers for storage.
- Consumers: pick up events from the topics and process (consume) them.
- Streams: at a high level, combines the Consumer and Producer mechanisms to process events in near real time and send the results back to topics.
- Schema Registry: keeps track of the data structures in the topics. It can be used with Avro, JSON, and Protobuf formats.

https://kafka.apache.org/documentation/#api
https://github.com/confluentinc/schema-registry

There are two main things to consider in your scenario.

First, each of your devices is a prospective Producer of events that will be sent to a topic. You do not need to dedicate a topic to each producer, just as you would not create a separate table for each customer record you need to store. Events sent to a topic are generally grouped together because they share a similar data structure, so if your devices are generating messages with the same data structure, then regardless of the number of devices you should be able to send them all to the same topic (a minimal producer sketch is included at the end of this message).

Just make sure that you have enough partitions, and you will be able to consume the events in parallel. The partition count is important because, by default, the maximum number of consumers within a consumer group is limited by the number of partitions in the topic. If you want up to, say, 50 parallel processors in your consumer group, then you need to specify 50 partitions when creating the topic (see the topic-creation sketch below).

Nevertheless, you can mitigate this partition limitation by using the Parallel Consumer from Confluent to process your events with key-based ordering. Key-based ordering essentially eliminates this limitation (a sketch of this is also included below).

https://github.com/confluentinc/parallel-consumer
https://github.com/confluentinc/parallel-consumer#ordered-by-key

The second consideration is that you wanted to "loop" to process the events. I don't think you need to do this. Consider the Streams API instead, which processes your events as they arrive. It has many built-in mechanisms that let you focus on how to process, join, and aggregate your events as they land in the topics, without the need to loop (a Streams sketch follows below as well).

https://kafka.apache.org/30/documentation/streams/

I definitely would not recommend having a topic (table) for each device. Find a way to group the data structures that are similar into a particular topic, then use the Consumer API or Streams API to process the events in near real time.

If you are not comfortable writing Java code for the stream processing, you can also take a look at ksqlDB, which lets you use SQL-like syntax to process streams arriving in the Kafka brokers:

https://ksqldb.io/

These systems are capable of handling a very large number of events per second at scale, so I have no doubt that you will be able to implement an architecture that resolves your needs.
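To make the producer side concrete, here is a minimal sketch using the plain Java client. The topic name "device-events", the broker address, the device ID, and the JSON payload are all assumptions for illustration:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DeviceEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by device ID routes each device's events to the same
            // partition, so per-device ordering is preserved.
            String deviceId = "device-0042";                // hypothetical device ID
            String payload = "{\"temperature\": 21.5}";     // hypothetical payload
            producer.send(new ProducerRecord<>("device-events", deviceId, payload));
        }
    }
}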
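For the topic creation with 50 partitions, here is a sketch using the Java AdminClient. The replication factor of 3 is an assumption for a typical production cluster; the same thing can also be done with the kafka-topics.sh tool that ships with Kafka:

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateDeviceEventsTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (Admin admin = Admin.create(props)) {
            // 50 partitions allows up to 50 consumers in a single
            // consumer group to read in parallel.
            NewTopic topic = new NewTopic("device-events", 50, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}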
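Here is a sketch of the Confluent Parallel Consumer with key-based ordering, adapted from the project README; the exact API may differ between versions, so please check the README for the version you pick up. The group ID, concurrency level, and topic name are assumptions:

import java.util.List;
import java.util.Properties;
import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelStreamProcessor;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import static io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder.KEY;

public class ParallelDeviceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "device-analyzers");        // hypothetical group ID
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false"); // the parallel consumer manages offsets itself

        Consumer<String, String> consumer = new KafkaConsumer<>(props);

        ParallelConsumerOptions<String, String> options =
                ParallelConsumerOptions.<String, String>builder()
                        .ordering(KEY)        // key-based ordering: per-device order is preserved
                        .maxConcurrency(1000) // far more workers than partitions
                        .consumer(consumer)
                        .build();

        ParallelStreamProcessor<String, String> processor =
                ParallelStreamProcessor.createEosStreamProcessor(options);
        processor.subscribe(List.of("device-events"));

        // Records are handed to this callback concurrently, keyed by device.
        processor.poll(record ->
                System.out.println("Analyzing: " + record.value()));
    }
}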
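Finally, here is a minimal Kafka Streams sketch of the no-loop model. The filter rule and the output topic "device-actions" are hypothetical placeholders for whatever analysis and action you need:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class DeviceEventAnalyzer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "device-analyzer");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("device-events");

        // No manual loop: every record is pushed through this topology
        // as soon as it lands on the topic.
        events
            .filter((deviceId, payload) -> payload.contains("ALERT")) // hypothetical analysis rule
            .to("device-actions");                                    // hypothetical output topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}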
When you have a moment, could you confirm whether the events generated by your devices share similar data structures? I hope this message gives you enough information to get started.

Sincerely,

Israel Ekpo
Lead Instructor, IzzyAcademy.com
https://izzyacademy.com/
https://www.youtube.com/c/izzyacademy

On Thu, Dec 30, 2021 at 5:13 AM Ola Bissani <ola.biss...@easysoft.com.lb> wrote:

> Dears,
>
> I'm looking for a way to get real-time updates using my service. I believe
> Kafka is the way to go, but I still have an issue with how to use it.
>
> My system gets data from devices using GPRS. I then read this data and
> analyze it to check what action I should take afterwards. I need the
> analysis step to be as fast as possible. I was thinking of two options:
>
> The first option is to gather all the data sent from all the devices into
> one huge topic and then get all the data from this topic and analyze it.
> The downside of this option is that the data analysis step is delaying my
> work, since I have to loop through the topic data; on the other hand, the
> advantage is that I have a manageable number of topics (only 1 topic).
>
> The other option is to divide the data I'm gathering into several small
> topics by allowing each device to have its own topic. Take into
> consideration that the number of devices is large; I'm talking about more
> than 5000 devices. The downside of this option is that I have thousands of
> topics, while the advantage is that each topic will have a manageable
> amount of data, allowing me to get my analysis done in a much more
> reasonable time.
>
> Can you advise on which option is better and whether there is a third
> option that I'm not considering?
>
> Best Regards
> Ola Bissani
> Developer Manager
> Easysoft