That really was a helpful overview, Israel. Might make a good blog post! 😀
Ola, choosing C# would rule out Kafka Streams, which is a Java library, but you
may not need it. The Kafka Consumer API, which is available in C#, might be
enough for you. For a good explanation of topics, partitions, and pretty much
everything else Israel mentioned, I would suggest you go to
http://developer.confluent.io where you'll find free video courses,
quick-starts, tutorials, and more.

Sounds like you are at the beginning of an exciting journey! Enjoy!

Dave

> On Dec 30, 2021, at 8:29 AM, Ola Bissani <ola.biss...@easysoft.com.lb> wrote:
>
> Dear Israel,
>
> Thank you so much for your support. I will check the links you sent in your
> email to start my service.
>
> As for your question: yes, the events generated by the devices are similar in
> data structure. I would also like to state that my service will be written in
> either Java or C#. Would using C# be an issue? Also, is there a link you
> recommend I check before writing my code?
>
> I have one more question. In your mail you mentioned using one topic with
> many partitions. The number of devices I'm using is dynamic. Are you
> suggesting I create a partition for each device, and would that be possible
> if I don't know the exact number of devices I have? Or should I create
> multiple partitions for the purpose of multi-processing only?
>
> Thank you,
>
> Best Regards
> Ola Bissani
> Developer Manager
> Easysoft
> Mobile Lebanon : +961 3 61 16 90
> Office Lebanon :+961 1 33 55 15/17
> E mail: ola.biss...@easysoft.com.lb
> web site:www.easysoft.com.lb
> "Tailored to Perfection"
>
> The information transmitted is intended only for the person or entity to
> which it is addressed and it may contain proprietary, business-confidential,
> and/or legally privileged information.
> If you are not the intended recipient of this email you are hereby notified
> that any use, review, retransmission, dissemination, distribution,
> reproduction or any other action taken in reliance upon this email is
> strictly prohibited. If you have received this email in error, please contact
> the sender and delete this email and its contents from any computer. Any
> views expressed in this email are those of the individual sender and may not
> necessarily reflect the views of the company.
> Please consider the environment before printing this email.
>
> -----Original Message-----
> From: Israel Ekpo <israele...@gmail.com>
> Sent: Thursday, December 30, 2021 3:47 PM
> To: Users <users@kafka.apache.org>
> Subject: Re: Kafka-Real Time Update
>
> Ola,
>
> Let's review the Apache Kafka ecosystem briefly, and then I will make an
> attempt to address your concerns.
>
> In the Kafka ecosystem, we have the following components:
>
> - Brokers (store events in logical containers called Topics. Topics are
>   analogous to Tables in relational databases like MySQL or PostgreSQL)
> - Producers (generate events and send them to the brokers for storage)
> - Consumers (pick up the events from the Topics and process or consume
>   them)
> - Streams (at a high level, combines the Consumer and Producer mechanisms to
>   process events in near real time and send them back to the Topics)
> - Schema Registry (keeps track of data structures in the topics. Can be used
>   for Avro, JSON, and Protobuf formats)
>
> https://kafka.apache.org/documentation/#api
>
> https://github.com/confluentinc/schema-registry
>
> There are two main things to consider in your scenario.
>
> Each of the devices is a prospective Producer of events that will be sent to
> the topic.
>
> You don't necessarily need to dedicate a topic to each producer, just as you
> would not create a table for each customer record that you need to store.
>
> Events sent to a topic are generally grouped together because they have a
> similar data structure, so if your devices are generating messages with the
> same data structure, then regardless of the number of devices, you should
> still be able to send them to the same topic. Just make sure that you have
> enough partitions and you should be able to consume them in parallel. The
> partition count is important because the maximum number of active consumers
> within a Consumer Group is limited by the number of partitions in the topic.
> If you are looking to have up to, let's say, 50 parallel processors in your
> Consumer Group, then you need to specify at least 50 partitions when creating
> the topic.
>
> Nevertheless, you can mitigate this partition limitation by using the
> parallel consumer by Confluent to process your events with key-based
> ordering.
>
> https://github.com/confluentinc/parallel-consumer
>
> Key-based ordering essentially eliminates this limitation:
> https://github.com/confluentinc/parallel-consumer#ordered-by-key
>
> The second item of consideration is that you wanted to "loop" to process the
> events. I don't think you need to do this. You can consider the Streams API
> to process your events as they arrive, without needing to loop.
>
> https://kafka.apache.org/30/documentation/streams/
>
> The Streams API has many built-in mechanisms that allow you to focus on how
> to process, join, and aggregate your events as they arrive at the topics,
> without the need to loop.
>
> I definitely would not recommend having a topic (table) for each device.
> Find a way to group the data structures that are similar into a particular
> topic, then you can use the Consumer API or Streams API to process the events
> in near real time.
>
> If you are not really comfortable with writing Java code for the stream
> processing, you can also take a look at ksqlDB, which allows you to use
> SQL-like syntax to process streams arriving in Kafka Brokers.
>
> https://ksqldb.io/
>
> These systems are capable of handling a very large number of events per
> second at scale, so I have no doubt that you will be able to figure out how
> to implement an architecture that meets your needs.
>
> When you have a moment, could you confirm whether the events generated by
> the devices are similar in data structure?
>
> I hope this message gives you enough information to get started.
>
> Sincerely,
>
> Israel Ekpo
> Lead Instructor, IzzyAcademy.com
> https://izzyacademy.com/
> https://www.youtube.com/c/izzyacademy
>
>> On Thu, Dec 30, 2021 at 5:13 AM Ola Bissani <ola.biss...@easysoft.com.lb>
>> wrote:
>>
>> Dears,
>>
>> I'm looking for a way to get real-time updates using my service. I
>> believe Kafka is the way to go, but I still have an issue with how to use
>> it.
>>
>> My system gets data from devices using GPRS. I then read this data and
>> analyze it to check what action I should take afterwards. I need the
>> analysis step to be as fast as possible. I was thinking of two options.
>>
>> The first option is to gather all the data sent from all the devices
>> into one huge topic and then get all the data from this topic and
>> analyze it. The downside of this option is that the data analysis
>> step delays my work, since I have to loop through the topic data. On
>> the other hand, the advantage is that I have a manageable number of
>> topics (only 1 topic).
>>
>> The other option is to divide the data I'm gathering into several
>> small topics by allowing each device to have its own topic. Take into
>> consideration that the number of devices is large: I'm talking about
>> more than 5000 devices.
>> The downside of this option is that I have thousands of topics, whereas
>> the advantage is that each topic will have a manageable amount of data,
>> allowing me to get my analysis done in a much more reasonable time.
>>
>> Can you advise on which option is better and whether there is a third
>> option that I'm not considering?
>>
>> Best Regards
>>
>> Ola Bissani
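
The single-topic advice in this thread can be illustrated with a small sketch. If each event is produced with the device ID as its record key, events from the same device always land in the same partition, so the partition count can stay fixed while the number of devices changes. The code below is only an illustration under assumptions: `partition_for`, `assign`, and the `device-N` ID format are hypothetical names, the count of 50 partitions is borrowed from the example in Israel's message, and CRC32 is a stand-in for the murmur2 hash that the Java client's default partitioner applies to keyed records. It does not connect to a broker.

```python
import zlib
from collections import defaultdict

# Assumed partition count, taken from the "50 parallel processors" example.
NUM_PARTITIONS = 50


def partition_for(device_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a device ID (used as the record key) to a partition number.

    CRC32 is a stand-in here; it is not the hash the Java client uses.
    """
    return zlib.crc32(device_id.encode("utf-8")) % num_partitions


def assign(device_ids, num_partitions: int = NUM_PARTITIONS):
    """Group device IDs by the partition their events would be written to."""
    by_partition = defaultdict(list)
    for device_id in device_ids:
        by_partition[partition_for(device_id, num_partitions)].append(device_id)
    return dict(by_partition)


if __name__ == "__main__":
    # Hypothetical fleet of 5000 devices sharing one topic.
    devices = [f"device-{n}" for n in range(5000)]
    layout = assign(devices)
    print(f"{len(devices)} devices spread over {len(layout)} partitions")
```

Adding a device needs no new topic or partition in this model: its ID simply hashes to one of the existing partitions, and per-device ordering is kept because a given key always maps to the same partition for a fixed partition count.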