Ola,

Let's review the Apache Kafka ecosystem briefly, and then I will make an
attempt to address your concerns:

In the Kafka Ecosystem, we have the following components:

- Brokers (store events in logical containers called topics; a topic is
analogous to a table in a relational database like MySQL or PostgreSQL)
- Producers (generate events and send them to the brokers for storage)
- Consumers (pick up events from the topics and process, or consume, them)
- Streams (at a high level, combines the Consumer and Producer mechanisms
to process events in near real time and send the results back to topics)
- Schema Registry (keeps track of the data structures in the topics; it
supports Avro, JSON Schema, and Protobuf formats)
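To make the broker/topic analogy concrete, here is a toy sketch in plain
Python (this is an illustration of the log-with-offsets model, not real
Kafka client code; the class and topic names are made up):

```python
# Toy illustration of a Kafka topic: an append-only log where each
# event gets a monotonically increasing offset, and each consumer
# tracks its own read position independently.

class ToyTopic:
    def __init__(self, name):
        self.name = name
        self.log = []          # events stored in arrival order

    def append(self, event):
        self.log.append(event)
        return len(self.log) - 1   # the event's offset

class ToyConsumer:
    def __init__(self, topic):
        self.topic = topic
        self.offset = 0        # next offset to read

    def poll(self):
        events = self.topic.log[self.offset:]
        self.offset = len(self.topic.log)
        return events

topic = ToyTopic("device-events")
topic.append({"device_id": "d1", "temp": 21.5})
topic.append({"device_id": "d2", "temp": 19.0})

consumer = ToyConsumer(topic)
print(consumer.poll())   # both events, in arrival order
print(consumer.poll())   # [] -- nothing new yet
```

The key point is that the broker keeps the events; consumers just advance
their own offsets, so many consumers can read the same topic independently.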

https://kafka.apache.org/documentation/#api

https://github.com/confluentinc/schema-registry

There are two main things to consider here in your scenario.

Each of the devices is a prospective Producer of events that will be sent
to a topic.

You don't necessarily need to dedicate a topic to each producer, just as
you would not create a separate table for each customer record you need to
store.
Events sent to a topic are generally grouped together because they share a
similar data structure, so if your devices are generating messages with the
same structure then, regardless of the number of devices, you should still
be able to send them all to the same topic. Just make sure that you have
enough partitions and you will be able to consume them in parallel. The
partition count is important because, by default, the maximum number of
active consumers within a consumer group is limited by the number of
partitions in the topic. If you are looking to have, say, up to 50 parallel
processors in your Consumer Group, then you need to specify 50 partitions
when creating the topic.
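To illustrate the partition/consumer-group math, here is a small Python
sketch (a simulation, not the Kafka client; the round-robin assignment is
only similar in spirit to Kafka's built-in assignors):

```python
# Sketch of how a consumer group's parallelism is capped by partition
# count: each partition is assigned to exactly one consumer in the
# group, so consumers beyond the partition count sit idle.

def assign_partitions(num_partitions, consumers):
    """Round-robin assignment of partitions to consumers."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        consumer = consumers[p % len(consumers)]
        assignment[consumer].append(p)
    return assignment

# 50 partitions, 50 consumers: every consumer gets exactly one partition.
group = [f"consumer-{i}" for i in range(50)]
assignment = assign_partitions(50, group)
assert all(len(parts) == 1 for parts in assignment.values())

# 50 partitions, 60 consumers: 10 consumers get nothing to do.
group = [f"consumer-{i}" for i in range(60)]
assignment = assign_partitions(50, group)
idle = [c for c, parts in assignment.items() if not parts]
print(len(idle))  # 10
```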

Nevertheless, you can mitigate this partition limitation by using the
Parallel Consumer from Confluent to process your events with key-based
ordering.

https://github.com/confluentinc/parallel-consumer

Key-based ordering essentially eliminates this limitation:
https://github.com/confluentinc/parallel-consumer#ordered-by-key
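As a rough sketch of what key-based ordering buys you (this is a plain
Python simulation of the idea, not the Parallel Consumer API): events with
the same key, e.g. the same device, stay in arrival order, while different
keys can be handed to different workers:

```python
from collections import defaultdict

# Simulation of key-based ordering: events for the same key are kept
# in arrival order in a per-key FIFO queue, while different keys can
# be processed independently -- so parallelism is bounded by the number
# of distinct keys, not by the partition count.

events = [
    {"key": "device-1", "seq": 1},
    {"key": "device-2", "seq": 1},
    {"key": "device-1", "seq": 2},
    {"key": "device-2", "seq": 2},
]

queues = defaultdict(list)
for event in events:
    queues[event["key"]].append(event)   # per-key FIFO queues

# Each queue could now go to a separate worker; within a queue,
# the original order is preserved.
for key, queue in queues.items():
    print(key, [e["seq"] for e in queue])
```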

The second item of consideration is that you wanted to "loop" through the
events to process them. I don't think you need to do this. You can consider
the Streams API, which processes your events as they arrive, without any
explicit looping:

https://kafka.apache.org/30/documentation/streams/

The Streams API has many built-in mechanisms that let you focus on how to
process, join, and aggregate your events as they arrive in the topics,
without the need to loop.

I definitely would not recommend having a topic (table) for each device.
Find a way to group the data structures that are similar into a particular
topic, then you can use the Consumer API or Streams API to process the
events in near-real time.

If you are not really comfortable with writing Java code for the stream
processing, you can also take a look at ksqlDB, which lets you use SQL-like
syntax to process streams arriving in the Kafka brokers:

https://ksqldb.io/
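For example, assuming your events are JSON with a device identifier and a
reading (the topic, column, and stream names below are made up for
illustration), a ksqlDB version of the running aggregation could look
something like this:

```sql
-- Hypothetical example: register the topic as a stream, then run a
-- continuous per-device aggregation over it.
CREATE STREAM device_events (device_id VARCHAR, reading DOUBLE)
  WITH (KAFKA_TOPIC = 'device-events', VALUE_FORMAT = 'JSON');

SELECT device_id, COUNT(*) AS event_count
  FROM device_events
  GROUP BY device_id
  EMIT CHANGES;
```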

These systems are capable of handling a very large number of events per
second at scale, so I have no doubt that you will be able to figure out how
to implement an architecture that meets your needs.

When you have a moment, could you confirm whether the events generated by
your devices have similar data structures?

I hope this message gives you enough information to get started.

Sincerely,

Israel Ekpo
Lead Instructor, IzzyAcademy.com
https://izzyacademy.com/
https://www.youtube.com/c/izzyacademy


On Thu, Dec 30, 2021 at 5:13 AM Ola Bissani <ola.biss...@easysoft.com.lb>
wrote:

> Dears,
>
>
>
> I'm looking for a way to get real-time updates using my service, I believe
> kafka is the way to go but I still have an issue on how to use it.
>
>
>
> My system gets data from devices using GPRS, I then read this data and
> analyze it to check what action I should do afterwards. I need the
> analyzing step to be as fast as possible. I was thinking of two options:
>
>
>
> The first option is to gather all the data sent from all the devices into
> one huge topic and then getting all the data from this topic and analyzing
> it. The downside of this option is that the data analysis step is delaying
> my work since I was to loop through the topic data, on the other hand the
> advantage is that I have a manageable number of topics ( only 1 topic).
>
>
>
> The other option is to divide the data I'm gathering into several small
> topics by allowing each device to have its own topic, take into
> consideration that the number of devices is large, I'm talking about more
> that 5000 devices. The downside of this option is that I have thousands of
> topics, where the advantage is that each topic will have a manageable
> amount of data allowing me to get my analysis done in much more reasonable
> time.
>
>
>
> Can you advise on what option is better and whether there is a third
> option that I'm not considering,
>
> *Best Regards*
>
> *Ola Bissani*
>
> Developer Manager
>
> *Easysoft*
>
> Mobile Lebanon   : +961       3 61 16 90
>
> Office Lebanon      :+961       1 33 55 15/17
>
> E mail:     ola.biss...@easysoft.com.lb
>
> web site:www.easysoft.com.lb
>
> *"Tailored to Perfection"*
>
>
>
> The information transmitted is intended only for the person or entity to
> which it is addressed and it may contain proprietary,
> business-confidential, and/or legally privileged information. If you are
> not the intended recipient of this email you are hereby notified that any
> use, review, retransmission, dissemination, distribution, reproduction or
> any other action taken in reliance upon this email is strictly prohibited.
> If you have received this email in error, please contact the sender and
> delete this email and its contents from any computer. Any views expressed
> in this email are those of the individual sender and may not necessarily
> reflect the views of the
> company.
> Please consider the environmet before printing this email.
>
>
>
