Apache log4j 1.x vulnerability mitigations on Kafka

2022-01-27 Thread Karupasamy S
Hi Team,


We are using Apache Kafka as part of the ELK stack, and we have an internal 
tool that scans for vulnerabilities in all the products and third-party 
components (3PPs) we use in our product.

This tool reported the following vulnerabilities in log4j:

CVE-2022-23302, CVE-2022-23305, CVE-2022-23307

Since Kafka uses log4j internally, these vulnerabilities also apply to us, 
and our security team is asking us to mitigate them before we release our 
product to the market.

On further analysis, we found that for CVE-2022-23307 there is a mitigation 
proposed by Kafka in the following article:
https://kafka.apache.org/cve-list


However, in the same article we did not find any information about 
CVE-2022-23302 and CVE-2022-23305.

Kindly help us clarify the following queries:


  1.  Are CVE-2022-23302 and CVE-2022-23305 applicable to Apache Kafka? If 
so, how do we mitigate these vulnerabilities, and will a patch/fix be 
released?

  2.  If Kafka is not vulnerable, can we remove the following vulnerable 
classes from the log4j jar? (A verification sketch follows these questions.)

zip -q -d log4j-*.jar org/apache/log4j/net/JMSSink.class
zip -q -d log4j-*.jar org/apache/log4j/jdbc/JDBCAppender.class

  3.  Will there be any impact on Kafka's functionality after removing the 
above-mentioned classes?
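
For reference, one way to double-check the removal afterwards is to try 
loading the classes with the patched jar on the classpath. A minimal sketch, 
checking one representative class per CVE (the class-to-CVE mapping is the 
published one for log4j 1.x; CVE-2022-23307 concerns the chainsaw classes):

import java.util.List;

// Verifies the vulnerable log4j 1.x classes are gone after stripping them
// from the jar. Run with the patched log4j jar on the classpath.
public class Log4jClassCheck {
    public static void main(String[] args) {
        List<String> names = List.of(
                "org.apache.log4j.net.JMSSink",       // CVE-2022-23302
                "org.apache.log4j.jdbc.JDBCAppender", // CVE-2022-23305
                "org.apache.log4j.chainsaw.Main");    // CVE-2022-23307
        for (String name : names) {
            try {
                Class.forName(name);
                System.out.println(name + ": STILL PRESENT");
            } catch (ClassNotFoundException e) {
                System.out.println(name + ": removed");
            }
        }
    }
}

If all three lines print "removed", the jar no longer contains the classes 
those CVEs target.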



Thanks & Regards
Karupasamy



Random continuous NetworkException on client and EOFException on server.log

2022-01-27 Thread Deepak Jain
Hello Everyone,

We are using a Kafka 2.8.1 broker/client setup in our production environment.

We get the following exception at random, after an hour or so:

java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NetworkException: Disconnected from node 0
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:98)
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:81)
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30)

After enabling debug logging on Kafka, we found that the EOFException below 
is thrown at almost the same time as the NetworkException above.

DEBUG [SocketServer listenerType=ZK_BROKER, nodeId=0] Connection with /IP disconnected (org.apache.kafka.common.network.Selector)
java.io.EOFException
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:97)
    at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:452)
    at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:402)
    at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:674)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:576)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
    at kafka.network.Processor.poll(SocketServer.scala:989)
    at kafka.network.Processor.run(SocketServer.scala:892)
    at java.lang.Thread.run(Thread.java:748)

These exceptions do not seem to have any impact on the actual data transfer, 
but they keep appearing in the logs.

Can anyone explain the reason behind them and help me find the root cause?
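
For context: an EOFException at NetworkReceive.readFrom generally just means 
the remote side closed the connection, commonly because of idle-connection 
reaping or an intermediate load balancer timing the connection out. Below is 
a sketch of the producer settings usually reviewed for this symptom; the 
broker address is a placeholder and the values are illustrative assumptions, 
not a confirmed fix.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

// Illustrative settings often checked when producers see sporadic
// disconnects. Broker address and values are assumptions.
public class ProducerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // placeholder
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        props.put("delivery.timeout.ms", "120000");      // client default
        props.put("connections.max.idle.ms", "540000");  // keep below any LB idle timeout
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // produce as usual; transient disconnects are retried internally
        }
    }
}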

Regards,
Deepak



RE: Random continuous NetworkException on client and EOFException on server.log

2022-01-27 Thread Deepak Jain
Hello Everyone,

Along with the exceptions described in my original message, we are also 
getting the following in stderr:

org.apache.kafka.clients.producer.internals.Sender completeBatch
WARNING: [Producer clientId=producer-1] Received invalid metadata error in produce request on partition  due to org.apache.kafka.common.errors.NetworkException: Disconnected from node 0. Going to request metadata update now

Regards,
Deepak



Event Streaming Open Network over the Internet

2022-01-27 Thread Emiliano Spinella
Hello everybody out there using Apache Kafka,

I have been designing a (free and open) way to connect event streams of
different event brokers, including of course Apache Kafka. The motivation
is to facilitate cross-organizational event flow connections over the
Internet.

One can use MirrorMaker 2 to connect topics across Apache Kafka instances. 
However, this may not be the best approach when the instances are owned by 
different organizations. And if we need to connect events from an Apache 
Kafka instance to a different technology (e.g. Apache Pulsar, RabbitMQ), the 
only alternative is to develop your own streaming application.

The idea of an Event Streaming Open Network is to have an open framework
for the discovery, name resolution and overall communication of event
streams. The participants of this network could use whatever event broker
implementation they want. They would be able to consume event flows
independently of the event broker used by the peer, and vice versa.

Imagine you could share an event flow just by sending your peer a URI like
flow://temperature.office.mycompany.com and have the network discover the
connection details. You would avoid hardcoding details like bootstrap
servers, topic name, TLS settings, etc. If in the future you need to move an
event flow to a different instance or technology, consumers and producers
would not need any refactoring (as long as they resolve the event flow by
its URI).
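
As a purely hypothetical illustration (none of the names below come from the 
draft), client-side resolution might look something like this, with resolve() 
standing in for the network's discovery and name-resolution service:

import java.net.URI;
import java.util.Properties;

// Hypothetical sketch: resolve a flow URI into connection details instead of
// hardcoding them in the client. FlowEndpoint and resolve() are stand-ins,
// not names from the draft.
public class FlowUriSketch {
    static final class FlowEndpoint {
        final String bootstrapServers;
        final String topic;
        FlowEndpoint(String bootstrapServers, String topic) {
            this.bootstrapServers = bootstrapServers;
            this.topic = topic;
        }
    }

    // Stub for the discovery service; a real implementation would query it.
    static FlowEndpoint resolve(URI flow) {
        return new FlowEndpoint("kafka.mycompany.com:9092", "office-temperature");
    }

    public static void main(String[] args) {
        URI flow = URI.create("flow://temperature.office.mycompany.com");
        FlowEndpoint ep = resolve(flow);
        Properties props = new Properties();
        props.put("bootstrap.servers", ep.bootstrapServers); // discovered, not hardcoded
        System.out.println("would consume " + ep.topic + " via " + ep.bootstrapServers);
    }
}

The point is that only the resolver needs to know where a flow currently 
lives; client code never hardcodes bootstrap servers or topic names.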

[Diagram omitted: interaction between two different network participants]

You can take a look at the full specification in the following IETF
Internet Draft:

https://github.com/syndeno/draft-spinella-event-streaming-open-network

Finally, I have built an initial implementation of the network components.
I will be posting it to this mailing list in the following weeks in case
anyone is interested in seeing how it works.

Any comment or question is extremely welcome!

Thanks,
Emiliano



Re: Kafka Topics

2022-01-27 Thread Edward Capriolo
On Thursday, December 30, 2021, Suresh Chidambaram 
wrote:

> Hi Ola,
>
> I would suggest going with a single topic with multiple partitions. Once
> the data is received from the topic, you can do a DB update to store it,
> then use the data for analysis.
>
> Also, the URL below can be used for topic sizing:
>
> eventsizer.io
>
>
> Thanks
> C Suresh
>
> On Thursday, December 30, 2021, Ola Bissani 
> wrote:
>
> > Dears,
> >
> > I'm looking for a way to get real-time updates in my service. I believe
> > Kafka is the way to go, but I'm still unsure how best to use it.
> >
> > My system gets data from devices over GPRS; I then read this data and
> > analyze it to decide what action to take next. I need the analysis step
> > to be as fast as possible. I was considering two options:
> >
> > The first option is to gather all the data sent from all the devices into
> > one huge topic, and then read all the data from this topic and analyze
> > it. The downside of this option is that the analysis step delays my work,
> > since I have to loop through the topic's data; on the other hand, the
> > advantage is that I have a manageable number of topics (only one).
> >
> > The other option is to divide the data I'm gathering into several small
> > topics by giving each device its own topic. Take into consideration that
> > the number of devices is large; I'm talking about more than 5,000
> > devices. The downside of this option is that I have thousands of topics,
> > whereas the advantage is that each topic holds a manageable amount of
> > data, allowing me to get my analysis done in a much more reasonable time.
> >
> > Can you advise on which option is better, and whether there is a third
> > option that I'm not considering?
> >
> >
> >
> > Best Regards
> > Ola Bissani
> > Developer Manager
> > Easysoft
> > Mobile Lebanon   : +961   3 61 16 90
> > Office Lebanon  :+961   1 33 55 15/17
> > E mail: ola.biss...@easysoft.com.lb
> > web site:www.easysoft.com.lb
> > "Tailored to Perfection"
> >
> >
> >
> >
> > -Original Message-
> > From: Wes Peng 
> > Sent: Thursday, December 23, 2021 10:11 PM
> > To: users@kafka.apache.org
> > Subject: Re: Kafka Topics
> >
> > That depends on your resources, such as RAM, disk, etc. Generally
> > speaking, there is no problem.
> >
> > Regards
> >
> >
> > > Dears,
> > >
> > > I'm new to using Kafka, and I was wondering how many topics Kafka can
> > > handle. I'm trying to use Kafka, but doing so will oblige me to create
> > > thousands of topics to keep up with my data. Will Kafka on my server
> > > handle this kind of load?
> > >
> > > Thank you,
> > >
> > >
> > > Best Regards
> > >
> > > Ola Bissani
> > >
> > > Developer Manager
> > >
> > > Easysoft
> > >
> > > Mobile Lebanon   : +961   3 61 16 90
> > >
> > > Office Lebanon  :+961   1 33 55 15/17
> > >
> > > E mail: ola.biss...@easysoft.com.lb
> > >
> > > web site:www.easysoft.com.lb
> > >
> > > "Tailored to Perfection"
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >
>


Please don't listen to the folks that say "you can have as many as you
want". You can't. Here is why.

Each topic is divided into partitions, each partition is replicated, and
each partition replica lives on a disk.

The higher your retention .. days weeks you 
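
To make the disk arithmetic concrete, here is a back-of-envelope sketch; 
every number in it is an illustrative assumption, not a measurement from 
this thread:

// Back-of-envelope disk footprint of the "one topic per device" option.
// All numbers are illustrative assumptions.
public class TopicSizingSketch {
    public static void main(String[] args) {
        long topics = 5_000;           // one topic per device, as proposed
        int partitionsPerTopic = 3;    // assumption
        int replicationFactor = 3;     // assumption
        long bytesPerPartitionPerDay = 100L << 20; // ~100 MiB/day, assumption
        int retentionDays = 7;         // assumption

        long totalBytes = topics * partitionsPerTopic * replicationFactor
                * bytesPerPartitionPerDay * retentionDays;
        System.out.printf("~%.1f TiB on disk across %d partition replicas%n",
                totalBytes / Math.pow(1024, 4),
                topics * partitionsPerTopic * replicationFactor);
    }
}

A common middle ground is a single topic keyed by device ID: the default 
partitioner hashes the key, so each device's records stay ordered within one 
partition while the load spreads across the partition set.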

Re: Huge latency at consumer side, testing performance for production and consumption

2022-01-27 Thread Jigar Shah
Hello again,
Could someone please provide feedback on these findings?
Thank you in advance.

*Regards,*
*Jigar*



On Mon, 17 Jan 2022 at 13:24, Jigar Shah  wrote:

> Hello again,
> I performed a few more tests on the producer and consumer, and I observed
> a pattern in which the Kafka producer incurs large latency.
> Could you please confirm that my understanding of the producer protocol
> is correct?
>
> The configurations are the same as above.
>
> The producer continuously produces messages into a Kafka topic, using the
> default partitioner, which assigns messages to random topic-partitions.
>
> The workflow of the protocol, as I understand it, is:
> 1. First connection from the producer to a broker (1 out of 3) in the
> cluster to fetch metadata.
> 2. If the partition to produce to is located on the same broker, then:
>    a. Re-use the existing connection to produce messages.
> 3. Else, if the partition to produce to is located on one of the other
> brokers, then:
>    a. Create a new connection.
>    b. Fetch metadata again.
>    c. Produce the message using the new connection.
>
> After analysis, I assume the latency is incurred at steps *3.a and 3.b*,
> when the selected partition is on one of the other two brokers. Such peaks
> are observed only during the initial part of the test.
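
One way to test the 3.a/3.b hypothesis is to time each send() on its own, so 
spikes from new connections and metadata fetches stand out. A minimal sketch; 
the broker address, topic name, and 50 ms threshold are assumptions:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// Times each send() individually; blocking on get() defeats batching, which
// is fine here because the goal is isolating per-send latency, not throughput.
public class SendLatencySketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                long t0 = System.nanoTime();
                RecordMetadata md = producer
                        .send(new ProducerRecord<>("test-topic", Integer.toString(i)))
                        .get();
                long ms = (System.nanoTime() - t0) / 1_000_000;
                if (ms > 50) { // arbitrary threshold for "a peak"
                    System.out.printf("send %d -> partition %d took %d ms%n",
                            i, md.partition(), ms);
                }
            }
        }
    }
}

If the slow sends cluster on partitions led by brokers contacted later in 
the test, that would support the new-connection explanation above.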
> Thank you in advance for feedback.
>
> *Regards,*
> *Jigar*
>
>
> On Wed, 15 Dec 2021 at 10:53, Jigar Shah  wrote:
>
>> Hello,
>> I agree about the time taken by the consumer initialization processes.
>> But in the test I am actually taking care of that: I wait for the consumer
>> to be initialized and only then start the producer, to discount the
>> initialization delay.
>> So, are there any more processes happening during the consumer's poll for
>> the first few messages?
>>
>> Thank you
>>
>> On Mon, 13 Dec 2021 at 18:33, Luke Chen  wrote:
>>
>>> Hi Jigar,
>>>
>>> As Liam mentioned, those are necessary consumer initialization processes.
>>> So, I don't think you can speed it up by altering some timeouts/interval
>>> properties.
>>> Is there any reason why you need to care about the initial delay?
>>> If, like you said, the delay won't happen later on, I think the cost will
>>> be amortized.
>>>
>>>
>>> Thank you.
>>> Luke
>>>
>>>
>>> On Mon, Dec 13, 2021 at 4:59 PM Jigar Shah 
>>> wrote:
>>>
>>> > Hello ,
>>> > Answering your first mail: indeed, I am using consumer groups via
>>> > group.id; I must have missed adding it to the mentioned properties.
>>> > Also, thank you for the information about the internal processes
>>> > behind creating a KafkaConsumer.
>>> > I agree that the following steps add latency during initial connection
>>> > creation. But can this somehow be optimized (reduced) by altering some
>>> > timeout/interval properties? Could you please suggest those?
>>> >
>>> > Thank you
>>> >
>>> > On Mon, 13 Dec 2021 at 12:05, Liam Clarke-Hutchinson <
>>> lclar...@redhat.com>
>>> > wrote:
>>> >
>>> > > I realise that's a silly question: you must be using consumer groups
>>> > > if you're using auto commit.
>>> > >
>>> > > When a consumer starts, it needs to do a few things.
>>> > >
>>> > > 1) Connect to a bootstrap server
>>> > >
>>> > > 2) Join an existing consumer group, or create a new one if it doesn't
>>> > > exist. This may cause a stop-the-world rebalance as partitions are
>>> > > reassigned within the group.
>>> > >
>>> > > 3) Acquire metadata - which brokers are the partition leaders for my
>>> > > assigned partitions on? And what offsets am I consuming from?
>>> > >
>>> > > 4) Establish the long lived connections to those brokers.
>>> > >
>>> > > 5) Send fetch requests
>>> > >
>>> > > (I might not have the order correct)
>>> > >
>>> > > So yeah, this is why you're seeing that initial delay before consuming
>>> > > records.
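
To see those one-off costs directly, one can compare the first poll() against 
a later one. A rough sketch; the broker address, topic, and group id are 
placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Compares the first poll() (group join, rebalance, metadata, connection
// setup) against a steady-state poll() that only has to fetch.
public class FirstPollSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder
        props.put("group.id", "first-poll-test");       // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic"));
            long t0 = System.nanoTime();
            consumer.poll(Duration.ofSeconds(10)); // pays the startup costs
            System.out.printf("first poll: %d ms%n",
                    (System.nanoTime() - t0) / 1_000_000);
            t0 = System.nanoTime();
            consumer.poll(Duration.ofSeconds(10)); // fetch only
            System.out.printf("second poll: %d ms%n",
                    (System.nanoTime() - t0) / 1_000_000);
        }
    }
}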
>>> > >
>>> > > Kind regards,
>>> > >
>>> > > Liam Clarke-Hutchinson
>>> > >
>>> > > On Mon, 13 Dec 2021, 7:19 pm Liam Clarke-Hutchinson, <
>>> > lclar...@redhat.com>
>>> > > wrote:
>>> > >
>>> > > > Hi,
>>> > > >
>>> > > > I'm assuming you're using consumer groups? E.g., group.id=X
>>> > > >
>>> > > > Cheers,
>>> > > >
>>> > > > Liam
>>> > > >
>>> > > > On Mon, 13 Dec 2021, 6:30 pm Jigar Shah, >> >
>>> > > wrote:
>>> > > >
>>> > > >> Hello,
>>> > > >> I am trying to test the latency between message production and
>>> > > >> message consumption using the Java kafka-clients *(2.7.2)* library.
>>> > > >> The cluster configuration is 3 Kafka brokers *(2.7.2, Scala 2.13)*
>>> > > >> and 3 ZooKeeper nodes *(3.5.9)*.
>>> > > >> Here is a pattern I have observed.
>>> > > >> Reference:
>>> > > >>  ConsumerReadTimeStamp: timestamp when the record is received in
>>> > > >>  the Kafka consumer
>>> > > >>  ProducerTimeStamp: timestamp taken just before producer.send(record)
>>> > > >>  RecordTimeStamp: the CreateTime timestamp inside the record, as
>>> > > >>  read at the consumer
>>> > > >>
>>> > > >> [image: kafka1.png]
>>> > > >>
>>> > > >> *For 100 Messages*
>>> > > >>
>>> > > >> *ConsumerReadTimeStamp-ProducerTimeStamp(ms)*
>>