Apache log4j 1.x vulnerability mitigations on Kafka
Hi Team,

We are using Apache Kafka as part of the ELK stack, and we have an internal tool that scans for vulnerabilities in all the products/third-party components we use in our product. It reported the following vulnerabilities in log4j: CVE-2022-23302, CVE-2022-23305, and CVE-2022-23307. Since Kafka uses log4j internally, these vulnerabilities also apply to us, and our security team is asking us to mitigate them before we release our product to the market.

On analyzing further, we found that for CVE-2022-23307 there is a mitigation proposed by Kafka in the following article: https://kafka.apache.org/cve-list

However, the same article has no information about CVE-2022-23302 and CVE-2022-23305. So kindly help us clarify the following queries:

1. Are CVE-2022-23302 and CVE-2022-23305 applicable to Apache Kafka? If so, how do we mitigate these vulnerabilities, and will there be a patch/fix released?

2. If Kafka is not vulnerable, can we remove the following vulnerable classes from the log4j jar?

    zip -q -d log4j-*.jar org/apache/log4j/net/JMSSink.class
    zip -q -d log4j-*.jar org/apache/log4j/jdbc/JDBCAppender.class

3. Will there be any impact on Kafka's functionality after removing the above-mentioned classes?

Thanks & Regards
Karupasamy
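As a quick sanity check after stripping the jar, a sketch along these lines (plain JDK, no Kafka dependencies; the class names come from the zip commands above) can confirm the vulnerable classes are no longer loadable:

    // Sketch: verify the classes removed from log4j-*.jar are no longer on
    // the classpath. Run with the patched jar on the classpath.
    public class Log4jClassCheck {
        public static void main(String[] args) {
            String[] classes = {
                "org.apache.log4j.net.JMSSink",
                "org.apache.log4j.jdbc.JDBCAppender"
            };
            for (String name : classes) {
                try {
                    Class.forName(name);
                    System.out.println(name + ": STILL PRESENT");
                } catch (ClassNotFoundException e) {
                    System.out.println(name + ": removed");
                }
            }
        }
    }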
Random continuous NetworkException on client and EOFException on server.log
Hello Everyone,

We are using Kafka 2.8.1 broker/client in our production environment and are getting the following exception randomly, roughly every hour or so:

    java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.NetworkException: Disconnected from node 0
        at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:98)
        at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:81)
        at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30)

After enabling debug logging on Kafka, we found that the broker throws the EOFException below at almost the same time the NetworkException above is thrown:

    DEBUG [SocketServer listenerType=ZK_BROKER, nodeId=0] Connection with /IP disconnected (org.apache.kafka.common.network.Selector)
    java.io.EOFException
        at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:97)
        at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:452)
        at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:402)
        at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:674)
        at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:576)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
        at kafka.network.Processor.poll(SocketServer.scala:989)
        at kafka.network.Processor.run(SocketServer.scala:892)
        at java.lang.Thread.run(Thread.java:748)

These exceptions do not seem to have any impact on the actual data transfer, but they keep appearing in the logs. Can anyone please tell me the reason behind them and help me find the root cause?

Regards,
Deepak
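For what it's worth, one common and usually benign cause of this exact pairing is idle-connection reaping: the broker closes connections that sit idle longer than its connections.max.idle.ms, logging the EOFException at debug level, and the client sees a NetworkException the next time it touches the dead connection. A minimal client-side sketch, assuming idle reaping really is the cause here (broker address and values are placeholders, not a confirmed fix):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class ProducerIdleTuning {
        // Illustrative settings only; tune against the broker's own
        // connections.max.idle.ms and your latency requirements.
        public static Map<String, Object> tunedConfig() {
            Map<String, Object> props = new HashMap<>();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
            // Close idle client connections before the broker does, so the
            // client is less likely to produce into a half-closed socket.
            props.put(ProducerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG, 300_000L);
            // Let retries absorb the occasional transient disconnect so that
            // callers of send() are less likely to see the NetworkException.
            props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
            props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120_000);
            return props;
        }
    }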
RE: Random continuous NetworkException on client and EOFException on server.log
Hello Everyone,

Along with the exceptions described in my previous message (quoted below), we are also getting the following in stderr:

    org.apache.kafka.clients.producer.internals.Sender completeBatch
    WARNING: [Producer clientId=producer-1] Received invalid metadata error in produce request on partition due to org.apache.kafka.common.errors.NetworkException: Disconnected from node 0. Going to request metadata update now

-----Original Message-----
From: Deepak Jain
Sent: 27 January 2022 21:14
To: users@kafka.apache.org
Subject: Random continuous NetworkException on client and EOFException on server.log
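One way to correlate the client-side NetworkException with the broker's EOFException is to timestamp failures in a send() callback and match them against the broker's Selector debug log. A minimal sketch (broker address and topic name are placeholders):

    import java.time.Instant;
    import java.util.Map;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class SendFailureLogger {
        public static void main(String[] args) {
            Map<String, Object> props = Map.of(
                ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092", // placeholder
                ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class,
                ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> record =
                    new ProducerRecord<>("test-topic", "key", "value"); // placeholder topic
                // The callback fires when the broker acks or the send fails,
                // giving a client-side timestamp to line up with broker logs.
                producer.send(record, (metadata, exception) -> {
                    if (exception != null) {
                        System.err.printf("%s send failed: %s%n", Instant.now(), exception);
                    }
                });
            }
        }
    }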
Event Streaming Open Network over the Internet
Hello everybody out there using Apache Kafka,

I have been designing a (free and open) way to connect event streams across different event brokers, including of course Apache Kafka. The motivation is to facilitate cross-organizational event flow connections over the Internet.

One can use MirrorMaker 2 to connect topics across Apache Kafka instances. However, this may not be the best approach when the instances are owned by different organizations. On the other hand, if we need to connect events from an Apache Kafka instance to a different technology (e.g. Apache Pulsar, RabbitMQ), the only alternative is to develop your own stream application.

The idea of an Event Streaming Open Network is to have an open framework for the discovery, name resolution, and overall communication of event streams. The participants of this network could use whatever event broker implementation they want. They would be able to consume event flows independently of the event broker used by the peer, and vice versa.

Imagine you could share an event flow just by sending your peer a URI like flow://temperature.office.mycompany.com and have the network discover the connection details. You would avoid hardcoded details like bootstrap servers, topic name, TLS settings, etc. If in the future you need to move an event flow to a different instance/technology, the consumers and producers would not need any refactoring (as long as they resolve the event flow using its URI).

[Diagram of the interaction between two different network participants omitted.]

You can take a look at the full specification in the following IETF Internet-Draft: https://github.com/syndeno/draft-spinella-event-streaming-open-network

Finally, I have built an initial implementation of the network components. I will post it to this mailing list in the following weeks in case anyone is interested in seeing how it works.

Any comment or question is extremely welcome!

Thanks,
Emiliano
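To make the URI idea concrete, here is a purely illustrative sketch of what resolving a flow:// URI might look like on the consumer side. FlowEndpoint and FlowResolver are hypothetical names invented for this example; they are not types from the draft or from any Kafka library:

    import java.net.URI;

    // Hypothetical resolved endpoint: the concrete details a participant needs
    // to attach to an event flow, discovered rather than hardcoded.
    record FlowEndpoint(String bootstrapServers, String topic) {}

    // Hypothetical resolver standing in for the network's discovery and
    // name-resolution service; a real one would query the open network.
    class FlowResolver {
        static FlowEndpoint resolve(URI flow) {
            return new FlowEndpoint("broker.example.com:9092", flow.getHost()); // canned answer
        }
    }

    public class FlowDemo {
        public static void main(String[] args) {
            URI flow = URI.create("flow://temperature.office.mycompany.com");
            FlowEndpoint ep = FlowResolver.resolve(flow);
            // No bootstrap servers or topic name appears in the caller's code:
            // moving the flow elsewhere only changes what resolve() returns.
            System.out.println("Consume " + ep.topic() + " via " + ep.bootstrapServers());
        }
    }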
Re: Kafka Topics
On Thursday, December 30, 2021, Suresh Chidambaram wrote:

> Hi Ola,
>
> I would suggest you go with a single topic with multiple partitions. Once
> the data is received from the topic, you can do a DB update to store the
> data, then use the data for analysis.
>
> Also, the URL below can be used for topic sizing:
>
> eventsizer.io
>
> Thanks
> C Suresh
>
> On Thursday, December 30, 2021, Ola Bissani wrote:
>
> > Dears,
> >
> > I'm looking for a way to get real-time updates using my service. I
> > believe Kafka is the way to go, but I still have an issue with how to
> > use it.
> >
> > My system gets data from devices using GPRS. I then read this data and
> > analyze it to check what action I should take afterwards. I need the
> > analyzing step to be as fast as possible. I was thinking of two options:
> >
> > The first option is to gather all the data sent from all the devices
> > into one huge topic and then get all the data from this topic and
> > analyze it. The downside of this option is that the data analysis step
> > delays my work, since I have to loop through the topic data; on the
> > other hand, the advantage is that I have a manageable number of topics
> > (only 1 topic).
> >
> > The other option is to divide the data I'm gathering into several small
> > topics by allowing each device to have its own topic. Take into
> > consideration that the number of devices is large; I'm talking about
> > more than 5000 devices. The downside of this option is that I have
> > thousands of topics, while the advantage is that each topic will have a
> > manageable amount of data, allowing me to get my analysis done in a
> > much more reasonable time.
> >
> > Can you advise on which option is better and whether there is a third
> > option that I'm not considering?
> >
> > Best Regards
> > Ola Bissani
> > Developer Manager
> > Easysoft
> > Mobile Lebanon: +961 3 61 16 90
> > Office Lebanon: +961 1 33 55 15/17
> > E-mail: ola.biss...@easysoft.com.lb
> > Web site: www.easysoft.com.lb
> > "Tailored to Perfection"
> >
> > -----Original Message-----
> > From: Wes Peng
> > Sent: Thursday, December 23, 2021 10:11 PM
> > To: users@kafka.apache.org
> > Subject: Re: Kafka Topics
> >
> > That depends on your resources such as RAM, disk, etc. Generally
> > speaking there is no problem.
> >
> > Regards
> >
> > > Dears,
> > >
> > > I'm new to using Kafka, and I was wondering how many topics Kafka can
> > > handle. Trying to use Kafka, I will be obliged to create thousands of
> > > topics to keep up with my data. Will Kafka on my server handle this
> > > kind of data?
> > >
> > > Thank you,
> > >
> > > Best Regards
> > > Ola Bissani

Please don't listen to the folks that say "you can have as many as you want". You can't. Here is why: each topic is divided into partitions, each partition is replicated, and each partition replica lives on a disk. The higher your retention .. days weeks you
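A minimal sketch of the single-topic approach suggested above: use the device ID as the record key, so the default partitioner sends all events from one device to the same partition, preserving per-device ordering without needing 5000 topics. Topic name, broker address, and the sample device ID are placeholders:

    import java.util.Map;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class DeviceEventProducer {
        public static void main(String[] args) {
            Map<String, Object> props = Map.of(
                ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092", // placeholder
                ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class,
                ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Keying by device ID hashes each device onto a stable partition,
                // so per-device ordering holds inside one shared topic.
                producer.send(new ProducerRecord<>("device-events", "device-0042", "payload"));
            }
        }
    }

Whether per-device ordering is actually required is the key design question; if it is not, the partition count can simply be sized for the desired consumer parallelism.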
Re: Huge latency at consumer side, testing performance for production and consumption
Hello again,

Could someone please provide feedback on these findings?

Thank you in advance.

Regards,
Jigar

On Mon, 17 Jan 2022 at 13:24, Jigar Shah wrote:

> Hello again,
> I performed a few more tests on the producer and consumer, and I observed
> a pattern in which the Kafka producer creates large latency. Could you
> please confirm that my understanding of the producer protocol is correct?
>
> The configurations are the same as above. The producer continuously
> produces messages into a Kafka topic, using the default producer
> partitioner, which places messages in random topic-partitions.
>
> The workflow of the protocol, according to my understanding, is:
> 1. The producer makes a first connection to a broker (1 out of 3) in the
> cluster to fetch metadata.
> 2. If the partition to produce to is located on the same broker, then:
>    a. Re-use the existing connection to produce messages.
> 3. Else, if the partition to produce to is located on one of the other
> brokers, then:
>    a. Create a new connection.
>    b. Fetch metadata again.
>    c. Produce the message using the new connection.
>
> After analysis, I assume the latency is caused at steps 3.a and 3.b, when
> the partition selected is on one of the other two brokers. Such peaks are
> observed only during the initial part of the test.
>
> [Chart omitted: latency peaks during the initial part of the test.]
>
> Thank you in advance for feedback.
>
> Regards,
> Jigar
>
> On Wed, 15 Dec 2021 at 10:53, Jigar Shah wrote:
>
>> Hello,
>> I agree about the time taken by the consumer initialization processes.
>> But in the test I am actually taking care of that: I wait for the
>> consumer to be initialized and only then start the producer, to discount
>> the initialization delay. So, are there any more processes happening
>> during the consumer's poll for the first few messages?
>>
>> Thank you
>>
>> On Mon, 13 Dec 2021 at 18:33, Luke Chen wrote:
>>
>>> Hi Jigar,
>>>
>>> As Liam mentioned, those are necessary consumer initialization
>>> processes, so I don't think you can speed them up by altering
>>> timeout/interval properties. Is there any reason why you need to care
>>> about the initial delay? If, like you said, the delay won't happen
>>> later on, I think the cost will be amortized.
>>>
>>> Thank you.
>>> Luke
>>>
>>> On Mon, Dec 13, 2021 at 4:59 PM Jigar Shah wrote:
>>>
>>>> Hello,
>>>> Answering your first mail: indeed, I am using consumer groups via
>>>> group.id; I must have missed adding it to the mentioned properties.
>>>> Also, thank you for the information about the internal processes
>>>> behind creating a KafkaConsumer. I agree that the following steps do
>>>> add latency during initial connection creation, but can it somehow be
>>>> optimised (reduced) by altering some timeout/interval properties?
>>>> Could you please suggest those?
>>>>
>>>> Thank you
>>>>
>>>> On Mon, 13 Dec 2021 at 12:05, Liam Clarke-Hutchinson wrote:
>>>>
>>>>> I realise that was a silly question; you must be, if you're using
>>>>> auto commit.
>>>>>
>>>>> When a consumer starts, it needs to do a few things:
>>>>>
>>>>> 1) Connect to a bootstrap server.
>>>>> 2) Join an existing consumer group, or create a new one if it doesn't
>>>>> exist. This may cause a stop-the-world rebalance as partitions are
>>>>> reassigned within the group.
>>>>> 3) Acquire metadata: which brokers are the partition leaders for my
>>>>> assigned partitions, and what offsets am I consuming from?
>>>>> 4) Establish the long-lived connections to those brokers.
>>>>> 5) Send fetch requests.
>>>>>
>>>>> (I might not have the order correct.)
>>>>>
>>>>> So yeah, this is why you're seeing that initial delay before
>>>>> consuming records.
>>>>>
>>>>> Kind regards,
>>>>> Liam Clarke-Hutchinson
>>>>>
>>>>> On Mon, 13 Dec 2021, 7:19 pm Liam Clarke-Hutchinson wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm assuming you're using consumer groups? E.g., group.id=X
>>>>>>
>>>>>> Cheers,
>>>>>> Liam
>>>>>>
>>>>>> On Mon, 13 Dec 2021, 6:30 pm Jigar Shah wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> I am trying to test the latency between message production and
>>>>>>> message consumption using the Java kafka-clients (2.7.2) library.
>>>>>>> The cluster configuration is 3 Kafka brokers (2.7.2, Scala 2.13)
>>>>>>> and 3 ZooKeeper nodes (3.5.9).
>>>>>>> Here is the pattern I have observed.
>>>>>>> Reference:
>>>>>>> ConsumerReadTimeStamp: timestamp when the record is received in the
>>>>>>> Kafka consumer
>>>>>>> ProducerTimeStamp: timestamp taken just before producer.send
>>>>>>> RecordTimeStamp: the create timestamp inside the record obtained at
>>>>>>> the consumer
>>>>>>>
>>>>>>> [Image omitted: kafka1.png]
>>>>>>>
>>>>>>> For 100 Messages
>>>>>>> ConsumerReadTimeStamp-ProducerTimeStamp(ms)
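A warm-up sketch along the lines of the thread's two observations (the consumer group join in steps 1-5, and the producer's first metadata fetch at steps 1 and 3.b): doing one throwaway poll on the consumer and a partitionsFor() call on the producer before starting the clock should push the one-off connection and metadata costs outside the measurement window. Topic, group, and broker names here are placeholders, not values from the original test:

    import java.time.Duration;
    import java.util.List;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class LatencyTestWarmup {
        public static void main(String[] args) {
            String topic = "latency-test"; // placeholder
            Map<String, Object> consumerProps = Map.of(
                ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092", // placeholder
                ConsumerConfig.GROUP_ID_CONFIG, "latency-test-group",
                ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class,
                ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            Map<String, Object> producerProps = Map.of(
                ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092", // placeholder
                ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class,
                ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                // Consumer warm-up: the first poll triggers the group join,
                // rebalance, metadata fetch, and broker connections (steps 1-5).
                consumer.subscribe(List.of(topic));
                consumer.poll(Duration.ofSeconds(5));
                // Producer warm-up: forces the initial metadata fetch so the
                // first timed send() does not pay for it (steps 1 and 3.b).
                producer.partitionsFor(topic);
                // ... start the timed produce/consume measurement here ...
            }
        }
    }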