Hi Peter,
A HUGE thank you for your suggestion of using 'retention.ms=-1' for the topic. 
I also explicitly set 'retention.bytes=-1'. With this combination, Kafka is 
no longer deleting the segment logs, and I am able to run multiple instances 
of console consumers to read the data. I am still confused about the behavior 
I see when I set 'retention.ms=31449600000' and 'retention.bytes=10737418240'. 
I would love to understand why Kafka is deleting messages in that case.
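In the meantime, I have started grepping the broker's server.log for the 
deletion reason. Something along these lines (a sketch; the exact log wording 
and the log file location vary by Kafka version and installation):

    grep -i "deletable\|deleting segment" /path/to/kafka/logs/server.log

When a segment is tagged for deletion, the broker should report whether it 
was a time-based or a size-based retention breach.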
I am also noticing that each segment is ~100MB in size even though the default 
value for 'segment.bytes' is ~1GB. That looks like another example of Kafka 
not behaving according to what's documented.
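For completeness, this is how I am checking which overrides are actually in 
effect on the topic (a sketch; the ZooKeeper address and the topic name 
MyTopic are placeholders):

    bin/kafka-configs.sh --zookeeper myserver1:2181 \
        --entity-type topics --entity-name MyTopic --describe

If segment.bytes shows up as a topic-level override there, that would explain 
the ~100MB segments; otherwise the broker default of 
log.segment.bytes=1073741824 should apply.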
Thanks once again for your suggestion.
Sachin

    On Wednesday, July 17, 2019, 10:46:13 PM EDT, Peter Bukowinski 
<pmb...@gmail.com> wrote:  
 
 Indeed, something seems wrong. I have a kafka (2.0.1) cluster that aggregates 
data from multiple locations. It has so much data moving through it I can’t 
afford to keep more than 24 hours on disk. The retention is working correctly. 
I don’t restrict topics by size, only by time.

What version of kafka are you using?

Looking back at the example log directory listing, I see that you mentioned 
seeing xxxx.log.deleted files. Yes, that means kafka tagged that log segment 
for deletion, and then the cleanup process removed it soon after. Something is 
causing your data to be cleaned, despite your retention overrides.

Can you try removing 'retention.bytes' and setting 'retention.ms=-1' for the 
topic? That should persist the data indefinitely.
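Something like this should do it (a sketch, assuming the kafka-configs tool 
from your installation, your ZooKeeper quorum, and a placeholder topic name):

    bin/kafka-configs.sh --zookeeper myserver1:2181 \
        --entity-type topics --entity-name MyTopic \
        --alter --delete-config retention.bytes

    bin/kafka-configs.sh --zookeeper myserver1:2181 \
        --entity-type topics --entity-name MyTopic \
        --alter --add-config retention.ms=-1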



> On Jul 17, 2019, at 6:07 PM, Sachin Nikumbh <saniku...@yahoo.com.INVALID> 
> wrote:
> 
> I am not setting the group id for the console consumer. When I say the .log 
> files are all 0 bytes, it is after the producer has gone through 96 GB 
> worth of data. Apart from this topic where I am dumping 96GB of data, I have 
> some test topics where I am publishing very small amount of data. I don't 
> have any problem reading messages from those topics. The .log files for those 
> topics are properly sized and I can read those messages using multiple 
> console consumers at the same time. I have a feeling that this specific 
> topic is having trouble due to the amount of data that I am publishing. I am 
> failing to understand which Kafka settings are playing a role here.
> I am sure 96GB of data is really not a big deal for Kafka and I am not the 
> first one to do this.
>    On Wednesday, July 17, 2019, 04:58:48 PM EDT, Peter Bukowinski 
><pmb...@gmail.com> wrote:  
> 
> Are you setting a group.id for your console consumer, perhaps, and keeping it 
> static? That would explain the inability to reconsume the data. As to why 
> your logs look empty, Kafka likes to hold data in memory and leaves it to 
> the OS to flush the data to disk. On a non-busy broker, the interval between 
> when data arrives and when it is flushed to disk can be quite long.
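> If you want to rule out the group.id, you can check which offsets are 
> committed for your consumer group (a sketch; the broker address and group 
> name are placeholders):
> 
>     bin/kafka-consumer-groups.sh --bootstrap-server myserver1:9092 \
>         --describe --group my-console-group
> 
> A fresh, unused group.id (or none at all, combined with --from-beginning) 
> should let you reconsume everything that is still on disk.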
> 
> 
>> On Jul 17, 2019, at 1:39 PM, Sachin Nikumbh <saniku...@yahoo.com.INVALID> 
>> wrote:
>> 
>> Hi Jamie,
>> I have 3 brokers and the replication factor for my topic is set to 3. I know 
>> for sure that the producer is producing data successfully because I am 
>> running a console consumer at the same time and it shows me the messages. 
>> After the producer produces all the data, I have /var/log/kafka/myTopic-* 
>> directories (15 of them) and all of them have only one .log file with size 
>> of 0 bytes. So, I am not sure if that addresses your question around the 
>> active segment.
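>> In case it is useful, this is how I am verifying the topic layout (a 
>> sketch; the ZooKeeper address and topic name are placeholders):
>> 
>>     bin/kafka-topics.sh --zookeeper myserver1:2181 --describe --topic MyTopic
>> 
>> This confirms the partition count, the replication factor and the ISRs for 
>> each partition.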
>> Thanks,
>> Sachin
>>    On Wednesday, July 17, 2019, 04:00:56 PM EDT, Jamie 
>><jamied...@aol.co.uk.INVALID> wrote:  
>> 
>> Hi Sachin, 
>> My understanding is that the active segment is never deleted, which means 
>> you should have at least 1GB of data in your partition if the data is 
>> indeed being produced to Kafka. Are there any errors in your broker logs? 
>> How many brokers do you have, and what is the replication factor of the 
>> topic? If you have fewer than 3 brokers, have you set 
>> offsets.topic.replication.factor to the number of brokers? 
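>> For a 3-broker cluster, the relevant server.properties lines would look 
>> something like this (a sketch, not your exact config):
>> 
>>     offsets.topic.replication.factor=3
>>     transaction.state.log.replication.factor=3
>>     transaction.state.log.min.isr=2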
>> 
>> Thanks, 
>> Jamie
>> 
>> -----Original Message-----
>> From: Sachin Nikumbh <saniku...@yahoo.com.INVALID>
>> To: users <users@kafka.apache.org>
>> Sent: Wed, 17 Jul 2019 20:21
>> Subject: Re: Kafka logs are getting deleted too soon
>> 
>> Broker configs:
>> ===========
>> broker.id=36
>> num.network.threads=3
>> num.io.threads=8
>> socket.send.buffer.bytes=102400
>> socket.receive.buffer.bytes=102400
>> socket.request.max.bytes=104857600
>> log.dirs=/var/log/kafka
>> num.partitions=1
>> num.recovery.threads.per.data.dir=1
>> offsets.topic.replication.factor=1
>> transaction.state.log.replication.factor=1
>> transaction.state.log.min.isr=1
>> log.retention.hours=168
>> log.segment.bytes=1073741824
>> log.retention.check.interval.ms=300000
>> zookeeper.connect=myserver1:2181,myserver2:2181,myserver3:2181
>> zookeeper.connection.timeout.ms=6000
>> confluent.support.metrics.enable=true
>> confluent.support.customer.id=anonymous
>> group.initial.rebalance.delay.ms=0
>> auto.create.topics.enable=false
>> Topic configs:
>> ==========
>> --partitions 15
>> --replication-factor 3
>> retention.ms=31449600000
>> retention.bytes=10737418240
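>> For reference, the topic was created along these lines (the ZooKeeper 
>> address and the topic name are placeholders):
>> 
>>     bin/kafka-topics.sh --zookeeper myserver1:2181 --create --topic MyTopic \
>>         --partitions 15 --replication-factor 3 \
>>         --config retention.ms=31449600000 \
>>         --config retention.bytes=10737418240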
>> As you can see, I have tried to override retention.bytes for each 
>> partition to 10GB to be explicit. 96GB over 15 partitions is ~6.4GB per 
>> partition, so I gave myself more than enough buffer. Even then, I am left 
>> with no logs. 
>> Here's an example:
>> % ls -ltr /var/log/kafka/MyTopic-0
>> total 4
>> -rw-r--r-- 1 root root       14 Jul 17 15:05 leader-epoch-checkpoint
>> -rw-r--r-- 1 root root 10485756 Jul 17 15:05 00000000000005484128.timeindex
>> -rw-r--r-- 1 root root        0 Jul 17 15:05 00000000000005484128.log
>> -rw-r--r-- 1 root root 10485760 Jul 17 15:05 00000000000005484128.index
>> 
>> I kept an eye on the directory for each partition while the producer was 
>> publishing data, and I saw periodic .deleted files. Does that mean Kafka 
>> was deleting logs?
>> Any help would be highly appreciated.
>>    On Wednesday, July 17, 2019, 01:47:44 PM EDT, Peter Bukowinski 
>><pmb...@gmail.com> wrote:  
>> 
>> Can you share your broker and topic config here?
>> 
>>> On Jul 17, 2019, at 10:09 AM, Sachin Nikumbh <saniku...@yahoo.com.INVALID> 
>>> wrote:
>>> 
>>> Thanks for the quick response, Tom.
>>> I should have mentioned in my original post that I am always using 
>>> --from-beginning with my console consumer. Even then, I don't get any 
>>> data. And as mentioned, the .log files are of size 0 bytes.
>>>    On Wednesday, July 17, 2019, 11:09:22 AM EDT, Thomas Aley 
>>><thomas.a...@ibm.com> wrote:  
>>> 
>>> Hi Sachin,
>>> 
>>> Try adding --from-beginning to your console consumer to view the 
>>> historically produced data. By default the console consumer starts from 
>>> the last offset.
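>>> For example (the broker address and topic name are placeholders):
>>> 
>>>     bin/kafka-console-consumer.sh --bootstrap-server myserver1:9092 \
>>>         --topic MyTopic --from-beginning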
>>> 
>>> Tom Aley
>>> thomas.a...@ibm.com
>>> 
>>> 
>>> 
>>> From:  Sachin Nikumbh <saniku...@yahoo.com.INVALID>
>>> To:    Kafka Users <users@kafka.apache.org>
>>> Date:  17/07/2019 16:01
>>> Subject:        [EXTERNAL] Kafka logs are getting deleted too soon
>>> 
>>> 
>>> 
>>> Hi all,
>>> I have ~ 96GB of data in files that I am trying to get into a Kafka 
>>> cluster. I have ~ 11000 keys for the data and I have created 15 partitions 
>>> for my topic. While my producer is dumping data into Kafka, I have a console 
>>> consumer that shows me that Kafka is getting the data. The producer runs 
>>> for a few hours before it is done. However, at this point, when I run the 
>>> console consumer, it does not fetch any data. If I look at the logs 
>>> directory, .log files for all the partitions are of 0 byte size. 
>>> If I am not wrong, the default value for log.retention.bytes is -1 which 
>>> means there is no size limit for the logs/partition. I do want to make 
>>> sure that the value for this setting is per partition. Given that the 
>>> default time based retention is 7 days, I am failing to understand why the 
>>> logs are getting deleted. The other thing that confuses me is that when I 
>>> use kafka.tools.GetOffsetShell, it shows me large enough values for all 
>>> the 15 partitions for offsets.
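>>> For reference, I am checking the offsets roughly like this (the broker 
>>> address is a placeholder; --time -1 asks for the latest offsets, -2 for 
>>> the earliest):
>>> 
>>>     bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
>>>         --broker-list myserver1:9092 --topic MyTopic --time -1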
>>> Can someone please help me understand why I don't see any logs, and why 
>>> kafka.tools.GetOffsetShell makes me believe there is data?
>>> Thanks,
>>> Sachin
>>> 
>>> 