Hi Vitalii,

The timestamps provided by your producers are in microseconds, whereas Kafka expects millisecond epochs. This could be the reason for the excessive rolling. When you had the default roll time of a week, did you experience segment rolls every 15 minutes or so?
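A quick sketch in plain Python of the mismatch described above, using the sample timestamp from the thread below. Interpreted as milliseconds (Kafka's convention) the same number lands tens of thousands of years in the future, so every record looks far older/newer than segment.ms allows:

```python
# Sample record timestamp from this thread, produced in microseconds.
ts = 1595485571406707

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Correct reading: microseconds since the epoch -> mid-2020.
year_if_us = 1970 + ts / 1_000_000 / SECONDS_PER_YEAR

# Kafka's reading: milliseconds since the epoch -> roughly year 52,000.
year_if_ms = 1970 + ts / 1_000 / SECONDS_PER_YEAR

print(int(year_if_us))  # 2020
print(int(year_if_ms) > 50000)  # True
```

With timestamps that far in the future, time-based retention and rolling comparisons behave pathologically, which matches the frequent-roll symptom reported below.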
Thanks,
Alexandre

On Thu, 23 Jul 2020 at 08:31, William Reynolds <william.reyno...@instaclustr.com> wrote:

> Hi Vitali,
> When I ran into it, it was the latest time being very large. Until we could get
> the messages set right, we set segment.ms to maxint so it only rolled based
> on size.
> Cheers
> William
>
> On Thu, 23 Jul 2020 at 4:46 pm, Vitalii Stoianov <vitalii.stoianov...@gmail.com> wrote:
>
> > Hi William,
> >
> > ./kafka-console-consumer.sh --bootstrap-server localhost:9092 --property print.timestamp=true --topic test
> > One of the messages' TS output:
> > CreateTime:1595485571406707 1595485026.850 1595485571.406 216301538579718 {msg data}
> >
> > So which one of these is used to roll over a log segment?
> > I was trying to find some explanation on the web, but with no luck.
> >
> > Regards,
> > Vitalii.
> >
> > On Thu, Jul 23, 2020 at 9:25 AM William Reynolds <william.reyno...@instaclustr.com> wrote:
> >
> > > Hi Vitali,
> > > What are the timestamps in your messages? I have seen this before, where you
> > > have timestamps well into the future, so every few messages cause a log
> > > roll and you end up with a very large number of log files.
> > >
> > > William
> > >
> > > On Thu, 23 Jul 2020 at 16:22, Vitalii Stoianov <vitalii.stoianov...@gmail.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I have also noticed that the number of log/index files is too high and
> > > > log roll is happening more frequently than expected.
> > > > log.roll.hours is the default (168), log.segment.bytes is 1g, and the log
> > > > files in the topic partition folders are usually smaller than 1g.
> > > >
> > > > Regards,
> > > > Vitalii.
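To answer the "which one of these is used" question above: with print.timestamp=true the console consumer prefixes each record with "CreateTime:<ts>", and as I understand it that record timestamp (under the default log.message.timestamp.type=CreateTime) is what the broker compares when deciding whether to roll a segment. A small sketch splitting the sample line (the tab-separated layout is assumed from the console consumer's usual output format):

```python
# Split the console consumer's print.timestamp=true output into the
# timestamp prefix and the record payload.
line = "CreateTime:1595485571406707\t{msg data}"  # sample from the thread

prefix, _, payload = line.partition("\t")   # "CreateTime:<ts>" vs. value
ts_type, _, ts = prefix.partition(":")      # split the prefix itself

print(ts_type)  # CreateTime
print(ts)       # 1595485571406707
```

The value after "CreateTime:" here is 16 digits, i.e. microseconds, which is consistent with Alexandre's diagnosis at the top of the thread.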
> > > > On Wed, Jul 22, 2020 at 8:15 PM Vitalii Stoianov <vitalii.stoianov...@gmail.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > According to this: https://docs.confluent.io/current/kafka/deployment.html
> > > > > vm.max_map_count depends on the number of index files:
> > > > > find /tmp/kafka_logs -name '*index' | wc -l
> > > > >
> > > > > In our test lab we have the following setup:
> > > > >
> > > > > Topic:test PartitionCount:256 ReplicationFactor:2
> > > > > Configs:segment.bytes=1073741824,retention.ms=86400000,message.format.version=2.3-IV1,max.message.bytes=4194304,unclean.leader.election.enable=true
> > > > >
> > > > > No cleanup.policy is set explicitly for the topic or in server.properties,
> > > > > so I assume the default (delete), according to
> > > > > https://kafka.apache.org/23/documentation.html#brokerconfigs
> > > > >
> > > > > I wrote a small script that counted the number of index files; for this
> > > > > topic it is ~638000.
> > > > > Also, the Kafka log/data dir contains some old log/index files whose
> > > > > create date is older than 10 days (retention for the topic is one day).
> > > > > Note: when I checked log-cleaner.log, it contained info only about cleanup
> > > > > of compacted logs.
> > > > >
> > > > > In order to set vm.max_map_count correctly, I need to understand the following:
> > > > > Why do such old index/log files exist and why are they not cleaned?
> > > > > How do I properly set vm.max_map_count if index/log files are not freed on time?
> > > > >
> > > > > Regards,
> > > > > Vitalii.

--
William Reynolds
Technical Operations Engineer, Instaclustr
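A sketch of the index-file count that feeds the vm.max_map_count sizing question above, mirroring the `find /tmp/kafka_logs -name '*index' | wc -l` command from the Confluent docs. The directory layout here is a hypothetical throwaway built for the demo, not a real broker's data dir; note that each log segment carries both a .index (offset) and a .timeindex file, and both are memory-mapped:

```python
import pathlib
import tempfile

# Build a throwaway layout shaped like a Kafka data dir: one partition
# directory holding two segments, each with a log file plus two indexes.
data_dir = pathlib.Path(tempfile.mkdtemp())
partition = data_dir / "test-0"
partition.mkdir()
for base_offset in ("00000000000000000000", "00000000000000123456"):
    (partition / f"{base_offset}.log").touch()
    (partition / f"{base_offset}.index").touch()      # offset index
    (partition / f"{base_offset}.timeindex").touch()  # time index

# '*index' matches both .index and .timeindex, like the find command.
index_count = len(list(data_dir.rglob("*index")))
print(index_count)  # 4: two segments x two index files each
```

Since every mapped index file consumes entries against vm.max_map_count, the sysctl needs generous headroom over this count; with runaway rolling (as in this thread) the count itself is the thing to fix first.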