Hi Vitalii,

The timestamps provided by your producers are in microseconds, whereas
Kafka expects millisecond epochs. This could explain the excessive
rolling. When you had the default roll time of a week, did you see
segment rolls every 15 minutes or so?
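For example (a quick sketch using the CreateTime value quoted later in this
thread): a microsecond epoch read as milliseconds lands tens of thousands of
years in the future, while dividing by 1000 first gives the expected 2020 date.

```python
from datetime import datetime, timezone

ts_us = 1595485571406707  # CreateTime from the console-consumer output in this thread

# If Kafka treats this microsecond value as milliseconds, the implied date is
# tens of thousands of years past the epoch:
print(ts_us / 1000 / 86400 / 365.25)  # years since 1970 -- roughly 50,000+

# Converting microseconds to milliseconds first yields a sane timestamp:
ts_ms = ts_us // 1000
print(datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).date())  # 2020-07-23
```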

Thanks,
Alexandre

On Thu, 23 Jul 2020 at 08:31, William Reynolds
<william.reyno...@instaclustr.com> wrote:
>
> Hi Vitali,
> When I ran into this, the latest timestamp was very large. Until we could get
> the messages set right, we set segment.ms to maxint so segments only rolled
> based on size.
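For reference, "maxint" here presumably means Java's Long.MAX_VALUE, since
segment.ms takes a long; that reading is my assumption. The value itself:

```python
# segment.ms takes a Java long; setting it to Long.MAX_VALUE effectively
# disables time-based rolling (assumption: this is the "maxint" meant above)
max_long = 2**63 - 1
print(max_long)  # 9223372036854775807
```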
> Cheers
> William
>
> On Thu, 23 Jul 2020 at 4:46 pm, Vitalii Stoianov <
> vitalii.stoianov...@gmail.com> wrote:
>
> > Hi  William,
> >
> >
> > ./kafka-console-consumer.sh --bootstrap-server localhost:9092 --property
> > print.timestamp=true --topic test
> > One of the messages TS output:
> > CreateTime:1595485571406707 1595485026.850 1595485571.406 216301538579718
> > {msg data}
> >
> > So which one of these is used to roll over a log segment?
> > I was trying to find some explanation on the web but with no luck.
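For what it's worth, the field after "CreateTime:" is the record timestamp,
which is what time-based rolling compares against; the remaining numbers
appear to be part of the message payload. A quick sketch pulling it out of
the console output:

```python
# One line of the console-consumer output from above
line = "CreateTime:1595485571406707 1595485026.850 1595485571.406 216301538579718 {msg data}"

# The value after "CreateTime:" is the record timestamp Kafka uses.
# It is expected in milliseconds; 16 digits strongly suggests microseconds.
ts = int(line.split()[0].split(":", 1)[1])
print(ts, len(str(ts)))  # 1595485571406707 16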
> >
> > Regards,
> > Vitalii.
> >
> > On Thu, Jul 23, 2020 at 9:25 AM William Reynolds <
> > william.reyno...@instaclustr.com> wrote:
> >
> > > Hi Vitali,
> > > What are the timestamps in your messages? I have seen this before: when
> > > timestamps are well into the future, every few messages cause a log
> > > roll and you end up with a very large number of log files.
> > >
> > > *William*
> > >
> > > On Thu, 23 Jul 2020 at 16:22, Vitalii Stoianov <
> > > vitalii.stoianov...@gmail.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I have also noticed that the number of log/index files is too high and
> > > > log rolls are happening more frequently than expected.
> > > > log.roll.hours is at its default (168), log.segment.bytes is 1 GB, and
> > > > the log file sizes in the topic partition folders are usually smaller
> > > > than 1 GB.
> > > >
> > > > Regards,
> > > > Vitalii.
> > > >
> > > > On Wed, Jul 22, 2020 at 8:15 PM Vitalii Stoianov <
> > > > vitalii.stoianov...@gmail.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > According to
> > > > > https://docs.confluent.io/current/kafka/deployment.html,
> > > > > vm.max_map_count depends on the number of index files:
> > > > > find /tmp/kafka_logs -name '*index' | wc -l
> > > > >
> > > > > In our test lab we have the following setup:
> > > > >
> > > > > Topic:test      PartitionCount:256      ReplicationFactor:2
> > > > > Configs:segment.bytes=1073741824,retention.ms=86400000,message.format.version=2.3-IV1,max.message.bytes=4194304,unclean.leader.election.enable=true
> > > > >
> > > > > No cleanup.policy is set explicitly for the topic or in
> > > > > server.properties, so I assume the default (delete), according to
> > > > > https://kafka.apache.org/23/documentation.html#brokerconfigs
> > > > >
> > > > > I wrote a small script that counted the number of index files; for
> > > > > this topic it is ~638000.
> > > > > Also, the Kafka log/data dir contains some old log/index files whose
> > > > > creation date is older than 10 days (retention for the topic is one
> > > > > day).
> > > > > Note: when I checked log-cleaner.log, it contained info only about
> > > > > cleanup of compacted logs.
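A sketch of such a count (assuming the broker's log.dirs is /tmp/kafka_logs,
as in the find command quoted earlier):

```python
from pathlib import Path

log_dir = Path("/tmp/kafka_logs")  # assumption: the broker's log.dirs

# Mirrors find -name '*index': matches both .index and .timeindex files,
# both of which the broker keeps memory-mapped
count = sum(1 for _ in log_dir.rglob("*index"))
print(count)
```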
> > > > >
> > > > > In order to set vm.max_map_count correctly, I need to
> > > > > understand the following:
> > > > > Why do such old index/log files still exist and why are they not
> > > > > cleaned up?
> > > > > How do I properly set vm.max_map_count if index/log files are not
> > > > > freed on time?
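One rough way to size it, assuming approximately one memory map per index
file plus headroom; the multiplier is my assumption, not an official formula:

```python
index_files = 638_000   # from the find | wc -l count above
headroom = 2            # assumption: 2x safety margin, not an official formula
suggested_max_map_count = index_files * headroom
print(suggested_max_map_count)  # 1276000
```

Of course, if the stale segments were actually being deleted on time, the
count (and hence the required vm.max_map_count) would be far lower.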
> > > > >
> > > > > Regards,
> > > > > Vitalii.
> > > > >
> > > >
> > >
> >
> --
>
>
> William Reynolds
> Technical Operations Engineer
>
