Hi All,

I was checking this further and found the following (we use librdkafka to put
data into Kafka topics):
https://docs.confluent.io/5.0.0/clients/librdkafka/classRdKafka_1_1Producer.html#ab90a30c5e5fb006a3b4004dc4c9a7923

As the docs say, it takes microseconds:

    virtual ErrorCode produce(const std::string topic_name, int32_t partition,
                              int msgflags, void *payload, size_t len,
                              const void *key, size_t key_len,
                              int64_t timestamp, void *msg_opaque) = 0

    "produce() variant that takes topic as a string (no need for creating a
    Topic object), and also allows providing the message timestamp
    (microseconds since beginning of epoch, UTC). Otherwise identical to
    produce() above."

So are the docs misleading, and do we still need to set milliseconds?

Regards,
Vitalii.

On Thu, Jul 23, 2020 at 11:53 AM Vitalii Stoianov <
vitalii.stoianov...@gmail.com> wrote:

> Hi Alexandre,
>
> According to the Kafka broker logs it happens even faster, every 5-30 sec.
>
> Regards,
> Vitalii.
>
> On Thu, Jul 23, 2020 at 11:15 AM Alexandre Dupriez <
> alexandre.dupr...@gmail.com> wrote:
>
>> Hi Vitalii,
>>
>> The timestamps provided by your producers are in microseconds, whereas
>> Kafka expects millisecond epochs. This could be the reason for
>> over-rolling. When you had the default roll time value of a week, did
>> you experience segment rolls every 15 minutes or so?
>>
>> Thanks,
>> Alexandre
>>
>> Le jeu. 23 juil. 2020 à 08:31, William Reynolds
>> <william.reyno...@instaclustr.com> a écrit :
>> >
>> > Hi Vitalii,
>> > When I ran into it, it was the latest time being very large. Until we
>> > could get the messages set right, we set segment.ms to maxint so it
>> > only rolled based on size.
>> > Cheers
>> > William
>> >
>> > On Thu, 23 Jul 2020 at 4:46 pm, Vitalii Stoianov <
>> > vitalii.stoianov...@gmail.com> wrote:
>> >
>> > > Hi William,
>> > >
>> > > ./kafka-console-consumer.sh --bootstrap-server localhost:9092 --property print.timestamp=true --topic test
>> > >
>> > > One of the messages' TS output:
>> > > CreateTime:1595485571406707 1595485026.850 1595485571.406 216301538579718 {msg data}
>> > >
>> > > So which one of these is used to roll over a log segment?
>> > > I was trying to find some explanation on the web, but with no luck.
>> > >
>> > > Regards,
>> > > Vitalii.
>> > >
>> > > On Thu, Jul 23, 2020 at 9:25 AM William Reynolds <
>> > > william.reyno...@instaclustr.com> wrote:
>> > >
>> > > > Hi Vitalii,
>> > > > What are the timestamps in your messages? I have seen this before,
>> > > > where you have timestamps well into the future, so every few
>> > > > messages cause a log roll and you end up with a very large number
>> > > > of log files.
>> > > >
>> > > > *William*
>> > > >
>> > > > On Thu, 23 Jul 2020 at 16:22, Vitalii Stoianov <
>> > > > vitalii.stoianov...@gmail.com> wrote:
>> > > >
>> > > > > Hi All,
>> > > > >
>> > > > > I have also noticed that the number of log/index files is too
>> > > > > high and log rolls are happening more frequently than expected.
>> > > > > The log.roll.hours is the default (168) and log.segment.bytes is
>> > > > > 1g, and the log file sizes in the topic partition folders are
>> > > > > usually smaller than 1g.
>> > > > >
>> > > > > Regards,
>> > > > > Vitalii.
>> > > > >
>> > > > > On Wed, Jul 22, 2020 at 8:15 PM Vitalii Stoianov <
>> > > > > vitalii.stoianov...@gmail.com> wrote:
>> > > > >
>> > > > > > Hi All,
>> > > > > >
>> > > > > > According to this:
>> > > > > > https://docs.confluent.io/current/kafka/deployment.html
>> > > > > > vm.max_map_count depends on the number of index files:
>> > > > > > find /tmp/kafka_logs -name '*index' | wc -l
>> > > > > >
>> > > > > > In our test lab we have the following setup:
>> > > > > >
>> > > > > > Topic:test PartitionCount:256 ReplicationFactor:2
>> > > > > > Configs:segment.bytes=1073741824,retention.ms=86400000,message.format.version=2.3-IV1,max.message.bytes=4194304,unclean.leader.election.enable=true
>> > > > > >
>> > > > > > No cleanup.policy is set explicitly for the topic or in
>> > > > > > server.properties, so I assume the default (delete) according to
>> > > > > > https://kafka.apache.org/23/documentation.html#brokerconfigs
>> > > > > >
>> > > > > > I wrote a small script that counted the number of index files,
>> > > > > > and for this topic it is ~638000.
>> > > > > > Also, if I check the Kafka log/data dir, it contains some old
>> > > > > > log/index files whose creation date is older than 10 days
>> > > > > > (retention for the topic is one day).
>> > > > > > Note: when I checked log-cleaner.log, it contained info only
>> > > > > > about cleanup for compacted logs.
>> > > > > >
>> > > > > > In order to set the vm.max_map_count value correctly, I need to
>> > > > > > understand the following:
>> > > > > > Why do such old index/log files exist and why are they not cleaned up?
>> > > > > > How do I properly set vm.max_map_count if index/log files are not
>> > > > > > freed on time?
>> > > > > >
>> > > > > > Regards,
>> > > > > > Vitalii.
>> >
>> > --
>> > William Reynolds
>> > Technical Operations Engineer, Instaclustr
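
For reference, a minimal sketch of what the producer call could look like when
passing an explicit millisecond timestamp through the produce() variant quoted
at the top. It assumes the localhost:9092 broker and "test" topic used
elsewhere in the thread, a placeholder payload, and minimal error handling;
following Alexandre's point that Kafka expects millisecond epochs, the
timestamp is taken from std::chrono in milliseconds rather than microseconds:

// Sketch: produce one message with an explicit millisecond timestamp via the
// librdkafka C++ API. Broker address, topic name and payload are placeholders.
#include <chrono>
#include <iostream>
#include <string>

#include <librdkafka/rdkafkacpp.h>

int main() {
  std::string errstr;

  // Minimal producer configuration.
  RdKafka::Conf *conf = RdKafka::Conf::create(RdKafka::Conf::CONF_GLOBAL);
  conf->set("bootstrap.servers", "localhost:9092", errstr);

  RdKafka::Producer *producer = RdKafka::Producer::create(conf, errstr);
  if (!producer) {
    std::cerr << "Failed to create producer: " << errstr << std::endl;
    return 1;
  }

  // Wall-clock time in *milliseconds* since the Unix epoch -- the unit the
  // broker works with when it evaluates segment.ms / log.roll.hours.
  int64_t timestamp_ms =
      std::chrono::duration_cast<std::chrono::milliseconds>(
          std::chrono::system_clock::now().time_since_epoch())
          .count();

  std::string payload = "test message";

  RdKafka::ErrorCode err = producer->produce(
      "test",                          // topic name (string variant)
      RdKafka::Topic::PARTITION_UA,    // let the partitioner pick a partition
      RdKafka::Producer::RK_MSG_COPY,  // librdkafka copies the payload
      const_cast<char *>(payload.data()), payload.size(),
      nullptr, 0,                      // no key
      timestamp_ms,                    // milliseconds, not microseconds
      nullptr);                        // msg_opaque

  if (err != RdKafka::ERR_NO_ERROR)
    std::cerr << "produce() failed: " << RdKafka::err2str(err) << std::endl;

  producer->flush(10 * 1000);  // wait up to 10s for delivery
  delete producer;
  delete conf;
  return 0;
}

With a millisecond timestamp, the CreateTime printed by kafka-console-consumer.sh
should be a 13-digit value (roughly 1595485571406 for the sample message above)
rather than the 16-digit microsecond value, and time-based rolling driven by
log.roll.hours / segment.ms should line up with the configured week again.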