Hi All,

I was checking this further and found the following (we use librdkafka to put
data into Kafka topics):
https://docs.confluent.io/5.0.0/clients/librdkafka/classRdKafka_1_1Producer.html#ab90a30c5e5fb006a3b4004dc4c9a7923

As the docs say, it takes microseconds:

    virtual ErrorCode produce(const std::string topic_name, int32_t partition,
                              int msgflags, void *payload, size_t len,
                              const void *key, size_t key_len,
                              int64_t timestamp, void *msg_opaque) = 0

    "produce() variant that takes topic as a string (no need for creating a
    Topic object), and also allows providing the message timestamp
    (microseconds since beginning of epoch, UTC). Otherwise identical to
    produce() above."

So are the docs misleading, and do we still need to set milliseconds?

Regards,
Vitalii.

On Thu, Jul 23, 2020 at 11:53 AM Vitalii Stoianov <
vitalii.stoianov...@gmail.com> wrote:

> Hi Alexandre,
>
> According to the Kafka broker logs it happens even faster, every 5-30 sec.
>
> Regards,
> Vitalii.
>
> On Thu, Jul 23, 2020 at 11:15 AM Alexandre Dupriez <
> alexandre.dupr...@gmail.com> wrote:
>
>> Hi Vitalii,
>>
>> The timestamps provided by your producers are in microseconds, whereas
>> Kafka expects millisecond epochs. This could be the reason for
>> over-rolling. When you had the default roll time value of a week, did
>> you experience segment rolls every 15 minutes or so?
>>
>> Thanks,
>> Alexandre
>>
>> Le jeu. 23 juil. 2020 à 08:31, William Reynolds
>> <william.reyno...@instaclustr.com> a écrit :
>> >
>> > Hi Vitalii,
>> > When I ran into it, it was the latest time being very large. Until we
>> > could get the messages set right, we set segment.ms to maxint so it
>> > only rolled based on size.
>> > Cheers
>> > William
>> >
>> > On Thu, 23 Jul 2020 at 4:46 pm, Vitalii Stoianov <
>> > vitalii.stoianov...@gmail.com> wrote:
>> >
>> > > Hi William,
>> > >
>> > > ./kafka-console-consumer.sh --bootstrap-server localhost:9092 --property print.timestamp=true --topic test
>> > >
>> > > One of the messages' TS output:
>> > > CreateTime:1595485571406707 1595485026.850 1595485571.406 216301538579718 {msg data}
>> > >
>> > > So which one of these is used to roll over a log segment?
>> > > I was trying to find some explanation on the web, but with no luck.
>> > >
>> > > Regards,
>> > > Vitalii.
>> > >
>> > > On Thu, Jul 23, 2020 at 9:25 AM William Reynolds <
>> > > william.reyno...@instaclustr.com> wrote:
>> > >
>> > > > Hi Vitalii,
>> > > > What are the timestamps in your messages? I have seen this before,
>> > > > where you have timestamps well into the future, so every few
>> > > > messages cause a log roll and you end up with a very large number
>> > > > of log files.
>> > > >
>> > > > *William*
>> > > >
>> > > > On Thu, 23 Jul 2020 at 16:22, Vitalii Stoianov <
>> > > > vitalii.stoianov...@gmail.com> wrote:
>> > > >
>> > > > > Hi All,
>> > > > >
>> > > > > I have also noticed that the number of log/index files is too
>> > > > > high and log rolls are happening more frequently than expected.
>> > > > > The log.roll.hours is the default (168) and log.segment.bytes is
>> > > > > 1g, and the log file sizes in the topic partition folders are
>> > > > > usually smaller than 1g.
>> > > > >
>> > > > > Regards,
>> > > > > Vitalii.
>> > > > >
>> > > > > On Wed, Jul 22, 2020 at 8:15 PM Vitalii Stoianov <
>> > > > > vitalii.stoianov...@gmail.com> wrote:
>> > > > >
>> > > > > > Hi All,
>> > > > > >
>> > > > > > According to this:
>> > > > > > https://docs.confluent.io/current/kafka/deployment.html
>> > > > > > vm.max_map_count depends on the number of index files:
>> > > > > > find /tmp/kafka_logs -name '*index' | wc -l
>> > > > > >
>> > > > > > In our test lab we have the following setup:
>> > > > > >
>> > > > > > Topic:test PartitionCount:256 ReplicationFactor:2
>> > > > > > Configs:segment.bytes=1073741824,retention.ms=86400000,message.format.version=2.3-IV1,max.message.bytes=4194304,unclean.leader.election.enable=true
>> > > > > >
>> > > > > > No cleanup.policy is set explicitly for the topic or in
>> > > > > > server.properties, so I assume the default (delete) according to
>> > > > > > https://kafka.apache.org/23/documentation.html#brokerconfigs
>> > > > > >
>> > > > > > I wrote a small script that counted the number of index files,
>> > > > > > and for this topic it is ~638000.
>> > > > > > Also, if I check the Kafka log/data dir, it contains some old
>> > > > > > log/index files whose creation date is older than 10 days
>> > > > > > (retention for the topic is one day).
>> > > > > > Note: when I checked log-cleaner.log, it contained info only
>> > > > > > about cleanup for compacted logs.
>> > > > > >
>> > > > > > In order to set the vm.max_map_count value correctly, I need to
>> > > > > > understand the following:
>> > > > > > Why do such old index/log files exist and why are they not cleaned up?
>> > > > > > How do I properly set vm.max_map_count if index/log files are not
>> > > > > > freed on time?
>> > > > > >
>> > > > > > Regards,
>> > > > > > Vitalii.
>> >
>> > --
>> > William Reynolds
>> > Technical Operations Engineer, Instaclustr
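
For reference, a minimal sketch of what the producer call could look like when
passing an explicit millisecond timestamp through the produce() variant quoted
at the top. It assumes the localhost:9092 broker and "test" topic used
elsewhere in the thread, a placeholder payload, and minimal error handling;
following Alexandre's point that Kafka expects millisecond epochs, the
timestamp is taken from std::chrono in milliseconds rather than microseconds:

// Sketch: produce one message with an explicit millisecond timestamp via the
// librdkafka C++ API. Broker address, topic name and payload are placeholders.
#include <chrono>
#include <iostream>
#include <string>

#include <librdkafka/rdkafkacpp.h>

int main() {
  std::string errstr;

  // Minimal producer configuration.
  RdKafka::Conf *conf = RdKafka::Conf::create(RdKafka::Conf::CONF_GLOBAL);
  conf->set("bootstrap.servers", "localhost:9092", errstr);

  RdKafka::Producer *producer = RdKafka::Producer::create(conf, errstr);
  if (!producer) {
    std::cerr << "Failed to create producer: " << errstr << std::endl;
    return 1;
  }

  // Wall-clock time in *milliseconds* since the Unix epoch -- the unit the
  // broker works with when it evaluates segment.ms / log.roll.hours.
  int64_t timestamp_ms =
      std::chrono::duration_cast<std::chrono::milliseconds>(
          std::chrono::system_clock::now().time_since_epoch())
          .count();

  std::string payload = "test message";

  RdKafka::ErrorCode err = producer->produce(
      "test",                          // topic name (string variant)
      RdKafka::Topic::PARTITION_UA,    // let the partitioner pick a partition
      RdKafka::Producer::RK_MSG_COPY,  // librdkafka copies the payload
      const_cast<char *>(payload.data()), payload.size(),
      nullptr, 0,                      // no key
      timestamp_ms,                    // milliseconds, not microseconds
      nullptr);                        // msg_opaque

  if (err != RdKafka::ERR_NO_ERROR)
    std::cerr << "produce() failed: " << RdKafka::err2str(err) << std::endl;

  producer->flush(10 * 1000);  // wait up to 10s for delivery
  delete producer;
  delete conf;
  return 0;
}

With a millisecond timestamp, the CreateTime printed by kafka-console-consumer.sh
should be a 13-digit value (roughly 1595485571406 for the sample message above)
rather than the 16-digit microsecond value, and time-based rolling driven by
log.roll.hours / segment.ms should line up with the configured week again.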