Any take on this very specific question?

Sincerely,
Anindya Haldar
Oracle Responsys


> On Jan 15, 2020, at 1:59 PM, Anindya Haldar <anindya.hal...@oracle.com> wrote:
> 
> Okay, let’s say
> 
> - the application is using a non-transactional producer, shared across 
> multiple threads
> - the linger.ms and buffer.memory is non-zero, and so is batch.size such that 
> messages are actually batched
> - the replication factor is 3
> - the minimum number of ISRs is 2
> - the parameter ack is set to ‘all’
> 
> Now the application calls send(), get a future back, and then calls get() on 
> the future. At some point (driven by the batching related parameters and a 
> number of other factors) the get() call to the future returns successfully.
> 
> Precisely at this point does Kafka guarantee that the message has been 
> persisted to the leader’s and all the ISRs’ logs? By persisted, I mean 
> written to the replication logs, but may or may not yet have been committed 
> to the storage media by the fsync() call.
> 
> If the answer is yes, it looks good from here. If the answer is no, then what 
> else does the application need to do?
> 
> Sincerely,
> Anindya Haldar
> Oracle Responsys
> 
> 
>> On Jan 15, 2020, at 12:31 PM, M. Manna <manme...@gmail.com> wrote:
>> 
>> Hey Anindya,
>> 
>> 
>> 
>> On Wed, 15 Jan 2020 at 18:23, Anindya Haldar <anindya.hal...@oracle.com>
>> wrote:
>> 
>>> Thanks for the response.
>>> 
>>> Essentially, we are looking for a confirmation that a send acknowledgement
>>> received at the client’s end will ensure the message is indeed persisted to
>>> the replication logs. We initially wondered whether the client has to make
>>> an explicit flush() call or whether it has to commit a producer transaction
>>> for that to happen. Based upon what I understand now from your response, a
>>> flush() or commitTransaction() call should not be necessary for this, and a
>>> send acknowledgement via the successful return from the get() call on the
>>> future will ensure the persistence of the message.
>>> 
>>> Please feel free to correct me if I didn’t get it right.
>>> 
>> 
>> I'm sure you have done the reading, but to be in context of your question,
>> *commitTransaction()* is sufficient on it's own (see excerpt from *flush()*
>> doc below)
>> 
>> *Applications don't need to call this method for transactional producers,
>>> since the commitTransaction() will flush all buffered records before
>>> performing the commit. This ensures that all the send(ProducerRecord) calls
>>> made since the previous beginTransaction() are completed before the
>>> commit. *
>> 
>> 
>> But you *do *need to call commitTransaction() (for txn based producers),
>> or flush() (for normal cases) to send the records *immediately*. Otherwise,
>> they will be sent when the data buffer is full (re: buffer.memory and
>> linger.ms).
>> 
>> If you want to know more about transactions, there are some nice articles
>> regarding txn producers
>> 
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_KAFKA_Transactional-2BMessaging-2Bin-2BKafka&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=vmJiAMDGSxNeZnFztNs5ITB_i_Z3h3VtLPGma9y7cKI&m=qtRoal09Ax8f1wskhpGkLJz8loX98EAVCX95pMjnI8s&s=laTP9-1xOTyb1L9AFMVLYSlvZE-nfgJ7N4rsL3NyZvU&e=
>>  
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.confluent.io_blog_transactions-2Dapache-2Dkafka_&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=vmJiAMDGSxNeZnFztNs5ITB_i_Z3h3VtLPGma9y7cKI&m=qtRoal09Ax8f1wskhpGkLJz8loX98EAVCX95pMjnI8s&s=SMCrXdI5TvfT6FEiqpQAA_8f8x8RA2MRFzrOKJmCFFc&e=
>>  
>> 
>> Also, if you are interested to become more technical, please check the
>> codebase for KafkaProducer and see what doSend() and wakeup() is doing:
>> 
>> 
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_kafka_blob_5c00191ea957fef425bf5dbbe47d70e41249e2d6_clients_src_main_java_org_apache_kafka_clients_producer_KafkaProducer.java-23L832&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=vmJiAMDGSxNeZnFztNs5ITB_i_Z3h3VtLPGma9y7cKI&m=qtRoal09Ax8f1wskhpGkLJz8loX98EAVCX95pMjnI8s&s=xruUiNP1BFXu6CziC0aB00HcoX7GyH8HNalyLp-CYlI&e=
>>  
>> 
>> I hope this helps.
>> 
>> Regards,
>> 
>>> 
>>> Sincerely,
>>> Anindya Haldar
>>> Oracle Responsys
>>> 
>>> 
>>>> On Jan 15, 2020, at 8:55 AM, M. Manna <manme...@gmail.com> wrote:
>>>> 
>>>> Anindya,
>>>> 
>>>> On Wed, 15 Jan 2020 at 16:49, Anindya Haldar <anindya.hal...@oracle.com>
>>>> wrote:
>>>> 
>>>>> In our case, the minimum in-sync replicas is set to 2.
>>>>> 
>>>>> Given that, what will be expected behavior for the scenario I outlined?
>>>>> 
>>>> 
>>>> This means you will get confirmation when 2 of them have acknowledged. so
>>>> you will always have 2 in-sync.
>>>> 
>>>> Perhaps drilling each detail and having a long thread, you could explain
>>>> what is it you are trying to investigate/identify? We will be happy to
>>> help.
>>>> 
>>>> Regards,
>>>> 
>>>> 
>>>>> Sincerely,
>>>>> Anindya Haldar
>>>>> Oracle Responsys
>>>>> 
>>>>> 
>>>>>> On Jan 15, 2020, at 6:38 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>>>>>> 
>>>>>> To all the in-sync replicas. You can set the minimum number of in-sync
>>>>>> replicas via the min.insync.replicas topic/broker config.
>>>>>> 
>>>>>> Ismael
>>>>>> 
>>>>>> On Tue, Jan 14, 2020 at 11:11 AM Anindya Haldar <
>>>>> anindya.hal...@oracle.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> I have a question related to the semantics of a producer send and the
>>>>> get
>>>>>>> calls on the future returned by the send call.
>>>>>>> 
>>>>>>> - It is a Java application, using the Kafka Java client library
>>>>>>> - The application is set up to use 3 replicas and using acks=all for
>>> the
>>>>>>> producer
>>>>>>> - the application is using a non-zero value for linger.ms and
>>>>> batch.size
>>>>>>> parameters
>>>>>>> - The application is using a single non-transactional Kafka producer
>>>>>>> instance, shared across a number of threads
>>>>>>> 
>>>>>>> With that,
>>>>>>> 
>>>>>>> - Any application thread makes a send() call on the producer.
>>>>>>> - Then the same thread calls get() on the future returned by the last
>>>>>>> send() call
>>>>>>> - The get() call on the future returns after it gets the
>>> acknowledgement
>>>>>>> from the system for the message send
>>>>>>> 
>>>>>>> At this point, is it guaranteed that the message has actually been
>>>>> written
>>>>>>> (but may not be committed by calling fsync) to ALL of the replicas’
>>>>>>> filesystems?
>>>>>>> 
>>>>>>> Sincerely,
>>>>>>> Anindya Haldar
>>>>>>> Oracle Responsys
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
> 

Reply via email to