[jira] [Commented] (KAFKA-2997) Synchronous write to disk

Arkadiusz Firus (JIRA) Thu, 17 Dec 2015 02:18:07 -0800

    [ 
https://issues.apache.org/jira/browse/KAFKA-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061843#comment-15061843
 ]


Arkadiusz Firus commented on KAFKA-2997:
----------------------------------------

[~granthenke]
We are currently considering Kafka as a message backbone and a general log 
(source of truth) accessible to all systems. Because we work with financial 
data, we need much stronger guarantees than in-memory replication provides.

[~fpj]
I want a guarantee that when the client call returns, the message has been 
persisted to disk. On the other hand, I do not want to invoke flush after every 
message because that has a very negative impact on performance. VoltDB 
(https://voltdb.com/) had a similar problem: they wanted both persistence to 
disk and high performance. Their solution was to gather a few writes to disk 
(from different sessions) into one batch and then invoke fsync. I want to use 
this approach in Kafka. A thread that wants to write data will wait a few ms, 
because other threads may want to write data to the same partition during 
that time.
Running a thread in a loop (instead of a timer) could also be a good solution. 
I have to think about this.
Thank you very much for the link.
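
To make the batching idea concrete, here is a minimal sketch of the 
"thread in a loop" variant in plain Java. It is not Kafka code: the class 
GroupCommitWriter and all of its members are hypothetical names made up for 
this illustration. Client threads enqueue a write and block; a single flusher 
thread gathers whatever is queued, writes it, and calls FileChannel.force 
once for the whole batch. The sketch assumes one file per partition and does 
not reject writes that arrive after close() or after an I/O failure, which a 
real implementation would have to do.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of group commit; none of these names exist in Kafka.
class GroupCommitWriter implements AutoCloseable {
    // One queued write plus the future its caller blocks on.
    private static final class Pending {
        final ByteBuffer data;
        final CompletableFuture<Void> done = new CompletableFuture<>();
        Pending(byte[] bytes) { data = ByteBuffer.wrap(bytes); }
    }

    private final BlockingQueue<Pending> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger fsyncs = new AtomicInteger();
    private final FileChannel channel;
    private final Thread flusher;

    GroupCommitWriter(Path file) {
        try {
            channel = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        flusher = new Thread(this::flushLoop, "group-commit-flusher");
        flusher.setDaemon(true);
        flusher.start();
    }

    // Blocks the caller until its message has been written and fsynced.
    void write(byte[] message) {
        Pending p = new Pending(message);
        queue.add(p);
        p.done.join(); // throws CompletionException if the flush failed
    }

    // Number of fsync calls issued so far.
    int fsyncCount() { return fsyncs.get(); }

    private void flushLoop() {
        List<Pending> batch = new ArrayList<>();
        try {
            while (true) {
                batch.add(queue.take()); // sleep until at least one write arrives
                queue.drainTo(batch);    // gather everything queued in the meantime
                for (Pending p : batch) {
                    while (p.data.hasRemaining()) channel.write(p.data);
                }
                channel.force(true);     // one fsync covers the whole batch
                fsyncs.incrementAndGet();
                for (Pending p : batch) p.done.complete(null);
                batch.clear();
            }
        } catch (InterruptedException | IOException e) {
            // Shutdown or I/O failure: fail every write that was not confirmed.
            queue.drainTo(batch);
            for (Pending p : batch) p.done.completeExceptionally(e);
        }
    }

    @Override
    public void close() {
        flusher.interrupt();
        try {
            flusher.join();
            channel.close();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape no explicit wait is needed: while one fsync is in progress, 
new writes pile up in the queue and are picked up together by the next 
iteration, so the batch size grows with the load.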

> Synchronous write to disk
> -------------------------
>
>                 Key: KAFKA-2997
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2997
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.9.0.0
>            Reporter: Arkadiusz Firus
>            Priority: Minor
>              Labels: features, patch
>
> Hi All,
> I am currently working on a mechanism that allows efficient synchronous 
> writes to the file system. My idea is to gather a few write requests for one 
> partition and then call fsync.
> From reading the code, I found that the best place to do this is the 
> kafka.log.Log.append
> method. Currently, at the end of the method (line 368), there is a check 
> whether the number of unflushed messages is greater than the flush interval 
> (a configuration parameter).
> I am thinking of extending this condition. I want to add a boolean 
> configuration parameter (sync write or something like that). If this 
> parameter is set to true, the thread should block on a lock at the end of 
> this method. A separate timer thread (one per partition) will be invoked 
> every 10 ms (a configuration parameter). On each invocation it will call the 
> flush method and then release all blocked threads.
> I am writing here because I would like to know your opinion of this 
> approach. Do you think it is a good one, or does someone have a better (more 
> permanent) one? I would also like to know whether this approach fits the 
> general Kafka architecture.
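
The lock-plus-timer scheme from the description can be sketched roughly as 
follows. This is plain Java and only an illustration, not a patch against 
kafka.log.Log: TimedGroupFlush, appendAndWait and flushedCount are 
hypothetical names, the Runnable stands in for the real flush call, and the 
in-memory list stands in for the unflushed part of the log. One instance 
corresponds to one partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: appenders block until the next periodic flush covers them.
class TimedGroupFlush implements AutoCloseable {
    private final Object lock = new Object();
    private final Runnable flushAction; // stands in for the real flush/fsync
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final List<byte[]> messages = new ArrayList<>(); // stand-in for the log
    private long appended = 0; // messages appended so far
    private long flushed = 0;  // messages covered by the last completed flush

    TimedGroupFlush(long intervalMs, Runnable flushAction) {
        this.flushAction = flushAction;
        // Note: if flushAction throws, the executor cancels later runs and
        // blocked appenders would hang; real code needs error handling here.
        timer.scheduleWithFixedDelay(this::flushAndRelease,
                intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    // Appends a message, then blocks until a flush that includes it has finished.
    void appendAndWait(byte[] message) {
        synchronized (lock) {
            messages.add(message);
            long myPosition = ++appended;
            boolean interrupted = false;
            while (flushed < myPosition) {
                try {
                    lock.wait(); // releases the lock while waiting
                } catch (InterruptedException e) {
                    interrupted = true; // keep waiting, restore the flag afterwards
                }
            }
            if (interrupted) Thread.currentThread().interrupt();
        }
    }

    long flushedCount() {
        synchronized (lock) { return flushed; }
    }

    // Runs on the timer thread: flush once, then release every covered appender.
    private void flushAndRelease() {
        long upTo;
        synchronized (lock) {
            if (appended == flushed) return; // nothing new, skip the flush
            upTo = appended;
        }
        flushAction.run(); // outside the lock so new appends are not stalled
        synchronized (lock) {
            flushed = upTo;
            lock.notifyAll();
        }
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

Messages appended while a flush is running are not released by that flush; 
they wait for the next tick, so with a 10 ms interval the worst-case added 
latency is roughly one interval plus the duration of two flushes.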



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
