[ https://issues.apache.org/jira/browse/KAFKA-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061843#comment-15061843 ]
Arkadiusz Firus commented on KAFKA-2997:
----------------------------------------

[~granthenke] We are currently considering Kafka as a message backbone and a general log (source of truth) accessible to all systems. Because we work with financial data, we need much stronger guarantees than in-memory replication provides.

[~fpj] I want a guarantee that when the client call returns, the message has been persisted to disk. On the other hand, I do not want to invoke a flush after every message, because that has a very negative impact on performance. VoltDB (https://voltdb.com/) had a similar problem: they wanted both persistence to disk and high performance. Their solution was to gather several writes (from different sessions) into one batch and then invoke fsync once for the whole batch. I want to use this approach in Kafka: a thread that wants to write data waits a few milliseconds, because during that time other threads may also want to write data to the same partition. Running the flushing thread in a loop (instead of on a timer) could also be a good solution; I have to think about this. Thank you very much for the link.

> Synchronous write to disk
> -------------------------
>
>                 Key: KAFKA-2997
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2997
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.9.0.0
>            Reporter: Arkadiusz Firus
>            Priority: Minor
>              Labels: features, patch
>
> Hi All,
> I am currently working on a mechanism that allows efficient synchronous writes to the file system. My idea is to gather several write requests for one partition and then call fsync.
> After reading the code, I found that the best place to do this is the kafka.log.Log.append method. Currently, at the end of the method (line 368), there is a check whether the number of unflushed messages is greater than the flush interval (a configuration parameter).
> I am thinking of extending this condition.
> I want to add an additional boolean configuration parameter ("sync write" or something similar). If this parameter is set to true, then at the end of this method the thread should block on a lock. In addition, there will be another timer thread (one per partition) that is invoked every 10 ms (a configuration parameter). On each invocation, this thread will call the flush method and then release all blocked threads.
> I am writing here because I would like to know your opinion about this approach. Do you think it is a good one, or does someone have a better (more permanent) solution? I would also like to know whether this approach is consistent with the general Kafka architecture.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
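The mechanism discussed in this issue (each appending thread blocks after its write, and a per-partition timer thread periodically fsyncs and then releases every waiting thread) is a form of group commit, the same idea as the VoltDB batching mentioned in the comment. Below is a minimal Java sketch, assuming a single file stands in for one partition's log. The class `GroupCommitLog`, its methods `append`, `flushCount` and `close`, and the flush-interval argument are hypothetical names chosen for illustration; they are not Kafka's `kafka.log.Log` API or any existing Kafka configuration parameter. A `ScheduledExecutorService` plays the role of the timer thread; a dedicated thread flushing in a loop would use the same waiting logic.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical group-commit log for a single partition (NOT Kafka's kafka.log.Log API).
// append() returns only after its record has been fsynced, but one fsync is shared by
// every append that arrived during the same flush interval.
final class GroupCommitLog implements AutoCloseable {
    private final FileChannel channel;
    private final ScheduledExecutorService flusher; // plays the role of the per-partition timer thread
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition flushed = lock.newCondition();
    private long appendedSeq = 0;  // records written to the channel so far
    private long flushedSeq = 0;   // records known to be durable on disk
    private long flushes = 0;      // force() calls actually issued
    private IOException failure;   // set if a flush fails, so waiters are not stuck forever

    GroupCommitLog(Path path, long flushIntervalMs) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        flusher = Executors.newSingleThreadScheduledExecutor();
        flusher.scheduleWithFixedDelay(this::flushOnce, flushIntervalMs, flushIntervalMs,
                TimeUnit.MILLISECONDS);
    }

    // Writes the record, then blocks the calling thread until a flush covering it completes.
    void append(byte[] record) throws IOException, InterruptedException {
        lock.lock();
        try {
            if (failure != null) throw new IOException("log failed earlier", failure);
            ByteBuffer buf = ByteBuffer.wrap(record);
            while (buf.hasRemaining()) channel.write(buf);
            long mySeq = ++appendedSeq;
            // await() releases the lock, so other threads can append while this one waits.
            while (flushedSeq < mySeq && failure == null) flushed.await();
            if (flushedSeq < mySeq) throw new IOException("flush failed", failure);
        } finally {
            lock.unlock();
        }
    }

    // One timer tick: fsync everything appended so far, then wake all waiting appenders.
    private void flushOnce() {
        long target;
        lock.lock();
        try {
            if (appendedSeq == flushedSeq || failure != null) return; // nothing new to flush
            target = appendedSeq; // all writes up to target have returned from channel.write
        } finally {
            lock.unlock();
        }
        IOException error = null;
        try {
            channel.force(true); // fsync outside the lock; true also persists metadata such as file length
        } catch (IOException e) {
            error = e;
        }
        lock.lock();
        try {
            if (error == null) {
                flushedSeq = target;
                flushes++;
            } else {
                failure = error;
            }
            flushed.signalAll(); // release every thread whose record is now durable (or failed)
        } finally {
            lock.unlock();
        }
    }

    long flushCount() {
        lock.lock();
        try {
            return flushes;
        } finally {
            lock.unlock();
        }
    }

    @Override
    public void close() throws IOException, InterruptedException {
        flusher.shutdown();
        flusher.awaitTermination(1, TimeUnit.SECONDS);
        flushOnce(); // final flush for anything appended after the last tick
        channel.close();
    }
}
```

In this sketch the number of fsync calls is bounded by the number of timer ticks that find new data rather than by the number of messages, at the cost of up to roughly one flush interval of extra latency per append. Calling `force` outside the lock lets other threads keep appending while the disk sync is in progress; their records are simply picked up by the next tick.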