Hi, Jay, Thank you!
The Kafka document shows “Kafka should run well on any unix system". I assume it includes the major two Unix versions, IBM AIX and HP-UX. Right? 1. Unfortunately, we aims at supporting all the platforms, Linux, Unix, Windows and especially z/OS. I know z/OS is not easy to support. 2. Fsync per message is very expensive and Fsync per batch will break the transaction atomicity. We are looking for transaction-level fsync, which is more efficient. Then, our producers can easily combine multiple small transactions into a single bigger batch transaction. I hope the implementation of the ongoing Kafka feature “Transactional Messaging in Kafka” already considered all these issues, although the following link does not mention it: https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka 3. After a quick reading of the design of “Transactional Messaging in Kafka”, I have a doubt if it can scale especially when most transactions are very short (e.g., containing a single message). - In many use cases, we do not need global transactions, which might be too expensive. The partition-specific transaction granularity might be fine. - To reduce the overhead of transaction-level fsync + network or address-space roundtrips, Kafka might need two extra parameters for batching the small transactions into a single one. The performance benefits of batching are huge, as shown in the MRI (multi-row insert) feature in Oracle and DB2 z/OS. I believe transactional messaging is a critical feature. The design document is not very clear. Do you have more materials or links about it? Thanks, Xiao Li On Mar 7, 2015, at 9:33 AM, Jay Kreps <jay.kr...@gmail.com> wrote: > Xiao, > > FileChannel.force is fsync on unix. > > To force fsync on every message: > log.flush.interval.messages=1 > > You are looking at the time based fsync, which, naturally, as you say, is > time-based. > > -Jay > > On Fri, Mar 6, 2015 at 11:35 PM, Xiao <lixiao1...@gmail.com> wrote: > >> Hi, Jay, >> >> Thank you for your answer. >> >> Sorry, I still do not understand your meaning. >> >> I guess the two parameters you mentioned are log.flush.interval and >> log.default.flush.interval.ms. However, these two parameters only control >> when Kafka issues a flush (i.e., calling FileChannel.force()). >> >> Fsync (fileOutputStream.getFD().sync()) is controlled by another parameter >> log.default.flush.scheduler.interval.ms. >> >> scheduler.schedule("kafka-recovery-point-checkpoint", >> checkpointRecoveryPointOffsets, >> delay = InitialTaskDelayMs, >> period = flushCheckpointMs, >> TimeUnit.MILLISECONDS) >> >> This thread is only time-controlled. It does not check the number of >> messages. >> >> Thank you, >> >> Xiao Li >> >> >> On Mar 5, 2015, at 11:59 AM, Jay Kreps <jay.kr...@gmail.com> wrote: >> >>> Hey Xiao, >>> >>> That's not quite right. Fsync is controlled by either a time based >> criteria >>> (flush every 30 seconds) or a number of messages criteria. So if you set >>> the number of messages to 1 the flush is synchronous with the write, >> which >>> I think is what you are looking for. >>> >>> -Jay >> >>