I think there might be a misunderstanding regarding FileChannel and BufferedFileChannel.
BufferedFileChannel is a BK class and has a flush(boolean) method. * flush(false) --> just writes the in-memory byte array to the FileChannel (no fsync/fdatasync involved) * flush(true) --> write to FileChannel and issue forceWrite(false) on the channel (fdatasync) .. I know.. it's a bit confusing On Wed, May 10, 2017 at 5:25 PM Venkateswara Rao Jujjuri <jujj...@gmail.com> wrote: > Yes; My question is what is the impact we are taking by doing force(true) > all the time. > Just to avoid any concern caused by Charan's statement about "java API > for force method - > https://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#force(boolean), > it says the implementation is unspecified and system-dependent. " > > On Wed, May 10, 2017 at 5:09 PM, Sijie Guo <guosi...@gmail.com> wrote: > >> >> >> On Thu, May 11, 2017 at 8:02 AM, Venkateswara Rao Jujjuri < >> jujj...@gmail.com> wrote: >> >>> That is correct. But we are talking about active entrylog file(s) and >>> journal file(s). >>> Do you see lot of active journal files that won't get updated but read >>> for long time? >>> >> >> I am a bit confused about this question here. When talking about >> durability, entry log files and journal files should be same, right? >> >> >> >>> Anyway just trying broaden the discussion and if we have any doubt if >>> java does fdatasync all the time. >>> >>> On Wed, May 10, 2017 at 4:57 PM, Matteo Merli <matteo.me...@gmail.com> >>> wrote: >>> >>>> My understanding is that fsync() forces the IO on the metadata even if >>>> the file size has not changed, for example to update the access timestamp. >>>> >>>> On Wed, May 10, 2017 at 4:47 PM Venkateswara Rao Jujjuri < >>>> jujj...@gmail.com> wrote: >>>> >>>>> Let me try to consolidate the discussion. >>>>> >>>>> 1. We need to flush the metadata whenever we cause metadata changes >>>>> due to data changes. Mostly Append/Filesize change. >>>>> 2. I don't know any operations where we perform "metadata operation" >>>>> only and we needed it to be "persisted" >>>>> Even if we can think/come up with some obscure cases of #2, majority >>>>> of of our use-case falls into #1. >>>>> So why not put force(true) all the time? What cases we may incur >>>>> penalty? If we pass true, then there is no debate on >>>>> implementation/source. >>>>> >>>>> What do you guys say? >>>>> JV >>>>> >>>>> On Wed, May 10, 2017 at 4:42 PM, Sijie Guo <guosi...@gmail.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Thu, May 11, 2017 at 2:28 AM, Charan Reddy G < >>>>>> reddychara...@gmail.com> wrote: >>>>>> >>>>>>> @JV and @Sijie I think Android java source is not the appropriate >>>>>>> one to look into. For FileChannelImpl we need to look into jdk code >>>>>>> >>>>>>> >>>>>>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/sun/nio/ch/FileChannelImpl.java#FileChannelImpl.force%28boolean%29 >>>>>>> >>>>>>> >>>>>>> public void force(boolean metaData) throws IOException { >>>>>>> ensureOpen(); >>>>>>> int rv = -1; >>>>>>> int ti = -1; >>>>>>> try { >>>>>>> begin(); >>>>>>> ti = threads.add(); >>>>>>> if (!isOpen()) >>>>>>> return; >>>>>>> do { >>>>>>> rv = nd.force(fd, metaData); >>>>>>> } while ((rv == IOStatus.INTERRUPTED) && isOpen()); >>>>>>> } finally { >>>>>>> threads.remove(ti); >>>>>>> end(rv > -1); >>>>>>> assert IOStatus.check(rv); >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> here 'nd' is of type FileDispatcherImpl. I tried to look into its >>>>>>> code but since it is native code, we wont be able to read that >>>>>>> >>>>>>> >>>>>>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/sun/nio/ch/FileDispatcherImpl.java#FileDispatcherImpl >>>>>>> >>>>>>> So it is not clear if it is just difference of fdatasync() and >>>>>>> fsync(). Even if we assume that it is the difference of fdatasync() >>>>>>> and fsync(), man pages of fdatasync() says >>>>>>> >>>>>>> "fdatasync() is similar to fsync(), but does not flush modified >>>>>>> metadata unless that metadata is needed in order to allow a subsequent >>>>>>> data >>>>>>> retrieval to be correctly handled. For example, changes to >>>>>>> st_atime or st_mtime (respectively, time of last access and time of >>>>>>> last >>>>>>> modification; see stat(2)) do not require flushing because they are not >>>>>>> necessary for a subsequent data read to be handled correctly. *On >>>>>>> the other hand, a change to the file size (st_size, as made by say >>>>>>> ftruncate(2)), would require a metadata flush.**"* >>>>>>> >>>>>>> >>>>>> fdatasync - "does not flush modified metadata unless that metadata >>>>>> is needed in order to allow a subsequent data retrieval to be correctly >>>>>> handled" >>>>>> >>>>>> In practice, this means that unless the file size changed, the file >>>>>> metadata will not be synced when calling fdatasync. >>>>>> >>>>>> >>>>>>> Also, when we look into java API for force method - >>>>>>> https://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#force(boolean), >>>>>>> it says the implementation is unspecified and system-dependent. So I >>>>>>> don't >>>>>>> think it is safe to make any assumptions here. >>>>>>> >>>>>>> "The metaData parameter can be used to limit the number of I/O >>>>>>> operations that this method is required to perform. Passing false for >>>>>>> this parameter indicates that only updates to the file's content need be >>>>>>> written to storage; passing true indicates that updates to both the >>>>>>> file's content and metadata must be written, which generally requires at >>>>>>> least one more I/O operation. Whether this parameter actually has any >>>>>>> effect is dependent upon the underlying operating system and is >>>>>>> therefore >>>>>>> unspecified. Invoking this method may cause an I/O operation to >>>>>>> occur even if the channel was only opened for reading. Some operating >>>>>>> systems, for example, maintain a last-access time as part of a file's >>>>>>> metadata, and this time is updated whenever the file is read. Whether or >>>>>>> not this is actually done is system-dependent and is therefore >>>>>>> unspecified. >>>>>>> " >>>>>>> >>>>>>> Thanks, >>>>>>> Charan >>>>>>> >>>>>>> On Wed, May 10, 2017 at 4:07 AM, Sijie Guo <guosi...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On May 10, 2017 12:42 PM, "Venkateswara Rao Jujjuri" < >>>>>>>> jujj...@gmail.com> wrote: >>>>>>>> >>>>>>>> Looked at the source and it is fdatasync() >>>>>>>> >>>>>>>> https://android.googlesource.com/platform/libcore/+/2496a68/luni/src/main/java/java/nio/FileChannelImpl.java >>>>>>>> if (metadata) { >>>>>>>> Libcore.os.fsync(fd); >>>>>>>> } else { >>>>>>>> Libcore.os.fdatasync(fd); >>>>>>>> } >>>>>>>> >>>>>>>> So, agreed. fdatasync() fluesh metadata when the file size etc >>>>>>>> changes, and avoids metadata flush if no data related changes but >>>>>>>> but that tells me there may be no situation for us to call >>>>>>>> force(true) in the code. >>>>>>>> But our code calls it with true in some code path, should we change >>>>>>>> everything to false for consistency >>>>>>>> >>>>>>>> >>>>>>>> I don't remember that we called force(true) on file channel. Do you >>>>>>>> mean the buffered channel? >>>>>>>> >>>>>>>> >>>>>>>> JV >>>>>>>> >>>>>>>> On Tue, May 9, 2017 at 4:40 PM, Sijie Guo <guosi...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> We don't force flush metadata in bookies. Because: >>>>>>>>> >>>>>>>>> - flush metadata requires more I/O operations. >>>>>>>>> - flush metadata usually updates file metadata information (e.g. >>>>>>>>> access permissions, last modification/access time). >>>>>>>>> >>>>>>>>> The difference of force(false) and force(true) is the difference >>>>>>>>> between fdatasync() and fsync(). The former one update last >>>>>>>>> modification >>>>>>>>> time, while the latter one doesn't. We don't care about the metadata >>>>>>>>> information like access/modification time, that's the reason we only >>>>>>>>> called >>>>>>>>> force(false). >>>>>>>>> >>>>>>>>> - Sijie >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sat, May 6, 2017 at 9:10 AM, Charan Reddy G < >>>>>>>>> reddychara...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> By metadata I mean metadata of file >>>>>>>>>> >>>>>>>>>> in BufferedChannel.java >>>>>>>>>> >>>>>>>>>> public void flush(boolean shouldForceWrite) throws >>>>>>>>>> IOException { >>>>>>>>>> synchronized(this) { >>>>>>>>>> flushInternal(); >>>>>>>>>> } >>>>>>>>>> if (shouldForceWrite) { >>>>>>>>>> forceWrite(false); <------------- false here >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> public long forceWrite(boolean forceMetadata) throws >>>>>>>>>> IOException { >>>>>>>>>> // This is the point up to which we had flushed to the >>>>>>>>>> file system page cache >>>>>>>>>> // before issuing this force write hence is guaranteed to >>>>>>>>>> be made durable by >>>>>>>>>> // the force write, any flush that happens after this may >>>>>>>>>> or may >>>>>>>>>> // not be flushed >>>>>>>>>> long positionForceWrite = writeBufferStartPosition.get(); >>>>>>>>>> fileChannel.force(forceMetadata); <--------------- >>>>>>>>>> forcemetadata false >>>>>>>>>> return positionForceWrite; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> On Fri, May 5, 2017 at 5:56 PM, Charan Reddy G < >>>>>>>>>> reddychara...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hey, >>>>>>>>>>> >>>>>>>>>>> In GarbageCollectorThread.doCompactEntryLogs { >>>>>>>>>>> >>>>>>>>>>> ... >>>>>>>>>>> >>>>>>>>>>> try { >>>>>>>>>>> compactEntryLog(scannerFactory, meta); >>>>>>>>>>> scannerFactory.flush(); >>>>>>>>>>> <--------------- this will eventually call >>>>>>>>>>> entrylogger.flushCurrentLog and >>>>>>>>>>> it force writes the content of the BufferedChannel but not the >>>>>>>>>>> metadata of >>>>>>>>>>> the file >>>>>>>>>>> >>>>>>>>>>> LOG.info("Removing entry log {} after >>>>>>>>>>> compaction", meta.getEntryLogId()); >>>>>>>>>>> removeEntryLog(meta.getEntryLogId()); >>>>>>>>>>> <----------- this will delete the compacted entrylog >>>>>>>>>>> >>>>>>>>>>> } >>>>>>>>>>> ... >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> in doCompactEntryLogs, we first write the non-deleted entries of >>>>>>>>>>> the entryLog which is getting compacted to the currentLog in >>>>>>>>>>> entryLogger, >>>>>>>>>>> then we flush the entrylogger before deleting the compacted >>>>>>>>>>> entrylog. But >>>>>>>>>>> when currentLog is flushed by the entrylogger, it flushes only the >>>>>>>>>>> content >>>>>>>>>>> of the file but not the metadata. After flush is completed the >>>>>>>>>>> compacted >>>>>>>>>>> entrylog is deleted. It is not ok to not to force flush the >>>>>>>>>>> metadata of the >>>>>>>>>>> currentLog for the persisted (checkpointed) data. The filesystem >>>>>>>>>>> behaviour >>>>>>>>>>> is unexpected in this case and there is possibility of data loss if >>>>>>>>>>> the >>>>>>>>>>> Bookie crashes before closing that logchannel. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Charan >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Jvrao >>>>>>>> --- >>>>>>>> First they ignore you, then they laugh at you, then they fight you, >>>>>>>> then you win. - Mahatma Gandhi >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Jvrao >>>>> --- >>>>> First they ignore you, then they laugh at you, then they fight you, >>>>> then you win. - Mahatma Gandhi >>>>> >>>>> >>>>> >>> >>> >>> -- >>> Jvrao >>> --- >>> First they ignore you, then they laugh at you, then they fight you, then >>> you win. - Mahatma Gandhi >>> >>> >>> >> > > > -- > Jvrao > --- > First they ignore you, then they laugh at you, then they fight you, then > you win. - Mahatma Gandhi > > >