On Thu, May 11, 2017 at 7:47 AM, Venkateswara Rao Jujjuri <jujj...@gmail.com > wrote:
> Let me try to consolidate the discussion. > > 1. We need to flush the metadata whenever we cause metadata changes due to > data changes. Mostly Append/Filesize change. > My take on fdatasync - it will sync metadata when filesize is changed. It doesn't sync metadata (for changes like modification time). > 2. I don't know any operations where we perform "metadata operation" only > and we needed it to be "persisted" > Even if we can think/come up with some obscure cases of #2, majority of of > our use-case falls into #1. > So why not put force(true) all the time? What cases we may incur penalty? > If we pass true, then there is no debate on implementation/source. > As what matteo pointed out above, in journal we preallocate file, so subsequent fdatasync won't require flushing metadata as the file size has been changed on pre-allocation. In entry logger file, we didn't do pre-allocation (we probably should do). Passing true will enforce unnecessary metadata syncs when journal file size isn't changed. > > What do you guys say? > JV > > On Wed, May 10, 2017 at 4:42 PM, Sijie Guo <guosi...@gmail.com> wrote: > >> >> >> On Thu, May 11, 2017 at 2:28 AM, Charan Reddy G <reddychara...@gmail.com> >> wrote: >> >>> @JV and @Sijie I think Android java source is not the appropriate one >>> to look into. For FileChannelImpl we need to look into jdk code >>> >>> http://grepcode.com/file/repository.grepcode.com/java/root/j >>> dk/openjdk/8u40-b25/sun/nio/ch/FileChannelImpl.java#FileChan >>> nelImpl.force%28boolean%29 >>> >>> >>> public void force(boolean metaData) throws IOException { >>> ensureOpen(); >>> int rv = -1; >>> int ti = -1; >>> try { >>> begin(); >>> ti = threads.add(); >>> if (!isOpen()) >>> return; >>> do { >>> rv = nd.force(fd, metaData); >>> } while ((rv == IOStatus.INTERRUPTED) && isOpen()); >>> } finally { >>> threads.remove(ti); >>> end(rv > -1); >>> assert IOStatus.check(rv); >>> } >>> } >>> >>> here 'nd' is of type FileDispatcherImpl. I tried to look into its code >>> but since it is native code, we wont be able to read that >>> >>> http://grepcode.com/file/repository.grepcode.com/java/root/j >>> dk/openjdk/8u40-b25/sun/nio/ch/FileDispatcherImpl.java#FileD >>> ispatcherImpl >>> >>> So it is not clear if it is just difference of fdatasync() and fsync(). >>> Even if we assume that it is the difference of fdatasync() and fsync(), >>> man pages of fdatasync() says >>> >>> "fdatasync() is similar to fsync(), but does not flush modified metadata >>> unless that metadata is needed in order to allow a subsequent data >>> retrieval to be correctly handled. For example, changes to >>> st_atime or st_mtime (respectively, time of last access and time of last >>> modification; see stat(2)) do not require flushing because they are not >>> necessary for a subsequent data read to be handled correctly. *On the >>> other hand, a change to the file size (st_size, as made by say >>> ftruncate(2)), would require a metadata flush.**"* >>> >>> >> fdatasync - "does not flush modified metadata unless that metadata is >> needed in order to allow a subsequent data retrieval to be correctly >> handled" >> >> In practice, this means that unless the file size changed, the file >> metadata will not be synced when calling fdatasync. >> >> >>> Also, when we look into java API for force method - >>> https://docs.oracle.com/javase/7/docs/api/java/nio/channel >>> s/FileChannel.html#force(boolean), it says the implementation is >>> unspecified and system-dependent. So I don't think it is safe to make any >>> assumptions here. >>> >>> "The metaData parameter can be used to limit the number of I/O >>> operations that this method is required to perform. Passing false for >>> this parameter indicates that only updates to the file's content need be >>> written to storage; passing true indicates that updates to both the >>> file's content and metadata must be written, which generally requires at >>> least one more I/O operation. Whether this parameter actually has any >>> effect is dependent upon the underlying operating system and is therefore >>> unspecified. Invoking this method may cause an I/O operation to occur >>> even if the channel was only opened for reading. Some operating systems, >>> for example, maintain a last-access time as part of a file's metadata, and >>> this time is updated whenever the file is read. Whether or not this is >>> actually done is system-dependent and is therefore unspecified." >>> >>> Thanks, >>> Charan >>> >>> On Wed, May 10, 2017 at 4:07 AM, Sijie Guo <guosi...@gmail.com> wrote: >>> >>>> >>>> >>>> On May 10, 2017 12:42 PM, "Venkateswara Rao Jujjuri" <jujj...@gmail.com> >>>> wrote: >>>> >>>> Looked at the source and it is fdatasync() >>>> https://android.googlesource.com/platform/libcore/+/2496a68/ >>>> luni/src/main/java/java/nio/FileChannelImpl.java >>>> if (metadata) { >>>> Libcore.os.fsync(fd); >>>> } else { >>>> Libcore.os.fdatasync(fd); >>>> } >>>> >>>> So, agreed. fdatasync() fluesh metadata when the file size etc changes, >>>> and avoids metadata flush if no data related changes but >>>> but that tells me there may be no situation for us to call force(true) >>>> in the code. >>>> But our code calls it with true in some code path, should we change >>>> everything to false for consistency >>>> >>>> >>>> I don't remember that we called force(true) on file channel. Do you >>>> mean the buffered channel? >>>> >>>> >>>> JV >>>> >>>> On Tue, May 9, 2017 at 4:40 PM, Sijie Guo <guosi...@gmail.com> wrote: >>>> >>>>> We don't force flush metadata in bookies. Because: >>>>> >>>>> - flush metadata requires more I/O operations. >>>>> - flush metadata usually updates file metadata information (e.g. >>>>> access permissions, last modification/access time). >>>>> >>>>> The difference of force(false) and force(true) is the difference >>>>> between fdatasync() and fsync(). The former one update last modification >>>>> time, while the latter one doesn't. We don't care about the metadata >>>>> information like access/modification time, that's the reason we only >>>>> called >>>>> force(false). >>>>> >>>>> - Sijie >>>>> >>>>> >>>>> On Sat, May 6, 2017 at 9:10 AM, Charan Reddy G < >>>>> reddychara...@gmail.com> wrote: >>>>> >>>>>> By metadata I mean metadata of file >>>>>> >>>>>> in BufferedChannel.java >>>>>> >>>>>> public void flush(boolean shouldForceWrite) throws IOException { >>>>>> synchronized(this) { >>>>>> flushInternal(); >>>>>> } >>>>>> if (shouldForceWrite) { >>>>>> forceWrite(false); <------------- false here >>>>>> } >>>>>> } >>>>>> >>>>>> public long forceWrite(boolean forceMetadata) throws IOException { >>>>>> // This is the point up to which we had flushed to the file >>>>>> system page cache >>>>>> // before issuing this force write hence is guaranteed to be >>>>>> made durable by >>>>>> // the force write, any flush that happens after this may or >>>>>> may >>>>>> // not be flushed >>>>>> long positionForceWrite = writeBufferStartPosition.get(); >>>>>> fileChannel.force(forceMetadata); <--------------- >>>>>> forcemetadata false >>>>>> return positionForceWrite; >>>>>> } >>>>>> >>>>>> On Fri, May 5, 2017 at 5:56 PM, Charan Reddy G < >>>>>> reddychara...@gmail.com> wrote: >>>>>> >>>>>>> Hey, >>>>>>> >>>>>>> In GarbageCollectorThread.doCompactEntryLogs { >>>>>>> >>>>>>> ... >>>>>>> >>>>>>> try { >>>>>>> compactEntryLog(scannerFactory, meta); >>>>>>> scannerFactory.flush(); <--------------- >>>>>>> this will eventually call entrylogger.flushCurrentLog and it force >>>>>>> writes >>>>>>> the content of the BufferedChannel but not the metadata of the file >>>>>>> >>>>>>> LOG.info("Removing entry log {} after compaction", >>>>>>> meta.getEntryLogId()); >>>>>>> removeEntryLog(meta.getEntryLogId()); <----------- >>>>>>> this will delete the compacted entrylog >>>>>>> >>>>>>> } >>>>>>> ... >>>>>>> >>>>>>> >>>>>>> in doCompactEntryLogs, we first write the non-deleted entries of the >>>>>>> entryLog which is getting compacted to the currentLog in entryLogger, >>>>>>> then >>>>>>> we flush the entrylogger before deleting the compacted entrylog. But >>>>>>> when >>>>>>> currentLog is flushed by the entrylogger, it flushes only the content of >>>>>>> the file but not the metadata. After flush is completed the compacted >>>>>>> entrylog is deleted. It is not ok to not to force flush the metadata of >>>>>>> the >>>>>>> currentLog for the persisted (checkpointed) data. The filesystem >>>>>>> behaviour >>>>>>> is unexpected in this case and there is possibility of data loss if the >>>>>>> Bookie crashes before closing that logchannel. >>>>>>> >>>>>>> Thanks, >>>>>>> Charan >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> Jvrao >>>> --- >>>> First they ignore you, then they laugh at you, then they fight you, >>>> then you win. - Mahatma Gandhi >>>> >>>> >>>> >>>> >>> >> > > > -- > Jvrao > --- > First they ignore you, then they laugh at you, then they fight you, then > you win. - Mahatma Gandhi > > >