On Thu, May 11, 2017 at 7:47 AM, Venkateswara Rao Jujjuri <jujj...@gmail.com
> wrote:

> Let me try to consolidate the discussion.
>
> 1. We need to flush the metadata whenever we cause metadata changes due to
> data changes. Mostly Append/Filesize change.
>

My take on fdatasync - it will sync metadata when filesize is changed. It
doesn't sync metadata (for changes like modification time).


> 2. I don't know any operations where we perform "metadata operation" only
> and we needed it to be "persisted"
>
Even if we can think/come up with some obscure cases of #2, majority of of
> our use-case falls into #1.
> So why not put force(true) all the time? What cases we may incur penalty?
> If we pass true, then there is no debate on implementation/source.
>

As what matteo pointed out above, in journal we preallocate file, so
subsequent fdatasync won't require flushing metadata as the file size has
been changed on pre-allocation.

In entry logger file, we didn't do pre-allocation (we probably should do).

Passing true will enforce unnecessary metadata syncs when journal file size
isn't changed.


>
> What do you guys say?
> JV
>
> On Wed, May 10, 2017 at 4:42 PM, Sijie Guo <guosi...@gmail.com> wrote:
>
>>
>>
>> On Thu, May 11, 2017 at 2:28 AM, Charan Reddy G <reddychara...@gmail.com>
>> wrote:
>>
>>>  @JV and @Sijie I think Android java source is not the appropriate one
>>> to look into. For FileChannelImpl we need to look into jdk code
>>>
>>> http://grepcode.com/file/repository.grepcode.com/java/root/j
>>> dk/openjdk/8u40-b25/sun/nio/ch/FileChannelImpl.java#FileChan
>>> nelImpl.force%28boolean%29
>>>
>>>
>>>     public void force(boolean metaData) throws IOException {
>>>         ensureOpen();
>>>         int rv = -1;
>>>         int ti = -1;
>>>         try {
>>>             begin();
>>>             ti = threads.add();
>>>             if (!isOpen())
>>>                 return;
>>>             do {
>>>                 rv = nd.force(fd, metaData);
>>>             } while ((rv == IOStatus.INTERRUPTED) && isOpen());
>>>         } finally {
>>>             threads.remove(ti);
>>>             end(rv > -1);
>>>             assert IOStatus.check(rv);
>>>         }
>>>     }
>>>
>>> here 'nd' is of type FileDispatcherImpl. I tried to look into its code
>>> but since it is native code, we wont be able to read that
>>>
>>> http://grepcode.com/file/repository.grepcode.com/java/root/j
>>> dk/openjdk/8u40-b25/sun/nio/ch/FileDispatcherImpl.java#FileD
>>> ispatcherImpl
>>>
>>> So it is not clear if it is just difference of fdatasync() and fsync().
>>> Even if we assume that it is the difference of fdatasync() and fsync(),
>>> man pages of fdatasync() says
>>>
>>> "fdatasync() is similar to fsync(), but does not flush modified metadata
>>> unless that metadata is needed in order to allow a subsequent data
>>> retrieval to be correctly handled.   For  example, changes  to
>>>  st_atime or st_mtime (respectively, time of last access and time of last
>>> modification; see stat(2)) do not require flushing because they are not
>>> necessary for a subsequent data read to be handled correctly.  *On the
>>> other hand, a change to the file size (st_size, as made by say
>>> ftruncate(2)), would require a metadata flush.**"*
>>>
>>>
>> fdatasync - "does not flush modified metadata unless that metadata is
>> needed in order to allow a subsequent data retrieval to be correctly
>> handled"
>>
>> In practice, this means that unless the file size changed, the file
>> metadata will not be synced when calling fdatasync.
>>
>>
>>> Also, when we look into java API for force method -
>>> https://docs.oracle.com/javase/7/docs/api/java/nio/channel
>>> s/FileChannel.html#force(boolean), it says the implementation is
>>> unspecified and system-dependent. So I don't think it is safe to make any
>>> assumptions here.
>>>
>>> "The metaData parameter can be used to limit the number of I/O
>>> operations that this method is required to perform. Passing false for
>>> this parameter indicates that only updates to the file's content need be
>>> written to storage; passing true indicates that updates to both the
>>> file's content and metadata must be written, which generally requires at
>>> least one more I/O operation. Whether this parameter actually has any
>>> effect is dependent upon the underlying operating system and is therefore
>>> unspecified. Invoking this method may cause an I/O operation to occur
>>> even if the channel was only opened for reading. Some operating systems,
>>> for example, maintain a last-access time as part of a file's metadata, and
>>> this time is updated whenever the file is read. Whether or not this is
>>> actually done is system-dependent and is therefore unspecified."
>>>
>>> Thanks,
>>> Charan
>>>
>>> On Wed, May 10, 2017 at 4:07 AM, Sijie Guo <guosi...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On May 10, 2017 12:42 PM, "Venkateswara Rao Jujjuri" <jujj...@gmail.com>
>>>> wrote:
>>>>
>>>> Looked at the source and it is fdatasync()
>>>> https://android.googlesource.com/platform/libcore/+/2496a68/
>>>> luni/src/main/java/java/nio/FileChannelImpl.java
>>>> if (metadata) {
>>>>     Libcore.os.fsync(fd);
>>>> } else {
>>>>      Libcore.os.fdatasync(fd);
>>>> }
>>>>
>>>> So, agreed. fdatasync() fluesh metadata when the file size etc changes,
>>>> and avoids metadata flush if no data related changes but
>>>> but that tells me there may be no situation for us to call force(true)
>>>> in the code.
>>>> But our code calls it with true in some code path, should we change
>>>> everything to false for consistency
>>>>
>>>>
>>>> I don't remember that we called force(true) on file channel. Do you
>>>> mean the buffered channel?
>>>>
>>>>
>>>> JV
>>>>
>>>> On Tue, May 9, 2017 at 4:40 PM, Sijie Guo <guosi...@gmail.com> wrote:
>>>>
>>>>> We don't force flush metadata in bookies. Because:
>>>>>
>>>>> - flush metadata requires more I/O operations.
>>>>> - flush metadata usually updates file metadata information (e.g.
>>>>> access permissions, last modification/access time).
>>>>>
>>>>> The difference of force(false) and force(true) is the difference
>>>>> between fdatasync() and fsync(). The former one update last modification
>>>>> time, while the latter one doesn't. We don't care about the metadata
>>>>> information like access/modification time, that's the reason we only 
>>>>> called
>>>>> force(false).
>>>>>
>>>>> - Sijie
>>>>>
>>>>>
>>>>> On Sat, May 6, 2017 at 9:10 AM, Charan Reddy G <
>>>>> reddychara...@gmail.com> wrote:
>>>>>
>>>>>> By metadata I mean metadata of file
>>>>>>
>>>>>> in BufferedChannel.java
>>>>>>
>>>>>>     public void flush(boolean shouldForceWrite) throws IOException {
>>>>>>         synchronized(this) {
>>>>>>             flushInternal();
>>>>>>         }
>>>>>>         if (shouldForceWrite) {
>>>>>>             forceWrite(false);          <------------- false here
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>     public long forceWrite(boolean forceMetadata) throws IOException {
>>>>>>         // This is the point up to which we had flushed to the file
>>>>>> system page cache
>>>>>>         // before issuing this force write hence is guaranteed to be
>>>>>> made durable by
>>>>>>         // the force write, any flush that happens after this may or
>>>>>> may
>>>>>>         // not be flushed
>>>>>>         long positionForceWrite = writeBufferStartPosition.get();
>>>>>>         fileChannel.force(forceMetadata);    <---------------
>>>>>> forcemetadata false
>>>>>>         return positionForceWrite;
>>>>>>     }
>>>>>>
>>>>>> On Fri, May 5, 2017 at 5:56 PM, Charan Reddy G <
>>>>>> reddychara...@gmail.com> wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> In GarbageCollectorThread.doCompactEntryLogs {
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>             try {
>>>>>>>                 compactEntryLog(scannerFactory, meta);
>>>>>>>                 scannerFactory.flush();            <---------------
>>>>>>> this will eventually call entrylogger.flushCurrentLog and it force 
>>>>>>> writes
>>>>>>> the content of the BufferedChannel but not the metadata of the file
>>>>>>>
>>>>>>>                 LOG.info("Removing entry log {} after compaction",
>>>>>>> meta.getEntryLogId());
>>>>>>>                 removeEntryLog(meta.getEntryLogId());  <-----------
>>>>>>> this will delete the compacted entrylog
>>>>>>>
>>>>>>>             }
>>>>>>> ...
>>>>>>>
>>>>>>>
>>>>>>> in doCompactEntryLogs, we first write the non-deleted entries of the
>>>>>>> entryLog which is getting compacted to the currentLog in entryLogger, 
>>>>>>> then
>>>>>>> we flush the entrylogger before deleting the compacted entrylog. But 
>>>>>>> when
>>>>>>> currentLog is flushed by the entrylogger, it flushes only the content of
>>>>>>> the file but not the metadata. After flush is completed the compacted
>>>>>>> entrylog is deleted. It is not ok to not to force flush the metadata of 
>>>>>>> the
>>>>>>> currentLog for the persisted (checkpointed) data. The filesystem 
>>>>>>> behaviour
>>>>>>> is unexpected in this case and there is possibility of data loss if the
>>>>>>> Bookie crashes before closing that logchannel.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Charan
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Jvrao
>>>> ---
>>>> First they ignore you, then they laugh at you, then they fight you,
>>>> then you win. - Mahatma Gandhi
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>
>
>

Reply via email to