Hi Jeevan, On Wed, Aug 28, 2019 at 10:26 PM Jeevan Chalke < jeevan.cha...@enterprisedb.com> wrote:
> > > On Tue, Aug 27, 2019 at 10:33 PM Robert Haas <robertmh...@gmail.com> > wrote: > >> While reviewing a proposed patch to basebackup.c this morning, I found >> myself a bit underwhelmed by the quality of the code and comments in >> basebackup.c's sendFile(). I believe it's already been pointed out >> that the the retry logic here is not particularly correct, and the >> comments demonstrate a pretty clear lack of understanding of how the >> actual race conditions that are possible here might operate. >> >> However, I then noticed another problem which I think is significantly >> more serious: this code totally ignores the possibility of a failure >> in fread(). It just assumes that if fread() fails to return a >> positive value, it's reached the end of the file, and if that happens, >> it just pads out the rest of the file with zeroes. So it looks to me >> like if fread() encountered, say, an I/O error while trying to read a >> data file, and if that error were on the first data block, then the >> entire contents of that file would be replaced with zero bytes in the >> backup, and no error or warning of any kind would be given to the >> user. If it hit the error later, everything from the point of the >> error onward would just get replaced with zero bytes. To be clear, I >> think it is fine for basebackup.c to fill out the rest of the expected >> length with zeroes if in fact the file has been truncated during the >> backup; recovery should fix it. But recovery is not going to fix data >> that got eaten because EIO was encountered while copying a file. >> > > Per fread(3) manpage, we cannot distinguish between an error or EOF. So to > check for any error we must use ferror() after fread(). Attached patch > which > tests ferror() and throws an error if it returns true. > You are right. This seems the right approach to me. I can see at least couple of places already using ferror() to guard errors of fread() like CopyGetData(), read_backup_label() etc. > However, I think > fread()/ferror() both do not set errno (per some web reference) and thus > throwing a generic error here. I have manually tested it. > If we are interested in getting the errno, then instead of fread(), we can use read(). But, here, in particular, we are not decision making anything depending on errno so I think we should be fine with your current patch. > >> The logic that rereads a block appears to have the opposite problem. >> Here, it will detect an error, but it will also fail in the face of a >> concurrent truncation, which it shouldn't. >> > > For this, I have checked for feof() assuming that when the file gets > truncated > we reach EOF. And if yes, getting out of the loop instead of throwing an > error. > I may be wrong as I couldn't reproduce this issue. > I had a look at the patch and this seems correct to me. Few minor comments: + /* Check fread() error. */ + CHECK_FREAD_ERROR(fp, pathbuf); + The comments above the macro call at both the places are not necessary as your macro name itself is self-explanatory. ---------- + /* + * If file is truncated, then we will hit + * end-of-file error in which case we don't + * want to error out, instead just pad it with + * zeros. + */ + if (feof(fp)) The if block does not do the truncation right away, so I think the comment above can be reworded to explain why we reset cnt? Regards, Jeevan Ladhe