From: Pádraig Brady <p...@draigbrady.com>
Subject: Re: bug#73784: [PATCH] cp: new option --nocache-source
Date: Sun, 13 Oct 2024 15:59:27 +0100

> On 13/10/2024 05:56, Masatake YAMATO wrote:
>> When copying files, the system data cache is consumed for both the
>> source and destination files. In
>> scenarios such as creating backup files for old, unused files, it is
>> clear to users that these files will not be needed in the near
>> future. In such cases, retaining the data for these files in the cache
>> constitutes a waste of computer resources, especially when running
>> applications that require significant memory in the foreground.
>> With the new option, users will have the ability to request the
>> discarding of the system data cache, thereby avoiding the unwanted
>> swapping out of data from foreground processes.
>> I evaluated cache consumption using a script called
>> run.bash. Initially, run.bash creates many small files, each 8 KB in
>> size. It then copies these files using the cp command, both with and
>> without the specified option. Finally, it reports the difference in
>> the total size of the caches before and after the copying process.
>> run.bash:
>>      #!/bin/bash
>>      CP=$1
>>      shift
>>      [[ -e "$CP" ]] || {
>>      echo "no file found: $CP" 1>&2
>>      exit 1
>>      }
>>      N=8
>>      S=drop-src
>>      D=${HOME}/drop-dst
>>      mkdir -p $S
>>      mkdir -p $D
>>      start=
>>      end=
>>      print_cached()
>>      {
>>      grep ^Cached: /proc/meminfo
>>      }
>>      start()
>>      {
>>      start=$(print_cached | awk '{print $2}')
>>      }
>>      end()
>>      {
>>      end=$(print_cached | awk '{print $2}')
>>      }
>>      report()
>>      {
>>      echo -n "delta[$N:$1/$2]: "
>>      expr "$end" - "$start"
>>      }
>>      cleanup()
>>      {
>>      local i
>>      local j
>>      for ((i = 0; i < 10; i++)); do
>>          for ((j = 0; j < 10; j++)); do
>>              rm -f $S/F-${i}${j}*
>>              rm -f $D/F-${i}${j}*
>>          done
>>      done
>>      rm -f $S/F-*
>>      rm -f $D/F-*
>>      }
>>      prep()
>>      {
>>      local i
>>      for ((i = 0; i < 1024 * $N; i++ )); do
>>          if ! dd if=/dev/zero of=$S/F-$i bs=4096 count=2 \
>>                  status=none; then
>>              echo "failed in dd of=$S/F-$i" 1>&2
>>              exit 1
>>          fi
>>      done
>>      sync
>>      }
>>      run_cp()
>>      {
>>      start
>>      local i
>>      time for ((i = 0; i < 1024 * $N; i++ )); do
>>          if ! "${CP}" "$@" "$S/F-$i" "$D/F-$i"; then
>>              echo "failed in cp " "$@" "$S/F-$i" " $D/F-$i" 1>&2
>>              exit 1
>>          fi
>>      done
>>      end
>>      report "$1" $2
>>      }
>>      cleanup
>>      sync
>>      prep
>>      run_cp "$@"
>> running:
>>      ~/coreutils/nocache$  ./run.bash ../src/cp
>>      real    0m16.051s
>>      user    0m4.249s
>>      sys     0m12.437s
>>      delta[8:/]: 65548
>>      ~/coreutils/nocache$  ./run.bash ../src/cp --nocache-source
>>      real    0m17.109s
>>      user    0m4.492s
>>      sys     0m13.317s
>>      delta[8:--nocache-source/]: 620
>> The --nocache-source option reduces the cache growth during the copy
>> from 65548 kB to 620 kB in this run.
> 
> Thanks for the patch.
> I have some reservations/notes though...
> 
> There is nothing particularly special about cp, that it might need
> this option.
> I.e. it would be nice to be able to wrap any program so that it
> streamed data through the cache, rather than aggressively cached.
> I'm not sure how to do that, but also I'd be reluctant to start
> adding such options to individual commands though.
> Perhaps Linux' open() may gain an O_STREAM flag in future that might
> be more generally applied with a wrapper or something.

I found an interesting article: 
https://www.phoronix.com/news/Uncached-Buffered-IO-Linux-6.14

It seems that the RWF_DONTCACHE flag of pwritev2 and preadv2
implements what we need.

When Linux 6.14 is released, I will rewrite my patch based on
RWF_DONTCACHE.

Masatake YAMATO

> For single (large) files, one already has this functionality in dd.
> 
> On the write side, you'd also have to worry about syncing, to make the
> drop cache advisory effective, and this could impact performance.
> 
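
For reference, the dd approach for a single file might look roughly like
the sketch below. It assumes GNU dd with the nocache flag and a
filesystem that honours the cache-drop advisory. The scratch files come
from mktemp and are purely illustrative, not part of the patch or of
run.bash.

```shell
#!/bin/sh
# Sketch only: hypothetical scratch files stand in for a real source
# and destination.
src=$(mktemp)
dst=$(mktemp)
trap 'rm -f "$src" "$dst"' EXIT

# Create a 4 MiB sample source file.
dd if=/dev/zero of="$src" bs=1M count=4 status=none

# Stream the data, asking the kernel to drop cached pages on both the
# read and the write side as the copy proceeds.
dd if="$src" of="$dst" bs=1M iflag=nocache oflag=nocache status=none

# Write side: flush the data to storage first (fdatasync) so that the
# drop-cache advice can take effect for the whole output file.
dd of="$dst" oflag=nocache conv=notrunc,fdatasync count=0 status=none

cmp -s "$src" "$dst" && echo "copy ok"
```

The last dd call is where the syncing cost mentioned above shows up:
dirty pages cannot be discarded until they have been written back.
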
> Might this drop caches for already cached files,
> which cp may just happen to be copying,
> thus potentially impacting performance for other programs.
> 
> If reflinking we probably would not want to do this operation,
> since we're not reading the source.
> 
> thanks,
> Pádraig