From: Pádraig Brady <p...@draigbrady.com>
Subject: Re: bug#73784: [PATCH] cp: new option --nocache-source
Date: Sun, 13 Oct 2024 15:59:27 +0100
> On 13/10/2024 05:56, Masatake YAMATO wrote:
>> When copying files, the system data cache is consumed for both the
>> source and destination files. In scenarios such as creating backup
>> files for old, unused files, it is clear to users that these files
>> will not be needed in the near future. In such cases, retaining the
>> data for these files in the cache constitutes a waste of computer
>> resources, especially when running applications that require
>> significant memory in the foreground.
>>
>> With the new option, users will have the ability to request the
>> discarding of the system data cache, thereby avoiding the unwanted
>> swapping out of data from foreground processes.
>>
>> I evaluated cache consumption using a script called run.bash.
>> Initially, run.bash creates many small files, each 8 KB in size. It
>> then copies these files using the cp command, both with and without
>> the specified option. Finally, it reports the difference in the
>> total size of the caches before and after the copying process.
>>
>> run.bash:
>>
>> #!/bin/bash
>> CP=$1
>> shift
>> [[ -e "$CP" ]] || {
>>     echo "no file found: $CP" 1>&2
>>     exit 1
>> }
>> N=8
>> S=drop-src
>> D=${HOME}/drop-dst
>> mkdir -p $S
>> mkdir -p $D
>> start=
>> end=
>> print_cached()
>> {
>>     grep ^Cached: /proc/meminfo
>> }
>> start()
>> {
>>     start=$(print_cached | awk '{print $2}')
>> }
>> end()
>> {
>>     end=$(print_cached | awk '{print $2}')
>> }
>> report()
>> {
>>     echo -n "delta[$N:$1/$2]: "
>>     expr "$end" - "$start"
>> }
>> cleanup()
>> {
>>     local i
>>     local j
>>     for ((i = 0; i < 10; i++)); do
>>         for ((j = 0; j < 10; j++)); do
>>             rm -f $S/F-${i}${j}*
>>             rm -f $D/F-${i}${j}*
>>         done
>>     done
>>     rm -f $S/F-*
>>     rm -f $D/F-*
>> }
>> prep()
>> {
>>     local i
>>     for ((i = 0; i < 1024 * $N; i++ )); do
>>         if ! dd if=/dev/zero of=$S/F-$i bs=4096 count=2 \
>>                 status=none; then
>>             echo "failed in dd of=$S/F-$i" 1>&2
>>             exit 1
>>         fi
>>     done
>>     sync
>> }
>> run_cp()
>> {
>>     start
>>     local i
>>     time for ((i = 0; i < 1024 * $N; i++ )); do
>>         if ! "${CP}" "$@" "$S/F-$i" "$D/F-$i"; then
>>             echo "failed in cp" "$@" "$S/F-$i" "$D/F-$i" 1>&2
>>             exit 1
>>         fi
>>     done
>>     end
>>     report "$1" "$2"
>> }
>> cleanup
>> sync
>> prep
>> run_cp "$@"
>>
>> running:
>>
>> ~/coreutils/nocache$ ./run.bash ../src/cp
>>
>> real    0m16.051s
>> user    0m4.249s
>> sys     0m12.437s
>> delta[8:/]: 65548
>>
>> ~/coreutils/nocache$ ./run.bash ../src/cp --nocache-source
>>
>> real    0m17.109s
>> user    0m4.492s
>> sys     0m13.317s
>> delta[8:--nocache-source/]: 620
>>
>> The --nocache-source option massively suppresses cache consumption.
>
> Thanks for the patch.
> I have some reservations/notes though...
>
> There is nothing particularly special about cp such that it needs
> this option. I.e. it would be nice to be able to wrap any program so
> that it streamed data through the cache, rather than aggressively
> cached. I'm not sure how to do that, but I'd also be reluctant to
> start adding such options to individual commands. Perhaps Linux's
> open() may gain an O_STREAM flag in future that could be applied
> more generally with a wrapper or something.

I found an interesting article:
https://www.phoronix.com/news/Uncached-Buffered-IO-Linux-6.14

It seems that the RWF_DONTCACHE flag of pwritev2 and preadv2
implements what we need. When Linux 6.14 is released, I will rewrite
my patch based on RWF_DONTCACHE.

Masatake YAMATO

> For single (large) files, one already has this functionality in dd.
>
> On the write side, you'd also have to worry about syncing to make
> the drop-cache advisory effective, and this could impact performance.
>
> Might this drop caches for already-cached files, which cp may just
> happen to be copying, thus potentially impacting performance for
> other programs?
> If reflinking, we probably would not want to do this operation,
> since we're not reading the source.
>
> thanks,
> Pádraig