On Wed, Sep 21, 2022 at 04:29:07PM +0100, jr wrote:
On Wednesday, 21 September 2022 at 13:10:05 UTC+1, Greg Wooledge wrote:
On Wed, Sep 21, 2022 at 12:31:58PM +0100, jr wrote:
> ...
> "What's in the file"
>
> file names, one per line. (and, before you ask, '\n' terminated lines)
This is not helpful. We want to see the ACTUAL CONTENTS so we can
look for DUPLICATES. How are you not understanding this?
oh dear, UPPERCASE. copied from my previous post:
$ locate /jr/ |
> grep -v -e /.cache/ -e /tmp/ |
> sed -e 's#/home/jr/##'
that is (one way) how the "ACTUAL CONTENTS" are arrived at.
and you may want to try to re-read my previous post, re locate,
database(s), and "DUPLICATES".
The premise is false. There are actually multiple implementations of
"locate" available in Debian (and more, historically), so just saying
"locate" doesn't describe the implementation very well. (I.e., different
implementations of locate will output different results.) Also, once you
start rewriting the output, the supposition that "databases" have
handled duplicates goes right out the window.
In general it seems weird to depend on any locate for this sort of thing
rather than using find, because locate reads a database that is only
refreshed periodically, so the results won't necessarily reflect the
current state of the system.
In this case it seems like an especially bad choice to use locate,
because that command will probably list each matching directory as well
as the contents of the directory. e.g.:
/home/jr/dir1
/home/jr/dir1/file1
/home/jr/dir2
/home/jr/dir2/file2
If you want just a list of files, it would be much better to use `find
/home/jr -type f`, which would output:
/home/jr/dir1/file1
/home/jr/dir2/file2
find tends to be the right tool for anything other than an interactive
file listing. It would be better yet to use `find /home/jr -type f
-print0` which would output:
/home/jr/dir1/file1\0/home/jr/dir2/file2\0
[disclaimer: the rest of this assumes GNU tar; other tar implementations
will have different behavior, capabilities, and options]
You could then feed that NUL-terminated list to tar with its --null and
-T options:
find /home/jr -type f -print0 | tar cf file.tar --null -T -
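To see why the NUL-terminated form is worth the extra typing, here is a rough sketch that round-trips a file name containing a newline. It assumes GNU find and GNU tar; the scratch directory and the file names are made up for the demonstration and are not from the thread.

```shell
#!/bin/sh
# Sketch only: assumes GNU find and GNU tar. Everything lives in a scratch
# directory from mktemp; none of these paths come from the thread.
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/out"

# One ordinary name and one name with an embedded newline.
odd=$(printf 'two\nlines')
touch "$demo/src/plain" "$demo/src/$odd"

# NUL-terminated names out of find, NUL-terminated names into tar.
(cd "$demo/src" && find . -type f -print0 | tar cf "$demo/null.tar" --null -T -)

# Unpack elsewhere to confirm both names made the round trip.
tar xf "$demo/null.tar" -C "$demo/out"
ls -q "$demo/out"
```

With a newline-delimited list, the second name would be read as two separate entries, neither of which exists.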
Alternatively, you could pass both filenames and directory names to tar,
but add the --no-recursion flag. Then tar wouldn't add the entire
directory tree each time it sees a directory and then add the files in
that directory a second time when their own names come up (or, actually,
add a link to the first copy, as an optimization of a strange request).
Depending on your objectives this may be a better solution than sending
only filenames, if the permissions on the directories matter and should
be preserved.
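(Though be aware that tar by default won't restore user/group when
extracting as non-root. --same-owner asks for it, but that can only
succeed where the extracting user is allowed to change ownership, so in
practice you'd extract as root if that matters.)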
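A small sketch of the difference, again assuming GNU tar and GNU find. The directory layout mirrors the dir1/file1 example above but lives in a scratch directory; the variable names are just for illustration. --no-recursion is placed before -T because, as far as I know, newer GNU tar versions treat it as position-sensitive.

```shell
#!/bin/sh
# Sketch only: assumes GNU find and GNU tar; all paths are scratch paths.
demo=$(mktemp -d)
mkdir -p "$demo/src/dir1" "$demo/src/dir2"
touch "$demo/src/dir1/file1" "$demo/src/dir2/file2"
cd "$demo/src" || exit 1

# Without -type f, find prints directories as well as files, which is
# roughly the shape of list that locate would hand over.
find dir1 dir2 -print0 | tar cf "$demo/recursive.tar" --null -T -
find dir1 dir2 -print0 | tar cf "$demo/flat.tar" --null --no-recursion -T -

# Count how often dir1/file1 shows up in each archive listing.
recursive_hits=$(tar tf "$demo/recursive.tar" | grep -c '^dir1/file1$')
flat_hits=$(tar tf "$demo/flat.tar" | grep -c '^dir1/file1$')
echo "default: $recursive_hits  --no-recursion: $flat_hits"
```

The archive made with --no-recursion should list each file once while still carrying entries for the directories themselves.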
sure. I'm talking about a working environment, not a play-skool
situation where we .. make things up to pursue "hypothetical"s.
(again, see previous post) while names like '* file3 *' could,
conceivably, result from a poorly written command-line, they would be
removed/renamed immediately, to quote your good self: "How are you not
understanding this?"
He's fundamentally correct. Especially in a "working environment" it's
best to avoid constructions that are known to cause hard-to-diagnose
problems at unexpected times.
many programs support include/exclude lists, 'rsync' comes to mind.
the "nonsense" is quick + convenient on the command-line (don't know
about you, but I have no problems differentiating between a "one off"
command line and a to-be-used-frequently script, and adjust
accordingly)
Essentially all of them accept null-terminated file lists these days,
specifically to avoid issues with filename ambiguity. (And most of the
locate implementations will generate that sort of output!)
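For what it's worth, the filtering pipeline quoted earlier can be kept NUL-terminated end to end. This is only a sketch: it assumes GNU grep and GNU sed (both accept -z), and printf with made-up paths stands in for `locate -0 /jr/` so that it runs without a locate database.

```shell
#!/bin/sh
# Sketch only: assumes GNU grep and GNU sed, which both accept -z.
# printf stands in for `locate -0 /jr/`; the paths are invented.
list=$(mktemp)
printf '%s\0' \
    /home/jr/notes.txt \
    /home/jr/.cache/junk \
    '/home/jr/odd name' \
    /home/jr/tmp/scratch |
  grep -z -v -e /.cache/ -e /tmp/ |
  sed -z -e 's#/home/jr/##' > "$list"

# Convert to newlines for display only; the list itself stays NUL-terminated.
kept=$(tr '\0' '\n' < "$list")
printf '%s\n' "$kept"
```

The resulting list could then go to `tar --null -T -` (run from /home/jr, since the names are now relative), or to rsync via its --from0 and --files-from options.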