Re: diff or deduplicate two volumes with different folder structures

2016-09-22 Thread Chris Murphy
On Thu, Sep 22, 2016 at 12:56 PM, Matthew Miller wrote:
> On Thu, Sep 22, 2016 at 07:57:48PM +0200, Roberto Ragusa wrote:
>> > Don't use MD5. You will get unintentional file collisions. (SHA-256 is
>> > good. It depends on just how much you are comparing.)
>> MD5 unintentional collisions?
>> It is

Re: diff or deduplicate two volumes with different folder structures

2016-09-22 Thread Matthew Miller
On Thu, Sep 22, 2016 at 07:57:48PM +0200, Roberto Ragusa wrote:
> > Don't use MD5. You will get unintentional file collisions. (SHA-256 is
> > good. It depends on just how much you are comparing.)
> MD5 unintentional collisions?
> It is 128 bit, so you will have a collision after about 2^64 files,

Re: diff or deduplicate two volumes with different folder structures

2016-09-22 Thread Roberto Ragusa
On 09/21/2016 01:01 AM, a...@clueserver.org wrote:
> Don't use MD5. You will get unintentional file collisions. (SHA-256 is
> good. It depends on just how much you are comparing.)
MD5 unintentional collisions?
It is 128 bit, so you will have a collision after about 2^64 files,
according to the bi
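As a rough check on the 2^64 figure, the usual approximation for the chance of at least one accidental collision among n random b-bit hashes is about n^2 / 2^(b+1). The sketch below evaluates that with awk; the file count of ten million is a made-up example, not a number from the thread, and the estimate says nothing about deliberately crafted collisions.

```shell
# n is a hypothetical file count; bits is the digest width (128 for MD5).
# Approximate probability of at least one accidental collision: n^2 / 2^(bits+1).
p=$(awk 'BEGIN { n = 1e7; bits = 128; printf "%.3g", (n * n) / (2 * 2 ^ bits) }')
echo "approximate collision probability: $p"
```

Setting bits to 256 gives the corresponding estimate for SHA-256.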

Re: diff or deduplicate two volumes with different folder structures

2016-09-21 Thread Chris Murphy
What I ended up doing:
$ find /brickA -type f -exec md5sum "{}" + > brickA.txt
$ find /brickB -type f -exec md5sum "{}" + > brickB.txt
$ cut -c 1-32 brickA.txt > brickA_md5.txt
$ grep -v -F -f brickA_md5.txt brickB.txt > onbrickB_notonbrickA.txt
Thanks for the help everyone.
Chris Murphy
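The same pipeline can be tried on throwaway data before pointing it at real drives. This is only a sketch: the temporary directory and file names are invented for the demo, and sha256sum is swapped in for md5sum as suggested elsewhere in the thread, which means the cut width becomes 64 instead of 32.

```shell
# Throwaway stand-ins for /brickA and /brickB (hypothetical demo data).
work=$(mktemp -d)
mkdir -p "$work/brickA/photos" "$work/brickB/misc/old"
printf 'same content\n' > "$work/brickA/photos/one.txt"
printf 'same content\n' > "$work/brickB/misc/old/renamed.txt"
printf 'only on B\n'    > "$work/brickB/misc/extra.txt"

# Hash every regular file on each side.
find "$work/brickA" -type f -exec sha256sum {} + > "$work/brickA.txt"
find "$work/brickB" -type f -exec sha256sum {} + > "$work/brickB.txt"

# A SHA-256 hex digest is 64 characters wide (an MD5 digest is 32).
cut -c 1-64 "$work/brickA.txt" > "$work/brickA_sha256.txt"

# Keep the lines from B whose digest does not appear in A's digest list.
grep -v -F -f "$work/brickA_sha256.txt" "$work/brickB.txt" \
  > "$work/onbrickB_notonbrickA.txt"
cat "$work/onbrickB_notonbrickA.txt"
```

Swapping the roles of the two lists gives the files that are on A but not on B.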

Re: diff or deduplicate two volumes with different folder structures

2016-09-20 Thread alan
> On Tue, Sep 20, 2016 at 10:52:10PM +0200, Ahmad Samir wrote:
>> One last try (sometimes an issue nags):
>> $ find A -exec md5sum '{}' + > a-md5
>> $ find B -exec md5sum '{}' + > b-md5
>> $ cat a-md5 b-md5 > All
>> $ sort -u -k 1,1 All > dupes
>>
>> Now, (I hopefully got my head around it this tim

Re: diff or deduplicate two volumes with different folder structures

2016-09-20 Thread Jon LaBadie
On Tue, Sep 20, 2016 at 10:52:10PM +0200, Ahmad Samir wrote:
> One last try (sometimes an issue nags):
> $ find A -exec md5sum '{}' + > a-md5
> $ find B -exec md5sum '{}' + > b-md5
> $ cat a-md5 b-md5 > All
> $ sort -u -k 1,1 All > dupes
>
> Now, (I hopefully got my head around it this time...), t

Re: diff or deduplicate two volumes with different folder structures

2016-09-20 Thread Ahmad Samir
One last try (sometimes an issue nags):
$ find A -exec md5sum '{}' + > a-md5
$ find B -exec md5sum '{}' + > b-md5
$ cat a-md5 b-md5 > All
$ sort -u -k 1,1 All > dupes
Now, (I hopefully got my head around it this time...), the dupes file should contain a list of files that exist in _both_ A and B;
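For comparison, here is one way to narrow two checksum lists to content present on both sides. Note that `sort -u -k 1,1` on its own keeps one line per distinct checksum, so this sketch swaps in `comm -12` on the per-side checksum columns instead. The directory names, file names, and the added `-type f` are assumptions made for the demo, not part of the original commands.

```shell
# Throwaway stand-ins for A and B (hypothetical demo data).
work=$(mktemp -d)
mkdir -p "$work/A/x" "$work/B/y/z"
printf 'shared\n' > "$work/A/x/a.txt"
printf 'shared\n' > "$work/B/y/z/b.txt"
printf 'only A\n' > "$work/A/x/c.txt"

find "$work/A" -type f -exec md5sum {} + > "$work/a-md5"
find "$work/B" -type f -exec md5sum {} + > "$work/b-md5"

# Distinct checksums per side, sorted so comm can compare them.
cut -c 1-32 "$work/a-md5" | sort -u > "$work/a-sums"
cut -c 1-32 "$work/b-md5" | sort -u > "$work/b-sums"

# comm -12 prints only the lines common to both sorted inputs.
comm -12 "$work/a-sums" "$work/b-sums" > "$work/both-sums"

# List every file, on either side, whose checksum is in the common set.
cat "$work/a-md5" "$work/b-md5" | grep -F -f "$work/both-sums" > "$work/in-both"
cat "$work/in-both"
```

Using `comm -23` or `comm -13` in place of `comm -12` gives the checksums that appear on only one side.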

Re: diff or deduplicate two volumes with different folder structures

2016-09-20 Thread stan
On Mon, 19 Sep 2016 17:23:39 -0600 Chris Murphy wrote:
> Drives A and B have many overlapping files but I want to find out what
> files don't exist on each. Thwarting this is directory structure
> differs between the two drives, and I'm fairly certain some of the
> file names differ on the two dr

Re: diff or deduplicate two volumes with different folder structures

2016-09-20 Thread Chris Murphy
On Tue, Sep 20, 2016 at 11:55 AM, Ahmad Samir wrote:
> On 20 September 2016 at 13:00, Ahmad Samir wrote:
>> On 20 September 2016 at 12:34, Ahmad Samir wrote:
>>> On 20 September 2016 at 10:33, Ahmad Samir wrote:
>>>> Here's a crude way:
>>>> $ find /brickA -type f -exec md5sum "{}" + | sor

Re: diff or deduplicate two volumes with different folder structures

2016-09-20 Thread Ahmad Samir
On 20 September 2016 at 13:00, Ahmad Samir wrote:
> On 20 September 2016 at 12:34, Ahmad Samir wrote:
>> On 20 September 2016 at 10:33, Ahmad Samir wrote:
>>>
>>> Here's a crude way:
>>> $ find /brickA -type f -exec md5sum "{}" + | sort > brickA.txt
>>> $ find /brickB -type f -exec md5sum "{}" +

Re: diff or deduplicate two volumes with different folder structures

2016-09-20 Thread Ahmad Samir
On 20 September 2016 at 12:34, Ahmad Samir wrote:
> On 20 September 2016 at 10:33, Ahmad Samir wrote:
>>
>> Here's a crude way:
>> $ find /brickA -type f -exec md5sum "{}" + | sort > brickA.txt
>> $ find /brickB -type f -exec md5sum "{}" + | sort > brickB.txt
>> $ diff -U 0 brickA.txt brickB.txt

Re: diff or deduplicate two volumes with different folder structures

2016-09-20 Thread Ahmad Samir
On 20 September 2016 at 10:33, Ahmad Samir wrote:
> On 20 September 2016 at 01:23, Chris Murphy wrote:
>> Drives A and B have many overlapping files but I want to find out what
>> files don't exist on each. Thwarting this is directory structure
>> differs between the two drives, and I'm fairly ce

Re: diff or deduplicate two volumes with different folder structures

2016-09-20 Thread Ahmad Samir
On 20 September 2016 at 01:23, Chris Murphy wrote:
> Drives A and B have many overlapping files but I want to find out what
> files don't exist on each. Thwarting this is directory structure
> differs between the two drives, and I'm fairly certain some of the
> file names differ on the two drives

Re: diff or deduplicate two volumes with different folder structures

2016-09-19 Thread geo.inbox.ignored
On 09/19/2016 06:23 PM, Chris Murphy wrote:
> Drives A and B have many overlapping files but I want to find out what
> files don't exist on each.
you might consider;
rsync -avh /brickA/ /brickB/
then
rsync -avh /brickB/ /brickA/
to dupe files on both drives. read 'man rsync' for argument

diff or deduplicate two volumes with different folder structures

2016-09-19 Thread Chris Murphy
Drives A and B have many overlapping files but I want to find out what files don't exist on each. Thwarting this is directory structure differs between the two drives, and I'm fairly certain some of the file names differ on the two drives also. Therefore I need something hash based. I started with