On Thu Jul 11 09:13:10 EDT 2013, aris...@ar.aichi-u.ac.jp wrote:
> Hello,
> 
> It seems the -f option of grep is buggy.
> Or are there limitations in using REs?
> 
> term% wc MD5dir
>    4584    9168  388756 MD5dir
> term% wc x
>    4582    4582  151206 x
> term% grep -f x MD5dir | wc
>    4580    9160  388463
> term%
> term% grep e54272690d513f8b2403568a7574b1ba MD5dir
> e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
> term% grep e54272690d513f8b2403568a7574b1ba x
> e54272690d513f8b2403568a7574b1ba
> term% grep -v -f x MD5dir
> 7b6d7ae369226b6d0195ac3fe4487ce7 /usr/arisawa/src/elnfs/WWW/
> d44d788ad1237311d8282bbabca65977 /usr/arisawa/src/hg/python-2.5.1-ape/Modules/_ctypes/libffi/src/darwin/
> e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
> 84a0f83f5020f16d0b277e8b19407791 /usr/arisawa/src/trans
> term% 

a trick i often use for many fixed strings is sort + uniq
(internally, grep/comp.c:/^increment does O(n^2)
qsorts on the patterns).  perhaps it could be used to
double-check grep's answer here.

to find the md5 hashes that appear in only one file or the other
(uniq compares whole lines, so keep only the first field first),

        cat x MD5dir | awk '{print $1}' | sort | uniq -c | sed '/^ *2 /d'

to count the hashes that appear in both,

        cat x MD5dir | awk '{print $1}' | sort | uniq -c | grep '^ *2 ' | wc -l
or
                        ...     | awk '$1==2{n++}END{print n+0}'
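
as a rough, self-contained illustration of the same check, here is a sketch in POSIX sh (an assumption; the thread itself uses rc). the sample hashes and paths are made up, the file names x and MD5dir just follow the thread, and it assumes each hash appears at most once per file, otherwise a count of 2 no longer means "in both".

```shell
#!/bin/sh
# sketch only: made-up sample data in a scratch directory
dir=$(mktemp -d)

# x holds bare hashes; MD5dir holds "hash path" lines
printf '%s\n' aaa bbb ccc > "$dir/x"
printf '%s\n' 'aaa /usr/a/' 'bbb /usr/b/' 'ddd /usr/d/' > "$dir/MD5dir"

# keep only the first field so uniq compares hashes, not whole lines
counts=$(cat "$dir/x" "$dir/MD5dir" | awk '{print $1}' | sort | uniq -c)

# hashes whose count is not 2 appear in only one of the files
only_one=$(printf '%s\n' "$counts" | awk '$1 != 2 {print $2}')

# number of hashes that appear in both files
both=$(printf '%s\n' "$counts" | awk '$1 == 2 {n++} END {print n+0}')

echo "in one file only:"
printf '%s\n' "$only_one"
echo "in both: $both"

rm -rf "$dir"
```

if grep -f and this count disagree on the real files, the hashes listed under "in one file only" are the first place to look.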

can you find a smaller test case that has the same issue?  this
should be fixed.
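
one possible way to shrink the test case, sketched in POSIX sh (again an assumption, not rc): try ever longer prefixes of the pattern file and stop at the first one where grep -f disagrees with a plain first-field lookup. first_bad_prefix is a hypothetical helper name. the awk reference assumes the patterns are plain hashes with no regular expression metacharacters and that no hash also occurs inside a path.

```shell
#!/bin/sh
# sketch only: first_bad_prefix is a hypothetical helper, not part of grep.
# prints the smallest n for which grep -f, given the first n patterns,
# matches a different number of lines than an exact first-field lookup.
# prints nothing if every prefix agrees.
first_bad_prefix() {
	pats=$1
	data=$2
	total=$(awk 'END {print NR}' "$pats")
	tmp=$(mktemp)
	n=1
	while [ "$n" -le "$total" ]; do
		head -n "$n" "$pats" > "$tmp"
		# grep exits 1 when nothing matches; only the count matters here
		got=$(grep -c -f "$tmp" "$data" || :)
		# reference: count lines whose first field is one of the patterns
		want=$(awk 'NR == FNR {p[$1]; next} $1 in p {c++} END {print c+0}' "$tmp" "$data")
		if [ "$got" -ne "$want" ]; then
			echo "$n"
			break
		fi
		n=$((n + 1))
	done
	rm -f "$tmp"
}

# usage (file names as in the thread):
#	first_bad_prefix x MD5dir
```

this runs grep once per prefix, so it is slow for thousands of patterns, but the first n lines of x plus MD5dir would then be a smaller case to look at. it cannot tell whether the failure depends on pattern order.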

- erik
