On Mon, Jul 4, 2016 at 6:57 AM, Pascal <patate...@gmail.com> wrote:
> hi,
>
> I've a big (3.3Go) gzipped file which comes from nsrl with fields separated
> by one tabulation :
>
> $ zcat nsrlfiletxt.gz | head -2
> sha-1 md5 crc32 filename filesize productcode opsystemcode specialcode
> 000000206738748edd92c4e3d2e823896700f849 392126e756571ebf112cb1c1cdedf926 ebd105a0 i05002t2.pfb 98865 3095 win
>
> I've a file with fixed patterns (windows only from field 7 opsystemcode) :
>
> $ cat win.os
> 2000 sp 4
> 2ksp3
> dos
> ...
> xp sp2
> xphomeedw/sp2
> xpprofessw/sp2
>
> my os is :
>
> $ uname -a
> Linux arch 4.4.14-1-lts #1 SMP Fri Jun 24 21:35:25 CEST 2016 x86_64 GNU/Linux
>
> and grep is :
>
> $ grep --version
> grep (GNU grep) 2.25
> ...
>
> $ pacman -Q grep
> grep 2.25-2
>
> when I try this :
>
> $ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
> 59,4k 0:00:00 [ 776k/s] [ <=> ]
>
> only 59.4k lines are processed, with no error :-( !
> (sed is used on win.os to match only on field and pipe view is used to show
> progess)
>
> I downgrade to grep 2.24 :
>
> # pacman -U /var/cache/pacman/pkg/grep-2.24-1-x86_64.pkg.tar.xz
> ...
>
> and retry this (the same) :
>
> $ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
> 59,4k 0:00:00 [ 863k/s] [ <=> ]
>
> again, only 59.4k lines are processed, with no error :-( !
>
> I downgrade to grep 2.23 :
>
> # pacman -U /var/cache/pacman/pkg/grep-2.23-1-x86_64.pkg.tar.xz
> ...
>
> and retry this (the same) :
>
> $ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
> 59,1k 0:00:00 [ 823k/s] [ <=> ]
>
> only 59.1k lines are processed, with no error :-( !
>
> I downgrade to grep 2.22 :
>
> # pacman -U /var/cache/pacman/pkg/grep-2.22-1-x86_64.pkg.tar.xz
> ...
>
> and retry this (the same) :
>
> $ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
> 157M 0:04:36 [ 567k/s] [ <=> ]
>
> all the 157M of lines are well processed :-) !
>
> so I think there's a bug introduced with grep 2.23...

Thank you for the report. However, I'll bet that your input is not what
POSIX calls a "text file," and your locale is neither C nor POSIX.
I.e., I'll bet the input contains a NUL byte or a sequence of bytes that
constitutes an invalid character in your locale. Either of those would
make your use of grep non-conformant.

You may be able to make your command work portably by adding grep's "-a"
option or by running grep in the C locale:

    zcat nsrlfiletxt.gz | pv -l | LC_ALL=C grep --fixed-strings --file=...

or

    zcat nsrlfiletxt.gz | pv -l | grep -a --fixed-strings --file=...

If you look at the actual output, you should see an indication of the
problem: when you have less output than expected, there should be at
least one line of the form "Binary file ... matches".
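
To see the effect on a small scale, here is a minimal sketch. It is not
taken from your data: the sample file, its three records, and the variable
names are made up for illustration. It writes three tab-separated records
that all contain "win", puts a NUL byte into the second one, and counts the
lines grep writes to standard output with and without "-a".

```shell
#!/bin/sh
# Hypothetical sample: three tab-separated records that all contain "win".
# The second record also carries a NUL byte (\000), so the file is not a
# POSIX "text file".
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.tsv"
printf 'aaa\twin\t1\nbbb\twin\000\t2\nccc\twin\t3\n' > "$sample"

# Default mode: grep may classify the input as binary and suppress matching
# lines. Any diagnostic written to stderr is discarded here.
default_lines=$(grep --fixed-strings 'win' "$sample" 2>/dev/null | wc -l | tr -d ' ')

# With -a, grep treats the input as text and prints every matching line.
text_lines=$(grep -a --fixed-strings 'win' "$sample" | wc -l | tr -d ' ')

echo "without -a: $default_lines line(s) on stdout"
echo "with -a:    $text_lines line(s) on stdout"

rm -rf "$tmpdir"
```

With "-a" all three records should come through. Without it, GNU grep
typically prints fewer lines, along with a "Binary file ... matches" notice
(newer grep releases send that notice to standard error rather than
standard output).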