Hello Seth, thanks for the quick and detailed response. The attachment
is deleted. Please feel free to make the discussion public.

Apparently the issue is not the umlauts (at least on my machine), but
ligatures, &c. I have a script to rename files, but some always slip
through. The system's inability to handle Russian file names and
contents properly, because of the differing encodings, is a particular
nuisance. And when processing strings in Perl scripts, it is a nightmare.

I was not aware that grep switches from text to binary mode while
parsing, and that it only does so if a grepped line contains a binary
character. It would be good if the warning were sent to stderr, so that
it does not get lost in pipes. Anyway, I have already added the alias
grep --text to my ~/.bashrc.
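
For reference, the effect can be reproduced with a minimal sketch like
the one below. It assumes GNU grep; the temporary file and the NUL byte
on its first line are invented for the demonstration and merely stand in
for whatever bytes grep considers binary in the real file list.

```shell
# Hypothetical demo file: a NUL byte on the first line makes GNU grep
# treat the whole input as binary, whatever the locale.
tmp=$(mktemp)
printf 'header\000line\n/home/pia/phd/src/discharge/discharge_calculate_.m\n' > "$tmp"

# Without -a, GNU grep prints a "Binary file ... matches" notice instead of
# the matching line, so the second grep in the pipe has nothing to match.
plain=$(grep discharge "$tmp" | grep calculate_ || true)

# With -a (--text), the matching line passes through the pipe unchanged.
text=$(grep -a discharge "$tmp" | grep calculate_ || true)

printf 'without -a: [%s]\nwith -a:    [%s]\n' "$plain" "$text"
rm -f "$tmp"
```

Depending on the grep version, the notice in the first case ends up on
stdout, where the pipe swallows it, or on stderr. An alias such as
alias grep='grep --text' in ~/.bashrc makes the second form the default
for interactive shells.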

Just to continue the discussion, is there a similar switch for locate?

locate  comparison-of-turbulence-models
Binary file (standard input) matches

Thanks

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to grep in Ubuntu.
https://bugs.launchpad.net/bugs/1587101

Title:
  Grep silently discards tails of long text streams

Status in grep package in Ubuntu:
  Invalid

Bug description:
  Grep silently discards tails of long streams on my machine:

  grep -n discharge_calculate_.m 0.txt 
  64264:/home/pia/phd/src/discharge/discharge_calculate_.m

  So far, so good, "discharge_calculate_.m" is grepped on line 64264.

  grep -n discharge 0.txt | grep calculate

  Apparently grep gobbled the line and fails to match it.

  Some tests:

  tail -n +64264 0.txt | grep discharge | grep calculate_
  /home/pia/phd/src/discharge/discharge_calculate_.m

  grep -a -n discharge 0.txt | grep calculate_
  64264:/home/pia/phd/src/discharge/discharge_calculate_.m

  file 0.txt 
  0.txt: ISO-8859 text

  I noticed this when not finding files that I knew to exist in the
  directory tree, and thought at first it was a bug in locate or find. I
  could not reproduce this on the fly when the lines leading to the grep
  line were filled with arbitrary characters, so the behaviour depends
  also on the content of the stream, not only on its length. Grep seems
  to interpret the text stream as binary. It looks very much like a
  buffer overflow, which is why I mark this as a security vulnerability.
  In case this is intended behaviour, grep should not silently discard
  the tail but output a warning to stderr. A limit of less than 60k
  lines also seems a bit too low in the 64-bit era.

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: grep 2.25-1~16.04.1
  ProcVersionSignature: Ubuntu 4.4.0-22.40-generic 4.4.8
  Uname: Linux 4.4.0-22-generic x86_64
  ApportVersion: 2.20.1-0ubuntu2
  Architecture: amd64
  CurrentDesktop: GNOME
  Date: Mon May 30 15:41:35 2016
  InstallationDate: Installed on 2015-11-05 (207 days ago)
  InstallationMedia: Ubuntu 14.04.3 LTS "Trusty Tahr" - Beta amd64 (20150805)
  SourcePackage: grep
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/grep/+bug/1587101/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
