Re: uniq for unsorted input

2024-03-26 Kaz Kylheku
On 2024-03-26 09:40, Bruno Haible wrote:
> Other methods described in [3], such as counters maintained in an 'awk'
> or 'perl' process, or the 'unique' program that is part of the 'john' package
> [4], can be ignored, because they need O(N) space and are thus not usable for
> 40 GB large inputs [5

Re: uniq for unsorted input

2024-03-26 Bruno Haible
Pádraig Brady wrote:
> A documentation patch would be very useful.
> If you search for "Decorate Sort Undecorate" in coreutils.texi
> you can see an existing example of this DSU pattern.
> I would also mention DSU if adding another example.

OK, I'll provide doc input for this (post 9.5).

> Re add

Re: uniq for unsorted input

2024-03-26 Pádraig Brady
On 26/03/2024 16:40, Bruno Haible wrote:

The documentation of 'uniq' [1] says:

  "The input need not be sorted, but repeated input lines are detected only if they are adjacent. If you want to discard non-adjacent duplicate lines, perhaps you want to use 'sort -u'."

When I wrote gnulib-
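
To illustrate the documented 'uniq' behaviour quoted above, here is a minimal Python sketch (not code from the thread). The function names `uniq_adjacent` and `uniq_keep_order` are hypothetical and only model the behaviour; they are not how coreutils implements it. The second function keeps a set of every distinct line seen so far, which is the O(N)-space approach that the thread rules out for very large inputs.

```python
from itertools import groupby


def uniq_adjacent(lines):
    """Model plain 'uniq': collapse only runs of adjacent equal lines."""
    return [key for key, _group in groupby(lines)]


def uniq_keep_order(lines):
    """Drop every repeated line, keeping first occurrences in input order.

    Assumption: all distinct lines fit in memory (O(N) space), like a
    counter table kept in an 'awk' or 'perl' process.
    """
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


lines = ["b", "a", "b", "b", "a"]
print(uniq_adjacent(lines))    # ['b', 'a', 'b', 'a']  non-adjacent repeats survive
print(uniq_keep_order(lines))  # ['b', 'a']            input order preserved
print(sorted(set(lines)))      # ['a', 'b']            like 'sort -u': order lost
```

The three outputs show the trade-off under discussion: plain 'uniq' misses non-adjacent duplicates, 'sort -u' removes them but loses the original order, and the in-memory set preserves order at the cost of space proportional to the number of distinct lines.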