I have a text file of Tamil words, one word per line.
$ cat test.txt
நன்றி
நண்பரே.
நன்றி

Let us sort this.

$ cat test.txt | sort
நண்பரே.
நன்றி
நன்றி

Now let us use uniq to drop the duplicate.

$ cat test.txt | sort | uniq
நண்பரே.
நன்றி
நன்றி

uniq is not removing the duplicate line from this Unicode text.

We are collecting Tamil words to build a Tamil spellchecker using hunspell, so we need to remove duplicate words from the collection, but uniq is not doing it. Is there any other way to find duplicate words in a Unicode file?

Thanks.

-- 
Regards,
T.Shrinivasan

My Life with GNU/Linux : http://goinggnu.wordpress.com
Free E-Magazine on Free Open Source Software in Tamil : http://kaniyam.com

Get CollabNet Subversion Edge : http://www.collab.net/svnedge
_______________________________________________
ILUGC Mailing List: http://www.ae.iitm.ac.in/mailman/listinfo/ilugc
ILUGC Mailing List Guidelines: http://ilugc.in/mailinglist-guidelines
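A minimal sketch of one possible alternative to plain `sort | uniq` for the problem above. It is not a confirmed diagnosis: it assumes the two நன்றி lines differ only in invisible trailing characters (spaces, or a carriage return from a file saved on Windows) and/or that locale-dependent comparison is getting in the way. Running `od -c test.txt` would show whether the lines really are byte-identical. The function names `dedup_words` and `dedup_words_keep_order` are hypothetical.

```shell
#!/bin/sh
# dedup_words (hypothetical helper): read a word list on stdin and print
# each distinct word once, sorted in byte order.
# Assumption: trailing spaces, tabs and carriage returns are noise, so they
# are stripped before lines are compared.
dedup_words() {
    # Strip trailing whitespace (including \r) from every line, then sort
    # and drop duplicates using plain byte comparison (C locale), so the
    # result does not depend on how the current locale collates Tamil text.
    sed 's/[[:space:]]*$//' | LC_ALL=C sort -u
}

# dedup_words_keep_order (hypothetical helper): same cleanup, but keeps the
# first occurrence of each line in its original position instead of sorting.
dedup_words_keep_order() {
    # awk prints a line only the first time its exact text is seen.
    sed 's/[[:space:]]*$//' | awk '!seen[$0]++'
}

# Usage (assuming the word list is in test.txt):
#   dedup_words < test.txt > words-unique.txt
```

`LC_ALL=C sort -u` is a good fit when the final list can be in byte order. The awk variant is useful when the original collection order matters.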