If you can tolerate errors, a simple idea is to generate a random number in the range 0 ... 2^n - 1 for each line and use that as the key. If the number of lines is small relative to 2^(n/2) (not just 2^n, because of the birthday effect), then with high probability you won't get the same key twice.
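A rough sketch of the idea in plain Java, in case it helps. The class and method names are made up for illustration and are not part of Hadoop. The collision estimate is the usual birthday approximation, 1 - exp(-k(k-1)/2^(n+1)) for k lines and n-bit keys, so treat it as an estimate rather than an exact figure.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper for illustration only; not a Hadoop class. */
final class RandomLineKeys {
    private RandomLineKeys() {}

    /** Returns a uniformly random key in [0, 2^bits), for 1 <= bits <= 62. */
    static long randomKey(int bits) {
        if (bits < 1 || bits > 62) {
            throw new IllegalArgumentException("bits must be in 1..62");
        }
        // nextLong(bound) requires a positive bound, hence the 62-bit limit.
        return ThreadLocalRandom.current().nextLong(1L << bits);
    }

    /** Approximate probability that at least two of `lines` keys collide. */
    static double collisionProbability(long lines, int bits) {
        // Number of distinct pairs of lines, k(k-1)/2, computed in floating point.
        double pairs = (double) lines * (lines - 1) / 2.0;
        // 1 - exp(-pairs / 2^bits), using expm1 for accuracy near zero.
        return -Math.expm1(-pairs / Math.pow(2.0, bits));
    }
}
```

In a map function you would presumably wrap the result of `randomKey(62)` in a `LongWritable` and emit it in place of the byte offset. Note that a re-run map task would produce different keys, which may or may not matter for your job.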
Miles

2009/5/4 Rares Vernica <[email protected]>:
> Hello,
>
> TextInputFormat is a perfect match for my problem. The only drawback is
> that fact that keys are unique only within a file. Is there an easy way
> to have keys unique across files. That is, each line in any file should
> get a unique key. Is there an unique id for each file? If yes, maybe I
> can concatenate them if I can access the file id from the map function.
>
> Thanks,
> Rares
>

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
