So, for background, I first did:
    select count(from_number) from cross_reference
and
    select count(to_number) from cross_reference
and got 84919 in both cases. Doubling that gives 169838. If I then do:
    select from_number from cross_reference union select to_number from cross_reference
I get 110256 rows. (There are two columns in the table; if one has data then the other must also, hence the equal counts.) This means the union (as expected) is merging the two lists and eliminating duplicate values.
To confirm, I took all the lines found by each of the individual selects and put them in two files (named quite originally as from_number and to_number ;).
Each file has the expected number of total lines (84919). I then did:
    sort -n -o from_number from_number
    sort -n -o to_number to_number
Still the same number of lines, only numerically sorted now. Then:
    uniq from_number | wc -l
    73609
    uniq to_number | wc -l
    48418
Adding these gives 122027, too big by almost 12000 (11771, to be exact). Ah, I thinks to meself, some of the numbers in the two files can match each other between the files, but are unique in each file. So:
    sort -m from_number to_number | uniq | wc -l
    122010
This is still almost 12000 too big (only 17 less than the 'uniq' on the separate files). So, I run this:
    sort -u from_number to_number | wc -l
And I get 110256, the same number as the SQL UNION gave me.
So, if both files are sorted and I then use 'sort -m' followed by 'uniq' and count the results, shouldn't I get the same thing as resorting the two (already sorted) files with sort's '-u' option and counting that output?
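For anyone wanting to cross-check the arithmetic: if the per-file 'uniq' counts and the union count are all right, the two files should share 73609 + 48418 - 110256 = 11771 values. Below is a minimal sketch of how that overlap could be counted directly. The count_common helper and its temporary files are made up for illustration, and it assumes mktemp is available. Since comm expects its inputs in the current collation order, the sketch re-sorts without '-n' first.

```shell
# count_common FILE1 FILE2
# Hypothetical helper: prints how many distinct lines appear in both files.
count_common() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    # comm needs lexically sorted input, so re-sort without -n;
    # -u also drops duplicates within each file
    sort -u "$1" > "$tmp1"
    sort -u "$2" > "$tmp2"
    # -12 suppresses the "only in file 1" and "only in file 2" columns,
    # leaving just the lines common to both
    comm -12 "$tmp1" "$tmp2" | wc -l | tr -d ' '
    rm -f "$tmp1" "$tmp2"
}
# Usage: count_common from_number to_number
```

If the number printed is not 11771, at least one of the three counts above would be suspect.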
I did wonder if I needed to use '-n' with the '-m', but that didn't fix anything; in fact, I got yet another count: 121995.
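To make the comparison easy to repeat, the three pipelines can be wrapped in one small function and run side by side on the same pair of files. This is only a sketch: compare_counts is a made-up name, and it assumes both files have already been sorted with 'sort -n' as described above. It does not explain the discrepancy, it just prints the three counts together.

```shell
# compare_counts FILE1 FILE2
# Hypothetical helper: prints the distinct-line count computed three ways.
# Assumes both input files were already sorted with `sort -n`.
compare_counts() {
    # plain merge, then collapse adjacent duplicate lines
    m=$(sort -m "$1" "$2" | uniq | wc -l | tr -d ' ')
    # numeric merge, then collapse adjacent duplicate lines
    nm=$(sort -n -m "$1" "$2" | uniq | wc -l | tr -d ' ')
    # full re-sort of both files with duplicate removal
    u=$(sort -u "$1" "$2" | wc -l | tr -d ' ')
    echo "sort -m | uniq: $m"
    echo "sort -n -m | uniq: $nm"
    echo "sort -u: $u"
}
# Usage: compare_counts from_number to_number
```

Running it on a tiny pair of hand-made files, where the right answer can be counted by eye, might help narrow down which pipeline misbehaves.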
Am I missing something obvious, having to do with numbers and merging? Or is this a bug in sort?
Thanks for your patience with the long post ;}

Bob