Dear all,

I want to delete the exact matches in a large dataset based on a smaller 
dataset. In other words, I want to subtract the smaller dataset from the larger 
one. The smaller dataset is a part of the larger one. The datasets contain 
hundreds of thousands of lines (one column), and the content of each line 
differs in length. The data are paths extracted from web logs.

On an abstract level, I want to subtract dataset2 from dataset1 to get dataset3:

dataset1: 
1 A
2 B
3 X
4 AA
5 A
6 D
7 XA
8 C

dataset2:
1 A
2 X
3 A

dataset3:
1 B
2 AA
3 D
4 XA
5 C

The final order in dataset3 is not important.
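
To make the intended operation concrete, here is a minimal sketch in R of what 
I mean by "subtract". It assumes each dataset has already been read into a 
character vector (for example with readLines(); the toy vectors below just 
repeat the example above), and it assumes that every line of dataset1 with an 
exact match in dataset2 should go, however many times it occurs.

```r
# Toy data from the example above. With real data, each vector might come
# from readLines() on the corresponding file (an assumption, not shown here).
dataset1 <- c("A", "B", "X", "AA", "A", "D", "XA", "C")
dataset2 <- c("A", "X", "A")

# Keep every line of dataset1 that has no exact match in dataset2.
# %in% compares whole strings, so "AA" and "XA" are not removed by "A" or "X".
dataset3 <- dataset1[!(dataset1 %in% dataset2)]

print(dataset3)  # the five remaining lines: B, AA, D, XA, C
```

setdiff(dataset1, dataset2) gives the same lines for this example, but it 
also drops duplicates among the lines that remain, which may or may not be 
wanted.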

Thanks,

Jonas Fransson
Ph.D.stud.

IVA / Det Informationsvidenskabelige Akademi
Royal School of Library and Information Science
Birketinget 6
DK-2300 Copenhagen S
T +45 32 58 60 66
D +45 32 34 15 10
www.iva.dk/jf

