Hi,
  When I tried to merge two datasets (a many-to-many merge), I ran into a
problem: how to stop a possible loop in the sampling steps.
###My code is as follows.###
data1 <- data.frame(areaid = c(1, 1, 2, 3),
                    x      = c(1.2, 1.5, 0.2, 1.5),
                    y      = c(1.3, 2.3, 3.3, 1.3),
                    date   = c("3/23/2004", "3/22/2004", "4/23/2004", "5/22/2004"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(areaid = c(1, 1, 1, 1, 2, 2, 3, 3),
                    x1     = c(1.22, 1.53, 1.21, 1.52, 0.21, 0.23, 1.57, 1.59),
                    y1     = c(1.32, 2.34, 1.37, 2.35, 3.33, 3.35, 1.31, 1.33))
id <- unique(data1$areaid)
res <- list()                # one merged piece per areaid
set.seed(1000)               # set the seed once, not on every pass
for (n in id) {
  data1_n <- data1[data1$areaid == n, ]
  data2_n <- data2[data2$areaid == n, ]
  leg1_n <- nrow(data1_n)
  leg2_n <- nrow(data2_n)
  if (leg1_n == 1) {         # "==" compares; "=" here was a syntax error
    res[[as.character(n)]] <- merge(data1_n, data2_n, by = "areaid")
  } else {
    # leg1_n >= 2; e.g. leg1_n = 2 and leg2_n = 4 for areaid = 1
    samp1_n <- sample(seq_len(leg1_n), 1)
    data1_n_samp1 <- data1_n[samp1_n, ]
    samp2_n <- sample(seq_len(leg2_n), 2)
    data2_n_samp2 <- data2_n[samp2_n, ]
    res[[as.character(n)]] <- merge(data1_n_samp1, data2_n_samp2, by = "areaid")
    # need to continue to sample from the remaining records in data1_n and
    # data2_n ??????????
    # some criterion to stop the sampling may be needed ?????????
  }
}
# combine the pieces for all areaids to get the final result
result <- do.call(rbind, res)

  Any ideas or suggestions on the problem? Thanks a lot.
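One way to fill in the open part of the loop (keep sampling from the
remaining records, and decide when to stop) is sketched below for areaid=1
only. It assumes a 1:2 design (k = 2), and the stopping rule is an
assumption, not part of the original code: draw one unused case and k unused
controls, remove them from the pool, and stop once every case is matched or
fewer than k controls remain. The names left1, left2 and pieces are
hypothetical.

```r
# areaid = 1 from the example data
data1_n <- data.frame(areaid = c(1, 1), x = c(1.2, 1.5), y = c(1.3, 2.3),
                      date = c("3/23/2004", "3/22/2004"),
                      stringsAsFactors = FALSE)
data2_n <- data.frame(areaid = c(1, 1, 1, 1),
                      x1 = c(1.22, 1.53, 1.21, 1.52),
                      y1 = c(1.32, 2.34, 1.37, 2.35))

k <- 2                                   # controls per case (assumed 1:2 design)
set.seed(1000)
left1 <- seq_len(nrow(data1_n))          # case rows not yet matched
left2 <- seq_len(nrow(data2_n))          # control rows not yet used
pieces <- list()
# Assumed stopping criterion: no unmatched cases left, or too few controls.
while (length(left1) > 0 && length(left2) >= k) {
  # Index into the vectors instead of calling sample(left1, 1): when left1
  # has length 1, sample(x, 1) would draw from 1:x rather than return x.
  i <- left1[sample.int(length(left1), 1)]   # one remaining case
  j <- left2[sample.int(length(left2), k)]   # k remaining controls
  pieces[[length(pieces) + 1]] <- merge(data1_n[i, ], data2_n[j, ], by = "areaid")
  left1 <- setdiff(left1, i)                 # drop used rows so they
  left2 <- setdiff(left2, j)                 # cannot be drawn again
}
matched_n <- do.call(rbind, pieces)
matched_n
```

Because used rows are removed with setdiff(), no record is drawn twice, and
when data2 has exactly k records per data1 record the loop ends after one
pass per case.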

###My question is explained in detail###

There are two datasets, data1 and data2.

######

data1<-matrix(data=c(1,1.2,1.3,"3/23/2004",
                     1,1.5,2.3,"3/22/2004",
                     2,0.2,3.3,"4/23/2004",
                     3,1.5,1.3,"5/22/2004"),nrow=4,ncol=4,byrow=TRUE)
data1<-data.frame(data1)
names(data1)<-c("areaid","x","y","date")
data1
  areaid   x   y      date
1      1 1.2 1.3 3/23/2004
2      1 1.5 2.3 3/22/2004
3      2 0.2 3.3 4/23/2004
4      3 1.5 1.3 5/22/2004

######

data2<-matrix(data=c(1,1.22,1.32,
                     1,1.53,2.34,
                     1,1.21,1.37,
                     1,1.52,2.35,
                     2,0.21,3.33,
                     2,0.23,3.35,
                     3,1.57,1.31,
                     3,1.59,1.33),nrow=8,ncol=3,byrow=TRUE)
data2<-data.frame(data2)
names(data2)<-c("areaid","x1","y1")
data2
  areaid   x1   y1
1      1 1.22 1.32
2      1 1.53 2.34
3      1 1.21 1.37
4      1 1.52 2.35
5      2 0.21 3.33
6      2 0.23 3.35
7      3 1.57 1.31
8      3 1.59 1.33

  These are the two datasets. You can treat data1 as the case dataset and
data2 as the control dataset. Note that, within each areaid, data2 has twice
as many records as data1, something like a 1:2 matched case-control study
design. I want to merge data1 and data2. Take areaid=1 as an example. From
the two datasets, we can see that data1 has two points (x,y) in areaid=1,
and data2 has four points (x1,y1) in areaid=1. Each record in data1 should
have two matched records in data2. I want to randomly select half of the
areaid=1 points in data2 to link to one areaid=1 record in data1, and the
other half to link to the other areaid=1 record in data1. In the actual
data1, the number of records in the same areaid will be more than 2; this is
only an example to explain the problem. The cases of areaid=2 and 3 are a
little easier than areaid=1 because there is only one record for each of
them in data1.
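The allocation described above (each data1 record gets half of the area's
data2 records, chosen at random) can also be done in one step per area:
shuffle the control rows once and hand them out in consecutive blocks of k.
A minimal sketch for areaid=1, assuming data2 has exactly k = 2 records per
data1 record in each area; match_area is a hypothetical helper name.

```r
# Hypothetical helper: randomly allocate the controls of ONE area to its cases,
# k controls per case. Assumes nrow(controls) == k * nrow(cases).
match_area <- function(cases, controls, k = 2) {
  stopifnot(nrow(controls) == k * nrow(cases))
  # Put the control rows in random order.
  shuffled <- controls[sample.int(nrow(controls)), , drop = FALSE]
  # Case labels 1,1,2,2,...: the first k shuffled controls go to case 1, etc.
  case_idx <- rep(seq_len(nrow(cases)), each = k)
  case_cols <- setdiff(names(cases), "areaid")   # areaid is already in controls
  out <- cbind(shuffled, cases[case_idx, case_cols, drop = FALSE])
  rownames(out) <- NULL
  out
}

# areaid = 1 from the example data
data1_1 <- data.frame(areaid = c(1, 1), x = c(1.2, 1.5), y = c(1.3, 2.3),
                      date = c("3/23/2004", "3/22/2004"),
                      stringsAsFactors = FALSE)
data2_1 <- data.frame(areaid = c(1, 1, 1, 1),
                      x1 = c(1.22, 1.53, 1.21, 1.52),
                      y1 = c(1.32, 2.34, 1.37, 2.35))
set.seed(1000)
m1 <- match_area(data1_1, data2_1)
m1
```

Shuffling once and slicing avoids the need for any stopping rule: every
control is used exactly once by construction.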

  The key or match variable is just areaid.

  The final results are something like the following dataset.

areaid    x1    y1       date    x    y
1       1.22  1.32  3/23/2004  1.2  1.3
1       1.53  2.34  3/23/2004  1.2  1.3
1       1.21  1.37  3/22/2004  1.5  2.3
1       1.52  2.35  3/22/2004  1.5  2.3
2       0.21  3.33  4/23/2004  0.2  3.3
2       0.23  3.35  4/23/2004  0.2  3.3
3       1.57  1.31  5/22/2004  1.5  1.3
3       1.59  1.33  5/22/2004  1.5  1.3
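To build a result of this shape for all areaids without an explicit loop,
one option is to number the cases within each areaid, give the controls of
each areaid the same numbers (each repeated k times) in random order, and
then merge on areaid plus that number. A sketch under the same 1:2
assumption; the helper column slot is a hypothetical name introduced here.

```r
data1 <- data.frame(areaid = c(1, 1, 2, 3),
                    x = c(1.2, 1.5, 0.2, 1.5),
                    y = c(1.3, 2.3, 3.3, 1.3),
                    date = c("3/23/2004", "3/22/2004", "4/23/2004", "5/22/2004"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(areaid = c(1, 1, 1, 1, 2, 2, 3, 3),
                    x1 = c(1.22, 1.53, 1.21, 1.52, 0.21, 0.23, 1.57, 1.59),
                    y1 = c(1.32, 2.34, 1.37, 2.35, 3.33, 3.35, 1.31, 1.33))
k <- 2   # assumed: data2 has exactly k rows per data1 row in every areaid

set.seed(1000)
# Number the cases 1, 2, ... within each areaid.
data1$slot <- ave(seq_len(nrow(data1)), data1$areaid, FUN = seq_along)
# Give the controls of each areaid the slots 1,1,2,2,... in random order,
# so each case slot is used by exactly k controls.
data2$slot <- ave(seq_len(nrow(data2)), data2$areaid, FUN = function(i) {
  s <- rep(seq_len(length(i) %/% k), each = k)
  s[sample.int(length(s))]
})
# Merging on areaid AND slot pairs each case with its k random controls.
final <- merge(data2, data1, by = c("areaid", "slot"))
final <- final[, c("areaid", "x1", "y1", "date", "x", "y")]
final
```

Rows come out sorted by areaid and slot. Which controls end up with which
case depends on the seed, so the pairing will generally differ from the table
above while keeping the same structure.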

-- 
-----------------
Jane Chang
Queen's


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
