Hello R users, I have been using R for a while now for basic stats but I'm now trying to get my head around looping scripts and in some places I am failing!
I have a data set with c. 1200 data points on 98 individual animals with data on each row representing a daily measure and I am asking the question "what variables affect the animal's behaviour?" the dataset includes these variables for analyses: presence of behaviour, absence of behaviour, site, year, rain, air temp, ID, Day Listed below as they appear in the data set: BEH_T, BEH_F, SITE, YEAR, PRECIP_MM_DAY, PUP_AGE_EST, MO_AIR_TEMP, ID2, DAY with BEH_T & BEH_F = the response variable for a binomial GLM here is the head of the dataset (NB there are only two years and two sites) BEH_T BEH_F SITE YEAR PRECIP_MM_DAY PUP_AGE_EST MO_AIR_TEMP ID2 DAY [1,] 14 10 1 2007 0 12 10.98750 1 1 [2,] 37 23 1 2007 0 13 11.47333 1 2 [3,] 56 22 1 2007 0 14 12.16667 1 3 [4,] 43 23 1 2007 0 16 10.91515 1 5 [5,] 62 16 1 2007 0 17 12.81026 1 6 [6,] 30 20 1 2007 0 19 8.67037 1 8 (Sorry the headings are skewed) Because I don't want to do too complex a model to start with (just wanting to learn first with a 'simple' model) I have issues with independence of the data as there are repeats of individuals - i.e. data taken on the same IDs on different days. So in order to account for that I have decided to random sample one data point for each ID then run the GLM on that data for x number of simulations to see if the explanatory variables are the same/similar across all models. (This will reduce my data set to 98 data points, but it is the best way I can see of doing this without doing mixed-effects models, since not all IDs are seen at both sites in both years). I am also using the MuMIn package for running all subsets of your model the code I'm using is: for (S in 1:2){ Sample.dat<-ALL.R[1,] for (I in 1:98) { tmp<-ALL.R[ALL.R$ID2==I,] max<-dim(tmp)[1] if (I==1) Sample.dat<-tmp[sample(1:max,1),] else { Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),]) m1.R<-glm(cbind(Sample.dat$BEH_T, Sample.dat$BEH_F) ~ Sample.dat$SITE + Sample.dat$YEAR + Sample.dat$PRECIP_MM_DAY + Sample.dat$PUP_AGE_EST + Sample.dat$MO_AIR_TEMP, family="binomial") mod<-dredge(m1.R)}}} At this point I have two issues if I do it manually then it seems to work i.e. gives me one output (e.g shown at bottom of post) where I then want to take the first line, the model with the best AIC using mod[1,] - no problem! However, letting the code run and for example using print ((mod[1,])) at the end it prints out the first line of 98 outputs - so I'm not too sure what I've done wrong here, but it appears to be running a model for each ID - something basic no doubt! Ideally, what I want to do is take a random sample of the data then run the model get one output for that take the top line (i.e. the best AIC) and save this, then run this routine say 100 times, saving that top line every time, then having a look at the results and take a model average. Anytime I've got close to this I have issues with overwriting the previous first line of the model selection and I can't seem to identify how to set this loop up properly. Any advice or guidance would be most appreciated, I have tried to explain my issues clearly but if more info is required please just ask, Many thanks in advance to those of you that took the time to read this! Ross Ross Culloch Ph.D. Student Durham University UK Here is an example of the model selection table from usingMuMIn: Model selection table (Intr) S.$MO_ S.$PRE S.$PUP S.$SIT S.$YEA k Dev. AIC AICc delta weight 30 645.8000 0.03841 -0.02148 0.2882 -0.3212 5 304.0 687.1 687.7 0.000 0.707 32 648.8000 0.03811 0.0009399 -0.02172 0.2857 -0.3227 6 304.0 689.0 690.0 2.249 0.230 26 785.1000 -0.02543 0.4678 -0.3905 4 312.8 693.9 694.3 6.630 0.026 31 794.2000 0.0037260 -0.02627 0.4519 -0.3950 5 312.5 695.5 696.2 8.493 0.010 22 582.7000 0.04703 0.2641 -0.2899 4 314.7 695.8 696.2 8.529 0.010 21 582.8000 0.06893 -0.01967 -0.2899 4 314.9 696.0 696.4 8.717 0.009 29 573.1000 0.04787 -0.0039980 0.2762 -0.2851 5 314.3 697.4 698.0 10.330 0.004 28 600.1000 0.06612 0.0046710 -0.02092 -0.2985 5 314.4 697.4 698.1 10.370 0.004 20 0.7526 0.05509 -0.01808 0.2450 4 321.0 702.0 702.5 14.770 0.000 10 530.4000 0.07447 -0.2639 3 324.0 703.1 703.3 15.640 0.000 27 0.7493 0.05556 -0.0022820 -0.01753 0.2519 5 320.8 703.9 704.6 16.850 0.000 19 530.0000 0.07455 -0.0001489 -0.2637 4 324.0 705.1 705.5 17.820 0.000 16 743.4000 0.4875 -0.3698 3 328.7 707.8 708.0 20.310 0.000 9 0.5512 0.06094 0.2286 3 328.8 707.9 708.1 20.430 0.000 8 0.6828 0.08019 -0.01688 3 328.9 708.0 708.2 20.540 0.000 18 0.5584 0.06173 -0.0059840 0.2481 4 327.8 708.9 709.3 21.620 0.000 25 739.9000 -0.0016930 0.4944 -0.3681 4 328.6 709.7 710.1 22.410 0.000 17 0.6856 0.07953 0.0012680 -0.01720 4 328.9 709.9 710.4 22.670 0.000 2 0.4985 0.08406 2 335.8 712.8 713.0 25.270 0.000 7 0.4996 0.08516 -0.0023780 3 335.6 714.7 714.9 27.240 0.000 14 1.0760 -0.02288 0.5151 3 340.8 719.9 720.1 32.420 0.000 23 1.0760 0.0003492 -0.02296 0.5136 4 340.8 721.9 722.3 34.590 0.000 5 0.8587 0.5304 2 354.0 731.0 731.1 43.440 0.000 12 0.8663 -0.0042170 0.5473 3 353.5 732.5 732.8 45.070 0.000 24 967.8000 0.0198500 -0.03274 -0.4813 4 358.3 739.4 739.8 52.140 0.000 15 942.8000 -0.02909 -0.4689 3 370.5 749.5 749.8 62.090 0.000 13 915.8000 0.0151200 -0.4556 3 384.7 763.7 764.0 76.290 0.000 6 900.3000 -0.4478 2 391.8 768.9 769.0 81.320 0.000 11 1.3530 0.0176300 -0.02957 3 402.3 781.3 781.6 93.890 0.000 4 1.3940 -0.02630 2 412.3 789.4 789.5 101.800 0.000 3 1.1010 0.0134300 2 424.4 801.4 801.6 113.800 0.000 1 1.1550 1 430.3 805.4 805.4 117.700 0.000 > -- View this message in context: http://n4.nabble.com/Loop-overwrite-and-data-output-problems-tp1570593p1570593.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.