[R] Loop overwrite and data output problems
Hello R users,

I have been using R for a while now for basic stats, but I'm now trying to get my head around looping scripts and in some places I am failing!

I have a data set with c. 1200 data points on 98 individual animals, with each row representing a daily measure, and I am asking the question "what variables affect the animal's behaviour?" The data set includes these variables for analysis: presence of behaviour, absence of behaviour, site, year, rain, estimated pup age, air temp, ID, day.

Listed below as they appear in the data set: BEH_T, BEH_F, SITE, YEAR, PRECIP_MM_DAY, PUP_AGE_EST, MO_AIR_TEMP, ID2, DAY, with BEH_T & BEH_F = the response variable for a binomial GLM.

Here is the head of the data set (NB there are only two years and two sites):

     BEH_T BEH_F SITE YEAR PRECIP_MM_DAY PUP_AGE_EST MO_AIR_TEMP ID2 DAY
[1,]    14    10    1 2007             0          12    10.98750   1   1
[2,]    37    23    1 2007             0          13    11.47333   1   2
[3,]    56    22    1 2007             0          14    12.16667   1   3
[4,]    43    23    1 2007             0          16    10.91515   1   5
[5,]    62    16    1 2007             0          17    12.81026   1   6
[6,]    30    20    1 2007             0          19     8.67037   1   8

I don't want to fit too complex a model to start with (I just want to learn first with a 'simple' model), but I have issues with independence of the data because there are repeats of individuals, i.e. data taken on the same IDs on different days. To account for that I have decided to randomly sample one data point for each ID, run the GLM on that sample, and repeat this for x number of simulations to see if the explanatory variables are the same/similar across all models. (This will reduce my data set to 98 data points, but it is the best way I can see of doing this without using mixed-effects models, since not all IDs are seen at both sites in both years.)
I am also using the MuMIn package for running all subsets of the model. The code I'm using is:

for (S in 1:2){
  Sample.dat<-ALL.R[1,]
  for (I in 1:98) {
    tmp<-ALL.R[ALL.R$ID2==I,]
    max<-dim(tmp)[1]
    if (I==1) Sample.dat<-tmp[sample(1:max,1),] else {
      Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),])
      m1.R<-glm(cbind(Sample.dat$BEH_T, Sample.dat$BEH_F) ~ Sample.dat$SITE + Sample.dat$YEAR + Sample.dat$PRECIP_MM_DAY + Sample.dat$PUP_AGE_EST + Sample.dat$MO_AIR_TEMP, family="binomial")
      mod<-dredge(m1.R)}}}

At this point I have two issues. If I do it manually then it seems to work, i.e. it gives me one output (an example is shown at the bottom of this post), from which I take the first line, the model with the best AIC, using mod[1,] - no problem! However, letting the code run and, for example, using print((mod[1,])) at the end prints out the first line of 98 outputs. I'm not too sure what I've done wrong here, but it appears to be running a model for each ID - something basic no doubt!

Ideally, what I want to do is take a random sample of the data, run the model, get one output for that, take the top line (i.e. the best AIC) and save it, then run this routine say 100 times, saving that top line every time, and finally have a look at the results and take a model average. Any time I've got close to this I have had issues with overwriting the previous first line of the model selection, and I can't seem to identify how to set this loop up properly.

Any advice or guidance would be most appreciated. I have tried to explain my issues clearly, but if more info is required please just ask.

Many thanks in advance to those of you that took the time to read this!

Ross

Ross Culloch
Ph.D. Student
Durham University
UK

Here is an example of the model selection table from using MuMIn:

Model selection table
   (Intr)   S.$MO_  S.$PRE     S.$PUP   S.$SIT S.$YEA  k Dev.  AIC   AICc  delta  weight
30 645.8000 0.03841            -0.02148 0.2882 -0.3212 5 304.0 687.1 687.7  0.000 0.707
32 648.8000 0.03811  0.0009399 -0.02172 0.2857 -0.3227 6 304.0 689.0 690.0  2.249 0.230
26 785.1000                    -0.02543 0.4678 -0.3905 4 312.8 693.9 694.3  6.630 0.026
31 794.2000          0.0037260 -0.02627 0.4519 -0.3950 5 312.5 695.5 696.2  8.493 0.010
22 582.7000 0.04703                     0.2641 -0.2899 4 314.7 695.8 696.2  8.529 0.010
21 582.8000 0.06893            -0.01967        -0.2899 4 314.9 696.0 696.4  8.717 0.009
29 573.1000 0.04787 -0.0039980          0.2762 -0.2851 5 314.3 697.4 698.0 10.330 0.004
28 600.1000 0.06612  0.0046710 -0.02092        -0.2985 5 314.4 697.4 698.1 10.370 0.004
20   0.7526 0.05509            -0.01808 0.2450         4 321.0 702.0 702.5 14.770 0.000
10 530.4000 0.07447                            -0.2639 3 324.0 703.1 703.3 15.640 0.000
27   0.7493 0.05556 -0.0022820 -0.01753 0.2519         5 320.8 703.9 704.6 16.850 0.000
19 530.     0.07455 -0.0001489
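One reading of the symptom described in this post is that the glm() and dredge() calls sit inside the inner loop over IDs, so a model is fitted every time a row is added to the sample. A minimal sketch of the intended structure follows: build the complete one-row-per-ID sample first, fit once per simulation, and store one summary row per simulation. The toy data, the reduced formula and the use of AIC() in place of dredge(m1.R)[1, ] are all assumptions made so the sketch runs without MuMIn; only the column names come from the post.

```r
# Toy stand-in for ALL.R (hypothetical data; only the column names follow the post)
set.seed(1)
n_id <- 10
ALL.R <- data.frame(
  ID2 = rep(seq_len(n_id), each = 5),
  DAY = rep(1:5, times = n_id),
  SITE = rep(c(0, 1), each = 5, length.out = n_id * 5),
  MO_AIR_TEMP = runif(n_id * 5, 6, 13)
)
ALL.R$BEH_T <- rbinom(nrow(ALL.R), 30, 0.3)
ALL.R$BEH_F <- 30 - ALL.R$BEH_T

n_sims <- 5
best <- vector("list", n_sims)          # one slot per simulation

for (S in seq_len(n_sims)) {
  # 1. draw one random row index per individual (no model fitting at this step)
  rows <- sapply(split(seq_len(nrow(ALL.R)), ALL.R$ID2),
                 function(r) r[sample.int(length(r), 1)])
  Sample.dat <- ALL.R[rows, ]

  # 2. fit the model once per simulation, on the complete sample
  m1 <- glm(cbind(BEH_T, BEH_F) ~ SITE + MO_AIR_TEMP,
            family = binomial, data = Sample.dat)

  # 3. store a one-row summary; with MuMIn, dredge(m1)[1, ] could be stored here instead
  best[[S]] <- data.frame(sim = S, t(coef(m1)), AIC = AIC(m1))
}

results <- do.call(rbind, best)         # n_sims rows, none overwritten
```

Passing data = Sample.dat to glm(), rather than writing Sample.dat$ in front of every term, is a design choice of this sketch, not something taken from the thread.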
Re: [R] Loop overwrite and data output problems
Hi Ivan,

Thanks for your post, I really appreciate the time you've taken over my problem!

if (I==1) Sample.dat<-tmp[sample(1:max,1),] else {
Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),])

This part of the script works. I appreciate that it may not be the best option and I'm perhaps papering over the cracks, but I did try your method and it didn't seem to work - I am 100% sure that it is my fault! Most likely it is due to the Sample.dat <- list() command you suggest; I'm not sure if you mean Sample.dat <- list(ALL.R[1,])? But that doesn't work either.

It seems like you have the correct answer though, with respect to the 'store your line in the Ith element of the list' comment, which is exactly what I want to do. So after the model:

m1.R<-glm(cbind(Sample.dat$BEH_T, Sample.dat$BEH_F) ~ Sample.dat$SITE + Sample.dat$YEAR + Sample.dat$PRECIP_MM_DAY + Sample.dat$PUP_AGE_EST + Sample.dat$MO_AIR_TEMP, family="binomial")
mod<-dredge(m1.R)

I want a similar command that will store the first line of each model output. When I use similar if and else commands I can't get it to work; they just overwrite the data because I can't see where to set the variable to avoid this. For example, I want to take mod[1,], so I could follow the above script with:

Line<-mod[1,]
if (S==1) Line else {Line<-rbind(Line, mod)}

but because I can't work out where to place this within the loop, Line is reassigned on every pass and the previous result is overwritten.

Sorry, I appreciate that I'm not explaining this very well.

Best wishes,

Ross

--
View this message in context: http://n4.nabble.com/Loop-overwrite-and-data-output-problems-tp1570593p1573391.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
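The overwriting described in this message comes from assigning Line inside the loop body, so each pass starts again from scratch. A minimal sketch of the 'store your line in the Ith element of the list' idea is given below. The function top_row() is a hypothetical stand-in for "sample the data, fit the model, dredge, take mod[1, ]" and its numbers are invented.

```r
# Hypothetical stand-in for one pass of sampling, fitting and taking mod[1, ]
top_row <- function(S) data.frame(sim = S, AIC = 680 + S, weight = 1 / S)

Line <- vector("list", 3)        # created ONCE, before the loop starts
for (S in 1:3) {
  Line[[S]] <- top_row(S)        # fills slot S; earlier slots are left alone
}
all_lines <- do.call(rbind, Line)  # stack the stored rows after the loop
```

The key points are that the container is created outside the loop and that each pass writes to its own slot, so no if/else on S is needed.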
Re: [R] Odp: Loop overwrite and data output problems
Hi Petr, Thank you for your post - I really appreciate you taking the time over my problem. Apologies for not posting more data, it is just that the data set is rather large, and I don't like posting the whole thing on the website for that reason. I have managed to random sample the 98 individuals so that I effectively get 98 data points from the data set, I do this with the script below, which I appreciate may not be the best way to do it: for (S in 1:1){ Sample.dat<-ALL.R[1,] for (I in 1:98){ tmp<-ALL.R[ALL.R$ID2==I,] max<-dim(tmp)[1] if (I==1) Sample.dat[1,]<-tmp[sample(1:max,1),] else { Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),]) Here is the first section of the output for Sample.dat, some columns are not used in the analysis and the majority of the variables are not shown here, but what you can see is that the script has taken one data point from ID2 1:98 (columns are a bit skewed and ID2 appears under DAY because the numeric for the rows is inc. in the e.g.) SITE_NAME SITE YEAR NAME DAY ID2 n_DAY BEH_F BEH_T 1 NR1 2007 A1 3 1782256 31 NR1 2007 A2 1 224 123 80 NR1 2007 D2 23 336 234 106 NR1 2007 D5 19 434 925 136 NR1 2007 E2 19 5361224 160 NR1 2007 F10 13 648 444 193 NR1 2007 F4 16 736 531 222 NR1 2007 F8 15 847 839 263 NR1 2007 G3 26 930 624 292 NR1 2007 G4 25 1030 822 317 NR1 2007 G5 20 1136 630 339 NR1 2007 H1 12 12421131 370 NR1 2007 I1 13 13481632 411 NR1 2007 I4 24 1412 210 433 NR1 2007 J1 16 15361125 477 NR1 2007 J2 30 1636 135 500 NR1 2007 K1 23 1733 627 537 NR1 2007 K4 30 1836 432 567 NR1 2007 L1 30 19361224 592 NR1 2007 L2 25 2030 921 614 NR1 2007 M2 17 2136 432 644 NR1 2007 M3 17 2224 420 688 NR1 2007 M4 31 2336 333 707 NR1 2007 N1 20 2436 432 741 NR1 2007 N4 24 25 1 0 1 776 NR1 2007 P1 29 26361026 804 NR1 2007 R1 27 2718 018 836 NR1 2007 R4 29 2836 333 862 NR1 2007 S1 25 29301119 897 NR1 2007 S4 30 3036 333 911 NR1 2008 A1 11 31 1102486 930 NR1 2008 A2 3 32 1143480 1159NR1 2008 A3 16 33 1151996 1178NR1 2008 A4 8 34 1212992 1205NR1 2008 A5 8 
35621943 1246NR1 2008 B1 22 36 1053966 1258NR1 2008 C1 7 37491237 1289NR1 2008 C3 11 38 1214081 1328NR1 2008 D1 23 39351223 1354NR1 2008 F1 22 40 1093178 1377NR1 2008 G1 18 41 1112091 1400NR1 2008 G2 14 42 11515 100 978 NR1 2008 H1 24 43912368 1438NR1 2008 H2 25 44911873 1003NR1 2008 I1 22 45 1092881 1452NR1 2008 I2 12 46301119 1491NR1 2008 I3 24 47912467 1025NR1 2008 I4 17 4834 925 1059NR1 2008 J1 24 49911675 1512NR1 2008 J3 18 50922171 1535NR1 2008 J4 14 5144 242 1564NR1 2008 J6 16 52 11513 102 1080NR1 2008 K1 18 53 111 4 107 1595NR1 2008 K2 20 54411229 1620NR1 2008 K3 18 5527 720 1104NR1 2008 L2 15 5648 939 1650NR1 2008 L4 21 57 1153382 1677NR1 2008 L5 21 58 1112487 1143NR1 2008 N1 28 59752451 1701NR1 2008 N3 18 60 1071889 1735NR1 2008 NNB 25 61911180 1757NR1 2008 O1 20 6220 911 2002FA0 2008 A10 8 63952867 2006FA0 2008 A11 2 64461432 2020FA0 2008 A12 6 65973067 2026FA0 2008 A13 2 66883355 2038FA0 2008 A14 4 67973265 2049FA0 2008 A15 5 68923458 2055FA0 2008 A16 1 6920 515 1888FA0 200
Re: [R] Odp: Loop overwrite and data output problems
Hi Pter, No doubt! I have put a very short form of the data set on the email - it is basically 2 data points from each individual, which should be enough to get an idea of where I'm going wrong.hopefully! I can send this as a .csv if you prefer? Cheers, Ross SITE_NAME SITEYEARNAMEDAY ID2 n_DAY BEH_T BEH_F DATEMO_AIR_TEMP PRECIP_MM_DAY DAY_PUPPED_EST DAY_LEAVE_EST PUP_AGE_EST NR 1 2007A1 3 1 78 22 56 02/10/2007 12.1667 0 -11 10 14 NR 1 2007A2 2 2 60 10 50 01/10/2007 11.4733 0 -10 12 12 NR 1 2007D2 20 3 36 11 25 19/10/2007 11.4083 0 5 25 16 NR 1 2007D5 12 4 42 15 27 11/10/2007 11.0667 4 5 23 8 NR 1 2007E2 22 5 28 9 19 21/10/2007 11.5667 0 8 24 15 NR 1 2007F10 14 6 33 4 29 13/10/2007 12.34545455 0 -12 15 26 NR 1 2007F4 9 7 60 4 56 08/10/2007 10.0133 0 8 27 2 NR 1 2007F8 9 8 60 23 37 08/10/2007 10.0133 0 8 33 2 NR 1 2007G3 19 9 36 3 33 18/10/2007 11.0917 0 12 30 8 NR 1 2007G4 12 10 42 5 37 11/10/2007 11.0667 4 10 26 3 NR 1 2007G5 9 11 12 3 9 08/10/2007 10.0133 0 9 26 1 NR 1 2007H1 19 12 35 9 26 18/10/2007 11.0917 0 10 30 10 NR 1 2007I1 29 13 36 9 27 28/10/2007 9.34722 8 12 31 18 NR 1 2007I4 17 14 36 5 31 16/10/2007 9.61944 9.5 12 29 6 NR 1 2007J1 30 15 36 14 22 29/10/2007 6.53889 8 14 34 17 NR 1 2007J2 24 16 12 0 12 23/10/2007 11.8167 2 13 34 12 NR 1 2007K1 29 17 36 10 26 28/10/2007 9.34722 8 16 32 14 NR 1 2007K4 27 18 12 2 10 26/10/2007 10.525 13 16 34 12 NR 1 2007L1 18 19 36 13 23 17/10/2007 7.8 8 16 34 3 NR 1 2007L2 24 20 12 0 12 23/10/2007 11.8167 2 16 33 9 NR 1 2007M2 18 21 36 7 29 17/10/2007 7.8 8 17 35 2 NR 1 2007M3 23 22 33 4 29 22/10/2007 11.6556 14 17 35 7 NR 1 2007M4 18 23 25 5 20 17/10/2007 7.8 8 17 35 2 NR 1 2007N1 19 24 36 4 32 18/10/2007 11.0917 0 18 36 2 NR 1 2007N4 29 25 36 4 32 28/10/2007 9.34722 8 18 30 12 NR 1 2007P1 20 26 18 7 11 19/10/2007 11.4083 0 20 38 1 NR 1 2007R1 32 27 36 3 33 31/10/2007 12.0111 18 23 41 10 NR 1 2007R4 31 28 36 11 25 30/10/2007 8.87778 4.5 27 45 5 NR 1 2007S1 27 29 24 4 20 26/10/2007 10.525 13 24 42 4 NR 1 2007S4 27 30 24 5 
19 26/10/2007 10.525 13 25 43 3 NR 1 2008A1 16 31 112 35 77 15/10/2008 9.052.7 1 19 16 NR 1 2008A2 3 32 114 34 80 02/10/2008 8.1 5.5 -15 4 18 NR 1 2008A3 3 33 73 6 67 02/10/2008 8.1 5.5 3 21 1 NR 1 2008A4 9 34 107 15 92 08/10/2008 10.80 -6 12 15 NR 1 2008A5 5 35 16 8 8 04/10/2008 5.490909091 14.52 19 4 NR 1 2008
Re: [R] Odp: Loop overwrite and data output problems
Hi Petr,

Thanks again for trying again with these data, I really appreciate it. Your script works perfectly, but the problem I'm having is how to store the model results. After your script I would do:

m1.R<-glm(cbind(res$BEH_T, res$BEH_F) ~ res$SITE + res$YEAR + res$PRECIP_MM_DAY + res$PUP_AGE_EST + res$MO_AIR_TEMP, family="binomial")
mod<-dredge(m1.R)

where mod is a list, not a vector. Your example has 10 iterations of the loop, so there should be 10 different mod[1,] that I want to store, and that is what I can't work out how to do. For example, I can do this:

if (i>=1) print (mod[1,]) else print ("NO")}

and I will get a print of each of the 10 model outputs that I want, but I want to store these somewhere. I did try to adjust the value <- matrix section of your script but had no luck.

I hope this is a little clearer?

Thank you again for your help, I really appreciate it!

Ross
Re: [R] Odp: Loop overwrite and data output problems
Hi Petr,

Thanks again for your post. The problem is now solved - thank you so much for trying and trying to get this to work. The final script that actually worked was:

##ALL SUBSET DATA
#Create vector to put data in
mod <- vector(1000,mode="list")
#first order your data according to ID2
dat.o<-ALL.R[order(ALL.R$ID2),]
#how many values are in each ID2 and a breakpoint for each ID2
len<-rle(dat.o$ID2)$lengths
shift.len<-c(0,cumsum(len))[-(length(len)+1)]
for(i in 1:1000) {
  samp<-sapply(lapply(split(dat.o$ID2, dat.o$ID2), function (x) 1:length (x)), sample, 1)
  Sample.dat<-dat.o[shift.len+samp,]
  m1.R<-glm(cbind(Sample.dat$BEH_T, Sample.dat$BEH_F) ~ Sample.dat$SITE + Sample.dat$YEAR + Sample.dat$PRECIP_MM_DAY + Sample.dat$PUP_AGE_EST + Sample.dat$MO_AIR_TEMP, family="binomial")
  model<-dredge(m1.R)
  mod[[i]]<-do.call("rbind", model[1,])}
write.table(mod, "/FILE_PATH/test.txt", col.names=T, row.names=F, sep = "\t")

Then, with the file written out, I could open it in Excel, transpose the data and type in the column and row names: a little bit of manual labour, c. 3 mins, but worth it!

Really, really appreciate your help with this Petr. I know I wasn't too clear from the start, but I wasn't entirely sure what the problem was myself!

Best wishes,

Ross
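The row-picking step in the script above relies on the data being sorted by ID2: shift.len holds the number of rows that come before each ID's block, and samp holds a random position within each block, so shift.len + samp is one absolute row number per ID. A small sketch with a hypothetical six-row data frame shows the arithmetic (sample.int() is used here in place of the original sapply/lapply/sample chain).

```r
# Hypothetical sorted data: ID 1 has 3 rows, ID 2 has 2 rows, ID 3 has 1 row
dat.o <- data.frame(ID2 = c(1, 1, 1, 2, 2, 3),
                    val = c("a", "b", "c", "d", "e", "f"))

len <- rle(dat.o$ID2)$lengths                       # rows per ID: 3 2 1
shift.len <- c(0, cumsum(len))[-(length(len) + 1)]  # rows before each block: 0 3 5

set.seed(42)
samp <- sapply(len, function(n) sample.int(n, 1))   # random position within each block
picked <- dat.o[shift.len + samp, ]                 # exactly one row per ID
```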
Re: [R] Odp: Loop overwrite and data output problems
Hi Petr,

Thanks again!!! model is a list, so your suggestion:

mod <- matrix(NA, 1000, ncols)

doesn't work. I thought that do.call and rbind would be the best for these data?

Cheers,

Ross
[R] Intra-Class correlation psych package missing data
Hello R users, and perhaps William Revelle in particular,

I'm curious as to how ICC deals with missing data. For example, you are sampling individuals over set periods in time and one individual is missing or was not recaptured at a given time point, leading to NA in the data set. My thought was that it should then omit data by individual, but I'm not convinced that that is what it is doing.

Does anyone know? I have looked at ?ICC but there is no information there. Apologies if I have missed it in any other help file; I have looked, but to no avail!

Thanks in advance,

Ross
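This message does not establish what ICC() does with NA values, and the sketch below makes no claim about it. One way to make the handling explicit, whatever the function does internally, is to drop every individual with a missing occasion before the call, using base R's complete.cases(). The ratings data frame here is hypothetical (rows are individuals, columns are sampling occasions).

```r
# Hypothetical data: rows = individuals, columns = sampling occasions
ratings <- data.frame(t1 = c(4.1, 3.8, 5.0, 4.4),
                      t2 = c(4.3, NA,  5.2, 4.0),
                      t3 = c(4.0, 3.9, 4.9, NA))

keep <- complete.cases(ratings)   # TRUE only where an individual has no NA
ratings_cc <- ratings[keep, ]     # omit data by individual (listwise deletion)

# ratings_cc could then be passed to the ICC function;
# comparing with the result on `ratings` would show whether the NA rows mattered.
```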
[R] Simple loop code
Hi fellow R Users,

I find that I typically rewrite my code for each column of data, which is by no means efficient, and I am struggling to break out of this bad habit and utilise some of the excellent things R can do! I have tried to look at 'for' but I don't really follow it, and I wondered if anyone could help with a simple example using my script so I could follow it and build on it. For example, I want to change an ID code from alphanumeric to numeric. The example below works, but takes ages to do manually, given I have a lot of IDs!

Any thoughts on how to create a loop to go through each ID and give it a unique number would be most welcome!

Cheers,

Ross

levels(dat.ID$ID2)[levels(dat.ID$ID2)=='A1']<-1
levels(dat.ID$ID2)[levels(dat.ID$ID2)=='A2']<-2
levels(dat.ID$ID2)[levels(dat.ID$ID2)=='D1']<-3
levels(dat.ID$ID2)[levels(dat.ID$ID2)=='D2']<-4
levels(dat.ID$ID2)[levels(dat.ID$ID2)=='D4']<-5
levels(dat.ID$ID2)[levels(dat.ID$ID2)=='D5']<-6
levels(dat.ID$ID2)[levels(dat.ID$ID2)=='D6']<-7
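A factor already stores each level as an integer code, so the manual relabelling above can be done without any loop. The sketch below assumes ID2 is a factor, as the levels() calls in the post suggest. The data values and the names ID_num and key are hypothetical.

```r
# Hypothetical IDs; in the post these live in dat.ID$ID2
dat.ID <- data.frame(ID2 = factor(c("A1", "A2", "D1", "A1", "D2", "D1")))

# Each level gets a unique integer, in the sorted order of the levels
dat.ID$ID_num <- as.integer(dat.ID$ID2)

# Lookup table showing which number belongs to which original code
key <- data.frame(ID2 = levels(dat.ID$ID2),
                  ID_num = seq_along(levels(dat.ID$ID2)))
```

Keeping the original ID2 column next to the new numeric one, rather than overwriting the levels, makes it easy to check the mapping afterwards.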
Re: [R] Simple loop code
Thanks Henrique, that works! For anyone else as slow as me, just:

##Assign
x <- factor(dat.ID$ID2, labels = 1:7)
##Convert to dataframe
x <- as.data.frame(x)
##Then bind to your data
z <- cbind(y,x)

Thanks again, I expected it to be more complicated!

Cheers,

Ross
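Two details of the steps above may be worth a sketch. First, factor(..., labels = 1:7) returns a factor whose labels merely look like numbers. Second, a new column can be assigned straight into the data frame without as.data.frame() and cbind(). The data frame y and the column name ID_num below are hypothetical, and the example uses three IDs rather than seven.

```r
# Hypothetical data frame standing in for y in the post
y <- data.frame(ID2 = c("A1", "A2", "D1", "A1"), n = c(5, 3, 8, 2))

x <- factor(y$ID2, labels = 1:3)          # still a factor; its labels are "1", "2", "3"
y$ID_num <- as.integer(as.character(x))   # real integers, added directly as a column
```

Converting through as.character() turns the labels themselves into numbers, which is the safe route when the labels are not simply 1 to the number of levels.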
Re: [R] Simple loop code
Thanks David & Henrique, I've been using R for over two years and always used cbind or rbind; that was what I was taught by several folk, and on training courses. You learn something new every day!

Cheers,

Ross
[R] Boxplot intervals combining names
Hi R users,

This seems like a simple problem, but I have searched Nabble for the answer and can't seem to find it. All I want to do is produce a boxplot where I have two boxes for each individual, but on the x axis I only have one tick mark, centred between the two boxes, so I can add the individual's name. I have 30 IDs and have shown the code I use below for a couple of IDs. I figure the data are not important here, so they are not included.

boxplot (ID1[,8],ID1[,9],ID2[,8],ID2[,9],xaxt='n')

I have put all the ID names in as 'names1' and I have tried numerous variations on axis, e.g.

axis(1,at=1:30,labels=names1)

but nothing works: the boxplot appears to 'know' that there are 60 tick marks (data) and therefore only puts ticks half way along the axis, and using:

axis(1,at=1:30,labels=names1)

complains that there is a difference of length, which of course there is! I must be missing something simple here, but any suggestions would be gratefully received.

Ross
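Boxes drawn by boxplot() sit at x = 1, 2, 3, ..., so the midpoint of each pair of boxes is at 1.5, 3.5, 5.5 and so on. A minimal sketch of placing one label per individual at those midpoints follows. The data are random, three IDs stand in for the 30 in the post, and the pdf(NULL) / dev.off() lines are only there so the sketch runs without opening a plot window.

```r
set.seed(1)
n_id   <- 3
names1 <- c("A1", "A2", "D2")                               # hypothetical ID labels
boxes  <- replicate(2 * n_id, rnorm(20), simplify = FALSE)  # two boxes per ID

# One tick per individual, centred between its two boxes
tick_at <- seq(1.5, by = 2, length.out = n_id)

pdf(NULL)                          # off-screen device; drop for interactive use
boxplot(boxes, xaxt = "n")         # draw the 2 * n_id boxes without an x axis
axis(1, at = tick_at, labels = names1)
dev.off()
```

With 30 individuals the same seq() call gives 30 positions, so at and labels have matching lengths.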
Re: [R] Kite diagrams
Hi Par,

I am trying to do the exact same thing with my class. I would like to use R too, as well as get them to draw it out. I have tried to follow the suggestions but with no luck. If you did get round to sorting the code, I wondered if you'd be so kind as to let me into the secret of how to do it?!

Best wishes,

Ross
[R] grep problem decimal points looping
Hi R Users, I have been trying to work out how to rename column names using grep, basically I have generated these column names using tapply: [1] "NAME" "X1.1" "X2.1" "X3.1" "X4.1" "X5.1" "X6.1" "X7.1" "X8.1" [10] "X1.2" "X2.2" "X3.2" "X4.2" "X5.2" "X6.2" "X7.2" "X8.2" "X1.3" [19] "X2.3" "X3.3" "X4.3" "X5.3" "X6.3" "X7.3" "X8.3" "X1.5" "X2.5" [28] "X3.5" "X4.5" "X5.5" "X6.5" "X7.5" "X8.5" "X1.6" "X2.6" "X3.6" [37] "X4.6" "X5.6" "X6.6" "X7.6" "X8.6" "X1.8" "X2.8" "X3.8" "X4.8" [46] "X5.8" "X6.8" "X7.8" "X8.8" "X1.9" "X2.9" "X3.9" "X4.9" "X5.9" [55] "X6.9" "X7.9" "X8.9" "X1.10" "X2.10" "X3.10" "X4.10" "X5.10" "X6.10" [64] "X7.10" "X8.10" "X1.12" "X2.12" "X3.12" "X4.12" "X5.12" "X6.12" "X7.12" [73] "X8.12" "X1.13" "X2.13" "X3.13" "X4.13" "X5.13" "X6.13" "X7.13" "X8.13" [82] "X1.14" "X2.14" "X3.14" "X4.14" "X5.14" "X6.14" "X7.14" "X8.14" "X1.15" [91] "X2.15" "X3.15" "X4.15" "X5.15" "X6.15" "X7.15" "X8.15" "X1.16" "X2.16" [100] "X3.16" "X4.16" "X5.16" "X6.16" "X7.16" "X8.16" "X1.17" "X2.17" "X3.17" [109] "X4.17" "X5.17" "X6.17" "X7.17" "X8.17" "X1.18" "X2.18" "X3.18" "X4.18" [118] "X5.18" "X6.18" "X7.18" "X8.18" "X1.19" "X2.19" "X3.19" "X4.19" "X5.19" [127] "X6.19" "X7.19" "X8.19" "X1.20" "X2.20" "X3.20" "X4.20" "X5.20" "X6.20" [136] "X7.20" "X8.20" "X1.21" "X2.21" "X3.21" "X4.21" "X5.21" "X6.21" "X7.21" [145] "X8.21" "X1.22" "X2.22" "X3.22" "X4.22" "X5.22" "X6.22" "X7.22" "X8.22" [154] "X1.23" "X2.23" "X3.23" "X4.23" "X5.23" "X6.23" "X7.23" "X8.23" "X1.24" [163] "X2.24" "X3.24" "X4.24" "X5.24" "X6.24" "X7.24" "X8.24" "X1.25" "X2.25" [172] "X3.25" "X4.25" "X5.25" "X6.25" "X7.25" "X8.25" "X1.26" "X2.26" "X3.26" [181] "X4.26" "X5.26" "X6.26" "X7.26" "X8.26" "X1.27" "X2.27" "X3.27" "X4.27" [190] "X5.27" "X6.27" "X7.27" "X8.27" "X1.28" "X2.28" "X3.28" "X4.28" "X5.28" [199] "X6.28" "X7.28" "X8.28" "X1.29" "X2.29" "X3.29" "X4.29" "X5.29" "X6.29" [208] "X7.29" "X8.29" "X1.30" "X2.30" "X3.30" "X4.30" "X5.30" "X6.30" "X7.30" [217] "X8.30" "X1.31" "X2.31" "X3.31" "X4.31" "X5.31" 
"X6.31" "X7.31" "X8.31" [226] "X1.32" "X2.32" "X3.32" "X4.32" "X5.32" "X6.32" "X7.32" "X8.32" "X1.33" [235] "X2.33" "X3.33" "X4.33" "X5.33" "X6.33" "X7.33" "X8.33" What the names mean are behaviour.day the X is not important to the data, it is the numbers I am trying to select on. So I want to split the data by day i.e. selecting for the number after the decimal. I am using this code (where scananal is the data) with out looping so the number following the decimal I change manually (NB the data have been changed to character): DAY <- grep("(X[[:digit:]]+).3",colnames(scananal)) However, this will select for day 3, 30, 31, 32, etc I have tried to use fixed = TRUE, but that just returns integer(0). But if I use 30, it will select only 30. Not sure what I'm doing wrong here, and I assumed that fixed = T would fix this, but doesn't. I have tried to loop this too, but with no luck, so if anyone can point me in the right direction about how to loop using grep I would be most grateful! The main problem I have is where to put the loop, for example: for(i in 1:33){ print(i) DAY[[i]] <- grep("(X[[:digit:]]+).[[i]]",colnames(scananal)) } which doesn't work, and no doubt there are obvious reasons for this! Any help would be much appreciated, All the best, Ross -- View this message in context: http://r.789695.n4.nabble.com/grep-problem-decimal-points-looping-tp2319773p2319773.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] grep problem decimal points looping
Hi David,

Thanks very much for that reply! I might be a touch out of my comfort zone, but I can see how the loop script works and where I went wrong. I'm not sure if I am asking the correct questions here, or, perhaps more accurately, whether I'm using the wrong command for the task in question - and as you say, more info would be better!

So: I want to split the data by day to look at the proportion of time an individual spent in each of the eight behaviours. There are 30 rows (i.e. individuals). I'm going over old code trying to make it better (not that it could be worse!), especially trying to make it more efficient. My old code did this (manually for each day):

##DAY1##
DAY1 <-cbind(scananal$X1.1, scananal$X2.1, scananal$X3.1, scananal$X4.1, scananal$X5.1, scananal$X6.1, scananal$X7.1, scananal$X8.1)
head(DAY1)

which would give, for example,

head(DAY1)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 140212203
[2,] 230010000
[3,]00000000
[4,]00000000
[5,]00000000
[6,]00000000

I'd run the following script to get the proportions and then bind that together with other data:

###DAY1~~~###
## CALC NSCANS PER ID ##
n <- rowSums(DAY1)
## GIVE THE DAY NUMBER TO THE DATAFILE
DAY <- rep(1,30)
## CALC PROPORTION OF TIME IN EACH ACTIVITY ##
scansprop <- as.data.frame(prop.table(DAY1,1))
head(scansprop)
##CALC AS ARC_SINE_TRANSFORMED###
transscan<-asin(scansprop)
head (transscan)
##gives column headings##
names(transscan)
##CHECK IT ALL ADDS TO ONE!! ##
rowSums(scansprop)
##MERGES ALL THE DATA FOR THE DAY
DAY1_SUM <- cbind(n,DAY,DAY1,scansprop,transscan)

Then I would merge each of the days. So this script works, but I know it is a rather poor effort at an R script, to say the least! I'm trying to work through this myself, but hit a hurdle in the first instance!

Not sure if this is any clearer?

Cheers,

Ross
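The two halves of this thread can be sketched together. In the original pattern the unescaped "." matches any character and nothing anchors the end of the name, which is why ".3" also matched days 30, 31 and so on. The literal text "[[i]]" inside a quoted pattern is also never replaced by the loop variable, so the pattern has to be built with paste0(). The sketch below uses a hypothetical three-row scananal with two behaviours and two days, and it keeps the plain asin() transform used in the post. The helper name day_cols is an assumption.

```r
set.seed(2)
scananal <- data.frame(NAME = c("A1", "A2", "D2"))
for (nm in c("X1.3", "X2.3", "X1.30", "X2.30")) {
  scananal[[nm]] <- rpois(3, 5) + 1   # +1 keeps every row total above zero
}

# Columns for one day: "\\." is a literal dot, "$" stops day 3 matching day 30
day_cols <- function(day, nms) grep(paste0("^X[[:digit:]]+\\.", day, "$"), nms)

days <- c(3, 30)
by_day <- lapply(days, function(day) {
  counts <- as.matrix(scananal[day_cols(day, names(scananal))])
  colnames(counts) <- sub("\\.[[:digit:]]+$", "", colnames(counts))  # X1.3 -> X1
  prop <- prop.table(counts, 1)       # proportion of scans in each behaviour, by row
  data.frame(NAME = scananal$NAME, n = rowSums(counts), DAY = day,
             prop, asin = asin(prop))
})
all_days <- do.call(rbind, by_day)    # one block of rows per day, stacked
```

Stripping the day suffix from the column names gives every day's block the same columns, which is what allows rbind() to stack them.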
[R] store and repeat data based on row names (loop, if statement)
Hello fellow R users,

I have an issue that has me a little confused - sorry if the subject makes little sense, I wasn't sure how to refer to this problem. I have a data set I've extracted from ArcInfo (a section is shown below). It is spatial data, showing the distance from one ID to another. I want to get the actual 'TO' ID from the data set (there is no easy way to do this in Arc, so I thought I would try in R).

The way to do this is to find the DIST = 0 row for an ID; the TO value on that row is that ID's unique 'TO' code. So if you look down the TO column, the highest number is 4, and A1 = 2, A1.1 = 1, A2 = 4, A2.1 = 3. I need to get that mapping and then put it in a new column that will basically read A1.1, A1, A2.1, A2, A1.1, A1, A2.1, A2, A1.1, A1, A2.1, A2, A1.1, A1, A2.1, A2.

If anyone has any hints or tips or places to look I would be most grateful!

Cheers,

Ross

TO  DIST     ID
1   2.63981  'A1'
2   0        'A1'
3   6.95836  'A1'
4   8.63809  'A1'
1   0        'A1.1'
2   2.63981  'A1.1'
3   8.03071  'A1.1'
4   8.90896  'A1.1'
1   8.90896  'A2'
2   8.63809  'A2'
3   2.85602  'A2'
4   0        'A2'
1   8.03071  'A2.1'
2   6.95836  'A2.1'
3   0        'A2.1'
4   2.85602  'A2.1'
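The lookup described in this post can be sketched with match(): the rows where DIST is 0 form a small table linking each TO code to its ID, and every row's TO value is then looked up in that table. The data below are the sixteen rows from the post. The object names d and indx and the column name TO_ID are choices made for this sketch.

```r
d <- data.frame(
  TO   = rep(1:4, times = 4),
  DIST = c(2.63981, 0,       6.95836, 8.63809,
           0,       2.63981, 8.03071, 8.90896,
           8.90896, 8.63809, 2.85602, 0,
           8.03071, 6.95836, 0,       2.85602),
  ID   = rep(c("A1", "A1.1", "A2", "A2.1"), each = 4)
)

indx <- d[d$DIST == 0, ]                   # one row per ID: its own TO code
d$TO_ID <- indx$ID[match(d$TO, indx$TO)]   # translate every TO code into an ID
```

This assumes each ID has exactly one row with DIST equal to 0, as in the section of data shown.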
Re: [R] store and repeat data based on row names (loop, if statement)
Hi Jim,

Many thanks - that has worked perfectly, thanks so much for your help!

Best wishes,

Ross
[R] storing output data from a loop that has varying row numbers
Hi All,

I am trying to run a loop whose output will have a varying number of rows on each pass. Previously I have had the same number of rows each time, so I would use (and I appreciate that this will no doubt draw some gasps as being thoroughly inefficient!):

xdfrow<-(0)
xdfrow1<-(1:32)
xdfrow2<-(33:64)
xdfrow3<-(65:96)
xdfrow4<-(97:128)
xdfrow5<-(129:160)
xdfrow6<-(161:192)
xdfrow7<-(193:224)

and so on

xdf <- matrix(999, nrow=1024, ncol=7)
xdf <- as.data.frame(xdf)
NAM <- c("NAME","ID2","DAY","BEH", "B_FALSE", "B_TRUE","TOTAL")
colnames(xdf)<-NAM

I then run the loop and assign the data to each of the xdfrows in turn, just doing +1 on each loop. (If that makes sense? Not really important, just trying to show that I do try to solve some of my own problems, albeit perhaps not in the best manner!)

However, the data I'm working with now have a very varied number of rows (0:2500) over a large data set, and I can't work out how best to do this. My loop would be:

for (i in 1:33){
  SEL_DAY<-seal_dist[seal_dist[,10]==i,]
  print(paste("DAY", i, "of 33"))
  for (s in 1:11){
    SEL_HR<-SEL_DAY[SEL_DAY[,5]==s,]
    print(paste("HR", s, "of 11"))
    indx <- subset(SEL_HR, SEL_HR$DIST == 0)
    SEL_HR$TO_ID <- indx$ID[match(SEL_HR$TO, indx$TO)]}
}

where i is the day and s is the hour within the day. The loop works fine, because it prints as I expect it to. I have not given any info on the data because I assume this is more of a method question and will be very straightforward to most people on here!? But I am happy to post data if it is needed.

I assume I need to set up a matrix before the loop, e.g.

DIST_LOOP<-matrix(NA,1000,ncol=11)

and then I should be able to put something before the first } that allows me to add to the matrix, but everything I have tried doesn't work, e.g.

DIST_LOOP[[i]]<-SEL_HR

Any help would be much appreciated.

Best wishes,

Ross
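A matrix has a fixed shape, so it is an awkward container when each pass produces a different number of rows. A list has no such restriction: each element can hold a data frame of any size, and the pieces can be stacked once the loop has finished. The sketch below uses a hypothetical six-row seal_dist with named DAY and HR columns rather than the column positions used in the post.

```r
# Hypothetical data: day 1 has 3 rows, day 2 has 2 rows, day 3 has 1 row
seal_dist <- data.frame(DAY  = c(1, 1, 1, 2, 2, 3),
                        HR   = c(9, 9, 10, 9, 10, 10),
                        DIST = c(0, 2.6, 0, 0, 3.1, 0))

out <- list()                                   # created before the loop
for (i in sort(unique(seal_dist$DAY))) {
  SEL_DAY <- seal_dist[seal_dist$DAY == i, ]    # any number of rows is fine
  out[[as.character(i)]] <- SEL_DAY             # one list element per day
}
DIST_LOOP <- do.call(rbind, out)                # stack all pieces at the end
```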
Re: [R] storing output data from a loop that has varying row numbers
Hi Ivan,

Thanks for your help. Your initial suggestion did not work, but that is no doubt down to my lack of making sense! Here is a short example of my dataset. Basically the loop is set up to match the ID with the TO column based on DIST = 0. So A1 = 2, A1.1 = 1, A2 = 4, A2.1 = 3. That is fine for HR 9, but for HR 10 the numbers no longer match those IDs, so I need to loop over the data and store the result of each loop, if that makes sense.

   FROM TO     DIST      ID HR DD MM YY ANIMAL DAY
 1    1  1  2.63981    'A1'  9 30  9  7      1   1
 2    1  2      0.0    'A1'  9 30  9  7      1   1
 3    1  3  6.95836    'A1'  9 30  9  7      1   1
 4    1  4  8.63809    'A1'  9 30  9  7      1   1
 5    1  1      0.0  'A1.1'  9 30  9  7      7   1
 6    1  2  2.63981  'A1.1'  9 30  9  7      7   1
 7    1  3  8.03071  'A1.1'  9 30  9  7      7   1
 8    1  4  8.90896  'A1.1'  9 30  9  7      7   1
 9    1  1  8.90896    'A2'  9 30  9  7      1   1
10    1  2  8.63809    'A2'  9 30  9  7      1   1
11    1  3  2.85602    'A2'  9 30  9  7      1   1
12    1  4      0.0    'A2'  9 30  9  7      1   1
13    1  1  8.03071  'A2.1'  9 30  9  7      7   1
14    1  2  6.95836  'A2.1'  9 30  9  7      7   1
15    1  3      0.0  'A2.1'  9 30  9  7      7   1
16    1  4  2.85602   A2.1'  9 30  9  7      7   1
17    1  1  3.53695    'A1' 10 30  9  7      1   1
18    1  2  4.32457    'A1' 10 30  9  7      1   1
19    1  3      0.0    'A1' 10 30  9  7      1   1
20    1  4  8.85851    'A1' 10 30  9  7      1   1
21    1  5 12.09194    'A1' 10 30  9  7      1   1
22    1  1  7.44743  'A1.1' 10 30  9  7      7   1
23    1  2      0.0  'A1.1' 10 30  9  7      7   1
24    1  3  4.32457  'A1.1' 10 30  9  7      7   1
25    1  4 13.16728  'A1.1' 10 30  9  7      7   1
26    1  5 16.34761  'A1.1' 10 30  9  7      7   1
27    1  1  6.13176    'A2' 10 30  9  7      1   1
28    1  2 13.16728    'A2' 10 30  9  7      1   1
29    1  3  8.85851    'A2' 10 30  9  7      1   1
30    1  4      0.0    'A2' 10 30  9  7      1   1
31    1  5  3.40726    'A2' 10 30  9  7      1   1
32    1  1  9.03345  'A2.1' 10 30  9  7      7   1
33    1  2 16.34761  'A2.1' 10 30  9  7      7   1
34    1  3 12.09194  'A2.1' 10 30  9  7      7   1
35    1  4  3.40726  'A2.1' 10 30  9  7      7   1
36    1  5      0.0  'A2.1' 10 30  9  7      7   1
37    1  1      0.0 'MALE1' 10 30  9  7     12   1
38    1  2  7.44743 'MALE1' 10 30  9  7     12   1
39    1  3  3.53695 'MALE1' 10 30  9  7     12   1
40    1  4  6.13176 'MALE1' 10 30  9  7     12   1
41    1  5  9.03345 'MALE1' 10 30  9  7     12   1

So the loop is:

  DIST_LOOP<-matrix(NA,NA,ncol=11)
  for (i in 1:33){
    SEL_DAY<-seal_dist[seal_dist[,10]==i,]
    SEL_DAY[i]=dist[i]
    print(paste("DAY", i, "of 33"))
    for (s in 1:11){
      SEL_HR<-SEL_DAY[SEL_DAY[,5]==s,]
      print(paste("HR", s, "of 11"))
      indx <- subset(SEL_HR, SEL_HR$DIST == 0)
      SEL_HR$TO_ID <- indx$ID[match(SEL_HR$TO, indx$TO)]
      DIST_LOOP[i,]<-SEL_HR
    }
  }

But storing the data in the DIST_LOOP matrix doesn't work. I have just been told in another post that a list might be better than a matrix?

I hope this makes more sense!?

Many thanks,
Ross

-- View this message in context: http://r.789695.n4.nabble.com/storing-output-data-from-a-loop-that-has-varying-row-numbers-tp2238396p2238483.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] storing output data from a loop that has varying row numbers
Hi Joris, Thanks for your help! The data as requested: structure(list(FROM = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), TO = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), DIST = c(2.63981, 0, 6.95836, 8.63809, 0, 2.63981, 8.03071, 8.90896, 8.90896, 8.63809, 2.85602, 0, 8.03071, 6.95836, 0, 2.85602, 3.53695, 4.32457, 0, 8.85851, 12.09194, 7.44743, 0, 4.32457, 13.16728, 16.34761, 6.13176, 13.16728, 8.85851, 0, 3.40726, 9.03345, 16.34761, 12.09194, 3.40726, 0, 0, 7.44743, 3.53695, 6.13176, 9.03345), ID = structure(c(12L, 12L, 12L, 12L, 11L, 11L, 11L, 11L, 14L, 14L, 14L, 14L, 13L, 13L, 13L, 143L, 12L, 12L, 12L, 12L, 12L, 11L, 11L, 11L, 11L, 11L, 14L, 14L, 14L, 14L, 14L, 13L, 13L, 13L, 13L, 13L, 94L, 94L, 94L, 94L, 94L), .Label = c("'11.1'", "'15.1'", "'15.5'", "'18.1'", "'24.2'", "'26.1'", "'26.2'", "'28.3'", "'4.2'", "'7.1'", "'A1.1'", "'A1'", "'A2.1'", "'A2'", "'B1'", "'C1'", "'D1.1'", "'D1'", "'D2.1'", "'D2'", "'D3.1'", "'D3'", "'D4.1'", "'D4'", "'D5.1'", "'D5'", "'D6.1'", "'D6'", "'E1.1'", "'E1'", "'E2.1'", "'E2'", "'E4'", "'E5'", "'F1.1'", "'F1'", "'F10.1'", "'F10'", "'F11'", "'F2'", "'F3'", "'F4.1'", "'F4'", "'F5.1'", "'F5'", "'F7'", "'F8.1'", "'F8'", "'G2.1'", "'G2'", "'G3.1'", "'G3'", "'G4.1'", "'G4'", "'G5.1'", "'G5'", "'H1.1'", "'H1'", "'H2'", "'H3.1'", "'H3'", "'H8'", "'I1.1'", "'I1'", "'I2'", "'I4.1'", "'I4'", "'J1.1'", "'J1'", "'J2.1'", "'J2'", "'J3'", "'J6'", "'J7'", "'JUV'", "'K1.1'", "'K1'", "'K2'", "'K3'", "'K4.1'", "'K4'", "'L1.1'", "'L1'", "'L2.1'", "'L2'", "'L4'", "'M1'", "'M2.1'", "'M2'", "'M3.1'", "'M3'", "'M4.1'", "'M4'", "'MALE1'", "'N1.1'", "'N1'", "'N2'", "'N3'", "'N4.1'", "'N4'", "'O1'", "'O2'", "'O3.1'", "'O3'", "'O4.1'", "'O4'", "'O5'", "'P1.1'", "'P1'", "'Q1'", "'Q2'", "'Q3'", "'R1.1'", "'R1'", 
"'R2'", "'R3.1'", "'R3'", "'R4.1'", "'R4'", "'R5.1'", "'R5'", "'S1.1'", "'S1'", "'S2.1'", "'S2'", "'S3.1'", "'S3'", "'S4.1'", "'S4'", "'T1'", "'U1.1'", "'U1'", "'U2'", "'U3'", "'UKFEM'", "'UKMAL'", "'UKPUP'", "'V1.1'", "'V1'", "'W1.1'", "'W1'", "'WR'", "A2.1'"), class = "factor"), HR = c(9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), DD = c(30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L), MM = c(9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), YY = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), ANIMAL = c(1L, 1L, 1L, 1L, 7L, 7L, 7L, 7L, 1L, 1L, 1L, 1L, 7L, 7L, 7L, 7L, 1L, 1L, 1L, 1L, 1L, 7L, 7L, 7L, 7L, 7L, 1L, 1L, 1L, 1L, 1L, 7L, 7L, 7L, 7L, 7L, 12L, 12L, 12L, 12L, 12L), DAY = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("FROM", "TO", "DIST", "ID", "HR", "DD", "MM", "YY", "ANIMAL", "DAY"), row.names = c(NA, 41L ), class = "data.frame") The output should be as the original file is, but it should have an additional column for 'TO_ID' I hope that makes sense? Cheers, Ross -- View this message in context: http://r.789695.n4.nabble.com/storing-output-data-from-a-loop-that-has-varying-row-numbers-tp2238396p2238576.html Sent from the R help mailing list archive at Nabble.com. 
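The desired output described above (the original rows plus a TO_ID column) can be sketched without an explicit loop by splitting the data by day and hour and applying the DIST == 0 lookup to each block. This is a minimal sketch and not necessarily the solution that was posted in the thread: the data frame below is a cut-down subset of the posted rows (three animals, only the TO, DIST, ID, HR and DAY columns, quote characters dropped from the IDs), and `add_to_id` is a helper name introduced here for illustration.

```r
# Cut-down version of the posted data: hour 9 has two animals,
# hour 10 has three, so the TO numbering differs between hours.
seal_dist <- data.frame(
  TO   = c(1, 2, 1, 2,  1, 2, 3, 1, 2, 3, 1, 2, 3),
  DIST = c(2.63981, 0, 0, 2.63981,
           3.53695, 4.32457, 0, 7.44743, 0, 4.32457, 0, 7.44743, 3.53695),
  ID   = c("A1", "A1", "A1.1", "A1.1",
           rep("A1", 3), rep("A1.1", 3), rep("MALE1", 3)),
  HR   = c(rep(9, 4), rep(10, 9)),
  DAY  = 1,
  stringsAsFactors = FALSE
)

# Within one day/hour block, the row with DIST == 0 tells us which
# ID a given TO number refers to; look every TO up in that table.
add_to_id <- function(block) {
  indx <- block[block$DIST == 0, ]
  block$TO_ID <- indx$ID[match(block$TO, indx$TO)]
  block
}

# One block per day/hour combination, processed and stacked again
pieces <- lapply(split(seal_dist, list(seal_dist$DAY, seal_dist$HR), drop = TRUE),
                 add_to_id)
result <- do.call(rbind, pieces)
rownames(result) <- NULL
```

Each block can have any number of rows, so nothing has to be pre-allocated; `split()` plus `lapply()` plays the role of the two nested `for` loops.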
Re: [R] storing output data from a loop that has varying row numbers
Hi Ivan,

Thanks again for your help! I'll just go through your questions...

> I'm still really confused about your question.

Sorry!!!

> Let me ask you some specific questions (maybe someone more experienced would understand at once, but I'm no expert; I hope I can still help you! In any case, I would like to understand for myself ;) )
> - is "seal_dist" the name of your data.frame?

Yes, so..

  head(seal_dist)
    FROM TO    DIST     ID HR DD MM YY ANIMAL DAY
  1    1  1 2.63981   'A1'  9 30  9  7      1   1
  2    1  2     0.0   'A1'  9 30  9  7      1   1
  3    1  3 6.95836   'A1'  9 30  9  7      1   1
  4    1  4 8.63809   'A1'  9 30  9  7      1   1
  5    1  1     0.0 'A1.1'  9 30  9  7      7   1
  6    1  2 2.63981 'A1.1'  9 30  9  7      7   1

> - what do you want to do with SEL_DAY[i]=dist[i]?

That was a (desperate) attempt to do 'something', but it didn't work, so it shouldn't have been in the script I posted, sorry!

> What is "dist"?

It is a measure of distance from one point (ID) to another, i.e. the distance between A1 and A1.1.

> If I understand well, you want to replace the values in FROM (then TO, then DIST...) with the values from the same column number in dist?

The problem is that Arc doesn't output the data as I'd like, so I want to create a new column to add to the data. What Arc has done is taken a distance between each ID for each hour, but because the number of IDs in each hour doesn't match, the TO number is not unique to the ID throughout the entire dataset, only within that given hour. When DIST = 0, the TO number on that row is the one that belongs to that row's ID, i.e. the distance from A1 to A1 is 0. I then want to use that information to create a new column that will tell me the actual ID. If that is any clearer?

> - Since I still haven't understood your goal completely, I still don't understand why you add the column TO_ID to SEL_HR.

See above.

> - In any case, a matrix cannot work because you want to store data of different classes in DIST_LOOP (ID is character and the others are numeric). You can either use a data.frame (if you really want to have the table-like structure, which is a list) or a list.

I see, can you advise on how to set up a list to write to?

> - Moreover, the output from dput(your data) would really help to see what you have!

I have not long posted it, I hope it helps!!

Thanks again for your help Ivan, much appreciated,
Ross

-- View this message in context: http://r.789695.n4.nabble.com/storing-output-data-from-a-loop-that-has-varying-row-numbers-tp2238396p2238630.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] storing output data from a loop that has varying row numbers
Hi Ivan, Thanks, Joris did answer the question, but it is good to know about list() and that a matrix is no good for a mixture of output types. I'm slowly getting my head around it! Thanks again for your help, it really was much appreciated! Ross -- View this message in context: http://r.789695.n4.nabble.com/storing-output-data-from-a-loop-that-has-varying-row-numbers-tp2238396p2239708.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] storing output data from a loop that has varying row numbers
Hi Joris, Many thanks for sorting that! I haven't seen it done that way before, so I'll have to look into the properties of lapply a bit more to get a full appreciation of other approaches to looping over data in R. Thanks again for your help, it is much appreciated, Ross -- View this message in context: http://r.789695.n4.nabble.com/storing-output-data-from-a-loop-that-has-varying-row-numbers-tp2238396p2239711.html Sent from the R help mailing list archive at Nabble.com.
[R] partial matches across rows not columns
Hi R users,

I am trying to omit rows of data based on partial matches. An example of my data (seal_dist) is below.

A quick breakdown of my coding and why I need to answer this: I am dealing with a colony of seals where, for example, A1 is a female with a pup and A1.1 is that female's pup. The important part of the data here is DIST, which gives the distance between one seal (ID) and another (TO_ID). What I want to do is take a mean of these data for a nearest neighbour analysis, but I want to omit any rows where the distance is between a female and her own pup, i.e. in the example above, omit rows where A1 and A1.1 occur together.

I have looked at grep and pmatch, but these appear to work across columns and don't appear to do what I'm looking to do.

If anyone can point me in the right direction, I'd be most grateful,
Best wishes,
Ross

   FROM TO    DIST    ID HR DD MM YY ANIMAL DAY TO_ID TO_ANIMAL
 2    1  2 4.81803    A1  1 30  9  9      1   1 MALE1        12
 3    1  3 2.53468    A1  1 30  9  9      1   1    A2         3
 4    1  4 7.57332    A1  1 30  9  9      1   1  A1.1         7
 5    1  1 7.57332  A1.1  1 30  9  9      7   1    A1         1
 6    1  2 7.89665  A1.1  1 30  9  9      7   1 MALE1        12
 7    1  3 6.47847  A1.1  1 30  9  9      7   1    A2         3
 9    1  1 2.53468    A2  1 30  9  9      3   1    A1         1
10    1  2 2.59051    A2  1 30  9  9      3   1 MALE1        12
12    1  4 6.47847    A2  1 30  9  9      3   1  A1.1         7
13    1  1 4.81803 MALE1  1 30  9  9     12   1    A1         1
15    1  3 2.59051 MALE1  1 30  9  9     12   1    A2         3
16    1  4 7.89665 MALE1  1 30  9  9     12   1  A1.1         7
17    1  1 3.85359    A1  2 30  9  9      1   1 MALE1        12
19    1  3 4.88826    A1  2 30  9  9      1   1    A2         3
20    1  4 7.25773    A1  2 30  9  9      1   1  A1.1         7
21    1  1 9.96431  A1.1  2 30  9  9      7   1 MALE1        12
22    1  2 7.25773  A1.1  2 30  9  9      7   1    A1         1
23    1  3 5.71725  A1.1  2 30  9  9      7   1    A2         3
25    1  1 8.73759    A2  2 30  9  9      3   1 MALE1        12
26    1  2 4.88826    A2  2 30  9  9      3   1    A1         1
28    1  4 5.71725    A2  2 30  9  9      3   1  A1.1         7
30    1  2 3.85359 MALE1  2 30  9  9     12   1    A1         1
31    1  3 8.73759 MALE1  2 30  9  9     12   1    A2         3
32    1  4 9.96431 MALE1  2 30  9  9     12   1  A1.1         7
33    1  1 7.95399    A1  3 30  9  9      1   1 MALE1        12
35    1  3 0.60443    A1  3 30  9  9      1   1  A1.1         7
36    1  4 1.91136    A1  3 30  9  9      1   1    A2         3
37    1  1 8.29967  A1.1  3 30  9  9      7   1 MALE1        12
38    1  2 0.60443  A1.1  3 30  9  9      7   1    A1         1
40    1  4 1.43201  A1.1  3 30  9  9      7   1    A2         3
41    1  1 9.71659    A2  3 30  9  9      3   1 MALE1        12
42    1  2 1.91136    A2  3 30  9  9      3   1    A1         1
43    1  3 1.43201    A2  3 30  9  9      3   1  A1.1         7
46    1  2 7.95399 MALE1  3 30  9  9     12   1    A1         1
47    1  3 8.29967 MALE1  3 30  9  9     12   1  A1.1         7
48    1  4 9.71659 MALE1  3 30  9  9     12   1    A2         3

-- View this message in context: http://r.789695.n4.nabble.com/partial-matches-across-rows-not-columns-tp2247757p2247757.html Sent from the R help mailing list archive at Nabble.com.
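One way to express the mother-pup filter described above is to reduce every ID to its "family" code with a regular expression and drop rows where ID and TO_ID share that code. The sketch below is an illustration, not necessarily either of the methods suggested in the thread. It assumes that a pup's ID is always the mother's ID followed by a dot and a number (as in A1 and A1.1), it uses only five rows taken from the posted table, and `family_id` is a helper name made up here.

```r
# Five rows from the posted table (only the columns needed here)
seal_dist <- data.frame(
  ID    = c("A1", "A1", "A1.1", "A2", "MALE1"),
  TO_ID = c("MALE1", "A1.1", "A1", "A1.1", "A2"),
  DIST  = c(4.81803, 7.57332, 7.57332, 6.47847, 2.59051),
  stringsAsFactors = FALSE
)

# Strip a trailing ".<digits>" so that A1 and A1.1 both become "A1".
# Assumption: pups are coded as <mother ID>.<number>.
family_id <- function(x) sub("\\.[0-9]+$", "", as.character(x))

# TRUE where both ends of the distance belong to the same family
is_pair <- family_id(seal_dist$ID) == family_id(seal_dist$TO_ID)

no_pairs  <- seal_dist[!is_pair, ]   # mother-pup rows removed
mean_dist <- mean(no_pairs$DIST)     # mean over the remaining rows
```

The comparison is done row by row on two columns of the same data frame, which is why functions that search for one pattern across a whole vector, such as grep, do not fit directly.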
Re: [R] Intra-Class correlation psych package missing data
Hi Bill, No worries, always a million things to do! Thanks very much for the reply, that has cleared that up and I'll look out for the update next week. Many thanks, Ross -- View this message in context: http://r.789695.n4.nabble.com/Intra-Class-correlation-psych-package-missing-data-tp1773942p2250304.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] partial matches across rows not columns
Hi Jim and Hi Jannis, Thanks very much to both of you for your help! Both methods work perfectly! Always good to know that there is more than one way to skin a cat when it comes to R! I will just need to get a grip on regular expressions, it would seem. Many thanks again for your help, much appreciated, Ross -- View this message in context: http://r.789695.n4.nabble.com/partial-matches-across-rows-not-columns-tp2247757p2250306.html Sent from the R help mailing list archive at Nabble.com.
[R] comparing GLM coefficients & repeatability
Many thanks for taking the time to read this!

I am looking at the repeatability of behaviour between re-sighted individuals across discrete time periods (annual breeding seasons). My approach was to run a GLM (with a logit link; the data are proportions, presence v. absence of behaviour) for each breeding season. I included the re-sighted individuals as a factor (categorical variable), i.e. the models only contained individuals that were seen in all of the breeding seasons. Inevitably the variables that are retained in the best models are not the same for each breeding season, and in one (out of 3 cases) individual is not retained within the best model (although I suspect that is a product of a considerably smaller sample size for that breeding season).

I use the best model that has retained individual ID and extract the coefficients of the individuals. I then use the ICC command in the package psych to test for repeatability in these values over the three breeding seasons. The results are in fact repeatable, which supports the basic analyses using just the behaviour (without trying to account for potential covariates), and that is encouraging.

However, I have had a look on Nabble and other forums to see if this is at all statistically sound or if I am making a fundamental error in how I am treating the coefficients. I have found a couple of posts, but I don't think that they relate directly to my question.

I appreciate that some may suggest using mixed-effects modelling with individual as a random effect. My issue is that the behaviours I am interested in are very rare and are best suited to a beta-binomial distribution (tested using Ben Bolker's script/e.g. in his book), and such a distribution is not available in lme4. Therefore, I'm trying to find another approach to assess whether individual is important in predicting a behaviour, and whether individuals are repeatable/consistent in this respect.
Any advice would be most appreciated, Best wishes, Ross -- View this message in context: http://r.789695.n4.nabble.com/comparing-GLM-coefficients-repeatability-tp3772844p3772844.html Sent from the R help mailing list archive at Nabble.com.
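To make the workflow in this question concrete, here is a rough sketch of fitting one binomial GLM per season and collecting the per-individual coefficients into an individuals-by-seasons matrix. Everything in it is an assumption made for illustration: the data are simulated, the column names BEH_T, BEH_F, ID and SEASON are placeholders, no covariates are included, and the formula uses `0 + ID` so that each individual gets its own coefficient on the logit scale rather than a contrast against a reference animal. It says nothing about whether the approach is statistically sound, which is the actual question.

```r
set.seed(1)

n_id    <- 6   # hypothetical number of re-sighted animals
seasons <- 3

# Simulated daily counts: 10 days per animal per season
dat <- expand.grid(ID = paste0("S", 1:n_id), SEASON = 1:seasons, DAY = 1:10)

# Made-up, animal-specific probability of showing the behaviour
p_id <- plogis(seq(-1.5, 0.5, length.out = n_id))
dat$BEH_T <- rbinom(nrow(dat), size = 20, prob = p_id[as.integer(dat$ID)])
dat$BEH_F <- 20 - dat$BEH_T

# One GLM per season; keep one coefficient per individual
coef_by_season <- sapply(1:seasons, function(s) {
  fit <- glm(cbind(BEH_T, BEH_F) ~ 0 + ID, family = binomial,
             data = dat[dat$SEASON == s, ])
  coef(fit)
})
colnames(coef_by_season) <- paste0("season", 1:seasons)
```

A matrix of this shape (rows are individuals, columns are seasons) is the layout that the ICC function in psych expects, so it could be passed on as `psych::ICC(coef_by_season)`. With the default treatment contrasts (`~ ID` plus an intercept) the extracted values would instead be differences from the reference individual, which changes what the repeatability estimate refers to.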
[R] Force regression line to a 1:1 relationship
Hello,

I appreciate this is likely to be an easy question. I am trying to obtain the residuals from a linear regression where the line is forced to have a 1:1 relationship. An example of the data:

  A<-c(0.9803922, 1.3850416, 0.8241758, 0.000, 0.4672897, 1.1904762, 0.000, 0.9456265, 1.5151515)
  B<-c(1.3229572, 1.9471488, 1.3182674, 0.7007708, 1.0185740, 1.0268562, 0.8695652, 0.3016591, 1.9667171)
  plot(A, B, ylim=c(0,2), xlim=c(0,2))
  abline(0,1, col="lightgrey", lty="dashed",lwd=2) #1:1 relationship = what I want to use in the lm()
  #Normal regression (B on A, so that it matches the plot, which has A on the x-axis)
  AB<-lm(B~A)
  #plot regression line
  abline(AB)

How can I force the regression to have a 1:1 relationship? I assume it is to do with offset(), but I have somewhat fried my brain trying numerous variations and I am not convinced any are correct. I was also hoping the plot function would show me that the calculation is correct, but any time I use the offset() command there is no line plotted.

Any hints or tips would be much appreciated!
Ross

-- View this message in context: http://r.789695.n4.nabble.com/Force-regression-line-to-a-1-1-relationship-tp3809733p3809733.html Sent from the R help mailing list archive at Nabble.com.
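For reference, here is a small sketch of two readings of "residuals from a 1:1 line", using the A and B vectors from the question. It assumes B is the response, as in the plot. If the line is fixed completely (intercept 0, slope 1) nothing is left to estimate and the residuals are simply B - A. If only the slope is fixed at 1, `offset()` does that and `lm()` still estimates an intercept. The names `res_1to1`, `fit_offset` and `res_offset` are made up for this example.

```r
A <- c(0.9803922, 1.3850416, 0.8241758, 0.000, 0.4672897,
       1.1904762, 0.000, 0.9456265, 1.5151515)
B <- c(1.3229572, 1.9471488, 1.3182674, 0.7007708, 1.0185740,
       1.0268562, 0.8695652, 0.3016591, 1.9667171)

# (1) Line fully fixed at y = x: the residual is the vertical
#     distance of each point from the dashed 1:1 line.
res_1to1 <- B - A

# (2) Slope fixed at 1 via offset(), intercept still estimated.
fit_offset <- lm(B ~ offset(A))
res_offset <- resid(fit_offset)

# With the slope fixed, the fitted intercept is just mean(B - A),
# so these residuals are the centred version of res_1to1.
intercept <- unname(coef(fit_offset)[1])
```

An offset model has no slope coefficient for `abline()` to read, so the line has to be drawn by hand, for example with `abline(a = intercept, b = 1)` after the original `plot(A, B)` call.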
Re: [R] ggplot - class "character" problem
I suspect it is to do with your method of creating the data frame. I would check whether the columns in the df are numeric, which you can do with

  is.numeric(flat_data$time)

for each variable. If a column is not numeric (and at least one must be a character, given the error message), then redefine it as numeric:

  flat_data$time<-as.numeric(flat_data$time)

I reckon people better versed in R will have a more efficient solution, but that should work.

Ross

-- View this message in context: http://r.789695.n4.nabble.com/ggplot-class-character-problem-tp3809657p3809786.html Sent from the R help mailing list archive at Nabble.com.
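The column-by-column check suggested above can also be done for the whole data frame at once. This is a minimal sketch: only the names flat_data and time come from the thread, while the `value` column and all the values are invented for illustration.

```r
# Hypothetical data frame whose columns were read in as character
flat_data <- data.frame(
  time  = c("1", "2", "3"),
  value = c("0.5", "1.5", "2.5"),
  stringsAsFactors = FALSE
)

# Class of every column in one call
classes_before <- sapply(flat_data, class)

# Convert the columns that should be numeric
num_cols <- c("time", "value")
flat_data[num_cols] <- lapply(flat_data[num_cols], as.numeric)

classes_after <- sapply(flat_data, class)
```

If a column turns out to be a factor rather than a character vector, `as.numeric()` on its own returns the internal level codes, so `as.numeric(as.character(x))` is the safer conversion in that case.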
Re: [R] Force regression line to a 1:1 relationship
Yes, that is correct. The idea is that I want to know the residuals of the data points compared to a 1:1 line (as shown in the plot), if that makes sense? I appreciate that this might not be considered a typical approach, and it would probably take a while to explain (defend) why I am doing it! -- View this message in context: http://r.789695.n4.nabble.com/Force-regression-line-to-a-1-1-relationship-tp3809733p3810045.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Force regression line to a 1:1 relationship
Dear John, Thank you for that, and for explaining why the abline() command won't/doesn't work. The approach is based on reviewers' comments that I am a tad sceptical about myself, yet curious enough to test. I don't think it is very straightforward to explain; however, it involves using the residuals of the lm() and plotting them against a covariate to assess whether or not the deviation from the 1:1 relationship is in some way influenced by the other covariate. I hope that shines a small amount of light on this rather unorthodox approach?! Many thanks again for that John! Ross -- View this message in context: http://r.789695.n4.nabble.com/Force-regression-line-to-a-1-1-relationship-tp3809733p3810101.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Force regression line to a 1:1 relationship
David & JC, Excellent point, of course it does - and of course that is (should have been) obvious!!! That is what I get for taking a reviewer's comment/suggestion as gospel without applying a bit of thought! I'm off to go and kick myself. Cheers, Ross -- View this message in context: http://r.789695.n4.nabble.com/Force-regression-line-to-a-1-1-relationship-tp3809733p3810172.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Force regression line to a 1:1 relationship
Many thanks to all of you! AV plots are what I am trying to plot. Perhaps to reduce confusion I can give you an example of what I am doing: I am looking at the behaviour of re-sighted individuals over two time points. I use lm() on these data and obtain the residuals. Then I am interested to know whether an individual's residual is related to a site fidelity measure over the two time periods, such that an individual that maintains a high degree of site fidelity shows less variation (has a smaller residual value) in (for example) aggressive behaviour. Therefore, using the absolute values of the residuals (as I am not interested in whether an individual is less or more aggressive), I plot these against the site fidelity measure to assess whether there is a correlation. The 1:1 relationship was to assess the deviation from 'absolute agreement', whereas the method above takes into consideration plasticity/noise between the two time periods. I hope this is a little clearer and, although not a quote from the reviewer, this is essentially what was suggested. -- View this message in context: http://r.789695.n4.nabble.com/Force-regression-line-to-a-1-1-relationship-tp3809733p3815014.html Sent from the R help mailing list archive at Nabble.com.
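The procedure outlined in this last message (ordinary lm() residuals, absolute values, then a correlation with a covariate) might look roughly like the sketch below. A and B are the example vectors from the first post in the thread, standing in for the behaviour at the two time points. The `fidelity` vector is a placeholder invented here purely so the code runs, and the choice of a Spearman correlation is also an assumption, since the thread does not say which measure of association was intended.

```r
# Behaviour at time 1 (A) and time 2 (B), from the original post
A <- c(0.9803922, 1.3850416, 0.8241758, 0.000, 0.4672897,
       1.1904762, 0.000, 0.9456265, 1.5151515)
B <- c(1.3229572, 1.9471488, 1.3182674, 0.7007708, 1.0185740,
       1.0268562, 0.8695652, 0.3016591, 1.9667171)

# Placeholder site fidelity scores, one per individual (NOT real data)
fidelity <- c(0.90, 0.40, 0.70, 0.20, 0.60, 0.80, 0.30, 0.50, 0.95)

# Ordinary regression between the two time points
fit <- lm(B ~ A)

# Size of each individual's deviation, ignoring its direction
abs_res <- abs(resid(fit))

# Rank correlation between deviation size and site fidelity
ct  <- cor.test(abs_res, fidelity, method = "spearman", exact = FALSE)
rho <- unname(ct$estimate)
```

A scatterplot of the same relationship would be `plot(fidelity, abs_res)`.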