The data set did not show up. The R-help list tends to strip out most file types as a safety precaution. Try renaming the file from xxx.csv to xxx.txt and it should come through all right.
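
If the renamed attachment still gets stripped, another option is to paste a small piece of the data straight into the email body with dput(). A minimal sketch, assuming a made-up three-row data frame called drug_study_sample (the values are invented for illustration and are not the poster's data; the column names are borrowed from the script quoted below):

```r
# Hypothetical stand-in for a few rows of the real data (values are invented).
drug_study_sample <- data.frame(
  ID         = c("J1/3", "J1/3", "R1/3"),
  date       = as.Date(c("2007-05-02", "2008-02-11", "2007-05-03")),
  drug_admin = c("Y", "Y", "Y"),
  stringsAsFactors = FALSE
)

# dput() prints R code that rebuilds the object; paste that output into
# the body of the email instead of attaching a file.
dput(drug_study_sample)

# A reader recreates the data by assigning the pasted text to a name.
# The round trip below mimics that by capturing and re-evaluating the text.
pasted_text <- paste(capture.output(dput(drug_study_sample)), collapse = "\n")
rebuilt <- eval(parse(text = pasted_text))

# Check that the rebuilt copy matches the original.
isTRUE(all.equal(rebuilt, drug_study_sample))
```

For a large data set, something like dput(head(drug_study, 20)) keeps the pasted text short while still showing the column types.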
John Kane
Kingston ON Canada

> -----Original Message-----
> From: kwa...@kemri-wellcome.org
> Sent: Sun, 3 Jul 2016 09:39:59 +0000
> To: jdnew...@dcn.davis.ca.us, r-help@r-project.org
> Subject: Re: [R] R - Populate Another Variable Based on Multiple
> Conditions | For a Large Dataset
>
> Hi Jeff, pardon me, I was surely not making it easy. I hope this time I
> will ☺
>
> Attached is a snippet of the dataset in csv format and below is the
> R script I have managed so far.
>
> ----------------------------------------------------------------------
>
> drug_study <- read.csv("drug_study.csv", header = T); head(drug_study)
> drug_study$date <- as.Date(drug_study$date, "%m/%d/%Y")
> drug_study$study_id <- "" #create new column
>
> individual <- unique (drug_study$ID) #vector of individuals
> datalength <- dim(drug_study)[1] #number of rows in dataframe
>
> for (i in 1:length(individual)) {
>   for (j in 1:datalength) {
>     start_admin <- drug_study[c(drug_study$ID == individual[i] &
>       drug_study$year == 2007 & drug_study$drug_admin == "Y" &
>       drug_study$month == 5),2] #capture date of start
>     end_admin <- drug_study[(drug_study$ID == individual[i] &
>       drug_study$year == 2008 & drug_study$drug_admin == "Y" &
>       drug_study$month == 2),2] #capture date of end
>
>     if(drug_study[j,1] == individual[i] & drug_study[j,2] >= start_admin
>        & drug_study[j,2] < end_admin) {
>       drug_study[j,6] <- paste(start_admin) #populate respective row if condition is met
>     }
>   }
> }
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> For this dataset, there exist three individuals: J1/3, R1/3, R10/1.
>
> The script works for the last two individuals but not J1/3, with the error
> below:
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Error in if (drug_study[j, 1] == individual[i] & drug_study[j, 2] >=
> start_admin & :
>   argument is of length zero
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> I figured it's because this individual's start_admin and end_admin dates
> aren't captured because the if statement fails. There's my first problem:
> there are thousands of individuals with varying
> start_admin and end_admin dates and I need a script to capture these for
> every individual.
>
> Secondly, the above script is taking almost an hour to run for the entire
> dataset, just for the individuals whose start_admin and end_admin dates
> can be captured by the if statement.
>
> I need help in coming up with a script that will tackle the problem,
> taking into account the different start_admin and end_admin dates, and be
> resourceful with regard to time.
>
> Regards
> ----------------------------------------------------------------------
> Kevin Kariuki
>
> ######################################################################
>
> On 7/3/16, 8:42 AM, "Jeff Newmiller" <jdnew...@dcn.davis.ca.us> wrote:
>
> You are making this hard on yourself by not paying attention to the Posting
> Guide listed in the footer of every email on this list. You would
> probably also find [1] helpful.
>
> [1]
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> --
> Sent from my phone. Please excuse my brevity.
>
> On July 2, 2016 3:41:07 PM PDT, Kevin Wamae <kwa...@kemri-wellcome.org>
> wrote:
> >Hi Jeff, sorry for referring to you as Jennifer earlier, accept my
> >apologies.
> >
> >I attached a sample dataset in the question, am afraid it must have
> >failed to attach.
> >
> >I have attached it again..
> >
> >Regards
> >----------------------------------------------------------------------
> >Kevin Kariuki
> >
> >On 7/2/16, 7:37 PM, "Jeff Newmiller" <jdnew...@dcn.davis.ca.us> wrote:
> >
> >I can understand you not wanting to supply your actual data online, but
> >only you know what your data looks like so only you can create a
> >simulated data set that we could show you how to work with.
> >--
> >Sent from my phone. Please excuse my brevity.
> >
> >On July 2, 2016 2:57:39 AM PDT, Kevin Wamae <kwa...@kemri-wellcome.org>
> >wrote:
> >>I have a drug-trial study dataset (attached image).
> >>
> >>Since it's a large and complex dataset (at least to me), I hope to be
> >>as clear as possible with my question.
> >>The dataset is from a study where individuals are given drugs and
> >>followed up over a period spanning two consecutive years. Individuals
> >>do not start treatment on the same day and once they start, the
> >>variable "drug-admin" is marked "x", as is the time they stop
> >>treatment in the following year.
> >>There exists another variable, "study_id", that I hope to populate as
> >>can be seen in the dataset, with the following conditions:
> >>
> >>For every individual
> >>• if the individual has entries that show they received drugs both
> >>on the start and end date (marked with the "x")
> >>• if the start of drug administration falls in month == 2 | 3 and
> >>end of administration falls in month == 2 | 4
> >>• then, using the date that marks the start of drug administration,
> >>populate the variable _"study_id"_ in all the rows that fall within the
> >>timeframe that the individual was given drugs but excluding the end of
> >>drug administration.
> >>I have tried my level best and while I have explored several examples
> >>online, I haven't managed to solve this. The dataset contains close to
> >>6000 individuals spanning 10 years and my best bet was to use a loop,
> >>which keeps crashing R after running for close to 30 min. I have also
> >>read that dplyr may do the job but my attempts have been in vain.
> >>
> >>sample code
> >>--------------------------------------------------------------------
> >>individual <- unique (df$ID) #vector of individuals
> >>datalength <- dim(df)[1] #number of rows in dataframe
> >>
> >>for (i in 1:length(individual)) {
> >>  for (j in 1:datalength) {
> >>    start_admin <- df[(df$year == 2007] & df$drug_admin == "x" &
> >>      c(df$month == 2 | df$month == 3),1] #capture date of start
> >>    end_admin <- df[(df$year == 2008] & df$drug_admin == "x" &
> >>      c(df$month == 2 | df$month == 4),1] #capture date of end
> >>
> >>    if(df[datalength,1] == individual(i) & df[datalength,2] >= start_admin
> >>       & df[datalength,2] < end_admin) {
> >>      df[datalength,6] <- start_admin #populate respective row if condition is met
> >>    }
> >>  }
> >>}
> >>
> >>--------------------------------------------------------------------
> >>
> >>Above is the code that keeps failing..
> >>
> >>Any help is highly appreciated....
> >>
> >>______________________________________________________________________
> >>
> >>This e-mail contains information which is confidential. It is intended
> >>only for the use of the named recipient. If you have received this
> >>e-mail in error, please let us know by replying to the sender, and
> >>immediately delete it from your system. Please note, that in these
> >>circumstances, the use, disclosure, distribution or copying of this
> >>information is strictly prohibited. KEMRI-Wellcome Trust Programme
> >>cannot accept any responsibility for the accuracy or completeness of
> >>this message as it has been transmitted over a public network. Although
> >>the Programme has taken reasonable precautions to ensure no viruses are
> >>present in emails, it cannot accept responsibility for any loss or
> >>damage arising from the use of the email or attachments. Any views
> >>expressed in this message are those of the individual sender, except
> >>where the sender specifically states them to be the views of
> >>KEMRI-Wellcome Trust Programme.
> >>______________________________________________________________________
> >>
> >>______________________________________________
> >>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide
> >>http://www.R-project.org/posting-guide.html
> >>and provide commented, minimal, self-contained, reproducible code.
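
On the "argument is of length zero" error and the run time described in the quoted message, here is a rough sketch of one possible approach, not a tested solution for the real data. Everything in it is an assumption: the data values are invented, only the column names come from the quoted script, the helper fill_study_id is hypothetical, and it simply treats each individual's earliest and latest drug_admin == "Y" dates as the start and end of administration rather than checking particular months.

```r
# Made-up example data; column names follow the quoted script,
# values are invented for illustration.
drug_study <- data.frame(
  ID         = c("A", "A", "A", "A", "B", "B"),
  date       = as.Date(c("2007-05-10", "2007-09-01", "2008-02-15",
                         "2008-06-01", "2007-07-01", "2008-01-20")),
  drug_admin = c("Y", "", "Y", "", "", ""),
  stringsAsFactors = FALSE
)
drug_study$study_id <- NA_character_

# Why if() fails: a condition that matches no rows returns a zero-length
# vector, and comparing against it gives a zero-length logical.
no_start <- drug_study$date[drug_study$ID == "B" & drug_study$drug_admin == "Y"]
length(no_start)                        # individual "B" has no marked dates
length(drug_study$date[5] >= no_start)  # so the comparison is empty as well

# Hypothetical helper: handle one individual's rows in a single
# vectorised step instead of looping over every row of the data.
fill_study_id <- function(d) {
  marked <- sort(d$date[d$drug_admin == "Y"])
  if (length(marked) < 2) return(d)     # no start/end pair: leave NA
  start  <- marked[1]                   # assumption: earliest marked date
  end    <- marked[length(marked)]      # assumption: latest marked date
  inside <- d$date >= start & d$date < end  # the end date is excluded
  d$study_id[inside] <- format(start)
  d
}

# Split by individual, fill each piece, and stack the pieces again.
pieces <- lapply(split(drug_study, drug_study$ID), fill_study_id)
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

The explicit length check is what keeps an individual without a start/end pair from triggering the error, and splitting by ID means each individual's rows are visited once rather than once per row of the whole data set. Whether "earliest and latest marked date" is the right rule depends on the real data, which has not come through yet.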