Hi Dan, Hi All, many thanks for your help.
Please find enclosed my little function for your use: -- cut -- #------------------------------------------------------------------------------- # Module : t_count_na.R # Author : Georg Maubach # Date : 2016-05-24 # Update : 2016-05-25 # Description : Count NA's # Source System : R 3.2.2 (64 Bit) # Target System : R 3.2.2 (64 Bit) # License : CC-BY-SA-NC #--------1---------2---------3---------4---------5---------6---------7---------8 test <- FALSE t_count_na <- function(dataset, variables = "all") { # Counts the number of NA within given set of veriables # # Args: # dataset : Object with dimnames, e.g. data frame, data table. # variables: Character vector with variable names. # # Operation: # Adds the variable "na_count" to the given dataset containing the count of # NA's within the given variables # # Returns: # Original dataset with variable "na_count" added. # # Error handling: # None. # # Credits: # http://stackoverflow.com/questions/4862178/remove-rows-with-nas-in-data-frame # http://r.789695.n4.nabble.com/Creating-variables-on-the-fly-td4720034.html version <- "2016-05-25" if (identical(variables, "all")) { variable_list <- names(dataset) } else { variable_list <- variables } dataset[["na_count"]] <- apply(dataset[,variable_list], 1, function(x) sum(is.na(x))) return(dataset) } #------------------------------------------------------------------------------- test <- function(do_test = FALSE) { cat("\n", "\n", "Test function t_count_na()", "\n", "\n") # Example dataset gene <- c("ENSG00000208234","ENSG00000199674","ENSG00000221622","ENSG00000207604", "ENSG00000207431","ENSG00000221312","ENSG00134940305","ENSG00394039490", "ENSG09943004048") hsap <- c(0,0,0, 0, 0, 0, 1,1, 1) mmul <- c(NA,2 ,3, NA, 2, 1 , NA,2, NA) mmus <- c(NA,2 ,NA, NA, NA, 2 , NA,3, 1) rnor <- c(NA,2 ,NA, 1 , NA, 3 , NA,NA, 2) cfam <- c(NA,2,NA, 2, 1, 2, 2,NA, NA) ds_example <- data.frame(gene, hsap, mmul, mmus, rnor, cfam) ds_example$gene <- as.character(ds_example$gene) cat("\n", "\n", "Example dataset before function call", "\n", "\n") print(ds_example) cat("\n", "\n", "Function call", "\n", "\n") ds_example <- t_count_na(dataset = ds_example, variables = c("mmul", "mmus")) cat("\n", "\n", "Example dataset after function call", "\n", "\n") print(ds_example) } test(do_test = test) # EOF . -- cut -- Kind regards Georg Maubach Von: "Nordlund, Dan (DSHS/RDA)" <nord...@dshs.wa.gov> An: "r-help@r-project.org" <r-help@r-project.org>, Datum: 24.05.2016 21:41 Betreff: Re: [R] Creating a data frame from scratch Gesendet von: "R-help" <r-help-boun...@r-project.org> I would probably write the function something like this: t_count_na <- function(dataset, variables = "all") { if (identical(variables, "all")) { variable_list <- names(dataset) } else { variable_list <- variables } apply(dataset[,variable_list], 1, function(x) sum(is.na(x))) } Hope this is helpful, Dan Daniel Nordlund, PhD Research and Data Analysis Division Services & Enterprise Support Administration Washington State Department of Social and Health Services > -----Original Message----- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of > g.maub...@gmx.de > Sent: Tuesday, May 24, 2016 11:55 AM > To: r-help@r-project.org > Subject: [R] Creating a data frame from scratch > > Hi All, > > I need to create a data frame from scratch and fill variables created on the fly > with values. What I have so far: > > -- schnipp -- > > # Example dataset > gene <- > c("ENSG00000208234","ENSG00000199674","ENSG00000221622","ENSG00000 > 207604", > > "ENSG00000207431","ENSG00000221312","ENSG00134940305","ENSG0039403 > 9490", > "ENSG09943004048") > hsap <- c(0,0,0, 0, 0, 0, 1,1, 1) > mmul <- c(NA,2 ,3, NA, 2, 1 , NA,2, NA) > mmus <- c(NA,2 ,NA, NA, NA, 2 , NA,3, 1) rnor <- c(NA,2 ,NA, 1 , NA, 3 , > NA,NA, 2) cfam <- c(NA,2,NA, 2, 1, 2, 2,NA, NA) > > ds_example <- data.frame(gene, hsap, mmul, mmus, rnor, cfam) > ds_example$gene <- as.character(ds_example$gene) > > t_count_na <- function(dataset, > variables = "all") > # credit: http://stackoverflow.com/questions/4862178/remove-rows-with- > nas-in-data-frame > { > ds_na <- data.frame() > # if variables = "all" create character vector of variable names > if (variables == "all") { > variable_list <- dimnames(dataset)[[ 2 ]] > } > # if a character vector with variable names is given > # to run the function on a defined set of selected variables > else { > variable_list <- variables > } > > for (var in variable_list) { > new_name <- paste0("na_", var) > ds_na[[ new_name ]] <- as.data.frame(is.na(dataset[[ var ]])) > } > > ds_na[[ "na_count" ]] <- rowSums(ds_na) > return(ds_na) > } > > test <- t_count_na(dataset = ds_example, variables = c("mmul", "mmus")) > > -- schnipp -- > > gives: > > Error in `[[<-.data.frame`(`*tmp*`, new_name, value = > list(`is.na(dataset[[var]])` = c(TRUE, : > replacement has 9 rows, data has 0 In addition: Warning message: > In if (variables == "all") { : > the condition has length > 1 and only the first element will be used > > My goal is to create a dataset from scratch on the fly which has the same > amount of variables as the dataset ds_example plus a single variable storing > the amount of NA's in a row for the given variables. This is the basis for a > decious which cases to keep and which to drop. > > I do not want to alter the base dataset like ds_example in the first place nor > do I want to make a copy of the existing dataset due to memory allocation. > The function shall also work with big data, e. g. datasets with more than 1 GB > memory consumption. > > I also do not want the newly created variables to be stored in the original > data frame. They shall be separate. > > A former similar solution worked: > http://r.789695.n4.nabble.com/Creating-variables-on-the-fly-td4720034.html > > Why doesn't this one? > > How do I create the variables within the data frame if the data frame is > empty? > > Kind regards > > Georg Maubach > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.