Dear friend Duncan, Thank you so much for your kind reply. Yes, that is exactly what is happening, there are a lot of NA values at the start, so R assumes that the field is of type boolean. The challenge that I am facing is that I want to read into R an Excel file that has many sheets (46 in this case) but I wanted to combine all 46 sheets into a single dataframe (since the columns are exactly the same for all 46 sheets). The rio package does this nicely, the problem is that, once I have the full dataframe (which amounts to roughly 2.98 million rows total), I cannot change the data type from boolean to numeric. I tried doing dataset$my_field = as.numeric(dataset$my_field), I also tried to do dataset <- dataset[complete.cases(dataset), ], didn't work either.
The only thing that worked for me was to take a single sheed and through the read_excel function use the guess_max parameter and set it to a sufficiently large number (a number >= to the total amount of the full merged dataset). I want to automate the merging of the N number of Excel sheets so that I don't have to be manually doing it. Unless there is a way to accomplish something similar to what rio's package function import_list does, that is able to keep the field's numeric data type nature. Cheers, Paul El mar, 30 ene 2024 a las 12:23, Duncan Murdoch (<murdoch.dun...@gmail.com>) escribió: > On 30/01/2024 11:10 a.m., Paul Bernal wrote: > > Dear friends, > > > > Hope you are doing well. I am currently using R version 4.3.2, and I > have a > > .xlsx file that has 46 sheets on it. I basically combined all 46 sheets > > and read them as a single dataframe in R using package rio. > > > > I read a solution using package readlx, as suggested in a StackOverflow > > discussion as follows: > > df <- read_excel(path = filepath, sheet = sheet_name, guess_max = > 100000). > > Now, when you have so many sheets (46 in my case) in an Excel file, the > rio > > methodology is more practical. > > > > This is what I did: > > path = > > > "C:/Users/myuser/Documents/DataScienceF/Forecast_and_Econometric_Analysis_FIGI > > (4).xlsx" > > figidat = import_list(path, rbind = TRUE) #here figidat refers to my > dataset > > > > Now, it successfully imports and merges all records, however, some fields > > (despite being numeric), R interprets as a boolean field. > > > > Here is the structure of the field that is causing me problems (I > apologize > > for the length): > > structure(list(StoreCharges = c(NA, NA, NA, NA, NA, NA, NA, NA, > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > ... > > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, NA, NA, > > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, > > FALSE, FALSE, FALSE)), class = c("tbl_df", "tbl", "data.frame" > > ), row.names = c(NA, -7033L)) > > > > As you can see, when I do the dput, it gives me a bunch of TRUE and FALSE > > values, when in reality I have records with value $0, records with > amounts > >> $0 and also a bunch of blank records. > > > > Any help will be greatly appreciated. > > I don't know how read_excel() determines column types, but some > functions look only at the first n rows to guess the type. It appears > you have a lot of NA values at the start. That is a logical value, so > that might be what is going wrong. > > In read.table() and related functions, you can specify the types of > column explicitly. It sounds as though that's what you should do if > read_excel() offers that as a possibility. > > Duncan Murdoch > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.