Hi Bert, Below the information you asked me for:
nrow(mydataset) [1] 2986276 ######## sapply(mydataset, "class") $`Transit Date` [1] "POSIXct" "POSIXt" $`Market Segment` [1] "character" $`Número de Tránsitos` [1] "numeric" $`Tar No` [1] "character" $`Beam Range (Operations)` [1] "character" $`Operational Vessel Ranges Group` [1] "character" $`Rcnst PCUMS` [1] "numeric" $`Toll Amount` [1] "numeric" $Beam [1] "numeric" $Length [1] "numeric" $`Trn Draft (FT)` [1] "numeric" $`Other Income Amt` [1] "numeric" $`Total Other Income Amount` [1] "logical" $`Booking Charges` [1] "numeric" $`Booking Cancellation` [1] "logical" $`Booking Auction` [1] "logical" $`_file` [1] "integer" Hope this helps you understand what I am dealling with. Cheers, Paul El mar, 30 ene 2024 a las 14:19, Bert Gunter (<bgunter.4...@gmail.com>) escribió: > Incidentally, "didn't work" is not very useful information. Please tell us > exactly what error message or apparently aberrant result you received. > Also, what do you get from: > > sapply(your_dataframe, "class") > nrow(your_dataframe) > > (as I suspect what you think it is, isn't). > > Cheers, > Bert > > On Tue, Jan 30, 2024 at 11:01 AM Bert Gunter <bgunter.4...@gmail.com> > wrote: > >> "I cannot change the data type from >> boolean to numeric. I tried doing dataset$my_field = >> as.numeric(dataset$my_field), I also tried to do dataset <- >> dataset[complete.cases(dataset), ], didn't work either. " >> >> Sorry, but all I can say is: huh? >> >> > dt <- data.frame(a = c(NA,NA, FALSE, TRUE), b = 1:4) >> > dt >> a b >> 1 NA 1 >> 2 NA 2 >> 3 FALSE 3 >> 4 TRUE 4 >> > sapply(dt, class) >> a b >> "logical" "integer" >> > dt$a <- as.numeric(dt$a) >> > dt >> a b >> 1 NA 1 >> 2 NA 2 >> 3 0 3 >> 4 1 4 >> > sapply(dt, class) >> a b >> "numeric" "integer" >> >> So either I'm missing something or you are. Happy to be corrected and >> chastised if the former. >> >> Cheers, >> Bert >> >> >> On Tue, Jan 30, 2024 at 10:41 AM Paul Bernal <paulberna...@gmail.com> >> wrote: >> >>> Dear friend Duncan, >>> >>> Thank you so much for your kind reply. Yes, that is exactly what is >>> happening, there are a lot of NA values at the start, so R assumes that >>> the >>> field is of type boolean. The challenge that I am facing is that I want >>> to >>> read into R an Excel file that has many sheets (46 in this case) but I >>> wanted to combine all 46 sheets into a single dataframe (since the >>> columns >>> are exactly the same for all 46 sheets). The rio package does this >>> nicely, >>> the problem is that, once I have the full dataframe (which amounts to >>> roughly 2.98 million rows total), I cannot change the data type from >>> boolean to numeric. I tried doing dataset$my_field = >>> as.numeric(dataset$my_field), I also tried to do dataset <- >>> dataset[complete.cases(dataset), ], didn't work either. >>> >>> The only thing that worked for me was to take a single sheed and through >>> the read_excel function use the guess_max parameter and set it to a >>> sufficiently large number (a number >= to the total amount of the full >>> merged dataset). I want to automate the merging of the N number of Excel >>> sheets so that I don't have to be manually doing it. Unless there is a >>> way >>> to accomplish something similar to what rio's package function >>> import_list >>> does, that is able to keep the field's numeric data type nature. >>> >>> Cheers, >>> Paul >>> >>> El mar, 30 ene 2024 a las 12:23, Duncan Murdoch (< >>> murdoch.dun...@gmail.com>) >>> escribió: >>> >>> > On 30/01/2024 11:10 a.m., Paul Bernal wrote: >>> > > Dear friends, >>> > > >>> > > Hope you are doing well. I am currently using R version 4.3.2, and I >>> > have a >>> > > .xlsx file that has 46 sheets on it. I basically combined all 46 >>> sheets >>> > > and read them as a single dataframe in R using package rio. >>> > > >>> > > I read a solution using package readlx, as suggested in a >>> StackOverflow >>> > > discussion as follows: >>> > > df <- read_excel(path = filepath, sheet = sheet_name, guess_max = >>> > 100000). >>> > > Now, when you have so many sheets (46 in my case) in an Excel file, >>> the >>> > rio >>> > > methodology is more practical. >>> > > >>> > > This is what I did: >>> > > path = >>> > > >>> > >>> "C:/Users/myuser/Documents/DataScienceF/Forecast_and_Econometric_Analysis_FIGI >>> > > (4).xlsx" >>> > > figidat = import_list(path, rbind = TRUE) #here figidat refers to my >>> > dataset >>> > > >>> > > Now, it successfully imports and merges all records, however, some >>> fields >>> > > (despite being numeric), R interprets as a boolean field. >>> > > >>> > > Here is the structure of the field that is causing me problems (I >>> > apologize >>> > > for the length): >>> > > structure(list(StoreCharges = c(NA, NA, NA, NA, NA, NA, NA, NA, >>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, >>> > ... >>> > > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, NA, NA, >>> > > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, >>> > > FALSE, FALSE, FALSE)), class = c("tbl_df", "tbl", "data.frame" >>> > > ), row.names = c(NA, -7033L)) >>> > > >>> > > As you can see, when I do the dput, it gives me a bunch of TRUE and >>> FALSE >>> > > values, when in reality I have records with value $0, records with >>> > amounts >>> > >> $0 and also a bunch of blank records. >>> > > >>> > > Any help will be greatly appreciated. >>> > >>> > I don't know how read_excel() determines column types, but some >>> > functions look only at the first n rows to guess the type. It appears >>> > you have a lot of NA values at the start. That is a logical value, so >>> > that might be what is going wrong. >>> > >>> > In read.table() and related functions, you can specify the types of >>> > column explicitly. It sounds as though that's what you should do if >>> > read_excel() offers that as a possibility. >>> > >>> > Duncan Murdoch >>> > >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.