On 07/18/2020 11:42 PM, Rasmus Liland wrote: > On 2020-07-18 18:09 +0100, Rui Barradas wrote: > | �s 17:59 de 18/07/2020, H escreveu: > | | On Fri, Jul 17, 2020 at 6:28 PM H <age...@meddatainc.com> wrote: > | | | > | | | The problem I am having is that > | | | the csv files have header rows > | | | with column names that are > | | | slightly different from the column > | | | names I have assigned in the > | | | dataframe and it seems that when I > | | | read the csv data into the > | | | dataframe, the column names from > | | | the csv file replace the column > | | | names I chose when creating the > | | | dataframe. > | | | > | | | A secondary issue is that the csv > | | | files have a column with a date in > | | | mm/dd/yyyy format that I would > | | | like to make into a Date type > | | | column in my dataframe. Again, I > | | | have been unable to find a way - > | | | if at all possible - to force a > | | | conversion into a Date format when > | | | importing into the dataframe. The > | | | best I have so far is to import is > | | | a character column and then use > | | | as.Date() to later force the > | | | conversion of the dataframe > | | | column. > | | > | | The documentation for read.csv.sql() > | | suggests that colClasses() and/or > | | field.types() should work but I may > | | well have misunderstood the > | | documentation, hence my question in > | | this group. > | > | As for colClasses, those are R class > | names. > > Ok Mister H, I might have hit the nail > on the head this time with this badass > example for your usecase: > > # Make a csv with %d/%m/%Y dates in it ... > Lines <- "STM05-1 2005/02/28 17:35 Good -35.562 177.158 > STM05-1 2005/02/28 19:44 Good -35.487 177.129 > STM05-1 2005/02/28 23:01 Unknown -35.399 177.064 > STM05-1 2005/03/01 07:28 Unknown -34.978 177.268 > STM05-1 2005/03/01 18:06 Poor -34.799 177.027 > STM05-1 2005/03/01 18:47 Poor -34.85 177.059 > STM05-2 2005/02/28 12:49 Good -35.928 177.328 > STM05-2 2005/02/28 21:23 Poor -35.926 177.314 > " > DF <- read.table(textConnection(Lines), as.is = TRUE, > col.names = c("Id", "Date", "Time", "Quality", "Lat", "Long")) > DF$Date <- format(as.Date(DF$Date, "%Y/%m/%d"), "%d/%m/%Y") > write.csv(DF, file="df.csv", row.names=FALSE) > > colClasses <- > c("character", > "Date", > "character", > "character", > "numeric", > "numeric") > sql <- paste0( > "select ", > "date(", # [2] > "substr(Date, 8, 4) || '-' || ", # [1] > "substr(Date, 5, 2) || '-' || ", > "substr(Date, 2, 2)), Long, Lat, Quality ", > "from ff where Quality like '%oo%' and Long>177.129") > ff <- file(description="df.csv", open="r") > dat <- sqldf::read.csv.sql( > sql=sql, colClasses=colClasses) > close(ff) > > str(dat) > > as.Date(dat[,1]) > dat[,3] > > Both sqlite and Postgres has a function > substr you can call on strings like > this.[5] I have a hunch this has always > been possible in sql from way back ... > > The warning from sqldf about unused > connections, might suggest file > descriptor handling to be a bit crusty > ... [3] > > The thing is, defining the second column > as of type Date in colClasses happens to > work, but it's still character when you > check with str(dat) ... perhaps it has > something to do with this info from [4]: > > as_tibble_row() converts a vector to > a tibble with one row. The input > must be a bare vector, e.g. vectors > of dates are not supported yet. If > the input is a list, all elements > must have length one. > > [1] > https://stackoverflow.com/questions/15563656/convert-string-to-date-in-sqlite > [2] https://www.sqlite.org/lang_datefunc.html > [3] https://groups.google.com/forum/#!topic/sqldf/mcQ_K_E--q8 > [4] https://tibble.tidyverse.org/reference/as_tibble.html > [5] https://www.sqlite.org/lang_corefunc.html#substr, > https://www.postgresql.org/docs/9.1/functions-string.html, > http://www.h2database.com/html/functions.html#substring > > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Thank you for your extensive example. However, I have decided to simply convert column types as necessary and rename columns as desired after importing the data since that seems the simplest solution. [[alternative HTML version deleted]]
______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.