Thank you David
        Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
"1977-07-16" "1984-03-13" "1990-08-16" "1990-12-28" "1997-07-29" "2002-12-31"
WHP
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Thursday, July 12, 2018 11:29 AM
To: Bill Poling <bill.pol...@zelis.com>
Cc: r-help (r-help@r-project.org) <r-help@r-project.org>
Subject: Re: [R] Help with replace()
On Jul 12, 2018, at 8:17 AM, Bill Poling <bill.pol...@zelis.com> wrote:
R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
Hi.
I have a data set with day, month, and year stored as integers. I am creating a
date column from those using lubridate.
A hundred or so rows failed to parse.
The problem is that some April and September rows have day = 31, but those
months have only 30 days.
paste(df1$year, df1$month, df1$day, sep = "-")
ymd(paste(df1$year, df1$month, df1$day, sep = "-"))
# Warning message: 129 failed to parse. As expected in the tutorial.
#The resulting Date vector can be added to df1 as a new column called date:
df1$date <- ymd(paste(df1$year, df1$month, df1$day, sep = "-"))#Same warning
head(df1)
class(df1$date) # "Date"
summary(df1$date)
#         Min.      1st Qu.       Median         Mean      3rd Qu.         Max.  NA's
# "1977-07-16" "1984-03-12" "1990-07-22" "1990-12-15" "1997-07-29" "2002-12-31" "129"
is_missing_date <- is.na(df1$date)
View(is_missing_date)
date_columns <- c("year", "month", "day")
missing_dates <- df1[is_missing_date, date_columns]
head(missing_dates)
# year month day
# 3144 2000 9 31
# 3817 2000 4 31
# 3818 2000 4 31
# 3819 2000 4 31
# 3820 2000 4 31
# 3856 2000 9 31
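
The parse failures can be reproduced on a small made-up data frame. The sketch below is only an illustration: the toy rows are hypothetical, not the poster's data, and base R's as.Date() with an explicit format stands in for lubridate's ymd() so that the example needs no packages.

```r
# Toy data (hypothetical): two impossible dates, two valid ones
toy <- data.frame(year  = c(2000, 2000, 2000, 2001),
                  month = c(4, 9, 7, 12),
                  day   = c(31, 31, 31, 15))

# Build "Y-M-D" strings, then parse; impossible dates come back as NA
date_strings <- paste(toy$year, toy$month, toy$day, sep = "-")
toy$date <- as.Date(date_strings, format = "%Y-%m-%d")

# Pull out the date parts of the rows that failed to parse
is_missing_date <- is.na(toy$date)
missing_dates <- toy[is_missing_date, c("year", "month", "day")]
print(missing_dates)
```

Only the April 31 and September 31 rows end up in missing_dates; July 31 is a real date and parses normally.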
I am trying to replace those with 30.
Seems like a fairly straightforward application of "[<-" with a conditional
argument. (No need for tidyverse.)
missing_dates$day[missing_dates$day == 31 &
                  missing_dates$month %in% c(4, 9)] <- 30
missing_dates
year month day
3144 2000 9 30
3817 2000 4 30
3818 2000 4 30
3819 2000 4 30
3820 2000 4 30
3856 2000 9 30
Best;
David.
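
Note that the assignment above changes missing_dates, which is a copy of the failing rows, so df1 itself would still hold the bad days. A minimal sketch of applying the same indexed assignment to the full data frame and then rebuilding the date column follows. The toy df1 is hypothetical, and base as.Date() is used here in place of ymd() purely to avoid a package dependency.

```r
# Hypothetical stand-in for the poster's df1
df1 <- data.frame(year  = c(2000, 2000, 2000, 2000),
                  month = c(9, 4, 7, 4),
                  day   = c(31, 31, 31, 30))

# Rows where a 30-day month carries day 31.
# The thread only shows April and September; June (6) and November (11)
# could be added to the vector if they occur in the data.
fix_rows <- df1$day == 31 & df1$month %in% c(4, 9)
df1$day[fix_rows] <- 30

# Rebuild the date column from the corrected parts
df1$date <- as.Date(paste(df1$year, df1$month, df1$day, sep = "-"),
                    format = "%Y-%m-%d")

# Count of dates that still fail to parse
print(sum(is.na(df1$date)))
```

July 31 is left alone because the condition requires both day == 31 and a month listed in the vector.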
I am all over the map in Google looking for a fix, but haven't found one. I am
sure I have overcomplicated my attempts with ideas (below) from these and other
sites.
https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1&lq=1
https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/replace
https://stackoverflow.com/questions/48714625/error-in-data-frame-unused-argument
The following are screwy attempts at this simple repair,
??mutate_if
??replace
is_missing_date <- is.na(df1$date)
View(is_missing_date)
date_columns <- c("year", "month", "day")
missing_dates <- df1[is_missing_date, date_columns]
head(missing_dates)
#year month day
# 3144 2000 9 31
# 3817 2000 4 31
# 3818 2000 4 31
# 3819 2000 4 31
# 3820 2000 4 31
# 3856 2000 9 31
# So rows with day = 31 in months that have only 30 days need day set to 30
View(missing_dates)
install.packages("dplyr")
library(dplyr)
View(missing_dates)
# ..those were the values you're going to replace
I thought this function from Stack Overflow would work, but I get an error when
I try to add a filter.
#https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1&lq=1
df.Rep <- function(.data_Frame, .search_Columns, .search_Value, .sub_Value){
  .data_Frame[, .search_Columns] <-
    ifelse(.data_Frame[, .search_Columns] == .search_Value,
           .sub_Value / .search_Value, 1) *
    .data_Frame[, .search_Columns]
  return(.data_Frame)
}
df.Rep(missing_dates, 3, 31, 30)
#--So I should be able to apply this to the complete df1 data somehow?
head(df1)
df.Rep(df1, filter(month == c(4,9)), 31, 30)
#Error in month == c(4, 9) : comparison (1) is possible only for atomic and
#list types
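
One plausible reading of this error: filter() is called here without a data frame, so month is not looked up as a column of df1. With lubridate attached, the name month would resolve to lubridate's month() function, and comparing a function with numbers is an error. The sketch below reproduces that kind of error in base R using a hypothetical stand-in function, then shows a logical row index doing what the filter was meant to do. The toy df1 is made up for illustration.

```r
# Hypothetical stand-in for a function called `month`
# (lubridate exports one with this name)
month <- function(x) x

# Comparing a function with numbers raises an error; capture its message
err <- tryCatch(month == c(4, 9), error = function(e) conditionMessage(e))
print(err)

rm(month)

# Restricting the replacement to certain rows needs a logical index
# built from the data frame's own columns
df1 <- data.frame(month = c(4, 9, 7), day = c(31, 31, 31))
rows <- df1$month %in% c(4, 9) & df1$day == 31
df1[rows, "day"] <- 30
print(df1)
```

The second argument of df.Rep() selects columns, not rows, so a row condition cannot be passed there without changing the function.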
Other screwy attempts:
select(df1, month, day, year)
str(df1)
#'data.frame': 34786 obs. of 14 variables:
#To choose rows, use filter():
#mutate_if(df1, month =4,9), day = 30)
filter(df1, month == c(4,9), day == 31)
df1 %>%
group_by(month == c(4,9), day == 31) %>%
tally()
# 1 FALSE FALSE 31161
# 2 FALSE TRUE 576
# 3 TRUE FALSE 2981
# 4 TRUE TRUE 68
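
The tally above hints at a problem with month == c(4,9): == compares element by element and recycles the shorter vector, so rows are tested alternately against 4 and against 9. That would explain why only some of the April and September rows are flagged. %in% tests membership instead. A small illustration with made-up values:

```r
# Made-up month values
month <- c(4, 9, 9, 4)

# `==` recycles c(4, 9) to c(4, 9, 4, 9) and compares position by position
recycled <- month == c(4, 9)
print(recycled)   # TRUE TRUE FALSE FALSE

# `%in%` asks, for each element, whether it is 4 or 9
member <- month %in% c(4, 9)
print(member)     # TRUE TRUE TRUE TRUE
```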
df1 %>%
mutate(day=replace(day, month == c(4,9), 30)) %>%
as.data.frame()
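
The mutate() call above appears to have three issues: month == c(4,9) recycles the values 4 and 9 along the column rather than testing membership, the condition does not check day == 31 (so every matched row would be set to 30), and the result is never assigned back to df1. A sketch with base replace() on hypothetical toy rows is shown below; the same condition could be used inside mutate().

```r
# Hypothetical toy rows
df1 <- data.frame(month = c(4, 9, 4, 7),
                  day   = c(31, 31, 15, 31))

# replace(x, index, value): only April/September rows with day 31 change
df1$day <- replace(df1$day,
                   df1$month %in% c(4, 9) & df1$day == 31,
                   30)
print(df1)
```

April 15 and July 31 are untouched because they do not satisfy both parts of the condition.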
View(as.list(df1, month == 4))
View(df1, month == c(4,9), day == 31)
df1 %>%
group_by(month == c(4,9), day == 31) %>%
tally()
View(df1, month == c(4,9))
# df1 %>%
# group_by(month == c(4,9), day == 30) %>%
I know there is a simple solution, and it is driving me mad that it eludes me,
even allowing for my being new to R.
Thank you for any advice.
WHP
Confidentiality Notice This message is sent from Zelis. ...{{dropped:15}}
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA
'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's
Corollary to Clarke's Third Law