[R] Missing Data Imputation for Complex Survey Data
Dear all,

I've got a bit of a challenge on my hands. I have survey data produced by a government agency, and I want to use the person weights in my analyses. This is best accomplished by specifying the weights in {survey} and then calculating descriptive statistics and models through functions in that package.

However, there is also missingness in this data that I'd like to handle with imputation via {mi}. To properly use imputed datasets in regression, they need to be pooled using the lm.mi function in {mi}. I can't figure out how to run a regression on data that is both properly weighted and imputed, because the two packages use their own mutually incompatible data objects.

Does anyone have any thoughts on this? I've done a lot of reading and I'm not really seeing anything on point.

Thanks in advance!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
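One possible route, offered as a sketch under assumptions rather than a definitive recipe: svydesign() in {survey} accepts an imputationList from {mitools}, so the weighted model can be fitted to each completed dataset and the results pooled with MIcombine(). Everything below is simulated for illustration. The weight column `pw` and the five "completed" data frames are hypothetical stand-ins for the completed datasets extracted from whichever imputation package was used.

```r
library(survey)
library(mitools)

set.seed(42)
n <- 200

# Simulated survey data; `pw` is a hypothetical person-weight column.
base <- data.frame(pw = runif(n, 50, 150), x = rnorm(n))
base$y <- 1 + 2 * base$x + rnorm(n)

# Stand-in for m = 5 completed (imputed) datasets: x is perturbed slightly
# in each copy to mimic between-imputation variation.
completed <- lapply(1:5, function(i) {
  d <- base
  d$x <- d$x + rnorm(n, sd = 0.1)
  d
})

# One survey design object wrapping all completed datasets
des <- svydesign(ids = ~1, weights = ~pw, data = imputationList(completed))

# Fit the weighted model on each completed dataset, then pool the results
fits <- with(des, svyglm(y ~ x))
pooled <- MIcombine(fits)
summary(pooled)
```

A real agency file would normally also need its strata and cluster variables passed to svydesign(); `ids = ~1` is used here only to keep the sketch short.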
Re: [R] multiple imputed files
Hi,

I think you want the {mitools} package:
http://cran.r-project.org/web/packages/mitools/mitools.pdf

Anthony Damico's site, asdfree.com, has a lot of good code examples using various government datasets.

Nate

On Mon, Jan 26, 2015 at 5:23 AM, hnlki wrote:

> Dear,
>
> My dataset consists of 5 imputed files (that I did not impute myself).
> I was wondering what the best way to analyse them in R is. I am aware that
> packages to perform multiple imputation (like mice and Amelia) exist, but
> they are used to perform MI. As my data is already imputed, I would like
> to know how I can split it and how I should obtain pooled regression
> results. If I can use the existing MI packages, how should I define my
> imputation variable?
>
> Kind regards,
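For readers wondering what "split and pool" amounts to, here is a minimal base-R sketch of Rubin's rules, assuming the five files have been stacked into one data frame. The column name `.imp` and the simulated data are assumptions made for illustration. In {mitools}, imputationList() and MIcombine() take care of these steps.

```r
set.seed(1)
n <- 100
m <- 5

# Simulated stand-in for a stacked file of m completed datasets.
# `.imp` is a hypothetical name for the column identifying the imputation.
x0 <- rnorm(n)
y0 <- 1 + 2 * x0 + rnorm(n)
stacked <- do.call(rbind, lapply(seq_len(m), function(i) {
  data.frame(.imp = i, x = x0 + rnorm(n, sd = 0.1), y = y0)
}))

# 1. Split into one data frame per imputation
imps <- split(stacked, stacked$.imp)

# 2. Fit the same model to each completed dataset
fits <- lapply(imps, function(d) lm(y ~ x, data = d))

# 3. Pool with Rubin's rules
est     <- sapply(fits, coef)                     # p x m matrix of estimates
within  <- Reduce(`+`, lapply(fits, vcov)) / m    # mean within-imputation variance
between <- var(t(est))                            # between-imputation variance

pooled_coef <- rowMeans(est)
pooled_vcov <- within + (1 + 1 / m) * between
pooled_se   <- sqrt(diag(pooled_vcov))

cbind(estimate = pooled_coef, se = pooled_se)
```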
[R] Assigning categorical values to dates
Hi all,

If I have a tibble as follows:

    tibble(dates = c(rep("2021-07-04", 2), rep("2021-07-25", 3),
                     rep("2021-07-18", 4)))

how in the world do I add a column that evaluates each of those dates and assigns it a categorical value such that

    dates       cycle
    2021-07-04  1
    2021-07-04  1
    2021-07-25  3
    2021-07-25  3
    2021-07-25  3
    2021-07-18  2
    2021-07-18  2
    2021-07-18  2
    2021-07-18  2

Not to further complicate matters, but some months I may have only one date, and some months I will have 4 dates, so that's not a fixed quantity. We've literally been doing this by hand at my job and I'd like to automate it.

Thanks in advance!

Nate Parsons
Re: [R] Assigning categorical values to dates
I am not averse to a factor-based solution, but I would still have to manually enter that factor each month, correct? If possible, I’d just like to point R at that column and have it do the work.

--
Nathan Parsons, B.SC, M.Sc, G.C.
Ph.D. Candidate, Dept. of Sociology, Portland State University
Adjunct Professor, Dept. of Sociology, Washington State University
Graduate Advocate, American Association of University Professors (OR)
Recent work (https://www.researchgate.net/profile/Nathan_Parsons3/publications)
Schedule an appointment (https://calendly.com/nate-parsons)

On Wednesday, Jul 21, 2021 at 8:30 PM, Tom Woolman (twool...@ontargettek.com) wrote:

> Couldn't you convert the date columns to character type data in a data
> frame, and then convert those strings to factors in a 2nd step?
>
> The only downside I think to treating dates as factor levels is that
> you might have an awful lot of levels if you have a large enough
> dataset.
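On the worry about entering the factor by hand each month: factor() derives its levels from the data, so nothing needs to be typed in. A minimal sketch, assuming two hypothetical monthly files (`july` and `august` are made-up names) with different numbers of distinct dates:

```r
# Two hypothetical monthly files with different numbers of distinct dates
july <- data.frame(dates = c(rep("2021-07-04", 2), rep("2021-07-25", 3),
                             rep("2021-07-18", 4)))
august <- data.frame(dates = c("2021-08-15", "2021-08-01", "2021-08-15"))

# factor() finds the distinct values and sorts them; as.integer() then
# returns each row's position among those sorted levels.
july$cycle   <- as.integer(factor(july$dates))
august$cycle <- as.integer(factor(august$dates))

levels(factor(july$dates))  # "2021-07-04" "2021-07-18" "2021-07-25"
july$cycle                  # 1 1 3 3 3 2 2 2 2
august$cycle                # 2 1 2
```

This works because ISO-formatted date strings sort alphabetically in the same order as they sort chronologically.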
Re: [R] [EXT] Re: Assigning categorical values to dates
@Tom Okay, yeah. That might actually be an elegant solution. I will mess around with it. Thank you. I’m not in the habit of using factors and am not super familiar with how they automatically sort themselves.

@Andrew Yes. Each month is a different 30,000 row file upon which this task must be performed.

@Bert If you’re not interested in being helpful, why comment? Am I interrupting your clubhouse time? I’m legitimately stumped by this one and reaching out in earnest. “You’ve been told how to do it” Seriously? We all have different backgrounds and knowledge levels with the entire atlas of the wonderful world of R and I neither need nor want your opinion on my corner of it. Don’t be a Hooke. I’m not here to impress or inspire confidence in you. I’m here with a question that has had me spinning my wheels for the better part of a day and need fresh perspectives. Your response certainly inspires no confidence in me as to the nature of your character or your knowledge on the topic.

Best regards all,

Nathan Parsons

On Wednesday, Jul 21, 2021 at 9:12 PM, Andrew Robinson (a...@unimelb.edu.au) wrote:

> I wonder if you mean that you want the levels of the factor to reset within
> each month? That is not obvious from your example, but implied by your
> question.
>
> Andrew
>
> --
> Andrew Robinson
> Director, CEBRA and Professor of Biosecurity,
> School/s of BioSciences and Mathematics & Statistics
> University of Melbourne, VIC 3010 Australia
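If several months ever end up in one data frame, the reset-within-month behaviour Andrew asks about can be sketched in base R with ave(). This is only an illustration: the combined data below is made up, and it assumes ISO "YYYY-MM-DD" strings so that the first seven characters identify the month.

```r
# Hypothetical combined data spanning two months
d <- data.frame(dates = c("2021-07-04", "2021-07-18", "2021-07-18",
                          "2021-08-08", "2021-08-01", "2021-08-08"))

# "YYYY-MM" prefix of each ISO date string
month <- substr(d$dates, 1, 7)

# Within each month, number the distinct dates in sorted order.
# ave() is given row indices so that the result stays an integer vector.
d$cycle <- ave(seq_along(d$dates), month, FUN = function(i) {
  x <- d$dates[i]
  match(x, sort(unique(x)))
})

d$cycle  # 1 2 2 2 1 2
```

With one file per month, as described in the thread, this grouping step is unnecessary and the plain factor approach is enough.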
Re: [R] Assigning categorical values to dates
I had no idea that ‘cur_group_id()’ existed!?!! Will definitely try that. Thank you!!!

Nathan Parsons

On Wednesday, Jul 21, 2021 at 11:54 PM, Rui Barradas (ruipbarra...@sapo.pt) wrote:

> Hello,
>
> Here are 3 solutions, one of them the coercion-to-factor approach.
> Since you are using tibbles, I assume you also want a dplyr solution.
>
>
> library(dplyr)
>
> df1 <- tibble(dates = c(rep("2021-07-04", 2),
>                         rep("2021-07-25", 3),
>                         rep("2021-07-18", 4)))
>
> # base R
> as.integer(factor(df1$dates))
> match(df1$dates, unique(sort(df1$dates)))
>
> # dplyr
> df1 %>% group_by(dates) %>% mutate(cycle = cur_group_id())
>
>
> My favorite is by far the 1st but that's a matter of opinion.
>
>
> Hope this helps,
>
> Rui Barradas
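One caveat worth illustrating: the solutions above rank the date strings alphabetically, which matches chronological order only for ISO "YYYY-MM-DD" strings like those in the example. A small sketch, assuming a hypothetical month/day/year export that spans a year boundary:

```r
# Hypothetical non-ISO input (month/day/year) spanning a year boundary
raw <- c("12/26/2021", "01/02/2022", "12/26/2021", "01/09/2022")

# Ranking the strings directly puts "01/..." before "12/...",
# which is alphabetical order, not chronological order.
as_strings <- as.integer(factor(raw))
as_strings   # 3 1 3 2

# Parsing to Date first gives chronological ranks.
parsed <- as.Date(raw, format = "%m/%d/%Y")
as_dates <- as.integer(factor(parsed))
as_dates     # 1 2 1 3
```

Converting to Date before ranking costs one extra line and removes the dependence on how the source file happens to format its dates.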
Re: [R] Assigning categorical values to dates
Thank you all so much for your time and your help! I am truly grateful for the suggested solutions, but more importantly, for the lessons!

Nate Parsons

On Thu, Jul 22, 2021 at 4:13 AM Eric Berger wrote:

> While the base R solution using 'factor' appears to win based on elegance,
> chapeau to the creativity of the other suggestions.
> For those who are not aware, R 4.1.0 introduced two features: (1) native
> pipe |> and (2) new shorter syntax for anonymous functions.
> Erich's suggestion used the native pipe and Rui went with the spirit and
> added an anonymous function using the new syntax.
>
> Everyone has their preferred coding style. I tend to prefer fewer lines of
> code (if there is no cost in understanding).
> I think the new anonymous function syntax helps in this regard and I see
> no reason to use piping if not necessary.
> So here is a modified, one-line version of Rui's last suggestion (sans the
> amazing observation about handling interactions).
>
> mutate(date_df, cycle = (\(ranks) match(dates, ranks))(sort(unique(dates))))
>
> Eric
>
>
> On Thu, Jul 22, 2021 at 11:11 AM Uwe Ligges <lig...@statistik.tu-dortmund.de> wrote:
>
> > For a data.frame d, I'd simply do
> >
> > d$cycle <- factor(d$dates, labels=1:3)
> >
> > but I have no idea about tibbles.
> >
> >
> > Best,
> > Uwe Ligges
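For anyone who has not yet seen the two R 4.1.0 features Eric mentions, here is a small base-R sketch of both on the example data. The helper name `rank_dates` is made up for illustration, and the code needs R 4.1.0 or later.

```r
# Requires R >= 4.1.0 for the native pipe |> and the \(x) lambda syntax
d <- data.frame(dates = c(rep("2021-07-04", 2), rep("2021-07-25", 3),
                          rep("2021-07-18", 4)))

# \(x) is shorthand for function(x); `rank_dates` is a hypothetical helper
rank_dates <- \(x) match(x, sort(unique(x)))

# The native pipe passes the left-hand side as the first argument
d2 <- d |> transform(cycle = rank_dates(dates))

d2$cycle  # 1 1 3 3 3 2 2 2 2
```

Naming the helper also makes it easy to reuse on each monthly file, for example with lapply() over a list of data frames.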