Re: [R] Archive format
Hi Joe,

I have read your question with great interest. I am a little astonished to read about your project: there is a large national institute in Germany called GESIS (https://de.wikipedia.org/wiki/GESIS_%E2%80%93_Leibniz-Institut_f%C3%BCr_Sozialwissenschaften) which has been doing the job you are trying to set up since 1986. You could try to exchange ideas with them.

Your subject is very complex with regard to reproducible research. You might want to have a look at

(1) https://cran.r-project.org/web/views/ReproducibleResearch.html
(2) Gandrud, Christopher: Reproducible Research with R and R Studio (https://www.amazon.com/Reproducible-Research-Studio-Second-Chapman/dp/1498715370)

Kind regards

Georg

> Sent: Wednesday, 29 March 2017, 10:44
> From: "Joe Gain"
> To: R-help@r-project.org
> Cc: bwfdm-i...@lists.kit.edu
> Subject: [R] Archive format
>
> Hello,
>
> we are collecting information on the subject of research data management
> in German on the web platform:
>
> www.forschungsdaten.info
>
> One of the topics we are writing about is how to *archive* data.
> Unfortunately, none of us in the project is an expert with respect to R,
> so I would like to ask the list what they recommend. A related
> question concerns the sharing of data. We have already asked some
> academics, who have basically replied that they don't really know, other
> than to strongly recommend a plain text format.
>
> We would also like to know whether members of the list recommend converting
> formats from commercial software such as S-Plus, Terr, SPSS etc. to an
> R-compatible format for long-term archiving. Are there any general
> rules and best practices when it comes to archiving (and sharing)
> statistical data and statistical programs?
>
> Any comments would be much appreciated!
> Joe
>
> --
> B 1003
> Kommunikations-, Informations-, Medienzentrum (KIM)
> Universitaet Konstanz
>
> t: ++49-7531-883234
> e: joe.g...@uni-konstanz.de
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
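The plain-text recommendation mentioned in the question can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not advice taken from the thread: the data frame `survey` is made up, and `tempfile()` stands in for a real archive location.

```r
# Hypothetical example data; any data frame would do.
survey <- data.frame(
  id     = 1:3,
  rating = c(2L, 5L, 3L),
  group  = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

# Write a plain-text copy that can be read without R.
# tempfile() is used only so that the sketch runs anywhere.
csv_path <- tempfile(fileext = ".csv")
write.csv(survey, csv_path, row.names = FALSE, fileEncoding = "UTF-8")

# Record the software environment next to the data.
env_path <- tempfile(fileext = ".txt")
writeLines(capture.output(sessionInfo()), env_path)

# Round trip: the archived file should reproduce the data.
restored <- read.csv(csv_path, stringsAsFactors = FALSE, fileEncoding = "UTF-8")
```

Factor labels, variable labels and missing-value codes do not survive a plain CSV round trip by themselves, so a real archive would presumably also need a codebook kept in a second plain-text file.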
[R] Reply: Re: Reply: Re: Way to Plot Multiple Variables and Change Color
Hi Ulrik, many thanks for your reply. I had to take an unplanned break and was not in the office during the last two weeks. Thus my late reply. I followed your advice and converted the variable in argument "fill" to factor. Now the color change works: -- cut -- d_result <- structure(list("variable" = c("Item 1 (ø = 3.3) ", "Item 1 (ø = 3.3) ", "Item 1 (ø = 3.3) ", "Item 1 (ø = 3.3) ", "Item 1 (ø = 3.3) ", "Item 1 (ø = 3.3) ", "Item 2 (ø = 3.8) ", "Item 2 (ø = 3.8) ", "Item 2 (ø = 3.8) ", "Item 2 (ø = 3.8) ", "Item 2 (ø = 3.8) ", "Item 2 (ø = 3.8) ", "Item 3 (ø = 3.4) ", "Item 3 (ø = 3.4) ", "Item 3 (ø = 3.4) ", "Item 3 (ø = 3.4) ", "Item 3 (ø = 3.4) ", "Item 3 (ø = 3.4) ", "Item 4 (ø = 3.4) ", "Item 4 (ø = 3.4) ", "Item 4 (ø = 3.4) ", "Item 4 (ø = 3.4) ", "Item 4 (ø = 3.4) ", "Item 4 (ø = 3.4) ", "Item 5 (ø = 3.5) ", "Item 5 (ø = 3.5) ", "Item 5 (ø = 3.5) ", "Item 5 (ø = 3.5) ", "Item 5 (ø = 3.5) ", "Item 5 (ø = 3.5) ", "Item 6 (ø = 3.5) ", "Item 6 (ø = 3.5) ", "Item 6 (ø = 3.5) ", "Item 6 (ø = 3.5) ", "Item 6 (ø = 3.5) ", "Item 6 (ø = 3.5) ", "Item 7 (ø = 3.4) ", "Item 7 (ø = 3.4) ", "Item 7 (ø = 3.4) ", "Item 7 (ø = 3.4) ", "Item 7 (ø = 3.4) ", "Item 7 (ø = 3.4) ", "Item 8 (ø = 3.3) ", "Item 8 (ø = 3.3) ", "Item 8 (ø = 3.3) ", "Item 8 (ø = 3.3) ", "Item 8 (ø = 3.3) ", "Item 8 (ø = 3.3) "), value = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("1 = very satisfied", "2", "3", "4", "5", "6 = very dissatified"), class = "factor"), n = c(14L, 20L, 24L, 14L, 16L, 14L, 9L, 15L, 21L, 20L, 14L, 23L, 19L, 17L, 16L, 14L, 16L, 20L, 22L, 17L, 15L, 16L, 20L, 12L, 19L, 15L, 16L, 15L, 18L, 19L, 18L, 15L, 18L, 18L, 16L, 17L, 17L, 20L, 17L, 17L, 14L, 16L, 16L, 25L, 16L, 17L, 8L, 20L)), .Names = c("variable", "value", "n"), row.names = c(NA, -48L), vars = list("variable"), drop = TRUE, indices = list(0:5, 6:11, 
12:17, 18:23, 24:29, 30:35, 36:41, 42:47), group_sizes = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), biggest_group_size = 6L, labels = structure(list( "variable" = structure(1:8, .Label = c("Item 1 (ø = 3.3) ", "Item 2 (ø = 3.8) ", "Item 3 (ø = 3.4) ", "Item 4 (ø = 3.4) ", "Item 5 (ø = 3.5) ", "Item 6 (ø = 3.5) ", "Item 7 (ø = 3.4) ", "Item 8 (ø = 3.3) "), class = "factor")), row.names = c(NA, -8L), class = "data.frame", vars = list("variable"), drop = TRUE, .Names = "variable"), class = c("grouped_df", "tbl_df", "tbl", "data.frame"))

ggplot(
  d_result,
  aes(x = variable, y = n, fill = rev(factor(value)))) +
  geom_bar(stat = "identity") +
  coord_cartesian(ylim = c(0, 100)) +
  coord_flip() +
  scale_y_continuous(name = "Percent") +
  scale_fill_manual(
    values = rev(
      c("forestgreen", "limegreen",
        "gold", "orange1",
        "tomato3", "darkred"))) +
  ggtitle(paste("Question 8: Satisfaction?")) +
  labs(fill = "Rating") +
  scale_x_discrete(name = element_blank()) +
  # scale_color_manual(
  #   values = rev(
  #     c(
  #       "forestgreen", "limegreen",
  #       "gold", "orange1",
  #       "tomato3", "darkred"))) +
  geom_text(
    aes(label = n),
    color = "white",
    position = position_stack(vjust = 0.5)) +
  theme_minimal() +
  theme(legend.position = "right")

-- cut --

I tried to change the order of the items on the y-axis, e.g. Item 8 should be last and Item 1 first. I tried to reverse the order of the items within ggplot using rev()
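One common way to control the order of a discrete axis is to reverse the factor levels before plotting instead of calling rev() on the values, because rev() on the values reorders the data themselves and breaks their alignment with the other columns. The following is a minimal base R sketch with made-up item labels; whether it matches the advice given later in the thread is an assumption.

```r
# Hypothetical item variable, standing in for d_result$variable.
item <- factor(c("Item 1", "Item 2", "Item 3", "Item 1"))

# Reverse the level order only; the values stay attached to their rows.
item_rev <- factor(item, levels = rev(levels(item)))

levels(item)      # original level order
levels(item_rev)  # reversed level order

# In contrast, rev(item) reverses the values and would misalign
# them with the other columns of the data frame.
```

As far as I know, ggplot2 places the first level of a discrete scale at the axis origin, so after coord_flip() the first level ends up at the bottom; reversing the levels as above should therefore put Item 1 at the top.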
[R] Reply: Re: Reply: Re: Reply: Re: Way to Plot Multiple Variables and Change Color (SOLVED)
Hi David,

many thanks for your answer. I followed your suggestion and came up with the following code:

-- cut --

ggplot(
  d_result,
  aes(x = variable, y = n, fill = value)) +
  geom_bar(stat = "identity") +
  coord_cartesian(ylim = c(0, 100)) +
  coord_flip() +
  scale_y_continuous(name = "Percent") +
  scale_fill_manual(
    values = rev(
      c("forestgreen", "limegreen",
        "gold", "orange1",
        "tomato3", "darkred"))) +
  ggtitle(paste("Question 8: Some Text")) +
  labs(fill = "Rating") +
  scale_x_discrete(
    name = element_blank(),
    drop = FALSE) +  # keep factor levels if no value exists
  geom_text(
    aes(label = n),
    color = "white",
    position = position_stack(vjust = 0.5)) +
  theme_minimal() +
  theme(legend.position = "right") +
  guides(fill = guide_legend(reverse = TRUE))

-- cut --

In addition to your suggestion I changed "fill = rev(factor(value))" to "fill = value" and added guides(fill = guide_legend(reverse = TRUE)) to get the legend in the order 1 .. 6 instead of 6 .. 1. In my data I added the counts (n) before the mean value in the labels on the left-hand side. Now it looks to me like a version conforming to the ESOMAR and BVM standards.

Many thanks again for your help.

Kind regards

Georg

From: David Winsemius
To: g.maub...@weinwolf.de
Cc: r-help@r-project.org
Date: 10.04.2017 22:21
Subject: Re: [R] Reply: Re: Reply: Re: Way to Plot Multiple Variables and Change Color

> On Apr 10, 2017, at 1:06 PM, David Winsemius wrote:
>
>> On Apr 10, 2017, at 7:45 AM, g.maub...@weinwolf.de wrote:
>>
>> Hi Ulrik,
>>
>> many thanks for your reply. I had to take an unplanned break and was not
>> in the office during the last two weeks. Thus my late reply.
>>
>> I followed your advice and converted the variable in argument "fill" to
>> factor.
[R] ggplot2: ..n.. and ..count.. in geom_text
Hi All,

I have the following code:

-- cut --

(g03_02_p02 <- ggplot(data = d_kzb_input) +
  geom_bar(
    mapping = aes(x = v03_02_r01, y = round(..prop.. * 100, 0)),
    fill = c_ww_palette["blue"]) +
  scale_y_continuous(limits = c(0, c_y_limit)) +
  theme_classic() +
  ggtitle(paste0("Question 3", "(n = ", <>, ")")) +  # How can I refer to the number of cases for this plot? Is there something like "..n.."?
  xlab("Orders") +
  ylab("Percent") +
  geom_text(
    aes(label = ..count..),  # How can I refer to the counts for the labels of the columns?
    color = "white",
    position = position_stack(vjust = 0.5)))

-- cut --

I would like to refer to the internal statistics of geom_bar():

How can I refer to the number of cases for this plot? Is there something like "..n.."?
How can I refer to the counts for the labels of the columns?

Kind regards

Georg
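A possible approach to both questions, sketched under assumptions: the data frame `d` and its column `orders` are made up, and the code uses after_stat(), which needs ggplot2 3.3.0 or later (older versions use the ..count.. notation shown above). As far as I know there is no "..n.." variable, so the number of cases is computed from the data before plotting.

```r
library(ggplot2)

# Hypothetical input data standing in for d_kzb_input.
d <- data.frame(orders = c("1-2", "1-2", "3-5", "6+", "6+", "6+"))

# The number of cases is taken from the data, not from the stat.
n_cases <- sum(!is.na(d$orders))

p <- ggplot(d, aes(x = orders, group = 1)) +
  # group = 1 makes prop the share of all cases, not the share within a bar
  geom_bar(aes(y = after_stat(round(prop * 100)))) +
  # stat = "count" gives geom_text access to the same computed variables
  geom_text(
    aes(y = after_stat(round(prop * 100)), label = after_stat(count)),
    stat = "count",
    position = position_stack(vjust = 0.5),
    color = "white"
  ) +
  ggtitle(paste0("Question 3 (n = ", n_cases, ")")) +
  xlab("Orders") +
  ylab("Percent")
```

The key point is that geom_text() uses stat = "identity" by default and therefore knows nothing about counts; switching its stat to "count" is what makes the count variable available for the labels.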
[R] Follow-up: RStudio: Place for Storing Options (as plain text)
Hi All,

some time ago I asked a question about the places where RStudio stores its configuration information. I came across this posting

https://support.rstudio.com/hc/en-us/articles/206382178?version=1.0.136&mode=desktop

explaining RStudio keybindings (predefined and customized). At the end of the article is the information that RStudio stores keybindings in

~/.R/rstudio/keybindings/rstudio_commands.json
~/.R/rstudio/keybindings/editor_commands.json

I want to share this with you.

Kind regards

Georg

- Forwarded by Georg Maubach/WWBO/WW/HAW on 19.04.2017 10:10 -

From: Georg Maubach/WWBO/WW/HAW
To: R-help mailing list
Cc: Martin Maechler, Jeff Newmiller
Date: 08.03.2017 08:59
Subject: Follow-up: [R] RStudio: Place for Storing Options (as plain text)

Hi All,

I got a late reply from RStudio Support concerning the question of where RStudio stores options and configurations:

-- cut --

The post RStudio Config Files has a new comment.
. . .
Unfortunately, it's unlikely that we'll be able to provide a programmatic R interface in the near future -- the way we lay out and store RStudio's client state does not make it as amenable to public consumption as we might hope.

That said, you can generally copy everything within that folder to a new machine (at the same relative path from the user home directory), and expect preferences to be respected + restored as you might expect.
. . .

-- cut --

The result of the discussion is:

We can copy the complete RStudio directory for storing options and configurations, located under

%localappdata%\RStudio-Desktop

or

C:\Users\\AppData\Local\RStudio-Desktop

to a new installation of RStudio.

A programmatic approach to editing RStudio options and configurations is not possible due to design decisions.

The purpose of the initial question was to find a way to save RStudio options and configurations, e.g. on git/github or similar. This is possible by initialising the directory given above with git or similar.

An open question is what happens if a new RStudio release makes changes to the options and configurations. Whether the stored directory can be reused completely would need additional clarification for each new version.

Kind regards

Georg

From: Martin Maechler
Date: 23.02.2017 08:37
Subject: Re: [R] RStudio: Place for Storing Options

> Jeff Newmiller
> on Sat, 11 Feb 2017 08:09:36 -0800 writes:

> For the record, then, Google listened to my incantation of
> "rstudio configuration file" and the second result was:
> https://support.rstudio.com/hc/en-us/articles/200534577-Resetting-RStudio-Desktop-s-State
> RStudio Desktop is also open source, so you can download
> the source code and look at the operating-system-specific
> bits (for "where") if the above link goes out of date or
> disappears.

Thanks a lot, Jeff!

And for the archives: On reasonable OS's, the hidden directory/folder containing all the info is

~/.rstudio-desktop/

and if "things are broken" the recommendation is to rename that

mv ~/.rstudio-desktop ~/backup-rstudio-desktop

and (zip and) send along with your e-mail to the experts for diagnosis.

> On Thu, 9 Feb 2017, Martin Maechler wrote:
>>
>>> Ulrik Stervbo on Thu, 9 Feb 2017 14:37:57 + writes:
>>
>> > Hi Georg,
>> > maybe someone here knows, but I think you are more likely to get answers to
>> > Rstudio related questions with RStudio support:
>> > https://support.rstudio.com/hc/en-us
>> >
>> > Best,
>> > Ulrik
>>
>> Indeed, thank you, Ulrik.
>>
>> In this special case, however, I'm quite sure many
>> readers of R-help would be interested in the answer; so
>> once you receive an answer, please post it (or a link to
>> a public URL with it) here on R-help, thank you in
>> advance.
>>
>> We would like to be able to *save*, or sometimes *set* /
>> *reset* such options "in a scripted manner", e.g. for
>> controlled exam sessions.
>>
>> Martin Maechler, ETH Zurich
>>
>> > On Thu, 9 Feb 2017 at 12:35 wrote:
>> >
>> >> Hi All,
>> >>
>> >> I would like to make a backup of my RStudio IDE options I configure using
>> >> "Tools/Global Options" from the menu bar. Searching the web did not reveal
>> >> anything.
>> >>
>> >> Can you tell me where RStudio IDE does store its configuration?
>> >>
>> >> Kind regards
>> >>
>> >> Georg
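The "copy the whole directory" approach described in this thread can be sketched in base R. This is a minimal illustration, not an RStudio API: backup_dir() is a hypothetical helper, and the demo runs on a throw-away directory under tempdir() rather than on a real RStudio configuration folder.

```r
# Hypothetical helper: copy a directory, with all its contents,
# into a backup location and return the path of the copy.
backup_dir <- function(src, dest_parent) {
  stopifnot(dir.exists(src))
  dir.create(dest_parent, recursive = TRUE, showWarnings = FALSE)
  ok <- file.copy(from = src, to = dest_parent, recursive = TRUE)
  if (!all(ok)) stop("copy failed")
  invisible(file.path(dest_parent, basename(src)))
}

# Demo on a throw-away directory standing in for the real one.
src <- tempfile("RStudio-Desktop-demo-")
dir.create(src)
writeLines("dummy setting", file.path(src, "settings.txt"))

copy <- backup_dir(src, tempfile("backup-"))
list.files(copy)
```

On Windows the source would presumably be file.path(Sys.getenv("LOCALAPPDATA"), "RStudio-Desktop"), following the path quoted above; as noted in the thread, whether a directory saved from one RStudio version works with a later one remains an open question.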
[R] Multiple-Response Analysis: Cleaning of Duplicate Codes
Hi All,

in my current project I am working with multiple-response questions (MRSets):

-- Coding --
100 Main Code 1
110 Sub Code 1.1
120 Sub Code 1.2
130 Sub Code 1.3

200 Main Code 2
210 Sub Code 2.1
220 Sub Code 2.2
230 Sub Code 2.3

300 Main Code 3
310 Sub Code 3.1
320 Sub Code 3.2

The coding for the variables is too detailed. Therefore I have recoded all sub codes to the respective main code, e.g. all 110, 120 and 130 to 100, all 210, 220 and 230 to 200 and all 310, 320 and 330 to 300.

Now it happens that some respondents get the same main code several times. If respondent 1 was coded with 120 and 130, the values after recoding are 100 and 100. If I counted this, I would weight the multiple values of this respondent by a factor of 2. This is not my aim. I would like to count the 100 for the respective respondent only once.

Here is my script so far:

# -- cut --

# Assumption: select(), starts_with() and %>% come from dplyr.
# expss is loaded last so that its recode() masks dplyr::recode().
library(dplyr)
library(expss)

d_sample <-
  structure(
    list(
      c05_01 = c(110, 110, 130, 110, 110, 110, 110, 110, 110, 110, 110, 999, 110, 495, 160, 110, 410),
      c05_02 = c(NA, NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA, NA, 170, NA, 130),
      c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410, NA, NA, NA, NA, NA, NA, NA),
      c05_04 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
      c05_05 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_)
    ),
    .Names = c("c05_01", "c05_02", "c05_03", "c05_04", "c05_05"),
    row.names = c("1", "2", "3", "4", "5", "10", "11", "12", "13", "14", "15", "20", "21", "22", "23", "24", "25"),
    class = "data.frame"
  )

c05_xx_r01 <- d_sample %>%
  select(starts_with("c05_")) %>%
  recode(c(
    110 %thru% 195 ~ 100,
    210 %thru% 295 ~ 200,
    310 %thru% 395 ~ 300,
    410 %thru% 495 ~ 400,
    510 %thru% 595 ~ 500,
    810 %thru% 895 ~ 800,
    910 %thru% 999 ~ 900))

names(c05_xx_r01) <- paste0("c05_0", 1:5, "_r01")

d_sample <- cbind(d_sample, c05_xx_r01)

# -- cut --

I would like to eliminate all duplicate codes, e.g. reduce 100 and 100 for the respondents in rows 3, 6, 13, 14 and 15 to a single 100:

# -- cut --

d_sample_1 <-
  structure(
    list(
      c05_01 = c(110, 110, 130, 110, 110, 110, 110, 110, 110, 110, 110, 999, 110, 495, 160, 110, 410),
      c05_02 = c(NA, NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA, NA, 170, NA, 130),
      c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410, NA, NA, NA, NA, NA, NA, NA),
      c05_04 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
      c05_05 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
      c05_01_r01 = c(100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 900, 100, 400, 100, 100, 400),
      c05_02_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 100),
      c05_03_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 400, NA, NA, NA, NA, NA, NA, NA),
      c05_04_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
      c05_05_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
    ),
    .Names = c("c05_01", "c05_02", "c05_03", "c05_04", "c05_05", "c05_01_r01", "c05_02_r01", "c05_03_r01", "c05_04_r01
[R] Reply: Re: Multiple-Response Analysis: Cleaning of Duplicate Codes (SOLVED)
Hi Bert,

many thanks for your reply. I appreciate your help a lot.

I would like to do the operation (= finding the duplicates) row-wise. During the night a solution showed up in my dreams :) Instead of using duplicated() to flag and filter the values I could use unique() instead, with the same result. I tested:

# -- cut --

apply(X = c05_xx_r01, MARGIN = 1, unique)

# -- cut --

This finds the unique values for each row. That is nice, but it does not meet the requirement that I get back a data frame with a set of variables that is as long as the total number of unique values in the complete data.frame/matrix, or as the number of variables of the original data.frame, respectively.

The above operation returns a list instead of a data.frame because the number of resulting values varies from 1 to 7 per row.

I searched the web for a solution and found:

http://stackoverflow.com/questions/15753091/convert-mixed-length-named-list-to-data-frame

The complete solution would then look like:

# -- cut --

library(stringi)
library(tidyverse)

my_list <- apply(c05_xx_r01, MARGIN = 1, unique)
my_tibble <- as_tibble(stringi::stri_list2matrix(my_list, byrow = TRUE))
# DONE !

# -- cut --

All in all, thanks again for your help.

Kind regards

Georg

P.S.: I had a look into ?unique. The statement "unique(c05_xx_r01, MARGIN = 1)" does not do the job, because it looks for unique combinations of values across all columns. That is not the desired outcome.

From: Bert Gunter
To: g.maub...@weinwolf.de
Cc: R-help
Date: 25.04.2017 19:10
Subject: Re: [R] Multiple-Response Analysis: Cleaning of Duplicate Codes

If I understand you correctly, one way is:

> z <- rep(LETTERS[1:3],4)
> z
 [1] "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C"
> z[!duplicated(z)]
[1] "A" "B" "C"

?duplicated

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Apr 25, 2017 at 9:36 AM, wrote:
> Hi All,
>
> in my current project I am working with multiple-response questions
> (MRSets):
>
> -- Coding --
> 100 Main Code 1
> 110 Sub Code 1.1
> 120 Sub Code 1.2
> 130 Sub Code 1.3
>
> 200 Main Code 2
> 210 Sub Code 2.1
> 220 Sub Code 2.2
> 230 Sub Code 2.3
>
> 300 Main Code 3
> 310 Sub Code 3.1
> 320 Sub Code 3.2
>
> The coding for the variables is to detailed. Therefore I have recoded all
> sub codes to the respective main code, e.g. all 110, 120 and 130 to 100,
> all 210, 220 and 230 to 200 and all 310, 320 and 330 to 300.
>
> Now it happens that some respondents get several times the same main code.
> If the coding was done for respondent 1 with 120 and 130 after recoding
> the values are 100 and 100. If I count this, it would mean that I weight
> the multiple values of this respondent by factor 2. This is not my aim. I
> would like to count the 100 for the respective respondent only once.
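The row-wise idea discussed in this thread can also be sketched in base R in a way that keeps the original number of columns. This is an alternative illustration, not the solution posted above: the data frame `codes` and the helper dedup_row() are made up, and repeated codes are replaced by NA instead of being dropped.

```r
# Hypothetical recoded multiple-response set, one row per respondent.
codes <- data.frame(
  a = c(100, 100, 400),
  b = c(100, NA, 100),
  c = c(NA, NA, 400)
)

# Keep the first occurrence of each code within a row and
# blank out every later repeat.
dedup_row <- function(x) {
  x[duplicated(x)] <- NA
  x
}

# apply() returns one column per input row, hence the t().
res <- as.data.frame(t(apply(codes, 1, dedup_row)))
res
```

Because every row keeps the same length, the result converts straight back to a data frame, and each main code is counted at most once per respondent when the columns are tabulated.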
[R] Factors and Alternatives
Hi All,

I am using factors in a study for the social sciences.

I discovered the following:

-- cut --

library(dplyr)

test1 <- c(rep(1, 4), rep(0, 6))
d_test1 <- data.frame(test1)

test2 <- factor(test1)
d_test2 <- data.frame(test2)

test3 <- factor(test1,
                levels = c(0, 1),
                labels = c("WITHOUT Contact", "WITH Contact"))
d_test3 <- data.frame(test3)

d_test1 %>% filter(test1 == 0)  # works OK
d_test2 %>% filter(test2 == 0)  # works OK
d_test3 %>% filter(test3 == 0)  # does not work, why?

myf <- function(ds) {
  print(levels(ds$test3))
  print(labels(ds$test3))
  print(as.numeric(ds$test3))
  print(as.character(ds$test3))
}

# This shows that it is not possible to access the original
# values which were the basis to build the factor:
myf(d_test3)

-- cut --

Why is it not possible to use a factor with labels for filtering with the original values?
Is there a data structure that works like a factor but also gives access to the original values?

Kind regards

Georg
[R] Reply: Re: Factors and Alternatives
Hi Bob, many thanks for your reply. I have read the documentation. In my current project I use "item batteries" for dimensions of touchpoints which are rated by our customers. I wrote functions to analyse them. If I create a factor before filtering and analysing I lose the original values of the variable. If I use the original variable for filtering and analysis I might happen that for some dimensions values were not selected. This means they are not NA but none of the respondents chose "4" for instance on a scale from 1 to 6. That means that creating a factor from the analysed data with the complete scale (1:6) fails due the different vector length (amount of remaining unique values in the analysis vs values in the scale). As I have a function doing the analysis I am looking for a way to make my function robust to such circumstances and be able to use it to analyse all "item batteries". Thus my question. I believe my findings are not odd. Maybe there is a way dealing with that kind of problems in R and I am eager to learn how it can be solved using R. What would you suggest? Kind regards Georg Von:"Bob O'Hara" An: g.maub...@weinwolf.de, Kopie: r-help Datum: 09.05.2017 12:26 Betreff:Re: [R] Factors and Alternatives That's easy! First > str(test3) Factor w/ 2 levels "WITHOUT Contact",..: 2 2 2 2 1 1 1 1 1 1 tells you that the internal values are 1 and 2, and the labels are "WITHOUT Contact" and "WITH Contact". If you read the help page for factor() you'll see this: levels: an optional vector of the values (as character strings) that ‘x’ might have taken. The default is the unique set of values taken by ‘as.character(x)’, sorted into increasing order _of ‘x’_. Note that this set can be specified as smaller than ‘sort(unique(x))’. labels: _either_ an optional character vector of (unique) labels for the levels (in the same order as ‘levels’ after removing those in ‘exclude’), _or_ a character string of length 1. 
So, when you create test3 you say that test can take values 0 and 1, and these should be labelled as "WITHOUT Contact" and "WITH Contact". So R internally codes "1" as 1 and "0" as 2 (internally R codes factors as integers, which can be both useful and dangerous), and then gives them labels "WITHOUT Contact" and "WITH Contact". It now doesn't care that they were 1 and 0, because you've told it to change the labels. If you want to filter by the original values, then don't change the labels (or at least not until after you've filtered by the original labels), or convert the filter to the new labels. You're asking for a data structure with two sets of labels, which sounds odd in general. Bob On 9 May 2017 at 12:12, wrote: > Hi All, > > I am using factors in a study for the social sciences. > > I discovered the following: > > -- cut -- > > library(dplyr) > > test1 <- c(rep(1, 4), rep(0, 6)) > d_test1 <- data.frame(test) > > test2 <- factor(test1) > d_test2 <- data.frame(test2) > > test3 <- factor(test1, > levels = c(0, 1), > labels = c("WITHOUT Contact", "WITH Contact")) > d_test3 <- data.frame(test3) > > d_test1 %>% filter(test1 == 0) # works OK > d_test2 %>% filter(test2 == 0) # works OK > d_test3 %>% filter(test3 == 0) # does not work, why? > > myf <- function(ds) { > print(levels(ds$test3)) > print(labels(ds$test3)) > print(as.numeric(ds$test3)) > print(as.character(ds$test3)) > } > > # This showsthat it is not possible to access the original > # values which were the basis to build the factor: > myf(d_test3) > > -- cut -- > > Why is it not possible to use a factor with labels for filtering with the > original values? > Is there a data structure that works like a factor but gives also access > to the original values? 
> > Kind regards > > Georg > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Bob O'Hara NOTE NEW ADDRESS!!! Institutt for matematiske fag NTNU 7491 Trondheim Norway Mobile: +49 1515 888 5440 Journal of Negative Results - EEB: www.jnr-eeb.org __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
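The point made in this thread can be sketched in base R as follows. This is only an illustration under the assumption that the codes 0/1 and the two labels are exactly those from the example; the helper vector `codes` is a name introduced here, not something from the original posts.

```r
# Original 0/1 values from the example in this thread
test1 <- c(rep(1, 4), rep(0, 6))
codes <- c(0, 1)  # hypothetical helper: the values the factor is built from

test3 <- factor(test1,
                levels = codes,
                labels = c("WITHOUT Contact", "WITH Contact"))

# After labelling, compare against the label, not the original value
n_by_label <- sum(test3 == "WITHOUT Contact")  # rows that carry this label
n_by_value <- sum(test3 == 0)                  # no level is called "0"

# The original values can be recovered from the internal integer codes,
# as long as the vector used for 'levels' is kept around
original <- codes[as.integer(test3)]
```

Keeping `codes` next to the factor is one way to get "two sets of labels" without a new data structure: filter on the label, or map back through `codes` when the numeric value is needed.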
[R] Antwort: RE: Antwort: Re: Factors and Alternatives (SOLVED)
Hi David, Hi Bob, many thanks for your help. Your solution - just to use all levels instead of just the one's found in the data - helped. The original code looked like this: -- cut -- c_v10_val_labs <- c( "1 = sehr gut", "2", "3", "4", "5", "6 = sehr schlecht" ) # where c_v10_val_labs is handed over to my function as "val_labs". ds_results$value <- factor(ds_results$value, levels = sort(unique(ds_results$value)), # old code labels = sort(unique(val_labs))) -- cut -- If I write instead -- cut -- ds_results$value <- factor(ds_results$value, levels = seq_along(val_labs), # new code 1st version labels = sort(unique(val_labs))) -- cut -- Your solution builds a factor with all factor levels even if a value for factor is not present (not NA, but does just not occur in the data, i.e. not stated by any respondent). In Zumel's book "Practical Data Science with R" ( https://www.amazon.de/Practical-Data-Science-Nina-Zumel/dp/1617291560), Shelter Island: Manning, 2014, p. 23-24, Listing 2-5, a mapping using subscripts is described: -- cut -- mapping <- list( 'A40'='car (new)', 'A41'='car (used)', 'A42'='furniture/equipment', 'A43'='radio/television', 'A44'='domestic appliances', ... 
) for(i in 1:(dim(d))[2]) { if(class(d[,i])=='character') { d[,i] <- as.factor(as.character(mapping[d[,i]])) } } -- cut - Simple stated this would mean: -- cut -- val_labs <- list( "1" = "1 = sehr gut", "2" = "2", "3" = "3", "4" = "4", "5" = "5", "6" = "6 = sehr schlecht" ) set.seed(12345) answers = c(sample(1:5, 10, replace = TRUE)) test <- factor(unlist(val_labs[answers])) # or just val_labs <- c( "1 = sehr gut", "2", "3", "4", "5", "6 = sehr schlecht" ) set.seed(12345) answers = c(sample(1:5, 10, replace = TRUE)) test <- val_labs[answers] -- cut -- Adapting this to my code would give: -- cut -- ds_results$value <- factor(ds_results$value, levels = sort(unique(ds_results$value)), labels = val_labs[sort(unique(ds_results$value))]) # new code 2nd version -- cut -- This results in a factor just as long as the vector of unique resulting values. Both solutions work. Which version is best depends on the overall process and the purpose of the code. I document all this for use by readers who refer later to the list archives. Using your version and running my code reveals that ggplot runs into difficulties cause the legend lacks values and the sequence and coloring of the legend is wrong. But that's another story. Many thanks again for your help. Kind regards Georg Von:David L Carlson An: "g.maub...@weinwolf.de" , "Bob O'Hara" , Kopie: r-help Datum: 09.05.2017 14:37 Betreff:RE: [R] Antwort: Re: Factors and Alternatives I'm not sure I understand your question, but you can easily include all possible answers when you create the factor by using the levels= argument as Bob pointed out. Here is an example of values that range from 1 to 6, but value 3 is not represented. 
Notice that a factor level 3 is created even though it does not appear in the data: > set.seed(42) > x <- sample.int(6, 10, replace=TRUE) > table(x) x 1 2 4 5 6 1 1 3 3 2 > y <- factor(x, levels=1:6) > y [1] 6 6 2 5 4 4 5 1 4 5 Levels: 1 2 3 4 5 6 - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 Von:"Bob O'Hara" An: g.maub...@weinwolf.de, Kopie: r-help Datum: 09.05.2017 13:58 Betreff:Re: Re: [R] Factors and Alternatives For the problem you state, would it be enough to explicitly define your levels? fac <- rep(c("a", "b", "d"), each=4) fac.f <- factor(fac, levels=c("a", "b", "c", "d")) table(fac.f) # but be warned... fac.f2 <- factor(fac.f) table(fac.f2) This has the advantage that the code explicitly documents what the possible values are, so if something goes wrong down-stream, you know it is a real problem (well, unless you have some type conversions screwing things up). You might also want to do some defensive programming, and put some checks in the code, to make sure your factors have the right number of levels. Bob -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of g.maub...@weinwolf.de Sent: Tuesday, May 9, 2017 6:37 AM To: Bob O'Hara Cc: r-help Subject: [R] Antwort: Re: Factors and Alternatives Hi Bob, many thanks for your reply. I have read the documentation. In my current project I use "item batteries" for dimensions of touchpoints which are rated by our customers. I wrote functions to analyse them. If I create a factor before filtering and analysing I lose the original values of the variable. If I use the original variable for filtering and analysis I might happen that for some dimensions values were not selected. T
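A small sketch of the "use all levels" approach discussed above, in base R. The `answers` vector is made up for illustration (nobody chose 4). Note that the label vector is passed as it is: `sort(unique(val_labs))` happens to keep the order for these six labels, but sorting character labels is lexical in general, so it seems safer not to rely on it.

```r
val_labs <- c("1 = sehr gut", "2", "3", "4", "5", "6 = sehr schlecht")

# Hypothetical responses on the 1..6 scale; the value 4 never occurs
answers <- c(1, 2, 2, 3, 5, 6, 6, 1)

# Levels come from the scale, not from the data, so every category exists
value <- factor(answers,
                levels = seq_along(val_labs),
                labels = val_labs)

counts <- table(value)   # category "4" is listed with a count of 0

# Defensive check, as suggested in the thread
stopifnot(nlevels(value) == length(val_labs))

# Re-creating the factor (or calling droplevels) silently drops empty levels
n_after_drop <- nlevels(droplevels(value))
```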
[R] Off-Topic: Project Organisation
Hi All, this post is somewhat off-topic because it deals with a meta issue, project organisation, rather than actual R code. I have updated my blog with a possible directory and file structure for marketing research and data mining projects: https://github.com/gmaubach/R-Know-How/wiki/R-Blog There I have condensed best practices already described in articles, books, packages and guidelines into a new universal structure. It is meant to serve as a template and construction kit from which you can build a structure that suits your project best. Comments and suggestions are welcome. Kind regards Georg
[R] ggplot: Pie Chart with correct labels
Hi All, I would like to do the following pie chart using ggplot from an official data source ( http://www.deutscheweine.de/fileadmin/user_upload/Website/Service/Downloads/Statistik_2016-2017-neu.pdf , Tab 8, Page 14): -- cut -- cat("# weinimport_piechart.R\n") # -- Input d_wine_import_DE <- structure(list(Land = structure(1:24, .Label = c("Italien", "Frankreich", "Spanien", "USA", "Südafrika", "Chile", "Österreich", "Australien", "Portugal", "Griechenland", "Argentinien", "Neuseeland", "Ungarn", "Mazedonien", "Schweiz", "Dänemark", "Moldawien", "Türkei", "Belgien/Luxemburg", "Rumänien", "Ukraine", "Kroatien", "Israel", "Georgien"), class = "factor"), Menge_hl_2015 = c(5481000, 2248000, 3824000, 493000, 845000, 539000, 308000, 446000, 153000, 99000, 64000, 43000, 123000, 186000, 5000, 9000, 28000, 7000, 1, 15000, 4000, 4000, 2000, 2000)), .Names = c("Land", "Menge_hl_2015"), class = "data.frame", row.names = c(NA, -24L)) names(d_wine_import_DE) # -- Data - d_result <- data.frame( country = d_wine_import_DE$Land, abs = d_wine_import_DE$Menge_hl_2015) %>% mutate(rel = round(abs / sum(abs) * 100, 1)) %>% dplyr::arrange(desc(abs)) %>% dplyr::mutate(rel_labs = paste(rel, "%")) %>% # rev() does not work dplyr::mutate(breaks = cumsum(abs) - (abs / 2)) # rev() does not work # -- Plot - d_result %>% ggplot() + geom_bar( aes(x = "", y = abs, fill = country), stat = "identity") + # %SOURCE% # coord_polar(): Wickham: ggplot2, Springer, 2nd Ed., p. 166 coord_polar(theta = "y", start = 0) + guides( fill = guide_legend( title = "Länder", reverse = FALSE) ) + scale_y_continuous( breaks = d_result$breaks, # simply "breaks" does not work labels = d_result$rel_labs, # simply "breaks" does not work trans = "reverse" ) + # %SOURCE% # Kassambra: Guide to Create Beautiful Graphics # in R, sthda.com, 2nd Ed., 2013, p. 
136ff theme_minimal() + theme( panel.border = element_blank(), panel.grid = element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank() # axis.text.x = element_text(size = 15) ) + labs( title = paste0("Weinimport nach Deutschland 2015")) -- cut -- I can not figure out how to align the labels (values in %) with the reverse printed countries. Also the breaks and labels do need the dataset name although I thought "breaks" and "rel_labs" is sufficient due to the piping operator. Can you help me by telling how to 1. get the order of the labels right 2. Why I need to reference "breaks" and "labels" completely? Kind regards Georg __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
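On the second question, a hedged sketch: the pipe hands `d_result` to `ggplot()` as its `data` argument, and only expressions inside `aes()` are looked up in that data. The arguments of `scale_y_continuous()` are ordinary R expressions evaluated where the call is written, so a bare `breaks` is not found as a column. The toy data frame `d` and the names `pie_breaks` and `pie_labels` below are made up for illustration. Whether the midpoints line up with the slices depends on the stacking order of `fill`, which this sketch does not try to settle.

```r
# Toy data standing in for d_result (values are hypothetical)
d <- data.frame(country = c("A", "B", "C"), abs = c(50, 30, 20))
d$rel <- round(d$abs / sum(d$abs) * 100, 1)

# Compute breaks and labels up front, as plain vectors
pie_breaks <- cumsum(d$abs) - d$abs / 2
pie_labels <- paste(d$rel, "%")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d) +
    geom_bar(aes(x = "", y = abs, fill = country), stat = "identity") +
    coord_polar(theta = "y", start = 0) +
    # plain vectors are visible here; column names of 'd' are not
    scale_y_continuous(breaks = pie_breaks, labels = pie_labels)
}
```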
[R] purrr::pmap does not work
Hi All, I try to do a scatterplot for a bunch of variables. I plot a dependent variable against a bunch of independent variables: -- cut -- graphics::plot( v01_r01 ~ v08_01_up11, data = dataset, xlab = "Dependent", ylab = "Independent #1" ) -- cut -- It is tedious to repeat the statement for all independent variables. Found an alternative, i.e. : -- cut -- mu <- list(5, 10, -3) sigma <- list(1, 5, 10) n <- list(1, 3, 5) fargs <- list(mean = mu, sd = sigma, n = n) fargs %>% purrr::pmap(rnorm) %>% str() -- cut -- I tried to use this for may scatterplot task: -- cut -- var_battery$v08 <- paste0("v08_", formatC(1:8, width = 2, format = "d", flag = "0")) v08_var_labs <- paste0("Label_", 1:8) dataset <- as.data.frame( matrix( data = sample( x = 1:11, size = 90, replace = TRUE), nrow = 10, ncol = 9)) names(dataset) <- c("v01_r01", var_battery$v08) independent <- as.list(dataset$v01_r01) dependent <- as.list(dataset[var_battery$v08]) fargs <- list( x = independent, y = dependent, ylab = v08_var_labs) fargs %>% purrr::pmap( function(d = dataset, xvalue = x, yvalue = y, xlab = "Label for x variable", ylab = ylab) { graphics::plot( xvalue ~ yvalue, data = d, xlab = xlab, ylab = ylab) } ) -- cut -- The last statement comes back with Error: Element 2 has length 8, not 1 or 10. How can I get it up n running? Do you suggest a better solution for the task described? Kind regards Georg __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
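One possible way around the error, sketched in base R with `Map()` instead of `purrr::pmap()`. In the code above, `x` is a list of 10 single values while `y` and `ylab` have 8 elements, and the argument names of the anonymous function do not match the names in `fargs`, which is likely what `pmap()` complains about. The sketch iterates over variable names and labels (both of length 8) and builds each formula with `reformulate()`. The function name `plot_one` and the null PDF device are assumptions made for this example.

```r
set.seed(1)
predictors <- paste0("v08_", formatC(1:8, width = 2, format = "d", flag = "0"))
var_labs   <- paste0("Label_", 1:8)

dataset <- as.data.frame(
  matrix(sample(1:11, size = 90, replace = TRUE), nrow = 10, ncol = 9))
names(dataset) <- c("v01_r01", predictors)

# Hypothetical helper: one scatterplot of the response against one predictor
plot_one <- function(var, lab, data, response = "v01_r01") {
  f <- stats::reformulate(var, response = response)  # e.g. v01_r01 ~ v08_01
  graphics::plot(f, data = data, xlab = lab, ylab = response)
  f  # return the formula so the result can be inspected
}

grDevices::pdf(NULL)  # null device for this sketch; drop it to see the plots
formulas <- Map(plot_one, predictors, var_labs,
                MoreArgs = list(data = dataset))
invisible(grDevices::dev.off())
```

Iterating over names rather than over columns keeps the two vectors the same length and lets the axis label travel with the variable.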
[R] Paths in knitr
Hi All, I have to compile a report for the management and decided to use RMarkdown and knitr. I compiled all needed plots (using separate R scripts) before compiling the report, thus all plots reside in my graphics directory. The RMarkdown report needs to access these files. I have defined ```{r setup, include = FALSE} knitr::opts_knit$set( echo = FALSE, xtable.type = "html", base.dir = "H:/2017/Analysen/Kundenzufriedenheit/Auswertung", root_dir = "H:/2017/Analysen/Kundenzufriedenheit/Auswertung", fig.path = "results/graphics") # relative path required, see http://yihui.name/knitr/options ``` and then referenced my plot using because I want to be able to customize the plotting attributes. But that fails with the message "pandoc.exe: Could not fetch email_distribution_pie.png". If I give it the absolute path "H:/2017/Analysen/Kundenzufriedenheit/Auswertung/results/graphics/email_distribution_pie.png" it works fine as well if I copy the plot into the directory where the report.RMD file resides. How can I tell knitr to fetch the ready-made plots from the graphics directory? Kind regards Georg __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
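A hedged sketch of one way to reference ready-made plots. `echo` and `fig.path` are chunk options, so they belong in `knitr::opts_chunk$set()` rather than `opts_knit$set()`, and `fig.path` is the prefix for figures that chunks generate, not a search path for existing image files. The sketch assumes that the graphics directory is reachable as `results/graphics` relative to the Rmd file; if it is not, the absolute path would go into `graphics_dir` instead.

```r
# Assumption: the plots live in results/graphics relative to the Rmd file
graphics_dir <- file.path("results", "graphics")
plot_file    <- file.path(graphics_dir, "email_distribution_pie.png")

if (requireNamespace("knitr", quietly = TRUE)) {
  # chunk options are set via opts_chunk, not opts_knit
  knitr::opts_chunk$set(echo = FALSE)

  # embed the existing image only if it is really there
  if (file.exists(plot_file)) {
    knitr::include_graphics(plot_file)
  }
}
```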
Re: [R] Paths in knitr
Hi Yihui, Hi Duncan,

I corrected my typo. Unfortunately knitr still did not find my plots in the directory where they reside, which is different from the directory of the Rmd document.

The documentation of knitr says:

base.dir: (NULL) an absolute directory under which the plots are generated
root.dir: (NULL) the root directory when evaluating code chunks; if NULL, the directory of the input document will be used

From that description I thought that if base.dir can be used for writing plots, it would also be used for reading plots if set. No, it is not. If I set the root directory to the plots/graphics directory, will knitr then find my plots? No, it does not.

Judging from blog posts, my thoughts did not look so strange to me, e.g. https://philmikejones.wordpress.com/2015/05/20/set-root-directory-knitr/. Unfortunately, it does not work for me.

I am using an RStudio project file. Could it be that this interferes with the knitr options?

I tried the solution that Duncan suggested:

c_path_plots <- "H:/2017/Analysen/Kundenzufriedenheit/Auswertung/results/graphics"

`r knitr::include_graphics(file.path(c_path_plots, "email_distribution_pie.png"))`

This solution works fine. I will go with it for this project as I have to finish my report soon.

I read Hadley's book on building R packages (https://www.amazon.de/R-Packages-Hadley-Wickham/dp/1491910593) and found it quite complicated and time-consuming to build one. Thus I have not yet tried to build my own packages. At the end of last week I heard of another package (http://reaktanz.de/R/pckg/roxyPackage/) which is said to make building packages much easier. I plan to try that shortly.

On my path to becoming better at analytics using R, I will try to use modular Rmd files which can then easily be integrated into an Rmd report. I have yet to see how I can include these files in a complete report.
Kind regards Georg - Weitergeleitet von Georg Maubach/WWBO/WW/HAW am 12.06.2017 08:47 - Von:Yihui Xie An: g.maub...@gmx.de, Kopie: R Help Datum: 09.06.2017 20:53 Betreff:Re: [R] Paths in knitr Gesendet von: "R-help" I'd say it is an expert-only option. If you do not understand what it means, I strongly recommend you not to set it. Similarly, you set the root_dir option and I don't know why you did it, but it is a typo anyway (should be root.dir). Regards, Yihui -- https://yihui.name On Fri, Jun 9, 2017 at 4:50 AM, wrote: > Hi Yi, > > many thanks for your reply. > > Why I do have to se the base.dir option? Cause, to me it is not clear from > the documentation, where knitr looks for data files and how I can adjust > knitr to tell it where to look. base.dir was a try, but did not work. > > Can you give me a hint where I can find information/documentation on this > path issue? > > Kind regards > > Georg > > > > Gesendet: Donnerstag, 08. Juni 2017 um 15:05 Uhr > > Von: "Yihui Xie" > > An: g.maub...@weinwolf.de > > Cc: "R Help" > > Betreff: Re: [R] Paths in knitr > > > > Why do you have to set the base.dir option? > > > > Regards, > > Yihui > > -- > > https://yihui.name > > > > > > On Thu, Jun 8, 2017 at 6:15 AM, wrote: > > > Hi All, > > > > > > I have to compile a report for the management and decided to use > RMarkdown > > > and knitr. I compiled all needed plots (using separate R scripts) > before > > > compiling the report, thus all plots reside in my graphics directory. > The > > > RMarkdown report needs to access these files. 
I have defined > > > > > > ```{r setup, include = FALSE} > > > knitr::opts_knit$set( > > > echo = FALSE, > > > xtable.type = "html", > > > base.dir = "H:/2017/Analysen/Kundenzufriedenheit/Auswertung", > > > root_dir = "H:/2017/Analysen/Kundenzufriedenheit/Auswertung", > > > fig.path = "results/graphics") # relative path required, see > > > http://yihui.name/knitr/options > > > ``` > > > > > > and then referenced my plot using > > > > > > > > > > > > because I want to be able to customize the plotting attributes. > > > > > > But that fails with the message "pandoc.exe: Could not fetch > > > email_distribution_pie.png". > > > > > > If I give it the absolute path > > > "H:/2017/Analysen/Kundenzufriedenheit/Auswertung/results/ > graphics/email_distribution_pie.png" > > > it works fine as well if I copy the plot into the directory where the > > > report.RMD file resides. > > > > > > How can I tell knitr to fetch the ready-made plots from the graphics > > > directory? > > > > > > Kind regards > > > > > > Georg > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/li
[R] Antwort: Re: Re: Paths in knitr
Hi Yihui,

I took root.dir and base.dir out. Everything still works fine after the change. I have implemented the solution Duncan suggested.

I have difficulties with the scaling / image size in my report. Some plots are too big, some are too small. I need to adjust every plot individually. Steep learning curve :)

Kind regards

Georg

From: Yihui Xie
To: g.maub...@weinwolf.de,
Cc: Duncan Murdoch , R Help
Date: 12.06.2017 18:29
Subject: Re: Re: [R] Paths in knitr
Sent by: xieyi...@gmail.com

Will there be anything wrong if you do not set these options?

Regards,
Yihui
--
https://yihui.name
[R] WG: Fw: Re: rmarkdown and font size
Hi Dan, Hi All,

I read the post below. I am wondering how I can know which "keys" are available, e.g. "code.r" and "pre". Where can I find the definition of what can be adjusted and which "words" to use?

Kind regards

Georg

> Sent: Thursday, 8 June 2017, 16:16
> From: "Nordlund, Dan (DSHS/RDA)"
> To: "MacQueen, Don" , "r-help@r-project.org"
> Subject: Re: [R] rmarkdown and font size
>
> You can change the style, modifying a variety of things. E.g.,
>
> ---
> title: Test
> ---
>
>
>
> body{ /* Normal */
>   font-size: 12px;
> }
> td { /* Table */
>   font-size: 8px;
> }
> h1.title {
>   font-size: 38px;
>   color: DarkRed;
> }
> h1 { /* Header 1 */
>   font-size: 28px;
>   color: DarkBlue;
> }
> h2 { /* Header 2 */
>   font-size: 22px;
>   color: DarkBlue;
> }
> h3 { /* Header 3 */
>   font-size: 18px;
>   font-family: "Times New Roman", Times, serif;
>   color: DarkBlue;
> }
> code.r{ /* Code block */
>   font-size: 12px;
> }
> pre { /* Code block - determines code spacing between lines */
>   font-size: 14px;
> }
>
>
> Here is some normal text. It is a 12-point font. The table is in 8-point.
> > ```{r example, echo=FALSE, results='asis'} > tmp <- data.frame(a=1:5, b=letters[1:5]) > print( knitr::kable(tmp, row.names=FALSE)) > ``` > > > Hope this is helpful, > > Dan > > Daniel Nordlund, PhD > Research and Data Analysis Division > Services & Enterprise Support Administration > Washington State Department of Social and Health Services > > > > -Original Message- > > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of > > MacQueen, Don > > Sent: Wednesday, June 07, 2017 4:58 PM > > To: r-help@r-project.org > > Subject: [R] rmarkdown and font size > > > > Suppose I have a file (named "tmp.rmd") containing: > > > > > > --- > > title: Test > > --- > > > > ```{r example, echo=FALSE, results='asis'} > > tmp <- data.frame(a=1:5, b=letters[1:5]) > > print( knitr::kable(tmp, row.names=FALSE)) > > ``` > > > > > > > > And I render it with: > > > > rmarkdown::render('tmp.rmd', > > output_format=c('html_document','pdf_document')) > > > > I get two files: > > tmp.pdf > > tmp.html > > > > Is there a way to control (change or specify) the font size of the table in the > > pdf output? > > (or of the entire document, if it can't be changed for just the table) > > > > With my actual data, the table is too wide to fit on a page in the pdf output; > > perhaps if I reduce the font size I can get it to fit. > > > > I would like the html version to still look decent, but I don't care very much > > what happens to its font size. > > > > Thanks! > > -Don > > > > -- > > Don MacQueen > > > > Lawrence Livermore National Laboratory > > 7000 East Ave., L-627 > > Livermore, CA 94550 > > 925-423-1062 > > > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting- > > guide.html > > and provide commented, minimal, self-contained, reproducible code. 
> > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Filtering String Variables
# Hi All, # # I have the following data frame (example): Debitor <- c("968691", "968691", "968691", "A04046", "A04046", "L0006", "L0006", "L0006", "L0023", "L0023", "L0056", "L0056", "L0094", "L0094", "L0094", "L0124", "L0124", "L0143", "L0170", "13459", "473908", "394704", "4711", "4712", "4713") Debitor <- as.character(Debitor) var1 <- c(11, 12, 13, 14, 14, 12, 13, 14, 10, 11, 12, 12, 12, 12, 12, 15, 17, 11, 14, 12, 17, 13, 15, 16, 11) ds_example <- data.frame(Debitor, var1) ds_example$case_id <- 1:nrow(ds_example) ds_example <- ds_example[, sort(colnames(ds_example))] ds_example # I would like to generate a data frame that contains the duplicates AND the # corresponding non-duplicates to the duplicates. # For example, finding the duplicates with deliver case 2 and 3 but the list # should also contain case 1 because case 1 is the corresponding case to the # duplicate cases 2 and 3. # For the whole example dataset that would be: needed <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) needed <- as.logical(needed) ds_example <- data.frame(ds_example, needed) ds_example # To find the duplicates and the corresponding non-duplicates duplicates <- duplicated(ds_example$Debitor) list_of_duplicated_debitors <- as.character(ds_example[duplicates, "Debitor"]) filter_variable <- unique(list_of_duplicated_debitors) ds_duplicates <- ds_example["Debitor" == filter_variable] # Result: dataset with 0 columns ds_duplicates <- ds_example["Debitor"] %in% filter_variable # Result: FALSE # How can I create a dataset like this ds_example <- ds_example[needed, ] ds_example # using the Debitor IDs? Kind regards Georg Maubach __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] WG: Filtering String Variables (SOLVED)
Hi All, the solution for my question is as follows ## Filter duplicates and correpsonding non-duplicates ### To filter duplicates and their corresponding non-duplicates use the ### following code snippet: Debitor <- c("968691", "968691", "968691", "A04046", "A04046", "L0006", "L0006", "L0006", "L0023", "L0023", "L0056", "L0056", "L0094", "L0094", "L0094", "L0124", "L0124", "L0143", "L0170", "13459", "473908", "394704", "4711", "4712", "4713") Debitor <- as.character(Debitor) var1 <- c(11, 12, 13, 14, 14, 12, 13, 14, 10, 11, 12, 12, 12, 12, 12, 15, 17, 11, 14, 12, 17, 13, 15, 16, 11) ds_example <- data.frame(Debitor, var1) ds_example$case_id <- 1:nrow(ds_example) ds_example <- ds_example[, sort(colnames(ds_example))] ds_example # This task is to generate a data frame that contains the duplicates AND the # corresponding non-duplicates to the duplicates. # For example, finding the duplicates will deliver case 2 and 3 but the list # should also contain case 1 because case 1 is the corresponding case to the # duplicate cases 2 and 3. # For the whole example dataset that would be: needed <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) needed <- as.logical(needed) ds_example <- data.frame(ds_example, needed) ds_example # To find the duplicates and the corresponding non-duplicates duplicates <- duplicated(ds_example$Debitor) list_of_duplicated_debitors <- as.character(ds_example[duplicates, "Debitor"]) filter_variable <- unique(list_of_duplicated_debitors) ### Wrong code. Do not run. 
### ds_duplicates <- ds_example["Debitor" == filter_variable]  # Result: dataset with 0 columns
### duplicates_and_correponding_non_duplicates <- ds_example["Debitor"] %in% filter_variable  # Result: FALSE

duplicates_and_correponding_non_duplicates <- ds_example$Debitor %in% filter_variable  # Result: OK
duplicates_and_correponding_non_duplicates <- ds_example[, "Debitor"] %in% filter_variable  # Result: OK

### Create the dataset with duplicates and corresponding non-duplicates
ds_example <- ds_example[duplicates_and_correponding_non_duplicates, ]
ds_example

It was a simple mistake when subscripting.

Kind regards

Georg Maubach

- Forwarded by Georg Maubach/WWBO/WW/HAW on 23.05.2016 15:54 -
From: Georg Maubach/WWBO/WW/HAW
To: r-help@r-project.org,
Date: 23.05.2016 15:28
Subject: Filtering String Variables
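A compact variant of the solution, sketched in base R on a shortened, made-up set of IDs. `duplicated()` marks only the second and later occurrences, so combining it with a pass from the other end (`fromLast = TRUE`) flags every row whose ID occurs more than once. The failed attempts compared the literal string "Debitor" with the IDs, or tested a one-column data frame with `%in%`, instead of testing the column vector itself.

```r
# Shortened, hypothetical example: two IDs occur more than once
Debitor <- c("968691", "968691", "A04046", "L0143", "A04046", "4711")
ds <- data.frame(Debitor,
                 case_id = seq_along(Debitor),
                 stringsAsFactors = FALSE)

# TRUE for every row whose Debitor occurs at least twice,
# including the first occurrence
in_group <- duplicated(ds$Debitor) | duplicated(ds$Debitor, fromLast = TRUE)

ds_groups <- ds[in_group, ]
```

This gives the same rows as `ds$Debitor %in% ds$Debitor[duplicated(ds$Debitor)]`, without the intermediate `filter_variable`.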
[R] Creating a data frame from scratch
Hi All, I need to create a data frame from scratch and fill variables created on the fly with values. What I have so far: -- schnipp -- # Example dataset gene <- c("ENSG0208234","ENSG0199674","ENSG0221622","ENSG0207604", "ENSG0207431","ENSG0221312","ENSG00134940305","ENSG00394039490", "ENSG09943004048") hsap <- c(0,0,0, 0, 0, 0, 1,1, 1) mmul <- c(NA,2 ,3, NA, 2, 1 , NA,2, NA) mmus <- c(NA,2 ,NA, NA, NA, 2 , NA,3, 1) rnor <- c(NA,2 ,NA, 1 , NA, 3 , NA,NA, 2) cfam <- c(NA,2,NA, 2, 1, 2, 2,NA, NA) ds_example <- data.frame(gene, hsap, mmul, mmus, rnor, cfam) ds_example$gene <- as.character(ds_example$gene) t_count_na <- function(dataset, variables = "all") # credit: http://stackoverflow.com/questions/4862178/remove-rows-with-nas-in-data-frame { ds_na <- data.frame() # if variables = "all" create character vector of variable names if (variables == "all") { variable_list <- dimnames(dataset)[[ 2 ]] } # if a character vector with variable names is given # to run the function on a defined set of selected variables else { variable_list <- variables } for (var in variable_list) { new_name <- paste0("na_", var) ds_na[[ new_name ]] <- as.data.frame(is.na(dataset[[ var ]])) } ds_na[[ "na_count" ]] <- rowSums(ds_na) return(ds_na) } test <- t_count_na(dataset = ds_example, variables = c("mmul", "mmus")) -- schnipp -- gives: Error in `[[<-.data.frame`(`*tmp*`, new_name, value = list(`is.na(dataset[[var]])` = c(TRUE, : replacement has 9 rows, data has 0 In addition: Warning message: In if (variables == "all") { : the condition has length > 1 and only the first element will be used My goal is to create a dataset from scratch on the fly which has the same amount of variables as the dataset ds_example plus a single variable storing the amount of NA's in a row for the given variables. This is the basis for a decious which cases to keep and which to drop. 
I do not want to alter the base dataset (ds_example) in the first place, nor do I want to make a copy of the existing dataset, because of memory allocation. The function should also work with big data, e.g. datasets that take up more than 1 GB of memory. I also do not want the newly created variables to be stored in the original data frame; they should be kept separate.

A similar earlier solution worked: http://r.789695.n4.nabble.com/Creating-variables-on-the-fly-td4720034.html

Why doesn't this one? How do I create the variables within the data frame if the data frame is empty?

Kind regards

Georg Maubach
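A note on the two messages reported in this post, with a sketch. `data.frame()` creates a frame with zero rows, so assigning a 9-row column with `[[<-` fails; and `variables == "all"` is a vector comparison, which `if()` only warns about in older R versions and rejects as an error from R 4.2.0 on. One possible way around both, assuming a small made-up `ds_demo` in place of `ds_example`, is to start from an empty frame that already has the right number of rows:

```r
# Hypothetical miniature of ds_example
ds_demo <- data.frame(mmul = c(NA, 2, 3), mmus = c(NA, 2, NA))
variables <- c("mmul", "mmus")

# identical() compares the whole object and always returns one TRUE/FALSE
if (identical(variables, "all")) {
  variables <- names(ds_demo)
}

# Zero columns, but as many rows as the source data
ds_na <- data.frame(row.names = seq_len(nrow(ds_demo)))

for (var in variables) {
  ds_na[[paste0("na_", var)]] <- is.na(ds_demo[[var]])
}

# Count the TRUE values per row; computed before the column is added
ds_na[["na_count"]] <- rowSums(ds_na)
ds_na
```

The indicator columns live in `ds_na` only, so the source data frame is not modified.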
[R] Reply: Re: Creating a data frame from scratch (SOLVED)
Hi Dan, Hi All,

many thanks for your help. Please find enclosed my little function for your use:

-- cut --

#---
# Module        : t_count_na.R
# Author        : Georg Maubach
# Date          : 2016-05-24
# Update        : 2016-05-25
# Description   : Count NA's
# Source System : R 3.2.2 (64 Bit)
# Target System : R 3.2.2 (64 Bit)
# License       : CC-BY-SA-NC
#---

# Switch t_do_test to TRUE to run the test
t_do_test <- FALSE

t_count_na <- function(dataset, variables = "all") {
  # Counts the number of NA within a given set of variables
  #
  # Args:
  #   dataset  : Object with dimnames, e.g. data frame, data table.
  #   variables: Character vector with variable names.
  #
  # Operation:
  #   Adds the variable "na_count" to the given dataset containing the count
  #   of NA's within the given variables
  #
  # Returns:
  #   Original dataset with variable "na_count" added.
  #
  # Error handling:
  #   None.
  #
  # Credits:
  #   http://stackoverflow.com/questions/4862178/remove-rows-with-nas-in-data-frame
  #   http://r.789695.n4.nabble.com/Creating-variables-on-the-fly-td4720034.html

  version <- "2016-05-25"

  if (identical(variables, "all")) {
    variable_list <- names(dataset)
  } else {
    variable_list <- variables
  }

  # drop = FALSE keeps a data frame even if only one variable is selected
  dataset[["na_count"]] <- apply(dataset[, variable_list, drop = FALSE],
                                 1,
                                 function(x) sum(is.na(x)))

  return(dataset)
}

#---

t_test <- function(do_test = FALSE) {
  if (isTRUE(do_test)) {
    cat("\n", "\n", "Test function t_count_na()", "\n", "\n")

    # Example dataset
    gene <- c("ENSG0208234", "ENSG0199674", "ENSG0221622", "ENSG0207604",
              "ENSG0207431", "ENSG0221312", "ENSG00134940305", "ENSG00394039490",
              "ENSG09943004048")
    hsap <- c(0, 0, 0, 0, 0, 0, 1, 1, 1)
    mmul <- c(NA, 2, 3, NA, 2, 1, NA, 2, NA)
    mmus <- c(NA, 2, NA, NA, NA, 2, NA, 3, 1)
    rnor <- c(NA, 2, NA, 1, NA, 3, NA, NA, 2)
    cfam <- c(NA, 2, NA, 2, 1, 2, 2, NA, NA)

    ds_example <- data.frame(gene, hsap, mmul, mmus, rnor, cfam)
    ds_example$gene <- as.character(ds_example$gene)

    cat("\n", "\n", "Example dataset before function call", "\n", "\n")
    print(ds_example)

    cat("\n", "\n", "Function call", "\n", "\n")
    ds_example <- t_count_na(dataset = ds_example,
                             variables = c("mmul", "mmus"))

    cat("\n", "\n", "Example dataset after function call", "\n", "\n")
    print(ds_example)
  }
}

t_test(do_test = t_do_test)

# EOF

-- cut --

Kind regards

Georg Maubach

From: "Nordlund, Dan (DSHS/RDA)"
To: "r-help@r-project.org"
Date: 24.05.2016 21:41
Subject: Re: [R] Creating a data frame from scratch
Sent by: "R-help"

I would probably write the function something like this:

t_count_na <- function(dataset, variables = "all") {
  if (identical(variables, "all")) {
    variable_list <- names(dataset)
  } else {
    variable_list <- variables
  }
  apply(dataset[,variable_list], 1, function(x) sum(is.na(x)))
}

Hope this is helpful,

Dan

Daniel Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services
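For comparison, the NA count discussed in this thread can also be computed without `apply()`. The sketch below is only an illustration: the helper name `count_na` and the sample data are assumptions, not part of the posted module. It relies on `is.na()` returning a logical matrix for a data frame and on `rowSums()` adding up the `TRUE` values per row:

```r
# Hypothetical helper: returns the NA count per row as a separate vector
count_na <- function(dataset, variables = names(dataset)) {
  # dataset[variables] stays a data frame even for a single variable
  rowSums(is.na(dataset[variables]))
}

ds_demo <- data.frame(
  mmul = c(NA, 2, 3, NA),
  mmus = c(NA, 2, NA, NA),
  rnor = c(1, 2, NA, 1)
)

count_na(ds_demo, c("mmul", "mmus"))  # two selected variables
count_na(ds_demo, "mmul")             # works for one variable as well
count_na(ds_demo)                     # all variables
```

Because the result is a plain vector, the source data frame does not have to be altered or copied; the caller decides where to store the count.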
[R] Difference subsetting (dataset$variable vs. dataset["variable"])
Hi All,

I thought dataset$variable is the same as dataset["variable"]. I tried the following:

> str(ZWW_Kunden$Branche)
 chr [1:49673] "231" "151" "151" "231" "231" "111" "231" "111" "231" "231" "151" "111" ...

> str(ZWW_Kunden["Branche"])
'data.frame':   49673 obs. of  1 variable:
 $ Branche: chr  "231" "151" "151" "231" ...

and get different results: "chr [1:49673]" vs. "data.frame". The first one is a simple vector, the second one is a data.frame.

This has consequences when subsetting a dataset and filtering cases:

> ZWW_Kunden["Branche"] %in% c("315", "316", "317")
[1] FALSE

> head(ZWW_Kunden$Branche %in% c("315", "316", "317"))  # head() only to shorten output
[1] FALSE FALSE FALSE FALSE FALSE FALSE

I thought dataset$variable was the same as dataset["variable"], but actually it is not. Can you explain what the difference is?

Kind regards

Georg
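A short illustration of the difference asked about here, using a made-up three-row data frame (`ds` and its values are assumptions for demonstration). In base R, `$` and `[[` extract the column itself, while single brackets return a data frame that contains the column:

```r
ds <- data.frame(Branche = c("231", "151", "315"), stringsAsFactors = FALSE)
codes <- c("315", "316", "317")

class(ds$Branche)       # "character": the column vector
class(ds[["Branche"]])  # "character": same object as ds$Branche
class(ds["Branche"])    # "data.frame": a one-column data frame

# Element-wise test on the vector: one result per row
per_row <- ds$Branche %in% codes

# On the one-column data frame the test sees a list with one element,
# so there is one result for the whole column, not one per row
per_column <- ds["Branche"] %in% codes

# Row filter built from the vector
ds[per_row, , drop = FALSE]
```

For filtering cases, the vector form (`ds$Branche` or `ds[["Branche"]]`) is the one that yields a usable logical index.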
[R] Variable labels and value labels
Hi All,

I am using R for social sciences. In this field I am used to short variable names like "q1" for question 1, "q2" for question 2 and so on, and to labelling the variables, e.g. q1: "Please tell us your age" or q2: "Could you state us your household income?", or something similar indicating which question is stored in the variable.

Similarly, I am used to labelling values, e.g. 1: "Less than 18 years", 2: "18 to 30 years", 3: "31 to 60 years" and 4: "61 years and more".

I know that the packages Hmisc and memisc have functionality for this, but these labelling functions are limited to the packages they were defined for. Using the question texts as variable names is possible but very inconvenient.

Is there another way of labelling variables and values in R?

Kind regards

Georg Maubach
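One package-independent possibility for the variable labels, sketched here under assumptions: base R lets any object carry arbitrary attributes, so a question text can be attached to a column with `attr()`. The attribute name "label" and the data frame `survey` are choices made for this illustration only.

```r
# Hypothetical survey data with short variable names
survey <- data.frame(q1 = c(1, 3, 2, 4), q2 = c(2, 2, 1, 3))

# Attach the question text to each column
attr(survey$q1, "label") <- "Please tell us your age"
attr(survey$q2, "label") <- "Could you state us your household income?"

# Look up a single label
attr(survey$q1, "label")

# Collect all labels into a named character vector
var_labels <- vapply(survey, function(x) attr(x, "label"), character(1))
var_labels
```

A caveat on the design: such attributes are plain metadata, and many operations that create a new vector (for example subsetting a column with `[`) return the result without them, so it can be safer to keep the labels in a separate lookup table as well.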
[R] Utility Functions
Hi All,

I was new to R and to this list a couple of months ago. While processing my data I got tremendous support from the R-help mailing list. The solutions I have worked out with your help might also be helpful for others, so I have put them into a couple of small functions with documentation and tests. You can find the software on Sourceforge.net at

https://sourceforge.net/projects/r-project-utilities/files/?source=navbar

You should download at least "r_toolbox.R" and store it in a directory like "r_toolbox" in your favourite project folder. Put all the other files into the "r_toolbox" folder as well. You have to adjust the variable "t_toolbox_path" to your favourite project directory including the "r_toolbox" folder, e.g. "C:\My-Projects\t-toolbox\" on Windows or "/home/username/my-projects/r-toolbox" on Unix-like systems.

You can use the functions for your projects. Although I developed them with great care, they come with absolutely no warranty; you use them at your own risk. As the functions are small and easy to review, you will quickly find out by reading the source code that they are safe to use.

If you have any recommendations or proposals for improvement, please get back to me.

Kind regards

Georg Maubach
[R] Installing miniCRAN on Debian
Hi All,

I am installing miniCRAN on Debian GNU Linux 8 Jessie (Linux analytics7 4.5.0-0.bpo.2-amd64 #1 SMP Debian 4.5.4-1~bpo8+1 (2016-05-13) x86_64 GNU/Linux) and R 3.3.0

-- cut --

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8
 [4] LC_COLLATE=de_DE.UTF-8     LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.3.0

-- cut --

After running

sudo apt-get install libssl-dev libcurl4-openssl-dev libxml2-dev libhunspell-dev

and calling

install.packages(pkgs = "miniCRAN", repos = "http://cran.csiro.au", dependencies = TRUE)

I get the message

- ANTICONF ERROR ---
Configuration failed because hunspell was not found. Try installing:
 * deb: libhunspell-dev (Debian, Ubuntu, etc)
 * rpm: hunspell-devel (Fedora, CentOS, RHEL)
 * brew: hunspell (Mac OSX)
If hunspell is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a hunspell.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'

Running

find / -name hunspell.pc

gives

/usr/lib/x86_64-linux-gnu/pkgconfig/hunspell.pc

and running

find / -name pkg-config

gives

/usr/share/bash-completion/completions/pkg-config

How do I need to configure R correctly to get miniCRAN running?

Kind regards

Georg
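One observation on the output above: the only `pkg-config` hit is a bash-completion file, not an executable, which suggests (but does not prove) that the `pkg-config` tool itself is missing; on Debian it normally comes from the package of the same name. A small check from within R, offered as a sketch rather than a confirmed diagnosis:

```r
# Empty string means no pkg-config executable was found on the PATH
pkg_config <- unname(Sys.which("pkg-config"))

if (!nzchar(pkg_config)) {
  message("pkg-config not found on PATH; the configure script cannot locate hunspell.pc")
} else {
  # Exit status 0 means pkg-config can see a hunspell.pc file
  status <- system2(pkg_config, c("--exists", "hunspell"))
  message("pkg-config found at ", pkg_config, ", exit status for hunspell: ", status)
}
```

If the tool turns out to be missing, installing it and then repeating `install.packages()` would be the obvious next thing to try.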
[R] Reply: Re: Unable to update R software to 3.3.0
Hi all,

I did it today on Debian GNU Linux 8 Jessie this way:

vim /etc/apt/sources.list
deb http://cran.uni-muenster.de/bin/linux/debian jessie-cran3
ESC :wq
apt-get update
apt-get install r-base r-base-dev

This worked for me.

When installing R packages from within R I found that R needed the following:

apt-get install libssl-dev libcurl4-openssl-dev libhunspell-dev libxml2-dev

You will probably want to install these as well.

HTH.

Kind regards

Georg

From: Marc Schwartz
To: Sunish Kumar Bilandi
Cc: R-help
Date: 01.06.2016 17:18
Subject: Re: [R] Unable to update R software to 3.3.0
Sent by: "R-help"

> On Jun 1, 2016, at 1:33 AM, Sunish Kumar Bilandi wrote:
>
> Hi Team,
>
> I am using RedHat 5 and installed R using YUM, (R version 3.2.3) Now I want to update R version to 3.3.0, but I am unable to do that, Is there any alternate to do this?
>
> Hope to hear from your side.
>
> Regards,
>
> Sunish Bilandi
> Business Analyst, CIDA-01
> Evalueserve

Hi,

First, RHEL and related distributions (e.g. Fedora), have a dedicated R-SIG list:

https://stat.ethz.ch/mailman/listinfo/r-sig-fedora

Future queries in this domain should be submitted there, as many of the RH package maintainers (e.g. Tom Callaway, aka Spot) read that list.

For R 3.3.0, it would appear that it is about a day away from being available for release:

https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-6fc2c863b0

So for now, it would be available via the EPEL testing repos. Otherwise, you can wait until it is available via release in the next day or so, or download the RPMS directly here:

http://koji.fedoraproject.org/koji/buildinfo?buildID=762521

Regards,

Marc Schwartz
[R] Reply: RE: Variable labels and value labels
Hi Petr,

I am looking for a general procedure that I can use with any R package. In my experience so far, it will probably happen that I need a procedure from a package other than Hmisc or memisc, and my solution should still work then, so that I do not need to find another way to do it.

Kind regards

Georg

From: PIKAL Petr
To: "g.maub...@weinwolf.de", "r-help@r-project.org"
Date: 31.05.2016 14:56
Subject: RE: [R] Variable labels and value labels

Hi

see in line

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Tuesday, May 31, 2016 2:01 PM
> To: r-help@r-project.org
> Subject: [R] Variable labels and value labels
>
> Hi All,
>
> I am using R for social sciences. In this field I am used to use short variable
> names like "q1" for question 1, "q2" for question 2 and so on and label the
> variables like q1 : "Please tell us your age" or q2 : "Could you state us your
> household income?" or something similar indicating which question is stored
> in the variable.
>
> Similar I am used to label values like 1: "Less than 18 years", 2 : "18 to
> 30 years", 3 : "31 to 60 years" and 4 : "61 years and more".

It seems to me that this is a job for factors:

nnn <- sample(1:4, 20, replace=TRUE)
q1 <- factor(nnn, labels=c("Less than 18 years", "18 to 30 years", "31 to 60 years", "61 years and more"))

You can store such variables in a data.frame with names "q1" to "qwhatever" and possibly "Subject".

And you can store the annotation of the questions in another data frame with 2 columns, e.g. "Question" and "Description".

Basically this is an approach similar to a database, and in R you can merge those two data.frames with ?merge.

>
> I know that the packages Hmisc and memisc have a functionality for this but
> these labeling functions are limited to the packages they were defined for.

That seems strange to me. What prevents you from using functions from Hmisc?
Regards

Petr

> Using the question tests as variable names is possible but very inconvenient.
>
> I there another way for labeling variables and values in R?
>
> Kind regards
>
> Georg Maubach
[R] Reply: Re: Variable labels and value labels
Hi Jim, many thanks for the hint. When looking at the documentation I did not get how I do control which value gets which label. Is it possible to define it? Kind regards Georg Von:Jim Lemon An: g.maub...@weinwolf.de, r-help mailing list , Datum: 01.06.2016 03:59 Betreff:Re: [R] Variable labels and value labels Hi Georg, You may find the "add.value.labels" function in the prettyR package useful. Jim On Tue, May 31, 2016 at 10:00 PM, wrote: > Hi All, > > I am using R for social sciences. In this field I am used to use short > variable names like "q1" for question 1, "q2" for question 2 and so on and > label the variables like q1 : "Please tell us your age" or q2 : "Could you > state us your household income?" or something similar indicating which > question is stored in the variable. > > Similar I am used to label values like 1: "Less than 18 years", 2 : "18 to > 30 years", 3 : "31 to 60 years" and 4 : "61 years and more". > > I know that the packages Hmisc and memisc have a functionality for this > but these labeling functions are limited to the packages they were defined > for. Using the question tests as variable names is possible but very > inconvenient. > > I there another way for labeling variables and values in R? > > Kind regards > > Georg Maubach > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
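On the question of controlling which value gets which label: independent of any add-on package, base R's `factor()` pairs the entries of `levels` and `labels` by position, so the mapping can be spelled out explicitly. A minimal sketch with made-up age codes (`age_code` is an assumption for illustration):

```r
# Hypothetical coded answers to the age question
age_code <- c(2, 4, 1, 3, 2)

# levels[i] is shown as labels[i]; codes not listed in levels become NA
age <- factor(
  age_code,
  levels = c(1, 2, 3, 4),
  labels = c("Less than 18 years", "18 to 30 years",
             "31 to 60 years", "61 years and more")
)

age
table(age)

# The numeric codes can be recovered from the level positions
as.integer(age)
```

Without the `levels` argument, `factor()` uses the sorted unique values found in the data, so a code that happens not to occur would shift the labels; listing the levels avoids that.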
[R] Merging variables
Hi All, I merged two datasets: ds_merge1 <- merge(x = ds_bw_customer_4_match, y = ds_zww_customer_4_match, by.x = "customer", by.y = "customer", all.x = TRUE, all.y = FALSE) R created a new dataset with the variables customer.x and customer.y. I would like to merge these two variable back together. I wrote a little function (code can be run) for it: -- cut -- customer.x <- c("Miller", "Smith", NA,"Bird", NA) customer.y <- c("Miller", NA, "Doe", "Fish", NA) ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE) t_merge_variables <- function(dataset, var1, var2, merged_var) { # Initialize dataset[[merged_var]] = rep(NA, nrow(dataset)) dataset[["mismatch"]] = rep(NA, nrow(dataset)) for (i in 1:nrow(dataset)) { # Check 1: var1 missing, var2 missing if (is.na(dataset[[i, var1]]) & is.na(dataset[[i, var2]])) { dataset[["mismatch"]] <- 1 # var1 & var2 are missing # Check 2: var1 filled, var2 missing } else if (!is.na(dataset[[i, var1]]) & is.na(dataset[[i, var2]])) { dataset[[i, merged_var]] <- dataset[[i, var1]] dataset[["mismatch"]] <- 0 # Check 3: var1 missing, var2 filled } else if (is.na(dataset[[i, var1]]) & !is.na(dataset[i, var2])) { dataset[[i, merged_var]] <- dataset[[i, var2]] dataset[["mismatch"]] <- 0 # Check 4: var1 == var2 } else if (dataset[[i, var1]] == dataset[[i, var2]]) { dataset[[i, merged_var]] <- dataset[[i, var1]] dataset[["mismatch"]] <- 0 # Leftover: var1 != var2 } else { dataset[[i, merged_var]] <- NA dataset[["mismatch"]] <- 2 # var1 != var2 } # end if } # end for return(dataset) } ds_var_merge1 <- t_merge_variables(dataset = ds_test, var1 = "customer.x", var2 = "customer.y", merged_var = "customer") ds_var_merge1 -- cut -- It is executed without error but delivers the wrong values in the variable "mismatch". This variable is always 1 although it should be NA, 1 or 2 respectively. Can you tell me why the variable is not correctly set? 
Kind regards Georg __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
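A likely reason for the behaviour described in this post: inside the loop, `dataset[["mismatch"]] <- 1` has no row index, so every pass overwrites the whole column, and the value set for the last row (both names missing, hence 1) is what remains. The sketch below shows one vectorised alternative that needs no loop; it keeps the coding from the post (1 = both missing, 2 = names differ, 0 otherwise) and is meant as an illustration, not as the only way to do it:

```r
customer.x <- c("Miller", "Smith", NA, "Bird", NA)
customer.y <- c("Miller", NA, "Doe", "Fish", NA)
ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE)

x <- ds_test$customer.x
y <- ds_test$customer.y

# Take x where it is filled, otherwise y
ds_test$customer <- ifelse(is.na(x), y, x)

# 1 = both missing, 2 = both filled but different, 0 = usable name
ds_test$mismatch <- ifelse(is.na(x) & is.na(y), 1,
                    ifelse(!is.na(x) & !is.na(y) & x != y, 2, 0))

# Do not keep a name when the two sources disagree
ds_test$customer[ds_test$mismatch == 2] <- NA

ds_test
```

Staying with the loop would also work if each assignment addressed a single cell, for example `dataset[[i, "mismatch"]] <- 1`.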
[R] Reply: RE: Merging variables
Hi David, Hi Petr,

many thanks for your help. With your hints I got the idea of how I could do it, and I came up with this solution:

-- cut --

#---
# Module        : t_merge_variables.R
# Author        : Georg Maubach
# Date          : 2016-06-06
# Update        : 2016-06-06
# Description   : Merge two variables
# Source System : R 3.2.5 (64 Bit)
# Target System : R 3.2.5 (64 Bit)
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#---

t_module_name = "t_merge_variables.R"
t_version = "2016-06-06"

cat(
  paste0("\n",
         t_module_name, " (Version: ", t_version, ")", "\n", "\n",
         "This software comes with ABSOLUTELY NO WARRANTY.",
         "\n", "\n"))

# If do_test is not defined globally define it here locally by un-commenting it
# Switch t_do_test to TRUE to run test
t_do_test <- FALSE

# [ Function Definition ]
t_merge_variables <- function(dataset, var1, var2, merged_var) {
  # Merges two variables with identical, different or missing values
  #
  # Args:
  #   dataset (data frame, data table):
  #     Object with dimnames, e.g. data frame, data table.
  #   var1 (character):
  #     Variable 1 to be merged.
  #   var2 (character):
  #     Variable 2 to be merged.
  #   merged_var (class based on input variable, coercion done if possible):
  #     Variable with the merged variables var1 and var2.
  #
  # Operation:
  #   Var1 and var2 are merged as follows:
  #   if var1 == var2:                       merged_var <- var1
  #   if var1 != var2:                       merged_var <- 1 (indicating mismatch)
  #   if var1 is filled & var2 is missing:   merged_var <- var1
  #   if var1 is missing & var2 is filled:   merged_var <- var2
  #   if var1 is missing & var2 is missing:  merged_var <- 0 (indicating NA)
  #
  # Returns:
  #   Original dataset and variable given in "merged_var" will be added.
  #
  # Error handling:
  #   None.
  #
  # Credits:
  #   https://www.mail-archive.com/r-help@r-project.org/msg236012.html

  dataset[[merged_var]] <-
    # Check 1: var1 missing, var2 missing
    ifelse(is.na(dataset[[var1]]) & is.na(dataset[[var2]]),
           0,
    # Check 2: var1 filled, var2 missing
    ifelse(!is.na(dataset[[var1]]) & is.na(dataset[[var2]]),
           dataset[[var1]],
    # Check 3: var1 missing, var2 filled
    ifelse(is.na(dataset[[var1]]) & !is.na(dataset[[var2]]),
           dataset[[var2]],
    # Check 4: var1 == var2, then use var1
    ifelse(dataset[[var1]] == dataset[[var2]],
           dataset[[var1]],
           # Leftover: var1 != var2
           1))))

  return(dataset)
}

# [ Test Definition ]
t_test <- function(do_test = FALSE) {
  if (isTRUE(do_test)) {
    cat("\n", "\n", "Test function t_merge_variables()", "\n", "\n")

    # Example dataset
    customer.x <- c("Miller", "Smith", NA, "Bird", NA)
    customer.y <- c("Miller", NA, "Doe", "Fish", NA)
    ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE)

    # Call function
    ds_merge <- t_merge_variables(
      dataset = ds_test,
      var1 = "customer.x",
      var2 = "customer.y",
      merged_var = "customer"
    )

    # Dataset after function call
    print(ds_merge)
  }
}

# [ Test Run ]--
t_test(do_test = t_do_test)

# [ Clean up ]--
rm("t_do_test", "t_module_name", "t_version", "t_test")

# EOF

-- cut --

It delivers the customer name if there is only one or if both match. If they do not match it delivers 1. If both are missing it delivers 0.

This solution is sufficient for my applications.

Many thanks again for your help and for giving me the ideas to solve my data transformation task.
Kind regards Georg Von:PIKAL Petr An: "g.maub...@weinwolf.de" , "r-help@r-project.org" , Datum: 06.06.2016 15:04 Betreff:RE: [R] Merging variables Hi Not sure if this is the most effective or general solution but Here you get 2 if the value is same in both columns, 1 if it is only in one column and the other is NA and 0 if there is mismatch of values. temp <- (ds_test[,2] %in% ds_test[,1])+(ds_test[,1] %in% ds_test[,2]) here you get 0 if t
[R] Reply: Re: Merging variables
Hi Michael,

yes, I was astonished by this behaviour too. I have worked with SPSS a lot, and it works differently there.

I would like to share some of my data. Can you tell me how I can dump a dataset in a way that lets me post it here as text?

Kind regards

Georg

From: Michael Dewey
To: g.maub...@weinwolf.de, r-help@r-project.org
Date: 06.06.2016 15:45
Subject: Re: [R] Merging variables

Dear Georg

I find it a bit surprising that you end up with customer.x and customer.y. Can you share with us a toy example of two data.frames which exhibit this behaviour?

--
Michael
http://www.dewey.myzen.co.uk/home.html
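On the question of posting a dataset as text: base R's `dput()` writes an object as R code that rebuilds it, which is the usual way to paste data into a message. A minimal sketch using the test data from this thread; the round trip through `capture.output()` and `parse()` is only there to show that the printed text really recreates the object:

```r
customer.x <- c("Miller", "Smith", NA, "Bird", NA)
customer.y <- c("Miller", NA, "Doe", "Fish", NA)
ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE)

# Prints a structure(...) call that can be pasted into an e-mail
dput(ds_test)

# For a large dataset, post only the first rows
dput(head(ds_test, 3))

# Round trip: turn the printed text back into a data frame
txt <- paste(capture.output(dput(ds_test)), collapse = "\n")
ds_copy <- eval(parse(text = txt))
```

The recipient assigns the pasted text to a name, e.g. `ds_test <- structure(...)`, and has the same data frame to work with.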
[R] Reply: RE: Merging variables
Hi Petr, I would like to describe the data situation in brief: I have an business warehouse dataset (referred to as BW data) containing sales and an ERP customer master data dataset with additional information (referred to as ERP data). Though customer IDs and customer names are identical due to the fact that the business warehouse data is derived from the ERP data. Due to selection criteria the BW data contains slightly more customers than the ERP data. So customer names and all other information is missing in the ERP data for some cases of the BW data. If I merge those by customer ID variable customer names are duplicated using customer.x and customer.y as variable names. As both fields contains the same contents I would have expected R to merge this into one variable, e. g. customer. But this is not the case. Can I adjust the below given merge statement - which looks almost the same in my script - that R does the merge of the variables if they are identical automatically? This is my code using left join: -- cut -- ds_merge1 <- merge(x = ds_bw_customer_4_match, y = ds_erp_customer_4_match, by.x = "CustID", by.y = "CustID", all.x = TRUE, all.y = FALSE) -- cut -- Kind regards Georg Von:PIKAL Petr An: Michael Dewey , "g.maub...@weinwolf.de" , "r-help@r-project.org" , Datum: 06.06.2016 17:04 Betreff:RE: [R] Merging variables Hi Michael it is simple set.seed(111) let=sample(letters[1:10],6, replace=T) dat1<-data.frame(let=let, customer=sample(1:10,6, replace=T)) let=sample(letters[1:10],6, replace=T) dat2<-data.frame(let=let, customer=sample(1:10,6, replace=T)) merge(dat1, dat2, by.x="let", by.y="let", all=T) Of course you could add customer variable to by parameter but sometimes it is necessary to leave it out. When you have two sets of analytical results and you have 2 variables operator but you want to merge those sets e.g. by date/hour of analysis. 
Regards Petr > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael > Dewey > Sent: Monday, June 6, 2016 3:46 PM > To: g.maub...@weinwolf.de; r-help@r-project.org > Subject: Re: [R] Merging variables > > X-Originating-<%= hostname %>-IP: [217.155.205.190] > > Dear Georg > > I find it a bit surprising that you end up with customer.x and customer.y. Can > you share with us a toy example of two data.frames which exhibit this > behaviour? > > On 06/06/2016 13:29, g.maub...@weinwolf.de wrote: > > Hi All, > > > > I merged two datasets: > > > > ds_merge1 <- merge(x = ds_bw_customer_4_match, y = > > ds_zww_customer_4_match, > > by.x = "customer", by.y = "customer", > > all.x = TRUE, all.y = FALSE) > > > > R created a new dataset with the variables customer.x and customer.y. > > I would like to merge these two variable back together. I wrote a > > little function (code can be run) for it: > > > > -- cut -- > > > > customer.x <- c("Miller", "Smith", NA,"Bird", NA) > > customer.y <- c("Miller", NA, "Doe", "Fish", NA) > > ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = > > FALSE) > > > > t_merge_variables <- > > function(dataset, > >var1, > >var2, > >merged_var) { > > > > # Initialize > > dataset[[merged_var]] = rep(NA, nrow(dataset)) > > dataset[["mismatch"]] = rep(NA, nrow(dataset)) > > > > for (i in 1:nrow(dataset)) { > > > > # Check 1: var1 missing, var2 missing > > if (is.na(dataset[[i, var1]]) & > > is.na(dataset[[i, var2]])) { > > dataset[["mismatch"]] <- 1 # var1 & var2 are missing > > > > # Check 2: var1 filled, var2 missing > > } else if (!is.na(dataset[[i, var1]]) & > > is.na(dataset[[i, var2]])) { > > dataset[[i, merged_var]] <- dataset[[i, var1]] > > dataset[["mismatch"]] <- 0 > > > > # Check 3: var1 missing, var2 filled > > } else if (is.na(dataset[[i, var1]]) & > > !is.na(dataset[i, var2])) { > > dataset[[i, merged_var]] <- dataset[[i, var2]] > > dataset[["mismatch"]] <- 0 > > > > # Check 4: var1 == var2 > 
> } else if (dataset[[i, var1]] == dataset[[i, var2]]) { > > dataset[[i, merged_var]] <- dataset[[i, var1]] > > dataset[["mismatch"]] <- 0 > > > > # Leftover: var1 != var2 > > } else { > > dataset[[i, merged_var]] <- NA > > dataset[["mismatch"]] <- 2 # var1 != var2 > > } # end if > > } # end for > > return(dataset) > > } > > > > ds_var_merge1 <- t_merge_variables(dataset = ds_test, > > var1 = "customer.x", > > var2 = "customer.y", > > merged_var = "customer") > > > > ds_var_merge1 > > > > -- cut -- > > > > It is executed without error but delivers the wrong values in the > > variable "mismatch". This variable is always 1 although it should be > > NA, 1 or 2 respectively. > > > > Can you tell me why the variable is not correctly set? > > > > Kind regards > > > > Georg >
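A likely cause of the behaviour described in the quoted message: inside the loop, `dataset[["mismatch"]] <- 1` assigns to the whole column rather than to row i, so the value set in the last iteration wins. A minimal vectorized sketch of the same checks follows. The function name `merge_variables` is hypothetical and not part of the original post; the mismatch codes (0, 1, 2) follow the comments in the quoted code.

```r
customer.x <- c("Miller", "Smith", NA, "Bird", NA)
customer.y <- c("Miller", NA, "Doe", "Fish", NA)
ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE)

# Hypothetical vectorized variant of the quoted t_merge_variables()
merge_variables <- function(dataset, var1, var2, merged_var) {
  x <- dataset[[var1]]
  y <- dataset[[var2]]

  both_na  <- is.na(x) & is.na(y)               # check 1: both missing
  conflict <- !is.na(x) & !is.na(y) & x != y    # leftover: both filled, different

  merged <- ifelse(is.na(x), y, x)              # take whichever value is present
  merged[conflict] <- NA                        # conflicting values stay missing

  dataset[[merged_var]] <- merged
  dataset[["mismatch"]] <- ifelse(both_na, 1, ifelse(conflict, 2, 0))
  dataset
}

ds_var_merge1 <- merge_variables(ds_test, "customer.x", "customer.y", "customer")
print(ds_var_merge1)
```

Working on whole columns avoids the row-by-row indexing that caused the problem in the first place.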
[R] Antwort: RE: Antwort: Re: Merging variables
Hi Petr,

thanks for your reply. I prepared a little example for you:

-- cut --

ds_temp_1 <- structure(
  list(
    CustId = c(1001, 1002, 1003, 1004, 1005, 1006),
    CustName = c("Miller", "Smith", "Doe", "White", "Black", "Nobody"),
    sales = c(100, 500, 300, 50, 700, 10)
  ),
  .Names = c("CustId", "CustName", "sales"),
  row.names = c(NA, 6L),
  class = "data.frame"
)

ds_temp_2 <- structure(
  list(
    CustId = c(1001, 1002, 1003),
    CustName = c("Miller", "Smith", "Doe"),
    CustGroup = c(1, 2, 3)
  ),
  .Names = c("CustId", "CustName", "CustGroup"),
  row.names = c(NA, 3L),
  class = "data.frame"
)

ds_merge <- merge(ds_temp_1, ds_temp_2,
                  by.x = "CustId", all.x = TRUE,
                  by.y = "CustId", all.y = FALSE)

ds_merge

-- cut --

which gives

ds_merge
  CustId CustName.x sales CustName.y CustGroup
1   1001     Miller   100     Miller         1
2   1002      Smith   500      Smith         2
3   1003        Doe   300        Doe         3
4   1004      White    50       <NA>        NA
5   1005      Black   700       <NA>        NA
6   1006     Nobody    10       <NA>        NA

where CustName is split into CustName.x and CustName.y.

What I would like to have is:

ds_merge
  CustId CustName sales CustGroup
1   1001   Miller   100         1
2   1002    Smith   500         2
3   1003      Doe   300         3
4   1004    White    50        NA
5   1005    Black   700        NA
6   1006   Nobody    10        NA

That is, CustName should be a single variable because the values within that variable are identical. I guess R generates CustName.x and CustName.y because of the NAs for some cases in ds_temp_2.

Is there a simple way of merging datasets so that R returns a single variable if the values are identical or missing in either one of the datasets?

Kind regards

Georg

From: PIKAL Petr
To: "g.maub...@weinwolf.de",
Cc: "r-help@r-project.org"
Date: 07.06.2016 13:11
Subject: RE: [R] Antwort: Re: Merging variables

Hi

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Tuesday, June 7, 2016 8:19 AM
> To: Michael Dewey
> Cc: r-help@r-project.org
> Subject: [R] Antwort: Re: Merging variables
>
> Hi Michael,
>
> yes, I was astonished about this behaviour too. I have worked with SPSS a
> lot - and that works differently.
If you want to join two data frames by common names you can use use merge(dat1, dat2, ) without specifing by. From help page: By default the data frames are merged on the columns with names they both have, but separate specifications of the columns can be given by by.x and by.y. The rows in the two data frames that match on the specified columns are extracted, and joined together. > > I would like to share some of my data. Can you tell me how I can dump a > dataset in a way that I can post it here as text? copy result of dput directly to your mail dput(dat) structure(list(hz = c(0, 25, 50), vykon = c(0, 11.6, 22.6)), .Names = c("hz", "vykon"), row.names = c(NA, -3L), class = "data.frame") We can use dat <- structure(list(hz = c(0, 25, 50), vykon = c(0, 11.6, 22.6)), .Names = c("hz", "vykon"), row.names = c(NA, -3L), class = "data.frame") to reconstruct the object. Regards Petr > > Kind regards > > Georg > > > > > Von:Michael Dewey > An: g.maub...@weinwolf.de, r-help@r-project.org, > Datum: 06.06.2016 15:45 > Betreff:Re: [R] Merging variables > > > > X-Originating-<%= hostname %>-IP: [217.155.205.190] > > Dear Georg > > I find it a bit surprising that you end up with customer.x and customer.y. Can > you share with us a toy example of two data.frames which exhibit this > behaviour? > > On 06/06/2016 13:29, g.maub...@weinwolf.de wrote: > > Hi All, > > > > I merged two datasets: > > > > ds_merge1 <- merge(x = ds_bw_customer_4_match, y = > > ds_zww_customer_4_match, > > by.x = "customer", by.y = "customer", > > all.x = TRUE, all.y = FALSE) > > > > R created a new dataset with the variables customer.x and customer.y. > > I would like to merge these two variable back together. 
I wrote a > > little function (code can be run) for it: > > > > -- cut -- > > > > customer.x <- c("Miller", "Smith", NA,"Bird", NA) > > customer.y <- c("Miller", NA, "Doe", "Fish", NA) > > ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = > > FALSE) > > > > t_merge_variables <- > > function(dataset, > >var1, > >var2, > >merged_var) { > > > > # Initialize > > dataset[[merged_var]] = rep(NA, nrow(dataset)) > > dataset[["mismatch"]] = rep(NA, nrow(dataset)) > > > > for (i in 1:nrow(dataset)) { > > > > # Check 1: var1 missing, var2 missing > > if (is.na(dataset[[i, var1]]) & > > is.na(dataset[[i, var2]])) { > > dataset[["mismatch"]] <- 1 # var1 & var2 are missing
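As a sketch of the advice above: when a column shared by both data frames is not part of the merge key, merge() keeps both copies and adds the suffixes .x and .y. Listing every shared column in `by` (or leaving `by` out, so that all common names are used) returns a single column. The example reuses the toy data from this thread and assumes the names are spelled identically in both data frames.

```r
ds_temp_1 <- data.frame(
  CustId   = c(1001, 1002, 1003, 1004, 1005, 1006),
  CustName = c("Miller", "Smith", "Doe", "White", "Black", "Nobody"),
  sales    = c(100, 500, 300, 50, 700, 10),
  stringsAsFactors = FALSE
)

ds_temp_2 <- data.frame(
  CustId    = c(1001, 1002, 1003),
  CustName  = c("Miller", "Smith", "Doe"),
  CustGroup = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Both shared columns form the key, so no CustName.x / CustName.y appear
ds_merge <- merge(ds_temp_1, ds_temp_2,
                  by = c("CustId", "CustName"),
                  all.x = TRUE)
print(ds_merge)
```

If the names differed between the two sources for the same CustId, those rows would no longer match, so merging on CustId alone and comparing the two name columns afterwards may be the safer route in that case.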
[R] Warning message in openxlsx
Hi All,

I get the warning message

Warning message:
In styles$font : partial match of 'font' to 'fonts'

when executing

> xls_workbook <- t_create_workbook()
> xls_sheetname <- "Kunden"
> xls_ds_to_save <- ds_merge1
> xls_filename <- paste0(data_created, "_Merge1_BW-SAP-Kunden_cleaned.xlsx")
> t_add_sheet(workbook = xls_workbook,
+             sheetname = xls_sheetname,
+             dataset = xls_ds_to_save)
> t_write_xlsx(workbook = xls_workbook,
+              path = path_output,
+              filename = xls_filename,
+              overwrite = TRUE)

where t_create_workbook() is

return(createWorkbook())

and t_add_sheet() is

addWorksheet(workbook, sheetName = sheetname)
writeDataTable(workbook, sheet = sheetname, x = dataset)
### writeDataTable writes data to a sheet and adds
### autofilter to the first line
if (freeze_row <= 1 | freeze_col <= 1) {
  NULL  # do nothing
} else {
  freezePane(workbook,
             sheet = sheetname,
             firstActiveRow = freeze_row,
             firstActiveCol = freeze_col)
}
setColWidths(workbook,
             sheet = sheetname,
             cols = 1:ncol(dataset),
             widths = "auto")

and t_write_xlsx() is

saveWorkbook(workbook,
             file = file.path(path, filename),
             overwrite = overwrite)

I am wondering what "partial match of 'font' to 'fonts'" means, because I do not call it in my function calls. I use these calls a lot in my programs but never got this message before.

What does this message mean? How can I avoid this message?

Kind regards

Georg Maubach

PS: You can find more information about the functions used at https://sourceforge.net/projects/r-project-utilities/files/?source=navbar .
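Some background that may help here: R emits this kind of warning when `$` is used with an abbreviated element name and the option `warnPartialMatchDollar` is TRUE. The code using `styles$font` presumably lives inside a package, not in the functions shown. A minimal sketch that reproduces the message with a hypothetical list named `styles` (not the package's real object):

```r
# Hypothetical stand-in for the object used internally
styles <- list(fonts = "Calibri")

old_opts <- options(warnPartialMatchDollar = TRUE)

# `$` matches 'font' partially to 'fonts' and warns about it
msg <- tryCatch(styles$font, warning = function(w) conditionMessage(w))
print(msg)

# `[[` matches names exactly by default, so nothing is found here
exact <- styles[["font"]]
print(exact)

# Restore the previous setting
options(old_opts)
```

Checking `getOption("warnPartialMatchDollar")` in the session where the warning appears shows whether the option has been switched on, for example in a startup file.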
[R] Installation of package "rio" broken
Hi all, today I wanted to install package "rio". As it depends on package "feather" which is only available as source I have chosen to install "rio" from source. The installations fails with the following messages: -- cut -- * installing *source* package 'feather' ... ** Paket 'feather' erfolgreich entpackt und MD5 Summen überprüft ** libs *** arch - i386 g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I. -I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" -I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall -mtune=core2 -c RcppExports.cpp -o RcppExports.o g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I. -I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" -I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall -mtune=core2 -c feather-read.cpp -o feather-read.o g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I. -I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" -I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall -mtune=core2 -c feather-types.cpp -o feather-types.o g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I. -I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" -I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall -mtune=core2 -c feather-write.cpp -o feather-write.o g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I. -I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" -I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall -mtune=core2 -c feather/buffer.cc -o feather/buffer.o g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I. -I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" -I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall -mtune=core2 -c feather/feather-c.cc -o feather/feather-c.o g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I. 
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" -I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall -mtune=core2 -c feather/io.cc -o feather/io.o feather/io.cc:18:0: warning: "NOMINMAX" redefined [enabled by default] c:\program files\rtools\gcc-4.6.3\bin\../lib/gcc/i686-w64-mingw32/4.6.3/../../../../include/c++/4.6.3/i686-w64-mingw32/bits/os_defines.h:46:0: note: this is the location of the previous definition g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I. -I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" -I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall -mtune=core2 -c feather/metadata.cc -o feather/metadata.o feather/metadata.cc:29:7: error: expected nested-name-specifier before 'FBString' feather/metadata.cc:29:7: error: 'FBString' has not been declared feather/metadata.cc:29:16: error: expected ';' before '=' token feather/metadata.cc:29:16: error: expected unqualified-id before '=' token feather/metadata.cc:32:7: error: expected nested-name-specifier before 'ColumnVector' feather/metadata.cc:32:7: error: 'ColumnVector' has not been declared feather/metadata.cc:32:20: error: expected ';' before '=' token feather/metadata.cc:32:20: error: expected unqualified-id before '=' token feather/metadata.cc:178:3: error: 'ColumnVector' does not name a type feather/metadata.cc: In member function 'feather::Status feather::metadata::TableBuilder::Impl::Finish()': feather/metadata.cc:146:5: error: 'FBString' was not declared in this scope feather/metadata.cc:146:14: error: expected ';' before 'desc' feather/metadata.cc:148:7: error: 'desc' was not declared in this scope feather/metadata.cc:154:9: error: 'desc' was not declared in this scope feather/metadata.cc:156:27: error: 'columns_' was not declared in this scope feather/metadata.cc:157:34: error: unable to deduce 'auto' from '' feather/metadata.cc: In member function 'void feather::metadata::TableBuilder::Impl::add_column(const flatbuffers::Offset&)': 
feather/metadata.cc:173:5: error: 'columns_' was not declared in this scope feather/metadata.cc: In constructor 'feather::metadata::TableBuilder::TableBuilder()': feather/metadata.cc:190:5: error: type 'feather::metadata::TableBuilder' is not a direct base of 'feather::metadata::TableBuilder' make: *** [feather/metadata.o] Error 1 Warnung: Ausführung von Kommando 'make -f "Makevars" -f "C:/PROGRA~1/R/R-32~1.2/etc/i386/Makeconf" -f "C:/PROGRA~1/R/R-32~1.2/share/make/winshlib.mk" CXX='$(CXX1X) $(CXX1XSTD)' CXXFLAGS='$(CXX1XFLAGS)' CXXPICFLAGS='$(CXX1XPICFLAGS)' SHLIB_LDFLAGS='$(SHLIB_CXX1XLDFLAGS)' SHLIB_LD='$(SHLIB_CXX1XLD)' SHLIB="feather.dll" OBJECTS="RcppExports.o feather-read.o feather-types.o feather-write.o"' ergab Status 2 ERROR: compilation failed for package 'feather' * removing 'C:/Users/admin/Documents/R/win-library/3.2/feather' Warning in install.packages : running command '"C:/PROGRA~1/R/R-32~1.2/bin/x64/R" CMD INSTALL -l "C:\Users\admin\Documents\R\win-library\3.2" C:\Users\admin\AppData\Local\
[R] Building a binary vector out of dichotomous variables
Hi All,

I need to build a binary vector made of a set of dichotomous variables.

What I have so far is:

-- cut --

ds_example <-
  structure(
    list(
      year2013 = c(0, 0, 0, 1, 1, 1, 1, 0),
      year2014 = c(0, 0, 1, 1, 0, 0, 1, 1),
      year2015 = c(0, 1, 1, 1, 0, 1, 0, 0)
    ),
    .Names = c("year2013", "year2014", "year2015"),
    row.names = c(NA, 8L),
    class = "data.frame"
  )

attach(ds_example)
base <- 1000
binary_vector <- base + year2013 * 100 + year2014 * 10 + year2015
detach(ds_example)

binary_vector

ds_example <- cbind(ds_example, binary_vector)

varlist <- c("year2013", "year2014", "year2015")

base <- 10^length(varlist)

binary_vector <- NULL

for (i in 1:3) {
  binary_vector <-
    base +
    ds_example[[varlist[i]]] * base / (10 ^ i)
}

ds_example <- cbind(ds_example, binary_vector)

message("Wrong result!")
ds_example

-- cut --

How do I get vectors like 1000 1001 1011 1100 1101 1110 1010 for each case?

Is there a better approach than mine?

Kind regards

Georg
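One likely reason the loop gives the wrong result: each pass overwrites binary_vector instead of adding to it, so only the last variable contributes. A minimal sketch of the accumulating version, using the same data and variable names as the post:

```r
ds_example <- data.frame(
  year2013 = c(0, 0, 0, 1, 1, 1, 1, 0),
  year2014 = c(0, 0, 1, 1, 0, 0, 1, 1),
  year2015 = c(0, 1, 1, 1, 0, 1, 0, 0)
)

varlist <- c("year2013", "year2014", "year2015")
base <- 10^length(varlist)

# Start every case at the leading 1 (here 1000) and add one digit per variable
binary_vector <- rep(base, nrow(ds_example))
for (i in seq_along(varlist)) {
  binary_vector <- binary_vector + ds_example[[varlist[i]]] * base / 10^i
}

print(binary_vector)
```

Using seq_along(varlist) instead of 1:3 keeps the loop correct if variables are added to or removed from the list.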
[R] Fw: Aw: Re: Building a binary vector out of dichotomous variables
> Hi Tom, > > thanks for your reply. > > Yes, that's exactly what I am looking for. I did not know about the automatic > type conversion in R. > > #-- cut -- > ds_example <- > structure( > list( > year2013 = c(0, 0, 0, 1, 1, 1, 1, 0), > year2014 = c(0, >0, 1, 1, 0, 0, 1, 1), > year2015 = c(0, 1, 1, 1, 0, 1, 0, 0) > ), > .Names = c("year2013", >"year2014", "year2015"), > row.names = c(NA, 8L), > class = "data.frame" > ) > > #-- Proposal: works! > as.numeric(with(ds_example,paste(1,year2013,year2014,year2015,sep=''))) > > # I store my know-how about R in functions for later use. > > #--´ Putting it in a function - does not work! > t_make_binary_vector <- function(dataset, > input_variables, > output_variable = "binary_vector") { > dataset[output_variable] <- "1" > print(dataset[output_variable]) > > for (variable in input_variables) { > print(variable) > dataset[output_variable] <- paste(dataset[output_variable], > dataset[variable], > sep='') > } > > # print(dataset[output_variable]) > > dataset[output_variable] <- as.integer(dataset[output_variable]) > > return(dataset) > } > > t_make_binary_vector(dataset = ds_example, > input_variables = c("year2013", "year2014", "year2015"), > output_variable = "binary_vector") > > > #-- Doesn't work either. > t_make_binary_vector <- function(dataset, > input_variables, > output_variable = "binary_vector") { > dataset[output_variable] <- as.integer(paste(1, dataset[ , > input_variables], sep = '')) > > return(dataset) > } > > t_make_binary_vector(dataset = ds_example, > input_variables = c("year2013", "year2014", "year2015"), > output_variable = "binary_vector") > > #-- cut -- > > Why is R taking the parameter value itself to paste it together instead of > referencing the variable within the dataset? > > What did I get wrong about R? How can I fix it? > > Kind regards > > Georg > > > > Gesendet: Donnerstag, 16. Juni 2016 um 16:13 Uhr > > Von: "Tom Wright" > > An: g.maub...@weinwolf.de > > Cc: "R. 
Help" > > Betreff: Re: [R] Building a binary vector out of dichotomous variables > > > > Does this do what you want? > > > > as.numeric(with(ds_example,paste(1,year2013,year2014,year2015,sep=''))) > > > > On Thu, Jun 16, 2016 at 8:57 AM, wrote: > > > Hi All, > > > > > > I need to build a binary vector made of a set of dichotomous variables. > > > > > > What I have so far is: > > > > > > -- cut -- > > > > > > ds_example <- > > > structure( > > > list( > > > year2013 = c(0, 0, 0, 1, 1, 1, 1, 0), > > > year2014 = c(0, > > >0, 1, 1, 0, 0, 1, 1), > > > year2015 = c(0, 1, 1, 1, 0, 1, 0, 0) > > > ), > > > .Names = c("year2013", > > >"year2014", "year2015"), > > > row.names = c(NA, 8L), > > > class = "data.frame" > > > ) > > > > > > attach(ds_example) > > > base <- 1000 > > > binary_vector <- base + year2013 * 100 + year2014 * 10 + year2015 > > > detach(ds_example) > > > > > > binary_vector > > > > > > ds_example <- cbind(ds_example, binary_vector) > > > > > > varlist <- c("year2013", "year2014", "year2015") > > > > > > base <- 10^length(varlist) > > > > > > binary_vector <- NULL > > > > > > for (i in 1:3) { > > > binary_vector <- > > >base + > > >ds_example [[varlist[i]]] * base / (10 ^ i) > > > } > > > > > > ds_example <- cbind(ds_example, binary_vector) > > > > > > message("Wrong result!") > > > ds_example > > > > > > -- cut -- > > > > > > How do I get vectors like 1000 1001 1011 1100 1101 1110 1010 for > > > each case? > > > > > > Is there a better approach than mine? > > > > > > Kind regards > > > > > > Georg > > > > > > __ > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. 
> > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible
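On the question of why the function versions fail: `dataset[output_variable]` and `dataset[, input_variables]` with several names return data frames, and paste() turns a data frame into one deparsed string per column instead of pasting row by row. A sketch of one possible fix that extracts the columns as plain vectors; the function name `make_binary_vector` is hypothetical, and the inputs are assumed to hold only 0 and 1:

```r
ds_example <- data.frame(
  year2013 = c(0, 0, 0, 1, 1, 1, 1, 0),
  year2014 = c(0, 0, 1, 1, 0, 0, 1, 1),
  year2015 = c(0, 1, 1, 1, 0, 1, 0, 0)
)

# Hypothetical helper: paste a leading "1" and the 0/1 columns row by row
make_binary_vector <- function(dataset,
                               input_variables,
                               output_variable = "binary_vector") {
  # A plain list of column vectors, so paste0() works element-wise
  columns <- unname(as.list(dataset[input_variables]))
  digits <- do.call(paste0, c("1", columns))
  dataset[[output_variable]] <- as.integer(digits)
  dataset
}

ds_result <- make_binary_vector(
  dataset = ds_example,
  input_variables = c("year2013", "year2014", "year2015")
)
print(ds_result)
```

The do.call() step is what lets the number of input variables vary without changing the function body.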
[R] [Off-Topic] Introducing a new R Blog
Hi All,

today I would like to announce a new R blog. It contains a few entries about findings from my studies and my daily work:

https://github.com/gmaubach/R-Know-How/wiki/R-Blog

I hope you'll find my hints useful.

In addition, you could have a look at a small collection of R functions I found useful when working with my data:

https://github.com/gmaubach/R-Project-Utilities

Kind regards

Georg Maubach
[R] Subscripting problem with is.na()
Hi All,

I would like to recode my NAs to 0. Using a single vector everything is fine.

But if I use a data.frame things go wrong:

-- cut --

var1 <- c(1:3, NA, 5:7, NA, 9:10)
var2 <- c(1:3, NA, 5:7, NA, 9:10)
ds_test <- data.frame(var1, var2)

test <- var1
test[is.na(test)] <- 0
test  # NA recoded OK

# First try
ds_test[is.na(ds_test$var1)] <- 0  # duplicate subscripts WRONG

# Second try
ds_test[is.na("var1")] <- 0
ds_test$var1  # not recoded WRONG

# Third try: to me the most intuitive approach
is.na(ds_test["var1"]) <- 0  # attempt to select less than one element in integerOneIndex WRONG

# Fourth try
ds_test[is.na(var1)] <- 0  # duplicate subscripts for columns WRONG

-- cut --

How can I do it correctly?

Where could I have found something about it?

Kind regards

Georg
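A minimal sketch of two forms that do work with the data from the post. The idea in both is to index the column itself, or to index the data frame with a logical matrix of the same shape. The copies `ds_one` and `ds_all` are introduced here only to keep the two variants apart.

```r
var1 <- c(1:3, NA, 5:7, NA, 9:10)
var2 <- c(1:3, NA, 5:7, NA, 9:10)
ds_test <- data.frame(var1, var2)

# Variant 1: recode a single column; the logical index applies to the vector
ds_one <- ds_test
ds_one$var1[is.na(ds_one$var1)] <- 0

# Variant 2: recode every column; is.na() on a data frame returns a logical
# matrix, which is a valid index for the data frame
ds_all <- ds_test
ds_all[is.na(ds_all)] <- 0

print(ds_one)
print(ds_all)
```

The help page ?Extract and the data frame method described in ?"[.data.frame" cover the indexing rules behind both variants.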
[R] r_toolbox: Update
Hi folks,

I have updated the functions of the r_toolbox.R set of utilities:

https://sourceforge.net/projects/r-project-utilities/files/?source=navbar

Naming was changed with some functions to reflect similar functions in SAS or SPSS, e. g. t_n_miss, t_n_valid. In addition I added functions for reporting memory usage, selecting variables by type and getting an overview over the levels of factors.

I hope you find these functions useful. Please get back to me if you have suggestions or encounter any difficulties.

Kind regards

Georg
Re: [R] Subscripting problem with is.na()
Hi Bert,

many thanks for all your help and your comments. I learn a lot this way.

My question was about is.na() at first sight, but the actual task looks like this:

I have two variables in my customer data that signal whether the customer account was closed by master data management or by sales. Say these variables are closed_mdm and closed_sls. They contain NA if the customer account is still open, or a closing code from "01" to "08" giving the reason if the account was closed. For my analysis I need a variable that combines closed_mdm and closed_sls, so that I can easily filter on accounts that are closed no matter what the reason was or who closed the account.

As I always encounter problems when dealing with ifelse statements and NA, I decided to merge these two variables into one variable containing 0 = not closed and 1 = closed. In my context this seems to be - at least to me - a reasonable approach. Replacing the missing values and merging the variables is the easiest way for me.
-- cut -- cust_id <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20) closed_mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA, NA, NA, "04", NA, NA, NA, NA, NA, NA, NA) closed_sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA, "03", NA, NA, NA, "05", NA, NA, NA, NA, NA) # 1st try ds_temp1 <- data.frame(cust_id, closed_mdm, closed_sls) ds_temp1 ds_temp1$closed <- closed_mdm | closed_sls # WRONG # 2nd try closed_mdm_fac1 <- as.factor(closed_mdm) closed_sls_fac1 <- as.factor(closed_sls) ds_temp2 <- data.frame(cust_id, closed_mdm_fac1, closed_sls_fac1) ds_temp2 ds_temp2$closed <- ds_temp$closed_mdm_fac1 | ds_temp$closed_sls_fac1 # WRONG # 3rd try closed_mdm_num1 <- as.numeric(closed_mdm) # OK closed_sls_num1 <- as.numeric(closed_sls) # OK ds_temp3 <- data.frame(cust_id, closed_mdm_num1, closed_sls_num1) ds_temp3 ds_temp3$closed <- ds_temp$closed_mdm_num1 | ds_temp$closed_sls_num1 # WRONG # 4th try ds_temp4 <- ds_temp3 ds_temp4 # Does not run due to not allowed NA in subscripts ds_temp4[is.na(ds_temp4$closed_mdm_num1), ds_temp4$closed_mdm_num1] <- 0 ds_temp4[is.na(ds_temp4$closed_sls_num1), ds_temp4$closed_sls_num1] <- 0 # 5th try ds_temp4$closed_mdm_num1 <- ifelse(is.na(ds_temp4$closed_mdm_num1), 1, 0) ds_temp4$closed_sls_num1 <- ifelse(is.na(ds_temp4$closed_sls_num1), 1, 0) ds_temp4 ds_temp4$closed <- ifelse(ds_temp4$closed_mdm_num1 == 1 | ds_temp4$closed_sls_num1 == 1, 1, 0) ds_temp4 -- cut -- Is there a better way to do it? Kind regards Georg > Gesendet: Donnerstag, 23. Juni 2016 um 23:55 Uhr > Von: "Bert Gunter" > An: "David L Carlson" > Cc: "R Help" > Betreff: Re: [R] Subscripting problem with is.na() > > ... actually, FWIW, I would say that this little discussion mostly > demonstrates why the OP's request is probably not a good idea in the > first place. Usually, NA's should be left as NA's to be dealt with > properly by R and packages. 
In biological measurements, for example, > NA's often mean "below the ability to reliably measure." Biologists > with whom I've worked over many years often want to convert these to 0 > or omit the cases, both of which lead to biased estimates and/or > underestimates of variability and excess claims of "statistical > significance" (for those who belong to this religious persuasion). One > should never say never, but I suspect that there are relatively few > circumstances where the conversion the OP requested is actually wise. > > Feel free to ignore/reject such extraneous comments of course. > > Cheers, > Bert > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along > and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Thu, Jun 23, 2016 at 12:14 PM, David L Carlson wrote: > > Good point. I did not think about factors. Also your example raises another > > issue since column c is logical, but gets silently converted to numeric. > > This would seem to get the job done assuming the conversion is intended for > > numeric columns only: > > > >> test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3)) > >> sapply(test, class) > > a b c > > "numeric" "factor" "logical" > >> num <- sapply(test, is.numeric) > >> test[, num][is.na(test[, num])] <- 0 > >> test > > ab c > > 1 1A NA > > 2 0b NA > > 3 2 NA > > > > David C > > > > -Original Message- > > From: Bert Gunter [mailto:bgunter.4...@gmail.com] > > Sent: Thursday, June 23, 2016 1:48 PM > > To: David L Carlson > > Cc: Ivan Calandra; R Help > > Subject: Re: [R] Subscripting problem with is.na() > > > > Not in general, David: > > > > e.g. > > > >> test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3)) > > > >> is.na(test) > > a bc > > [1,] FALSE FALSE TRUE > > [2,] TRUE FALSE TRUE > > [3,] FALSE TRUE TRUE > > > >> test[is.na(test)] > > [1] NA NA NA NA NA > > > >> test[is.na(test)
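For the task described in this thread (0 = account open, 1 = closed by either department), the flag can be derived from is.na() directly, without recoding the NAs first. A minimal sketch using the data from the post; the data frame name `ds_closed` is introduced here for illustration only. Note that the fifth try in the post maps NA, meaning open, to 1, which looks like the reverse of the intended coding.

```r
cust_id <- 1:20
closed_mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA,
                NA, NA, "04", NA, NA, NA, NA, NA, NA, NA)
closed_sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA,
                "03", NA, NA, NA, "05", NA, NA, NA, NA, NA)

ds_closed <- data.frame(cust_id, closed_mdm, closed_sls,
                        stringsAsFactors = FALSE)

# An account counts as closed if either variable holds a closing code
ds_closed$closed <- as.integer(!is.na(ds_closed$closed_mdm) |
                               !is.na(ds_closed$closed_sls))

print(ds_closed)
```

The original closing codes stay untouched, so the NAs keep their meaning for any later analysis.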
[R] Antwort: Fw: Re: Subscripting problem with is.na()
Hi David, Hi Bert, many thanks for the valuable discussion on NA in R (please see extract below). I follow your arguments leaving NA as they are for most of the time. In special occasions however I want to replace the NA with another value. To preserve the newly acquired knowledge for me I wrote this function: -- cut -- t_replace_na <- function(dataset, variable, value) { if(inherits(dataset[[variable]], "factor") == TRUE) { dataset[variable] <- as.character(dataset[variable]) print(class(dataset[variable])) dataset[, variable][is.na(dataset[, variable])] <- value dataset[variable] <- as.factor(dataset[variable]) print(class(dataset[variable])) } else { dataset[, variable][is.na(dataset[, variable])] <- value } return(dataset) } ds_test <- data.frame(a=c(1,NA,2), b = rep(NA,3), c = c("A","b",NA)) print(sapply(ds_test, class)) t_replace_na(ds_test, "a", value = -1) t_replace_na(ds_test, "b", value = -2) t_replace_na(ds_test, "c", value = -3) -- cut -- Unfortunately the if-statement does not work due to a wrong class definition within the function. 
When finding out what is going on I did this: -- cut -- test_class <- function(dataset, variable) { if(inherits(dataset[, variable], "factor") == TRUE) { return(c(class(dataset[variable]), TRUE)) } else { return(c(class(dataset[variable]), FALSE)) } } ds_test <- data.frame(a=c(1,NA,2), b = rep(NA,3), c = c("A","b",NA)) print(sapply(ds_test, class)) # -- Test a -- class(ds_test[, "a"]) if(inherits(ds_test[, "a"], "factor")) { print(c(class(ds_test[, "a"]), "TRUE")) } else { print(c(class(ds_test[, "a"]), "FALSE")) } test_class(ds_test, "a") warning("'a' should be numeric NOT data.frame!") # -- Test b -- if(inherits(ds_test[, "b"], "factor")) { print(c(class(ds_test[, "b"]), "TRUE")) } else { print(c(class(ds_test[, "b"]), "FALSE")) } class(ds_test[, "b"]) test_class(ds_test, "b") warning("'b' should be logical NOT data.frame!") # -- Test c -- if(inherits(ds_test[, "c"], "factor")) { print(c(class(ds_test[, "c"]), "TRUE")) } else { print(c(class(ds_test[, "c"]), "FALSE")) } class(ds_test[, "c"]) test_class(ds_test, "c") warning("'c' should be factor NOT data.frame. In addition data.frame != factor") -- cut -- Why do I get different results for the same function if it is inside or outside my own function definition? Kind regards Georg > Gesendet: Donnerstag, 23. Juni 2016 um 21:14 Uhr > Von: "David L Carlson" > An: "Bert Gunter" > Cc: "R Help" > Betreff: Re: [R] Subscripting problem with is.na() > > Good point. I did not think about factors. Also your example raises another issue since column c is logical, but gets silently converted to numeric. 
This would seem to get the job done assuming the conversion is intended for numeric columns only: > > > test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3)) > > sapply(test, class) > a b c > "numeric" "factor" "logical" > > num <- sapply(test, is.numeric) > > test[, num][is.na(test[, num])] <- 0 > > test > ab c > 1 1A NA > 2 0b NA > 3 2 NA > > David C __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
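The different results most likely come from the different subscript forms, not from being inside or outside a function: the checks at top level use `ds_test[, "c"]`, while the functions call class() and as.character() on `dataset[variable]`. A short sketch of the three common forms. Column c is built with an explicit factor() call here, since newer R versions no longer convert strings to factors by default.

```r
ds_test <- data.frame(a = c(1, NA, 2),
                      b = rep(NA, 3),
                      c = factor(c("A", "b", NA)))

# Single bracket with one index: still a data frame with one column
cls_single <- class(ds_test["c"])

# Row/column form with a single column: the column vector itself
cls_comma <- class(ds_test[, "c"])

# Double bracket: always the column vector itself
cls_double <- class(ds_test[["c"]])

print(c(single = cls_single, comma = cls_comma, double = cls_double))
```

Inside functions, `dataset[[variable]]` is often the most predictable choice, because it returns the column itself no matter how many columns the data frame has.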
[R] Antwort: RE: Antwort: Fw: Re: Subscripting problem with is.na()
Hi Petr, many thanks for your reply and the examples. My subscripting problems drive me nuts. I have understood that dataset[variable] is semantically identical to dataset[, variable] cause dataset[variable] takes all cases because no other subscripts are given. Where can I lookup the rules when to use the comma and when not? Kind regards Georg Von:PIKAL Petr An: "g.maub...@weinwolf.de" , Kopie: "r-help@r-project.org" Datum: 27.06.2016 11:03 Betreff:RE: [R] Antwort: Fw: Re: Subscripting problem with is.na() Hi see in line > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of > g.maub...@weinwolf.de > Sent: Monday, June 27, 2016 10:45 AM > To: David L Carlson ; Bert Gunter > > Cc: r-help@r-project.org > Subject: [R] Antwort: Fw: Re: Subscripting problem with is.na() > > Hi David, > Hi Bert, > > many thanks for the valuable discussion on NA in R (please see extract > below). I follow your arguments leaving NA as they are for most of the > time. In special occasions however I want to replace the NA with another > value. To preserve the newly acquired knowledge for me I wrote this > function: > > -- cut -- > t_replace_na <- function(dataset, variable, value) { > if(inherits(dataset[[variable]], "factor") == TRUE) { >dataset[variable] <- as.character(dataset[variable]) >print(class(dataset[variable])) >dataset[, variable][is.na(dataset[, variable])] <- value >dataset[variable] <- as.factor(dataset[variable]) >print(class(dataset[variable])) > } else { >dataset[, variable][is.na(dataset[, variable])] <- value > } > return(dataset) > } > > class(ds_test[, "c"]) > test_class(ds_test, "c") > warning("'c' should be factor NOT data.frame. > In addition data.frame != factor") > -- cut -- > > Why do I get different results for the same function if it is inside or > outside my own function definition? Because you still are missing the way how to subscript data frames. 
test_class <- function(dataset, variable) { if(inherits(dataset[, variable], "factor") == TRUE) { return(c(class(dataset[,variable]), TRUE)) } else { return(c(class(dataset[,variable]), FALSE)) ## } } > test_class(ds_test, "a") [1] "numeric" "FALSE" > test_class(ds_test, "c") [1] "factor" "TRUE" > If you properly arrange commas in your function you get desired result p_replace_na <- function(dataset, variable, value) { if(inherits(dataset[,variable], "factor") == TRUE) { dataset[,variable] <- as.character(dataset[,variable]) print(class(dataset[,variable])) dataset[, variable][is.na(dataset[, variable])] <- value dataset[, variable] <- as.factor(dataset[, variable]) print(class(dataset[, variable])) } else { dataset[, variable][is.na(dataset[, variable])] <- value } return(dataset) } > p_replace_na(ds_test, "c", value = -3) [1] "character" [1] "factor" a b c 1 1 NA A 2 NA NA b 3 2 NA -3 > t_replace_na(ds_test, "c", value = -3) [1] "data.frame" Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list? > Cheers Petr > > Kind regards > > Georg > > > > > Gesendet: Donnerstag, 23. Juni 2016 um 21:14 Uhr > > Von: "David L Carlson" > > An: "Bert Gunter" > > Cc: "R Help" > > Betreff: Re: [R] Subscripting problem with is.na() > > > > Good point. I did not think about factors. Also your example raises > another issue since column c is logical, but gets silently converted to > numeric. 
This would seem to get the job done assuming the conversion is > intended for numeric columns only: > > > > > test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3)) > > > sapply(test, class) > > a b c > > "numeric" "factor" "logical" > > > num <- sapply(test, is.numeric) > > > test[, num][is.na(test[, num])] <- 0 > > > test > > ab c > > 1 1A NA > > 2 0b NA > > 3 2 NA > > > > David C
[R] Antwort: RE: Antwort: Fw: Re: Subscripting problem with is.na()
Hi All,

Petr, Bert, David, Ivan, Duncan and Rui helped me to develop a function
that replaces NA in variables IF NEEDED:

#-------------------------------------------------------------------------------
# Module        : t_replace_na.R
# Author        : Georg Maubach
# Date          : 2016-06-27
# Update        : 2016-06-27
# Description   : Replace NA with another value
# Source System : R 3.3.0 (64 Bit)
# Target System : R 3.3.0 (64 Bit)
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#-------------------------------------------------------------------------------

t_version = "2016-06-27"
t_module_name = "t_replace_na.R"

cat(
  paste0("\n",
         t_module_name, " (Version: ", t_version, ")", "\n", "\n",
         "This software comes with ABSOLUTELY NO WARRANTY.",
         "\n", "\n"))

# If do_test is not defined globally define it here locally by un-commenting it
t_do_test <- FALSE

# [ Function Definition ] ------------------------------------------------------
t_replace_na <- function(dataset, variables, value) {
  # Replace NA with another given value
  #
  # Args:
  #   dataset (data frame, data table):
  #     Object with dimnames, e.g. data frame, data table.
  #   variables (character vector):
  #     List of variable names.
  #   value:
  #     Replacement for NA.
  #
  # Operation:
  #   NA is replaced by the value given with the parameter "value".
  #
  #   A factor is converted explicitly with as.character(), the missing value
  #   replacement is done and then the character vector is converted back with
  #   as.factor(). Thus NA becomes a category of the new factor variable.
  #
  # Caution:
  #   Please check your data in case you replace NA within factors due to
  #   explicit type conversion. Tests were done only for the below given
  #   dataset.
  #
  # Returns:
  #   Dataset with NA replaced in the given variables.
  #
  # Error handling:
  #   None.
  #
  # Credits: https://www.mail-archive.com/r-help@r-project.org/msg236537.html

  for (variable in variables) {
    if (inherits(dataset[, variable], "factor") == TRUE) {
      dataset[, variable] <- as.character(dataset[, variable])
      print(class(dataset[, variable]))
      dataset[, variable][is.na(dataset[, variable])] <- value
      dataset[, variable] <- as.factor(dataset[, variable])
      print(class(dataset[, variable]))
    } else {
      dataset[, variable][is.na(dataset[, variable])] <- value
    }
  }

  return(dataset)
}

# [ Test Definition ] ----------------------------------------------------------
t_test <- function(do_test = FALSE) {
  if (do_test == TRUE) {
    cat("\n", "\n", "Test function t_replace_na()", "\n", "\n")

    # Example dataset
    ds_example <- data.frame(a = c(1, NA, 2), b = rep(NA, 3), c = c("A", "b", NA))

    cat("\n", "\n", "Example dataset before function call", "\n", "\n")
    cat("Variables and their classes:\n")
    print(sapply(ds_example, class))
    cat("Dataset:\n")
    print(ds_example)

    cat("\n", "\n", "Function call", "\n", "\n")
    ds_result <- t_replace_na(ds_example, "a", value = -1)
    cat("\n", "\n", "Dataset after function call", "\n", "\n")
    print(ds_result)

    cat("\n", "\n", "Function call", "\n", "\n")
    ds_result <- t_replace_na(ds_example, "b", value = -2)
    cat("\n", "\n", "Example dataset after function call", "\n", "\n")
    print(ds_result)

    cat("\n", "\n", "Function call", "\n", "\n")
    ds_result <- t_replace_na(ds_example, "c", value = -3)
    cat("\n", "\n", "Example dataset after function call", "\n", "\n")
    print(ds_result)
  }
}

# [ Test Run ] -----------------------------------------------------------------
t_test(do_test = t_do_test)

# [ Clean up ] -----------------------------------------------------------------
rm("t_module_name", "t_version", "t_do_test", "t_test")

# EOF

Please note: R has capabilities to handle NA correctly. There is often no
need to recode NA. Also, NA might or might not have a meaning. You have to
decide with regard to the meaning of the original data and the business
problem.
Kind regards Georg Von:PIKAL Petr An: "g.maub...@weinwolf.de" , Kopie: "r-help@r-project.org" Datum: 27.06.2016 11:03 Betreff:RE: [R] Antwort: Fw: Re: Subscripting problem with is.na() Hi see in line > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of > g.maub...@weinwolf.de > Sent: Monday, June 27, 2016 10:45 AM > To: David L Carlson ; Bert Gunter > > Cc: r-help@r-project.org > Subject: [R] Antwort: Fw: Re: Subscripting problem with is.na() > > Hi David, > Hi Bert, > > many thanks for the valuable discussion on NA in R (please see extract > below). I follow your arguments leaving NA as they are for most of the > time. In special occasions however I want to replace the NA with another > value. To preserve the newly acquired knowledge for me I wrote this > function: > > --
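For the factor branch of t_replace_na(), an alternative to the round trip through as.character() and as.factor() can be sketched like this. The helper name replace_na_keep_levels is hypothetical and not part of the module posted above. The sketch assumes that the replacement value should simply be appended as a new level, so the existing levels keep their order instead of being re-derived from the character values.

```r
# Hypothetical helper, not part of the module posted in the thread
replace_na_keep_levels <- function(f, value) {
  stopifnot(is.factor(f))
  value <- as.character(value)
  # Register the replacement as a level first; existing levels keep their order
  levels(f) <- union(levels(f), value)
  # Now the assignment is valid because 'value' is a known level
  f[is.na(f)] <- value
  f
}

f <- factor(c("A", "b", NA))
res <- replace_na_keep_levels(f, -3)
print(res)
print(levels(res))
```

Whether the new category should come last or be sorted in with the others depends on the analysis, so this is a design choice rather than a correction.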
[R] Installing from source on Windows 7: tibble
Hi All, I would like to install R packages from source on Windows 7 64-Bit. Currently my settings are: -- cut -- > sessionInfo() R version 3.3.0 (2016-05-03) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C [5] LC_TIME=German_Germany.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_3.3.0 -- cut -- The environment variable PATH on Windows 7 is set to: C:\R-Project\Rtools\mingw_32\bin;C:\R-Project\Rtools\mingw_64\bin;C:\R-Project\Rtools\bin;C:\R-Project\Rtools\gcc-4.6.3\bin;C:\Program Files\Python 3.5\Scripts\;C:\Program Files\Python 3.5\;C:\Python27\;C:\Python27\Scripts; etc. etc. RTools is installed in C:\R-Project\RTools The call of C:\R-Project\Rtools\mingw_64\bin\g++.exe --version results in g++ (x86_64-posix-seh, Built by MinGW-W64 project) 4.9.3 If I do > install.packages("tibble", type = "source") I get -- cut -- trying URL 'https://cran.uni-muenster.de/src/contrib/tibble_1.0.tar.gz' Content type 'application/x-gzip' length 38038 bytes (37 KB) downloaded 37 KB * installing *source* package 'tibble' ... 
** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft ** libs *** arch - i386 c:/Rtools/mingw_32/bin/g++ -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG -I"C:/R-Project/R-3.3.0/library/Rcpp/include" -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall -mtune=core2 -c RcppExports.cpp -o RcppExports.o c:/Rtools/mingw_32/bin/g++: not found make: *** [RcppExports.o] Error 127 Warnung: Ausführung von Kommando 'make -f "C:/R-PROJ~1/R-33~1.0/etc/i386/Makeconf" -f "C:/R-PROJ~1/R-33~1.0/share/make/winshlib.mk" SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)' SHLIB="tibble.dll" OBJECTS="RcppExports.o matrixToDataFrame.o"' ergab Status 2 ERROR: compilation failed for package 'tibble' * removing 'C:/R-Project/R-3.3.0/library/tibble' * restoring previous 'C:/R-Project/R-3.3.0/library/tibble' Warning in install.packages : running command '"C:/R-PROJ~1/R-33~1.0/bin/x64/R" CMD INSTALL -l "C:\R-Project\R-3.3.0\library" C:\Users\MAUBAC~1.WEI\AppData\Local\Temp\Rtmp23SQxM/downloaded_packages/tibble_1.0.tar.gz' had status 1 Warning in install.packages : installation of package ‘tibble’ had non-zero exit status -- cut -- There is no make.conf in "C:\R-Project\Rtools\mingw_64\etc". I found " Makeconf" in "C:\R-Project\R-3.3.0\etc\x64". Do I need it? How do I need to configure the settings in this file? I searched old aunt Google but did not understand what to do and how to configure R environment variables correctly. What do I need to do to install packages from source? Kind regards Georg __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
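The question above turns on which compiler path R uses: the log shows c:/Rtools/mingw_32/bin/g++, while Rtools is installed under C:\R-Project\Rtools. A minimal sketch for inspecting this from within R follows. toolchain_report is a hypothetical helper, not part of R or Rtools. It assumes that the Makeconf of the running architecture sits under R.home("etc") and that the compiler prefix is stored in lines starting with BINPREF, the variable edited later in this thread. It reads only the Makeconf of the architecture R is currently running as, so the i386 and x64 files have to be checked separately.

```r
# Hypothetical helper: report what a source build would see on this machine
toolchain_report <- function() {
  # Directories on PATH, split with the platform's separator
  path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep,
                        fixed = TRUE)[[1]]

  # Makeconf of the running architecture (etc/x64 or etc/i386 on Windows)
  makeconf <- file.path(R.home("etc"), .Platform$r_arch, "Makeconf")
  binpref <- if (file.exists(makeconf)) {
    grep("^BINPREF", readLines(makeconf, warn = FALSE), value = TRUE)
  } else {
    character(0)
  }

  list(
    rtools_on_path = grep("rtools", path_dirs, ignore.case = TRUE, value = TRUE),
    gxx            = unname(Sys.which("g++")),  # "" if g++ is not on PATH
    makeconf       = makeconf,
    binpref        = binpref
  )
}

report <- toolchain_report()
str(report)
```

On systems other than Windows the binpref element will usually be empty, because BINPREF is specific to the Windows Makeconf.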
[R] Antwort: Re: Installing from source on Windows 7: tibble
Hi Duncan, many thanks for your reply. I did insert die paths to the g++ compiler because I got the message about the not existent compiler. I took the directories for the compiler out again: C:\R-Project\Rtools\bin;C:\ProgramData\Oracle\Java\javapath;C:\Program Files\Python 3.5\Scripts\;C:\Program Files\Python 3.5\;C:\Python27\;C:\Python27\Scripts, etc. etc. Calling install.packages("tibble", type = "source") gives this message: -- cut -- * installing *source* package 'tibble' ... ** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft ** libs *** arch - i386 c:/Rtools/mingw_32/bin/g++ -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG -I"C:/R-Project/R-3.3.0/library/Rcpp/include" -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall -mtune=core2 -c RcppExports.cpp -o RcppExports.o c:/Rtools/mingw_32/bin/g++: not found make: *** [RcppExports.o] Error 127 Warnung: Ausführung von Kommando 'make -f "C:/R-PROJ~1/R-33~1.0/etc/i386/Makeconf" -f "C:/R-PROJ~1/R-33~1.0/share/make/winshlib.mk" SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)' SHLIB="tibble.dll" OBJECTS="RcppExports.o matrixToDataFrame.o"' ergab Status 2 ERROR: compilation failed for package 'tibble' * removing 'C:/R-Project/R-3.3.0/library/tibble' * restoring previous 'C:/R-Project/R-3.3.0/library/tibble' Warning in install.packages : running command '"C:/R-PROJ~1/R-33~1.0/bin/x64/R" CMD INSTALL -l "C:\R-Project\R-3.3.0\library" C:\Users\MAUBAC~1.WEI\AppData\Local\Temp\RtmpGqOlOW/downloaded_packages/tibble_1.0.tar.gz' had status 1 Warning in install.packages : installation of package ‘tibble’ had non-zero exit status -- cut -- What else could I do? Kind regards Georg Von:Duncan Murdoch An: g.maub...@weinwolf.de, r-help@r-project.org, Datum: 29.06.2016 13:07 Betreff:Re: [R] Installing from source on Windows 7: tibble On 29/06/2016 5:49 AM, g.maub...@weinwolf.de wrote: > Hi All, > > I would like to install R packages from source on Windows 7 64-Bit. 
> Currently my settings are: > > -- cut -- >> sessionInfo() > R version 3.3.0 (2016-05-03) > Platform: x86_64-w64-mingw32/x64 (64-bit) > Running under: Windows 7 x64 (build 7601) Service Pack 1 > > locale: > [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 > [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C > [5] LC_TIME=German_Germany.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > loaded via a namespace (and not attached): > [1] tools_3.3.0 > -- cut -- > > The environment variable PATH on Windows 7 is set to: > > C:\R-Project\Rtools\mingw_32\bin;C:\R-Project\Rtools\mingw_64\bin;C:\R-Project\Rtools\bin;C:\R-Project\Rtools\gcc-4.6.3\bin;C:\Program > Files\Python 3.5\Scripts\;C:\Program Files\Python > 3.5\;C:\Python27\;C:\Python27\Scripts; etc. etc. Take the mingw_32, mingw_64 and gcc-4.6.3 directories off your path. They aren't needed; the first two could conceivably be harmful. > > RTools is installed in C:\R-Project\RTools > > The call of > > C:\R-Project\Rtools\mingw_64\bin\g++.exe --version > > results in > > g++ (x86_64-posix-seh, Built by MinGW-W64 project) 4.9.3 > > If I do > > >> install.packages("tibble", type = "source") > > I get > > -- cut -- > trying URL 'https://cran.uni-muenster.de/src/contrib/tibble_1.0.tar.gz' > Content type 'application/x-gzip' length 38038 bytes (37 KB) > downloaded 37 KB > > * installing *source* package 'tibble' ... 
> ** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft > ** libs > > *** arch - i386 > c:/Rtools/mingw_32/bin/g++ -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG > -I"C:/R-Project/R-3.3.0/library/Rcpp/include" > -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall -mtune=core2 -c > RcppExports.cpp -o RcppExports.o > c:/Rtools/mingw_32/bin/g++: not found > make: *** [RcppExports.o] Error 127 > Warnung: Ausführung von Kommando 'make -f > "C:/R-PROJ~1/R-33~1.0/etc/i386/Makeconf" -f > "C:/R-PROJ~1/R-33~1.0/share/make/winshlib.mk" > SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)' > SHLIB="tibble.dll" OBJECTS="RcppExports.o matrixToDataFrame.o"' ergab > Status 2 > ERROR: compilation failed for package 'tibble' > * removing 'C:/R-Project/R-3.3.0/library/tibble' > * restoring previous 'C:/R-Project/R-3.3.0/library/tibble' > Warning in install.packages : > running command '"C:/R-PROJ~1/R-33~1.0/bin/x64/R" CMD INSTALL -l > "C:\R-Project\R-3.3.0\library" > C:\Users\MAUBAC~1.WEI\AppData\Local\Temp\Rtmp23SQxM/downloaded_packages/tibble_1.0.tar.gz' > had status 1 > Warning in install.packages : > installation of package ‘tibble’ had non-zero exit status > -- cut -- > > There is no make.conf in "C:\R-Project\Rtools\mingw_64\etc". I found " > Makeconf" in "C:\R-Project\R-3.3.0\etc\x64". Do I need it? How do I need > to configure the settings in this file? Yes, since you haven't installed Rtools in the default location, you should edit two Makeconf files. In C:\R-Project\R-3.3.0\etc\x64
[R] Antwort: Re: Antwort: Re: Installing from source on Windows 7: tibble [SOLVED]
Hi Duncan, indeed, I did not see the other part of your message. I did BINPREF ?= C:/R-Project/Rtools/mingw_32/bin/ COMPILED_BY = g++ # instead of gcc-4.9.3 in "C:\R-Project\R-3.3.0\etc\i386\Makeconf" and BINPREF ?= C:/R-Project/Rtools/mingw_64/bin/ COMPILED_BY = g++ # instead of gcc-4.9.3 in "C:\R-Project\R-3.3.0\etc\x64\Makeconf" Now I could compile the package with no futher errors. Messages are -- cut -- * installing *source* package 'tibble' ... ** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft ** libs *** arch - i386 C:/R-Project/Rtools/mingw_32/bin/g++ -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG-I"C:/R-Project/R-3.3.0/library/Rcpp/include" -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall -mtune=core2 -c RcppExports.cpp -o RcppExports.o C:/R-Project/Rtools/mingw_32/bin/g++ -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG-I"C:/R-Project/R-3.3.0/library/Rcpp/include" -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall -mtune=core2 -c matrixToDataFrame.cpp -o matrixToDataFrame.o C:/R-Project/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o tibble.dll tmp.def RcppExports.o matrixToDataFrame.o -Ld:/Compiler/gcc-4.9.3/local330/lib/i386 -Ld:/Compiler/gcc-4.9.3/local330/lib -LC:/R-PROJ~1/R-33~1.0/bin/i386 -lR installing to C:/R-Project/R-3.3.0/library/tibble/libs/i386 *** arch - x64 C:/R-Project/Rtools/mingw_64/bin/g++ -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG-I"C:/R-Project/R-3.3.0/library/Rcpp/include" -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall -mtune=core2 -c RcppExports.cpp -o RcppExports.o C:/R-Project/Rtools/mingw_64/bin/g++ -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG-I"C:/R-Project/R-3.3.0/library/Rcpp/include" -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall -mtune=core2 -c matrixToDataFrame.cpp -o matrixToDataFrame.o C:/R-Project/Rtools/mingw_64/bin/g++ -shared -s -static-libgcc -o tibble.dll tmp.def RcppExports.o matrixToDataFrame.o -Ld:/Compiler/gcc-4.9.3/local330/lib/x64 -Ld:/Compiler/gcc-4.9.3/local330/lib 
-LC:/R-PROJ~1/R-33~1.0/bin/x64 -lR installing to C:/R-Project/R-3.3.0/library/tibble/libs/x64 ** R ** inst ** preparing package for lazy loading ** help *** installing help indices ** building package indices ** installing vignettes ** testing if installed package can be loaded *** arch - i386 *** arch - x64 * DONE (tibble) -- cut -- So - complete success. Many thanks for your help. One last questions: Why did Rtools.exe not create a directory named "gcc-4.9.3" in "C:\R-Project\Rtools" and putting " C:\R-Project\Rtools\mingw_32" and "C:\R-Project\Rtools\mingw_64" directly in "C:\R-Project\Rtools\"? gcc-4.6.3 was installed that way. Kind regards Georg Von:Duncan Murdoch An: g.maub...@weinwolf.de, Kopie: r-help@r-project.org Datum: 29.06.2016 16:21 Betreff:Re: Antwort: Re: [R] Installing from source on Windows 7: tibble On 29/06/2016 10:17 AM, g.maub...@weinwolf.de wrote: > Hi Duncan, > > many thanks for your reply. > > I did insert die paths to the g++ compiler because I got the message about > the not existent compiler. > > I took the directories for the compiler out again: > > C:\R-Project\Rtools\bin;C:\ProgramData\Oracle\Java\javapath;C:\Program > Files\Python 3.5\Scripts\;C:\Program Files\Python > 3.5\;C:\Python27\;C:\Python27\Scripts, etc. etc. > > Calling > > install.packages("tibble", type = "source") > > > gives this message: > > -- cut -- > * installing *source* package 'tibble' ... 
> ** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft > ** libs > > *** arch - i386 > c:/Rtools/mingw_32/bin/g++ -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG > -I"C:/R-Project/R-3.3.0/library/Rcpp/include" > -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall -mtune=core2 -c > RcppExports.cpp -o RcppExports.o > c:/Rtools/mingw_32/bin/g++: not found > make: *** [RcppExports.o] Error 127 > Warnung: Ausführung von Kommando 'make -f > "C:/R-PROJ~1/R-33~1.0/etc/i386/Makeconf" -f > "C:/R-PROJ~1/R-33~1.0/share/make/winshlib.mk" > SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)' > SHLIB="tibble.dll" OBJECTS="RcppExports.o matrixToDataFrame.o"' ergab > Status 2 > ERROR: compilation failed for package 'tibble' > * removing 'C:/R-Project/R-3.3.0/library/tibble' > * restoring previous 'C:/R-Project/R-3.3.0/library/tibble' > Warning in install.packages : >running command '"C:/R-PROJ~1/R-33~1.0/bin/x64/R" CMD INSTALL -l > "C:\R-Project\R-3.3.0\library" > C:\Users\MAUBAC~1.WEI\AppData\Local\Temp\RtmpGqOlOW/downloaded_packages/tibble_1.0.tar.gz' > had status 1 > Warning in install.packages : >installation of package ‘tibble’ had non-zero exit status > -- cut -- > > What else could I do? You seem to have missed the second part of my advice, describing what to do with the two Makeconf files. Duncan Murdoch > > Kind regards > > Georg > > > > > > Von:Duncan Murdoch > An: g.maub...@weinwolf.de, r-help@r-project.org, > Datum: 29.06.2016 13:07 >
[R] Antwort: Re: Antwort: Re: Antwort: Re: Installing from source on Windows 7: tibble [RE OPENED]
Hi Duncan, I would not have changed the COMPILED_BY option unless I thought I have to. In my "C:\R-Project\Rtools\mingw_32\bin" I have c++.exe g++.exe gcc.exe i686-w64-mingw32-c++.exe i686-w64-mingw32-g++.exe i686-w64-mingw32-gcc-4.9.3.exe i686-w64-mingw32-gcc.exe In my "C:\R-Project\Rtools\mingw_64\bin" I have c++.exe cpp.exe g++.exe gcc.exe x86_64-w64-mingw32-c++.exe x86_64-w64-mingw32-g++.exe x86_64-w64-mingw32-gcc-4.9.3.exe x86_64-w64-mingw32-gcc.exe Which one should I configure and use? Kind regards Georg Von:Duncan Murdoch An: g.maub...@weinwolf.de, Kopie: r-help@r-project.org Datum: 29.06.2016 17:34 Betreff:Re: Antwort: Re: Antwort: Re: [R] Installing from source on Windows 7: tibble [SOLVED] On 29/06/2016 10:48 AM, g.maub...@weinwolf.de wrote: > Hi Duncan, > > indeed, I did not see the other part of your message. > > I did > > BINPREF ?= C:/R-Project/Rtools/mingw_32/bin/ > COMPILED_BY = g++ # instead of gcc-4.9.3 I wouldn't change the COMPILED_BY; some packages use it to configure themselves for gcc-4.9.3, as opposed to the previous version gcc-4.6.3. > > in "C:\R-Project\R-3.3.0\etc\i386\Makeconf" > > and > > BINPREF ?= C:/R-Project/Rtools/mingw_64/bin/ > COMPILED_BY = g++ # instead of gcc-4.9.3 > > in "C:\R-Project\R-3.3.0\etc\x64\Makeconf" > > Now I could compile the package with no futher errors. > > Messages are > > -- cut -- > * installing *source* package 'tibble' ... 
> ** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft > ** libs > > *** arch - i386 > C:/R-Project/Rtools/mingw_32/bin/g++ -I"C:/R-PROJ~1/R-33~1.0/include" > -DNDEBUG-I"C:/R-Project/R-3.3.0/library/Rcpp/include" > -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall -mtune=core2 -c > RcppExports.cpp -o RcppExports.o > C:/R-Project/Rtools/mingw_32/bin/g++ -I"C:/R-PROJ~1/R-33~1.0/include" > -DNDEBUG-I"C:/R-Project/R-3.3.0/library/Rcpp/include" > -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall -mtune=core2 -c > matrixToDataFrame.cpp -o matrixToDataFrame.o > C:/R-Project/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o > tibble.dll tmp.def RcppExports.o matrixToDataFrame.o > -Ld:/Compiler/gcc-4.9.3/local330/lib/i386 > -Ld:/Compiler/gcc-4.9.3/local330/lib -LC:/R-PROJ~1/R-33~1.0/bin/i386 -lR > installing to C:/R-Project/R-3.3.0/library/tibble/libs/i386 > > *** arch - x64 > C:/R-Project/Rtools/mingw_64/bin/g++ -I"C:/R-PROJ~1/R-33~1.0/include" > -DNDEBUG-I"C:/R-Project/R-3.3.0/library/Rcpp/include" > -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall -mtune=core2 -c > RcppExports.cpp -o RcppExports.o > C:/R-Project/Rtools/mingw_64/bin/g++ -I"C:/R-PROJ~1/R-33~1.0/include" > -DNDEBUG-I"C:/R-Project/R-3.3.0/library/Rcpp/include" > -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall -mtune=core2 -c > matrixToDataFrame.cpp -o matrixToDataFrame.o > C:/R-Project/Rtools/mingw_64/bin/g++ -shared -s -static-libgcc -o > tibble.dll tmp.def RcppExports.o matrixToDataFrame.o > -Ld:/Compiler/gcc-4.9.3/local330/lib/x64 > -Ld:/Compiler/gcc-4.9.3/local330/lib -LC:/R-PROJ~1/R-33~1.0/bin/x64 -lR > installing to C:/R-Project/R-3.3.0/library/tibble/libs/x64 > ** R > ** inst > ** preparing package for lazy loading > ** help > *** installing help indices > ** building package indices > ** installing vignettes > ** testing if installed package can be loaded > *** arch - i386 > *** arch - x64 > * DONE (tibble) > -- cut -- > > So - complete success. 
> > Many thanks for your help. > > One last questions: Why did Rtools.exe not create a directory named > "gcc-4.9.3" in "C:\R-Project\Rtools" and putting" > C:\R-Project\Rtools\mingw_32" and "C:\R-Project\Rtools\mingw_64" directly > in "C:\R-Project\Rtools\"? gcc-4.6.3 was installed that way. The 4.6.3 compiler was compiled for "multilib" operation: the same compiler took command line options to distinguish between 32 bit and 64 bit compiles. The newer version doesn't support that, so we need two separate installs. Duncan Murdoch > Kind regards > > Georg > > > > > > Von:Duncan Murdoch > An: g.maub...@weinwolf.de, > Kopie: r-help@r-project.org > Datum: 29.06.2016 16:21 > Betreff:Re: Antwort: Re: [R] Installing from source on Windows 7: > tibble > > > > On 29/06/2016 10:17 AM, g.maub...@weinwolf.de wrote: > > Hi Duncan, > > > > many thanks for your reply. > > > > I did insert die paths to the g++ compiler because I got the message > about > > the not existent compiler. > > > > I took the directories for the compiler out again: > > > > C:\R-Project\Rtools\bin;C:\ProgramData\Oracle\Java\javapath;C:\Program > > Files\Python 3.5\Scripts\;C:\Program Files\Python > > 3.5\;C:\Python27\;C:\Python27\Scripts, etc. etc. > > > > Calling > > > > install.packages("tibble", type = "source") > > > > > > gives this message: > > > > -- cut -- > > * installing *source* package 'tibble' ... > > ** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft > > ** libs > > > > *** arch - i386 > > c:/Rt
[R] Writing a formula to Excel
Hi All,

I am using excel.link to work seamlessly with Excel. In addition to values
such as numbers and strings, I would like to insert a fully functional
formula into a cell.

xlc["G14"] <- print(paste("=G9*100/G6"), quote = FALSE)

The string is written to the cell, but the cell is not evaluated, so the
string itself is shown instead of the result of the computation. If I open
the cell by pressing "F2" or by double-clicking it and then press RETURN,
the expression is evaluated.

xlc["G14"] <- parse("=G9*100/G6") # does not run

How can I put a formula into Excel that is evaluated right away?

Kind regards

Georg
Re: [R] Documenting data
Hi Pito,
Dear Readers,

as others have already mentioned, there are good practices for documenting
code and data. I would like to summarize them and add a few not mentioned
earlier:

1. You should always have two things: your raw data and your R script/s.
The raw data is immutable whereas the R script/s produce the results.

2. You might want to distinguish between documenting your CODE and
documenting your DATA. Documenting code is similar to what you already know
from your programming experience. Documenting data is somewhat different
because you store information about the meaning of your data directly in
your data.

Example
You have a variable with codes ranging from 1 to 5. But what do they mean?
Perhaps it could be
1 = Strongly agree
2 = Agree
3 = Neither agree/nor disagree
4 = Disagree
5 = Strongly disagree

But it could also be the other way round:
1 = Strongly disagree
2 = Disagree
3 = Neither agree/nor disagree
4 = Agree
5 = Strongly agree

What the codes in your variable mean depends on the systems or processes
you derived your data from.

Within R there are some limitations for storing the information about what
a variable or a value within a variable means. Other software packages like
SAS or SPSS offer much broader possibilities to store this information. In
R you can work with meaningful variable names and the data type/class
factor, which can store mappings between values and value descriptions.

Example
-- cut --
var1 <- c(rep(1:5, 3))
ds_example <- data.frame(var1)

var1_labels <- c("1 = Strongly Agree",
                 "2 = Agree",
                 "3 = Neither agree/nor disagree",
                 "4 = Disagree",
                 "5 = Strongly disagree")

ds_example[["var1"]] <- factor(ds_example[["var1"]],
                               levels = c(1, 2, 3, 4, 5),
                               labels = var1_labels)

summary(ds_example["var1"])
-- cut --

In addition you find methods to work with variable labels and value labels
in the packages Hmisc and memisc. They can also produce a thing called
codebook which contains all variable names, variable labels, values, value
labels and summaries of the distribution of values within the variables.

3. In addition to this you could structure your script in a modular way
according to the analysis process, e.g. importing, cleaning, preparation
for analysis, analysis, reporting. Another structure may be more suitable
in your case. These modules could have a number in the file name indicating
in which sequence the scripts should be run.

4. I find it valuable to use a software repository like Github, Sourceforge
or others to keep the revisions safe and secure in case you would like to
go back to a version with code you deleted before and figure out that you
need it again now. The R Studio IDE has an interface to git if you like to
go with that. Good commit messages can help you track what has changed.
Commits also help you to prepare precise steps when developing your scripts.

5. I have no experience with Sweave or knitr but you could also compile a
simple documentation by copying comments to an Excel sheet using
R-2-Excel libraries like excel.link or others.

Example
install.packages("excel.link")
library(excel.link)
xlc["A1"] <- "Project Documentation"
xlc["A2"] <- "Step XY"
xlc["A3"] <- "Some explanation about step xy"

This way you have the documentation in your code and in an external source.

Which approach you choose depends on your experience with R and its
libraries as well as the size of your project and the need for
documentation.

6. It can be helpful to store interim results in a format that can be read
by non-R-users, e.g. Excel.

7. Documenting code can be done using roxygen2.

If there are different opinions on my suggestions please say so.

Kind regards

Georg

> Sent: Thursday, 30 June 2016 at 16:51
> From: "Pito Salas"
> To: r-help@r-project.org
> Subject: [R] Documenting data
>
> I am studying statistics and using R in doing it. I come from software
> development where we document everything we do.
>
> As I “massage” my data, adding columns to a frame, computing on other data,
> perhaps cleaning, I feel the need to document in detail what the meaning, or
> background, or calculations, or whatever of the data is. After all it is now
> derived from my raw data (which may have been well documented) but it is
> “new.”
>
> Is this a real problem? Is there a “best practice” to address this?
>
> Thanks!
>
> Pito Salas
> Brandeis Computer Science
> Feldberg 131
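The codebook idea from point 2 can be approximated in base R. The following is only a minimal sketch under stated assumptions: make_codebook is a hypothetical helper and not the codebook function of Hmisc or memisc, value labels are taken from factor levels only, and the score column is invented for illustration.

```r
# Hypothetical helper: one row per variable with class, missing count and levels
make_codebook <- function(dataset) {
  data.frame(
    variable  = names(dataset),
    class     = unname(vapply(dataset, function(col) class(col)[1],
                              character(1))),
    n_missing = unname(vapply(dataset, function(col) sum(is.na(col)),
                              integer(1))),
    values    = unname(vapply(dataset, function(col) {
      # Only factors carry value labels in this sketch
      if (is.factor(col)) paste(levels(col), collapse = "; ") else ""
    }, character(1))),
    stringsAsFactors = FALSE
  )
}

var1_labels <- c("1 = Strongly Agree", "2 = Agree",
                 "3 = Neither agree/nor disagree",
                 "4 = Disagree", "5 = Strongly disagree")

ds_example <- data.frame(
  var1  = factor(rep(1:5, 3), levels = 1:5, labels = var1_labels),
  score = c(NA, as.numeric(2:15))  # illustrative numeric column with one NA
)

codebook <- make_codebook(ds_example)
print(codebook)
```

Because the result is an ordinary data frame, it can be written to a text file next to the raw data, for example with write.csv().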
Re: [R] Documenting data
Hi Bert,
Hi Readers,

I did not know much about attributes in R and how to use them. If it is
that flexible, you are right and I have learnt something.

Kind regards

Georg

> Sent: Thursday, 30 June 2016 at 20:06
> From: "Bert Gunter"
> To: g.maub...@gmx.de
> Cc: "Pito Salas" , "R Help"
> Subject: Re: [R] Documenting data
>
> I believe Georg's pronouncements are wrong. See inline below.
>
> -- Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> "...
>
> > Within R there are some limitations for storing the information about what
> > a variable or a value within a variable means.
>
> That is FALSE. There are no limitations. For example, just attach a
> "doc" attribute to your data that says whatever you wish to about
> them. e.g.
>
> > somedata <- runif(10)
> > attr(somedata,"doc") <- "Anything you want to say about the data"
>
> > attr(somedata,"doc")
> [1] "Anything you want to say about the data"
>
> You can go as crazy as you want to with this, e.g. creating a (S3 or
> S4 )class "documented" with appropriate methods for printing it from
> classes that inherit from data frames, lists, etc. See also the
> roxygen2 package for data documentation and R's ?promptData function
> for data documentation file in Rd format.
>
> R is Turing complete -- so it can do anything any other programming
> language can do. You could program SAS in R if you wanted. The
> difference is that SAS has pre-programmed some capabilities that R
> leaves for users, including contributed packages -- like Sweave,
> knitr, etc. You may or may not like this extra flexibility (and extra
> work, depending on whether someone else has already done the work for
> you), and efficiency may or may not be an issue; but to say that R has
> "limitations" is a gross misrepresentation, imho.
> > > > Possibilities to store this information is in other software packages > like SAS or SPSS much broader implemented. In R you can work with > meaningful variable names and the data type/class factor which can > store mappings between values and value descriptions. > > > > Example > > -- cut -- > > var1 <- c(rep(1:5, 3)) > > ds_example <- data.frame(var1) > > > > var1_labels <- c("1 = Strongly Agree", > > "2 = Agree", > > "3 = Neither agree/nor disagree", > > "4 = Disagree", > > "5 = Strongly disagree") > > > > ds_example[["var1"]] <- factor(ds_example[["var1"]], > >levels = c(1, 2, 3, 4, 5), > >labels = var1_labels) > > > > summary(ds_example["var1"]) > > -- cut -- > > > > In addition you find methods to work with variable labels and value labels > > in the pacakges Hmisc and memisc. They can also produce a thing called > > codebook which contains all variable names, variable labels, values, value > > labels and summaries of the distribution of values within the variables. > > > > 3. In addition to this you could structure your script in a modular way > > according to the analysis process, e. g. > > importing, cleaning, preparation for analysis, analysis, reporting. Other > > structure may be more sufficient in your case. These modules could have a > > number in the file name indicating in which sequence the scripts should be > > run. > > > > 4. I find it valuable to use a software repository like Github, Sourceforge > > or others to keep the revisions save and seucre in case you would like to > > go back to a version with code you deleted before and figure out that you > > need it now again. The R Studio IDE has an interface to git if you like to > > go with that. Good commit message can help you track what has changed. > > Commits also help you to prepare precise steps when developing your scripts. > > > > 5. 
I have no experience with Sweave or knitr but you could also compile a > > simple documentation through copying comments to an Excel sheet using > > R-2-Excel libraries like excel.link or others. > > > > Example > > install.packages("excel.link") > > library(excel.link) > > xlc["A1"] <- "Project Documentation" > > xlc["A2"] <- "Step XY" > > xlc["A3"] <- "Some explanation about step xy" > > > > This way you have the documentation in your code and in an external source. > > > > Which approach you chose depends on your experience with R and its > > libraries as well as the size of your project and the need for > > documentation. > > > > 6. It can be helpful to store interim results in a format that can be read > > by non-R-users, e. g. Excel. > > > > 7. Documenting code can be done using roxygen2. > > > > If there are different opinions to my suggestions please say so. > > > > Kind regards > > > > Georg > > > > > >> Gesendet: Donnerstag, 30. Juni 2016 um 16:51 Uhr > >> Von: "Pito Salas" > >> An: r-help@r-project.org > >> Betreff
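The attribute approach from Bert's reply can be sketched for a data frame and one of its columns as follows. The attribute name "doc" follows his example; the description texts are invented for illustration. comment() is a base R function that stores a note in an attribute which print() does not display.

```r
ds <- data.frame(var1 = rep(1:5, 3))

# Document a single column
attr(ds$var1, "doc") <- "Agreement item: 1 = Strongly agree, 5 = Strongly disagree"

# Document the data frame as a whole
attr(ds, "doc") <- "Illustrative survey data"
comment(ds) <- "Derived from the raw export by the cleaning script"

print(attr(ds, "doc"))
print(attr(ds$var1, "doc"))
print(comment(ds))

# Caveat: subsetting a plain vector drops custom attributes
subset_doc <- attr(ds$var1[1:3], "doc")
print(is.null(subset_doc))
```

The last lines show why such documentation is best attached after the data have reached their final shape, or re-attached after subsetting.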
[R] Dump of new Methods
Dear Readers, Hi All,

to drive my R knowledge a bit further I followed the advice of some of you by
reading Chambers: Programming with data.

I tried some examples from the book:

-- cut --

setClass("track", representation(x = "numeric",
                                 y = "numeric"))

track <- function(x, y) {
  # an object representing measurements 'y', tracked at positions 'x'
  x <- as(x, "numeric")
  y <- as(y, "numeric")
  if (length(x) != length(y)) {
    stop("x, y should have equal length!")
  }
  new("track", x = x, y = y)
}

dumpMethod("track", "track")

setMethod("show", "track",
          function(object) {
            xy = rbind(object@x, object@y)
            # label the rows "x" and "y" and number the columns
            dimnames(xy) = list(c("x", "y"),
                                1:ncol(xy))
            show(xy)
          })

setMethod("plot",
          signature(x = "track", y = "missing"),
          function(x, y, ...)
            plot(unclass(x), xlab = "Position", ylab = "Value", ...)
)

dumpMethod("plot", "track")

-- cut --

Where do I find the dumped data? Is it in a single file or is every dump
stored in a separate file? Where is it stored on my drive?

Kind regards

Georg
[R] Antwort: Re: Dump of new Methods (SOLVED)
Hi Bert,

many thanks. Found them.

Kind regards

Georg

From: Bert Gunter
To: g.maub...@weinwolf.de
Date: 04.07.2016 16:43
Subject: Re: [R] Dump of new Methods

?getwd

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
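For later readers, Bert's pointer to ?getwd fits how dumpMethod() works as far as I understand it: unless a file is given, each call writes the method source to its own file in the current working directory. A minimal sketch, assuming a cut-down version of the "track" class from the question; the explicit `file` argument and the name `out_file` are chosen here only so that the location is obvious, they are not from the thread:

```r
library(methods)

# Cut-down version of the class from the question.
setClass("track", representation(x = "numeric", y = "numeric"))

setMethod("show", "track", function(object) {
  xy <- rbind(object@x, object@y)
  dimnames(xy) <- list(c("x", "y"), seq_len(ncol(xy)))
  print(xy)
})

# Without 'file', the dump goes to a default-named file in getwd().
# Here the target is given explicitly (hypothetical path for illustration).
out_file <- file.path(tempdir(), "show_track.R")
dumpMethod("show", "track", file = out_file)

file.exists(out_file)   # is the dump where we asked for it?
readLines(out_file)     # the setMethod() call that recreates the method
```

Calling list.files(getwd()) after a plain dumpMethod("show", "track") should show the default-named file in the same way.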
[R] Antwort: Re: dplyr : row total for all groups in dplyr summarise
Hi guys,

I checked out your example but I can't follow the results:

> mtcars %>%
+   group_by (am, gear) %>%
+   summarise (n=n()) %>%
+   mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
+   ungroup() %>%
+   mutate(row.tot = sum(n))
Source: local data frame [4 x 5]

     am  gear     n rel.freq row.tot
  (dbl) (dbl) (int)    (chr)   (int)
1     0     3    15      79%      32
2     0     4     4      21%      32
3     1     4     8      62%      32
4     1     5     5      38%      32

We have a total of 32 cases, and 15 * 100 / 32 = 46.9 %, not 79 %. The same
holds for the other rows. How is 79 % calculated?

When searching the web I saw this example:

-- cut --

#-- not run --
url <- "http://www.lock5stat.com/datasets/HollywoodMovies2011.csv";
response <- GET(url)
Hollywoodmovies2011 <- content(x = GET(url), as = data.frame)
#-- end not run

Hollywoodmovies2011 %>%
  group_by(genre) %>%
  summarize(count = n()) %>%
  mutate(rf = count / sum(count))

-- cut --

which gives

Source: local data frame [9 x 3]

      Genre count           %
     (fctr) (int)       (dbl)
1    Action    32 0.235294118
2 Adventure     1 0.007352941
3 Animation    12 0.088235294
4    Comedy    27 0.198529412
5     Drama    21 0.154411765
6   Fantasy     2 0.014705882
7    Horror    17 0.125000000
8   Romance    11 0.080882353
9  Thriller    13 0.095588235

Here the % values correspond to the count divided by the sum of count, e. g.
sum = 136 and 32 / 136 = 0.2352941.

What is the difference between the two ways of counting? What do the relative
counts in the first example mean?

Kind regards

Georg

From: Ulrik Stervbo
To: David Winsemius
Cc: r-help@r-project.org, mai...@infomed.sld.cu
Date: 05.07.2016 06:06
Subject: Re: [R] dplyr : row total for all groups in dplyr summarise
Sent by: "R-help"

That will give you the wrong result when used on summarised data

David Winsemius wrote on Tue, 5 July 2016 02:10:

> I thought there was an nrow() function?
>
> Sent from my iPhone
>
> On Jul 4, 2016, at 9:59 AM, Ulrik Stervbo wrote:
>
> If you want the total number of rows in the original data.frame after
> counting the rows in each group, you can ungroup and sum the row counts,
> like:
>
> library("dplyr")
>
> mtcars %>%
>   group_by (am, gear) %>%
>   summarise (n=n()) %>%
>   mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
>   ungroup() %>%
>   mutate(row.tot = sum(n))
>
> HTH
> Ulrik
>
> On Mon, 4 Jul 2016 at 18:23 David Winsemius wrote:
>
>> > On Jul 4, 2016, at 6:56 AM, mai...@infomed.sld.cu wrote:
>> >
>> > Hello,
>> > How can I aggregate row total for all groups in dplyr summarise ?
>>
>> Row total … of what? Aggregate … how? What is the desired answer?
>>
>> > library(dplyr)
>> > mtcars %>%
>> >   group_by (am, gear) %>%
>> >   summarise (n=n()) %>%
>> >   mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%"))
>> >
>> > best regard
>> > Maicel Monzon
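On the 79 % question: as far as I know, summarise() drops only the last grouping variable, so after group_by(am, gear) the result is still grouped by am, and sum(n) in the following mutate() is the total within each level of am (15 + 4 = 19 for am == 0), not the overall 32. A minimal sketch of both percentages in base R, so that it runs without dplyr; the column names `within_am` and `of_total` are mine, not from the thread:

```r
# Count cars per am/gear combination (stand-in for group_by + summarise).
counts <- aggregate(n ~ am + gear, data = transform(mtcars, n = 1), FUN = sum)
counts <- counts[order(counts$am, counts$gear), ]

# Share within each level of am: what the thread's rel.freq column shows.
counts$within_am <- round(100 * counts$n / ave(counts$n, counts$am, FUN = sum))

# Share of all 32 cars: what Georg expected to see.
counts$of_total <- round(100 * counts$n / sum(counts$n), 1)

print(counts, row.names = FALSE)
```

In the dplyr pipeline, moving ungroup() in front of the mutate() that computes rel.freq should give the share of the overall total instead.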
[R] WG: Fw: Re: dplyr : row total for all groups in dplyr summarise
Hi All,

if I run the suggested code

mtcars %>%
  group_by (am, gear) %>%
  summarise (n = n()) %>%
  mutate(rel.freq = paste0(round(100 * n / sum(n), 0), "%")) %>%
  ungroup() %>%
  plyr::rbind.fill(data.frame(n = nrow(mtcars), rel.freq = "100%”))

I get

> mtcars %>%
+   group_by (am, gear) %>%
+   summarise (n = n()) %>%
+   mutate(rel.freq = paste0(round(100 * n / sum(n), 0), "%")) %>%
+   ungroup() %>%
+   plyr::rbind.fill(data.frame(n = nrow(mtcars), rel.freq =
+ "100%”))
+

R does not execute the code but keeps waiting for more input, because
something in the program syntax is missing. What has to be changed to be able
to run the code?

Kind regards

Georg Maubach

> Sent: Tuesday, 5 July 2016, 18:30
> From: "David Winsemius"
> To: mai...@infomed.sld.cu
> Cc: r-help@r-project.org
> Subject: Re: [R] dplyr : row total for all groups in dplyr summarise
>
> mtcars %>%
>   group_by (am, gear) %>%
>   summarise (n=n()) %>%
>   mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
>   ungroup() %>% plyr::rbind.fill(data.frame( n=nrow(mtcars),rel.freq="100%”))
>
> > On Jul 5, 2016, at 4:47 AM, mai...@infomed.sld.cu wrote:
> >
> > Sorry, what I wanted to do was to add a total row at the end of the
> > summary. The marginal totals by columns correspond to 100% and the sum
> > of levels.
> > best regard
> > Maicel Monzon
> >
> > Ulrik Stervbo wrote:
> >
> >> Yes. But in the sample code the data is summarised. In which case you
> >> get 4 rows and not the correct 32.
> >>
> >> On Tue, 5 Jul 2016, 07:48 David Winsemius, wrote:
> >>
> >>> nrow(mtcars)
> >>>
> >>> Sent from my iPhone
> >>>
> >>> On Jul 4, 2016, at 9:03 PM, Ulrik Stervbo wrote:
> >>>
> >>> That will give you the wrong result when used on summarised data
> >>>
> >>> David Winsemius wrote on Tue, 5 July 2016 02:10:
> >>>
> >>> > I thought there was an nrow() function?
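A likely reason for the `+` continuation prompt: the pasted code ends the last string with a typographic closing quote (”) instead of a straight double quote, so R never sees the end of the string that starts at "100% and keeps waiting for input. Retyping that quote by hand should be enough. A minimal sketch of the same total row with plain ASCII quotes, written in base R so that it needs neither dplyr nor plyr; the names `counts` and `total_row` are mine:

```r
# Count cars per am/gear combination (stand-in for group_by + summarise).
counts <- aggregate(n ~ am + gear, data = transform(mtcars, n = 1), FUN = sum)
counts <- counts[order(counts$am, counts$gear), ]

# Relative frequency within each level of am, as in the thread.
counts$rel.freq <- paste0(
  round(100 * counts$n / ave(counts$n, counts$am, FUN = sum)), "%")

# Total row: note the straight quotes around "100%".
total_row <- data.frame(am = NA, gear = NA, n = nrow(mtcars),
                        rel.freq = "100%", stringsAsFactors = FALSE)

result <- rbind(counts, total_row)
print(result, row.names = FALSE)
```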
[R] Formatting ggplot2 graph
Hi All,

my current code looks like this:

freq_ls <- structure(list(
  Var1 = c("zldkkd", "aakdkdk", "aaakdkd", "aaieiwo", "vöalsl", "ssddkdk",
           "glowowp", "laoiw", "ruklow", "rolsl", "delk", "inslvnz"),
  Anzahl = c(1772L, 761L, 536L, 317L, 197L, 160L, 30L, 20L, 10L, 6L, 6L, 1L),
  Prozent = c(46.4, 19.9, 14, 8.3, 5.2, 4.2, 0.8, 0.5, 0.3, 0.2, 0.2, 0)),
  .Names = c("Var1", "Anzahl", "Prozent"),
  class = c("tbl_df", "data.frame"),
  row.names = c(NA, -12L))

ggplot(freq_ls) +
  geom_bar(aes(x = Var1, y = Anzahl), stat = "identity", fill = "gray") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Title of the Plot")

I would like to add the absolute and relative frequencies on top of the bars.
In addition I want the bars printed in descending order according to the
data.

I searched the web and found:

geom_text(stat='bin',aes(label=..count..),vjust=-1)

(Source:
http://stackoverflow.com/questions/26553526/how-to-add-frequency-count-labels-to-the-bars-in-a-bar-graph-using-ggplot2
)

but this does not work in my case. Inserting the code

ggplot(freq_ls) +
  geom_bar(aes(x = Var1, y = Anzahl), stat = "identity", fill = "gray") +
  geom_text(stat='bin',aes(label=..count..),vjust=-1) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Title of the Plot")

results in

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: Removed 1 rows containing missing values (geom_text).

I looked in the book Wickham: ggplot2 but could not find an answer to these
questions:

- How to show the numbers if they are pre-calculated?
- How to sort the bars in descending order of their values or, if
  pre-ordered, in the given order?

What do I have to change in my code to do it?

Kind regards

Georg
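One possible approach, sketched under assumptions: since the counts are already calculated, the label can be built as an ordinary column and mapped with aes(label = ...) in geom_text(), without stat = 'bin'; the bar order can be fixed by turning Var1 into a factor whose levels are sorted by descending Anzahl. The shortened, shuffled data and the column name `label` are mine; the plotting part only runs if ggplot2 is installed:

```r
# Shortened, deliberately unordered example data.
freq_ls <- data.frame(
  Var1    = c("aakdkdk", "zldkkd", "aaakdkd"),
  Anzahl  = c(761L, 1772L, 536L),
  Prozent = c(19.9, 46.4, 14),
  stringsAsFactors = FALSE
)

# Descending bars: order the factor levels by -Anzahl.
freq_ls$Var1 <- reorder(freq_ls$Var1, -freq_ls$Anzahl)

# Pre-calculated label with absolute and relative frequency.
freq_ls$label <- paste0(freq_ls$Anzahl, " (", freq_ls$Prozent, "%)")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(freq_ls, aes(x = Var1, y = Anzahl)) +
    geom_bar(stat = "identity", fill = "gray") +
    geom_text(aes(label = label), vjust = -0.5) +   # label sits above the bar
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    ggtitle("Title of the Plot")
  print(p)
}
```

If the data are already in the desired order, factor(Var1, levels = unique(Var1)) in place of reorder() should keep that order on the axis.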
[R] Choropleth: Turnover by ZipCode
Hi All, Dear Readers,

I need to create a choropleth graph with turnover by zipcode. This is what I
have so far:

# Not run (Begin)
# Install packages if needed
# install.packages(pkgs = c("maptools", "rgdal", "RColorBrewer", "grDevices"))
# Not run (End)

# Load libraries
library(maptools); library(rgdal); library(RColorBrewer); library(grDevices)

# Configuration
# Adjust if needed!
file_path <- file.path("C:", "temp")

# Read data
# Source: http://arnulf.us/PLZ
url <- "http://www.metaspatial.net/download/plz.tar.gz";
file_name_gzip <- basename(url)
file_name_extract <- "post_pl.shp"
download.file(url, file.path(file_path, file_name_gzip))
untar(tarfile = file.path(file_path, file_name_gzip),
      compressed = "gzip",
      exdir = file_path)

# Dataset
# I have the data for all zipcodes available in my region
ds_temp <- structure(
  list(
    ZipCode = c(1099, 10178, 13125, 21406, 32429, 41569),
    Sales = c(4, 2, 9, 5, 7, 3),
    Revenue = c(12, 9, 100, 80, 90, 25)
  ),
  .Names = c("ZipCode", "Sales", "Revenue"),
  row.names = c(NA, 6L),
  class = "data.frame"
)
print(ds_temp)

# Prepare graphic
file_name_pdf <- file.path(file_path, "sales-and-revenue-by-zipcodes.pdf")
cairo_pdf(bg = "grey98", file_name_pdf, width = 16, height = 9)
y <- readShapeSpatial(file.path(file_path, file_name_extract),
                      proj4string = CRS("+proj=longlat"))
x <- spTransform(y, CRS = CRS("+proj=merc"))

# How do I need to change this line?
# Needs to be replaced by turnover from ds_temp
color <- sample(1:7, length(x), replace = T)

# Create graphic
plot(x, col = brewer.pal(7, "Oranges")[color], border = F)
# How do I tell R to plot turnover from ds_temp?

# Title
mtext(
  "Turnover by Zipcodes",
  side = 3,
  line = -4,
  adj = 0,
  cex = 1.7
)

# Write to disc
dev.off()

# Cleanup
rm("ds_temp", "color", "file_name_extract", "file_name_gzip", "file_name_pdf",
   "file_path", "url", "x", "y")
unlink(file.path(file_path, "plz.tar.gz"))
unlink(file.path(file_path, "post_pl.dbf"))
unlink(file.path(file_path, "post_pl.shp"))
unlink(file.path(file_path, "post_pl.shx"))
# unlink(file.path(file_path, "sales-and-revenue-by-zipcodes.pdf"))

What do I need to do to color the map by the amount of turnover or by the
frequencies of sales from the ds_temp dataset?

Kind regards

Georg Maubach
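One way to replace the random `color` vector, sketched in base R under assumptions: look up each polygon's zip code in ds_temp with match(), then bin the matched revenue into seven classes with cut(). The vector `polygon_zip` is hypothetical; it stands for the zip-code attribute of the shapefile, whose real column name is not shown in the post. The palette is built with colorRampPalette() here only to avoid a package dependency:

```r
ds_temp <- data.frame(
  ZipCode = c(1099, 10178, 13125, 21406, 32429, 41569),
  Revenue = c(12, 9, 100, 80, 90, 25)
)

# Hypothetical: zip codes of the polygons, in the order they are plotted.
polygon_zip <- c(10178, 99999, 41569, 13125)

# Revenue per polygon; NA where ds_temp has no entry for that zip code.
revenue <- ds_temp$Revenue[match(polygon_zip, ds_temp$ZipCode)]

# Seven equal-width classes from 0 to the maximum revenue.
breaks <- seq(0, max(ds_temp$Revenue), length.out = 8)
color_class <- cut(revenue, breaks = breaks, labels = FALSE,
                   include.lowest = TRUE)

# Seven shades from light to dark orange; grey for polygons without data.
palette7 <- colorRampPalette(c("#FEEDDE", "#8C2D04"))(7)
fill <- ifelse(is.na(color_class), "grey90", palette7[color_class])
fill
```

With the real shapefile, `fill` would then take the place of brewer.pal(7, "Oranges")[color] in the plot() call, and brewer.pal(7, "Oranges") could be used instead of `palette7`.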
[R] R Toolbox (Release 2 of 2016-07-21)
Hi All,

I have uploaded a new release of the R Toolbox. R Toolbox is a collection of
simple but useful functions which I developed for myself to shorten the
development process. Currently all functions use base R. No other packages
are needed. One exception is "t_openxlsx", because this module deals
explicitly with the openxlsx package.

It is simple to install the functions. Just copy them to an appropriate place
on your hard disk and adjust the variable "t_toolbox_location" to the place
you stored the toolbox in. Running "r_toolbox.R" from that location will load
all modules.

In addition to new functions (see Release Comparison below) some functions
were improved. Functions from other packages are now called with their
package names, e. g. openxlsx::read.xlsx() instead of "read.xlsx()". This way
confusion with functions that have the same name but come from other
packages is avoided.

Please be aware that I have included some untested functions in this release.
All modules now have a variable "t_status" stating the development status,
e. g. "development", "testing", "release".

Here is a Release Comparison:

-- cut --

release_comparison <- structure(list(
  Module = c("r_toolbox.R", "t_adjust_packages.R", "t_conventions.r",
             "t_create_variable.R", "t_definitions.R",
             "t_find_originals_and_duplicates.R", "t_get_factor_levels.R",
             "t_merge_variables.R", "t_n_miss.R", "t_n_valid.R",
             "t_openxlsx_shortcuts.r", "t_rename_variables.R",
             "t_replace_na.R", "t_report_memory.R",
             "t_select_vars_by_type.R"),
  Release1 = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE,
               FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
  Release2 = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
               TRUE, TRUE, TRUE, TRUE, TRUE)),
  .Names = c("Module", "Release1", "Release2"),
  row.names = c(NA, 15L),
  class = "data.frame")

edit(release_comparison)

-- cut --

Release 1 is of 2016-05-31, Release 2 of 2016-06-21.

You can download the toolbox from
https://sourceforge.net/projects/r-project-utilities/

Kind regards

Georg Maubach
[R] Error when installing packages
Hi All,

I am trying to install packages on Debian GNU Linux 8 (Kernel
3.16.0-4-amd64). My sessionInfo() is

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_3.3.1

Installing the following packages

Warning in install.packages :
  packages ‘excel.link’, ‘installr’ are not available (for R version 3.3.1)
Warning in install.packages :
  dependencies ‘latticist’, ‘graph’, ‘RBGL’, ‘pkgDepTools’, ‘Rgraphviz’ are
  not available
also installing the dependencies ‘RCurl’, ‘RWekajars’

results in the following messages:

(1)
* installing *source* package ‘RCurl’ ...
checking for curl-config... no
Cannot find curl-config

(2)
* installing *source* package ‘RWekajars’ ...
./configure: 1: ./configure: /usr/lib/jvm/default-java/jre/bin/java: not found
./configure: 50: test: -ge: unexpected operator
./configure: 51: test: -eq: unexpected operator
Need at least Java version 1.6/6.0.
ERROR: configuration failed for package ‘RWekajars’

Annotation: I have openjdk-8-jre installed.

(3)
* installing *source* package ‘cairoDevice’ ...
ERROR: gtk+2. not found by pkg-config.
ERROR: configuration failed for package ‘cairoDevice’

(4)
* installing *source* package ‘rgdal’ ...
configure: CC: gcc -std=gnu99
configure: CXX: g++
configure: rgdal: 1.1-10
checking for /usr/bin/svnversion... no
configure: svn revision: 622
checking for gdal-config... no
no
configure: error: gdal-config not found or not executable.
ERROR: configuration failed for package ‘rgdal’

(5)
* installing *source* package ‘rgeos’ ...
configure: CC: gcc -std=gnu99
configure: CXX: g++
configure: rgeos: 0.3-19
checking for /usr/bin/svnversion... no
configure: svn revision: 524
checking for geos-config... no
no
configure: error: geos-config not found or not executable.
ERROR: configuration failed for package ‘rgeos’

... and much more.

Do all these error messages have something in common? How could I fix the
installation?

Kind regards

Georg
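The five failures do seem to share one cause: each source package has a configure step that looks for a system-level tool or development library (curl-config, a Java runtime at the expected path, GTK+ 2 via pkg-config, gdal-config, geos-config) and does not find it. A minimal sketch that checks from within R which of these tools are on the search path; the Debian package names in the comments are the usual ones as far as I know and should be verified against your release:

```r
# Tools the configure scripts in the error messages were looking for.
# Typical Debian packages providing them (assumption, please verify):
#   curl-config -> libcurl4-openssl-dev
#   gdal-config -> libgdal-dev
#   geos-config -> libgeos-dev
#   pkg-config  -> pkg-config (cairoDevice also needs libgtk2.0-dev)
#   java        -> default-jre or default-jdk
tools_needed <- c("curl-config", "gdal-config", "geos-config",
                  "pkg-config", "java")

# Sys.which() returns the full path, or "" if a tool is not on the PATH.
found <- Sys.which(tools_needed)
missing_tools <- names(found)[found == ""]

if (length(missing_tools) > 0) {
  cat("Not found on PATH:", paste(missing_tools, collapse = ", "), "\n")
} else {
  cat("All required tools were found.\n")
}
```

The "not available (for R version 3.3.1)" warnings are a separate matter: excel.link and installr are, to my knowledge, Windows-only packages, and graph, RBGL and Rgraphviz are distributed through Bioconductor rather than CRAN.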
[R] Spread data.frame on 2 variables
Hi All,

I need to spread a data.frame on 2 variables, e. g. "channel" and "unit".

If I do it in two steps, spread() keeps every row that differs from the
others in any column as a separate case, even though the rows belong to the
same customer.

Here is what I have right now:

-- cut --

test1$dummy <- 1
test2 <- spread(data = test1, key = 'channel', value = "dummy")
test2
cat("First spread is OK!")

test2$dummy <- 1
test3 <- spread(data = test2, key = 'unit', value = 'dummy')

test1
# test2
test3

warning(paste0("Second spread is not OK cause spread does not merge cases\n",
               "with CustID 700 and 800 into one case,\n",
               "cause they have values on different variables,\n",
               "although the corresponding values of the cases with",
               "custID 700 and 800 are missing."))

cat("What I would like to have is:\n")
target4 <- structure(list(
  custID = c(100, 200, 300, 500, 600, 700, 800, 900),
  `10` = c(1, NA, NA, NA, NA, NA, NA, NA),
  `20` = c(1, NA, NA, NA, NA, NA, NA, NA),
  `30` = c(NA, NA, NA, NA, NA, NA, 1, 1),
  `40` = c(NA, NA, NA, NA, 1, NA, 1, 1),
  `50` = c(NA, NA, 1, NA, NA, NA, 1, 1),
  `60` = c(NA, NA, NA, NA, NA, 1, NA, NA),
  `70` = c(NA, NA, NA, NA, NA, 1, NA, NA),
  `99` = c(NA, 1, NA, 1, NA, NA, NA, NA),
  `1000` = c(1, NA, NA, NA, NA, NA, 1, 1),
  `2000` = c(NA, NA, NA, NA, 1, 1, 1, NA),
  `3000` = c(NA, NA, 1, NA, NA, 1, NA, NA),
  `4000` = c(NA, NA, 1, NA, NA, NA, NA, NA),
  `6000` = c(NA, NA, NA, NA, 1, NA, NA, NA),
  `` = c(NA, 1, NA, 1, NA, NA, NA, NA)),
  .Names = c("custID", "10", "20", "30", "40", "50", "60", "70", "99",
             "1000", "2000", "3000", "4000", "6000", ""),
  row.names = c(NA, 8L),
  class = "data.frame")
target4

cat("What would be a proper way to create target4 from test1?")

-- cut --

What would be the proper way to create target4 from test1?

Kind regards

Georg
[R] Antwort: Re: Re: Spread data.frame on 2 variables (SOLVED)
Hi Ulrik,

many thanks for your help.

The problem was that R regards a dataset with a combination like

caseID custID channel unit
     1   1000      10   10
     2   1000      20   10
     3   1000      20   30

as two different sets of cases (set 1 = case 1 and 2, set 2 = case 3) due to
the different value of unit in case 3 (value 30), although all cases should
be restructured based on custID alone.

To get a dataset like

caseID custID channel-10 channel-20 unit-10 unit-30
     1   1000          1          1       1       1

instead of

caseID custID channel-10 channel-20 unit-10 unit-30
     1   1000          1          1       1      NA
     2   1000         NA          1      NA       1

I used the approach you suggested:

1. I created a subset of my data with the first variable to be restructured:

d_temp1 <- dataset[ , c("custID", "channel")]

2. I deleted all the cases that were duplicates

d_temp1 <- d_temp1[!duplicated(d_temp1), ]

3. I introduced a dummy variable delivering the values for the new variables
created by tidyr::spread()

d_temp1$dummy <- 1

4. Then I restructured the subset

d_temp1 <- tidyr::spread(d_temp1, key = "channel", value = "dummy")

5. I repeated steps 1 to 4 with the other variable "unit" (instead of
"channel") creating a new dataset named d_temp2.

6. I deleted the variables used for restructuring in steps 1 to 5, "channel"
and "unit", from the original dataset "dataset" and dropped the rows that
became identical as a result.

dataset$channel <- NULL
dataset$unit <- NULL
dataset <- dataset[!duplicated(dataset), ]

7. I checked if I still had duplicates

duplicates <- duplicated(dataset[["custID"]])
sum(duplicates)  # was 0 this time

8. I merged the datasets back together

dataset_2 <- merge(x = dataset,
                   y = d_temp1,
                   by.x = "custID",
                   by.y = "custID",
                   all.x = TRUE,
                   all.y = TRUE)  # leaving out all.y would be fine

dataset_2 <- merge(x = dataset_2,
                   y = d_temp2,
                   by.x = "custID",
                   by.y = "custID",
                   all.x = TRUE,
                   all.y = TRUE)  # leaving out all.y would be fine

There might be a combination of commands and functions doing the same thing
in one step but I find that this is clear, comprehensible and reproducible
even at a later date or by other readers willing to use base R for their
work.

Many thanks again for your help.

Kind regards

Georg

From: Ulrik Stervbo
To: g.maub...@weinwolf.de, R-help
Date: 28.07.2016 14:20
Subject: Re: Re: [R] Spread data.frame on 2 variables

Hi Georg,

it is difficult to figure out what happens between your expectation and the
outcome if we cannot see a minimal dataset.

Based on your description I did this

library(tidyr)
library(dplyr)

test_df <- data_frame(channel = LETTERS[1:5], unit = letters[1:5],
                      custID = c(1:5), dummy = 1)

test_df %>%
  spread(channel, dummy) %>%
  mutate(dummy = 1) %>%
  spread(unit, dummy)

which seems to be working fine as I get wide data. If a combination is
missing in the long form it will also be missing in the wide form.

Maybe you are looking for something like this:

channel_wide <- test_df %>%
  select(channel, custID) %>%
  spread(channel, custID)

unit_wide <- test_df %>%
  select(unit, custID) %>%
  spread(unit, custID)

bind_cols(channel_wide, unit_wide)

Apologies for the HTML - it's gmail

Best wishes,
Ulrik

On Thu, 28 Jul 2016 at 13:54 wrote:

Hi Ulrik,

I have included a reproducible example. I ran the code and it did exactly
what I wanted to show you.

You are right: the solution shall merge cases in the end, because the values
on the variables are either missing or the same.

Example 1: Values are the same
If you look at cases 6 and 7 and variable 70, the value is 1 in both cases.
In this context this is the same information, and cases 6 and 7 with the
same custID can be merged to 1 for variable 70.

Example 2: Values are missing and not missing
If you look at cases 8 and 9, the value for case 8 at variables 40, 50 and
2000 is missing whereas the variables 40, 50 and 2000 are all 1 for case 9.
Cases 8 and 9 could be merged together, because overwriting the missing
values is correct in this case.

The solution I am looking for is to transform the data from long into wide
form and keep all information except the missing values.

Did I explain my problem in a comprehensible way? Are there any further
questions?

Kind regards

Georg

From: Ulrik Stervbo
To: g.maub...@weinwolf.de, r-help@r-project.org
Date: 28.07.2016 12:59
Subject: Re: [R] Spread data.frame on 2 variables

Hi Georg,

it's hard to tell without a reproducible example. Should spread really merge
elements? Does spread know anything about CustID? Maybe you need to make a
useful key of the CustIDs first and spread on that?

Maybe I'm all off, because I'm really just guessing.

Best,
Ulrik

On Thu, 28 Jul 2016 at 12:36 wrote:

Hi All,

I need to spread a data.frame o
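The eight steps above can be condensed into a small helper, sketched here in base R under assumptions: every (custID, value) combination that occurs gets a 1, everything else NA, and the two wide tables are merged on custID so that each customer ends up in exactly one row. The example data, the function name `spread_indicator` and the `_` separator in the column names are mine, not from the thread:

```r
# Small long-format example: customer 1000 appears in three rows.
long <- data.frame(
  custID  = c(1000, 1000, 1000, 2000),
  channel = c(10, 20, 20, 10),
  unit    = c(10, 10, 30, 30)
)

# One row per id, one 1/NA indicator column per distinct value of 'key'.
spread_indicator <- function(data, id, key) {
  counts <- unclass(table(data[[id]], data[[key]]))  # id x key count matrix
  flags <- ifelse(counts > 0, 1, NA)                 # 1 if seen, else NA
  wide <- as.data.frame(flags)
  names(wide) <- paste(key, colnames(flags), sep = "_")
  wide[[id]] <- as.numeric(rownames(flags))
  rownames(wide) <- NULL
  wide[c(id, setdiff(names(wide), id))]              # id column first
}

wide <- merge(spread_indicator(long, "custID", "channel"),
              spread_indicator(long, "custID", "unit"),
              by = "custID")
wide
```

Because duplicates are collapsed by table() before the merge, the three rows of customer 1000 end up in one row with all four indicators set.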
[R] Accessing an object using a string
Hi All,

I would like to access an object using a string.

# Create example dataset
var1 <- c(1, 2, 3)
var2 <- c(4, 5, 6)
data1 <- data.frame(var1, var2)

var3 <- c(7, 8, 9)
var4 <- c(10, 11, 12)
data2 <- data.frame(var3, var4)

save(file = "c:/temp/test.RData", list = c("data1", "data2"))

# Define function
t_load_dataset <- function(file_path, file_name) {
  file_location <- file.path(file_path, file_name)

  print(paste0('Loading ', file_location, " ..."))
  cat("\n")

  object_list <- load(file = file_location, envir = .GlobalEnv)

  print(paste(length(object_list), "dataset(s) loaded from", file_location))
  cat("\n")

  print("The following objects were loaded:")
  print(object_list)
  cat("\n")

  for (i in object_list) {
    print(paste0("Object '", i, "' in '", file_name, "' contains:"))
    str(i)
    names(i)  # does not work
  }
}

I have only the character vector object_list containing the names of the
objects as strings. I would like to access the objects named in object_list
to be able to print the names of the variables within each object (usually a
data frame).

Is it possible to do this? How is it done?

Kind regards

Georg
[R] Antwort: Accessing an object using a string (SOLVED)
Hi All, I found the function get() which returns an object. My whole function looks like this: -- cut -- #--- # Module: t_load_dataset.R # Author: Georg Maubach # Date : 2016-08-15 # Update: 2016-08-15 # Description : Load dataset and print information on contents # Source System : R 3.3.0 (64 Bit) # Target System : R 3.3.0 (64 Bit) # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. #1-2-3-4-5-6-7-8 t_module_name = "t_load_dataset" t_version = "2016-08-15" t_status = "released" cat( paste0("\n", t_module_name, " (Version: ", t_version, ", Status: ", t_status, ")", "\n", "\n", "Copyright (C) Georg Maubach 2016 This software comes with ABSOLUTELY NO WARRANTY.", "\n", "\n")) # If do_test is not defined globally define it here locally by un-commenting it # Switch t_do_test to TRUE to run test t_do_test <- FALSE # [ Function Defintion ] t_load_dataset <- function(file_path, file_name) { # Loads and RData file with all objects in it and prints information on its # contents # # Args: # file_path (string): #String with path name. # file_name (string): #String with file name. # # Operation: # Loads the RData file with all its objects, stores the objects in the # global environment .GlobalEnv and prints information about the objects. # # Usage: # The function is designed to work only on data frames. # # Returns: # Nothing, but stores loaded objects directly into the global environment. # # Error handling: # None. 
#- cat("--- [ t_load_dataset() ] --\n\n") file_location <- file.path(file_path, file_name) cat(paste0('Loading ', file_location, " ...\n\n")) dataset_list <- load(file = file_location, envir = .GlobalEnv) cat(paste0( length(dataset_list), " dataset(s) loaded:\n")) cat(dataset_list) cat("\n\n") for (dataset in dataset_list) { cat(paste0("Dataset '", dataset, "' contains ", nrow(get(dataset, envir = .GlobalEnv)), " cases in ", ncol(get(dataset, envir = .GlobalEnv)), " variables:\n")) cat(names(get(dataset, envir = .GlobalEnv))) cat("\n\n") } cat("-- [ Done ] ---\n\n") } # [ Test Defintion ] t_test <- function(do_test = FALSE) { if (do_test == TRUE) { # Example dataset var1 <- c(1, 2, 3) var2 <- c(4, 5, 6) d_data1 <- data.frame(var1, var2) var3 <- c(7, 8, 9) var4 <- c(10, 11, 12) d_data2 <- data.frame(var3, var4) # Save datasets v_file_name <- "test_t_load_dataset.RData" save(file = file.path(getwd(), v_file_name), list = c("d_data1", "d_data2")) # Call function t_load_dataset(file_path = getwd(), file_name = v_file_name) # Cleanup unlink(file.path(getwd(), v_file_name)) } } # [ Test Run ]-- t_test(do_test = t_do_test) # [ Clean up ]-- rm("t_module_name", "t_version", "t_status", "t_do_test", "t_test") # EOF -- cut -- I will include it later the toolbox of R function on Sourceforge.net. Kind regards Georg Von:g.maub...@weinwolf.de An: r-help@r-project.org, Datum: 15.08.2016 10:51 Betreff:[R] Accessing an object using a string Gesendet von: "R-help" Hi All, I would like to access an object using a sting. 

# Create example dataset
var1 <- c(1, 2, 3)
var2 <- c(4, 5, 6)
data1 <- data.frame(var1, var2)

var3 <- c(7, 8, 9)
var4 <- c(10, 11, 12)
data2 <- data.frame(var3, var4)

save(file = "c:/temp/test.RData", list = c("data1", "data2"))

# Define function
t_load_dataset <- function(file_path, file_name) {
  file_location <- file.path(file_path, file_name)

  print(paste0('Loading ', file_location, " ..."))
  cat("\n")

  object_list <- load(file = file_location, envir = .GlobalEnv)

  print(paste(length(object_list), "dataset(s) loaded from", file_location))
  cat("\n")

  print("The following objects were loaded:")
  print(object_list)
  cat("\n")

  for (i in object_list) {
    print(paste0("Object '", i, "' in '", file_name, "' contains:"))
    str(i)
    names(i)  # does not work
  }
}

I have only the character vector object_list containin
[R] Reply: Re: Accessing an object using a string
Hi Greg and all others who replied to my question,

many thanks for all your answers and help.

Currently I store all my objects in .GlobalEnv (the workspace). I am not yet familiar with working in different environments, nor did I see that this would be necessary for my analysis. Could you explain why working with different environments would be helpful?

You suggested reading variables into lists rather than storing them in global variables. This sounds interesting. Could you provide an example of how to define and use this?

Kind regards

Georg

From: Greg Snow <538...@gmail.com>
To: g.maub...@weinwolf.de,
Cc: r-help
Date: 15.08.2016 20:33
Subject: Re: [R] Accessing an object using a string

The names function is a primitive, which means that if it does not already do what you want, it is generally not going to be easy to coerce it to do it. However, the names of an object are generally stored as an attribute of that object, which can be accessed using the attr or attributes functions. If you change your code to not use the names function and instead use attr or attributes to access the names then it should work for you.

You may also want to consider changing your workflow to have your data objects read into a list rather than global variables, then process using lapply/sapply (this would require a change in how your data is saved from your example, but if you can change that then everything after can be cleaner/simpler/easier/more fool proof/etc.)

On Mon, Aug 15, 2016 at 2:49 AM, wrote:
> Hi All,
>
> I would like to access an object using a string.
> > # Create example dataset > var1 <- c(1, 2, 3) > var2 <- c(4, 5, 6) > data1 <- data.frame(var1, var2) > > var3 <- c(7, 8, 9) > var4 <- c(10, 11, 12) > data2 <- data.frame(var3, var4) > > save(file = "c:/temp/test.RData", list = c("data1", "data2")) > > # Define function > t_load_dataset <- function(file_path, >file_name) { > file_location <- file.path(file_path, file_name) > > print(paste0('Loading ', file_location, " ...")) > cat("\n") > > object_list <- load(file = file_location, > envir = .GlobalEnv) > > print(paste(length(object_list), "dataset(s) loaded from", > file_location)) > cat("\n") > > print("The following objects were loaded:") > print(object_list) > cat("\n") > > for (i in object_list) { > print(paste0("Object '", i, "' in '", file_name, "' contains:")) > str(i) > names(i) # does not work > } > } > > I have only the character vector object_list containing the names of the > objects as strings. I would like to access the objects in object_list to > be able to print the names of the variables within the object (usuallly a > data frame). > > Is it possible to do this? How is it done? > > Kind regards > > Georg > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
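
The list-based workflow suggested above could look roughly like the following. This is only a sketch under assumptions: the file locations come from tempfile(), and the names datasets, loaded and from_rdata are made up for illustration. It first saves one named list with saveRDS(), then shows how an existing .RData file can be loaded into a private environment and turned into a list, so that nothing lands in .GlobalEnv.

```r
# Two small example data frames, mirroring the ones in the question
data1 <- data.frame(var1 = c(1, 2, 3), var2 = c(4, 5, 6))
data2 <- data.frame(var3 = c(7, 8, 9), var4 = c(10, 11, 12))

# Save ONE named list instead of two separate global objects
datasets <- list(data1 = data1, data2 = data2)
rds_file <- tempfile(fileext = ".rds")   # hypothetical location
saveRDS(datasets, rds_file)

# Reading returns the list; nothing is written into .GlobalEnv
loaded <- readRDS(rds_file)

# Process every data frame with sapply/lapply
dims <- sapply(loaded, dim)          # matrix: row 1 = nrow, row 2 = ncol
var_names <- lapply(loaded, names)   # column names per dataset
print(dims)
print(var_names)

# For an existing .RData file: load into a private environment,
# then collect the objects into a named list
rdata_file <- tempfile(fileext = ".RData")   # hypothetical location
save(data1, data2, file = rdata_file)
e <- new.env()
load(rdata_file, envir = e)
from_rdata <- mget(ls(e), envir = e)

unlink(c(rds_file, rdata_file))
```

With the data in a list, the object names are ordinary list names, so no get() call is needed to go from a string to the object: loaded[["data1"]] is enough.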
[R] Installation of rJava fails
Hi All,

I am trying to install RWeka on Debian GNU Linux 8 Jessie (uname -a: 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2+deb8u3 (2016-07-02) x86_64), which depends on "rJava".

I did

apt-get install openjdk-8-jre

which went OK. Java is installed in:

/var/lib/dpkg/alternatives/java
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
/usr/lib/jvm/java-8-openjdk-amd64/bin/java
/etc/alternatives/java

When doing this

install.packages("rJava")

I get

* installing *source* package ‘rJava’ ...
** package ‘rJava’ successfully unpacked and MD5 sums checked
interpreter : '/usr/lib/jvm/default-java/jre/bin/java'
archiver    : '/usr/lib/jvm/default-java/bin/jar'
compiler    : '/usr/lib/jvm/default-java/bin/javac'
header prep.: '/usr/lib/jvm/default-java/bin/javah'
cpp flags   : '-I/usr/lib/jvm/default-java/include'
java libs   : '-L/usr/lib/jvm/default-java/jre/lib/amd64/server -ljvm'
checking whether Java run-time works... ./configure: line 3736: /usr/lib/jvm/default-java/jre/bin/java: No such file or directory
no
configure: error: Java interpreter '/usr/lib/jvm/default-java/jre/bin/java' does not work
ERROR: configuration failed for package ‘rJava’
* removing ‘/usr/local/lib/R/site-library/rJava’
Warning in install.packages :
  installation of package ‘rJava’ had non-zero exit status

The configure script looks for Java under /usr/lib/jvm/default-java, but my installation lives under /usr/lib/jvm/java-8-openjdk-amd64.

Do I need to use another Java version or installation? How do I tell install.packages() where my Java installation resides?

Kind regards

Georg
[R] Iteration over variables
Hi All,

I would like to write a program that iterates over a set of dynamically generated variables and produces some stats or prints parts of the data.

# --- data
v_turnover_2011 <- c(10, 20, 30, 40, 50)
v_customer_2011 <- c(0, 1, NA, 0, 1)
v_turnover_2012 <- c(10, 20, 30, 40, 50)
v_customer_2012 <- c(0, 1, NA, 0, 1)

d_dataset <- data.frame(v_turnover_2011, v_turnover_2012,
                        v_customer_2011, v_customer_2012)

# -- Aim is to iterate over dynamically generated variables and compute
# -- statistics or print parts of the data

# -- Does not produce any output
for (year in 2011:2012) {
  head(d_dataset[, c(paste0("v_turnover_", year),
                     paste0("v_customer_", year))])
}

# -- Does not produce any output
aux_func <- function(year) {
  head(d_dataset[, c(paste0("v_turnover_", year),
                     paste0("v_customer_", year))])
}

for (year in 2011:2012) {
  aux_func(year = year)
}

# -- Fails: a call to paste0() cannot be used as an argument name
d_results <- data.frame()
for (year in 2011:2012) {
  d_results <- rbind(d_results,
                     paste0("mean", year) =
                       mean(d_dataset[, c(paste0("v_turnover_", year))]))
}

Is there a way to iterate over variables, compute statistics and print parts of the dataset?

Kind regards

Georg
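
A possible way to get the intended behaviour, given as a sketch rather than the one right answer: values are only auto-printed at top level, so inside a for loop the result of head() has to be passed to print() explicitly, and the per-year statistic can be collected with sapply() instead of rbind() with a computed argument name. The names years, cols and turnover_means are introduced here for illustration only.

```r
d_dataset <- data.frame(
  v_turnover_2011 = c(10, 20, 30, 40, 50),
  v_turnover_2012 = c(10, 20, 30, 40, 50),
  v_customer_2011 = c(0, 1, NA, 0, 1),
  v_customer_2012 = c(0, 1, NA, 0, 1)
)
years <- 2011:2012

# Auto-printing only happens at top level; inside a loop call print() explicitly
for (year in years) {
  cols <- paste0(c("v_turnover_", "v_customer_"), year)
  print(head(d_dataset[, cols]))
}

# One statistic per year: build the column name, then index with [[ ]]
turnover_means <- sapply(years, function(year) {
  mean(d_dataset[[paste0("v_turnover_", year)]], na.rm = TRUE)
})
names(turnover_means) <- paste0("mean", years)

# Collect the results in a data frame, one row per year
d_results <- data.frame(year = years, mean_turnover = unname(turnover_means))
print(d_results)
```

Keeping the year as a column of d_results, rather than encoding it in a row or variable name, tends to make later filtering and plotting easier.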
[R] Putting a bunch of Excel files as data.frames into a list fails
Hi All,

I need to read a bunch of Excel files and store them in R.

I decided to store the different Excel files as data.frames in a named list where the names are the file names of each file (and that is different from the sources as far as I can see):

-- cut --
# Sources:
# - http://stackoverflow.com/questions/11433432/importing-multiple-csv-files-into-r
# - http://stackoverflow.com/questions/9564489/opening-all-files-in-a-folder-and-applying-a-function
# - http://stackoverflow.com/questions/12945687/how-to-read-all-worksheets-in-an-excel-workbook-into-an-r-list-with-data-frame-e

v_file_path <- "H:/2016/Analysen/Neukunden/Input"
v_file_pattern <- "*.xlsx"

v_files <- list.files(path = v_file_path,
                      pattern = v_file_pattern,
                      ignore.case = TRUE)
print(v_files)

v_list_of_files <- list()
for (v_file in v_files) {
  v_list_of_files[v_file] <- openxlsx::read.xlsx(
    file.path(v_file_path, v_file))
}
-- cut --

This code does not work because it stores only the first variable of each Excel file in the named list.

What do I need to change to get it running?

Kind regards

Georg
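
The symptom described above is what single-bracket assignment does with a data frame: list["name"] <- df tries to spread the columns over list elements and keeps only the first one, whereas list[["name"]] <- df stores the whole data frame as one element. The sketch below illustrates this under assumptions: it uses small CSV files in a temporary directory and read.csv() as stand-ins so that it runs without Excel files; with real workbooks the reader would be openxlsx::read.xlsx as in the post.

```r
# Write two small CSV files so the sketch is self-contained
# (stand-ins for the Excel files in the original post)
v_file_path <- tempfile("input_")
dir.create(v_file_path)
write.csv(data.frame(a = 1:3, b = 4:6),
          file.path(v_file_path, "first.csv"), row.names = FALSE)
write.csv(data.frame(c = 7:9, d = 10:12),
          file.path(v_file_path, "second.csv"), row.names = FALSE)

# 'pattern' is a regular expression, not a shell glob
v_files <- list.files(path = v_file_path, pattern = "\\.csv$",
                      ignore.case = TRUE)

v_list_of_files <- list()
for (v_file in v_files) {
  # [[ ]] stores the whole data frame as ONE list element
  v_list_of_files[[v_file]] <- read.csv(file.path(v_file_path, v_file))
}

# Equivalent without an explicit loop
v_list_alt <- lapply(file.path(v_file_path, v_files), read.csv)
names(v_list_alt) <- v_files

str(v_list_of_files)
unlink(v_file_path, recursive = TRUE)
```

Because list.files() treats its pattern argument as a regular expression, a pattern such as "\\.xlsx$" is probably closer to the intent than the glob-style "*.xlsx".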
[R] Filtering Cases with != NA
Dear All,

I am new to "R" and am looking for a way to exclude cases in which a certain variable is NA.

Example

No  Name    Turnover
1   Smith   1500
2   Mayor   200
3   Miller
4   Batic   750

I would like to create a subset excluding case 3 (Miller, NA).

I tried the following:

new_dataset <- subset(dataset, subset = Turnover != NA)

This does not work. The new_dataset contains all variables but no cases are left. R responds "Variables with all observations missing".

How could I do it right?

Kind regards

Georg
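
The attempt above fails because any comparison with NA yields NA, so Turnover != NA is NA for every row and subset() keeps nothing. A minimal sketch of the usual alternative, is.na(), using a data frame rebuilt from the example table (the object names new_dataset2 and comparison are made up for illustration):

```r
dataset <- data.frame(
  No = 1:4,
  Name = c("Smith", "Mayor", "Miller", "Batic"),
  Turnover = c(1500, 200, NA, 750)
)

# Any comparison with NA is itself NA, for every row
comparison <- dataset$Turnover != NA
print(comparison)

# is.na() is the test for missingness; negate it to keep complete cases
new_dataset <- subset(dataset, subset = !is.na(Turnover))

# Same result with logical indexing
new_dataset2 <- dataset[!is.na(dataset$Turnover), ]

print(new_dataset)
```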
[R] Merging Data Sets with Full Outer Join
Hi All,

I would like to match some datasets. Both deliver variables AND cases which might or might not be present in all datasets.

This sequence

Kunden <- Kunden_2011
Kunden <- merge(Kunden, Kunden_2012, by.x = "Debitor", by.y = "Debitor")
Kunden <- merge(Kunden, Kunden_2013, by.x = "Debitor", by.y = "Debitor")
Kunden <- merge(Kunden, Kunden_2014, by.x = "Debitor", by.y = "Debitor")
Kunden <- merge(Kunden, Kunden_2015, by.x = "Debitor", by.y = "Debitor")

delivers too few cases. So I guess it does an equi-join that keeps only the cases present in every dataset.

How can I join the datasets and keep the variables as well as the cases?

I am looking forward to your reply.

Kind regards

Georg
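
By default merge() keeps only keys found in both inputs; its all = TRUE argument requests a full outer join instead. A small sketch with made-up data (the three data frames and their values are invented for illustration, only the column name Debitor is taken from the post):

```r
Kunden_2011 <- data.frame(Debitor = c(1, 2, 3), Umsatz_2011 = c(100, 200, 300))
Kunden_2012 <- data.frame(Debitor = c(2, 3, 4), Umsatz_2012 = c(250, 350, 450))
Kunden_2013 <- data.frame(Debitor = c(1, 4, 5), Umsatz_2013 = c(110, 410, 510))

# all = TRUE keeps unmatched rows from both sides and fills the gaps with NA
Kunden <- merge(Kunden_2011, Kunden_2012, by = "Debitor", all = TRUE)
Kunden <- merge(Kunden, Kunden_2013, by = "Debitor", all = TRUE)

# The same chain for any number of data frames
Kunden_alt <- Reduce(
  function(x, y) merge(x, y, by = "Debitor", all = TRUE),
  list(Kunden_2011, Kunden_2012, Kunden_2013)
)

print(Kunden)
```

If only the cases of the first dataset should be kept, all.x = TRUE (a left join) would be the corresponding setting.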
[R] Creating variables on the fly
Hi all,

I would like to use a loop for tasks that occur repeatedly:

# Groups
# Umsatz <= 0: 1 (NICHT kaufend)
# Umsatz > 0: 2 (kaufend)
for (year in c("2011", "2012", "2013", "2014", "2015")) {
  paste0("Kunden$Kunde_real_", year) <-
    (paste0("Kunden$Umsatz_", year) <= 0) * 1 +
    (paste0("Kunden$Umsatz_", year) > 0) * 2
  paste0("Kunden$Kunde_real_", year) <-
    factor(paste0("Kunden$Umsatz_", year),
           levels = c(1, 2),
           labels = c("NICHT kaufend", "kaufend"))
}

This does not work because the expression paste0("Kunden$Kunde_real_", year) is not interpreted as a variable name by the R interpreter; it is just a character string.

Is there a way to assemble variable names on the fly in R?

Regards

Georg
[R] Reply: Fw: Re: Creating variables on the fly (SOLVED)
Hi Don, Hi to all readers,

many thanks for all your answers and all your help. I adapted Don's code to my data and it does the trick:

str(Kunden01)
for (year in 2011:2015) {
  Reeller_Kunde <- paste0("Reeller_Kunde_", year)
  Umsatz <- paste0("Umsatz_", year)
  cat('Creating', Reeller_Kunde, 'from', Umsatz, '\n')
  Kunden01[[ Reeller_Kunde ]] <- ifelse( Kunden01[[ Umsatz ]] >= 0, 1, 2)
  Kunden01[[ Reeller_Kunde ]] <- factor( Kunden01[[ Reeller_Kunde ]],
                                         levels=c(1,2),
                                         labels= c("NICHT kaufend", "kaufend") )
}
str(Kunden01)

This way a new variable is created whose name is built by string concatenation. I also like the cat() function to document the process within the loop while the program is running.

Many thanks for your help.

Kind regards

Georg

From: g.maub...@gmx.de
To: g.maub...@weinwolf.de,
Date: 25.04.2016 21:37
Subject: Fw: Re: [R] Creating variables on the fly

> Sent: Monday, 25 April 2016, 19:35
> From: "MacQueen, Don"
> To: "g.maub...@gmx.de" , "r-help@r-project.org"
> Subject: Re: [R] Creating variables on the fly
>
> I'm going to assume that Kunden is a data frame, and it has columns
> (variables) with names like
>   Umsatz_2011
> and that you want to create new columns with names like
>   Kunde_real_2011
>
> If that is so, then try this (not tested):
>
> for (year in 2011:2015) {
>   nmK <- paste0("Kunde_real_", year)
>   nmU <- paste0("Umsatz_", year)
>   cat('Creating',nmK,'from',nmU,'\n')
>   Kunden[[ nmK ]] <- ifelse( Kunden[[ nmU ]] <= 0, 1, 2)
>   Kunden[[ nmK ]] <- factor( Kunden[[ nmK ]],
>                              levels=c(1,2),
>                              labels= c("NICHT kaufend", "kaufend")
>                            )
> }
>
> This little example should illustrate the method:
>
> > foo <- data.frame(a=1:4)
> > foo
>   a
> 1 1
> 2 2
> 3 3
> 4 4
> > foo[['b']] <- foo[['a']]*3
> > foo
>   a  b
> 1 1  3
> 2 2  6
> 3 3  9
> 4 4 12
>
> --
> Don MacQueen
>
> Lawrence Livermore National Laboratory
> 7000 East Ave., L-627
> Livermore, CA 94550
> 925-423-1062
>
> On 4/22/16, 8:52 AM, "R-help on behalf of g.maub...@gmx.de" wrote:
>
> >Hi all, > > > >I would like to use a loop for tasks that occurs repeatedly: > > > ># Groups > ># Umsatz <= 0: 1 (NICHT kaufend) > ># Umsatz > 0: 2 (kaufend) > >for (year in c("2011", "2012", "2013", "2014", "2015")) { > > paste0("Kunden$Kunde_real_", year) <- (paste0("Kunden$Umsatz_", year) > ><= 0) * 1 + > >(paste0("Kunden$Umsatz_", year) > > > 0) * 2 > > paste0("Kunden$Kunde_real_", year) <- factor(paste0("Kunden$Umsatz_", > >year), > > levels = c(1, 2), > > labels = c("NICHT > >kaufend", "kaufend")) > > } > > > >This actually does not work due to the fact that the expression > >"paste0("Kunden$Kunde_real_", year)" ist not interpreted as a variable > >name by the R script language interpreter. > > > >Is there a way to assembly variable names on the fly in R? > > > >Regards > > > >Georg > > > >__ > >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >https://stat.ethz.ch/mailman/listinfo/r-help > >PLEASE do read the posting guide > >http://www.R-project.org/posting-guide.html > >and provide commented, minimal, self-contained, reproducible code. > > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Missing Values in Logical Expressions
Hi All,

I need to evaluate missing values in my data. I am able to filter these values and do simple statistics on them. But I also need new variables based on variables with missing values in my dataset:

Check_Kunde_2011 <- ifelse(is.na(Umsatz_2011) == TRUE & Kunde_2011 == 1, 1, 0)
Check_Kunde_2011 <- factor(Check_Kunde_2011,
                           levels = c(1,0),
                           labels = c("Check", "OK"))

The new variable is not correctly created. It contains no values:

table(Check_Kunde_2011)
< table of extent 0 >

I searched the web but could not find a solution.

How can I work with variables and missing values in logical expressions?

Where could I find something about this?

Kind regards

Georg
[R] Reply: RE: Missing Values in Logical Expressions
Hi Petr, Hi Jim,

many thanks for your help.

Today I constructed a sample dataset and tested your suggestions. Everything worked OK. Then I took the code and tested it on the original data. And it worked OK this morning as well. I went back to my script from Tuesday and ran it again. OK. Then I used my script from Monday and ran it. OK.

I have no idea what was wrong yesterday. Seeing a problem and not being able to replicate it a day later, even though it did not work all day before, is very strange.

If the problem arises again, I will raise my hand.

Many thanks again for your help.

Kind regards

Georg

From: PIKAL Petr
To: "g.maub...@weinwolf.de" , "r-help@r-project.org" ,
Date: 26.04.2016 11:11
Subject: RE: [R] Missing Values in Logical Expressions

Hm

Based on Jim's data your construction gives me the correct result.

> Umsatz_2011<-c(1,2,3,4,5,NA,7,8,NA,10)
> Kunde_2011<-rep(0:1,5)
> Check_Kunde_2011 <- ifelse(is.na(Umsatz_2011) == TRUE & Kunde_2011 == 1, 1, 0)
> Check_Kunde_2011 <- factor(Check_Kunde_2011, levels = c(1,0), labels = c("Check", "OK"))
> table(Check_Kunde_2011)
Check_Kunde_2011
Check    OK
    1     9

So I presume that the problem lies in your data. You should provide some sample of your data, either by posting the result of str(yourdata) or dput(head(yourdata)), if you want some advice on why correct code did not give you the appropriate result.

Instead of ifelse you can also use

Check_Kunde_2011 <- (is.na(Umsatz_2011)&(Kunde_2011==1))*1

to get the desired 0/1 vector.

Cheers
Petr

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Tuesday, April 26, 2016 10:10 AM
> To: r-help@r-project.org
> Subject: [R] Missing Values in Logical Expressions
>
> Hi All,
>
> I need to evaluate missing values in my data. I am able to filter these values
> and do simple statistics on it. But I do need new variables based on variables
> with missing values in my dataset:
>
> Check_Kunde_2011 <- ifelse(is.na(Umsatz_2011) == TRUE & Kunde_2011 ==
> 1, 1, 0)
> Check_Kunde_2011 <- factor(Check_Kunde_2011, levels = c(1,0), labels =
> c("Check", "OK"))
>
> The new variable is not correctly created. It contains no values:
>
> table(Check_Kunde_2011)
> < table of extent 0 >
>
> I searched the web but could not find a solution.
>
> How can I work with variables and missing values in logical expressions?
>
> Where could I find something about this?
>
> Kind regards
>
> Georg
[R] R Script Template
Hi All,

I am addressing this post to all who are new to R.

While learning R over the last weeks I took some notes for myself to have code snippets ready for the data analysis process. I put these snippets together as a script template for future use. Almost all of the given command prototypes are tested. The template script contains snippets for best practices and leaves out commands that should not be used. Relying on the given snippets should lead to high quality code.

The code is based on examples from the resources given in the template. I highly recommend reading the books or taking the online courses to see how everything works and fits together.

Despite putting everything together with care, the script is provided as-is with no warranty or liability whatsoever.

Please address any remarks or suggestions for improvement to the R-Help mailing list.

Kind regards

Georg
[R] Interdependencies of variable types, logical expressions and NA
Hi All,

my script tries to do the following on factors:

> ## Check for case 3: Umsatz = 0 & Kunde = 1
> for (year in 2011:2015) {
+   Umsatz <- paste0("Umsatz_", year)
+   Kunde <- paste0("Kunde01_", year)
+   Check <- paste0("Check_U_0__Kd_1_", year)
+
+   cat('Creating', Check, 'from', Umsatz, "and", Kunde, '\n')
+
+   Kunden01[[ Check ]] <- ifelse(Kunden01[[ Umsatz ]] == 0 &
+                                 Kunden01[[ Kunde ]] == 1,
+                                 1, 0
+                                )
+   Kunden01[[ Check ]] <- factor(Kunden01[[ Check ]],
+                                 levels=c(1, 0),
+                                 labels= c("Check 0", "OK")
+                                )
+
+ }
Creating Check_U_0__Kd_1_2011 from Umsatz_2011 and Kunde01_2011
Creating Check_U_0__Kd_1_2012 from Umsatz_2012 and Kunde01_2012
Creating Check_U_0__Kd_1_2013 from Umsatz_2013 and Kunde01_2013
Creating Check_U_0__Kd_1_2014 from Umsatz_2014 and Kunde01_2014
Creating Check_U_0__Kd_1_2015 from Umsatz_2015 and Kunde01_2015
>
> table(Kunden01$Check_U_0__Kd_1_2011, useNA = "ifany")

Check 0      OK    <NA>
      1      16      13
> table(Kunden01$Check_U_0__Kd_1_2012, useNA = "ifany")

Check 0      OK    <NA>
      1      17      12
> table(Kunden01$Check_U_0__Kd_1_2013, useNA = "ifany")

Check 0      OK    <NA>
      2      17      13
> table(Kunden01$Check_U_0__Kd_1_2014, useNA = "ifany")

Check 0      OK    <NA>
      1      15      14
> table(Kunden01$Check_U_0__Kd_1_2015, useNA = "ifany")

Check 0      OK    <NA>
      2      15      13
>
> Kunden01$Check_U_0__Kd_1_all <- ifelse(Kunden01$Check_U_0__Kd_1_2011 == 1 |
+                                        Kunden01$Check_U_0__Kd_1_2012 == 1 |
+                                        Kunden01$Check_U_0__Kd_1_2013 == 1 |
+                                        Kunden01$Check_U_0__Kd_1_2014 == 1 |
+                                        Kunden01$Check_U_0__Kd_1_2015 == 1,
+                                        1, 0)
>
> table(Kunden01$Check_U_0__Kd_1_all, useNA = "ifany")

   0 <NA>
   7   23

(Note: I made the values up. But the relations equal real world data.)

I had expected to get back a factor or at least a numeric variable containing 0, 1 and NA; instead, 1 is not included.

I searched the web for information on the treatment of logical expressions when the data contains NA. I found:

1. https://stat.ethz.ch/R-manual/R-devel/library/base/html/NA.html

Examples
# Some logical operations do not return NA
c(TRUE, FALSE) & NA
c(TRUE, FALSE) | NA

2. https://stat.ethz.ch/R-manual/R-devel/library/base/html/Logic.html

NA is a valid logical object. Where a component of x or y is NA, the result will be NA if the outcome is ambiguous. In other words NA & TRUE evaluates to NA, but NA & FALSE evaluates to FALSE. See the examples below.

## construct truth tables :
x <- c(NA, FALSE, TRUE)
names(x) <- as.character(x)
outer(x, x, "&") ## AND table
outer(x, x, "|") ## OR table

Note: Not very useful to me. How should it be read?

3. http://www.ats.ucla.edu/stat/r/faq/missing.htm

Good explanation of NA in general and in analysis, but no information about NA in logical expressions.

Then I made some tests with different data types and variables with NA:

-- cut --
# 2016-04-27-001_truth_table_for_logicals_and_NA.R

# Test 1
var2 <- c(TRUE, FALSE)
var3 <- c(NA, NA)
var1 <- c(1, 1)
ds <- data.frame(var1, var2, var3)
ds

ds$value_and_logical <- ifelse(ds$var1 | ds$var2, TRUE, FALSE)
ds$logical_and_na <- ifelse(ds$var2 | ds$var3, TRUE, FALSE)
ds$value_and_na <- ifelse(ds$var1 | ds$var3, TRUE, FALSE)

print(ds)
# Output
#   var1  var2 var3 value_and_logical logical_and_na value_and_na
# 1    1  TRUE   NA              TRUE           TRUE         TRUE
# 2    1 FALSE   NA              TRUE             NA         TRUE

# Test 2
ds$var1 <- factor(ds$var1, levels = c(0, 1), labels = c("NOT ok", "OK"))
ds$var2 <- factor(ds$var2, levels = c(0, 1), labels = c("NOT ok", "OK"))
ds$var3 <- factor(ds$var3, levels = c(0, 1), labels = c("NOT ok", "OK"))

ds$value_and_logical <- ifelse(ds$var1 | ds$var2, TRUE, FALSE)
ds$logical_and_na <- ifelse(ds$var2 | ds$var3, TRUE, FALSE)
ds$value_and_na <- ifelse(ds$var1 | ds$var3, TRUE, FALSE)
# Output (abbrev.)
# Warning message:
# In Ops.factor(ds$var1, ds$var3) : ‘|’ not meaningful for factors

print(ds)
# Output
#   var1 var2 var3 value_and_logical logical_and_na value_and_na
# 1   OK <NA> <NA>                NA             NA           NA
# 2   OK <NA> <NA>                NA             NA           NA
-- cut --

I had expected to get the same result in Test 2 as in Test 1.

Where can I find information and documentation about NA handling in logical expressions on different variable types?

Kind regards

Georg
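
One way to read the truth tables quoted from the Logic help page, together with a likely reason the combined check above never yields 1: a factor is compared by its labels, not by the codes passed to levels, so a factor labelled "Check 0"/"OK" is never equal to 1. The following is a sketch under assumptions; check, check_a, check_b and any_check are made-up names and values, not the poster's data.

```r
# Three-valued logic: NA means "unknown"
x <- c(NA, FALSE, TRUE)
names(x) <- as.character(x)
and_table <- outer(x, x, "&")   # entry [i, j] is x[i] AND x[j]
or_table  <- outer(x, x, "|")   # entry [i, j] is x[i] OR x[j]
print(and_table)
print(or_table)

# The result is NA only when the unknown value could change the outcome
NA & FALSE   # FALSE: the answer is FALSE whatever NA stands for
NA | TRUE    # TRUE:  the answer is TRUE whatever NA stands for
NA & TRUE    # NA
NA | FALSE   # NA

# Factors compare by label, not by the codes given in 'levels'
check <- factor(c(1, 0, NA), levels = c(1, 0), labels = c("Check 0", "OK"))
check == 1           # FALSE FALSE NA: no label is "1"
check == "Check 0"   # TRUE FALSE NA

# Combining several checks on the labels; the NA rules above apply row by row
check_a <- factor(c(1, 0, NA, 0), levels = c(1, 0), labels = c("Check 0", "OK"))
check_b <- factor(c(0, 0, 1, NA), levels = c(1, 0), labels = c("Check 0", "OK"))
any_check <- ifelse(check_a == "Check 0" | check_b == "Check 0", 1, 0)
print(any_check)     # 1 0 1 NA
```

The warning in Test 2 has the same root: | is not defined for factors, so the factors would have to be compared with their labels (or converted back to logical values) before being combined.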
[R] Reply: Re: R Script Template
Hi All,

please find enclosed the missing attachment.

Kind regards

Georg

-- cut --
#-[ Header ] --
# Program       : Framework for R scripts
# Author        : Georg Maubach
# Date          : 2016-03-03
# Update        : 2016-04-27
# Description   : Foundation for the analysis process
# Source System : R 3.2.5 (64 Bit)
# Target System : R 3.2.5 (64 Bit)
# Release       : 1
# License       : CC-BY-NC-SA
# File Name     : 2016-04-27_Template_Scipt.R
#---

#- [ Purpose of the document ]
# This document provides a framework for a script able to handle real world
# data throughout the complete analysis process. In each step examples or
# prototypes of needed or helpful commands are given. Chapters and sections in
# this document can be regarded as a toolbox. The needed tools shall be adapted
# to the processed data. Commands are ordered in a consistent way to support the
# user in producing high quality output.
#---

# - [ At hand ]
# help("function")     # Extract or Replace Parts of an Object
# example("function")  # Examples on "Extract"
# demo(package = .packages(all.available = TRUE))  # Show demos of packages
#---

# - [ Editing Marks ] --
# %ROTA% : Result of the analysis in text form if needed to explain further
#          steps
# %ToDo% : ToDo's
#---

# - [ Warranty Disclaimer ]
# The software is provided "as-is". The author disclaims to the fullest extent
# authorized by law any and all warranties, whether express or implied,
# including, without limitation, any implied warranties of merchantability or
# fitness for a particular purpose. Without limitation of the foregoing, the
# author expressly does not warrant that:
#
# (a) the software will meet your requirements or expectations;
# (b) the software or the software content will be free of bugs, errors,
#     viruses or other defects;
# (c) any results, output, or data provided through or generated by the software
#     will be accurate, up-to-date, complete or reliable;
# (d) the software will be compatible with third party software;
# (e) any errors in the software will be corrected.
#---

# - [ Limitation of Liability ]
# In no event will the author be liable for any direct, indirect, consequential,
# incidental, special, exemplary, or punitive damages or liabilities whatsoever
# arising from or relating to the software, the software content or this
# agreement, whether based on contract, tort (including negligence), strict
# liability or other theory, even if the author has been advised of the
# possibility of such damages.
#
# Use of the software is entirely at the user's own risk.
#---
#1-2-3-4-5-6-7-8

#---#
# Setup #
#---#

# Environment
# Please make sure that RTools is installed
Sys.getenv("R_ZIPCMD", "zip")  # needed for openxlsx::write.xlsx()
Sys.setenv(R_ZIPCMD= "C:/R-Project/Rtools/bin/zip")
.libPaths()  # Install directory for libraries
# .libPaths("new path if needed")

# Workplace
sessionInfo()         # Environment
list.files(R.home())  # Show R home directory
getwd()               # Get working directory
list.dirs()           # List directories in working directory
list.files()          # List files in working directory
library()             # List all installed packages
search()              # List all loaded packages
ls()                  # List objects in environment

#---#
# Configure #
#---#
path <- file.path("path", "to", "directory")
setwd(path)          # Set working directory
options(width = 65)  # Set output width

#-#
# Install #
#-#
available.packages()

# Desired packages
my_packages <- c(
  "ctv",         # Package to install packages based on themes
  "data.table",  # Fast manipulation of large datasets
  "dplyr",       # Data manipulation for data frames
  "geoR",
  "haven",       # import data from statistical packages
  "Hmisc",
  "httr",        # package to deal with HTTP requests
  "installr",    # Dependency of openxlsx::write.xlsx()
  "lubridate",
  "mapdata",     # data for high-quality maps
  "maps",        # draw maps
  "maptools",    # import ESRI data
  "memisc",      # package data import
[R] Reply: Re: selecting columns from a data frame or data table by type, ie, numeric, integer
Hi All, Hi Carl,

I am not sure if this is useful to you, but I followed your conversation and thought of you when I read this:

for (i in 1:ncol(dataset)) {
  if(class(dataset) == "character|numeric|factor|or whatsoever") {
    dataset[, i] <- as.factor(dataset[, i])
  }
}

Source: Zumel, Nina / Mount, John: Practical Data Science with R, Manning Publications: Shelter Island, 2014, Chapter 2: Loading data into R, p. 25

This way you can select variables of a certain class only and do transformations. I found that this approach is not applicable if used with statistical functions like head(). Transformations worked fine for me.

I found reading the above given source worthwhile.

Kind regards

Georg

PS: I am not related to the above given authors. I am just a reader reporting on what is, at least to me, a valuable resource.

From: Carl Sutton via R-help
To: William Dunlap ,
Cc: "r-help@r-project.org"
Date: 29.04.2016 22:08
Subject: Re: [R] selecting columns from a data frame or data table by type, ie, numeric, integer
Sent by: "R-help"

Thank you Bill Dunlap. So simple I never tried that approach. Tried dozens of others though, read manuals till I was getting headaches, and of course the answer was simple when one is competent. Learning, it's a struggle, but slowly getting there.

Thanks again

Carl Sutton CPA

On Friday, April 29, 2016 10:50 AM, William Dunlap wrote:

> dt1[ vapply(dt1, FUN=is.numeric, FUN.VALUE=NA) ]
    a   c
1   1 1.1
2   2 1.0
...
10 10 0.2

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Apr 29, 2016 at 9:19 AM, Carl Sutton via R-help wrote:

Good morning RGuru's

I have a data frame of 575 columns. I want to extract only those columns that are numeric (double) or integer to do some machine learning with. I have searched the web for a couple of days (off and on) and have not found anything that shows how to do this. Lots of ways to extract rows, but not columns. I have attempted to use the "(x == y)" indices extraction method, but that threw an error that == was for atomic vectors and lists, and I was doing this on a data frame.

My test code is below

# a technique to get column classes
library(data.table)
a <- 1:10
b <- c("a","b","c","d","e","f","g","h","i","j")
c <- seq(1.1, .2, length = 10)
dt1 <- data.table(a,b,c)
str(dt1)
col.classes <- sapply(dt1, class)
head(col.classes)
dt2 <- subset(dt1, typeof = "double" | "numeric")
str(dt2)
dt2  # not subset
dt2 <- dt1[, list(typeof = "double")]
str(dt2)
class_data <- dt1[,sapply(dt1,is.integer) | sapply(dt1, is.numeric)]
class_data
sum(class_data)
typeof(class_data)
names(class_data)
str(class_data)

Any help is appreciated

Carl Sutton CPA
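
The vapply() approach quoted in this thread can be spelled out step by step. This is a minimal sketch on a plain data frame with the same three columns as the test code; the names df, is_num, numeric_cols and numeric_cols2 are illustrative, and indexing a data.table by a logical vector works differently, so that case is not covered here.

```r
df <- data.frame(
  a = 1:10,
  b = letters[1:10],
  c = seq(1.1, 0.2, length.out = 10),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per column; is.numeric() is TRUE for integer and double alike
is_num <- vapply(df, is.numeric, FUN.VALUE = logical(1))
print(is_num)

# Index the columns with that logical vector
numeric_cols <- df[, is_num, drop = FALSE]

# Filter() selects the same columns in a single call
numeric_cols2 <- Filter(is.numeric, df)

str(numeric_cols)
```

vapply() is used instead of sapply() because it checks that every column yields exactly one logical value, which keeps the result a predictable logical vector even for unusual column types.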
[R] Antwort: Antwort: Re: selecting columns from a data frame or data table by type, ie, numeric, integer
Hi Martin,

many thanks for your answer and your thorough explanation. I am a newbie to R, got help on this list, and thought I could give back something that looked OK to me.

regarding 0) You're right, it's pseudocode. I assumed that anybody on the list would be able to adapt the code to their needs so that it worked. Next time I will post runnable code.

regarding 1) You're right: "[, i]" is missing. My fault. Sorry.

regarding 3) I got your point and will do better in the future.

One question: What books do you recommend for getting to know R better?

Kind regards
Georg

From: Martin Maechler
Cc: Carl Sutton, "r-help@r-project.org"
Date: 04.05.2016 09:05
Subject: [R] Antwort: Re: selecting columns from a data frame or data table by type, ie, numeric, integer

> on Wed, 4 May 2016 08:30:50 +0200 writes:

> Hi All,
> Hi Carl,
>
> I am not sure if this is useful to you, but I followed your conversation
> and thought of you when I read this:
>
> for (i in 1:ncol(dataset)) {
>   if(class(dataset) == "character|numeric|factor|or whatsoever") {
>     dataset[, i] <- as.factor(dataset[, i])
>   }
> }

Ouch -- so many problems in such a short piece of R code !!!

> Source: Zumel, Nina / Mount, John: Practical Data Science with R, Manning
> Publications: Shelter Island, 2014, Chapter 2: Loading data into R, p. 25

Sorry, but after reading the above, I'd strongly recommend getting better books about R... {{maybe do not take those containing "data science" ;-)}}

Compared to the nice and efficient solution of Bill Dunlap, the above is really bad-bad-bad in at least four ways:

0) The way you write it above, you cannot use it: == "variant1|variant2|..." is pseudocode and does not really work.

1) Note the missing "[, i]" in the 2nd line: It should be if(class(dataset[, i]) ...

2) A for loop changing each column at a time is really slow for largish data sets.

3) [last but not at all least!] Please ... many of you readers, do learn: Using checks such as

       if ( class(x) == "numeric" )

   are (almost) always wrong by design !!! Instead you really should (almost) always use

       if(inherits(x, "numeric"))

Why? Because classes in R (S3 or S4) can *extend* other classes.

Example: Many of you know that after fm <- glm(...), class(fm) is c("glm", "lm") and so

    > if(class(fm) == "lm")
    +   "yes"
    Warning message:
    In if (class(fm) == "lm") "yes" :
      the condition has length > 1 and only the first element will be used

Similarly, in your case

    y <- 1:10
    class(y) <- c("myNumber", "numeric")

when that 'y' is a column in your data frame, the test for if(class(dataset[,i]) == "numeric") will *not* work but actually produce the above warning.

However, one could also have had

    Num <- setClass("Num", contains="numeric")
    N <- Num(1:10)

    > Num <- setClass("Num", contains="numeric")
    > N <- Num(1:10)
    > N
    An object of class "Num"
     [1]  1  2  3  4  5  6  7  8  9 10
    > if(class(N) == "numeric") "yes" else "no"
    [1] "no"
    >

I hope that many of the readers --- including *MANY* authors of R packages !! --- have understood the above and will fix their R code -- and even more their books where applicable !!

Martin Maechler, ETH Zurich & R Core Team
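The difference between comparing class() with == and calling inherits(), as discussed in this exchange, can be checked in a few lines. This is a small sketch that reuses the S3 construction from the message; the variable cmp is invented for the illustration. As far as I know, from R 4.2.0 onwards a condition of length greater than one inside if() is an error rather than a warning, which makes the point even more pressing.

```r
# An S3 object whose class vector extends "numeric"
# (same construction as in the message above).
y <- 1:10
class(y) <- c("myNumber", "numeric")

# Comparing class() with == gives one result per class entry,
# so the result is not a usable if() condition.
cmp <- class(y) == "numeric"
print(cmp)

# inherits() looks at the whole class vector and returns a single logical.
print(inherits(y, "numeric"))

# For picking numeric columns, is.numeric() is the more direct test;
# it is TRUE for plain integer and double vectors alike.
print(is.numeric(1:10))
print(is.numeric(c(1.5, 2)))
```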
[R] sink(): Cannot open file
Hi All,

I would like to route the output to a file using sink(). When using the example from the ?sink documentation:

    sink("sink-examp.txt")
    i <- 1:10
    outer(i, i, "*")
    sink()
    unlink("sink-examp.txt")

    ## capture all the output to a file.
    zz <- file("all.Rout", open = "wt")
    sink(zz)
    sink(zz, type = "message")
    try(log("a"))
    ## back to the console
    sink(type = "message")
    sink()
    file.show("all.Rout")

I cannot open the file in Windows Explorer. The error message is:

"Cannot open file. File is in use by another process."

How can I close the file so that I can open it right after it was created?

Kind regards
Georg
[R] Antwort: Re: sink(): Cannot open file
Hi Jim,

thanks for your reply.

ad 1) "all.Rout" was created in the correct directory. It exists with correct file properties on Windows, e.g. creation date and time and file size information.

ad 2) I cannot access the file with Notepad.exe directly after it was created by R. The error message is (translated):

"Cannot access file "all.Rout". The file is opened by another process."

ad 3) If I close R completely, the file access is released. Then I can read the file using Notepad.exe. The content is:

    Error in log("a") : non-numeric argument to mathematical function

I tried

    close(zz)

but the error persists.

To me it looks like R is still accessing the file and not releasing the connection for other programs. close(zz) should have solved the problem, but unfortunately it doesn't.

What else could I try?

Kind regards
Georg

From: Jim Lemon
To: g.maub...@weinwolf.de
Cc: r-help mailing list
Date: 10.05.2016 12:50
Subject: Re: [R] sink(): Cannot open file

Hi Georg,
I don't suppose that you have:

1) checked that the file "all.Rout" exists somewhere?

2) if so, looked at the file with Notepad, perhaps?

3) let us in on the secret by pasting the contents of "all.Rout" into your message if it is not too big?

At a guess, trying:

    close(zz)

might get you there.

Jim
[R] Antwort: Re: Re: sink(): Cannot open file
Hi Jim,

I tried:

    sink("all.Rout")
    try(log("a"))
    sink()

The program executes without warning or error. The file "all.Rout" is created, but nothing is written to it. The file can be opened with notepad.exe right after the program has run.

The program

    zz <- file("all.Rout", open = "wt")
    sink(zz, type = "message")
    try(log("a"))
    sink()
    close(zz)
    unlink(zz)

creates the file and does not write anything to it, and the file cannot be opened with notepad.exe after the program has run in R.

Any ideas what happens behind the scenes?

Kind regards
Georg

From: Jim Lemon
To: g.maub...@weinwolf.de
Cc: r-help mailing list
Date: 10.05.2016 13:16
Subject: Re: Re: [R] sink(): Cannot open file

Have you tried:

    sink("all.Rout")
    try(log("a"))
    sink()

Jim
[R] Antwort: Re: Antwort: Re: Re: sink(): Cannot open file
Hi Sarah, John, Jim,
Hi All,

I have set my environment variable

    path <- file.path("H:", "2016", "Analysis")
    setwd(dir = path)

This works well because the file is created in that directory.

I have tried

    close(zz)
    unlink(zz)

and neither worked on its own, nor did using them together.

I had this before when working with IBM SPSS Statistics. There was a workaround for the problem in SPSS. Is there one for R?

Kind regards
Georg

From: Sarah Goslee
To: g.maub...@weinwolf.de
Cc: r-help mailing list
Date: 10.05.2016 17:17
Subject: Re: [R] Antwort: Re: Re: sink(): Cannot open file

Try closing the type of sink you're actually opening:

    zz <- file("all.Rout", open = "wt")
    sink(zz, type = "message")
    try(log("a"))
    sink(type = "message")
    close(zz)
    unlink(zz)

If you look carefully at the example in ?sink, there are two close statements, one for each stream being sent to that file.

Sarah

- Forwarded by Georg Maubach/WWBO/WW/HAW on 10.05.2016 18:29 -

From: "John Sorkin"
Date: 10.05.2016 17:20
Subject: Re: [R] Antwort: Re: Re: sink(): Cannot open file

George,
I do not know what operating system you are working with, but when I use sink() under Windows, I need to specify a valid path, which I don't see in your code. I might, for example, specify:

    sink("c:\myfile.txt")
    R code goes here
    sink()

with the expectation that I would create a file myfile.txt that would contain the output of my R program.

John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
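Since Windows paths come up in this exchange, a short sketch of how they can be spelled inside R strings may help. The path C:/Temp/myfile.txt and the variable names are placeholders invented for the illustration. A single backslash starts an escape sequence in an R string, so a literal backslash has to be doubled, or a forward slash can be used instead.

```r
# Two equivalent spellings of a Windows path in an R string.
# "C:/Temp/myfile.txt" is a placeholder path, not one taken from the thread.
p_forward <- "C:/Temp/myfile.txt"
p_escaped <- "C:\\Temp\\myfile.txt"

# writeLines() shows the characters that are actually stored in the string.
writeLines(p_escaped)

# file.path() joins the components with forward slashes.
p_built <- file.path("C:", "Temp", "myfile.txt")

# After normalising the separators, all three spellings agree.
same <- identical(gsub("\\", "/", p_escaped, fixed = TRUE), p_forward) &&
  identical(p_built, p_forward)
print(same)
```

Building paths with file.path(), as in the setwd() example above, avoids the escaping question altogether.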
[R] Antwort: Re: Re: Antwort: Re: Re: sink(): Cannot open file
Hi Sarah,

yes, I followed your suggestion.

If I do exactly what is in the example of the documentation:

    sink("C:/Temp/sink-examp.txt")
    i <- 1:10
    outer(i, i, "*")
    sink()
    unlink("C:/Temp/sink-examp.txt")

it does not write anything, i.e. no file is created in "C:/Temp/". The script is executed without an error or warning message.

If I run

    ## capture all the output to a file.
    zz <- file("C:/Temp/all.Rout", open = "wt")
    sink(zz)
    sink(zz, type = "message")
    try(log("a"))
    ## back to the console
    sink(type = "message")  # I think this was your suggestion
    sink()
    unlink("C:/Temp/all.Rout")

the script is executed without error or warning message and the file is created in "C:/Temp/", but if I try to open it right after the script is done, this message appears:

DE: "Auf das Dokument "C:\Temp\all.Rout" kann nicht zugegriffen werden, da es von einer anderen Anwendung verwendet wird."

EN: "Cannot access the document "C:\Temp\all.Rout" because it is used by another application."

What am I doing wrong?

Kind regards
Georg

From: Sarah Goslee
To: g.maub...@weinwolf.de
Date: 10.05.2016 18:46
Subject: Re: Re: [R] Antwort: Re: Re: sink(): Cannot open file

On Tue, May 10, 2016 at 12:34 PM, wrote:

> sink(type = "message")

But did you do that ^^ as I suggested?

If you start a message sink with sink(zz, type="message") as you did, you need to explicitly close that stream. Just using sink() doesn't do it.
[R] Antwort: Re: Antwort: Re: Re: sink(): Cannot open file
Duncan,

thanks for the hint.

I have done it correctly in R fashion

    ## capture all the output to a file.
    zz <- file("C:/Temp/all.Rout", open = "wt")
    sink(zz)
    sink(zz, type = "message")
    try(log("a"))
    ## back to the console
    sink(type = "message")
    sink()
    unlink("C:/Temp/all.Rout")

But the error persists.

Kind regards
Georg

From: Duncan Murdoch
To: John Sorkin, drjimle...@gmail.com, g.maub...@weinwolf.de
Cc: r-help@r-project.org
Date: 10.05.2016 19:03
Subject: Re: [R] Antwort: Re: Re: sink(): Cannot open file

On 10/05/2016 11:15 AM, John Sorkin wrote:
> George,
> I do not know what operating system you are working with, but when I use
> sink() under windows, I need to specify a valid path which I don't see in
> your code. I might, for example specify:
>
> sink("c:\myfile.txt")

Note that the backslash should be doubled (so it isn't interpreted as an escape for the "m" that follows it), or replaced with a forward slash.

Duncan Murdoch

> R code goes here
> sink()
>
> with the expectation that I would create a file myfile.txt that would
> contain the output of my R program.
>
> John
[R] Antwort: Re: Antwort: Re: Antwort: Re: Re: sink(): Cannot open file (SOLVED)
Hi Henrik, Jim, Sarah, Duncan,
Hi All,

I have tried the built-in solution using PowerShell:

    $lockedFile="C:\Windows\System32\wshtcpip.dll"
    Get-Process | foreach{$processVar = $_;$_.Modules | foreach{if($_.FileName -eq $lockedFile){$processVar.Name + " PID:" + $processVar.id}}}

It did not show any processes.

Then I tried the solution using Resource Monitor. There I found two processes:

    rstudio.exe
    rsession.exe

Right-clicking on rstudio.exe and selecting "Warteschlange analysieren" (= analyse queue?) showed nothing. Right-clicking on rsession.exe and selecting "Warteschlange analysieren" said: "Mindestens ein Thread von rsession.exe wartet auf die Fertigstellung von Netzwerk E/A" (= "At least one thread of rsession.exe is waiting for a network I/O operation to finish").

Putting rsession.exe into the search field of the Handles tab of Resource Monitor gave no results. No handles were identified.

I cannot follow the suggestions that require installing software, due to the security rules of the company I work for.

I had a look at different R versions on my machine:

1) R i386 3.2.2
2) R i386 3.2.4 (revised)
3) R i386 3.2.5
4) R x64 3.2.2
5) R x64 3.2.4 (revised)
6) R x64 3.2.5

I ran

    ## capture all the output to a file.
    zz <- file("C:/Temp/all.Rout", open = "wt")
    sink(zz)
    sink(zz, type = "message")
    try(log("a"))
    ## back to the console
    sink(type = "message")
    sink()
    unlink("C:/Temp/all.Rout")

on R i386 3.2.2 and R x64 3.2.2 directly without RStudio. In both cases the file was locked. Adding close(zz) solved the problem in both versions.

Encouraged by this I tried (subsequently referred to as "complete code")

    ## capture all the output to a file.
    zz <- file("C:/Temp/all.Rout", open = "wt")
    sink(zz)
    sink(zz, type = "message")
    try(log("a"))
    ## back to the console
    sink(type = "message")
    sink()
    unlink("C:/Temp/all.Rout")
    close(zz)

on R i386 3.2.4 (revised) and R x64 3.2.4 (revised) without RStudio. It works in both cases. The same holds for R i386 3.2.5 and R x64 3.2.5, each without RStudio.

I did the same with RStudio, changing the R version in the RStudio session and using the "complete code". The results are:

    R i386 3.2.2:           OK
    R x64 3.2.2:            OK
    R i386 3.2.4 (revised): OK
    R x64 3.2.4 (revised):  OK
    R i386 3.2.5:           OK
    R x64 3.2.5:            OK

This got me lost. I had tried the complete code a hundred times over the last few days. It never worked. Then I restarted my machine, started RStudio with R x64 3.2.5, ran the "complete code" and ... it worked. I have no idea what was wrong the last days.

As far as I can say today, the example in the documentation of ?sink is currently

    ## capture all the output to a file.
    zz <- file("all.Rout", open = "wt")
    sink(zz)
    sink(zz, type = "message")
    try(log("a"))
    ## back to the console
    sink(type = "message")
    sink()
    file.show("all.Rout")

and should, in my opinion, be supplemented with close(zz).

Any thoughts?

Kind regards
Georg

From: Henrik Bengtsson
To: g.maub...@weinwolf.de
Cc: Duncan Murdoch, "r-help@r-project.org"
Date: 11.05.2016 21:48
Subject: Re: [R] Antwort: Re: Antwort: Re: Re: sink(): Cannot open file

Sounds like it would be helpful to find out exactly which process is holding on to the file in order to figure out what's going on. From a quick look, it seems that

http://superuser.com/questions/117902/find-out-which-process-is-locking-a-file-or-folder-in-windows

gives some useful info on how to track down the process that locks the file.

/Henrik
[R] Antwort: Antwort: Re: Re: Antwort: Re: Re: sink(): Cannot open file
Hi Martin,

many thanks for following up on my question.

I did it again:

    ## capture all the output to a file.
    zz <- file("C:/Temp/all.Rout", open = "wt")
    sink(zz)
    sink(zz, type = "message")
    try(log("a"))
    ## back to the console
    sink(type = "message")
    sink()
    close(zz)

This works.

I tried several other combinations of the commands, e.g.

    ## capture all the output to a file.
    zz <- file("C:/Temp/all.Rout", open = "wt")
    sink(zz)
    sink(zz, type = "message")
    try(log("a"))
    close(zz)

This does not work.

As far as I understand it now, I have to release the connection of the streams with sink(type = "message") and sink() before I can close the file connection itself. If I do it as in the last example, the connection to the file is lost and the streams diverted by sink() cannot be recovered. This lasts until the R session is closed and opened again.

To me it looks like I need to learn more about the operation of R under the hood.

Kind regards
Georg

From: Martin Maechler
Cc: Sarah Goslee
Date: 12.05.2016 10:40
Subject: [R] Antwort: Re: Re: Antwort: Re: Re: sink(): Cannot open file

> Hi Sarah,
> yes, I followed your suggestion.

I doubt that you followed it correctly. Sarah's advice is usually really very sound -- and your code below is *not*:

> If I do exactly what is in the example of the documentation:
>
> sink("C:/Temp/sink-examp.txt")
> i <- 1:10
> outer(i, i, "*")
> sink()
> unlink("C:/Temp/sink-examp.txt")
>
> it does not write anything, i. e. no file is created in "C:/Temp/". The
> script is executed without an error or warning message.

Well, did you ever look up what unlink() does? I save you the time: it does *REMOVE* a file. So no wonder that you don't see any result after executing the above R code block.

Martin
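The order of calls that worked in this thread can be sketched as a self-contained example. It assumes a temporary file instead of C:/Temp, and log_file and captured are names chosen for the illustration; it is a sketch of the pattern, not a replacement for the ?sink documentation.

```r
# Write to a temporary file so the sketch leaves nothing behind;
# log_file is a name chosen for this example.
log_file <- tempfile(fileext = ".Rout")
zz <- file(log_file, open = "wt")

# Divert regular output and messages (including errors) to the connection.
sink(zz)
sink(zz, type = "message")

print("regular output")
try(log("a"))

# Undo the diversions first: the message stream, then regular output ...
sink(type = "message")
sink()

# ... and only then close the connection, which releases the file.
close(zz)

captured <- readLines(log_file)
print(captured)

# unlink() deletes the file, so call it only when the log is no longer needed.
unlink(log_file)
```

Inside a function, registering the two sink() resets and the close() call with on.exit() would be one way to make sure the file is released even if the code in between stops with an error.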
[R] How to plot a bunch of dichotomous code variables in one plot using ggplot2
Hi All,

I have a bunch of dichotomous code variables which shall be plotted in one graph using one of their values; this is "1" in this case.

The dataset looks like this:

-- cut --
var1 <- c(1,0,1,0,0,1,1,1,0,1)
var2 <- c(0,1,1,1,1,0,0,0,0,0)
var3 <- c(1,1,1,1,1,1,1,1,0,1)

ds <- data.frame(var1, var2, var3)
-- cut --

I would like to have a bar plot like this

                *
                *
                *
                *
    *           *
    *           *
    *     *     *
    *     *     *
    *     *     *
    *     *     *
  ---------------------
   var1  var2  var3

Is this possible in R? If so, how can I achieve this?

Kind regards
Georg
[R] Antwort: RE: How to plot a bunch of dichotomous code variables in one plot using ggplot2
Hi Bob,
Hi John,
Hi readers,

many thanks for your replies.

I did

    barplot(colSums(dataset %>% select(FirstVar:LastVar)))

and it worked fine.

How would I do it with ggplot2?

Kind regards
Georg

From: "Fox, John"
To: "g.maub...@weinwolf.de"
Cc: "r-help@r-project.org"
Date: 05.10.2016 15:01
Subject: RE: [R] How to plot a bunch of dichotomous code variables in one plot using ggplot2

Dear Georg,

How about barplot(colSums(ds))?

Best,
John

John Fox, Professor
McMaster University
Hamilton, Ontario
Canada L8S 4M4
Web: socserv.mcmaster.ca/jfox
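The ggplot2 question is left open in the message above. One possible approach, sketched under the assumption that ggplot2 is installed, is to reduce the data to one row per variable first and then draw the counts with geom_col(). The data frame counts and its columns variable and n are names invented for this sketch; the data are the ones from the original question.

```r
var1 <- c(1, 0, 1, 0, 0, 1, 1, 1, 0, 1)
var2 <- c(0, 1, 1, 1, 1, 0, 0, 0, 0, 0)
var3 <- c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1)
ds <- data.frame(var1, var2, var3)

# The variables are coded 0/1, so the column sums are the counts of
# the value 1. ggplot2 wants them in long form: one row per variable.
counts <- data.frame(
  variable = names(ds),
  n = unname(colSums(ds))
)
print(counts)

# Draw the bars only if ggplot2 is available on this machine.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(counts, ggplot2::aes(x = variable, y = n)) +
    ggplot2::geom_col() +
    ggplot2::labs(x = NULL, y = "Number of 1s")
  print(p)
}
```

The reshaping step is plain base R, so it gives the same numbers as barplot(colSums(ds)); only the drawing differs.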
[R] Documenting a function using roxygen2
Hi All,

I began to document my functions using roxygen2. This is an example of a function I would like to write for training and testing purposes:

    t_simple_table <- function(variable, useNA = TRUE, print = FALSE) {
      #' @title Create a simple table for one variable.
      #'
      #' @description t_simple_table() creates absolute and relative
      #' frequencies, cumulative sums and column sums for both as well as
      #' overall statistics about valid N and missing values.
      #'
      #'
      #' @param variable (vector, list, data.frame): variable the table is
      #' created for.
      #' @param useNA (logical): flag to include or exclude missing values
      #' from the computation.
      #' @param print (logical): flag to print/not print a table before
      #' returning it as an object.
      #'
      #' @operation
      #' Coerces the given variable to a factor.
      #' If useNA = TRUE NA is also transformed to a valid value,
      #' if useNA = FALSE it is disregarded in all operations.
      #'
      #' @return Returns a table with the following statistics:
      #'
      #'              Frequencies  Percent  Cumulative
      #'                                    Percent
      #' Valid             .          .
      #' Missing           .          .
      #' Total             .         100
      #' Categories
      #'   Cat 1           .          .         .
      #'   Cat 2           .          .         .
      #'   Cat 3           .          .         .
      #'   ...             .          .        100
      #' Total             .         100
      #'
      #' @errorhandling None
      #'
      #' @version "0.1"
      #'
      #' @created "2016-10-11"
      #' @updated "2016-10-11"
      #'
      #' @status development
      #'
      #' @see Manderscheid: Sozialwissenschaftliche Datenanalyse mit R,
      #' p. 79ff
      #'
      #' @author Georg
      #'
      #' @license GPL-2

      # function body to be defined
    }

Is this a correct header for a function? How could I do better?

Kind regards
Georg
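As one possible answer to the question above: roxygen2 reads comment blocks placed directly above the function definition rather than inside its body, and it works with a fixed set of tags such as @param, @details, @return, @references, @author and @export. The sketch below rearranges the header accordingly; tags like @operation or @errorhandling are, as far as I know, not part of that set, so their content is moved into @details, while version and licence information normally lives in the package DESCRIPTION file. The function body and the column names of the result are placeholders written for this illustration only, since the original leaves the body undefined.

```r
#' Create a simple frequency table for one variable
#'
#' Computes absolute frequencies, percentages and cumulative percentages.
#'
#' @details The variable is tabulated by category. With useNA = TRUE,
#'   NA is counted as a category of its own; with useNA = FALSE it is
#'   dropped before counting.
#'
#' @param variable A vector to tabulate.
#' @param useNA Logical; include missing values in the table?
#' @param print Logical; print the table before returning it?
#'
#' @return A data frame with the columns category, frequency,
#'   percent and cumulative_percent.
#'
#' @references Manderscheid: Sozialwissenschaftliche Datenanalyse mit R,
#'   p. 79ff.
#' @author Georg
#' @export
t_simple_table <- function(variable, useNA = TRUE, print = FALSE) {
  # Placeholder body written for this sketch only.
  counts <- table(variable, useNA = if (useNA) "ifany" else "no")
  percent <- 100 * as.vector(counts) / sum(counts)
  result <- data.frame(
    category = names(counts),
    frequency = as.vector(counts),
    percent = percent,
    cumulative_percent = cumsum(percent)
  )
  if (print) print(result)
  invisible(result)
}

# Small usage example with one missing value.
tab <- t_simple_table(c("a", "b", "a", NA))
print(tab)
```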
[R] Visibility of libraries called from within functions
Hi All,

in my R programs I use different libraries to work with Excel sheets, i. e. xlsx, excel.link.

When running chunks of code repeatedly, and for development purposes not always in the order the program should run, I ran into trouble. There were conflicts between the methods within these functions causing R to crash.

I thought about defining functions for the different tasks and calling the libraries locally to their functions. Doing this test

-- cut --

f_test <- function() {
  library(xlsx)
  cat("Loaded packages AFTER loading library")
  print(search())
}

cat("Loaded packages BEFORE function call ")
search()

f_test()

cat("Loaded packages AFTER function call -")
search()

-- cut --

showed that the library "xlsx" was loaded into the global environment and stayed there, although I had expected R to unload the library when leaving the function. Thus conflicts can occur more often.

I had a look into ?library and saw that there is no argument telling R to hold the library in the calling environment.

How can I load libraries locally to the calling functions?

Kind regards

Georg
[R] Reply: Re: Visibility of libraries called from within functions
Hi Duncan,

many thanks for your reply.

Your suggestion of using requireNamespace() together with explicit namespace calling using the "::" operator is what I was looking for:

-- cut --

f_test <- function() {
  requireNamespace("openxlsx")
  cat("Loaded packages AFTER loading library")
  print(search())
  xlsx::read.xlsx(file = "c:/temp/test.xlsx",
                  sheetName = "test")
}

cat("Loaded packages BEFORE function call ")
search()

f_test()

cat("Loaded packages AFTER function call -")
search()

-- cut --

When reading ?requireNamespace I did not really get how R operates behind the scenes.

Using "library" attaches the namespace to the search path. Using "requireNamespace" does not do that.

But how does R find the namespace then? What kind of list or directory does R use to store the namespace and look up the correct functions or methods of this namespace?

Kind regards

Georg

From: Duncan Murdoch
To: g.maub...@weinwolf.de, r-help@r-project.org
Date: 13.10.2016 10:43
Subject: Re: [R] Visibility of libraries called from within functions

On 13/10/2016 4:18 AM, g.maub...@weinwolf.de wrote:
> [...]
>
> How can I load libraries locally to the calling functions?

You can detach at the end of your function, but that's tricky to get right: the package might have been on the search list before your function was called. It's better not to touch the search list at all.

The best solution is to use :: notation to get functions without putting them on the search list. For example, use

xlsx::write.xlsx(data, file)

If you are not sure if your user has xlsx installed, you can use requireNamespace() to check.

Duncan Murdoch
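Duncan's advice can be tried out with any package that is installed but not attached. The sketch below uses the `tools` package, which ships with R, purely as a stand-in for xlsx so that it runs everywhere; `f_test` mirrors the name used in the thread, the rest is illustrative.

```r
f_test <- function(path) {
  # Check availability first; requireNamespace() returns FALSE instead of
  # failing when the package is not installed
  if (!requireNamespace("tools", quietly = TRUE)) {
    stop("Package 'tools' is needed for this function.")
  }
  # Call the function through "::" so the search path stays untouched
  tools::file_ext(path)
}

before <- search()
ext <- f_test("c:/temp/test.xlsx")
after <- search()

# The search path is the same before and after the call
identical(before, after)
```

The same pattern applies to `xlsx::read.xlsx()` or `openxlsx::read.xlsx()`, provided the package named in requireNamespace() matches the prefix used in the call.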
[R] Reply: Re: Reply: Re: Visibility of libraries called from within functions
From: Duncan Murdoch
To: g.maub...@weinwolf.de, r-help@r-project.org
Date: 13.10.2016 12:34
Subject: Re: Reply: Re: [R] Visibility of libraries called from within functions

On 13/10/2016 6:21 AM, g.maub...@weinwolf.de wrote:
> Hi Duncan,
>
> many thanks for your reply.
>
> Your suggestion of using requireNamespace() together with explicit
> namespace calling using the "::" operator is what I was looking for:
>
> -- cut --
>
> f_test <- function() {
>   requireNamespace("openxlsx")
>   cat("Loaded packages AFTER loading library")
>   print(search())
>   xlsx::read.xlsx(file = "c:/temp/test.xlsx",
>                   sheetName = "test")
> }

Not sure if that's a typo in your message or a real error, but you require "openxlsx" and then use "xlsx".

It's a typo!

>
> cat("Loaded packages BEFORE function call ")
> search()
>
> f_test()
>
> cat("Loaded packages AFTER function call -")
> search()
>
> -- cut --
>
> When reading ?requireNamespace I did not really get how R operates behind
> the scenes.
>
> Using "library" attaches the namespace to the search path. Using
> "requireNamespace" does not do that.
>
> But how does R find the namespace then? What kind of list or directory
> used R to to store the namespace and lookup the correct function or
> methods of this namespace?

R has an internal list of packages that are loaded. Functions in them are only visible to user code if the package is *also* on the search list, or if the package name prefix is used with ::.

Can I have a look at this internal list like I can do with search() for packages or ls() for objects?

If xlsx is loaded, xlsx::read.xlsx will just use it; if it is not loaded, the package will be loaded to make the call. So you don't need the requireNamespace call if you can be sure that xlsx will be found. You would normally use its return value (FALSE if the package is not found) to test whether it will be safe to make the xlsx::read.xlsx call.

Got it!

Duncan Murdoch
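On the question of inspecting the internal list: base R's loadedNamespaces() returns the names of all loaded namespaces, whereas search() shows only what is attached. A short sketch, using the `stats4` package (part of every R installation, normally not attached) as an arbitrary example; the variable names are illustrative.

```r
# State before touching the package
was_attached <- "package:stats4" %in% search()

# Referring to an object via "::" loads the namespace if necessary,
# but does not attach it
invisible(stats4::mle)

now_loaded   <- "stats4" %in% loadedNamespaces()
now_attached <- "package:stats4" %in% search()

now_loaded                           # the namespace is in the internal list
identical(was_attached, now_attached)  # the search path did not change
```

sessionInfo() reports the same distinction under "other attached packages" and "loaded via a namespace (and not attached)".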
[R] Reshaping geographic data
Hi All,

I need to reshape an ESRI shape file:

http://arnulf.us/PLZ

and, respectively,

http://www.metaspatial.net/download/plz.tar.gz

I found instructions for T-SQL Server:

https://blog.oraylis.de/2010/05/german-map-spatial-data-for-plz-postal-code-regions/

How can I do this using R?

Kind regards

Georg

-- cut --

Here's my code so far:

download.file(
  url = "http://www.metaspatial.net/download/plz.tar.gz",
  destfile = "C:/temp/plz.tar.gz")

untar(tarfile = "C:/temp/plz.tar.gz", exdir = "C:/temp", compressed = "gzip")
[R] Storing long string with white space in variable
Hi All,

I would like to store a long string with white space in a variable:

-- cut --

# Create README.md
readme <- "---
title: "Your project title here"
author: "Author(s) name(s) here"
date: "Current date here"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = FALSE)
```

# Project Context

# Goals

# Approach

# Reference to main program
´´´{r}
source("main_program.R")
´´´

# Information on used system and configuration
```{r}
cat("Gathering system information ...\n)
sessionInfo()
```
"
cat(readme, file = "README.md")

-- cut --

I am looking for an equivalent to Python's """ """ long string feature.

I searched the web and found this:

http://stackoverflow.com/questions/6329962/split-code-over-multiple-lines-in-an-r-script
https://stat.ethz.ch/pipermail/r-help/2006-October/115358.html

But this is not the solution to the problem.

How can I store long strings with white space in a variable?

Kind regards

Georg

PS: This is a template for a project folder for each project. I would like to create it with an R script instead of distributing it as a template file. This way one needs only the R script to set up a project like this:

#---
# Module: t_setup_project_directory.R
# Author: Georg Maubach
# Date : 2016-10-19
# Update: 2016-10-19
# Description : Setup a directory structure for a new analytics
#               project
# Source System : R 3.3.0 (64 Bit)
# Target System : R 3.3.0 (64 Bit)
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#1-2-3-4-5-6-7--

t_version = "2016-10-19"
t_module_name = "t_setup_project_directory.R"
t_status = "development"

cat(
  paste0(
    "\n",
    t_module_name, " (Version: ", t_version, ", Status: ", t_status, ")", "\n",
    "\n",
    "Copyright (C) Georg Maubach 2016 This software comes with ABSOLUTELY NO WARRANTY.",
    "\n",
    "\n"
  )
)

library(svDialogs)

# If do_test is not defined globally define it here locally by un-commenting it
t_do_test <- FALSE

# [ Function Definition ]
t_setup_project_directory <- function() {
  #-
  # Setup a directory structure for a new analytics
  #
  # Args:
  #   None.
  #
  # Operation:
  #   The user can create or select a directory for the projects files.
  #   The function then places all sub directories in this project
  #   folder.
  #   The function saves a RData file with objects containing the path
  #   to project directory and its sub folders.
  #
  # Returns:
  #   Nothing.
  #
  # Error handling:
  #   None.
  #
  # See also:
  #   ./.
  #-

  # Get and/or create project directory
  v_project_dir <- svDialogs::dlgDir()$res

  # Define names for sub directories
  data          <- "data"             # data to be loaded into or
                                      # saved from R
  documentation <- "documentation"    # explanatory material for results
                                      # (e. g. knitR documents)
  fundamentals  <- "fundamentals"     # background knowledge
  input         <- "data/input"       # input data eventually manually
                                      # revised for import
  meta          <- "data/meta"        # meta data (e. g. lookup tables)
  output        <- "data/output"
  raw           <- "data/raw"         # a copy of all input data never
                                      # touched for safety reasons and
                                      # not read by R
  program       <- "program"          # all scripts and runnable files
  modules       <- "program/modules"  # project specific packages, files
                                      # or functions in separate files as
                                      # well as all other sub routines to
                                      # be sourced or loaded
  results       <- "results"          # container for all resulting data
                                      # in an aggregated form
  graphics      <- "results/graphics"
  tables        <- "results/tables"
  presentations <- "results/presentations"
  temp          <- "temp"

  v_paths_relative <- list(
    project = v_project_dir,
    documentation = documentation,
    fundamentals = fundamentals,
    input = input,
    meta = meta,
    output = output,
    raw = raw,
    program = program,
    modules = modules,
    graphic = graphics,
    table = tables,
    presentation = prese
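Regarding the long-string question in this message: an R string literal may span several lines as it is; what ends the string early in the posted template are the unescaped double quotes inside it. A minimal sketch of one way around this, delimiting the string with single quotes so the inner double quotes need no escaping. The shortened template text and the temporary file location are assumptions made for illustration.

```r
# A multi-line string delimited by single quotes; the double quotes
# inside it are ordinary characters
readme <- '---
title: "Your project title here"
output: html_document
---

# Project Context
'

# Write to a temporary location (the original writes to "README.md")
readme_file <- file.path(tempdir(), "README.md")
cat(readme, file = readme_file)

# Read it back to check that line breaks and quotes survived
readme_lines <- readLines(readme_file)
```

Alternatively the inner quotes can be escaped as `\"`. Since R 4.0.0 there are also raw strings, `r"(...)"`, which come closest to Python's triple quotes, but they were not available in the R 3.3.0 mentioned in the script header.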
[R] openxlsx Error: length of rows and cols must be
Hi All,

when using

-- cut --

number_style <- openxlsx::createStyle(
  numFmt = "COMMA"
)
openxlsx::addStyle(
  wb = xlsx_workbook,
  sheet = "Kundenliste",
  style = number_style,
  rows = 2:nrow(customer_list),
  cols = 4:5
)

-- cut --

I get the error

Error in openxlsx::addStyle(wb = xlsx_workbook, sheet = "Kundenliste", :
  Length of rows and cols must be equal.

The customer_list can be of any arbitrary length due to subgroup definitions. I do not see why the arguments "rows" and "cols" should be of the same length. This would mean that number formatting can only be done for rectangular areas.

What do I need to change to format my numbers in the given area correctly?

Kind regards

Georg
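By default addStyle() pairs rows and cols element by element, which is why their lengths must match. Its `gridExpand` argument asks for every combination of the given rows and columns instead. A sketch under stated assumptions: the `customer_list` data is made up, the sheet is written with a header in row 1 (so the data rows are 2 to nrow + 1), and the openxlsx part only runs if the package is installed.

```r
# Hypothetical stand-in for the customer list
customer_list <- data.frame(
  id = 1:3,
  name = c("A", "B", "C"),
  city = c("X", "Y", "Z"),
  turnover_2015 = c(1126039.81, 45636.44, 14847.41),
  turnover_2016 = c(1194447.5, 88310.53, 18699.68)
)

rows <- 2:(nrow(customer_list) + 1)  # data rows; row 1 holds the header
cols <- 4:5                          # the two numeric columns

# What gridExpand = TRUE amounts to: all row/column combinations
cells <- expand.grid(row = rows, col = cols)

if (requireNamespace("openxlsx", quietly = TRUE)) {
  xlsx_workbook <- openxlsx::createWorkbook()
  openxlsx::addWorksheet(xlsx_workbook, sheetName = "Kundenliste")
  openxlsx::writeData(xlsx_workbook, sheet = "Kundenliste", x = customer_list)

  number_style <- openxlsx::createStyle(numFmt = "COMMA")
  openxlsx::addStyle(
    wb = xlsx_workbook,
    sheet = "Kundenliste",
    style = number_style,
    rows = rows,
    cols = cols,
    gridExpand = TRUE  # style every combination of rows and cols
  )
  openxlsx::saveWorkbook(
    xlsx_workbook,
    file = file.path(tempdir(), "kundenliste.xlsx"),
    overwrite = TRUE
  )
}
```

Note that `2:nrow(customer_list)` in the posted code would leave the last data row unstyled if the sheet has a header row; whether that applies depends on how the data were written.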
[R] Different results when converting a matrix to a data.frame
Hi All,

I build an empty data frame to fill it with values later. I did the following:

-- cut --

> matrix(NA, 2, 2)
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
> data.frame(matrix(NA, 2, 2))
  X1 X2
1 NA NA
2 NA NA
> as.data.frame(matrix(NA, 2, 2))
  V1 V2
1 NA NA
2 NA NA

-- cut --

Why does data.frame deliver different results than as.data.frame with regard to the variable names (V instead of X)?

Kind regards

Georg
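The two functions take different code paths for a matrix without column names and therefore invent different default names. If the names matter, a simple way to avoid depending on either default is to name the columns explicitly. A small sketch; the names "first" and "second" are placeholders.

```r
m <- matrix(NA, 2, 2)

# Default names differ between the two constructors
names_df    <- names(data.frame(m))     # "X1" "X2"
names_as_df <- names(as.data.frame(m))  # "V1" "V2"

# With explicit column names both give the same result
m_named <- m
colnames(m_named) <- c("first", "second")
same_names <- identical(names(data.frame(m_named)),
                        names(as.data.frame(m_named)))
```

Setting `names(df) <- c(...)` after the conversion works equally well.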
[R] for loop is looping only once
Hi All,

I need to execute a loop on variables to compute several KPIs. Unfortunately the for loop is executed only once, for the last KPI given.

The code below illustrates my current solution but is not strictly necessary to spot the problem. It just gives an idea of what I am doing overall. It looks like a lot but isn't if copied and run in RStudio.

The problem occurs in function f_create_kpi_table() in lines 150 to 157:

for (item in length(kpis))  # This loop runs only once!
{
  print(kpis[[item]])
  ds_kpi <- f_compute_kpi(
    years = years,
    kpi = kpis[[item]],
    kpi_base = kpi_bases[[item]])
  print(ds_kpi)

Here is the complete example code with example data:

-- cut --

dataset <- structure(
  list(
    to_2012 = c(85, 822, 891, 700, 386, 127, 938, 381, 871, 254, 793, 0, 934, 217, 163, 755, 607, 794, 477),
    to_2013 = c(289, 0, 963, 243, 608, 47, 0, 941, 998, 775, 326, 0, 0, 470, 248, 439, 212, 0, 0),
    to_2014 = c(0, 0, 71, 0, 0, 434, 0, 282, 0, 0, 405, 0, 0, 642, 0, 0, 0, 47, 299),
    to_2015 = c(705, 134, 659, 0, 609, 807, 783, 0, 0, 304, 141, 500, 0, 0, 764, 790, 851, 0, 802),
    kpi1_2013 = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1),
    kpi1_2014 = c(1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0),
    kpi1_2015 = c(0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0),
    kpi1_2016 = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1),
    kpi2_2013 = c(1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0),
    kpi2_2014 = c(0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1),
    kpi2_2015 = c(1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1),
    kpi2_2016 = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0)
  ),
  .Names = c(
    "to_2012", "to_2013", "to_2014", "to_2015",
    "kpi1_2013", "kpi1_2014", "kpi1_2015", "kpi1_2016",
    "kpi2_2013", "kpi2_2014", "kpi2_2015", "kpi2_2016"
  ),
  row.names = c(NA, 19L),
  class = "data.frame"
)

f_compute_kpi <- function(
  years,
  kpi,
  kpi_base)
{
  print(years)
  print(kpi)
  print(kpi_base)

  ds_result <- data.frame()

  for (year in years) {
    current_year <- year
    previous_year <- year - 1
    result <- sum(dataset[dataset[[paste0(kpi, "_", current_year)]] == 1,
                          paste0(kpi_base, "_", previous_year)],
                  na.rm = TRUE)
    ds_result <- rbind(ds_result, result)
  }
  ds_result <- t(ds_result)
  rownames(ds_result) <- kpi
  colnames(ds_result) <- years
  invisible(ds_result)
}

f_create_kpi_table <- function(
  years,
  kpis,
  kpi_bases)
{
  print(length(kpis))
  #-- Problematic loop --
  for (item in length(kpis))  # This loop runs only once!
  {
    print(kpis[[item]])
    ds_kpi <- f_compute_kpi(
      years = years,
      kpi = kpis[[item]],
      kpi_base = kpi_bases[[item]])
    print(ds_kpi)
  }
  # This for loop is executed only once for kpi2 instead of
  # as many times as given kpis in length(kpis), i. e.
  # kpi1 AND kpi2.
  # Why?
  # What do I do wrong?
}

-- cut --

What do I need to change to get the loop to work correctly and loop over two elements instead of one when calling the function

f_create_kpi_table(years = 2013:2016,
                   kpis = c("kpi1", "kpi2"),
                   kpi_bases = c("to", "to"))

Kind regards

Georg
[R] Reply: Re: for loop is looping only once [SOLVED]
Hi Ulrik,

oh no! What a mistake I made. I simply did not see the error.

Many thanks for helping me.

Kind regards

Georg

From: Ulrik Stervbo
To: g.maub...@weinwolf.de, r-help@r-project.org
Date: 17.11.2016 12:24
Subject: Re: [R] for loop is looping only once

Hi Georg,

Your for loop iterates over just one value. To get it to work as you intend, use

for(item in 1:length(kpis)){}

HTH
Ulrik
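The difference is easy to see in isolation: `length(kpis)` is the single number 2, so the loop body runs once with item = 2, which is why only kpi2 showed up. Ulrik's `1:length(kpis)` fixes that; `seq_along(kpis)` is a common alternative that also behaves sensibly for an empty vector. A small sketch, with the collecting variables introduced here for illustration:

```r
kpis <- c("kpi1", "kpi2")

# Iterates over the single value 2
visited_wrong <- character(0)
for (item in length(kpis)) {
  visited_wrong <- c(visited_wrong, kpis[[item]])
}

# Iterates over 1, 2
visited_right <- character(0)
for (item in seq_along(kpis)) {
  visited_right <- c(visited_right, kpis[[item]])
}

# Edge case: with an empty vector 1:length(x) is c(1, 0), so the loop
# still runs twice, while seq_along(x) yields no iterations at all
empty <- character(0)
n_colon <- 0
for (i in 1:length(empty)) n_colon <- n_colon + 1
n_seq <- 0
for (i in seq_along(empty)) n_seq <- n_seq + 1
```

In f_create_kpi_table() only the loop header needs to change; the body can stay as posted.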
[R] openxlsx: No Formatting of Numbers (TEXT ONLY)
Hi All, Dear Readers,

I am using openxlsx to export data to Microsoft Excel 2013, 32-Bit, German Version:

--- snip ---

library("openxlsx")

dataset <- structure(
  list(
    a = c(1126039.81, 45636.44, 14847.41),
    b = c(1194447.5, 88310.53, 18699.68),
    c = c(1560307.73, 34203.73, 24755.99),
    d = c(1068790.67, 67581.86, 12378.55)
  ),
  .Names = c("a", "b", "c", "d"),
  row.names = c(NA, 3L),
  class = "data.frame"
)

xlsx_workbook <- openxlsx::createWorkbook()

openxlsx::addWorksheet(
  wb = xlsx_workbook,
  sheetName = "Numbers")

openxlsx::writeData(
  wb = xlsx_workbook,
  sheet = "Numbers",
  x = dataset,
  rowNames = TRUE,
  colNames = TRUE,
  startRow = 2,
  startCol = 2,
  borders = c("surrounding")
)

myStyle <- openxlsx::createStyle(numFmt = "###.###.##0")

openxlsx::addStyle(wb = xlsx_workbook,
                   sheet = "Numbers",
                   style = myStyle,
                   rows = 1:1,
                   cols = 10:10,
                   gridExpand = TRUE,
                   stack = TRUE)

openxlsx::saveWorkbook(
  wb = xlsx_workbook,
  file = "C:/temp/openxlsx_example.xlsx",
  overwrite = TRUE
)

--- snip ---

The problem with this is that it does not apply the number formats to the Excel cells on the sheet. Also, sometimes the border of the data on the Excel sheet is deleted. I could not find out yet what the cause of this behaviour is.

My sessionInfo() output is:

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Germany.1252
[2] LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] tools     stats     graphics  grDevices utils
[6] datasets  methods   base

other attached packages:
[1] tidyr_0.5.1    stringr_1.1.0  reshape2_1.4.1
[4] openxlsx_3.0.0 dplyr_0.5.0

loaded via a namespace (and not attached):
[1] lazyeval_0.2.0 plyr_1.8.4     magrittr_1.5
[4] R6_2.2.0       assertthat_0.1 DBI_0.4-1
[7] tibble_1.1     Rcpp_0.12.5    stringi_1.1.1

I do not want to round the numbers in R, because my clients would like to use them as they are in further calculations.

How can I export a data frame to Excel, print a border around the complete table/dataset (not the single cells) and format the numbers like 123.456.789 (thousands delimiter dot ".", all numbers without decimals)?

Kind regards

Georg
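One thing that stands out in the posted code is that the style is applied to row 1, column 10, while the data were written starting at row 2, column 2. A sketch of applying the style to the cells that actually hold the numbers follows. Two assumptions are made here and should be checked: that the format code "#,##0" (Excel's locale-independent notation) is displayed by a German Excel with "." as the thousands separator, and that `stack = TRUE` keeps the border written by writeData(). The helper names `start_row`, `start_col`, `data_rows` and `data_cols` are introduced for illustration.

```r
dataset <- data.frame(
  a = c(1126039.81, 45636.44, 14847.41),
  b = c(1194447.5, 88310.53, 18699.68),
  c = c(1560307.73, 34203.73, 24755.99),
  d = c(1068790.67, 67581.86, 12378.55)
)

start_row <- 2
start_col <- 2

# With colNames = TRUE the header occupies start_row, and with
# rowNames = TRUE the row names occupy start_col
data_rows <- (start_row + 1):(start_row + nrow(dataset))
data_cols <- (start_col + 1):(start_col + ncol(dataset))

if (requireNamespace("openxlsx", quietly = TRUE)) {
  xlsx_workbook <- openxlsx::createWorkbook()
  openxlsx::addWorksheet(xlsx_workbook, sheetName = "Numbers")
  openxlsx::writeData(
    wb = xlsx_workbook, sheet = "Numbers", x = dataset,
    rowNames = TRUE, colNames = TRUE,
    startRow = start_row, startCol = start_col,
    borders = "surrounding"
  )

  # Assumption: "#,##0" shows as 1.126.040 in a German Excel; the stored
  # values keep their decimals, only the display changes
  my_style <- openxlsx::createStyle(numFmt = "#,##0")
  openxlsx::addStyle(
    wb = xlsx_workbook, sheet = "Numbers", style = my_style,
    rows = data_rows, cols = data_cols,
    gridExpand = TRUE, stack = TRUE
  )

  openxlsx::saveWorkbook(
    xlsx_workbook,
    file = file.path(tempdir(), "openxlsx_example.xlsx"),
    overwrite = TRUE
  )
}
```

Since only the display format is set, the clients still get the unrounded values for their own calculations.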