Re: [R] Archive format

2017-04-08 Thread G . Maubach
Hi Joe,

I have read your question with great interest, and I am a little astonished to
read about your project. There is a large national institute in Germany called
GESIS
(https://de.wikipedia.org/wiki/GESIS_%E2%80%93_Leibniz-Institut_f%C3%BCr_Sozialwissenschaften)
which has been doing the job you are trying to set up since 1986. You could try
to exchange ideas with them.

Your subject is very complex with regard to reproducible research. You might 
want to have a look at

(1) https://cran.r-project.org/web/views/ReproducibleResearch.html
(2) Gandrud, Christopher: Reproducible Research with R and R Studio 
(https://www.amazon.com/Reproducible-Research-Studio-Second-Chapman/dp/1498715370)
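
As a small illustration of the plain-text recommendation mentioned in the quoted question below, a data frame can be written to CSV and read back from R. This is only a sketch: the object `survey`, its columns and the temporary files are assumptions made up for the example and are not part of Joe's project.

```r
# Hypothetical example data; all names are illustrative only
survey <- data.frame(
  id    = 1:3,
  group = c("a", "b", "a"),
  score = c(2.5, 3.0, 4.5),
  stringsAsFactors = FALSE
)

# Write a plain-text (CSV) copy; a temporary file stands in for the archive
csv_file <- tempfile(fileext = ".csv")
write.csv(survey, csv_file, row.names = FALSE, fileEncoding = "UTF-8")

# Reading the file back should reproduce the data
restored <- read.csv(csv_file, stringsAsFactors = FALSE,
                     fileEncoding = "UTF-8")
print(all.equal(survey, restored))

# Record the R version and loaded packages alongside the data
info_file <- tempfile(fileext = ".txt")
writeLines(capture.output(sessionInfo()), info_file)
```

Keeping the session information next to the data is one simple way to document which R version and packages produced the file.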

Kind regards

Georg

> Sent: Wednesday, 29 March 2017 at 10:44
> From: "Joe Gain" 
> To: R-help@r-project.org
> Cc: bwfdm-i...@lists.kit.edu
> Subject: [R] Archive format
>
> Hello,
> 
> we are collecting information on the subject of research data management 
> in German on the web platform:
> 
> www.forschungsdaten.info
> 
> One of the topics, which we are writing about, is how to *archive* data. 
> Unfortunately, none of us in the project is an expert with respect to R 
> and so I would like to ask the list, what they recommend? A related 
> question is to do with the sharing of data. We have already asked some 
> academics, who have basically replied that they don't really know other 
> than to strongly recommend a plain text format.
> 
> We would also like to know whether members of the list recommend converting 
> formats from commercial software such as S-Plus, Terr, SPSS etc. to an 
> R-compatible format for long-term archiving. Are there any general 
> rules and best practices when it comes to archiving (and sharing) 
> statistical data and statistical programs?
> 
> Any comments would be much appreciated!
> Joe
> 
> -- 
> B 1003
> Kommunikations-, Informations-, Medienzentrum (KIM)
> Universitaet Konstanz
> 
> t: ++49-7531-883234
> e: joe.g...@uni-konstanz.de
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


[R] Reply: Re: Reply: Re: Way to Plot Multiple Variables and Change Color

2017-04-10 Thread G . Maubach
Hi Ulrik,

many thanks for your reply. I had to take an unplanned break and was not 
in the office during the last two weeks. Thus my late reply.

I followed your advice and converted the variable in argument "fill" to 
factor. Now the color change works:

-- cut --

d_result <- structure(list(
  "variable" = c(
    "Item 1 (ø = 3.3) ", "Item 1 (ø = 3.3) ", "Item 1 (ø = 3.3) ",
    "Item 1 (ø = 3.3) ", "Item 1 (ø = 3.3) ", "Item 1 (ø = 3.3) ",
    "Item 2 (ø = 3.8) ", "Item 2 (ø = 3.8) ", "Item 2 (ø = 3.8) ",
    "Item 2 (ø = 3.8) ", "Item 2 (ø = 3.8) ", "Item 2 (ø = 3.8) ",
    "Item 3 (ø = 3.4) ", "Item 3 (ø = 3.4) ", "Item 3 (ø = 3.4) ",
    "Item 3 (ø = 3.4) ", "Item 3 (ø = 3.4) ", "Item 3 (ø = 3.4) ",
    "Item 4 (ø = 3.4) ", "Item 4 (ø = 3.4) ", "Item 4 (ø = 3.4) ",
    "Item 4 (ø = 3.4) ", "Item 4 (ø = 3.4) ", "Item 4 (ø = 3.4) ",
    "Item 5 (ø = 3.5) ", "Item 5 (ø = 3.5) ", "Item 5 (ø = 3.5) ",
    "Item 5 (ø = 3.5) ", "Item 5 (ø = 3.5) ", "Item 5 (ø = 3.5) ",
    "Item 6 (ø = 3.5) ", "Item 6 (ø = 3.5) ", "Item 6 (ø = 3.5) ",
    "Item 6 (ø = 3.5) ", "Item 6 (ø = 3.5) ", "Item 6 (ø = 3.5) ",
    "Item 7 (ø = 3.4) ", "Item 7 (ø = 3.4) ", "Item 7 (ø = 3.4) ",
    "Item 7 (ø = 3.4) ", "Item 7 (ø = 3.4) ", "Item 7 (ø = 3.4) ",
    "Item 8 (ø = 3.3) ", "Item 8 (ø = 3.3) ", "Item 8 (ø = 3.3) ",
    "Item 8 (ø = 3.3) ", "Item 8 (ø = 3.3) ", "Item 8 (ø = 3.3) "),
  value = structure(
    c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L,
      1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L,
      1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L,
      1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L),
    .Label = c("1 = very satisfied", "2", "3", "4", "5",
               "6 = very dissatified"),
    class = "factor"),
  n = c(14L, 20L, 24L, 14L, 16L, 14L, 9L, 15L, 21L, 20L, 14L, 23L,
        19L, 17L, 16L, 14L, 16L, 20L, 22L, 17L, 15L, 16L, 20L, 12L,
        19L, 15L, 16L, 15L, 18L, 19L, 18L, 15L, 18L, 18L, 16L, 17L,
        17L, 20L, 17L, 17L, 14L, 16L, 16L, 25L, 16L, 17L, 8L, 20L)),
  .Names = c("variable", "value", "n"),
  row.names = c(NA, -48L),
  vars = list("variable"),
  drop = TRUE,
  indices = list(0:5, 6:11, 12:17, 18:23, 24:29, 30:35, 36:41, 42:47),
  group_sizes = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L),
  biggest_group_size = 6L,
  labels = structure(
    list(
      "variable" = structure(
        1:8,
        .Label = c("Item 1 (ø = 3.3) ", "Item 2 (ø = 3.8) ",
                   "Item 3 (ø = 3.4) ", "Item 4 (ø = 3.4) ",
                   "Item 5 (ø = 3.5) ", "Item 6 (ø = 3.5) ",
                   "Item 7 (ø = 3.4) ", "Item 8 (ø = 3.3) "),
        class = "factor")),
    row.names = c(NA, -8L),
    class = "data.frame",
    vars = list("variable"),
    drop = TRUE,
    .Names = "variable"),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"))

library(ggplot2)

ggplot(
  d_result,
  aes(x = variable, y = n, fill = rev(factor(value)))) +
  geom_bar(
    stat = "identity") +
  # note: coord_flip() below replaces this coordinate system
  coord_cartesian(ylim = c(0, 100)) +
  coord_flip() +
  scale_y_continuous(name = "Percent") +
  scale_fill_manual(
    values = rev(
      c(
        "forestgreen", "limegreen",
        "gold", "orange1",
        "tomato3", "darkred"))) +
  ggtitle(
    paste(
      "Question 8: Satisfaction?")) +
  labs(fill = "Rating") +
  scale_x_discrete(
    name = element_blank()) +
  # scale_color_manual(
  #   values = rev(
  #     c(
  #       "forestgreen", "limegreen",
  #       "gold", "orange1",
  #       "tomato3", "darkred"))) +
  geom_text(
    aes(label = n),
    color = "white",
    position = position_stack(vjust = 0.5)) +
  theme_minimal() +
  theme(
    legend.position = "right")

-- cut --

I tried to change the order of the items on the y-axis,  e.g. Item 8 
should be last and Item 1 first. I tried to reverse the order of the items 
within ggplot using rev() 

[R] Reply: Re: Reply: Re: Reply: Re: Way to Plot Multiple Variables and Change Color (SOLVED)

2017-04-11 Thread G . Maubach
Hi David,

many thanks for your answer.

I followed your suggestion and came up with the following code:

-- cut --

ggplot(
  d_result,
  aes(x = variable, y = n, fill = value)) +
  geom_bar(
    stat = "identity") +
  coord_cartesian(ylim = c(0, 100)) +
  coord_flip() +
  scale_y_continuous(name = "Percent") +
  scale_fill_manual(
    values = rev(
      c(
        "forestgreen", "limegreen",
        "gold", "orange1",
        "tomato3", "darkred"))) +
  ggtitle(
    paste(
      "Question 8: Some Text")) +
  labs(fill = "Rating") +
  scale_x_discrete(
    name = element_blank(),
    drop = FALSE) +  # keep factor levels if no value exists
  geom_text(
    aes(label = n),
    color = "white",
    position = position_stack(vjust = 0.5)) +
  theme_minimal() +
  theme(
    legend.position = "right") +
  guides(fill = guide_legend(reverse = TRUE))

-- cut --

In addition to your suggestion I changed "fill = rev(factor(value))" to 
"fill = value" and I added

guides(fill = guide_legend(reverse = TRUE))

to get the legend in the order from 1 .. 6 instead of 6 .. 1.

In my data I added the counts (n) before the mean value in the labels on 
the left-hand side. To me it now looks like a version that conforms to the 
ESOMAR and BVM standards.

Many thanks again for your help.

Kind regards

Georg




From:    David Winsemius 
To:      g.maub...@weinwolf.de, 
Cc:      r-help@r-project.org
Date:    10.04.2017 22:21
Subject: Re: [R] Reply: Re: Reply: Re: Way to Plot Multiple Variables and Change Color




> On Apr 10, 2017, at 1:06 PM, David Winsemius wrote:
> 
> 
>> On Apr 10, 2017, at 7:45 AM, g.maub...@weinwolf.de wrote:
>> 

[R] ggplot2: ..n.. and ..count.. in geom_text

2017-04-18 Thread G . Maubach
Hi All,

I have the following code:

-- cut --

(g03_02_p02 <- ggplot(data = d_kzb_input) +
  geom_bar(
    mapping = aes(x = v03_02_r01, y = round(..prop.. * 100, 0)),
    fill = c_ww_palette["blue"]) +
  scale_y_continuous(limits = c(0, c_y_limit)) +
  theme_classic() +
  # How can I refer to the number of cases for this plot?
  # Is there something like "..n.."?
  ggtitle(paste0("Question 3",
                 "(n = ", <>, ")")) +
  xlab("Orders") +
  ylab("Percent") +
  geom_text(
    # How can I refer to the counts for the labels of the columns?
    aes(label = ..count..),
    color = "white",
    position = position_stack(vjust = 0.5)))

-- cut --

I would like to refer to the internal statistics of the geom_bar():

How can I refer to the number of cases for this plot? Is there something 
like "..n.."?
How can I refer to the counts for the labels of the columns?
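
One possible approach to both questions, sketched under assumptions rather than taken from a reply in this thread: the number of cases can be computed from the data before plotting, and the per-bar counts are available to geom_text() when it uses stat = "count". The data frame `d_demo` and its column `orders` are hypothetical stand-ins for d_kzb_input and v03_02_r01. The sketch uses after_stat(), which requires ggplot2 3.3.0 or later; older versions used the `..count..` notation shown above.

```r
library(ggplot2)

# Hypothetical stand-in for d_kzb_input / v03_02_r01
d_demo <- data.frame(orders = c("1-2", "1-2", "3-5", "6+", "6+", "6+"))

# The number of cases is a property of the data, so compute it up front
n_cases <- sum(!is.na(d_demo$orders))

p <- ggplot(d_demo, aes(x = orders)) +
  # bar height as a percentage of all cases
  geom_bar(aes(y = after_stat(100 * count / sum(count)))) +
  # same statistic for the text layer, labelled with the raw counts
  geom_text(
    aes(y = after_stat(100 * count / sum(count)),
        label = after_stat(count)),
    stat = "count",
    colour = "white",
    position = position_stack(vjust = 0.5)) +
  ggtitle(paste0("Question 3 (n = ", n_cases, ")")) +
  xlab("Orders") +
  ylab("Percent")
```

Printing `p` draws the plot; the title carries the case count and each bar shows its own count at mid-height.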

Kind regards

Georg


[R] Follow-up: RStudio: Place for Storing Options (as plain text)

2017-04-19 Thread G . Maubach
Hi All,

some time ago I asked a question about the places where RStudio stores its 
configuration information. I came across this posting

https://support.rstudio.com/hc/en-us/articles/206382178?version=1.0.136&mode=desktop

explaining RStudio keybindings (predefined and customized). At the end of 
the article is the information that RStudio stores keybindings in

~/.R/rstudio/keybindings/rstudio_commands.json
~/.R/rstudio/keybindings/editor_commands.json

I want to share this with you.

Kind regards

Georg


- Forwarded by Georg Maubach/WWBO/WW/HAW on 19.04.2017 10:10 -

From:    Georg Maubach/WWBO/WW/HAW
To:      R-help mailing list , 
Cc:      Martin Maechler , Jeff Newmiller 
Date:    08.03.2017 08:59
Subject: Follow-up: [R] RStudio: Place for Storing Options (as plain text)



Hi All,

I got a late reply from RStudio Support concerning the question of where 
RStudio stores options and configurations:

-- cut --

The post RStudio Config Files has a new comment. 
. . .
Unfortunately, it's unlikely that we'll be able to provide a programmatic 
R interface in the near future -- the way we lay out and store RStudio's 
client state does not make it as amenable to public consumption as we 
might hope.
That said, you can generally copy everything within that folder to a new 
machine (at the same relative path from the user home directory), and 
expect preferences to be respected + restored as you might expect.
. . .
--cut --

The result of the discussion is:

We can copy the complete directory in which RStudio stores its options and 
configurations, located under

%localappdata%\RStudio-Desktop or 
C:\Users\\AppData\Local\RStudio-Desktop

to a new installation of RStudio.

A programmatic approach to edit RStudio options and configurations is not 
possible due to design decisions.

The purpose of the initial question was to find a way to save RStudio 
options and configurations, e.g. on git/GitHub or similar. This is 
possible by initialising the directory given above as a git repository (or 
similar).
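
Copying the directory can also be scripted from R itself. The following is a minimal sketch, not something RStudio Support suggested: the helper `backup_dir()` and the demo paths are hypothetical, and the example runs on a throwaway directory rather than on the real RStudio-Desktop folder.

```r
# Hypothetical helper: copy a settings directory into a backup location
backup_dir <- function(src, dest) {
  stopifnot(dir.exists(src))
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  # with recursive = TRUE, file.copy() copies `src` itself into `dest`
  ok <- file.copy(from = src, to = dest, recursive = TRUE, copy.date = TRUE)
  invisible(all(ok))
}

# Demonstration on a throwaway directory instead of the real RStudio folder
src <- file.path(tempdir(), "RStudio-Desktop-demo")
dir.create(src, showWarnings = FALSE)
writeLines("demo", file.path(src, "settings.txt"))

dest <- file.path(tempdir(), "backup-demo")
backup_dir(src, dest)
print(list.files(dest, recursive = TRUE))
```

For the real directory, `src` would point at the RStudio-Desktop folder named above, and `dest` could be a folder that is under version control.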

An open question is what happens if a new RStudio release changes the 
options and configurations. Whether the stored directory can still be used 
as a whole would need additional clarification for each new version.

Kind regards

Georg




From:    Martin Maechler 
To:      
Cc:      ,
Date:    23.02.2017 08:37
Subject: Re: [R] RStudio: Place for Storing Options



> Jeff Newmiller 
> on Sat, 11 Feb 2017 08:09:36 -0800 writes:

> For the record, then, Google listened to my incantation of
> "rstudio configuration file" and the second result was:

> https://support.rstudio.com/hc/en-us/articles/200534577-Resetting-RStudio-Desktop-s-State


> RStudio Desktop is also open source, so you can download
> the source code and look at the operating-system-specific
> bits (for "where") if the above link goes out of date or
> disappears.

Thanks a lot, Jeff!

And for the archives:  On reasonable OS's,  the hidden
directory/folder containing all the info is
  ~/.rstudio-desktop/
and if "things are broken" the recommendation is to rename that
   mv ~/.rstudio-desktop  ~/backup-rstudio-desktop
and (zip and) send along with your e-mail to the experts for diagnosis.


> On Thu, 9 Feb 2017, Martin Maechler wrote:

>>> Ulrik Stervbo on Thu, 9 Feb 2017 14:37:57 + writes:
>>
>> > Hi Georg,
>> > maybe someone here knows, but I think you are more likely to get
>> > answers to Rstudio related questions with RStudio support:
>> > https://support.rstudio.com/hc/en-us
>>
>> > Best,
>> > Ulrik
>>
>> Indeed, thank you, Ulrik.
>>
>> In this special case, however, I'm quite sure many
>> readers of R-help would be interested in the answer; so
>> once you receive an answer, please post it (or a link to
>> a public URL with it) here on R-help, thank you in
>> advance.
>>
>> We would like to be able to *save*, or sometimes *set* /
>> *reset* such options "in a scripted manner", e.g. for
>> controlled exam sessions.
>>
>> Martin Maechler, ETH Zurich
>>
>> > On Thu, 9 Feb 2017 at 12:35 wrote:
>>
>> >> Hi All,
>> >> I would like to make a backup of my RStudio IDE options I configure
>> >> using "Tools/Global Options" from the menu bar. Searching the web
>> >> did not reveal anything.
>> >>
>> >> Can you tell me where RStudio IDE does store its configuration?
>> >>
>> >> Kind regards
>> >> Georg

[R] Multiple-Response Analysis: Cleaning of Duplicate Codes

2017-04-25 Thread G . Maubach
Hi All,

in my current project I am working with multiple-response questions 
(MRSets):

-- Coding --
100 Main Code 1
110 Sub Code 1.1
120 Sub Code 1.2
130 Sub Code 1.3

200 Main Code 2
210 Sub Code 2.1
220 Sub Code 2.2
230 Sub Code 2.3

300 Main Code 3
310 Sub Code 3.1
320 Sub Code 3.2

The coding for the variables is too detailed. Therefore I have recoded all 
sub codes to the respective main code, e.g. all 110, 120 and 130 to 100, 
all 210, 220 and 230 to 200 and all 310, 320 and 330 to 300.

Now it happens that some respondents end up with the same main code several 
times. If respondent 1 was coded with 120 and 130, the values after recoding 
are 100 and 100. Counting both would weight this respondent's answer by a 
factor of 2. This is not my aim. I would like to count the 100 only once for 
the respective respondent.
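
The double counting described above can be reproduced with a few lines of base R. This is only a sketch: the matrix `codes` is made up, and deriving the main code as the hundreds part of the sub code is an assumption that happens to match the ranges in the script below, not the recoding used there.

```r
# Hypothetical answers of three respondents (rows), up to three codes each
codes <- rbind(
  c(120, 130,  NA),
  c(110,  NA,  NA),
  c(160, 170, 410)
)

# Assumption: the main code is the hundreds part of the sub code
main <- floor(codes / 100) * 100

# Naive count: respondents 1 and 3 each contribute code 100 twice
naive <- table(main)

# Count each main code at most once per respondent
once <- table(unlist(apply(main, 1, function(r) unique(r[!is.na(r)]))))

print(naive)
print(once)
```

In the naive table code 100 is counted five times, although only three respondents mentioned it; the second table counts it three times.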

Here is my script so far:

# -- cut --

library(dplyr)  # select(), starts_with(), %>%
library(expss)  # recode(), %thru%; attached last so its recode() masks dplyr's

d_sample <-
  structure(
    list(
      c05_01 = c(110, 110, 130, 110, 110, 110, 110, 110, 110, 110, 110, 999,
                 110, 495, 160, 110, 410),
      c05_02 = c(NA, NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA, NA,
                 170, NA, 130),
      c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410,
                 NA, NA, NA, NA, NA, NA, NA),
      c05_04 = rep(NA_real_, 17),
      c05_05 = rep(NA_real_, 17)
    ),
    .Names = c("c05_01", "c05_02", "c05_03", "c05_04", "c05_05"),
    row.names = c("1", "2", "3", "4", "5", "10", "11", "12", "13", "14",
                  "15", "20", "21", "22", "23", "24", "25"),
    class = "data.frame"
  )

# Recode every sub code to its main code
c05_xx_r01 <- d_sample %>%
  select(starts_with("c05_")) %>%
  recode(c(
    110 %thru% 195 ~ 100,
    210 %thru% 295 ~ 200,
    310 %thru% 395 ~ 300,
    410 %thru% 495 ~ 400,
    510 %thru% 595 ~ 500,
    810 %thru% 895 ~ 800,
    910 %thru% 999 ~ 900))
names(c05_xx_r01) <- paste0("c05_0", 1:5, "_r01")
d_sample <- cbind(d_sample, c05_xx_r01)

# -- cut --

I would like to eliminate all duplicate codes, e.g. reduce 100 and 100 for 
the respondents in rows 3, 6, 13, 14 and 15 to a single 100:

# -- cut --
d_sample_1 <-
  structure(
list(
  c05_01 = c(
110,
110,
130,
110,
110,
110,
110,
110,
110,
110,
110,
999,
110,
495,
160,
110,
410
  ),
  c05_02 = c(NA,
 NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA, NA, 
170,
 NA, 130),
  c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410,
 NA, NA, NA, NA, NA, NA, NA),
  c05_04 = c(
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_
  ),
  c05_05 = c(
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_
  ),
  c05_01_r01 = c(
100,
100,
100,
100,
100,
100,
100,
100,
100,
100,
100,
900,
100,
400,
100,
100,
400
  ),
  c05_02_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA,
 NA, NA, NA, NA, NA, NA, NA, NA, 100),
  c05_03_r01 = c(NA, NA,
 NA, NA, NA, NA, NA, NA, NA, 400, NA, NA, NA, NA, NA, 
NA, NA),
  c05_04_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
 NA, NA, NA, NA, NA, NA),
  c05_05_r01 = c(NA, NA, NA, NA, NA,
 NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
),
.Names = c(
  "c05_01",
  "c05_02",
  "c05_03",
  "c05_04",
  "c05_05",
  "c05_01_r01",
  "c05_02_r01",
  "c05_03_r01",
  "c05_04_r01

[R] Reply: Re: Multiple-Response Analysis: Cleaning of Duplicate Codes (SOLVED)

2017-04-26 Thread G . Maubach
Hi Bert,

many thanks for your reply. I appreciate your help a lot.

I would like to do the operation (= finding the duplicates) row-wise.

During the night a solution showed up in my dreams :) Instead of using 
duplicated() to flag and filter the values I could use unique() with 
the same result. I tested:

# -- cut --

apply(X = c05_xx_r01, MARGIN = 1, unique)

# -- cut --

This finds the unique values for each row. That is nice, but it does not meet 
the requirement that I get back a data frame whose number of variables equals 
either the total number of unique values in the complete data.frame/matrix or 
the number of variables in the original data.frame.

The above operation returns a list instead of a data.frame because the number 
of resulting values varies from 1 to 7.
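
A base-R alternative, sketched here under assumptions and not the solution adopted in this thread, is to pad each row's unique values with NA so that every row keeps the original number of columns. The data frame `recoded` and the helper `dedup_row()` are hypothetical.

```r
# Hypothetical recoded answers; rows are respondents
recoded <- data.frame(
  r01 = c(100, 100, 400),
  r02 = c(100,  NA, 100),
  r03 = c( NA,  NA, 400)
)

# Keep the first occurrence of each code per row, pad the rest with NA
dedup_row <- function(r) {
  u <- unique(r[!is.na(r)])
  c(u, rep(NA, length(r) - length(u)))
}

# apply() returns one column per input row, hence the transpose
deduped <- as.data.frame(t(apply(recoded, 1, dedup_row)))
names(deduped) <- names(recoded)
print(deduped)
```

Because every row is padded to the same length, apply() returns a matrix rather than a list, and the values stay numeric.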

I searched the web for a solution and found:

http://stackoverflow.com/questions/15753091/convert-mixed-length-named-list-to-data-frame

The complete solution would then look like:

# -- cut --

library(stringi)
library(tidyverse)
my_list <- apply(c05_xx_r01, MARGIN = 1, unique)
# stri_list2matrix() pads the shorter list elements with NA
my_tibble <- as_tibble(stringi::stri_list2matrix(my_list, byrow = TRUE))
# DONE !

# -- cut --

All-in-all thanks again for your help.

Kind regards

Georg

P.S.: I had a look at ?unique. The statement "unique(c05_xx_r01, MARGIN = 
1)" does not do the job, because it looks for unique combinations of values 
across all columns, which is not the desired outcome.
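
The difference between the two operations can be seen on a tiny example. This is an illustrative sketch with a made-up data frame `d`: unique() applied to a data frame drops duplicated rows, while apply() with unique works within each row.

```r
# Hypothetical data: rows 1 and 2 are identical, row 1 repeats the value 100
d <- data.frame(a = c(100, 100, 400), b = c(100, 100, 100))

# unique() on the data frame removes the duplicated row 2,
# but row 1 still contains 100 twice
rows_kept <- unique(d)
print(rows_kept)

# Row-wise unique values; the lengths differ, so a list is returned
per_row <- apply(d, 1, unique)
print(per_row)
```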




From:    Bert Gunter 
To:      g.maub...@weinwolf.de, 
Cc:      R-help 
Date:    25.04.2017 19:10
Subject: Re: [R] Multiple-Response Analysis: Cleaning of Duplicate Codes



If I understand you correctly, one way is:

> z <- rep(LETTERS[1:3],4)
> z
 [1] "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C"
> z[!duplicated(z)]
[1] "A" "B" "C"


?duplicated

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Apr 25, 2017 at 9:36 AM,   wrote:
[R] Factors and Alternatives

2017-05-09 Thread G . Maubach
Hi All,

I am using factors in a study for the social sciences.

I discovered the following:

-- cut --

library(dplyr)

test1 <- c(rep(1, 4), rep(0, 6))
d_test1 <- data.frame(test1)

test2 <- factor(test1)
d_test2 <- data.frame(test2)

test3 <- factor(test1,
                levels = c(0, 1),
                labels = c("WITHOUT Contact", "WITH Contact"))
d_test3 <- data.frame(test3)

d_test1 %>% filter(test1 == 0)  # works OK
d_test2 %>% filter(test2 == 0)  # works OK
d_test3 %>% filter(test3 == 0)  # does not work, why?

myf <- function(ds) {
  print(levels(ds$test3))
  print(labels(ds$test3))
  print(as.numeric(ds$test3))
  print(as.character(ds$test3))
}

# This shows that it is not possible to access the original
# values which were the basis to build the factor:
myf(d_test3)

-- cut --

Why is it not possible to use a factor with labels for filtering with the 
original values?
Is there a data structure that works like a factor but gives also access 
to the original values?

Kind regards

Georg


[R] Reply: Re: Factors and Alternatives

2017-05-09 Thread G . Maubach
Hi Bob,

many thanks for your reply.

I have read the documentation. In my current project I use "item 
batteries" for dimensions of touchpoints which are rated by our customers. 
I wrote functions to analyse them. If I create a factor before filtering 
and analysing, I lose the original values of the variable. If I use the 
original variable for filtering and analysis, it might happen that for some 
dimensions certain values were never selected. This means they are not NA, 
but none of the respondents chose "4", for instance, on a scale from 1 to 6. 
As a result, creating a factor from the analysed data with the complete 
scale (1:6) fails due to the different vector lengths (number of unique 
values remaining in the analysis vs. number of values in the scale). As I 
have a function doing the analysis, I am looking for a way to make it robust 
to such circumstances so that I can use it to analyse all "item batteries". 
Hence my question. I believe my findings are not odd. Maybe there is a way of 
dealing with this kind of problem in R, and I am eager to learn how it can 
be solved.
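
One way this situation might be handled, given here as a sketch under assumptions rather than as an answer from the list, is to pass the complete scale to the `levels` argument of factor(), so that values nobody selected remain as empty levels. The vector `ratings` is made up; the labels follow the rating scale used earlier in this archive.

```r
# Hypothetical ratings on a 1..6 scale in which nobody chose 4
ratings <- c(1, 2, 2, 3, 5, 6, 6)

# Naming the full scale in `levels` keeps the unused value as an empty level
f <- factor(ratings, levels = 1:6,
            labels = c("1 = very satisfied", "2", "3", "4", "5",
                       "6 = very dissatisfied"))

# table() reports a zero count for the level nobody selected
counts <- table(f)
print(counts)

# The original values can be recovered from the level positions,
# because the levels were given as 1:6 in scale order
original <- (1:6)[as.integer(f)]
print(original)
```

Since the labels are attached to all six levels up front, the length of `labels` always matches the length of `levels`, whichever values occur in the data.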

What would you suggest?

Kind regards

Georg




From:    "Bob O'Hara" 
To:      g.maub...@weinwolf.de, 
Cc:      r-help 
Date:    09.05.2017 12:26
Subject: Re: [R] Factors and Alternatives



That's easy! First
> str(test3)
 Factor w/ 2 levels "WITHOUT Contact",..: 2 2 2 2 1 1 1 1 1 1

tells you that the internal values are 1 and 2, and the labels are
"WITHOUT Contact" and "WITH Contact". If you read the help page for
factor() you'll see this:

levels: an optional vector of the values (as character strings) that
  ‘x’ might have taken.  The default is the unique set of
  values taken by ‘as.character(x)’, sorted into increasing
  order _of ‘x’_.  Note that this set can be specified as
  smaller than ‘sort(unique(x))’.

  labels: _either_ an optional character vector of (unique) labels for
  the levels (in the same order as ‘levels’ after removing
  those in ‘exclude’), _or_ a character string of length 1.

So, when you create test3 you say that test can take values 0 and 1,
and these should be labelled as "WITHOUT Contact" and "WITH Contact".
So R internally codes "0" as 1 and "1" as 2 (internally R codes
factors as integers, which can be both useful and dangerous), and then
gives them labels "WITHOUT Contact" and "WITH Contact". It now doesn't
care that they were 0 and 1, because you've told it to change the
labels.

If you want to filter by the original values, then don't change the
labels (or at least not until after you've filtered by the original
labels), or convert the filter to the new labels. You're asking for a
data structure with two sets of labels, which sounds odd in general.
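
To make the point about internal codes and labels concrete, here is a minimal sketch using the vector from the original post. The lookup vector `orig_values` is a hypothetical helper, not something factor() provides; it is just one possible way of getting back to the original values after labelling.

```r
# Same vector as in the original post: four 1s followed by six 0s.
test1 <- c(rep(1, 4), rep(0, 6))
test3 <- factor(test1,
                levels = c(0, 1),
                labels = c("WITHOUT Contact", "WITH Contact"))

as.integer(test3)   # internal codes: 2 2 2 2 1 1 1 1 1 1
levels(test3)       # "WITHOUT Contact" "WITH Contact"

# Filter by label, because the labels are now the values of the factor.
without <- test3 == "WITHOUT Contact"

# Hypothetical lookup that maps each label back to its original value.
orig_values <- c("WITHOUT Contact" = 0, "WITH Contact" = 1)
recovered <- unname(orig_values[as.character(test3)])
```

Keeping the lookup next to the factor definition is a design choice: the mapping is written down once, and both directions (value to label, label to value) stay in sync.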

Bob

On 9 May 2017 at 12:12,   wrote:
> Hi All,
>
> I am using factors in a study for the social sciences.
>
> I discovered the following:
>
> -- cut --
>
> library(dplyr)
>
> test1 <- c(rep(1, 4), rep(0, 6))
> d_test1 <- data.frame(test1)
>
> test2 <- factor(test1)
> d_test2 <- data.frame(test2)
>
> test3 <- factor(test1,
> levels = c(0, 1),
> labels = c("WITHOUT Contact", "WITH Contact"))
> d_test3 <- data.frame(test3)
>
> d_test1 %>% filter(test1 == 0)  # works OK
> d_test2 %>% filter(test2 == 0)  # works OK
> d_test3 %>% filter(test3 == 0)  # does not work, why?
>
> myf <- function(ds) {
>   print(levels(ds$test3))
>   print(labels(ds$test3))
>   print(as.numeric(ds$test3))
>   print(as.character(ds$test3))
> }
>
> # This shows that it is not possible to access the original
> # values which were the basis to build the factor:
> myf(d_test3)
>
> -- cut --
>
> Why is it not possible to use a factor with labels for filtering with the
> original values?
> Is there a data structure that works like a factor but also gives access
> to the original values?
>
> Kind regards
>
> Georg
>



-- 
Bob O'Hara
NOTE NEW ADDRESS!!!
Institutt for matematiske fag
NTNU
7491 Trondheim
Norway

Mobile: +49 1515 888 5440
Journal of Negative Results - EEB: www.jnr-eeb.org


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Antwort: RE: Antwort: Re: Factors and Alternatives (SOLVED)

2017-05-09 Thread G . Maubach
Hi David,
Hi Bob,

many thanks for your help.

Your solution, namely to use all levels instead of just the ones found in 
the data, helped.

The original code looked like this:

-- cut --

c_v10_val_labs <- c(
  "1 = sehr gut",
  "2", "3", "4", "5",
  "6 = sehr schlecht"
)

# where c_v10_val_labs is handed over to my function as "val_labs".

  ds_results$value <- factor(ds_results$value,
                             levels = sort(unique(ds_results$value)),  # old code
                             labels = sort(unique(val_labs)))

-- cut --

With your solution I now write

-- cut --

  ds_results$value <- factor(ds_results$value,
                             levels = seq_along(val_labs),  # new code 1st version
                             labels = sort(unique(val_labs)))

-- cut --

This builds a factor with all factor levels even if a value of the 
factor is not present (not NA, it just does not occur in the data, i.e. 
it was not stated by any respondent).

In Zumel's book "Practical Data Science with R" (
https://www.amazon.de/Practical-Data-Science-Nina-Zumel/dp/1617291560), 
Shelter Island: Manning, 2014, p. 23-24, Listing 2-5, a mapping using 
subscripts is described:

-- cut --

mapping <- list(
'A40'='car (new)',
'A41'='car (used)',
'A42'='furniture/equipment',
'A43'='radio/television',
'A44'='domestic appliances',
...
)

for(i in 1:(dim(d))[2]) {
if(class(d[,i])=='character') {
d[,i] <- as.factor(as.character(mapping[d[,i]]))
}
}

-- cut --

Simply stated, this would mean:

-- cut --

val_labs <- list(
  "1" = "1 = sehr gut",
  "2" = "2",
  "3" = "3",
  "4" = "4",
  "5" = "5",
  "6" = "6 = sehr schlecht"
)

set.seed(12345)
answers = c(sample(1:5, 10, replace = TRUE))

test <- factor(unlist(val_labs[answers]))

# or just

val_labs <- c(
  "1 = sehr gut",
  "2",
  "3",
  "4",
  "5",
  "6 = sehr schlecht"
)

set.seed(12345)
answers = c(sample(1:5, 10, replace = TRUE))

test <- val_labs[answers]

-- cut --

Adapting this to my code would give:

-- cut --

  ds_results$value <- factor(ds_results$value,
                             levels = sort(unique(ds_results$value)),
                             labels = val_labs[sort(unique(ds_results$value))])  # new code 2nd version

-- cut --

This results in a factor that has only as many levels as there are unique 
values in the results.

Both solutions work. Which version is best depends on the overall process 
and the purpose of the code. I am documenting all this for readers who 
later consult the list archives.
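
For archive readers, the difference between the two versions can be sketched side by side. The ratings in `value` are made up for illustration (nobody chose 4), and `val_labs` is passed as-is rather than through sort(unique()), which assumes the labels are already in scale order.

```r
val_labs <- c("1 = sehr gut", "2", "3", "4", "5", "6 = sehr schlecht")
value <- c(1, 2, 2, 3, 5, 6, 6)   # made-up ratings; nobody chose 4

# Version 1: declare the complete scale, so the unused level "4" is kept.
f_all <- factor(value, levels = seq_along(val_labs), labels = val_labs)

# Version 2: only the observed values become levels.
observed <- sort(unique(value))
f_obs <- factor(value, levels = observed, labels = val_labs[observed])

table(f_all)   # includes a zero count for "4"
table(f_obs)   # has no entry for "4"
```

Version 1 is probably the safer default when several item batteries are to be compared, since every factor then has the same levels in the same order.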

Using your version and running my code reveals that ggplot runs into 
difficulties, because the legend lacks values and the sequence and coloring 
of the legend are wrong. But that's another story.

Many thanks again for your help.

Kind regards

Georg




From:   David L Carlson 
To:     "g.maub...@weinwolf.de" , "Bob O'Hara" , 
Cc:     r-help 
Date:   09.05.2017 14:37
Subject: RE: [R] Antwort: Re:  Factors and Alternatives



I'm not sure I understand your question, but you can easily include all 
possible answers when you create the factor by using the levels= argument 
as Bob pointed out. Here is an example of values that range from 1 to 6, 
but value 3 is not represented. Notice that a factor level 3 is created 
even though it does not appear in the data:

> set.seed(42)
> x <- sample.int(6, 10, replace=TRUE)
> table(x)
x
1 2 4 5 6 
1 1 3 3 2 
> y <- factor(x, levels=1:6)
> y
 [1] 6 6 2 5 4 4 5 1 4 5
Levels: 1 2 3 4 5 6

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

From:   "Bob O'Hara" 
To:     g.maub...@weinwolf.de, 
Cc:     r-help 
Date:   09.05.2017 13:58
Subject: Re: Re: [R] Factors and Alternatives



For the problem you state, would it be enough to explicitly define your 
levels?

fac <- rep(c("a", "b", "d"), each=4)
fac.f <- factor(fac, levels=c("a", "b", "c", "d"))
table(fac.f)

# but be warned...
fac.f2 <- factor(fac.f)
table(fac.f2)

This has the advantage that the code explicitly documents what the
possible values are, so if something goes wrong down-stream, you know
it is a real problem (well, unless you have some type conversions
screwing things up). You might also want to do some defensive
programming, and put some checks in the code, to make sure your
factors have the right number of levels.
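
The defensive check suggested above might look roughly like this. `make_scale_factor` is a hypothetical helper name, and the exact checks are an assumption about what "right number of levels" should mean in a given project.

```r
# Hypothetical helper: build a factor and fail loudly on unexpected values.
make_scale_factor <- function(x, levels, labels = levels) {
  stopifnot(length(levels) == length(labels))
  unexpected <- setdiff(unique(x[!is.na(x)]), levels)
  if (length(unexpected) > 0) {
    stop("values outside the declared scale: ",
         paste(unexpected, collapse = ", "))
  }
  f <- factor(x, levels = levels, labels = labels)
  stopifnot(nlevels(f) == length(levels))
  f
}

fac <- rep(c("a", "b", "d"), each = 4)
fac.f <- make_scale_factor(fac, levels = c("a", "b", "c", "d"))
table(fac.f)             # "c" is reported with a count of 0

# Re-wrapping in factor() silently drops the unused level "c".
nlevels(factor(fac.f))   # 3
```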

Bob


[R] Off-Topic: Project Organisation

2017-05-11 Thread G . Maubach
Hi All,

this post is somewhat off-topic because it deals with a meta issue related to 
project organisation rather than actual R code.

I have updated my blog concerning a possible directory and file structure for 
marketing research projects and data mining projects alike:

https://github.com/gmaubach/R-Know-How/wiki/R-Blog

There I condensed best practices already communicated in articles, books, 
packages and guidelines into a new universal structure. It is meant to serve as a 
template and construction kit which you can use to create the structure that 
suits your project best.

Comments and suggestions are welcome.

Kind regards

Georg



[R] ggplot: Pie Chart with correct labels

2017-05-30 Thread G . Maubach
Hi All,

I would like to do the following pie chart using ggplot from an official 
data source (
http://www.deutscheweine.de/fileadmin/user_upload/Website/Service/Downloads/Statistik_2016-2017-neu.pdf
, Tab 8, Page 14):

-- cut --

cat("# weinimport_piechart.R\n")

library(dplyr)    # provides %>%, mutate(), arrange(), desc()
library(ggplot2)

# -- Input 

d_wine_import_DE <- structure(list(
  Land = structure(1:24, .Label = c(
    "Italien", "Frankreich", "Spanien", "USA",
    "Südafrika", "Chile", "Österreich", "Australien",
    "Portugal", "Griechenland", "Argentinien", "Neuseeland",
    "Ungarn", "Mazedonien", "Schweiz", "Dänemark",
    "Moldawien", "Türkei", "Belgien/Luxemburg", "Rumänien",
    "Ukraine", "Kroatien", "Israel", "Georgien"), class = "factor"),
  Menge_hl_2015 = c(
    5481000, 2248000, 3824000, 493000, 845000,
    539000, 308000, 446000, 153000, 99000,
    64000, 43000, 123000, 186000, 5000,
    9000, 28000, 7000, 1, 15000,
    4000, 4000, 2000, 2000)),
  .Names = c("Land", "Menge_hl_2015"),
  class = "data.frame", row.names = c(NA, -24L))
names(d_wine_import_DE)

# -- Data -

d_result <- data.frame(
  country = d_wine_import_DE$Land,
  abs = d_wine_import_DE$Menge_hl_2015) %>%
  mutate(rel = round(abs / sum(abs) * 100, 1)) %>%
  dplyr::arrange(desc(abs)) %>%
  dplyr::mutate(rel_labs = paste(rel, "%")) %>%  # rev() does not work
  dplyr::mutate(breaks = cumsum(abs) - (abs / 2))  # rev() does not work

# -- Plot -

d_result %>%
   ggplot() +
   geom_bar(
 aes(x = "", y = abs, fill = country),
 stat = "identity") +
   # %SOURCE%
   # coord_polar(): Wickham: ggplot2, Springer, 2nd Ed., p. 166
   coord_polar(theta = "y", start = 0) +
   guides(
 fill = guide_legend(
   title = "Länder",
   reverse = FALSE)
   ) +
   scale_y_continuous(
 breaks = d_result$breaks,  # simply "breaks" does not work
 labels = d_result$rel_labs,  # simply "rel_labs" does not work
 trans = "reverse"
   ) +
   # %SOURCE%
   # Kassambra: Guide to Create Beautiful Graphics
   # in R, sthda.com, 2nd Ed., 2013, p. 136ff
   theme_minimal() +
   theme(
 panel.border = element_blank(),
 panel.grid = element_blank(),
 axis.title.x = element_blank(),
 axis.title.y = element_blank()
 # axis.text.x = element_text(size = 15)
   ) +
   labs(
 title = paste0("Weinimport nach Deutschland 2015"))

-- cut --

I cannot figure out how to align the labels (values in %) with the 
countries, which are printed in reverse order. Also, the breaks and labels 
need the dataset name, although I thought "breaks" and "rel_labs" would be 
sufficient because of the piping operator.

Can you help me by telling me

1. how to get the order of the labels right?
2. why I need to reference "breaks" and "rel_labs" with the dataset name?
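
A possible way to look at both questions, as a sketch rather than a definitive answer: only expressions inside aes() are evaluated within the plot data, whereas arguments to scale_y_continuous() are ordinary R arguments looked up in the calling environment, and the pipe merely passes d_result as the first argument of ggplot(). Putting the labels into a text layer inside aes() and letting position_stack() place them avoids computing breaks by hand. The data frame `d` below is a three-row excerpt of the figures above, used only for illustration.

```r
library(ggplot2)

# Three-row excerpt of the import figures, for illustration only.
d <- data.frame(
  country = c("Italien", "Frankreich", "Spanien"),
  abs = c(5481000, 2248000, 3824000)
)
d$rel_labs <- paste(round(d$abs / sum(d$abs) * 100, 1), "%")

p <- ggplot(d, aes(x = "", y = abs, fill = country)) +
  geom_col() +
  # The label is mapped inside aes(), so the bare column name works, and
  # position_stack() centres each label in the slice it belongs to.
  geom_text(aes(label = rel_labs), position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y") +
  theme_void()
```

Because bars and labels are stacked by the same grouping, they should stay aligned regardless of the order of the rows or of the factor levels.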

Kind regards

Georg



[R] purrr::pmap does not work

2017-06-07 Thread G . Maubach
Hi All,

I try to do a scatterplot for a bunch of variables. I plot a dependent 
variable against a bunch of independent variables:

-- cut --
graphics::plot(
  v01_r01 ~ v08_01_up11,
  data = dataset,
  xlab = "Dependent",
  ylab = "Independent #1"
)

-- cut --

It is tedious to repeat the statement for all independent variables. I found 
an approach that might serve as an alternative:

-- cut --

library(magrittr)  # provides %>%

mu <- list(5, 10, -3)
sigma <- list(1, 5, 10)
n <- list(1, 3, 5)
fargs <- list(mean = mu, sd = sigma, n = n)
fargs %>%
  purrr::pmap(rnorm) %>%
  str()

-- cut --

I tried to use this for my scatterplot task:

-- cut --

var_battery <- list()  # container for the variable names of each item battery
var_battery$v08 <- paste0("v08_", formatC(1:8, width = 2, format = "d", 
                                          flag = "0"))
v08_var_labs <- paste0("Label_", 1:8)

dataset <- as.data.frame(
  matrix(
data = sample(
  x = 1:11,
  size = 90,
  replace = TRUE),
nrow = 10,
ncol = 9))
names(dataset) <- c("v01_r01", var_battery$v08)

independent <- as.list(dataset$v01_r01)
dependent <- as.list(dataset[var_battery$v08])

fargs <- list(
  x = independent,
  y = dependent,
  ylab = v08_var_labs)

fargs %>% 
  purrr::pmap(
function(d = dataset, xvalue = x, yvalue = y,
 xlab = "Label for x variable",
 ylab = ylab) {
  graphics::plot(
xvalue ~ yvalue,
data = d,
xlab = xlab,
ylab = ylab)
}
  )

-- cut --

The last statement comes back with

Error: Element 2 has length 8, not 1 or 10.

How can I get it up and running? Can you suggest a better solution for the 
task described?
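
One reading of the error message, offered as a sketch: pmap() walks over its list elements in parallel, so they must all have the same length (or length 1). Here as.list(dataset$v01_r01) has 10 elements while the other two have 8. Iterating over the variable names and their labels instead sidesteps the problem. The function `plot_one` is a hypothetical helper, and base R's Map() is used in place of purrr.

```r
set.seed(1)
vars <- paste0("v08_", formatC(1:8, width = 2, format = "d", flag = "0"))
labs <- paste0("Label_", 1:8)

dataset <- as.data.frame(
  matrix(sample(1:11, size = 90, replace = TRUE), nrow = 10, ncol = 9))
names(dataset) <- c("v01_r01", vars)

# Hypothetical helper: one scatterplot of v01_r01 against a single variable.
plot_one <- function(var, lab) {
  graphics::plot(dataset[[var]], dataset$v01_r01,
                 xlab = lab, ylab = "v01_r01")
  invisible(var)
}

# Both vectors have length 8, so they can be walked over in parallel.
plotted <- Map(plot_one, vars, labs)
```

With purrr loaded, `purrr::walk2(vars, labs, plot_one)` should behave the same way, since both inputs have equal length.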

Kind regards

Georg



[R] Paths in knitr

2017-06-08 Thread G . Maubach
Hi All,

I have to compile a report for the management and decided to use RMarkdown 
and knitr. I compiled all needed plots (using separate R scripts) before 
compiling the report, thus all plots reside in my graphics directory. The 
RMarkdown report needs to access these files. I have defined

```{r setup, include = FALSE}
knitr::opts_knit$set(
  echo = FALSE,
  xtable.type = "html",
  base.dir = "H:/2017/Analysen/Kundenzufriedenheit/Auswertung",
  root_dir = "H:/2017/Analysen/Kundenzufriedenheit/Auswertung",
  fig.path = "results/graphics")  # relative path required, see 
http://yihui.name/knitr/options
```

and then referenced my plot using



because I want to be able to customize the plotting attributes.

But that fails with the message "pandoc.exe: Could not fetch 
email_distribution_pie.png".

If I give it the absolute path 
"H:/2017/Analysen/Kundenzufriedenheit/Auswertung/results/graphics/email_distribution_pie.png"
it works fine. It also works if I copy the plot into the directory where the 
report.RMD file resides. 

How can I tell knitr to fetch the ready-made plots from the graphics 
directory?

Kind regards

Georg



Re: [R] Paths in knitr

2017-06-12 Thread G . Maubach
Hi Yihui,
Hi Duncan,

I corrected my typo. Unfortunately, knitr still did not find my plots in the 
directory where they reside, which is different from the directory of the 
Rmd document.

The documentation of knitr says:

base.dir: (NULL) an absolute directory under which the plots are generated
root.dir: (NULL) the root directory when evaluating code chunks; if NULL, 
the directory of the input document will be used

From that description I thought that if base.dir can be used for writing 
plots, it would also be used for reading plots if set. It is not.
If I set the root directory to the plots/graphics directory, will knitr 
then find my plots? No, it does not.

Judging from blog posts, my reasoning did not seem so strange to me, e.g. 
https://philmikejones.wordpress.com/2015/05/20/set-root-directory-knitr/. 
Unfortunately, it does not work for me.

I am using an RStudio project file. Could it be that this interferes with 
the knitr options?

I tried the solution that Duncan suggested:

c_path_plots <- 
"H:/2017/Analysen/Kundenzufriedenheit/Auswertung/results/graphics"

`r knitr::include_graphics(file.path(c_path_plots, 
"email_distribution_pie.png"))`

This solution works fine. I will go with it for this project as I have to 
finish my report soon.
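
For later readers, the same idea can be wrapped in a small helper that fails early if a plot is missing, before pandoc ever sees the path. `plot_path` is a hypothetical name, and the temporary directory merely stands in for the real graphics directory.

```r
# Hypothetical helper: build the path to a ready-made plot and stop with a
# readable message if the file does not exist.
plot_path <- function(file, dir) {
  path <- file.path(dir, file)
  if (!file.exists(path)) {
    stop("plot not found: ", path)
  }
  path
}

# Demonstration with a temporary directory standing in for results/graphics.
demo_dir <- tempfile("graphics")
dir.create(demo_dir)
file.create(file.path(demo_dir, "email_distribution_pie.png"))

p <- plot_path("email_distribution_pie.png", demo_dir)

# In the Rmd document the path would then be handed to knitr inside a chunk,
# e.g. with the chunk options echo = FALSE, out.width = "60%":
#   knitr::include_graphics(plot_path("email_distribution_pie.png", c_path_plots))
```

Using a chunk rather than inline code has the side benefit that chunk options such as out.width can control the displayed size of each plot.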

I read Hadley's book on building R packages (
https://www.amazon.de/R-Packages-Hadley-Wickham/dp/1491910593) and found 
building one quite complicated and time-consuming. Thus I have not yet tried 
to build my own packages. At the end of last week I heard of another 
package (http://reaktanz.de/R/pckg/roxyPackage/) which is supposed to make 
building packages much easier. I plan to try that shortly.

On my path to becoming better at analytics using R, I will try to use 
modules of Rmd files which can then easily be integrated into an Rmd 
report. I have yet to see how I can include these files in a complete 
report.

Kind regards

Georg


- Forwarded by Georg Maubach/WWBO/WW/HAW on 12.06.2017 08:47 
-

From:   Yihui Xie 
To:     g.maub...@gmx.de, 
Cc:     R Help 
Date:   09.06.2017 20:53
Subject: Re: [R] Paths in knitr
Sent by: "R-help" 



I'd say it is an expert-only option. If you do not understand what it
means, I strongly recommend you not to set it.

Similarly, you set the root_dir option and I don't know why you did it, but
it is a typo anyway (should be root.dir).

Regards,
Yihui
--
https://yihui.name

On Fri, Jun 9, 2017 at 4:50 AM,  wrote:

> Hi Yihui,
>
> many thanks for your reply.
>
> Why do I have to set the base.dir option? Because it is not clear to me from
> the documentation where knitr looks for data files and how I can tell
> knitr where to look. Setting base.dir was an attempt, but it did not work.
>
> Can you give me a hint where I can find information/documentation on this
> path issue?
>
> Kind regards
>
> Georg
>
>
> > Sent: Thursday, 08 June 2017 at 15:05
> > From: "Yihui Xie" 
> > To: g.maub...@weinwolf.de
> > Cc: "R Help" 
> > Subject: Re: [R] Paths in knitr
> >
> > Why do you have to set the base.dir option?
> >
> > Regards,
> > Yihui
> > --
> > https://yihui.name
> >
> >

[R] Antwort: Re: Re: Paths in knitr

2017-06-12 Thread G . Maubach
Hi Yihui,

I took root.dir and base.dir out. Everything still works fine after the 
change.

I have implemented the solution Duncan suggested. I have difficulties with 
the scaling / image size in my report. Some plots are too big, some are 
too small. I need to adjust every plot. Steep learning curve :)

Kind regards

Georg




From:   Yihui Xie 
To:     g.maub...@weinwolf.de, 
Cc:     Duncan Murdoch , R Help 

Date:   12.06.2017 18:29
Subject: Re: Re: [R] Paths in knitr
Sent by: xieyi...@gmail.com



Will there be anything wrong if you do not set these options?

Regards,
Yihui
--
https://yihui.name



[R] WG: Fw: Re: rmarkdown and font size

2017-06-12 Thread G . Maubach
Hi Dan,
Hi All,

I read the post below. I am wondering how I can know which "keys" are 
available, e.g. "code.r" and "pre". Where can I find the definition of 
what can be adjusted and which "words" to use?

Kind regards

Georg


> Sent: Thursday, 08 June 2017 at 16:16
> From: "Nordlund, Dan (DSHS/RDA)" 
> To: "MacQueen, Don" , "r-help@r-project.org" 
> Subject: Re: [R] rmarkdown and font size
>
> You can change the style, modifying a variety of things.  E.g,
> 
> ---
> title: Test
> ---
> 
> 
> <style>
> body { /* Normal  */
>   font-size: 12px;
> }
> td {  /* Table  */
>   font-size: 8px;
> }
> h1.title {
>   font-size: 38px;
>   color: DarkRed;
> }
> h1 { /* Header 1 */
>   font-size: 28px;
>   color: DarkBlue;
> }
> h2 { /* Header 2 */
>   font-size: 22px;
>   color: DarkBlue;
> }
> h3 { /* Header 3 */
>   font-size: 18px;
>   font-family: "Times New Roman", Times, serif;
>   color: DarkBlue;
> }
> code.r { /* Code block */
>   font-size: 12px;
> }
> pre { /* Code block - determines code spacing between lines */
>   font-size: 14px;
> }
> </style>
> 
> Here is some normal text.  It is set in a 12px font.  The table is in 8px.
> 
> ```{r example, echo=FALSE, results='asis'}
> tmp <- data.frame(a=1:5, b=letters[1:5])
> print( knitr::kable(tmp, row.names=FALSE))
> ```
> 
> 
> Hope this is helpful,
> 
> Dan
> 
> Daniel Nordlund, PhD
> Research and Data Analysis Division
> Services & Enterprise Support Administration
> Washington State Department of Social and Health Services
> 
> 
> > -Original Message-
> > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> > MacQueen, Don
> > Sent: Wednesday, June 07, 2017 4:58 PM
> > To: r-help@r-project.org
> > Subject: [R] rmarkdown and font size
> > 
> > Suppose I have a file (named "tmp.rmd") containing:
> > 
> > 
> > ---
> > title: Test
> > ---
> > 
> > ```{r example, echo=FALSE, results='asis'}
> > tmp <- data.frame(a=1:5, b=letters[1:5])
> > print( knitr::kable(tmp, row.names=FALSE))
> > ```
> > 
> > 
> > 
> > And I render it with:
> > 
> > rmarkdown::render('tmp.rmd',
> > output_format=c('html_document','pdf_document'))
> > 
> > I get two files:
> >   tmp.pdf
> >   tmp.html
> > 
> > Is there a way to control (change or specify) the font size of the table
> > in the pdf output?
> > (or of the entire document, if it can't be changed for just the table)
> > 
> > With my actual data, the table is too wide to fit on a page in the pdf
> > output; perhaps if I reduce the font size I can get it to fit.
> > 
> > I would like the html version to still look decent, but I don't care very
> > much what happens to its font size.
> > 
> > Thanks!
> > -Don
> > 
> > --
> > Don MacQueen
> > 
> > Lawrence Livermore National Laboratory
> > 7000 East Ave., L-627
> > Livermore, CA 94550
> > 925-423-1062
> > 
> > 


[R] Filtering String Variables

2016-05-23 Thread G . Maubach
# Hi All,
# 
# I have the following data frame (example):

Debitor <- c("968691", "968691", "968691",
 "A04046", "A04046",
 "L0006", "L0006", "L0006",
 "L0023", "L0023",
 "L0056", "L0056",
 "L0094", "L0094", "L0094",
 "L0124", "L0124",
 "L0143", 
 "L0170",
 "13459",
 "473908",
 "394704",
 "4711",
 "4712",
 "4713")
Debitor <- as.character(Debitor)
var1 <- c(11, 12, 13,
  14, 14,
  12, 13, 14,
  10, 11,
  12, 12,
  12, 12, 12,
  15, 17,
  11,
  14,
  12,
  17,
  13,
  15,
  16,
  11)
ds_example <- data.frame(Debitor, var1)
ds_example$case_id <- 1:nrow(ds_example)
ds_example <- ds_example[, sort(colnames(ds_example))]
ds_example

# I would like to generate a data frame that contains the duplicates AND the
# corresponding non-duplicates to the duplicates.
# For example, finding the duplicates will deliver cases 2 and 3, but the list
# should also contain case 1 because case 1 is the corresponding case to the
# duplicate cases 2 and 3.
# For the whole example dataset that would be:
needed <- c(1, 1, 1,
1, 1,
1, 1, 1,
1, 1,
1, 1,
1, 1, 1,
1, 1,
0, 0, 0, 0, 0, 0, 0, 0)
needed <- as.logical(needed)
ds_example <- data.frame(ds_example, needed)
ds_example

# To find the duplicates and the corresponding non-duplicates
duplicates <- duplicated(ds_example$Debitor)

list_of_duplicated_debitors <- as.character(ds_example[duplicates, 
"Debitor"])

filter_variable <- unique(list_of_duplicated_debitors)

ds_duplicates <- ds_example["Debitor" == filter_variable]  # Result: dataset with 0 columns

ds_duplicates <- ds_example["Debitor"] %in% filter_variable  # Result: FALSE

# How can I create a dataset like this

ds_example <- ds_example[needed, ]
ds_example

# using the Debitor IDs?

Kind regards

Georg Maubach



[R] WG: Filtering String Variables (SOLVED)

2016-05-23 Thread G . Maubach
Hi All,

the solution to my question is as follows:

## Filter duplicates and corresponding non-duplicates
### To filter duplicates and their corresponding non-duplicates use the
### following code snippet:
Debitor <- c("968691", "968691", "968691",
 "A04046", "A04046",
 "L0006", "L0006", "L0006",
 "L0023", "L0023",
 "L0056", "L0056",
 "L0094", "L0094", "L0094",
 "L0124", "L0124",
 "L0143", 
 "L0170",
 "13459",
 "473908",
 "394704",
 "4711",
 "4712",
 "4713")
Debitor <- as.character(Debitor)
var1 <- c(11, 12, 13,
  14, 14,
  12, 13, 14,
  10, 11,
  12, 12,
  12, 12, 12,
  15, 17,
  11,
  14,
  12,
  17,
  13,
  15,
  16,
  11)
ds_example <- data.frame(Debitor, var1)
ds_example$case_id <- 1:nrow(ds_example)
ds_example <- ds_example[, sort(colnames(ds_example))]
ds_example

# The task is to generate a data frame that contains the duplicates AND the
# corresponding non-duplicates to the duplicates.
# For example, finding the duplicates will deliver cases 2 and 3, but the list
# should also contain case 1 because case 1 is the corresponding case to the
# duplicate cases 2 and 3.
# For the whole example dataset that would be:
needed <- c(1, 1, 1,
1, 1,
1, 1, 1,
1, 1,
1, 1,
1, 1, 1,
1, 1,
0, 0, 0, 0, 0, 0, 0, 0)
needed <- as.logical(needed)
ds_example <- data.frame(ds_example, needed)
ds_example

# To find the duplicates and the corresponding non-duplicates
duplicates <- duplicated(ds_example$Debitor)

list_of_duplicated_debitors <- as.character(ds_example[duplicates, 
"Debitor"])

filter_variable <- unique(list_of_duplicated_debitors)

### Wrong code. Do not run.
### ds_duplicates <- ds_example["Debitor" == filter_variable]  # Result: dataset with 0 columns
### duplicates_and_correponding_non_duplicates <- ds_example["Debitor"] %in% filter_variable  # Result: FALSE

duplicates_and_correponding_non_duplicates <- ds_example$Debitor %in% filter_variable  # Result: OK
duplicates_and_correponding_non_duplicates <- ds_example[, "Debitor"] %in% filter_variable  # Result: OK

### Create the dataset with duplicates and corresponding non-duplicates
ds_example <- ds_example[duplicates_and_correponding_non_duplicates, ]
ds_example

It was a simple mistake when subscripting.
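
For reference, the same filter can probably be written more compactly. The following is only a sketch on a shortened version of the example data; the names ds_small, debitor_ids and in_dup_group are made up for illustration and are not part of the code above.

```r
# Shortened example data (assumption: same structure as ds_example above)
ds_small <- data.frame(
  Debitor = c("968691", "968691", "A04046", "A04046", "L0143", "4711"),
  var1    = c(11, 12, 14, 14, 11, 15),
  stringsAsFactors = FALSE
)

debitor_ids <- ds_small$Debitor

# TRUE for every row whose Debitor occurs more than once, including the
# first occurrence: duplicated() scanned from the front flags the later
# occurrences, scanned from the back (fromLast = TRUE) the earlier ones
in_dup_group <- duplicated(debitor_ids) | duplicated(debitor_ids, fromLast = TRUE)

ds_dups <- ds_small[in_dup_group, ]
print(ds_dups)
```

This avoids the intermediate list of duplicated IDs, at the cost of calling duplicated() twice.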

Kind regards

Georg Maubach


- Forwarded by Georg Maubach/WWBO/WW/HAW on 23.05.2016 15:54 -

From:    Georg Maubach/WWBO/WW/HAW
To:      r-help@r-project.org
Date:    23.05.2016 15:28
Subject: Filtering String Variables


# Hi All,
# 
# I have the following data frame (example):

Debitor <- c("968691", "968691", "968691",
 "A04046", "A04046",
 "L0006", "L0006", "L0006",
 "L0023", "L0023",
 "L0056", "L0056",
 "L0094", "L0094", "L0094",
 "L0124", "L0124",
 "L0143", 
 "L0170",
 "13459",
 "473908",
 "394704",
 "4711",
 "4712",
 "4713")
Debitor <- as.character(Debitor)
var1 <- c(11, 12, 13,
  14, 14,
  12, 13, 14,
  10, 11,
  12, 12,
  12, 12, 12,
  15, 17,
  11,
  14,
  12,
  17,
  13,
  15,
  16,
  11)
ds_example <- data.frame(Debitor, var1)
ds_example$case_id <- 1:nrow(ds_example)
ds_example <- ds_example[, sort(colnames(ds_example))]
ds_example

# I would like to generate a data frame that contains the duplicates AND the
# corresponding non-duplicates to the duplicates.
# For example, finding the duplicates will deliver case 2 and 3 but the list
# should also contain case 1 because case 1 is the corresponding case to the
# duplicate cases 2 and 3.
# For the whole example dataset that would be:
needed <- c(1, 1, 1,
1, 1,
1, 1, 1,
1, 1,
1, 1,
1, 1, 1,
1, 1,
0, 0, 0, 0, 0, 0, 0, 0)
needed <- as.logical(needed)
ds_example <- data.frame(ds_example, needed)
ds_example

# To find the duplicates and the corresponding non-duplicates
duplicates <- duplicated(ds_example$Debitor)

list_of_duplicated_debitors <- as.character(ds_example[duplicates, 
"Debitor"])

filter_variable <- unique(list_of_duplicated_debitors)

ds_duplicates <- ds_example["Debitor" == filter_variable]  # Result: dataset with 0 columns

ds_duplicates <- ds_example["Debitor"] %in% filter_variable  # Result: FALSE

# How can I create a dataset like this

ds_example <- ds_example[needed, ]
ds_example

# using the Debitor IDs?

Kind regards

Georg Maubach


[R] Creating a data frame from scratch

2016-05-24 Thread G . Maubach
Hi All,

I need to create a data frame from scratch and fill variables created on the 
fly with values. What I have so far:

-- snip --

# Example dataset
gene <- 
c("ENSG0208234","ENSG0199674","ENSG0221622","ENSG0207604", 
  "ENSG0207431","ENSG0221312","ENSG00134940305","ENSG00394039490",
  "ENSG09943004048")
hsap <- c(0,0,0, 0, 0, 0, 1,1, 1)
mmul <- c(NA,2 ,3, NA, 2, 1 , NA,2, NA)
mmus <- c(NA,2 ,NA, NA, NA, 2 , NA,3, 1)
rnor <- c(NA,2 ,NA, 1 , NA, 3 , NA,NA, 2)
cfam <- c(NA,2,NA, 2, 1, 2, 2,NA, NA)

ds_example <- data.frame(gene, hsap, mmul, mmus, rnor, cfam)
ds_example$gene <- as.character(ds_example$gene)

t_count_na <- function(dataset,
   variables = "all")
  # credit: http://stackoverflow.com/questions/4862178/remove-rows-with-nas-in-data-frame
  {
  ds_na <- data.frame()
  # if variables = "all" create character vector of variable names
  if (variables == "all") {
variable_list <- dimnames(dataset)[[ 2 ]] 
  }
  # if a character vector with variable names is given
  # to run the function on a defined set of selected variables
  else {
variable_list <- variables
  }
  
  for (var in variable_list) {
new_name <- paste0("na_", var)
ds_na[[ new_name ]] <- as.data.frame(is.na(dataset[[ var ]]))
  }
  
  ds_na[[ "na_count" ]] <- rowSums(ds_na)
  return(ds_na)
}

test <- t_count_na(dataset = ds_example, variables = c("mmul", "mmus"))

-- snip --

gives:

Error in `[[<-.data.frame`(`*tmp*`, new_name, value = list(`is.na(dataset[[var]])` = c(TRUE,  : 
  replacement has 9 rows, data has 0
In addition: Warning message:
In if (variables == "all") { :
  the condition has length > 1 and only the first element will be used

My goal is to create a dataset from scratch on the fly which has the same 
number of variables as the dataset ds_example plus a single variable storing 
the number of NAs in a row for the given variables. This is the basis for a 
decision on which cases to keep and which to drop.

I do not want to alter the base dataset like ds_example in the first place nor 
do I want to make a copy of the existing dataset due to memory allocation. The 
function shall also work with big data, e. g. datasets with more than 1 GB 
memory consumption.

I also do not want the newly created variables to be stored in the original 
data frame. They shall be separate.

A former similar solution worked:
http://r.789695.n4.nabble.com/Creating-variables-on-the-fly-td4720034.html

Why doesn't this one?

How do I create the variables within the data frame if the data frame is empty?
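
Judging from the messages above, the assignment fails because data.frame() has zero rows, so a 9-row column cannot be added to it, and the warning comes from comparing a vector of length 2 with "all" inside if(). One possible way around both, as a sketch only: build the indicator columns in one step from is.na() instead of growing an empty data frame. The helper name count_na_sketch and the small demo data are hypothetical.

```r
# Hypothetical helper, not the original t_count_na(): returns a NEW data
# frame of NA indicators plus a row-wise NA count; the input is not modified
count_na_sketch <- function(dataset, variables = names(dataset)) {
  # is.na() on a data frame returns a logical matrix, one column per variable
  na_flags <- is.na(dataset[, variables, drop = FALSE])
  ds_na <- as.data.frame(na_flags)
  names(ds_na) <- paste0("na_", variables)
  ds_na[["na_count"]] <- rowSums(na_flags)
  ds_na
}

# Small demo data (made up, same idea as ds_example)
ds_demo <- data.frame(
  gene = c("g1", "g2", "g3"),
  mmul = c(NA, 2, 3),
  mmus = c(NA, 2, NA),
  stringsAsFactors = FALSE
)

res <- count_na_sketch(ds_demo, variables = c("mmul", "mmus"))
print(res)
```

Using names(dataset) as the default for variables sidesteps the comparison with "all" altogether.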

Kind regards

Georg Maubach



[R] Reply: Re: Creating a data frame from scratch (SOLVED)

2016-05-25 Thread G . Maubach
Hi Dan,
Hi All,

many thanks for your help.

Please find enclosed my little function for your use:

-- cut --

#---
# Module: t_count_na.R
# Author: Georg Maubach
# Date  : 2016-05-24
# Update: 2016-05-25
# Description   : Count NA's
# Source System : R 3.2.2 (64 Bit)
# Target System : R 3.2.2 (64 Bit)
# License   : CC-BY-SA-NC
#1-2-3-4-5-6-7-8

test <- FALSE

t_count_na <- function(dataset,
   variables = "all") {
  # Counts the number of NAs within a given set of variables
  #
  # Args:
  #   dataset  : Object with dimnames, e.g. data frame, data table.
  #   variables: Character vector with variable names.
  #
  # Operation:
  #   Adds the variable "na_count" to the given dataset containing the
  #   count of NA's within the given variables
  #
  # Returns:
  #   Original dataset with variable "na_count" added.
  #
  # Error handling:
  #   None.
  #
  # Credits:
  #   http://stackoverflow.com/questions/4862178/remove-rows-with-nas-in-data-frame
  #   http://r.789695.n4.nabble.com/Creating-variables-on-the-fly-td4720034.html
 
  version <- "2016-05-25"
 
  if (identical(variables, "all")) {
    variable_list <- names(dataset)
  } else {
    variable_list <- variables
  }
  # drop = FALSE keeps a data frame even if only one variable is selected
  dataset[["na_count"]] <- apply(dataset[, variable_list, drop = FALSE],
                                 1,
                                 function(x) sum(is.na(x)))
 
  return(dataset)
 
}

#---

# The test function is named t_test so that it does not overwrite the
# flag "test" set at the top of the module
t_test <- function(do_test = FALSE) {

  # Run the test only if requested
  if (!isTRUE(do_test)) {
    return(invisible(NULL))
  }

  cat("\n", "\n", "Test function t_count_na()", "\n", "\n")

  # Example dataset
  gene <- c("ENSG0208234", "ENSG0199674", "ENSG0221622", "ENSG0207604",
            "ENSG0207431", "ENSG0221312", "ENSG00134940305", "ENSG00394039490",
            "ENSG09943004048")
  hsap <- c(0, 0, 0, 0, 0, 0, 1, 1, 1)
  mmul <- c(NA, 2, 3, NA, 2, 1, NA, 2, NA)
  mmus <- c(NA, 2, NA, NA, NA, 2, NA, 3, 1)
  rnor <- c(NA, 2, NA, 1, NA, 3, NA, NA, 2)
  cfam <- c(NA, 2, NA, 2, 1, 2, 2, NA, NA)
  ds_example <- data.frame(gene, hsap, mmul, mmus, rnor, cfam)
  ds_example$gene <- as.character(ds_example$gene)

  cat("\n", "\n", "Example dataset before function call", "\n", "\n")
  print(ds_example)

  cat("\n", "\n", "Function call", "\n", "\n")
  ds_example <- t_count_na(dataset = ds_example,
                           variables = c("mmul", "mmus"))

  cat("\n", "\n", "Example dataset after function call", "\n", "\n")
  print(ds_example)
}

t_test(do_test = test)

# EOF .

-- cut --

Kind regards

Georg Maubach




From:    "Nordlund, Dan (DSHS/RDA)"
To:      "r-help@r-project.org"
Date:    24.05.2016 21:41
Subject: Re: [R] Creating a data frame from scratch
Sent by: "R-help"




I would probably write the function something like this:


t_count_na <- function(dataset,
   variables = "all") {
  if (identical(variables, "all")) {
variable_list <- names(dataset)
  }  else {
variable_list <- variables
  } 
  apply(dataset[,variable_list], 1, function(x) sum(is.na(x)))
}


Hope this is helpful,

Dan

Daniel Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services



[R] Difference subsetting (dataset$variable vs. dataset["variable"])

2016-05-30 Thread G . Maubach
Hi All,

I thought dataset$variable is the same as dataset["variable"]. I tried the 
following:

> str(ZWW_Kunden$Branche)
 chr [1:49673] "231" "151" "151" "231" "231" "111" "231" "111" "231" "231" 
"151" "111" ...
> str(ZWW_Kunden["Branche"])
'data.frame':49673 obs. of  1 variable:
 $ Branche: chr  "231" "151" "151" "231" ...

and get different results: "chr [1:49673]" vs. "data.frame". The first one is 
a simple vector, the second one is a data.frame.

This has consequences when subsetting a dataset and filtering cases:

> ZWW_Kunden["Branche"] %in% c("315", "316", "317")
[1] FALSE

> head(ZWW_Kunden$Branche %in% c("315", "316", "317")) # head() only to shorten output
[1] FALSE FALSE FALSE FALSE FALSE FALSE

I have thought dataset$variable is the same as dataset["variable"] but 
actually it's not.

Can you explain what the difference is?
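
The behaviour described above can be reproduced on a tiny data frame. This is a sketch with made-up data (ds_demo and wanted are hypothetical names), not the ZWW_Kunden dataset itself.

```r
ds_demo <- data.frame(Branche = c("231", "315", "151"), stringsAsFactors = FALSE)

class(ds_demo["Branche"])    # single brackets keep the data frame: "data.frame"
class(ds_demo[["Branche"]])  # double brackets extract the column: "character"
class(ds_demo$Branche)       # $ extracts the column as well: "character"

wanted <- c("315", "316", "317")

# %in% works on the elements of its left-hand side. The elements of a data
# frame are its columns, so a one-column data frame gives a result of length 1
res_df <- ds_demo["Branche"] %in% wanted

# On the character vector each value is matched individually
res_vec <- ds_demo[["Branche"]] %in% wanted

print(res_df)
print(res_vec)
```

For filtering rows, the vector forms (dataset$variable, dataset[["variable"]] or dataset[, "variable"]) are therefore the ones to use.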

Kind regards

Georg



[R] Variable labels and value labels

2016-05-31 Thread G . Maubach
Hi All,

I am using R for social sciences. In this field I am used to using short 
variable names like "q1" for question 1, "q2" for question 2 and so on, and 
to labelling the variables like q1 : "Please tell us your age" or q2 : "Could 
you tell us your household income?" or something similar indicating which 
question is stored in the variable.

Similarly, I am used to labelling values like 1 : "Less than 18 years", 2 : 
"18 to 30 years", 3 : "31 to 60 years" and 4 : "61 years and more".

I know that the packages Hmisc and memisc have functionality for this, 
but these labelling functions are limited to the packages they were defined 
for. Using the question texts as variable names is possible but very 
inconvenient.

Is there another way of labelling variables and values in R?

Kind regards

Georg Maubach



[R] Utility Functions

2016-05-31 Thread G . Maubach
Hi All,

I was new to R and this list a couple of months ago. When processing my 
data I got tremendous support from the R-help mailing list.

The solutions I have worked out with your help might be also helpful for 
others. I have put the solutions in a couple of small functions with 
documentation and tests. You can find the software on Sourceforge.net at

https://sourceforge.net/projects/r-project-utilities/files/?source=navbar

You should download at least "r_toolbox.R" and store it in a directory 
like "r_toolbox" in your favourite project folder. Within "r_toolbox" 
folder put all the other files. You have to adjust the variable 
"t_toolbox_path" to your favourite project directory including the 
"r_toolbox" folder, e. g. "C:\My-Projects\t-toolbox\" on Windows or 
"/home/username/my-projects/r-toolbox" on Unix-like systems.

You can use them for your projects. Although I developed them with great 
care, these functions come with absolutely no warranty. You use them at 
your own risk. As the functions are small and easy to review, you will 
find out quickly by reading the source code whether the functions are safe 
to use.

If you have any recommendations or improvement proposals please get back 
to me.

Kind regards

Georg Maubach



[R] Installing miniCRAN on Debian

2016-06-01 Thread G . Maubach
Hi All,

I am installing miniCRAN on Debian GNU Linux 8 Jessie (Linux analytics7 
4.5.0-0.bpo.2-amd64 #1 SMP Debian 4.5.4-1~bpo8+1 (2016-05-13) x86_64 GNU/Linux) 
and R 3.3.0 

-- cut --
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8
 [4] LC_COLLATE=de_DE.UTF-8     LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

loaded via a namespace (and not attached):
[1] tools_3.3.0
-- cut --

After running

sudo apt-get install libssl-dev libcurl4-openssl-dev libxml2-dev libhunspell-dev

and calling

install.packages(pkgs = "miniCRAN", repos = "http://cran.csiro.au", 
dependencies = TRUE)

I get the message

- ANTICONF ERROR ---
Configuration failed because hunspell was not found. Try installing:
 * deb: libhunspell-dev (Debian, Ubuntu, etc)
 * rpm: hunspell-devel (Fedora, CentOS, RHEL)
 * brew: hunspell (Mac OSX)
If hunspell is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a hunspell.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'

Running

find / -name hunspell.pc

gives

/usr/lib/x86_64-linux-gnu/pkgconfig/hunspell.pc

and running

find / -name pkg-config

gives

/usr/share/bash-completion/completions/pkg-config

How do I need to configure R correctly to get miniCRAN running?
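
The find output above only shows a bash completion file named pkg-config, not an executable, which suggests (but does not prove) that the pkg-config tool itself is missing. A quick check from within R might look like the sketch below; that the Debian package providing the tool is called "pkg-config" is an assumption stated in the message text.

```r
# Sys.which() returns "" when the program is not found on the PATH
pkg_config_path <- Sys.which("pkg-config")

if (!nzchar(pkg_config_path)) {
  message("pkg-config not found on PATH; on Debian it is usually provided ",
          "by the package 'pkg-config' (assumption: apt-get install pkg-config)")
} else {
  # Ask pkg-config whether it can see hunspell.pc; exit status 0 means yes
  status <- system2(pkg_config_path, c("--exists", "hunspell"))
  message("pkg-config found at ", pkg_config_path,
          "; 'pkg-config --exists hunspell' returned status ", status)
}
```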

Kind regards

Georg



[R] Reply: Re: Unable to update R software to 3.3.0

2016-06-01 Thread G . Maubach
Hi all,

I did it today on Debian GNU Linux 8 Jessie this way:

vim /etc/apt/sources.list
deb http://cran.uni-muenster.de/bin/linux/debian jessie-cran3
ESC :wq

apt-get update
apt-get install r-base r-base-dev

This worked for me.

When installing R packages from within R I found that R needed the 
following:

apt-get install libssl-dev libcurl4-openssl-dev libhunspell-dev 
libxml2-dev 

You will probably want to install these as well.

HTH.

Kind regards

Georg




From:    Marc Schwartz
To:      Sunish Kumar Bilandi
Cc:      R-help
Date:    01.06.2016 17:18
Subject: Re: [R] Unable to update R software to 3.3.0
Sent by: "R-help"




> On Jun 1, 2016, at 1:33 AM, Sunish Kumar Bilandi 
 wrote:
> 
> Hi Team,
> 
> I am using RedHat 5 and installed R using YUM, (R version 3.2.3) Now I 
want to update R version tp 3.3.0, but I am unable to do that, Is there 
any alternate to do this?
> 
> Hope to hear from your side.
> 
> Regards,
> 
> 
> Sunish Bilandi
> Business Analyst, CIDA-01
> Evalueserve


Hi,

First, RHEL and related distributions (e.g. Fedora), have a dedicated 
R-SIG list:
 
  https://stat.ethz.ch/mailman/listinfo/r-sig-fedora

Future queries in this domain should be submitted there, as many of the RH 
package maintainers (e.g. Tom Callaway, aka Spot) read that list.

For R 3.3.0, it would appear that it is about a day away from being 
available for release:

  https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-6fc2c863b0

So for now, it would be available via the EPEL testing repos.

Otherwise, you can wait until it is available via release in the next day 
or so, or download the RPMS directly here:

  http://koji.fedoraproject.org/koji/buildinfo?buildID=762521

Regards,

Marc Schwartz



[R] Reply: RE: Variable labels and value labels

2016-06-01 Thread G . Maubach
Hi Petr,

I am looking for a general procedure that I can use with any package of R.

In my experience so far, it is likely that I will need a procedure from a 
package other than Hmisc or memisc, and my solution should still work then, 
so that I do not need to find another way to do it.

Kind regards

Georg



From:    PIKAL Petr
To:      "g.maub...@weinwolf.de", "r-help@r-project.org"
Date:    31.05.2016 14:56
Subject: RE: [R] Variable labels and value labels



Hi

see in line

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Tuesday, May 31, 2016 2:01 PM
> To: r-help@r-project.org
> Subject: [R] Variable labels and value labels
>
> Hi All,
>
> I am using R for social sciences. In this field I am used to use short 
variable
> names like "q1" for question 1, "q2" for question 2 and so on and label 
the
> variables like q1 : "Please tell us your age" or q2 : "Could you state 
us your
> household income?" or something similar indicating which question is 
stored
> in the variable.
>
> Similar I am used to label values like 1: "Less than 18 years", 2 : "18 
to
> 30 years", 3 : "31 to 60 years" and 4 : "61 years and more".

Seems to me that it is work for factors

nnn <- sample(1:4, 20, replace=TRUE)
q1 <-factor(nnn, labels=c("Less than 18 years", "18 to 30 years", "31 to 
60 years","61 years and more"))

You can store such variables in data.frame with names "q1" to "qwhatever" 
and possibly "Subject"

And you can store annotation of questions in another data frame with 2 
columns e.g. "Question" and "Description"

Basically it is an approach similar to database and in R you can merge 
those two data.frames by ?merge.
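
The annotation-table idea above could look roughly like this. It is only a sketch: the long-format layout and the names ds_long, ds_questions and ds_annotated are assumptions for illustration, and the column names "Question" and "Description" follow the suggestion above.

```r
# Answers in long format: one row per subject and question (made-up data)
ds_long <- data.frame(
  Subject  = c(1, 1, 2, 2),
  Question = c("q1", "q2", "q1", "q2"),
  Answer   = c(2, 3, 4, 1),
  stringsAsFactors = FALSE
)

# Annotation table: one row per question with its full text
ds_questions <- data.frame(
  Question    = c("q1", "q2"),
  Description = c("Please tell us your age",
                  "Could you tell us your household income?"),
  stringsAsFactors = FALSE
)

# Attach the question text to every answer row, database style
ds_annotated <- merge(ds_long, ds_questions, by = "Question", all.x = TRUE)
print(ds_annotated)
```

The data keep their short names, and the long texts live in one place only.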
>
> I know that the packages Hmisc and memisc have a functionality for this 
but
> these labeling functions are limited to the packages they were defined 
for.

It seems to me strange. What prevents you to use functions from Hmisc?

Regards
Petr

> Using the question tests as variable names is possible but very 
inconvenient.
>
> I there another way for labeling variables and values in R?
>
> Kind regards
>
> Georg Maubach
>



[R] Reply: Re: Variable labels and value labels

2016-06-01 Thread G . Maubach
Hi Jim,

many thanks for the hint.

When looking at the documentation I did not understand how to control which 
value gets which label. Is it possible to define this?
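
Independently of prettyR, base R allows the mapping to be spelled out explicitly with factor(), where the i-th entry of levels is paired with the i-th entry of labels. A minimal sketch with made-up codes (age_codes and age_group are hypothetical names):

```r
age_codes <- c(3, 1, 4, 2, 2)

# levels lists the codes, labels the text for each code, in the same order
age_group <- factor(age_codes,
                    levels = c(1, 2, 3, 4),
                    labels = c("Less than 18 years", "18 to 30 years",
                               "31 to 60 years", "61 years and more"))

print(age_group)

# The codes can be recovered here because the levels are 1..4 in order
as.integer(age_group)
```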

Kind regards

Georg




From:    Jim Lemon
To:      g.maub...@weinwolf.de, r-help mailing list
Date:    01.06.2016 03:59
Subject: Re: [R] Variable labels and value labels



Hi Georg,
You may find the "add.value.labels" function in the prettyR package 
useful.

Jim




[R] Merging variables

2016-06-06 Thread G . Maubach
Hi All,

I merged two datasets:

ds_merge1 <- merge(x = ds_bw_customer_4_match, y = 
ds_zww_customer_4_match,
  by.x = "customer", by.y = "customer",
  all.x = TRUE, all.y = FALSE)

R created a new dataset with the variables customer.x and customer.y. I 
would like to merge these two variables back together. I wrote a little 
function (code can be run) for it:

-- cut --

customer.x <- c("Miller", "Smith", NA,"Bird", NA)
customer.y <- c("Miller",  NA, "Doe", "Fish", NA)
ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE)

t_merge_variables <-
  function(dataset,
   var1,
   var2,
   merged_var) {
 
# Initialize
dataset[[merged_var]] = rep(NA, nrow(dataset))
dataset[["mismatch"]] = rep(NA, nrow(dataset))
 
for (i in 1:nrow(dataset)) {
 
  # Check 1: var1 missing, var2 missing
  if (is.na(dataset[[i, var1]]) &
  is.na(dataset[[i, var2]])) {
dataset[["mismatch"]] <- 1  # var1 & var2 are missing
 
  # Check 2: var1 filled, var2 missing
  } else if (!is.na(dataset[[i, var1]]) &
 is.na(dataset[[i, var2]])) {
dataset[[i, merged_var]] <- dataset[[i, var1]]
dataset[["mismatch"]] <- 0
 
  # Check 3: var1 missing, var2 filled
  } else if (is.na(dataset[[i, var1]]) &
 !is.na(dataset[i, var2])) {
dataset[[i, merged_var]] <- dataset[[i, var2]]
dataset[["mismatch"]] <-  0
 
  # Check 4: var1 == var2
  } else if (dataset[[i, var1]] == dataset[[i, var2]]) {
  dataset[[i, merged_var]] <- dataset[[i, var1]]
  dataset[["mismatch"]] <- 0

  # Leftover: var1 != var2
  } else {
dataset[[i, merged_var]] <- NA
dataset[["mismatch"]] <- 2  # var1 != var2
  }  # end if
}  # end for
return(dataset)
}

ds_var_merge1 <- t_merge_variables(dataset = ds_test,
  var1 = "customer.x",
  var2 = "customer.y",
  merged_var = "customer")

ds_var_merge1

-- cut --

It is executed without error but delivers the wrong values in the variable 
"mismatch". This variable is always 1 although it should be NA, 1 or 2 
respectively.

Can you tell me why the variable is not correctly set?
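
A likely cause: dataset[["mismatch"]] <- 1 assigns to the whole column on every pass through the loop, so the value set for the last row wins (there both variables are missing, hence 1 everywhere); an element-wise assignment would be dataset[[i, "mismatch"]] <- 1. A vectorised version without a loop, as a sketch only, using the same coding as the function above:

```r
customer.x <- c("Miller", "Smith", NA, "Bird", NA)
customer.y <- c("Miller", NA, "Doe", "Fish", NA)
ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE)

x <- ds_test[["customer.x"]]
y <- ds_test[["customer.y"]]

both_missing <- is.na(x) & is.na(y)
differ       <- !is.na(x) & !is.na(y) & x != y

# Take x where present, otherwise y; NA where both are filled but differ
ds_test$customer <- ifelse(is.na(x), y, x)
ds_test$customer[differ] <- NA

# 1 = both missing, 2 = values differ, 0 = otherwise
ds_test$mismatch <- ifelse(both_missing, 1, ifelse(differ, 2, 0))
print(ds_test)
```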

Kind regards

Georg



[R] Reply: RE: Merging variables

2016-06-06 Thread G . Maubach
Hi David,
Hi Petr,

many thanks for your help. With your hints I got the idea how I could do 
it and I came up with this solution:

-- cut --

#---
# Module: t_merge_variables.R
# Author: Georg Maubach
# Date  : 2016-06-06
# Update: 2016-06-06
# Description   : Merge two variables
# Source System : R 3.2.5 (64 Bit)
# Target System : R 3.2.5 (64 Bit)
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#1-2-3-4-5-6-7-8

t_module_name = "t_merge_variables.R"
t_version = "2016-06-06"

cat(
  paste0("\n",
 t_module_name, " (Version: ", t_version, ")", "\n", "\n",
 "This software comes with ABSOLUTELY NO WARRANTY.",
 "\n", "\n"))

# If do_test is not defined globally define it here locally by un-commenting it
# Switch t_do_test to TRUE to run test
t_do_test <- FALSE

# [ Function Definition ]
t_merge_variables <-
  function(dataset,
   var1,
   var2,
   merged_var) {
# Merges two variables with identical, different or missing values
#
# Args:
#  dataset (data frame, data table):
#Object with dimnames, e.g. data frame, data table.
#  var1 (character):
#Variable 1 to be merged.
#  var2 (character):
#Variable 2 to be merged.
#  merged_var (class based on input variable, coercion done if possible):
#    Variable with the merged variables var1 and var2.
#
# Operation:
#   Var1 and var2 are merged as follows:
#   if var1 == var2: merged_var <- var1
#   if var1 != var2: merged_var <- 1 (1 = indicating mismatch)
#   if var1 is filled & var2 is missing: merged_var <- var1
#   if var1 is missing & var2 is filled: merged_var <- var2
#   if var1 is missing & var2 is missing: merged_var <- 0
#     (0 = indicating NA)
#
# Returns:
#   Original dataset and variable given in "merged_var" will be added.
#
# Error handling:
#   None.
#
# Credits: 
#   https://www.mail-archive.com/r-help@r-project.org/msg236012.html
 
    # Initialize
    dataset[merged_var] = rep(NA, nrow(dataset))

    dataset[merged_var] <-
      # Check 1: var1 missing, var2 missing
      ifelse(is.na(dataset[, var1]) & is.na(dataset[, var2]),
        # then: 0 indicates that both are missing
        0,
        # Check 2: var1 filled, var2 missing
        ifelse(!is.na(dataset[, var1]) & is.na(dataset[, var2]),
          # then
          dataset[, var1],
          # Check 3: var1 missing, var2 filled
          ifelse(is.na(dataset[, var1]) & !is.na(dataset[, var2]),
            # then
            dataset[, var2],
            # Check 4: var1 == var2
            ifelse(dataset[, var1] == dataset[, var2],
              # then: use var1
              dataset[, var1],
              # Leftover: var1 != var2, 1 indicates a mismatch
              1))))

    return(dataset)
  }

# [ Test Definition ]
t_test <- function(do_test = FALSE) {
  if (do_test == TRUE) {
    cat("\n", "\n", "Test function t_merge_variables()", "\n", "\n")
 
# Example dataset
customer.x <- c("Miller", "Smith", NA,"Bird", NA)
customer.y <- c("Miller",  NA, "Doe", "Fish", NA)
ds_test <-
  data.frame(customer.x, customer.y, stringsAsFactors = FALSE)
 
# Call function
ds_merge <- t_merge_variables(
  dataset = ds_test,
  var1 = "customer.x",
  var2 = "customer.y",
  merged_var = "customer"
)
 
# Dataset after function call
ds_merge
  }
}

# [ Test Run ]--
t_test(do_test = t_do_test)

# [ Clean up ]--
rm("t_do_test", "t_module_name", "t_version", "t_test")

# EOF

-- cut --

It delivers the customer name if there is one or they match. If they don't 
match it delivers 1. If both are missing it delivers 0.

This solution is for my applications sufficient.

Many thanks again for your help and giving me the ideas to solve my data 
transformation task.

Kind regards

Georg





From:    PIKAL Petr
To:      "g.maub...@weinwolf.de", "r-help@r-project.org"
Date:    06.06.2016 15:04
Subject: RE: [R] Merging variables



Hi

Not sure if this is the most effective or general solution but

Here you get 2 if the value is same in both columns, 1 if it is only in 
one column and the other is NA and 0 if there is mismatch of values.
temp <- (ds_test[,2] %in% ds_test[,1])+(ds_test[,1] %in% ds_test[,2])

here you get 0 if t

[R] Antwort: Re: Merging variables

2016-06-06 Thread G . Maubach
Hi Michael,

yes, I was astonished by this behaviour too. I have worked with SPSS 
a lot, and it works differently.

I would like to share some of my data. Can you tell me how I can dump a 
dataset in a way that I can post it here as text?
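
One common way to do this is dput(), which prints an R expression that recreates the object and can be pasted into a mail as plain text. A minimal sketch with made-up data (ds_demo, ds_text and ds_copy are hypothetical names):

```r
ds_demo <- data.frame(customer = c("Miller", NA, "Doe"),
                      sales    = c(10.5, 20, NA),
                      stringsAsFactors = FALSE)

# Print a text representation; head() limits it to the first rows
dput(head(ds_demo, 20))

# Round trip: capture the text and rebuild the object from it
ds_text <- paste(capture.output(dput(ds_demo)), collapse = "\n")
ds_copy <- eval(parse(text = ds_text))
print(ds_copy)
```

The recipient only needs to assign the pasted expression to a name to get the same data frame.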

Kind regards

Georg




From:    Michael Dewey
To:      g.maub...@weinwolf.de, r-help@r-project.org
Date:    06.06.2016 15:45
Subject: Re: [R] Merging variables




Dear Georg

I find it a bit surprising that you end up with customer.x and 
customer.y. Can you share with us a toy example of two data.frames which 
exhibit this behaviour?

On 06/06/2016 13:29, g.maub...@weinwolf.de wrote:
> Hi All,
>
> I merged two datasets:
>
> ds_merge1 <- merge(x = ds_bw_customer_4_match, y =
> ds_zww_customer_4_match,
>   by.x = "customer", by.y = "customer",
>   all.x = TRUE, all.y = FALSE)
>
> R created a new dataset with the variables customer.x and customer.y. I
> would like to merge these two variable back together. I wrote a little
> function (code can be run) for it:
>
> -- cut --
>
> customer.x <- c("Miller", "Smith", NA,"Bird", NA)
> customer.y <- c("Miller",  NA, "Doe", "Fish", NA)
> ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE)
>
> t_merge_variables <-
>   function(dataset,
>            var1,
>            var2,
>            merged_var) {
>
>     # Initialize
>     dataset[[merged_var]] = rep(NA, nrow(dataset))
>     dataset[["mismatch"]] = rep(NA, nrow(dataset))
>
>     for (i in 1:nrow(dataset)) {
>
>       # Check 1: var1 missing, var2 missing
>       if (is.na(dataset[[i, var1]]) &
>           is.na(dataset[[i, var2]])) {
>         dataset[["mismatch"]] <- 1  # var1 & var2 are missing
>
>       # Check 2: var1 filled, var2 missing
>       } else if (!is.na(dataset[[i, var1]]) &
>                  is.na(dataset[[i, var2]])) {
>         dataset[[i, merged_var]] <- dataset[[i, var1]]
>         dataset[["mismatch"]] <- 0
>
>       # Check 3: var1 missing, var2 filled
>       } else if (is.na(dataset[[i, var1]]) &
>                  !is.na(dataset[i, var2])) {
>         dataset[[i, merged_var]] <- dataset[[i, var2]]
>         dataset[["mismatch"]] <- 0
>
>       # Check 4: var1 == var2
>       } else if (dataset[[i, var1]] == dataset[[i, var2]]) {
>         dataset[[i, merged_var]] <- dataset[[i, var1]]
>         dataset[["mismatch"]] <- 0
>
>       # Leftover: var1 != var2
>       } else {
>         dataset[[i, merged_var]] <- NA
>         dataset[["mismatch"]] <- 2  # var1 != var2
>       }  # end if
>     }  # end for
>     return(dataset)
>   }
>
> ds_var_merge1 <- t_merge_variables(dataset = ds_test,
>   var1 = "customer.x",
>   var2 = "customer.y",
>   merged_var = "customer")
>
> ds_var_merge1
>
> -- cut --
>
> It is executed without error but delivers the wrong values in the variable
> "mismatch". This variable is always 1 although it should be NA, 1 or 2
> respectively.
>
> Can you tell me why the variable is not correctly set?
>
> Kind regards
>
> Georg
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Michael
http://www.dewey.myzen.co.uk/home.html

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
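
A likely reason why "mismatch" ends up as 1 everywhere in the quoted function: inside the loop, dataset[["mismatch"]] <- value overwrites the whole column on every pass, so the value set for the last row (both names missing) wins. Below is a minimal vectorised sketch of the same idea, assuming the ds_test example from the question. The helper name merge_two_columns is hypothetical and not from the thread; the codes 0, 1 and 2 follow the comments in the quoted code.

```r
ds_test <- data.frame(
  customer.x = c("Miller", "Smith", NA, "Bird", NA),
  customer.y = c("Miller", NA, "Doe", "Fish", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the thread): combine two columns row by row
# and flag each row: 0 = usable value, 1 = both missing, 2 = values differ.
merge_two_columns <- function(dataset, var1, var2, merged_var) {
  x <- dataset[[var1]]
  y <- dataset[[var2]]

  both_missing <- is.na(x) & is.na(y)
  differ <- !is.na(x) & !is.na(y) & x != y

  # Take var1 where it is filled, otherwise var2; blank out conflicts
  merged <- ifelse(is.na(x), y, x)
  merged[differ] <- NA
  dataset[[merged_var]] <- merged

  # Build the flag as a vector, so every row keeps its own value
  mismatch <- rep(0, nrow(dataset))
  mismatch[both_missing] <- 1
  mismatch[differ] <- 2
  dataset[["mismatch"]] <- mismatch

  dataset
}

result <- merge_two_columns(ds_test, "customer.x", "customer.y", "customer")
result
```

Keeping the loop would also work if each assignment addressed a single cell, e.g. dataset[[i, "mismatch"]] <- 1.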


[R] Antwort: RE: Merging variables

2016-06-06 Thread G . Maubach
Hi Petr,

I would like to describe the data situation in brief:

I have a business warehouse dataset (referred to as BW data) containing
sales and an ERP customer master data dataset with additional information
(referred to as ERP data). Customer IDs and customer names are identical in
both because the business warehouse data is derived from the ERP data. Due
to selection criteria the BW data contains slightly more customers than the
ERP data, so customer names and all other information are missing in the
ERP data for some cases of the BW data. If I merge the two by the customer
ID variable, the customer names are duplicated, using customer.x and
customer.y as variable names.

As both fields contain the same content I would have expected R to merge
them into one variable, e.g. customer. But this is not the case.

Can I adjust the merge statement given below - which looks almost the same
in my script - so that R automatically merges the variables if they are
identical?

This is my code using left join:

-- cut --

ds_merge1 <- merge(x = ds_bw_customer_4_match, y = 
ds_erp_customer_4_match,
  by.x = "CustID", by.y = "CustID",
  all.x = TRUE, all.y = FALSE)

-- cut --

Kind regards

Georg




From:    PIKAL Petr
To:      Michael Dewey, "g.maub...@weinwolf.de", "r-help@r-project.org"
Date:    06.06.2016 17:04
Subject: RE: [R] Merging variables



Hi Michael

it is simple

set.seed(111)
let=sample(letters[1:10],6, replace=T)
dat1<-data.frame(let=let, customer=sample(1:10,6, replace=T))
let=sample(letters[1:10],6, replace=T)
dat2<-data.frame(let=let, customer=sample(1:10,6, replace=T))
merge(dat1, dat2, by.x="let", by.y="let", all=T)

Of course you could add the customer variable to the by parameter, but
sometimes it is necessary to leave it out, e.g. when you have two sets of
analytical results that both contain a variable operator but you want to
merge those sets by date/hour of analysis.

Regards
Petr


[R] Antwort: RE: Antwort: Re: Merging variables

2016-06-08 Thread G . Maubach
Hi Petr,

thanks for your reply.

I prepared a little example for you:

-- cut --

ds_temp_1 <-
  structure(
    list(
      CustId = c(1001, 1002, 1003, 1004, 1005, 1006),
      CustName = c("Miller", "Smith", "Doe", "White", "Black", "Nobody"),
      sales = c(100, 500, 300, 50, 700, 10)
    ),
    .Names = c("CustId", "CustName", "sales"),
    row.names = c(NA, 6L),
    class = "data.frame"
  )

ds_temp_2 <-
  structure(
list(
  CustId = c(1001, 1002, 1003),
  CustName = c("Miller",
   "Smith", "Doe"),
  CustGroup = c(1, 2, 3)
),
.Names = c("CustId",
   "CustName", "CustGroup"),
row.names = c(NA, 3L),
class = "data.frame"
  )

ds_merge <- merge(ds_temp_1, ds_temp_2,
  by.x = "CustId", all.x = TRUE,
  by.y = "CustId", all.y = FALSE)

ds_merge

-- cut --

which gives

ds_merge
  CustId CustName.x sales CustName.y CustGroup
1   1001     Miller   100     Miller         1
2   1002      Smith   500      Smith         2
3   1003        Doe   300        Doe         3
4   1004      White    50       <NA>        NA
5   1005      Black   700       <NA>        NA
6   1006     Nobody    10       <NA>        NA

where CustName is split into CustName.x and CustName.y.

What I would like to have is:

ds_merge
  CustId CustName sales CustGroup
1   1001   Miller   100         1
2   1002    Smith   500         2
3   1003      Doe   300         3
4   1004    White    50        NA
5   1005    Black   700        NA
6   1006   Nobody    10        NA

That is, CustName in a single variable, because the values within that
variable are identical. I guess R generates CustName.x and CustName.y
because of the NA for some cases in ds_temp_2.

Is there a simple way of merging datasets and having R return a single
variable if the values are identical or missing in either one of the
datasets?

Kind regards

Georg
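
A note on the behaviour described above: merge() adds the suffixes .x and .y to every non-key column name that occurs in both data frames, whether or not the values agree. Two possible ways around it are sketched below, assuming the toy data from this message (rebuilt here with data.frame() instead of structure()). The names ds_merge_a and ds_merge_b are made up for illustration.

```r
ds_temp_1 <- data.frame(
  CustId = c(1001, 1002, 1003, 1004, 1005, 1006),
  CustName = c("Miller", "Smith", "Doe", "White", "Black", "Nobody"),
  sales = c(100, 500, 300, 50, 700, 10),
  stringsAsFactors = FALSE
)
ds_temp_2 <- data.frame(
  CustId = c(1001, 1002, 1003),
  CustName = c("Miller", "Smith", "Doe"),
  CustGroup = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Option A: take only the key and the new columns from the second data
# frame, so there is no second CustName to rename.
ds_merge_a <- merge(ds_temp_1, ds_temp_2[, c("CustId", "CustGroup")],
                    by = "CustId", all.x = TRUE)

# Option B: make CustName part of the key. This relies on the names being
# spelled identically in both data frames.
ds_merge_b <- merge(ds_temp_1, ds_temp_2,
                    by = c("CustId", "CustName"), all.x = TRUE)

ds_merge_a
```

Option A is the more cautious choice when the names might differ slightly between the two sources, because only the ID decides which rows belong together.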





From:    PIKAL Petr
To:      "g.maub...@weinwolf.de"
Cc:      "r-help@r-project.org"
Date:    07.06.2016 13:11
Subject: RE: [R] Antwort: Re:  Merging variables



Hi

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Tuesday, June 7, 2016 8:19 AM
> To: Michael Dewey 
> Cc: r-help@r-project.org
> Subject: [R] Antwort: Re: Merging variables
>
> Hi Michael,
>
> yes, I was astonished about this behaviour either. I have worked with SPSS a
> lot - and that works different.

If you want to join two data frames by their common names you can use

merge(dat1, dat2)

without specifying by. From the help page:

By default the data frames are merged on the columns with names they both 
have, but separate specifications of the columns can be given by by.x and 
by.y. The rows in the two data frames that match on the specified columns 
are extracted, and joined together.

>
> I would like to share some of my data. Can you tell me how I can dump a
> dataset in a way that I can post it here as text?

copy result of dput directly to your mail

dput(dat)
structure(list(hz = c(0, 25, 50), vykon = c(0, 11.6, 22.6)), .Names = c("hz",
"vykon"), row.names = c(NA, -3L), class = "data.frame")

We can use

dat <- structure(list(hz = c(0, 25, 50), vykon = c(0, 11.6, 22.6)),
.Names = c("hz", "vykon"), row.names = c(NA, -3L), class = "data.frame")

to reconstruct the object.

Regards
Petr
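
The round trip described above can also be checked in one session. This is a small sketch using the same toy data; the names txt and dat_copy are made up for illustration.

```r
dat <- data.frame(hz = c(0, 25, 50), vykon = c(0, 11.6, 22.6))

# Text representation that can be pasted into a plain-text e-mail
txt <- paste(capture.output(dput(dat)), collapse = "\n")
cat(txt, "\n")

# The recipient rebuilds the object by evaluating the pasted text
dat_copy <- eval(parse(text = txt))

# Compare original and copy
all.equal(dat, dat_copy)
```

For large datasets, posting dput(head(dat, 20)) keeps the message short while still giving a reproducible sample.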


[R] Warning message in openxlsx

2016-06-14 Thread G . Maubach
Hi All,

I get the warning message

Warning message:
In styles$font : partial match of 'font' to 'fonts'

when executing


> xls_workbook <- t_create_workbook()
> xls_sheetname <- "Kunden"
> xls_ds_to_save <- ds_merge1
> xls_filename <- paste0(data_created,
+ "_Merge1_BW-SAP-Kunden_cleaned.xlsx")
> t_add_sheet(workbook = xls_workbook,
+ sheetname = xls_sheetname,
+ dataset = xls_ds_to_save)
> t_write_xlsx(workbook = xls_workbook,
+  path = path_output,
+  filename = xls_filename,
+  overwrite = TRUE)

where t_create_workbook() is

return(createWorkbook())

and t_add_sheet() is

 addWorksheet(workbook,
sheetName = sheetname)
  writeDataTable(workbook, 
sheet = sheetname, 
x = dataset)
  ### writeDataTable writes data to a sheet and adds
  ### an autofilter to the first line
  if (freeze_row <= 1 | freeze_col <= 1) {
NULL # do nothing
  }
  else {
freezePane(workbook,
  sheet = sheetname,
  firstActiveRow = freeze_row,
  firstActiveCol = freeze_col)
  }
 
  setColWidths(workbook,
sheet = sheetname,
cols = 1:ncol(dataset), 
widths = "auto")

and t_write_xlsx is

saveWorkbook(workbook, 
file = file.path(path, filename),
overwrite = overwrite)

I am wondering what "partial match of 'font' to 'fonts'" means, because I do
not use 'font' anywhere in my function calls. I use these calls a lot in my
programs but never got this message before.

What does this message mean? How can I avoid this message?

Kind regards

Georg Maubach

PS: You can find more information about the used functions by going to 
https://sourceforge.net/projects/r-project-utilities/files/?source=navbar 
.
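
A possible explanation, offered as an assumption since the thread does not show the session options: R emits this kind of warning when the option warnPartialMatchDollar is TRUE and code somewhere uses `$` with an abbreviated element name. The `styles$font` in the message suggests the abbreviated access happens inside package code rather than in the calls above. The sketch below reproduces the message with a stand-in list; it does not use openxlsx or its internals.

```r
# Turn on the warning for partial matching with `$` (off by default)
old <- options(warnPartialMatchDollar = TRUE)

# Stand-in object for illustration only, not the openxlsx data structure
styles <- list(fonts = "Calibri")

# 'font' is not an element name, but it partially matches 'fonts'
msg <- tryCatch(
  styles$font,
  warning = function(w) conditionMessage(w)
)
msg

# Exact access with [[ ]] never does partial matching by default
exact <- styles[["fonts"]]

# Restore the previous option value
options(old)
```

If the option was switched on in the session or in a startup file, setting options(warnPartialMatchDollar = FALSE) should silence the message; the returned value is the same either way.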



[R] Installation of package "rio" broken

2016-06-14 Thread G . Maubach
Hi all,

today I wanted to install the package "rio". As it depends on the package
"feather", which is only available as source, I chose to install "rio" from
source. The installation fails with the following messages:

-- cut --
* installing *source* package 'feather' ...
** Paket 'feather' erfolgreich entpackt und MD5 Summen überprüft
** libs

*** arch - i386
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c RcppExports.cpp -o RcppExports.o
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c feather-read.cpp -o feather-read.o
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c feather-types.cpp -o feather-types.o
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c feather-write.cpp -o feather-write.o
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c feather/buffer.cc -o feather/buffer.o
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c feather/feather-c.cc -o feather/feather-c.o
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c feather/io.cc -o feather/io.o
feather/io.cc:18:0: warning: "NOMINMAX" redefined [enabled by default]
c:\program 
files\rtools\gcc-4.6.3\bin\../lib/gcc/i686-w64-mingw32/4.6.3/../../../../include/c++/4.6.3/i686-w64-mingw32/bits/os_defines.h:46:0:
 note: this is the location of the previous definition
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c feather/metadata.cc -o feather/metadata.o
feather/metadata.cc:29:7: error: expected nested-name-specifier before 
'FBString'
feather/metadata.cc:29:7: error: 'FBString' has not been declared
feather/metadata.cc:29:16: error: expected ';' before '=' token
feather/metadata.cc:29:16: error: expected unqualified-id before '=' token
feather/metadata.cc:32:7: error: expected nested-name-specifier before 
'ColumnVector'
feather/metadata.cc:32:7: error: 'ColumnVector' has not been declared
feather/metadata.cc:32:20: error: expected ';' before '=' token
feather/metadata.cc:32:20: error: expected unqualified-id before '=' token
feather/metadata.cc:178:3: error: 'ColumnVector' does not name a type
feather/metadata.cc: In member function 'feather::Status 
feather::metadata::TableBuilder::Impl::Finish()':
feather/metadata.cc:146:5: error: 'FBString' was not declared in this scope
feather/metadata.cc:146:14: error: expected ';' before 'desc'
feather/metadata.cc:148:7: error: 'desc' was not declared in this scope
feather/metadata.cc:154:9: error: 'desc' was not declared in this scope
feather/metadata.cc:156:27: error: 'columns_' was not declared in this scope
feather/metadata.cc:157:34: error: unable to deduce 'auto' from ''
feather/metadata.cc: In member function 'void 
feather::metadata::TableBuilder::Impl::add_column(const 
flatbuffers::Offset&)':
feather/metadata.cc:173:5: error: 'columns_' was not declared in this scope
feather/metadata.cc: In constructor 
'feather::metadata::TableBuilder::TableBuilder()':
feather/metadata.cc:190:5: error: type 'feather::metadata::TableBuilder' is not 
a direct base of 'feather::metadata::TableBuilder'
make: *** [feather/metadata.o] Error 1
Warnung: Ausführung von Kommando 'make -f "Makevars" -f 
"C:/PROGRA~1/R/R-32~1.2/etc/i386/Makeconf" -f 
"C:/PROGRA~1/R/R-32~1.2/share/make/winshlib.mk" CXX='$(CXX1X) $(CXX1XSTD)' 
CXXFLAGS='$(CXX1XFLAGS)' CXXPICFLAGS='$(CXX1XPICFLAGS)' 
SHLIB_LDFLAGS='$(SHLIB_CXX1XLDFLAGS)' SHLIB_LD='$(SHLIB_CXX1XLD)' 
SHLIB="feather.dll" OBJECTS="RcppExports.o feather-read.o feather-types.o 
feather-write.o"' ergab Status 2
ERROR: compilation failed for package 'feather'
* removing 'C:/Users/admin/Documents/R/win-library/3.2/feather'
Warning in install.packages :
  running command '"C:/PROGRA~1/R/R-32~1.2/bin/x64/R" CMD INSTALL -l 
"C:\Users\admin\Documents\R\win-library\3.2" 
C:\Users\admin\AppData\Local\

[R] Building a binary vector out of dichotomous variables

2016-06-16 Thread G . Maubach
Hi All,

I need to build a binary vector made of a set of dichotomous variables.

What I have so far is:

-- cut --

ds_example <-
  structure(
list(
  year2013 = c(0, 0, 0, 1, 1, 1, 1, 0),
  year2014 = c(0,
   0, 1, 1, 0, 0, 1, 1),
  year2015 = c(0, 1, 1, 1, 0, 1, 0, 0)
),
.Names = c("year2013",
   "year2014", "year2015"),
row.names = c(NA, 8L),
class = "data.frame"
  )

attach(ds_example)
base <- 1000
binary_vector <- base + year2013 * 100 + year2014 * 10 + year2015
detach(ds_example)

binary_vector

ds_example <- cbind(ds_example, binary_vector)

varlist <- c("year2013", "year2014", "year2015")

base <- 10^length(varlist)

binary_vector <- NULL

for (i in 1:3) {
  binary_vector <- 
   base + 
   ds_example [[varlist[i]]] * base / (10 ^ i)
}

ds_example <- cbind(ds_example, binary_vector)

message("Wrong result!")
ds_example

-- cut --

How do I get vectors like  1000 1001 1011  1100 1101 1110 1010 for 
each case?

Is there a better approach than mine?

Kind regards

Georg
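
One reason the loop above gives the wrong result is that every pass overwrites binary_vector instead of adding to it, so only the last variable survives. Here is a minimal sketch of the accumulating version, assuming the same example data (rebuilt with data.frame() for brevity):

```r
ds_example <- data.frame(
  year2013 = c(0, 0, 0, 1, 1, 1, 1, 0),
  year2014 = c(0, 0, 1, 1, 0, 0, 1, 1),
  year2015 = c(0, 1, 1, 1, 0, 1, 0, 0)
)
varlist <- c("year2013", "year2014", "year2015")

# Leading 1 followed by one decimal digit per variable
base <- 10^length(varlist)

# Start every case at the base value, then add one digit per variable
binary_vector <- rep(base, nrow(ds_example))
for (i in seq_along(varlist)) {
  # Add to the running total instead of replacing it
  binary_vector <- binary_vector + ds_example[[varlist[i]]] * base / (10^i)
}

ds_example$binary_vector <- binary_vector
ds_example
```

Using seq_along(varlist) instead of 1:3 lets the same loop handle any number of dichotomous variables.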

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fw: Aw: Re: Building a binary vector out of dichotomous variables

2016-06-17 Thread G . Maubach
> Hi Tom,
> 
> thanks for your reply.
> 
> Yes, that's exactly what I am looking for. I did not know about the automatic 
> type conversion in R.
> 
> #-- cut --
> ds_example <-
>   structure(
> list(
>   year2013 = c(0, 0, 0, 1, 1, 1, 1, 0),
>   year2014 = c(0,
>0, 1, 1, 0, 0, 1, 1),
>   year2015 = c(0, 1, 1, 1, 0, 1, 0, 0)
> ),
> .Names = c("year2013",
>"year2014", "year2015"),
> row.names = c(NA, 8L),
> class = "data.frame"
>   )
> 
> #-- Proposal: works!
> as.numeric(with(ds_example,paste(1,year2013,year2014,year2015,sep='')))
> 
> # I store my know-how about R in functions for later use.
> 
> #-- Putting it in a function - does not work!
> t_make_binary_vector <- function(dataset,
>  input_variables,
>  output_variable = "binary_vector") {
>   dataset[output_variable] <- "1"
>   print(dataset[output_variable])
>   
>   for (variable in input_variables) {
> print(variable)
> dataset[output_variable] <- paste(dataset[output_variable],
>   dataset[variable], 
>   sep='')
>   }
>   
>   # print(dataset[output_variable])
> 
>   dataset[output_variable] <- as.integer(dataset[output_variable])
>   
>   return(dataset)
> }
> 
> t_make_binary_vector(dataset = ds_example,
>  input_variables = c("year2013", "year2014", "year2015"),
>  output_variable = "binary_vector")
> 
> 
> #-- Doesn't work either.
> t_make_binary_vector <- function(dataset,
>  input_variables,
>  output_variable = "binary_vector") {
>   dataset[output_variable] <- as.integer(paste(1, dataset[ , 
> input_variables], sep = ''))
> 
>   return(dataset)
> }
> 
> t_make_binary_vector(dataset = ds_example,
>  input_variables = c("year2013", "year2014", "year2015"),
>  output_variable = "binary_vector")
> 
> #-- cut --
> 
> Why is R taking the parameter value itself to paste it together instead of 
> referencing the variable within the dataset?
> 
> What did I get wrong about R? How can I fix it?
> 
> Kind regards
> 
> Georg
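
A likely cause of the behaviour asked about above: dataset[output_variable] and dataset[variable] with single brackets return one-column data frames, and paste() turns a whole data frame column into one deparsed string instead of pasting element by element. Below is a sketch that extracts the column vectors first. The function name make_binary_vector is hypothetical and not part of the thread.

```r
ds_example <- data.frame(
  year2013 = c(0, 0, 0, 1, 1, 1, 1, 0),
  year2014 = c(0, 0, 1, 1, 0, 0, 1, 1),
  year2015 = c(0, 1, 1, 1, 0, 1, 0, 0)
)

# Hypothetical helper: paste a leading "1" and the 0/1 columns together
make_binary_vector <- function(dataset,
                               input_variables,
                               output_variable = "binary_vector") {
  # A plain list of column vectors; unname() keeps column names from
  # being mistaken for named arguments of paste0()
  columns <- unname(as.list(dataset[input_variables]))

  # Equivalent to paste0("1", col1, col2, ...), for any number of columns
  pasted <- do.call(paste0, c(list("1"), columns))

  # [[ ]] assigns a column vector, not a one-column data frame
  dataset[[output_variable]] <- as.integer(pasted)
  dataset
}

ds_result <- make_binary_vector(
  dataset = ds_example,
  input_variables = c("year2013", "year2014", "year2015")
)
ds_result
```

The difference between dataset["x"] (a data frame) and dataset[["x"]] (a vector) is the key point; with(ds_example, ...) in the working proposal sidesteps it because the column names there refer to the vectors directly.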
> 
> 
> > Sent: Thursday, 16 June 2016, 16:13
> > From: "Tom Wright"
> > To: g.maub...@weinwolf.de
> > Cc: "R. Help"
> > Subject: Re: [R] Building a binary vector out of dichotomous variables
> >
> > Does this do what you want?
> > 
> > as.numeric(with(ds_example,paste(1,year2013,year2014,year2015,sep='')))
> > 

[R] [Off-Topic] Introducing a new R Blog

2016-06-20 Thread G . Maubach
Hi All,

today I would like to announce a new R blog. It contains a few entries
about findings from my course of studies and my daily work:

https://github.com/gmaubach/R-Know-How/wiki/R-Blog

I hope you'll find my hints useful.

In addition you could have a look at a small collection of R functions I
found useful when working with my data:

https://github.com/gmaubach/R-Project-Utilities

Kind regards

Georg Maubach



[R] Subscripting problem with is.na()

2016-06-23 Thread G . Maubach
Hi All,

I would like to recode my NAs to 0. Using a single vector everything is 
fine.

But if I use a data.frame things go wrong:

-- cut --

var1 <- c(1:3, NA, 5:7, NA, 9:10)
var2 <- c(1:3, NA, 5:7, NA, 9:10)
ds_test <-
  data.frame(var1, var2)

test <- var1
test[is.na(test)] <- 0
test  # NA recoded OK

# First try
ds_test[is.na(ds_test$var1)] <- 0  # duplicate subscripts WRONG

# Second try
ds_test[is.na("var1")] <- 0 
ds_test$var1  # not recoded WRONG

# Third try: to me the most intuitive approach
is.na(ds_test["var1"]) <- 0  # attempt to select less than one element in integerOneIndex WRONG

# Fourth try
ds_test[is.na(var1)] <- 0  # duplicate subscripts for columns WRONG

-- cut --
 
How can I do it correctly?

Where could I have found something about it?

Kind regards

Georg
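
For reference, here are three indexing forms that should do the recoding on a data frame, sketched with the same toy data. The copies ds_one, ds_rows and ds_all are only there to keep the variants apart.

```r
var1 <- c(1:3, NA, 5:7, NA, 9:10)
var2 <- c(1:3, NA, 5:7, NA, 9:10)
ds_test <- data.frame(var1, var2)

# One column: index the column vector itself
ds_one <- ds_test
ds_one$var1[is.na(ds_one$var1)] <- 0

# Same result with a logical row index and a column name
ds_rows <- ds_test
ds_rows[is.na(ds_rows$var1), "var1"] <- 0

# All columns at once: is.na() on a data frame gives a logical matrix
ds_all <- ds_test
ds_all[is.na(ds_all)] <- 0

ds_one
ds_all
```

The attempts above fail because a single logical vector inside ds_test[...] selects columns, not rows, and is.na("var1") tests the string "var1" rather than the column.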



[R] r_toolbox: Update

2016-06-23 Thread G . Maubach
Hi folks,

I have updated the functions of the r_toolbox.R set of utilities:

https://sourceforge.net/projects/r-project-utilities/files/?source=navbar

Some functions were renamed to reflect similar functions in SAS or SPSS,
e.g. t_n_miss, t_n_valid. In addition I added functions for reporting
memory usage, selecting variables by type and getting an overview of the
levels of factors.

I hope you find these functions useful.

Please get back to me if you have suggestions or encounter any 
difficulties.

Kind regards

Georg



Re: [R] Subscripting problem with is.na()

2016-06-24 Thread G . Maubach
Hi Bert,

many thanks for all your help and your comments. I learn a lot this way.

At first sight my question was about is.na(), but the actual task looks like
this:

I have two variables in my customer data that signal whether the customer
account was closed by master data management or by sales. Say these variables
are closed_mdm and closed_sls. They contain NA if the customer account is still
open, or a closing code from "01" to "08" that indicates why the account was
closed.

For my analysis I need a variable that combines closed_mdm and closed_sls, so
that I can easily filter on the accounts that are closed, no matter what the
reason was or who closed the account.

As I always encounter problems when dealing with ifelse statements and NA, I
decided to merge these two variables into one variable containing 0 = not
closed and 1 = closed. In my context this seems to be - at least to me - a
reasonable approach.

Replacing the missing values and merging the variables is the easiest way for
me.

-- cut --

cust_id <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20)
closed_mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA, NA, NA, "04", 
NA, NA, NA, NA, NA, NA, NA)
closed_sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA, "03", NA, NA, NA, 
"05", NA, NA, NA, NA, NA)

# 1st try
ds_temp1 <- data.frame(cust_id, closed_mdm, closed_sls)
ds_temp1

ds_temp1$closed <- closed_mdm | closed_sls  # WRONG

# 2nd try
closed_mdm_fac1 <- as.factor(closed_mdm)
closed_sls_fac1 <- as.factor(closed_sls)

ds_temp2 <- data.frame(cust_id, closed_mdm_fac1, closed_sls_fac1)
ds_temp2

ds_temp2$closed <- ds_temp$closed_mdm_fac1 | ds_temp$closed_sls_fac1  # WRONG

# 3rd try
closed_mdm_num1 <- as.numeric(closed_mdm)  # OK
closed_sls_num1 <- as.numeric(closed_sls)  # OK

ds_temp3 <- data.frame(cust_id, closed_mdm_num1, closed_sls_num1)
ds_temp3

ds_temp3$closed <- ds_temp$closed_mdm_num1 | ds_temp$closed_sls_num1  # WRONG

# 4th try
ds_temp4 <- ds_temp3
ds_temp4

# Does not run due to not allowed NA in subscripts
ds_temp4[is.na(ds_temp4$closed_mdm_num1), ds_temp4$closed_mdm_num1] <- 0
ds_temp4[is.na(ds_temp4$closed_sls_num1), ds_temp4$closed_sls_num1] <- 0

# 5th try
ds_temp4$closed_mdm_num1 <- ifelse(is.na(ds_temp4$closed_mdm_num1), 1, 0)
ds_temp4$closed_sls_num1 <- ifelse(is.na(ds_temp4$closed_sls_num1), 1, 0)
ds_temp4

ds_temp4$closed <- ifelse(ds_temp4$closed_mdm_num1 == 1 | 
ds_temp4$closed_sls_num1 == 1, 1, 0)
ds_temp4

-- cut --

Is there a better way to do it?

Kind regards

Georg
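
One compact alternative, sketched under the coding stated above (NA = still open, any code = closed): the flag can be derived directly from is.na() without recoding the NAs first. The data frame name ds_temp is made up for this example.

```r
closed_mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA, NA, NA, "04",
                NA, NA, NA, NA, NA, NA, NA)
closed_sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA, "03", NA, NA, NA,
                "05", NA, NA, NA, NA, NA)

ds_temp <- data.frame(cust_id = 1:20, closed_mdm, closed_sls,
                      stringsAsFactors = FALSE)

# An account counts as closed if either closing code is present.
# is.na() never returns NA itself, so the result is always TRUE or FALSE.
ds_temp$closed <- as.integer(!is.na(ds_temp$closed_mdm) |
                               !is.na(ds_temp$closed_sls))

# Filter on closed accounts
ds_closed <- ds_temp[ds_temp$closed == 1, ]
ds_closed$cust_id
```

Note that ifelse(is.na(x), 1, 0), as used in the fifth try, returns 1 where the code is missing. Under the coding 1 = closed the test would need to be !is.na(x).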


> Sent: Thursday, 23 June 2016, 23:55
> From: "Bert Gunter"
> To: "David L Carlson"
> Cc: "R Help"
> Subject: Re: [R] Subscripting problem with is.na()
>
> ... actually, FWIW, I would say that this little discussion mostly
> demonstrates why the OP's request is probably not a good idea in the
> first place. Usually, NA's should be left as NA's to be dealt with
> properly by R and packages. In biological measurements, for example,
> NA's often mean "below the ability to reliably measure." Biologists
> with whom I've worked over many years often want to convert these to 0
> or omit the cases, both of which lead to biased estimates and/or
> underestimates of variability and excess claims of "statistical
> significance" (for those who belong to this religious persuasion). One
> should never say never, but I suspect that there are relatively few
> circumstances where the conversion the OP requested is actually wise.
> 
> Feel free to ignore/reject such extraneous comments of course.
> 
> Cheers,
> Bert
> 
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Thu, Jun 23, 2016 at 12:14 PM, David L Carlson  wrote:
> > Good point. I did not think about factors. Also your example raises another 
> > issue since column c is logical, but gets silently converted to numeric. 
> > This would seem to get the job done assuming the conversion is intended for 
> > numeric columns only:
> >
> >> test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> >> sapply(test, class)
> >         a         b         c
> > "numeric"  "factor" "logical"
> >> num <- sapply(test, is.numeric)
> >> test[, num][is.na(test[, num])] <- 0
> >> test
> >   a    b  c
> > 1 1    A NA
> > 2 0    b NA
> > 3 2 <NA> NA
> >
> > David C
> >
> > -Original Message-
> > From: Bert Gunter [mailto:bgunter.4...@gmail.com]
> > Sent: Thursday, June 23, 2016 1:48 PM
> > To: David L Carlson
> > Cc: Ivan Calandra; R Help
> > Subject: Re: [R] Subscripting problem with is.na()
> >
> > Not in general, David:
> >
> > e.g.
> >
> >> test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> >
> >> is.na(test)
> >          a     b    c
> > [1,] FALSE FALSE TRUE
> > [2,]  TRUE FALSE TRUE
> > [3,] FALSE  TRUE TRUE
> >
> >> test[is.na(test)]
> > [1] NA NA NA NA NA
> >
> >> test[is.na(test)

[R] Antwort: Fw: Re: Subscripting problem with is.na()

2016-06-27 Thread G . Maubach
Hi David,
Hi Bert,

many thanks for the valuable discussion on NA in R (please see the extract 
below). I follow your argument for leaving NAs as they are most of the 
time. On special occasions, however, I want to replace NA with another 
value. To preserve this newly acquired knowledge, I wrote this 
function:

-- cut --
t_replace_na <- function(dataset, variable, value) {
 if(inherits(dataset[[variable]], "factor") == TRUE) {
   dataset[variable] <- as.character(dataset[variable])
   print(class(dataset[variable]))
   dataset[, variable][is.na(dataset[, variable])] <- value
   dataset[variable] <- as.factor(dataset[variable])
   print(class(dataset[variable]))
 } else {
   dataset[, variable][is.na(dataset[, variable])] <- value
 }
 return(dataset)
}

ds_test <- data.frame(a=c(1,NA,2), b = rep(NA,3), c = c("A","b",NA))
print(sapply(ds_test, class))

t_replace_na(ds_test, "a", value = -1)
t_replace_na(ds_test, "b", value = -2)
t_replace_na(ds_test, "c", value = -3)
-- cut --

Unfortunately, the if-statement does not work because the class is 
determined incorrectly inside the function. To find out what is going on, 
I did this:

-- cut --
test_class <- function(dataset, variable) {
  if(inherits(dataset[, variable], "factor") == TRUE) {
    return(c(class(dataset[variable]), TRUE))
  } else {
    return(c(class(dataset[variable]), FALSE))
  }
}

ds_test <- data.frame(a=c(1,NA,2), b = rep(NA,3), c = c("A","b",NA))
print(sapply(ds_test, class))

# -- Test a --
class(ds_test[, "a"])
if(inherits(ds_test[, "a"], "factor")) {
  print(c(class(ds_test[, "a"]), "TRUE"))
} else {
  print(c(class(ds_test[, "a"]), "FALSE"))
}
test_class(ds_test, "a")
warning("'a' should be numeric NOT data.frame!")

# -- Test b --
if(inherits(ds_test[, "b"], "factor")) {
  print(c(class(ds_test[, "b"]), "TRUE"))
} else {
  print(c(class(ds_test[, "b"]), "FALSE"))
}
class(ds_test[, "b"])
test_class(ds_test, "b")
warning("'b' should be logical NOT data.frame!")

# -- Test c --
if(inherits(ds_test[, "c"], "factor")) {
  print(c(class(ds_test[, "c"]), "TRUE"))
} else {
  print(c(class(ds_test[, "c"]), "FALSE"))
}
class(ds_test[, "c"])
test_class(ds_test, "c")
warning("'c' should be factor NOT data.frame.
In addition data.frame != factor")
-- cut --

Why do I get different results from the same call depending on whether it 
is inside or outside my own function definition?

Kind regards

Georg



> Sent: Thursday, 23 June 2016 at 21:14
> From: "David L Carlson" 
> To: "Bert Gunter" 
> Cc: "R Help" 
> Subject: Re: [R] Subscripting problem with is.na()
>
> Good point. I did not think about factors. Also your example raises
> another issue since column c is logical, but gets silently converted to
> numeric. This would seem to get the job done assuming the conversion is
> intended for numeric columns only:
> 
> > test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> > sapply(test, class)
>         a         b         c
> "numeric"  "factor" "logical"
> > num <- sapply(test, is.numeric)
> > test[, num][is.na(test[, num])] <- 0
> > test
>   a    b  c
> 1 1    A NA
> 2 0    b NA
> 3 2 <NA> NA
> 
> David C

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Antwort: RE: Antwort: Fw: Re: Subscripting problem with is.na()

2016-06-27 Thread G . Maubach
Hi Petr,

many thanks for your reply and the examples.

My subscripting problems drive me nuts.

I have understood that dataset[variable] is semantically identical to 
dataset[, variable], because dataset[variable] takes all cases when no 
other subscripts are given.

Where can I look up the rules for when to use the comma and when not?
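
For data frames the two forms are not quite identical, which a short sketch can show. The data frame `df` below is a made-up example, not taken from the thread; the rules themselves are documented in the help pages ?Extract and ?"[.data.frame".

```r
# Made-up example data frame for illustration.
df <- data.frame(a = c(1, NA, 2), c = factor(c("A", "b", NA)))

# One index, single brackets: list-style subsetting, result stays a data frame.
class(df["a"])                  # "data.frame"

# Two indices (note the comma): matrix-style subsetting; a single column is
# dropped to a plain vector by default (drop = TRUE).
class(df[, "a"])                # "numeric"

# Double brackets: extract exactly one column as a vector.
class(df[["a"]])                # "numeric"

# drop = FALSE keeps the data frame even with the comma.
class(df[, "a", drop = FALSE])  # "data.frame"

# inherits() therefore needs the column itself, not a one-column data frame.
inherits(df["c"], "factor")     # FALSE
inherits(df[["c"]], "factor")   # TRUE
```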

Kind regards

Georg





From:    PIKAL Petr 
To:      "g.maub...@weinwolf.de" , 
Cc:      "r-help@r-project.org" 
Date:    27.06.2016 11:03
Subject: RE: [R] Antwort: Fw: Re: Subscripting problem with is.na()



Hi

see in line

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Monday, June 27, 2016 10:45 AM
> To: David L Carlson ; Bert Gunter
> 
> Cc: r-help@r-project.org
> Subject: [R] Antwort: Fw: Re: Subscripting problem with is.na()
>
> Hi David,
> Hi Bert,
>
> many thanks for the valuable discussion on NA in R (please see extract
> below). I follow your arguments leaving NA as they are for most of the
> time. In special occasions however I want to replace the NA with another
> value. To preserve the newly acquired knowledge for me I wrote this
> function:
>
> -- cut --
> t_replace_na <- function(dataset, variable, value) {
>  if(inherits(dataset[[variable]], "factor") == TRUE) {
>    dataset[variable] <- as.character(dataset[variable])
>    print(class(dataset[variable]))
>    dataset[, variable][is.na(dataset[, variable])] <- value
>    dataset[variable] <- as.factor(dataset[variable])
>    print(class(dataset[variable]))
>  } else {
>    dataset[, variable][is.na(dataset[, variable])] <- value
>  }
>  return(dataset)
> }
>



> class(ds_test[, "c"])
> test_class(ds_test, "c")
> warning("'c' should be factor NOT data.frame.
> In addition data.frame != factor")
> -- cut --
>
> Why do I get different results for the same function if it is inside or
> outside my own function definition?

Because you are still not subscripting data frames correctly.

test_class <- function(dataset, variable) {
  if(inherits(dataset[, variable], "factor") == TRUE) {
    return(c(class(dataset[, variable]), TRUE))
  } else {
    return(c(class(dataset[, variable]), FALSE))
  }
}

> test_class(ds_test, "a")
[1] "numeric" "FALSE"
> test_class(ds_test, "c")
[1] "factor" "TRUE"
>

If you place the commas properly in your function, you get the desired result:

p_replace_na <- function(dataset, variable, value) {
 if(inherits(dataset[,variable], "factor") == TRUE) {
   dataset[,variable] <- as.character(dataset[,variable])
   print(class(dataset[,variable]))
   dataset[, variable][is.na(dataset[, variable])] <- value
   dataset[, variable] <- as.factor(dataset[, variable])
   print(class(dataset[, variable]))
 } else {
   dataset[, variable][is.na(dataset[, variable])] <- value
 }
 return(dataset)
}

> p_replace_na(ds_test, "c", value = -3)
[1] "character"
[1] "factor"
   a  b  c
1  1 NA  A
2 NA NA  b
3  2 NA -3

> t_replace_na(ds_test, "c", value = -3)
[1] "data.frame"
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
>

Cheers
Petr




[R] Antwort: RE: Antwort: Fw: Re: Subscripting problem with is.na()

2016-06-27 Thread G . Maubach
Hi All,

Petr, Bert, David, Ivan, Duncan and Rui helped me to develop a function 
able to replace NA's in variables IF NEEDED:

#---
# Module: t_replace_na.R
# Author: Georg Maubach
# Date  : 2016-06-27
# Update: 2016-06-27
# Description   : Replace NA with another value
# Source System : R 3.3.0 (64 Bit)
# Target System : R 3.3.0 (64 Bit)
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#1-2-3-4-5-6-7-8

t_version = "2016-06-27"
t_module_name = "t_replace_na.R"

cat(
  paste0("\n",
 t_module_name, " (Version: ", t_version, ")", "\n", "\n",
 "This software comes with ABSOLUTELY NO WARRANTY.",
 "\n", "\n"))

# If t_do_test is not defined globally, define it here locally by
# un-commenting it
t_do_test <- FALSE

# [ Function Definition ]
t_replace_na <- function(dataset, variables, value) {
  # Replace NA with another given value
  #
  # Args:
  #   dataset (data frame, data table):
  #     Object with dimnames, e.g. data frame, data table.
  #   variables (character vector):
  #     List of variable names.
  #   value:
  #     Replacement for NA.
  #
  # Operation:
  #   NA is replaced by the value given with the parameter "value".
  #
  #   A factor is converted explicitly with as.character(), the missing
  #   value replacement is done and then the character vector is converted
  #   back with as.factor(). Thus the value replacing NA becomes a category
  #   of the new factor variable.
  #
  # Caution:
  #   Please check your data in case you replace NA within factors due to
  #   explicit type conversion. Tests were done only for the below given
  #   dataset.
  #
  # Returns:
  #   Dataset with NA replaced in the given variables.
  #
  # Error handling:
  #   None.
  #
  # Credits:
  #   https://www.mail-archive.com/r-help@r-project.org/msg236537.html

  for (variable in variables) {
    if (inherits(dataset[, variable], "factor") == TRUE) {
      dataset[, variable] <- as.character(dataset[, variable])
      print(class(dataset[, variable]))
      dataset[, variable][is.na(dataset[, variable])] <- value
      dataset[, variable] <- as.factor(dataset[, variable])
      print(class(dataset[, variable]))
    } else {
      dataset[, variable][is.na(dataset[, variable])] <- value
    }
  }
  return(dataset)
}

# [ Test Definition ]
t_test <- function(do_test = FALSE) {
  if (do_test == TRUE) {
    cat("\n", "\n", "Test function t_replace_na()", "\n", "\n")

    # Example dataset
    ds_example <- data.frame(a=c(1,NA,2), b = rep(NA,3),
                             c = c("A","b",NA))

    cat("\n", "\n", "Example dataset before function call", "\n", "\n")
    cat("Variables and their classes:\n")
    print(sapply(ds_example, class))
    cat("Dataset:\n")
    print(ds_example)

    cat("\n", "\n", "Function call", "\n", "\n")
    ds_result <- t_replace_na(ds_example, "a", value = -1)
    cat("\n", "\n", "Example dataset after function call", "\n", "\n")
    print(ds_result)

    cat("\n", "\n", "Function call", "\n", "\n")
    ds_result <- t_replace_na(ds_example, "b", value = -2)
    cat("\n", "\n", "Example dataset after function call", "\n", "\n")
    print(ds_result)

    cat("\n", "\n", "Function call", "\n", "\n")
    ds_result <- t_replace_na(ds_example, "c", value = -3)
    cat("\n", "\n", "Example dataset after function call", "\n", "\n")
    print(ds_result)
  }
}

# [ Test Run ]
t_test(do_test = t_do_test)

# [ Clean up ]
rm("t_module_name", "t_version", "t_do_test", "t_test")

# EOF .
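
A more compact variant is sketched below. It is not the function from the thread: it uses `[[` so that each column is handled as a vector, and it extends the factor levels instead of converting through character. The name `replace_na_sketch` and the test data are hypothetical, and the sketch is only meant for plain data frames.

```r
# Hypothetical helper; not the t_replace_na() from the thread.
replace_na_sketch <- function(dataset, variables, value) {
  for (variable in variables) {
    column <- dataset[[variable]]        # extract the column as a vector
    if (is.factor(column)) {
      # Add the replacement as a level first; assigning a value that is not
      # a level would only produce NA again (with a warning).
      levels(column) <- union(levels(column), as.character(value))
      column[is.na(column)] <- as.character(value)
    } else {
      column[is.na(column)] <- value
    }
    dataset[[variable]] <- column
  }
  dataset                                # return the modified copy
}

ds_example <- data.frame(a = c(1, NA, 2), b = rep(NA, 3),
                         c = factor(c("A", "b", NA)))
ds_result <- replace_na_sketch(ds_example, c("a", "c"), value = -1)
print(ds_result)
```

Column b is left untouched here because it is not named in `variables`; the caller decides per column whether recoding makes sense.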

Please note: R has capabilities to handle NA correctly. There is often no 
need to recode NA. Also NA might or might not have meaning. You have to 
decide with regard to the meaning of the original data and the business 
problem.
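
The effect of recoding can be illustrated with a small sketch on made-up numbers: replacing NA with 0 changes the mean, whereas na.rm = TRUE computes it from the observed values only. Which treatment is appropriate still depends on what NA means in the data.

```r
# Made-up measurements with two missing values.
x <- c(4, NA, 6, NA, 5)

# Let R handle the missing values: mean of the three observed values.
mean_observed <- mean(x, na.rm = TRUE)

# Recode NA to 0 first: the zeros enter the calculation as real measurements.
x_zero <- x
x_zero[is.na(x_zero)] <- 0
mean_recoded <- mean(x_zero)

print(c(observed = mean_observed, recoded = mean_recoded))
```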

Kind regards

Georg





[R] Installing from source on Windows 7: tibble

2016-06-29 Thread G . Maubach
Hi All,

I would like to install R packages from source on Windows 7 64-Bit. 
Currently my settings are:

-- cut --
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252 
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C 
[5] LC_TIME=German_Germany.1252 

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

loaded via a namespace (and not attached):
[1] tools_3.3.0
-- cut --

The environment variable PATH on Windows 7 is set to:

C:\R-Project\Rtools\mingw_32\bin;C:\R-Project\Rtools\mingw_64\bin;C:\R-Project\Rtools\bin;C:\R-Project\Rtools\gcc-4.6.3\bin;C:\Program Files\Python 3.5\Scripts\;C:\Program Files\Python 3.5\;C:\Python27\;C:\Python27\Scripts; etc. etc.

RTools is installed in C:\R-Project\RTools

The call of

C:\R-Project\Rtools\mingw_64\bin\g++.exe --version

results in

g++ (x86_64-posix-seh, Built by MinGW-W64 project) 4.9.3

If I do


> install.packages("tibble", type = "source")

I get

-- cut --
trying URL 'https://cran.uni-muenster.de/src/contrib/tibble_1.0.tar.gz'
Content type 'application/x-gzip' length 38038 bytes (37 KB)
downloaded 37 KB

* installing *source* package 'tibble' ...
** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft
** libs

*** arch - i386
c:/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG 
-I"C:/R-Project/R-3.3.0/library/Rcpp/include" 
-I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c 
RcppExports.cpp -o RcppExports.o
c:/Rtools/mingw_32/bin/g++: not found
make: *** [RcppExports.o] Error 127
Warnung: Ausführung von Kommando 'make -f 
"C:/R-PROJ~1/R-33~1.0/etc/i386/Makeconf" -f 
"C:/R-PROJ~1/R-33~1.0/share/make/winshlib.mk" 
SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)' 
SHLIB="tibble.dll" OBJECTS="RcppExports.o matrixToDataFrame.o"' ergab 
Status 2
ERROR: compilation failed for package 'tibble'
* removing 'C:/R-Project/R-3.3.0/library/tibble'
* restoring previous 'C:/R-Project/R-3.3.0/library/tibble'
Warning in install.packages :
  running command '"C:/R-PROJ~1/R-33~1.0/bin/x64/R" CMD INSTALL -l 
"C:\R-Project\R-3.3.0\library" 
C:\Users\MAUBAC~1.WEI\AppData\Local\Temp\Rtmp23SQxM/downloaded_packages/tibble_1.0.tar.gz'
 
had status 1
Warning in install.packages :
  installation of package ‘tibble’ had non-zero exit status
-- cut --

There is no make.conf in "C:\R-Project\Rtools\mingw_64\etc". I found 
"Makeconf" in "C:\R-Project\R-3.3.0\etc\x64". Do I need it? How do I need 
to configure the settings in this file?

I searched Google but did not understand what to do or how to 
configure the R environment variables correctly.

What do I need to do to install packages from source?

Kind regards

Georg



[R] Antwort: Re: Installing from source on Windows 7: tibble

2016-06-29 Thread G . Maubach
Hi Duncan,

many thanks for your reply.

I had inserted the paths to the g++ compiler because I got the message about 
the nonexistent compiler.

I took the directories for the compiler out again:

C:\R-Project\Rtools\bin;C:\ProgramData\Oracle\Java\javapath;C:\Program 
Files\Python 3.5\Scripts\;C:\Program Files\Python 
3.5\;C:\Python27\;C:\Python27\Scripts, etc. etc.

Calling

install.packages("tibble", type  = "source")


gives this message:

-- cut --
* installing *source* package 'tibble' ...
** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft
** libs

*** arch - i386
c:/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG 
-I"C:/R-Project/R-3.3.0/library/Rcpp/include" 
-I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c 
RcppExports.cpp -o RcppExports.o
c:/Rtools/mingw_32/bin/g++: not found
make: *** [RcppExports.o] Error 127
Warnung: Ausführung von Kommando 'make -f 
"C:/R-PROJ~1/R-33~1.0/etc/i386/Makeconf" -f 
"C:/R-PROJ~1/R-33~1.0/share/make/winshlib.mk" 
SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)' 
SHLIB="tibble.dll" OBJECTS="RcppExports.o matrixToDataFrame.o"' ergab 
Status 2
ERROR: compilation failed for package 'tibble'
* removing 'C:/R-Project/R-3.3.0/library/tibble'
* restoring previous 'C:/R-Project/R-3.3.0/library/tibble'
Warning in install.packages :
  running command '"C:/R-PROJ~1/R-33~1.0/bin/x64/R" CMD INSTALL -l 
"C:\R-Project\R-3.3.0\library" 
C:\Users\MAUBAC~1.WEI\AppData\Local\Temp\RtmpGqOlOW/downloaded_packages/tibble_1.0.tar.gz'
 
had status 1
Warning in install.packages :
  installation of package ‘tibble’ had non-zero exit status
-- cut --

What else could I do?

Kind regards

Georg





From:    Duncan Murdoch 
To:      g.maub...@weinwolf.de, r-help@r-project.org, 
Date:    29.06.2016 13:07
Subject: Re: [R] Installing from source on Windows 7: tibble



On 29/06/2016 5:49 AM, g.maub...@weinwolf.de wrote:
> Hi All,
>
> I would like to install R packages from source on Windows 7 64-Bit.
> Currently my settings are:
>
> -- cut --
>> sessionInfo()
> R version 3.3.0 (2016-05-03)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> locale:
> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] tools_3.3.0
> -- cut --
>
> The environment variable PATH on Windows 7 is set to:
>
> 
C:\R-Project\Rtools\mingw_32\bin;C:\R-Project\Rtools\mingw_64\bin;C:\R-Project\Rtools\bin;C:\R-Project\Rtools\gcc-4.6.3\bin;C:\Program
> Files\Python 3.5\Scripts\;C:\Program Files\Python
> 3.5\;C:\Python27\;C:\Python27\Scripts; etc. etc.

Take the mingw_32, mingw_64 and gcc-4.6.3 directories off your path. 
They aren't needed; the first two could conceivably be harmful.

>
> RTools is installed in C:\R-Project\RTools
>
> The call of
>
> C:\R-Project\Rtools\mingw_64\bin\g++.exe --version
>
> results in
>
> g++ (x86_64-posix-seh, Built by MinGW-W64 project) 4.9.3
>
> If I do
>
>
>> install.packages("tibble", type = "source")
>
> I get
>
> -- cut --
> trying URL 'https://cran.uni-muenster.de/src/contrib/tibble_1.0.tar.gz'
> Content type 'application/x-gzip' length 38038 bytes (37 KB)
> downloaded 37 KB
>
> * installing *source* package 'tibble' ...
> ** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft
> ** libs
>
> *** arch - i386
> c:/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG
> -I"C:/R-Project/R-3.3.0/library/Rcpp/include"
> -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 
-c
> RcppExports.cpp -o RcppExports.o
> c:/Rtools/mingw_32/bin/g++: not found
> make: *** [RcppExports.o] Error 127
> Warnung: Ausführung von Kommando 'make -f
> "C:/R-PROJ~1/R-33~1.0/etc/i386/Makeconf" -f
> "C:/R-PROJ~1/R-33~1.0/share/make/winshlib.mk"
> SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)'
> SHLIB="tibble.dll" OBJECTS="RcppExports.o matrixToDataFrame.o"' ergab
> Status 2
> ERROR: compilation failed for package 'tibble'
> * removing 'C:/R-Project/R-3.3.0/library/tibble'
> * restoring previous 'C:/R-Project/R-3.3.0/library/tibble'
> Warning in install.packages :
>   running command '"C:/R-PROJ~1/R-33~1.0/bin/x64/R" CMD INSTALL -l
> "C:\R-Project\R-3.3.0\library"
> 
C:\Users\MAUBAC~1.WEI\AppData\Local\Temp\Rtmp23SQxM/downloaded_packages/tibble_1.0.tar.gz'
> had status 1
> Warning in install.packages :
>   installation of package ‘tibble’ had non-zero exit status
> -- cut --
>
> There is no make.conf in "C:\R-Project\Rtools\mingw_64\etc". I found "
> Makeconf" in "C:\R-Project\R-3.3.0\etc\x64". Do I need it? How do I need
> to configure the settings in this file?

Yes, since you haven't installed Rtools in the default location, you 
should edit two Makeconf files.  In 
C:\R-Project\R-3.3.0\etc\x64

[R] Antwort: Re: Antwort: Re: Installing from source on Windows 7: tibble [SOLVED]

2016-06-29 Thread G . Maubach
Hi Duncan,

indeed, I did not see the other part of your message.

I did

BINPREF ?= C:/R-Project/Rtools/mingw_32/bin/
COMPILED_BY = g++ # instead of gcc-4.9.3

in "C:\R-Project\R-3.3.0\etc\i386\Makeconf"

and

BINPREF ?= C:/R-Project/Rtools/mingw_64/bin/
COMPILED_BY = g++ # instead of gcc-4.9.3

in "C:\R-Project\R-3.3.0\etc\x64\Makeconf"

Now I could compile the package with no further errors.
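
To check what a Makeconf file specifies before or after editing it, the relevant lines can be read from R. This is a minimal sketch: the helper `makeconf_setting` is a hypothetical name, and the demonstration parses example lines modelled on the settings quoted above rather than a real file.

```r
# Hypothetical helper: return the lines of a Makeconf that assign a variable.
makeconf_setting <- function(lines, key) {
  # Matches "KEY =" as well as "KEY ?=" at the start of a line.
  pattern <- paste0("^[[:space:]]*", key, "[[:space:]]*[?]?=")
  trimws(grep(pattern, lines, value = TRUE))
}

# Example lines modelled on the thread (assumed content). A real file could be
# read with readLines(file.path(R.home("etc"), .Platform$r_arch, "Makeconf")).
example_lines <- c(
  "BINPREF ?= C:/R-Project/Rtools/mingw_64/bin/",
  "COMPILED_BY = gcc-4.9.3",
  "CXXFLAGS = -O2 -Wall -mtune=core2"
)

print(makeconf_setting(example_lines, "BINPREF"))
print(makeconf_setting(example_lines, "COMPILED_BY"))
```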

Messages are

-- cut --
* installing *source* package 'tibble' ...
** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft
** libs

*** arch - i386
C:/R-Project/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" 
-DNDEBUG -I"C:/R-Project/R-3.3.0/library/Rcpp/include" 
-I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c 
RcppExports.cpp -o RcppExports.o
C:/R-Project/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" 
-DNDEBUG -I"C:/R-Project/R-3.3.0/library/Rcpp/include" 
-I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c 
matrixToDataFrame.cpp -o matrixToDataFrame.o
C:/R-Project/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o 
tibble.dll tmp.def RcppExports.o matrixToDataFrame.o 
-Ld:/Compiler/gcc-4.9.3/local330/lib/i386 
-Ld:/Compiler/gcc-4.9.3/local330/lib -LC:/R-PROJ~1/R-33~1.0/bin/i386 -lR
installing to C:/R-Project/R-3.3.0/library/tibble/libs/i386

*** arch - x64
C:/R-Project/Rtools/mingw_64/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" 
-DNDEBUG -I"C:/R-Project/R-3.3.0/library/Rcpp/include" 
-I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c 
RcppExports.cpp -o RcppExports.o
C:/R-Project/Rtools/mingw_64/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" 
-DNDEBUG -I"C:/R-Project/R-3.3.0/library/Rcpp/include" 
-I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c 
matrixToDataFrame.cpp -o matrixToDataFrame.o
C:/R-Project/Rtools/mingw_64/bin/g++ -shared -s -static-libgcc -o 
tibble.dll tmp.def RcppExports.o matrixToDataFrame.o 
-Ld:/Compiler/gcc-4.9.3/local330/lib/x64 
-Ld:/Compiler/gcc-4.9.3/local330/lib -LC:/R-PROJ~1/R-33~1.0/bin/x64 -lR
installing to C:/R-Project/R-3.3.0/library/tibble/libs/x64
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (tibble)
-- cut --

So - complete success.

Many thanks for your help.

One last question: Why did Rtools.exe not create a directory named 
"gcc-4.9.3" in "C:\R-Project\Rtools", instead of putting 
"C:\R-Project\Rtools\mingw_32" and "C:\R-Project\Rtools\mingw_64" directly 
in "C:\R-Project\Rtools\"? gcc-4.6.3 was installed that way.

Kind regards

Georg





From:    Duncan Murdoch 
To:      g.maub...@weinwolf.de, 
Cc:      r-help@r-project.org
Date:    29.06.2016 16:21
Subject: Re: Antwort: Re: [R] Installing from source on Windows 7: tibble



On 29/06/2016 10:17 AM, g.maub...@weinwolf.de wrote:
> Hi Duncan,
>
> many thanks for your reply.
>
> I did insert die paths to the g++ compiler because I got the message 
about
> the not existent compiler.
>
> I took the directories for the compiler out again:
>
> C:\R-Project\Rtools\bin;C:\ProgramData\Oracle\Java\javapath;C:\Program
> Files\Python 3.5\Scripts\;C:\Program Files\Python
> 3.5\;C:\Python27\;C:\Python27\Scripts, etc. etc.
>
> Calling
>
> install.packages("tibble", type  = "source")
>
>
> gives this message:
>
> -- cut --
> * installing *source* package 'tibble' ...
> ** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft
> ** libs
>
> *** arch - i386
> c:/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG
> -I"C:/R-Project/R-3.3.0/library/Rcpp/include"
> -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 
-c
> RcppExports.cpp -o RcppExports.o
> c:/Rtools/mingw_32/bin/g++: not found
> make: *** [RcppExports.o] Error 127
> Warnung: Ausführung von Kommando 'make -f
> "C:/R-PROJ~1/R-33~1.0/etc/i386/Makeconf" -f
> "C:/R-PROJ~1/R-33~1.0/share/make/winshlib.mk"
> SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)'
> SHLIB="tibble.dll" OBJECTS="RcppExports.o matrixToDataFrame.o"' ergab
> Status 2
> ERROR: compilation failed for package 'tibble'
> * removing 'C:/R-Project/R-3.3.0/library/tibble'
> * restoring previous 'C:/R-Project/R-3.3.0/library/tibble'
> Warning in install.packages :
>running command '"C:/R-PROJ~1/R-33~1.0/bin/x64/R" CMD INSTALL -l
> "C:\R-Project\R-3.3.0\library"
> 
C:\Users\MAUBAC~1.WEI\AppData\Local\Temp\RtmpGqOlOW/downloaded_packages/tibble_1.0.tar.gz'
> had status 1
> Warning in install.packages :
>installation of package ‘tibble’ had non-zero exit status
> -- cut --
>
> What else could I do?

You seem to have missed the second part of my advice, describing what to 
do with the two Makeconf files.

Duncan Murdoch


[R] Antwort: Re: Antwort: Re: Antwort: Re: Installing from source on Windows 7: tibble [RE OPENED]

2016-06-29 Thread G . Maubach
Hi Duncan,

I would not have changed the COMPILED_BY option unless I thought I had 
to.

In my "C:\R-Project\Rtools\mingw_32\bin" I have 

c++.exe
g++.exe
gcc.exe
i686-w64-mingw32-c++.exe
i686-w64-mingw32-g++.exe
i686-w64-mingw32-gcc-4.9.3.exe
i686-w64-mingw32-gcc.exe

In my "C:\R-Project\Rtools\mingw_64\bin" I have

c++.exe
cpp.exe
g++.exe
gcc.exe
x86_64-w64-mingw32-c++.exe
x86_64-w64-mingw32-g++.exe
x86_64-w64-mingw32-gcc-4.9.3.exe
x86_64-w64-mingw32-gcc.exe

Which one should I configure and use?

Kind regards

Georg




From:    Duncan Murdoch 
To:      g.maub...@weinwolf.de, 
Cc:      r-help@r-project.org
Date:    29.06.2016 17:34
Subject: Re: Antwort: Re: Antwort: Re: [R] Installing from source on Windows 7: tibble [SOLVED]



On 29/06/2016 10:48 AM, g.maub...@weinwolf.de wrote:
> Hi Duncan,
>
> indeed, I did not see the other part of your message.
>
> I did
>
> BINPREF ?= C:/R-Project/Rtools/mingw_32/bin/
> COMPILED_BY = g++ # instead of gcc-4.9.3

I wouldn't change the COMPILED_BY; some packages use it to configure 
themselves for gcc-4.9.3, as opposed to the previous version gcc-4.6.3.

>
> in "C:\R-Project\R-3.3.0\etc\i386\Makeconf"
>
> and
>
> BINPREF ?= C:/R-Project/Rtools/mingw_64/bin/
> COMPILED_BY = g++ # instead of gcc-4.9.3
>
> in "C:\R-Project\R-3.3.0\etc\x64\Makeconf"
>
> Now I could compile the package with no futher errors.
>
> Messages are
>
> -- cut --
> * installing *source* package 'tibble' ...
> ** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft
> ** libs
>
> *** arch - i386
> C:/R-Project/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include"
> -DNDEBUG-I"C:/R-Project/R-3.3.0/library/Rcpp/include"
> -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 
-c
> RcppExports.cpp -o RcppExports.o
> C:/R-Project/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include"
> -DNDEBUG-I"C:/R-Project/R-3.3.0/library/Rcpp/include"
> -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 
-c
> matrixToDataFrame.cpp -o matrixToDataFrame.o
> C:/R-Project/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o
> tibble.dll tmp.def RcppExports.o matrixToDataFrame.o
> -Ld:/Compiler/gcc-4.9.3/local330/lib/i386
> -Ld:/Compiler/gcc-4.9.3/local330/lib -LC:/R-PROJ~1/R-33~1.0/bin/i386 -lR
> installing to C:/R-Project/R-3.3.0/library/tibble/libs/i386
>
> *** arch - x64
> C:/R-Project/Rtools/mingw_64/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include"
> -DNDEBUG-I"C:/R-Project/R-3.3.0/library/Rcpp/include"
> -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 
-c
> RcppExports.cpp -o RcppExports.o
> C:/R-Project/Rtools/mingw_64/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include"
> -DNDEBUG-I"C:/R-Project/R-3.3.0/library/Rcpp/include"
> -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 
-c
> matrixToDataFrame.cpp -o matrixToDataFrame.o
> C:/R-Project/Rtools/mingw_64/bin/g++ -shared -s -static-libgcc -o
> tibble.dll tmp.def RcppExports.o matrixToDataFrame.o
> -Ld:/Compiler/gcc-4.9.3/local330/lib/x64
> -Ld:/Compiler/gcc-4.9.3/local330/lib -LC:/R-PROJ~1/R-33~1.0/bin/x64 -lR
> installing to C:/R-Project/R-3.3.0/library/tibble/libs/x64
> ** R
> ** inst
> ** preparing package for lazy loading
> ** help
> *** installing help indices
> ** building package indices
> ** installing vignettes
> ** testing if installed package can be loaded
> *** arch - i386
> *** arch - x64
> * DONE (tibble)
> -- cut --
>
> So - complete success.
>
> Many thanks for your help.
>
> One last questions: Why did Rtools.exe not create a directory named
> "gcc-4.9.3" in "C:\R-Project\Rtools" and putting"
> C:\R-Project\Rtools\mingw_32" and "C:\R-Project\Rtools\mingw_64" 
directly
> in "C:\R-Project\Rtools\"? gcc-4.6.3 was installed that way.

The 4.6.3 compiler was compiled for "multilib" operation:  the same 
compiler took command line options to distinguish between 32 bit and 64 
bit compiles.  The newer version doesn't support that, so we need two 
separate installs.

Duncan Murdoch


[R] Writing a formula to Excel

2016-06-30 Thread G . Maubach
Hi All,

I am using excel.link to work seamlessly with Excel.

In addition to values, like numbers and strings, I would like to insert a 
full operational formula into a cell.


xlc["G14"] <- print(paste("=G9*100/G6"), quote = FALSE)


The string is put into the cell, but the cell is not evaluated. Thus the 
string is shown instead of the result of the computation.

If I open that cell by pressing "F2" or by double-clicking it and then 
press RETURN, the expression is evaluated.


xlc["G14"] <- parse("=G9*100/G6") # does not run


How can I put a formula into Excel that is evaluated right away?

Kind regards

Georg

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Documenting data

2016-06-30 Thread G . Maubach
Hi Pito,
Dear Readers,

as others have already mentioned, there are good practices for documenting code 
and data. I would like to summarize them and add a few not mentioned earlier:

1. You should always have two things: your raw data and your R script/s. The 
raw data is immutable, whereas the R script/s produce the results.

2. You might want to distinguish between documenting your CODE and 
documenting your DATA. Documenting code is similar to what you already know 
from your programming experience. Documenting data is somewhat different because 
you store information about the meaning of your data directly in your data.

Example
You have a variable with codes ranging from 1 to 5. But what do they mean? 
Perhaps it could be

1 = Strongly agree
2 = Agree
3 = Neither agree/nor disagree
4 = Disagree
5 = Strongly Disagree

But it could also be the other way round:

1 = Strongly Disagree
2 = Disagree
3 = Neither agree/nor disagree
4 = Agree
5 = Strongly Agree

What the codes in your variable mean depends on the systems or processes you 
derived your data from.

Within R there are some limitations for storing the information about what a 
variable or a value within a variable means. Other software packages like SAS 
or SPSS offer much broader built-in support for storing this information. In R 
you can work with meaningful variable names and the data type/class factor, 
which can store mappings between values and value descriptions.

Example
-- cut --
var1 <- c(rep(1:5, 3))
ds_example <- data.frame(var1)

var1_labels <- c("1 = Strongly Agree",
                 "2 = Agree",
                 "3 = Neither agree/nor disagree",
                 "4 = Disagree",
                 "5 = Strongly disagree")

ds_example[["var1"]] <- factor(ds_example[["var1"]],
                               levels = c(1, 2, 3, 4, 5),
                               labels = var1_labels)

summary(ds_example["var1"])
-- cut --

In addition you find methods to work with variable labels and value labels in 
the packages Hmisc and memisc. They can also produce a so-called codebook, 
which contains all variable names, variable labels, values, value labels and 
summaries of the distribution of values within the variables.
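
To illustrate the codebook idea without any extra package, here is a minimal base R sketch. It is not how Hmisc or memisc do it: the attribute name "label", the helper make_codebook() and the example data are my own assumptions, chosen only for illustration.

```r
# Example data (made up for illustration)
ds <- data.frame(var1 = rep(1:5, 3),
                 age  = c(21, 35, 48, 52, 19, 33, 41, 27, 60, 38,
                          45, 29, 31, 57, 24))

# Store a free-text variable label as an attribute on each column.
# The attribute name "label" is an assumption, not a base R convention.
attr(ds$var1, "label") <- "Agreement with statement A"
attr(ds$age,  "label") <- "Age in years"

# Hypothetical helper: collect name, label, class and number of valid
# (non-missing) values of every column into one small data frame.
make_codebook <- function(data) {
  data.frame(
    variable = names(data),
    label    = vapply(data, function(col) {
      lab <- attr(col, "label")
      if (is.null(lab)) NA_character_ else lab
    }, character(1)),
    class    = vapply(data, function(col) class(col)[1], character(1)),
    n_valid  = vapply(data, function(col) sum(!is.na(col)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

codebook <- make_codebook(ds)
print(codebook)
```

One caveat with this approach: subsetting a vector, e. g. ds$var1[1:3], drops custom attributes, so the labels are best kept on the full data set and read from there.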

3. In addition to this you could structure your script in a modular way 
according to the analysis process, e. g. importing, cleaning, preparation for 
analysis, analysis, reporting. Another structure may be more suitable in your 
case. These modules could have a number in the file name indicating in which 
sequence the scripts should be run.
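
A small sketch of how numbered modules could be run in sequence. The directory, the file names (01_import.R etc.) and the variable steps_run are made up for this example; the three tiny scripts are written to a temporary directory only so that the example is self-contained.

```r
# Create three stand-in scripts in a temporary directory (illustration only)
script_dir <- file.path(tempdir(), "analysis_scripts")
dir.create(script_dir, showWarnings = FALSE)
writeLines('steps_run <- c(steps_run, "import")',
           file.path(script_dir, "01_import.R"))
writeLines('steps_run <- c(steps_run, "clean")',
           file.path(script_dir, "02_clean.R"))
writeLines('steps_run <- c(steps_run, "analyse")',
           file.path(script_dir, "03_analyse.R"))

steps_run <- character(0)

# Pick up all scripts whose names start with a number and run them in
# alphabetical order; zero-padded numbers keep "10_" after "09_".
scripts <- sort(list.files(script_dir,
                           pattern = "^[0-9]+_.*\\.R$",
                           full.names = TRUE))
for (script in scripts) {
  source(script, local = TRUE)
}

print(steps_run)
```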

4. I find it valuable to use a software repository like GitHub, SourceForge or 
others to keep the revisions safe and secure in case you want to go back 
to a version with code you deleted earlier and later find that you need 
again. The RStudio IDE has an interface to git if you like to go with that. 
Good commit messages can help you track what has changed. Commits also help you 
to work in precise steps when developing your scripts.

5. I have no experience with Sweave or knitr, but you could also compile a 
simple documentation by copying comments to an Excel sheet using R-to-Excel 
libraries like excel.link or others.

Example
install.packages("excel.link")
library(excel.link)
xlc["A1"] <- "Project Documentation"
xlc["A2"] <- "Step XY"
xlc["A3"] <- "Some explanation about step xy"

This way you have the documentation in your code and in an external source.

Which approach you choose depends on your experience with R and its libraries as 
well as the size of your project and the need for documentation.

6. It can be helpful to store interim results in a format that can be read by 
non-R-users, e. g. Excel.

7. Documenting code can be done using roxygen2.

If there are different opinions to my suggestions please say so.

Kind regards

Georg


> Sent: Thursday, 30 June 2016, 16:51
> From: "Pito Salas" 
> To: r-help@r-project.org
> Subject: [R] Documenting data
>
> I am studying statistics and using R in doing it. I come from software 
> development where we document everything we do.
> 
> As I “massage” my data, adding columns to a frame, computing on other data, 
> perhaps cleaning, I feel the need to document in detail what the meaning, or 
> background, or calculations, or whatever of the data is. After all it is now 
> derived from my raw data (which may have been well documented) but it is 
> “new.” 
> 
> Is this a real problem? Is there a “best practice” to address this?
> 
> Thanks!
> 
> Pito Salas
> Brandeis Computer Science
> Feldberg 131
> 

Re: [R] Documenting data

2016-06-30 Thread G . Maubach
Hi Bert,
Hi Readers,

I did not know much about attributes in R and how to use them. If it is that 
flexible you are right and I have learnt something.

Kind regards

Georg

> Sent: Thursday, 30 June 2016, 20:06
> From: "Bert Gunter" 
> To: g.maub...@gmx.de
> Cc: "Pito Salas" , "R Help" 
> Subject: Re: [R] Documenting data
>
> I believe Georg's pronouncements are wrong. See inline below.
> 
> -- Bert
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> "...
> 
> > Within R there are some limitations for storing the informtation about what 
> > a variable or a value within a variable means.
> 
> That is FALSE. There are no limitations. For example, just attach a
> "doc" attribute to your data that says whatever you wish to about
> them. e.g.
> 
> > somedata <- runif(10)
> > attr(somedata,"doc") <- "Anything you want to say about the data"
> 
> > attr(somedata,"doc")
> [1] "Anything you want to say about the data"
> 
> 
> You can go as crazy as you want to with this, e.g. creating a (S3 or
> S4 )class "documented" with appropriate methods for printing it from
> classes that inherit from data frames, lists, etc. See also the
> roxygen2 package for data documentation and R's ?promptData function
> for data documentation file in Rd format.
> 
> R is Turing complete -- so it can do anything any other programming
> language can do. You could program SAS in R if you wanted. The
> difference is that SAS has pre-programmed some capabilities that R
> leaves for users, including contributed packages -- like Sweave,
> knitr, etc.  You may or may not like this extra flexibility (and extra
> work, depending on whether someone else has already done the work for
> you), and efficiency may or may not be an issue; but to say that R has
> "limitations" is a gross misrepresentation, imho.
> 

[R] Dump of new Methods

2016-07-04 Thread G . Maubach
Dear Readers,
Hi All,

to drive my R knowledge a bit further I followed the advice of some of you 
by reading Chambers: Programming with Data.

I tried some examples from the book:

-- cut --

setClass("track", representation(x = "numeric",
                                 y = "numeric"))

track <- function(x, y) {
  # an object representing measurements 'y', tracked at positions 'x'
  x <- as(x, "numeric")
  y <- as(y, "numeric")
  if (length(x) != length(y)) {
    stop("x, y should have equal length!")
  }
  new("track", x = x, y = y)
}

dumpMethod("track", "track")

setMethod("show", "track",
          function(object) {
            xy <- rbind(object@x, object@y)
            # label the rows "x" and "y" and number the columns
            dimnames(xy) <- list(c("x", "y"),
                                 1:ncol(xy))
            show(xy)
          })

setMethod("plot",
          signature(x = "track", y = "missing"),
          function(x, y, ...)
            plot(unclass(x), xlab = "Position", ylab = "Value", ...)
          )

dumpMethod("plot", "track")

-- cut --

Where do I find the dumped data? Is it in a single file or is every dump 
stored in a separate file? Where is it stored on my drive?
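
As far as I understand the methods package, dumpMethod() writes one file per call, and unless a file is named explicitly the file is created in the current working directory (see getwd()). The following sketch avoids guessing the default file name by passing the "file" argument; the path in tempdir(), the file name "show_track.R" and the simplified show method are assumptions made only for this illustration.

```r
library(methods)

# Same class as in the example above
setClass("track", representation(x = "numeric", y = "numeric"))

# A simplified show method, just to have something to dump
setMethod("show", "track",
          function(object) {
            cat("track with", length(object@x), "points\n")
          })

# Without 'file', the dump goes to a file in the working directory
print(getwd())

# Here the target file is given explicitly (hypothetical name)
dump_file <- file.path(tempdir(), "show_track.R")
dumpMethod("show", "track", file = dump_file)

# Each call to dumpMethod() produces one file with a setMethod() call in it
dumped_code <- readLines(dump_file)
print(dumped_code)
```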

Kind regards

Georg



[R] Antwort: Re: Dump of new Methods (SOLVED)

2016-07-04 Thread G . Maubach
Hi Bert,

many thanks.

Found them.

Kind regards

Georg




From:    Bert Gunter 
To:      g.maub...@weinwolf.de, 
Date:    04.07.2016 16:43
Subject: Re: [R] Dump of new Methods



?getwd

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )




[R] Antwort: Re: dplyr : row total for all groups in dplyr summarise

2016-07-05 Thread G . Maubach
Hi guys,

I checked out your example but I can't follow the results:

> mtcars %>%
+   group_by (am, gear) %>%
+   summarise (n=n()) %>%
+   mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
+   ungroup() %>%
+   mutate(row.tot = sum(n))
Source: local data frame [4 x 5]

     am  gear     n rel.freq row.tot
  (dbl) (dbl) (int)    (chr)   (int)
1     0     3    15      79%      32
2     0     4     4      21%      32
3     1     4     8      62%      32
4     1     5     5      38%      32

We have a total of 32 cases and 15 * 100 / 32 = 46.9 % instead of 79 %. 
The same holds for the other rows. How is 79 % calculated?

When searching the web I saw this example:

-- cut --

#-- not run --
url <- "http://www.lock5stat.com/datasets/HollywoodMovies2011.csv"
response <- GET(url)
Hollywoodmovies2011 <- content(x = GET(url), as = data.frame)
#-- end not run

Hollywoodmovies2011 %>% 
  group_by(genre) %>%
  summarize(count = n()) %>%
  mutate(rf = count / sum(count))

-- cut --

which gives

Source: local data frame [9 x 3]

      Genre count           %
     (fctr) (int)       (dbl)
1    Action    32 0.235294118
2 Adventure     1 0.007352941
3 Animation    12 0.088235294
4    Comedy    27 0.198529412
5     Drama    21 0.154411765
6   Fantasy     2 0.014705882
7    Horror    17 0.125000000
8   Romance    11 0.080882353
9  Thriller    13 0.095588235

Here the % values correspond to the count divided by the sum of count, e. g. 
sum = 136 and 32 / 136 = 0.2352941.

What is the difference when counting? What do the relative counts in the 
first example mean?
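
For comparison, the two ways of counting can be reproduced in base R. After summarise() the result of the first example is still grouped by "am" (only the last grouping variable, "gear", is dropped), so sum(n) in the following mutate() is the total within each "am" group and not the overall total of 32. The sketch below uses prop.table() as a stand-in for the dplyr pipeline; the object names are mine.

```r
# Cross-tabulate the two grouping variables of the first example
counts <- table(am = mtcars$am, gear = mtcars$gear)

# Percentages within each level of 'am' (row-wise): this is what
# n / sum(n) yields while the data are still grouped by 'am'
within_am <- round(100 * prop.table(counts, margin = 1))

# Percentages of the overall total of 32 cars
overall <- round(100 * prop.table(counts), 1)

print(counts)
print(within_am)  # am = 0, gear = 3: 15 / (15 + 4) -> 79
print(overall)    # am = 0, gear = 3: 15 / 32       -> 46.9
```

In the second example there is only one grouping variable, so after summarize() no grouping is left and sum(count) is the overall total.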

Kind regards

Georg





From:    Ulrik Stervbo 
To:      David Winsemius , 
Cc:      r-help@r-project.org, mai...@infomed.sld.cu
Date:    05.07.2016 06:06
Subject: Re: [R] dplyr : row total for all groups in dplyr summarise
Sent by: "R-help" 



That will give you the wrong result when used on summarised data

David Winsemius wrote on Tue, 5 July 2016, 02:10:

> I thought there was an nrow() function?
>
> Sent from my iPhone
>
> On Jul 4, 2016, at 9:59 AM, Ulrik Stervbo  
wrote:
>
> If you want the total number of rows in the original data.frame after
> counting the rows in each group, you can ungroup and sum the row counts,
> like:
>
> library("dplyr")
>
>
> mtcars %>%
>group_by (am, gear) %>%
>summarise (n=n()) %>%
>mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
>ungroup() %>%
>mutate(row.tot = sum(n))
>
> HTH
> Ulrik
>
> On Mon, 4 Jul 2016 at 18:23 David Winsemius 
> wrote:
>
>>
>> > On Jul 4, 2016, at 6:56 AM, mai...@infomed.sld.cu wrote:
>> >
>> > Hello,
>> > How can I aggregate row total for all groups in dplyr summarise ?
>>
>> Row total … of what? Aggregate … how? What is the desired answer?
>>
>>
>>
>> > library(dplyr)
>> > mtcars %>%
>> >  group_by (am, gear) %>%
>> >  summarise (n=n()) %>%
>> >  mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%"))
>> >
>> > best regard
>> > Maicel Monzon
>> >
>> >
>> >
>> > 
>> >
>> >
>> >
>> >
>> > --
>> > This message reached you through the e-mail service that Infomed
>> > provides to support the missions of the National Health System. The
>> > person sending this e-mail undertakes to use the service for those
>> > purposes and to comply with the established regulations.
>> >
>> > Infomed: http://www.sld.cu/
>> >
>
>


[R] WG: Fw: Re: dplyr : row total for all groups in dplyr summarise

2016-07-06 Thread G . Maubach
Hi All,

if I run the suggested code

mtcars %>%
  group_by (am, gear) %>%
  summarise (n = n()) %>%
  mutate(rel.freq = paste0(round(100 * n / sum(n), 0), "%")) %>%
  ungroup() %>%
  plyr::rbind.fill(data.frame(n = nrow(mtcars), rel.freq =
  "100%”))

I get

> mtcars %>%
+   group_by (am, gear) %>%
+   summarise (n = n()) %>%
+   mutate(rel.freq = paste0(round(100 * n / sum(n), 0), "%")) %>%
+   ungroup() %>%
+   plyr::rbind.fill(data.frame(n = nrow(mtcars), rel.freq =
+   "100%”))




+ 


R stops execution because something within the program syntax is missing.

What has to be changed to be able to run the code?
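
One thing visible in the code as posted: the string "100% is closed with a typographic quote (”) instead of a plain ASCII double quote ("). For R the string therefore never ends, and the console keeps showing the "+" continuation prompt. The sketch below reproduces this with parse(); the shortened data.frame() call is my own stand-in for the full pipeline.

```r
# The call as posted, ending in a typographic closing quote (U+201D)
bad  <- "data.frame(n = 32, rel.freq = \"100%\u201d)"

# The same call with a plain ASCII double quote
good <- "data.frame(n = 32, rel.freq = \"100%\")"

# parse() reports the unterminated string as an error; at the interactive
# console the same input shows up as the "+" prompt waiting for more input
bad_result <- tryCatch(parse(text = bad), error = function(e) e)
print(conditionMessage(bad_result))

# With the ASCII quote the expression parses and evaluates
good_result <- eval(parse(text = good))
print(good_result)
```

Replacing the typographic quote by a plain " in the rbind.fill() line should let the posted code parse.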

Kind regards

Georg Maubach


> Sent: Tuesday, 5 July 2016, 18:30
> From: "David Winsemius" 
> To: mai...@infomed.sld.cu
> Cc: r-help@r-project.org
> Subject: Re: [R] dplyr : row total for all groups in dplyr summarise
>
> 
> 
> mtcars %>%
>group_by (am, gear) %>%
>summarise (n=n()) %>%
>mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
>ungroup() %>% plyr::rbind.fill(data.frame( 
n=nrow(mtcars),rel.freq="100%”))
> 
> 
> > On Jul 5, 2016, at 4:47 AM, mai...@infomed.sld.cu wrote:
> > 
> > Sorry, what I wanted to do was to add a total row at the end of the 
summary. The marginal totals by columns correspond to 100% and the sum of 
levels.
> > best reagard
> > Maicel Monzon
> > 
> > 
> > Ulrik Stervbo  escribió:
> > 
> >> Yes. But in the sample code the data is summarised. In which case you 
get 4
> >> rows and not the correct 32.
> >> 
> >> On Tue, 5 Jul 2016, 07:48 David Winsemius,  
wrote:
> >> 
> >>> nrow(mtcars)
> >>> 
> >>> 
> >>> Sent from my iPhone

[R] Formatting ggplot2 graph

2016-07-06 Thread G . Maubach
Hi All,

my current code looks like this:

freq_ls <- structure(
  list(
    Var1 = c("zldkkd", "aakdkdk", "aaakdkd", "aaieiwo", "vöalsl", "ssddkdk",
             "glowowp", "laoiw", "ruklow", "rolsl", "delk", "inslvnz"),
    Anzahl = c(1772L, 761L, 536L, 317L, 197L, 160L, 30L, 20L, 10L, 6L, 6L, 1L),
    Prozent = c(46.4, 19.9, 14, 8.3, 5.2, 4.2, 0.8, 0.5, 0.3, 0.2, 0.2, 0)
  ),
  .Names = c("Var1", "Anzahl", "Prozent"),
  class = c("tbl_df", "data.frame"),
  row.names = c(NA, -12L)
)
ggplot(freq_ls) +
  geom_bar(aes(x = Var1,
   y = Anzahl),
   stat = "identity",
   fill = "gray") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Title of the Plot")

I would like to add the absolute and relative frequencies on top of the 
bars. In addition I want the values printed in descending order according 
to the data.

I searched the web and found:

geom_text(stat='bin',aes(label=..count..),vjust=-1)

(Source: 
http://stackoverflow.com/questions/26553526/how-to-add-frequency-count-labels-to-the-bars-in-a-bar-graph-using-ggplot2
)

but this does not work in my case. Inserting the code

ggplot(freq_ls) +
  geom_bar(aes(x = Var1,
   y = Anzahl),
   stat = "identity",
   fill = "gray") +
  geom_text(stat='bin',aes(label=..count..),vjust=-1) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Title of the Plot")

results in

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: Removed 1 rows containing missing values (geom_text). 


I looked in Wickham's book on ggplot2 but could not find an answer to these 
questions:

- How to show numbers if they are pre-calculated?
- How to sort the bars according to the sequence of values in descending 
order or, if pre-ordered, in the given order?

What do I have to change in my code to do it?
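
A possible direction, sketched under assumptions: stat = 'bin' makes ggplot2 count the rows again, which does not fit data that are already aggregated. With pre-calculated values the label can be mapped directly, and the bar order follows the factor levels of the x variable. The small data frame "freq" and the column "label" below are made up for the example; the plot itself is only built if ggplot2 is installed.

```r
# Made-up, pre-aggregated data in the same shape as freq_ls
freq <- data.frame(Var1   = c("b", "a", "c"),
                   Anzahl = c(10L, 30L, 20L),
                   stringsAsFactors = FALSE)
freq$Prozent <- round(100 * freq$Anzahl / sum(freq$Anzahl), 1)

# Order the bars by descending count: the factor levels define the order.
# For data that are already sorted, factor(Var1, levels = Var1) keeps
# the given order instead.
freq$Var1 <- reorder(freq$Var1, -freq$Anzahl)

# Text shown on top of each bar: absolute and relative frequency
freq$label <- sprintf("%d (%.1f%%)", freq$Anzahl, freq$Prozent)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(freq, ggplot2::aes(x = Var1, y = Anzahl)) +
    ggplot2::geom_bar(stat = "identity", fill = "gray") +
    # use the pre-calculated label instead of a computed count
    ggplot2::geom_text(ggplot2::aes(label = label), vjust = -0.5) +
    ggplot2::ggtitle("Title of the Plot")
  print(p)
}
```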

Kind regards

Georg



[R] Choropleth: Turnover by ZipCode

2016-07-11 Thread G . Maubach
Hi All,
Dear Readers,

I need to create a choropleth graph with turnover by zipcode. This is what 
I have so far:

# Not run (Begin)
# Install packages if needed
# install.packages(pkgs = c("maptools", "rgdal", "RColorBrewer", "grDevices"))
# Not run (End)

# Load libraries
library(maptools); library(rgdal); library(RColorBrewer); 
library(grDevices)

# Configuration
# Adjust if needed!
file_path <- file.path("C:", "temp")

# Read data 
# Source: http://arnulf.us/PLZ
url <- "http://www.metaspatial.net/download/plz.tar.gz"
file_name_gzip <- basename(url)
file_name_extract <- "post_pl.shp"

download.file(url, file.path(file_path, file_name_gzip))

untar(tarfile = file.path(file_path, file_name_gzip),
  compressed = "gzip",
  exdir = file_path)

# Dataset
# I have the data for all zipcodes available in my region
ds_temp <-
  structure(
    list(
      ZipCode = c(1099, 10178, 13125, 21406, 32429, 41569),
      Sales = c(4, 2, 9, 5, 7, 3),
      Revenue = c(12, 9, 100, 80, 90, 25)
    ),
    .Names = c("ZipCode", "Sales", "Revenue"),
    row.names = c(NA, 6L),
    class = "data.frame"
  )
print(ds_temp)

# Prepare graphic
file_name_pdf <- file.path(file_path, "sales-and-revenue-by-zipcodes.pdf")
cairo_pdf(bg = "grey98", file_name_pdf, width = 16, height = 9)

y <- readShapeSpatial(file.path(file_path, file_name_extract),
  proj4string = CRS("+proj=longlat"))
x <- spTransform(y,CRS=CRS("+proj=merc"))

# How do I need to change this line?
# Needs to be replaced by turnover from ds_temp
color <- sample(1:7, length(x), replace = TRUE)

# Create graphic
plot(x,
     col = brewer.pal(7, "Oranges")[color],
     border = FALSE)  # How do I tell R to plot turnover from ds_temp?

# Title
mtext(
  "Turnover by Zipcodes",
  side = 3,
  line = -4,
  adj = 0,
  cex = 1.7
)

# Write to disc
dev.off()

# Cleanup
rm("ds_temp", "color", "file_name_extract",
   "file_name_gzip", "file_name_pdf", "file_path",
   "url", "x", "y")
unlink(file.path(file_path, "plz.tar.gz"))
unlink(file.path(file_path, "post_pl.dbf"))
unlink(file.path(file_path, "post_pl.shp"))
unlink(file.path(file_path, "post_pl.shx"))

# unlink(file.path(file_path, "sales-and-revenue-by-zipcodes.pdf"))

What do I need to do to color the amount of turnover or the frequencies of 
sales from the ds_temp dataset in the graph?
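
One way this could work, sketched with base R only: match the zip code of every polygon against ds_temp, cut the matched revenue into 7 classes and use the class number as index into the palette. The vector "polygon_zip" is a made-up stand-in for the zip code column of the shapefile's attribute table (I do not know its real name), and colorRampPalette() replaces brewer.pal() only to keep the sketch free of extra packages.

```r
# Stand-in for the zip codes of the polygons, in polygon order (assumption)
polygon_zip <- c("01099", "10178", "13125", "21406", "32429", "41569", "99999")

ds_temp <- data.frame(ZipCode = c(1099, 10178, 13125, 21406, 32429, 41569),
                      Revenue = c(12, 9, 100, 80, 90, 25))

# Zip codes stored as numbers lose their leading zero; restore it
ds_temp$zip_chr <- sprintf("%05d", as.integer(ds_temp$ZipCode))

# Revenue per polygon, NA where a polygon has no data
revenue <- ds_temp$Revenue[match(polygon_zip, ds_temp$zip_chr)]

# 7 equal-width classes, returned as integer codes 1..7
classes <- cut(revenue, breaks = 7, labels = FALSE)

# One colour per polygon; polygons without data get a neutral grey
palette7 <- colorRampPalette(c("white", "darkorange"))(7)
color <- palette7[classes]
color[is.na(color)] <- "grey90"

print(data.frame(polygon_zip, revenue, classes, color))
```

The resulting vector could then be passed as plot(x, col = color, border = FALSE) in place of the sampled colours, provided it is in the same order as the polygons in x.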

Kind regards

Georg Maubach



[R] R Toolbox (Release 2 of 2016-07-21)

2016-07-21 Thread G . Maubach
Hi All,

I have uploaded a new release of the R Toolbox.

R Toolbox is a collection of simple but useful functions which I developed 
for myself to shorten the development process. Currently all functions use 
base R. No other packages are needed. One exception is "t_openxlsx" because 
this module deals explicitly with the openxlsx package.

It is simple to install the functions. Just copy them to an appropriate 
place on your hard disk and adjust the variable "t_toolbox_location" to 
the place you stored the toolbox in. Running "r_toolbox.R" from that 
location will load all modules.

In addition to new functions (see Release Comparison below) some functions 
were improved. Functions from other packages are now called with their 
package names, e. g. openxlsx::read.xlsx() instead of "read.xlsx()". This way 
confusion with functions having the same name but coming from other packages 
is avoided.

Please be aware that I have included some untested functions in this 
release. All modules have a variable "t_status" now, stating the 
development status, e. g. "development", "testing", "release". 

Here is a Release Comparison:

-- cut --

release_comparison <-
  structure(
    list(
      Module = c("r_toolbox.R", "t_adjust_packages.R", "t_conventions.r",
                 "t_create_variable.R", "t_definitions.R",
                 "t_find_originals_and_duplicates.R", "t_get_factor_levels.R",
                 "t_merge_variables.R", "t_n_miss.R", "t_n_valid.R",
                 "t_openxlsx_shortcuts.r", "t_rename_variables.R",
                 "t_replace_na.R", "t_report_memory.R",
                 "t_select_vars_by_type.R"),
      Release1 = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE,
                   FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
      Release2 = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
                   TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)
    ),
    .Names = c("Module", "Release1", "Release2"),
    row.names = c(NA, 15L),
    class = "data.frame"
  )
edit(release_comparison)

-- cut --

Release 1 is of 2016-05-31, Release 2 of 2016-06-21.

You can download the toolbox from

https://sourceforge.net/projects/r-project-utilities/

Kind regards

Georg Maubach



[R] Error when installing packages

2016-07-26 Thread G . Maubach
Hi All,

I try to install packages on Debian GNU Linux 8 (Kernel 3.16.0-4-amd64).

My sessionInfo() is

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.3.1

Installing the following packages

Warning in install.packages :
  packages ‘excel.link’, ‘installr’ are not available (for R version 3.3.1)
Warning in install.packages :
  dependencies ‘latticist’, ‘graph’, ‘RBGL’, ‘pkgDepTools’, ‘Rgraphviz’ are not 
available
also installing the dependencies ‘RCurl’, ‘RWekajars’

results in the following messages:

(1)
* installing *source* package ‘RCurl’ ...
checking for curl-config... no
Cannot find curl-config

(2)
* installing *source* package ‘RWekajars’ ...
./configure: 1: ./configure: /usr/lib/jvm/default-java/jre/bin/java: not found
./configure: 50: test: -ge: unexpected operator
./configure: 51: test: -eq: unexpected operator
Need at least Java version 1.6/6.0.
ERROR: configuration failed for package ‘RWekajars’

Annotation: I have openjdk-8-jre installed.

(3)
* installing *source* package ‘cairoDevice’ ...
ERROR: gtk+2. not found by pkg-config.
ERROR: configuration failed for package ‘cairoDevice’

(4)
* installing *source* package ‘rgdal’ ...
configure: CC: gcc -std=gnu99
configure: CXX: g++
configure: rgdal: 1.1-10
checking for /usr/bin/svnversion... no
configure: svn revision: 622
checking for gdal-config... no
no
configure: error: gdal-config not found or not executable.
ERROR: configuration failed for package ‘rgdal’

(5)
* installing *source* package ‘rgeos’ ...
configure: CC: gcc -std=gnu99
configure: CXX: g++
configure: rgeos: 0.3-19
checking for /usr/bin/svnversion... no
configure: svn revision: 524
checking for geos-config... no
no
configure: error: geos-config not found or not executable.
ERROR: configuration failed for package ‘rgeos’

... and much more.

Do all these error messages have something in common?

How could I fix the installation?

Kind regards

Georg
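
The failures listed above share a pattern: each configure script looks for a helper program (curl-config, gdal-config, geos-config, pkg-config, java) and does not find it, which on Debian usually means that the matching development package is missing. A minimal sketch for checking this from within R follows. The Debian package names are assumptions about a stock Debian 8 system and were not part of the original post.

```r
# Helper programs that the configure scripts in the log output look for.
# The Debian package names on the right are assumptions, not taken from
# the original post.
required_tools <- c(
  "curl-config" = "libcurl4-openssl-dev",  # looked for by RCurl
  "gdal-config" = "libgdal-dev",           # looked for by rgdal
  "geos-config" = "libgeos-dev",           # looked for by rgeos
  "pkg-config"  = "pkg-config",            # used by cairoDevice to find gtk+ 2.0
  "java"        = "default-jdk"            # looked for by RWekajars
)

# Sys.which() returns "" for every program that is not on the PATH.
tool_paths <- Sys.which(names(required_tools))
missing_tools <- names(tool_paths)[tool_paths == ""]

if (length(missing_tools) > 0) {
  cat("Not found on PATH:", paste(missing_tools, collapse = ", "), "\n")
  cat("Candidate Debian packages:",
      paste(required_tools[missing_tools], collapse = " "), "\n")
} else {
  cat("All helper programs were found.\n")
}
```

Once the system packages are installed, running R CMD javareconf (as root) lets R pick up the Java location before the R packages are installed again.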


[R] Spread data.frame on 2 variables

2016-07-28 Thread G . Maubach
Hi All,

I need to spread a data.frame on 2 variables, e. g. "channel" and "unit".

If I do it in two steps, spread() keeps every case that does not look exactly 
like the one before, even though it contains the same values for a specific case.

Here is what I have right now:

-- cut --

test1$dummy <- 1
test2 <- spread(data = test1, key = 'channel', value = "dummy")
test2
cat("First spread is OK!")

test2$dummy <- 1
test3 <- spread(data = test2, key = 'unit', value = 'dummy')

test1
# test2
test3
warning(paste0("Second spread is not OK because spread does not merge cases\n",
   "with custID 700 and 800 into one case,\n",
   "because they have values on different variables,\n",
   "although the corresponding values of the cases with ",
   "custID 700 and 800 are missing."))

cat("What I would like to have is:\n")
target4 <- structure(list(custID = c(100, 200, 300, 500, 600, 700, 800, 
900),
  `10` = c(1, NA, NA, NA, NA, NA, NA, NA),
  `20` = c(1, NA, NA, NA, NA, NA, NA, NA), 
  `30` = c(NA, NA, NA, NA, NA, NA, 1, 1),
  `40` = c(NA, NA, NA, NA, 1, NA, 1, 1),
  `50` = c(NA, NA, 1, NA, NA, NA, 1, 1), 
  `60` = c(NA, NA, NA, NA, NA, 1, NA, NA),
  `70` = c(NA, NA, NA, NA, NA, 1, NA, NA), 
  `99` = c(NA, 1, NA, 1, NA, NA, NA, NA), 
  `1000` = c(1, NA, NA, NA, NA, NA, 1, 1), 
  `2000` = c(NA, NA, NA, NA, 1, 1, 1, NA),
  `3000` = c(NA, NA, 1, NA, NA, 1, NA, NA),
  `4000` = c(NA, NA, 1, NA, NA, NA, NA, NA),
  `6000` = c(NA, NA, NA, NA, 1, NA, NA, NA),
  `` = c(NA, 1, NA, 1, NA, NA, NA, NA)),
.Names = c("custID",
 "10",  "20",  "30",  "40",  "50",  "60",  "70",  "99", 
 "1000",  "2000",  "3000",  "4000",  "6000",  ""),
row.names = c(NA, 8L), class = "data.frame")

target4

cat("What would be a proper way to create target4 from test1?")

-- cut --

What would be the proper way to create target4 from test1?

Kind regards

Georg



[R] Antwort: Re: Re: Spread data.frame on 2 variables (SOLVED)

2016-08-02 Thread G . Maubach
Hi Ulrik,

many thanks for your help.

The problem was that R regards a dataset with a combination like

caseID  custID  channel  unit
1       1000    10       10
2       1000    20       10
3       1000    20       30

as two different sets of cases (set 1 = case 1, set 2 = cases 2 and 3) because 
of the different value of unit in case 3 (value 30), although all cases 
should be restructured based on custID alone.

To get a dataset like

caseID  custID  channel-10  channel-20  unit-10  unit-30
1       1000    1           1           1        1

instead of

caseID  custID  channel-10  channel-20  unit-10  unit-30
1       1000    1           1           1        NA
2       1000    NA          1           NA       1

I used the approach you suggested:

1. I created a subset of my data with the first variable to be 
restructured:

d_temp1 <- dataset[ , c("custID", "channel")]

2. I deleted all the cases that were duplicates

d_temp1 <- d_temp1[!duplicated(d_temp1[ , c("custID", "channel")]), ]

3. I introduced a dummy variable delivering the values for the new 
variables created by tidyr::spread()

d_temp1$dummy <- 1 

4. Then I restructured the subset

d_temp1 <- tidyr::spread(d_temp1, key = "channel", value = "dummy")

5. I repeated steps 1 to 4 with the other variable "unit" (instead of 
"channel") creating a new dataset named d_temp2.

6. I deleted the variables used for restructuring in steps 1 to 5 
"channel" and "unit" from the original dataset "dataset".

dataset$channel <- NULL
dataset$unit <- NULL

7. I checked if I still had duplicates

duplicates <- duplicated(dataset[["Debitor"]])

sum(duplicates)  # was 0 this time

8. I merged the datasets back together

dataset_2 <- merge(x = dataset, y = d_temp1, by.x = "Debitor",
                   by.y = "Debitor", all.x = TRUE, all.y = TRUE)  # leaving out all.y would be fine
dataset_2 <- merge(x = dataset_2, y = d_temp2, by.x = "Debitor",
                   by.y = "Debitor", all.x = TRUE, all.y = TRUE)  # leaving out all.y would be fine

There might be a combination of commands and functions doing the same 
thing in one step, but I find that this is clear, comprehensible and 
reproducible even at a later date or by other readers willing to use base 
R for their work.

Many thanks again for your help.

Kind regards

Georg
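
The steps described in this message can be condensed into a small helper. The following is a minimal sketch in base R, not the code used in the thread: the function name spread_flags, the example data and the column prefixes are assumptions made for illustration. It builds one indicator block per key variable with exactly one row per custID and merges the blocks afterwards, so that all rows of the same customer collapse into a single case.

```r
# Example data modelled on the three cases described above (illustrative only).
dataset <- data.frame(custID  = c(1000, 1000, 1000),
                      channel = c(10, 20, 20),
                      unit    = c(10, 10, 30))

# Hypothetical helper: one row per id, one indicator column per value of `key`.
# A cell is 1 if the id/value combination occurs at least once, NA otherwise.
# Note that the id column comes back as character.
spread_flags <- function(data, id, key) {
  counts <- unclass(table(data[[id]], data[[key]]))
  flags <- counts
  flags[counts > 0] <- 1
  flags[counts == 0] <- NA
  colnames(flags) <- paste(key, colnames(flags), sep = "_")
  out <- data.frame(rownames(flags), flags,
                    check.names = FALSE, stringsAsFactors = FALSE)
  names(out)[1] <- id
  rownames(out) <- NULL
  out
}

channel_wide <- spread_flags(dataset, id = "custID", key = "channel")
unit_wide    <- spread_flags(dataset, id = "custID", key = "unit")

# A full outer join on the id keeps every customer exactly once.
dataset_wide <- merge(channel_wide, unit_wide, by = "custID", all = TRUE)
print(dataset_wide)
```

Prefixing the new columns with the name of the key variable avoids name clashes when both variables share a value, as channel 10 and unit 10 do here.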





From:    Ulrik Stervbo 
To:      g.maub...@weinwolf.de, R-help 
Date:    28.07.2016 14:20
Subject: Re: Re: [R] Spread data.frame on 2 variables



Hi Georg,

it is difficult to figure out what happens between your expectation and 
the outcome if we cannot see a minimal dataset.

Based on your description I did this

library(tidyr)
library(dplyr)

test_df <- data_frame(channel = LETTERS[1:5], unit = letters[1:5], custID 
= c(1:5), dummy = 1)
test_df %>% spread(channel, dummy) %>% mutate(dummy = 1) %>% spread(unit, 
dummy) 

which seems to be working fine as I get wide data. If a combination is 
missing in the long form it will also be missing in the wide form. Maybe 
you are looking for something like this:

channel_wide <- test_df  %>% select(channel, custID) %>% spread(channel, 
custID) 
unit_wide <- test_df  %>% select(unit, custID) %>% spread(unit, custID) 
bind_cols(channel_wide, unit_wide)

Apologies for the HTML - it's gmail

Best wishes,
Ulrik

On Thu, 28 Jul 2016 at 13:54  wrote:
Hi Ulrik,

I have included a reproducible example. I ran the code and it did exactly
what I wanted to show you.

You are right: the solution should merge cases in the end because the values
on the variables are either missing or the same.

Example 1: Values are the same
If you look at 6 and 7 and variable 70 the value is 1 in both cases. This
is in this context the same information and cases 6 and 7 with custID can
be merged to 1 for variable 70.

Example 2: Values are missing and not missing
If you look at cases 8 and 9 the value for case 8 at variable 40, 50 and
2000 is missing whereas the variables 40, 50 and 2000 have all 1 for case
9. Cases 8 and 9 could be merged because the missing values are
overwritten, which is correct in this case.

The solution I am looking for is to transform the data from long into wide
form and keep all but missing value information.

Did I explain my problem in a comprehensible way? Are there any further
questions?

Kind regards

Georg





From:    Ulrik Stervbo 
To:      g.maub...@weinwolf.de, r-help@r-project.org
Date:    28.07.2016 12:59
Subject: Re: [R] Spread data.frame on 2 variables



Hi Georg,

it's hard to tell without a reproducible example.

Should spread really merge elements? Does spread know anything about
CustID? Maybe you need to make a useful key of the CustIDs first and
spread on that?

Maybe I'm all off, because I'm really just guessing.

Best,
Ulrik

On Thu, 28 Jul 2016 at 12:36  wrote:
Hi All,

I need to spread a data.frame o

[R] Accessing an object using a string

2016-08-15 Thread G . Maubach
Hi All,

I would like to access an object using a string.

# Create example dataset
var1 <- c(1, 2, 3)
var2 <- c(4, 5, 6)
data1 <- data.frame(var1, var2)

var3 <- c(7, 8, 9)
var4 <- c(10, 11, 12)
data2 <- data.frame(var3, var4)

save(file = "c:/temp/test.RData", list = c("data1", "data2"))

# Define function
t_load_dataset <- function(file_path,
   file_name) {
  file_location <- file.path(file_path, file_name)
 
  print(paste0('Loading ', file_location, " ..."))
  cat("\n")
 
  object_list <- load(file = file_location,
  envir = .GlobalEnv)
 
  print(paste(length(object_list), "dataset(s) loaded from", 
file_location))
  cat("\n")
 
  print("The following objects were loaded:")
  print(object_list)
  cat("\n")
 
  for (i in object_list) {
    print(paste0("Object '", i, "' in '", file_name, "' contains:"))
    str(i)
    names(i)  # does not work
  }
}

I have only the character vector object_list containing the names of the 
objects as strings. I would like to access the objects in object_list to 
be able to print the names of the variables within the object (usually a 
data frame).

Is it possible to do this? How is it done?

Kind regards

Georg



[R] Antwort: Accessing an object using a string (SOLVED)

2016-08-15 Thread G . Maubach
Hi All,

I found the function get() which returns an object.

My whole function looks like this:

-- cut --

#---
# Module: t_load_dataset.R
# Author: Georg Maubach
# Date  : 2016-08-15
# Update: 2016-08-15
# Description   : Load dataset and print information on contents
# Source System : R 3.3.0 (64 Bit)
# Target System : R 3.3.0 (64 Bit)
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#1-2-3-4-5-6-7-8

t_module_name = "t_load_dataset"
t_version = "2016-08-15"
t_status = "released"

cat(
  paste0("\n",
 t_module_name, " (Version: ", t_version, ", Status: ", t_status, 
")", "\n", "\n",
 "Copyright (C) Georg Maubach 2016

This software comes with ABSOLUTELY NO WARRANTY.", "\n", "\n"))

# If t_do_test is not defined globally, define it here locally by
# un-commenting it.
# Switch t_do_test to TRUE to run the test.
t_do_test <- FALSE

# [ Function Definition ]
t_load_dataset <- function(file_path,
                           file_name) {
  # Loads an RData file with all objects in it and prints information on
  # its contents
  #
  # Args:
  #  file_path (string):
  #String with path name.
  #  file_name (string):
  #String with file name.
  #
  # Operation:
  #   Loads the RData file with all its objects, stores the objects in the
  #   global environment .GlobalEnv and prints information about the
  #   objects.
  #
  # Usage:
  #   The function is designed to work only on data frames.
  #
  # Returns:
  #   Nothing, but stores loaded objects directly into the global
  #   environment.
  #
  # Error handling:
  #   None.

  #-

  cat("--- [ t_load_dataset() ] --\n\n")
 
  file_location <- file.path(file_path, file_name)
 
  cat(paste0('Loading ', file_location, " ...\n\n"))
 
  dataset_list <- load(file = file_location,
   envir = .GlobalEnv)
 
  cat(paste0(
    length(dataset_list),
    " dataset(s) loaded:\n"))
  cat(dataset_list)
  cat("\n\n")

  for (dataset in dataset_list) {
    cat(paste0("Dataset '", dataset, "' contains ",
               nrow(get(dataset, envir = .GlobalEnv)),
               " cases in ",
               ncol(get(dataset, envir = .GlobalEnv)),
               " variables:\n"))
    cat(names(get(dataset, envir = .GlobalEnv)))
    cat("\n\n")
  }

  cat("-- [ Done ] ---\n\n")
}

# [ Test Definition ]
t_test <- function(do_test = FALSE) {
  if (do_test == TRUE) {
 
    # Example dataset
    var1 <- c(1, 2, 3)
    var2 <- c(4, 5, 6)
    d_data1 <- data.frame(var1, var2)

    var3 <- c(7, 8, 9)
    var4 <- c(10, 11, 12)
    d_data2 <- data.frame(var3, var4)

    # Save datasets
    v_file_name <- "test_t_load_dataset.RData"

    save(file = file.path(getwd(),
                          v_file_name),
         list = c("d_data1", "d_data2"))

    # Call function
    t_load_dataset(file_path = getwd(), file_name = v_file_name)

    # Cleanup
    unlink(file.path(getwd(), v_file_name))
  }
}

# [ Test Run ]--
t_test(do_test = t_do_test)

# [ Clean up ]--
rm("t_module_name", "t_version", "t_status", "t_do_test", "t_test")

# EOF

-- cut --

I will include it later in the toolbox of R functions on Sourceforge.net.

Kind regards

Georg




From:    g.maub...@weinwolf.de
To:      r-help@r-project.org
Date:    15.08.2016 10:51
Subject: [R] Accessing an object using a string
Sent by: "R-help" 




[R] Antwort: Re: Accessing an object using a string

2016-08-15 Thread G . Maubach
Hi Greg 
and all others who replied to my question,

many thanks for all your answers and help. Currently I store all my 
objects in .GlobalEnv (the workspace). I am not yet familiar with working 
with different environments, nor did I see that this would be necessary 
for my analysis.

Could you explain why working with different environments would be 
helpful?

You suggested reading variables into lists rather than storing them in 
global variables. This sounds interesting. Could you provide an example of 
how to define and use this?

Kind regards

Georg
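
One way to read the suggestion about lists is sketched below, under stated assumptions: the function name load_as_list and the temporary file are made up for the example and do not appear in the thread. The file is loaded into a fresh environment instead of .GlobalEnv, and the loaded objects are collected in a named list, so they can be processed with lapply() or sapply() without get().

```r
# Example data, saved to a temporary file so the sketch is self-contained.
data1 <- data.frame(var1 = c(1, 2, 3), var2 = c(4, 5, 6))
data2 <- data.frame(var3 = c(7, 8, 9), var4 = c(10, 11, 12))
tmp_file <- tempfile(fileext = ".RData")
save(data1, data2, file = tmp_file)

# Hypothetical helper: load an RData file into its own environment and
# return the loaded objects as a named list.
load_as_list <- function(file) {
  env <- new.env()
  object_names <- load(file, envir = env)  # load() returns the object names
  mget(object_names, envir = env)          # named list of the objects
}

datasets <- load_as_list(tmp_file)

# Every element is a real object, so names(), nrow() etc. work directly.
variable_names <- lapply(datasets, names)
case_counts <- sapply(datasets, nrow)
print(variable_names)
print(case_counts)

unlink(tmp_file)
```

Keeping the loaded objects in a list means the global workspace is not overwritten by whatever the file happens to contain, which is the main practical benefit of a separate environment here.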



From:    Greg Snow <538...@gmail.com>
To:      g.maub...@weinwolf.de
Cc:      r-help 
Date:    15.08.2016 20:33
Subject: Re: [R] Accessing an object using a string



The names function is a primitive, which means that if it does not
already do what you want, it is generally not going to be easy to
coerce it to do it.

However, the names of an object are generally stored as an attribute
of that object, which can be accessed using the attr or attributes
functions.  If you change your code to not use the names function and
instead use attr or attributes to access the names then it should work
for you.


You may also want to consider changing your workflow to have your data
objects read into a list rather than global variables, then process
using lapply/sapply (this would require a change in how your data is
saved from your example, but if you can change that then everything
after can be cleaner/simpler/easier/more fool proof/etc.)


On Mon, Aug 15, 2016 at 2:49 AM,   wrote:



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



[R] Installation of rJava fails

2016-08-17 Thread G . Maubach
Hi All,

I am trying to install RWeka, which depends on "rJava", on Debian GNU/Linux 8 
Jessie (uname -a: 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2+deb8u3 
(2016-07-02) x86_64).
I did

apt-get install openjdk-8-jre

which went OK.

Java is installed in:

/var/lib/dpkg/alternatives/java
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
/usr/lib/jvm/java-8-openjdk-amd64/bin/java
/etc/alternatives/java

When doing this

install.packages("rJava")

I get

* installing *source* package ‘rJava’ ...
** package ‘rJava’ successfully unpacked and MD5 sums checked

interpreter : '/usr/lib/jvm/default-java/jre/bin/java'
archiver: '/usr/lib/jvm/default-java/bin/jar'
compiler: '/usr/lib/jvm/default-java/bin/javac'
header prep.: '/usr/lib/jvm/default-java/bin/javah'
cpp flags   : '-I/usr/lib/jvm/default-java/include'
java libs   : '-L/usr/lib/jvm/default-java/jre/lib/amd64/server -ljvm'
checking whether Java run-time works... 
./configure: line 3736: /usr/lib/jvm/default-java/jre/bin/java: No such file or 
directory
no
configure: error: Java interpreter '/usr/lib/jvm/default-java/jre/bin/java' 
does not work
ERROR: configuration failed for package ‘rJava’
* removing ‘/usr/local/lib/R/site-library/rJava’
Warning in install.packages :
  installation of package ‘rJava’ had non-zero exit status

Do I need to use another Java version or installation? How do I tell 
install.packages() where my Java installation resides?

Kind regards

Georg


[R] Iteration over variables

2016-09-06 Thread G . Maubach
Hi All,

I would like to write a program that iterates over a set of dynamically 
generated variables and produces some stats or prints parts of the data.

# --- data
v_turnover_2011 <- c(10, 20, 30, 40 , 50)
v_customer_2011 <- c(0, 1, NA, 0, 1)
v_turnover_2012 <- c(10, 20, 30, 40 , 50)
v_customer_2012 <- c(0, 1, NA, 0, 1)
d_dataset <- data.frame(v_turnover_2011, v_turnover_2012,
v_customer_2011, v_customer_2012)

# -- Aim is to iterate over dynamically generated variables and compute
# -- statistics or print parts of the data

# -- Does not produce any output
for (year in 2011:2012) {
  head(d_dataset[, c(paste0("v_turnover_", year),
 paste0("v_customer_", year))])
}

# -- Does not produce any output
aux_func <- function(year) {
  head(d_dataset[, c(paste0("v_turnover_", year),
 paste0("v_customer_", year))])
}

for (year in 2011:2012) {
  aux_func(year = year)
}


d_results <- data.frame()
for (year in 2011:2012) {
  d_results <- rbind(d_results,
 paste0("mean", year) = mean(d_dataset[, 
c(paste0("v_turnover_", year))]))
}

Is there a way to iterate over variables and compute statistics and print 
parts of the dataset?

Kind regards

Georg
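
A likely reason for the missing output in the first two attempts: inside a for loop the value of an expression such as head(...) or aux_func(...) is not printed automatically, so print() has to be called explicitly. The sketch below reuses the data from the question. Collecting the means with sapply() is only one of several possible approaches, and the column names year and mean_turnover are assumptions chosen for the example.

```r
d_dataset <- data.frame(v_turnover_2011 = c(10, 20, 30, 40, 50),
                        v_turnover_2012 = c(10, 20, 30, 40, 50),
                        v_customer_2011 = c(0, 1, NA, 0, 1),
                        v_customer_2012 = c(0, 1, NA, 0, 1))
years <- 2011:2012

# Auto-printing only happens at top level, so print explicitly in the loop.
for (year in years) {
  columns <- paste0(c("v_turnover_", "v_customer_"), year)
  print(head(d_dataset[, columns]))
}

# Compute one statistic per year; [[ ]] accepts the assembled column name.
mean_turnover <- sapply(years, function(year) {
  mean(d_dataset[[paste0("v_turnover_", year)]], na.rm = TRUE)
})

# One row per year instead of growing a data frame with rbind().
d_results <- data.frame(year = years, mean_turnover = mean_turnover)
print(d_results)
```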



[R] Putting a bunch of Excel files as data.frames into a list fails

2016-09-28 Thread G . Maubach
Hi All,

I need to read a bunch of Excel files and store them in R.

I decided to store the different Excel files in data.frames in a named 
list where the names are the file names of each file (and that is 
different from the sources as far as I can see):

-- cut --
# Sources:
# - 
http://stackoverflow.com/questions/11433432/importing-multiple-csv-files-into-r
# - 
http://stackoverflow.com/questions/9564489/opening-all-files-in-a-folder-and-applying-a-function
# - 
http://stackoverflow.com/questions/12945687/how-to-read-all-worksheets-in-an-excel-workbook-into-an-r-list-with-data-frame-e

v_file_path <- "H:/2016/Analysen/Neukunden/Input"
v_file_pattern <- "*.xlsx"

v_files <- list.files(path = v_file_path,
  pattern = v_file_pattern,
  ignore.case = TRUE)
print(v_files)

v_list_of_files <- list()

for (v_file in v_files) {
  v_list_of_files[v_file] <- openxlsx::read.xlsx(
file.path(v_file_path,
  v_file))
}

This code does not work because it stores only the first variable of each 
Excel file in the named list.

What do I need to change to get it running?

Kind regards

Georg
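
The behaviour described above is what single-bracket assignment does with a list: a data frame is itself a list of columns, and list[name] <- value fills one slot with the first of them. Double brackets store the whole data frame. The sketch below illustrates the difference. So that it runs without Excel files, it uses a stand-in reader (read_one_file) and made-up file names, both of which are assumptions; with real files the call to openxlsx::read.xlsx() from the question would take the stand-in's place.

```r
# Stand-in for openxlsx::read.xlsx(): returns a small data frame for any
# path. This function is an assumption made for the example.
read_one_file <- function(path) {
  data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
}

v_files <- c("customers_2015.xlsx", "customers_2016.xlsx")  # hypothetical names

# Single brackets: only the first column ends up in the list
# (R also warns that the replacement lengths do not match).
first_column_only <- list()
for (v_file in v_files) {
  suppressWarnings(first_column_only[v_file] <- read_one_file(v_file))
}

# Double brackets: each list element holds the complete data frame.
v_list_of_files <- list()
for (v_file in v_files) {
  v_list_of_files[[v_file]] <- read_one_file(v_file)
}

str(first_column_only, max.level = 1)
str(v_list_of_files, max.level = 1)
```

A separate point: the pattern argument of list.files() is a regular expression rather than a shell wildcard, so "\\.xlsx$" is a more precise pattern than "*.xlsx".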



[R] Filtering Cases with != NA

2015-12-17 Thread G . Maubach
Dear All,

I am new to R and am looking for a way to exclude cases in which a certain 
variable is NA.

Example

No Name Turnover
1 Smith 1500
2 Mayor 200
3 Miller 
4 Batic 750

I would like to create a subset excluding case 3 Miller NA.

I tried the following:

new_dataset <- subset(dataset, subset = Turnover != NA)

This does not work. The new_dataset contains all variables but no cases 
are left. R responds "Variables with all observations missing".

How could I do it right?

Kind regards

Georg
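
A short sketch of why the comparison fails and what can be used instead: any comparison with NA, including Turnover != NA, evaluates to NA for every row, and subset() drops rows whose condition is NA. The test for missing values is is.na(). The data frame below is rebuilt from the example in the question.

```r
dataset <- data.frame(No = 1:4,
                      Name = c("Smith", "Mayor", "Miller", "Batic"),
                      Turnover = c(1500, 200, NA, 750),
                      stringsAsFactors = FALSE)

# Comparing with NA never yields TRUE or FALSE, only NA.
comparison <- dataset$Turnover != NA
print(comparison)

# is.na() tests for missing values; negate it to keep the complete cases.
new_dataset <- subset(dataset, subset = !is.na(Turnover))
print(new_dataset)
```

The same selection can be written with plain indexing as dataset[!is.na(dataset$Turnover), ].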




[R] Merging Data Sets with Full Outer Join

2016-04-20 Thread G . Maubach
Hi All,

I would like to match some datasets. Both deliver variables AND cases 
which might or might not be present in all datasets:

This sequence

Kunden <- Kunden_2011 
Kunden <- merge(Kunden, Kunden_2012,
by.x = "Debitor", by.y = "Debitor")

Kunden <- merge(Kunden, Kunden_2013,
by.x = "Debitor", by.y = "Debitor")

Kunden <- merge(Kunden, Kunden_2014,
by.x = "Debitor", by.y = "Debitor")

Kunden <- merge(Kunden, Kunden_2015,
by.x = "Debitor", by.y = "Debitor")

delivers too few cases. So I guess it does an inner join (equi-join) and drops 
every case that is not present in all datasets.

How can I join the datasets and keep the variables as well as the cases?

I am looking forward to your reply.

Kind regards

Georg
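
A possible answer, as a minimal sketch: merge() performs an inner join by default, and all = TRUE turns it into a full outer join that keeps cases missing from either side. The small data frames below are invented for illustration and only borrow the names from the question. Reduce() applies the same merge to a list of datasets in turn.

```r
# Invented example data: not every Debitor is present in every year.
Kunden_2011 <- data.frame(Debitor = c(1, 2), Umsatz_2011 = c(100, 200))
Kunden_2012 <- data.frame(Debitor = c(2, 3), Umsatz_2012 = c(250, 300))
Kunden_2013 <- data.frame(Debitor = c(3, 4), Umsatz_2013 = c(350, 400))

# all = TRUE keeps unmatched cases from both sides (full outer join);
# variables that a case lacks are filled with NA.
full_outer_merge <- function(x, y) {
  merge(x, y, by = "Debitor", all = TRUE)
}

Kunden <- Reduce(full_outer_merge,
                 list(Kunden_2011, Kunden_2012, Kunden_2013))
print(Kunden)
```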



[R] Creating variables on the fly

2016-04-22 Thread G . Maubach
Hi all,

I would like to use a loop for tasks that occur repeatedly:

# Groups 
# Umsatz <= 0: 1 (NICHT kaufend) 
# Umsatz > 0: 2  (kaufend) 
for (year in c("2011", "2012", "2013", "2014", "2015")) { 
  paste0("Kunden$Kunde_real_", year) <- (paste0("Kunden$Umsatz_", year) <= 0) * 1 +
                                        (paste0("Kunden$Umsatz_", year) >  0) * 2
  paste0("Kunden$Kunde_real_", year) <- factor(paste0("Kunden$Umsatz_", year),
                                               levels = c(1, 2),
                                               labels = c("NICHT kaufend", "kaufend"))
}

This does not work because the expression 
"paste0("Kunden$Kunde_real_", year)" is not interpreted as a variable name by 
the R interpreter.

Is there a way to assemble variable names on the fly in R?

Regards

Georg



[R] Antwort: Fw: Re: Creating variables on the fly (SOLVED)

2016-04-25 Thread G . Maubach
Hi Don,
Hi to all readers,

many thanks for all your answers and all your help.

I adapted Don's code to my data and Don's code does the trick:

str(Kunden01)

for (year in 2011:2015) {
  Reeller_Kunde <- paste0("Reeller_Kunde_", year)
  Umsatz <- paste0("Umsatz_", year)
  cat('Creating', Reeller_Kunde,'from', Umsatz,'\n')
  Kunden01[[ Reeller_Kunde ]] <- ifelse( Kunden01[[ Umsatz ]] <= 0, 1, 2)
  Kunden01[[ Reeller_Kunde ]] <- factor( Kunden01[[ Reeller_Kunde ]],
                                         levels = c(1, 2),
                                         labels = c("NICHT kaufend", "kaufend")
  )
}

str(Kunden01)

This way a new variable is created by building it from a string 
concatenation.

I also like the cat() function to document the process within the loop 
while running the program.

Many thanks for your help.

Kind regards

Georg




From:    g.maub...@gmx.de
To:      g.maub...@weinwolf.de
Date:    25.04.2016 21:37
Subject: Fw: Re: [R] Creating variables on the fly





> Sent: Monday, 25 April 2016, 19:35
> From: "MacQueen, Don" 
> To: "g.maub...@gmx.de", "r-help@r-project.org" 
> Subject: Re: [R] Creating variables on the fly
>
> I'm going to assume that Kunden is a data frame, and it has columns
> (variables) with names like
>   Umsatz_2011
> and that you want to create new columns with names like
>   Kunde_real_2011
> 
> If that is so, then try this (not tested):
> 
> for (year in 2011:2015) {
>   nmK <- paste0("Kunde_real_", year)
>   nmU <- paste0("Umsatz_", year)
>   cat('Creating',nmK,'from',nmU,'\n')
>   Kunden[[ nmK ]] <- ifelse( Kunden[[ nmU ]] <= 0, 1, 2)
>   Kunden[[ nmK ]] <- factor( Kunden[[ nmK ]],
>levels=c(1,2),
>labels= c("NICHT kaufend", "kaufend")
>)
> 
> }
> 
> This little example should illustrate the method:
> 
> 
> > foo <- data.frame(a=1:4)
> > foo
>   a
> 1 1
> 2 2
> 3 3
> 4 4
> > foo[['b']] <- foo[['a']]*3
> > foo
>   a  b
> 1 1  3
> 2 2  6
> 3 3  9
> 4 4 12
> 
> 
> 
> -- 
> Don MacQueen
> 
> Lawrence Livermore National Laboratory
> 7000 East Ave., L-627
> Livermore, CA 94550
> 925-423-1062
> 
> 
> 
> 
> 
> On 4/22/16, 8:52 AM, "R-help on behalf of g.maub...@gmx.de"
>  wrote:
> 



[R] Missing Values in Logical Expressions

2016-04-26 Thread G . Maubach
Hi All,

I need to evaluate missing values in my data. I am able to filter these 
values and do simple statistics on it. But I do need new variables based 
on variables with missing values in my dataset:

Check_Kunde_2011 <- ifelse(is.na(Umsatz_2011) == TRUE & Kunde_2011 == 1, 
1, 0)
Check_Kunde_2011 <- factor(Check_Kunde_2011, levels = c(1,0), labels = 
c("Check", "OK"))

The new variable is not correctly created. It contains no values:

table(Check_Kunde_2011)
< table of extent 0 >

I searched the web but could not find a solution.

How can I work with variables and missing values in logical expressions?

Where could I find something about this?

Kind regards

Georg
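
As background to the question above, here is a small sketch of how missing values behave in logical expressions. The vectors are invented for the example and the name flag is an assumption. In R's three-valued logic, FALSE & NA is FALSE but TRUE & NA is NA, and ifelse() passes such an NA through to its result unless it is handled explicitly.

```r
# Invented example values; the last case is missing on both variables.
Umsatz_2011 <- c(100, NA, NA, 50, NA)
Kunde_2011  <- c(1, 1, 0, NA, NA)

# is.na() never returns NA, but the comparison Kunde_2011 == 1 does.
check <- is.na(Umsatz_2011) & Kunde_2011 == 1
print(check)

# %in% TRUE maps NA to FALSE, so the result has no missing values.
flag <- ifelse(check %in% TRUE, 1, 0)
Check_Kunde_2011 <- factor(flag, levels = c(1, 0), labels = c("Check", "OK"))
print(table(Check_Kunde_2011))
```

Whether an NA in the condition should count as "OK", as it does in this sketch, or be kept as missing depends on the analysis.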



[R] Antwort: RE: Missing Values in Logical Expressions

2016-04-27 Thread G . Maubach
Hi Petr,
Hi Jim,

many thanks for your help. Today I constructed a sample dataset and tested 
your suggestions. Everything worked OK.

Then I took the code and tested it on the original data. And it worked OK 
this morning as well.

I went back to my script from Tuesday and ran it again. OK. Then I used my 
script from Monday and ran it. OK.

I have no idea what was wrong yesterday. Seeing a problem and not being 
able to replicate it a day later, even though nothing worked all day 
before, is very strange.

If the problem arises again, I will raise my hand.

Many thanks again for your help.

Kind regards

Georg




From:    PIKAL Petr 
To:      "g.maub...@weinwolf.de", "r-help@r-project.org"
Date:    26.04.2016 11:11
Subject: RE: [R] Missing Values in Logical Expressions



Hm

Based on Jim's data your construction gives me correct result.

> Umsatz_2011<-c(1,2,3,4,5,NA,7,8,NA,10)
> Kunde_2011<-rep(0:1,5)
> Check_Kunde_2011 <- ifelse(is.na(Umsatz_2011) == TRUE & Kunde_2011 == 1, 1, 0)
> Check_Kunde_2011 <- factor(Check_Kunde_2011, levels = c(1,0), labels = c("Check", "OK"))
> table(Check_Kunde_2011)
Check_Kunde_2011
Check    OK 
    1     9 

So I presume that the problem lies in your data.
You should provide some sample of your data either by posting result of

str(yourdata)
or
dput(head(yourdata))

if you want advice on why correct code did not give you the expected 
result.

Instead of ifelse() you can also use

Check_Kunde_2011 <- (is.na(Umsatz_2011)&(Kunde_2011==1))*1

to get desired 0/1 vector.

Cheers
Petr

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Tuesday, April 26, 2016 10:10 AM
> To: r-help@r-project.org
> Subject: [R] Missing Values in Logical Expressions
>



[R] R Script Template

2016-04-27 Thread G . Maubach
Hi All,
 
I am addressing this post to all who are new to R.

While learning R over the last few weeks I took some notes for myself to have 
code snippets ready for the data analysis process. I put these snippets 
together as a script template for future use. Almost all of the given command 
prototypes are tested. The template script contains snippets for best practices 
and leaves out the commands that should not be used. Relying on the given 
snippets should lead to high-quality code.

The code is based on examples from the resources given in the template. I 
highly recommend reading the books or taking the online courses to see how 
everything works and fits together.

Although everything was put together with care, the script is provided as-is 
with no warranty or liability whatsoever.

Please address any remarks or suggestions for improvement to the R-Help mailing 
list.

Kind regards

Georg



[R] Interdependencies of variable types, logical expressions and NA

2016-04-28 Thread G . Maubach
Hi All,

my script tries to do the following on factors:

> ## Check for case 3: Umsatz = 0 & Kunde = 1
> for (year in 2011:2015) {
+   Umsatz <- paste0("Umsatz_", year)
+   Kunde <- paste0("Kunde01_", year)
+   Check <- paste0("Check_U_0__Kd_1_", year)
+ 
+   cat('Creating', Check, 'from', Umsatz, "and", Kunde, '\n')
+ 
+   Kunden01[[ Check ]] <- ifelse(Kunden01[[ Umsatz ]] == 0 &
+ Kunden01[[ Kunde ]] == 1,
+ 1, 0
+ )
+   Kunden01[[ Check ]] <- factor(Kunden01[[ Check ]],
+ levels=c(1, 0),
+ labels= c("Check 0", "OK")
+ )
+ 
+ }
Creating Check_U_0__Kd_1_2011 from Umsatz_2011 and Kunde01_2011 
Creating Check_U_0__Kd_1_2012 from Umsatz_2012 and Kunde01_2012 
Creating Check_U_0__Kd_1_2013 from Umsatz_2013 and Kunde01_2013 
Creating Check_U_0__Kd_1_2014 from Umsatz_2014 and Kunde01_2014 
Creating Check_U_0__Kd_1_2015 from Umsatz_2015 and Kunde01_2015 
> 
> table(Kunden01$Check_U_0__Kd_1_2011, useNA = "ifany")

Check 0      OK    <NA> 
      1      16      13 
> table(Kunden01$Check_U_0__Kd_1_2012, useNA = "ifany")

Check 0      OK    <NA> 
      1      17      12 
> table(Kunden01$Check_U_0__Kd_1_2013, useNA = "ifany")

Check 0      OK    <NA> 
      2      17      13 
> table(Kunden01$Check_U_0__Kd_1_2014, useNA = "ifany")

Check 0      OK    <NA> 
      1      15      14 
> table(Kunden01$Check_U_0__Kd_1_2015, useNA = "ifany")

Check 0      OK    <NA> 
      2      15      13 
> 
> Kunden01$Check_U_0__Kd_1_all <- ifelse(Kunden01$Check_U_0__Kd_1_2011 == 1 |
+                                        Kunden01$Check_U_0__Kd_1_2012 == 1 |
+                                        Kunden01$Check_U_0__Kd_1_2013 == 1 |
+                                        Kunden01$Check_U_0__Kd_1_2014 == 1 |
+                                        Kunden01$Check_U_0__Kd_1_2015 == 1,
+                                        1, 0)
> 
> table(Kunden01$Check_U_0__Kd_1_all, useNA = "ifany")

   0 <NA> 
   7   23 

(Note: I made the values up, but the proportions match the real-world data.)

I had expected to get back a factor or at least a numeric variable 
containing 0, 1 and NA; instead, 1 is not included.
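
One possible explanation, sketched here with made-up data: the yearly check
variables are factors labelled "Check 0" and "OK", so comparing them with the
number 1 compares their labels with "1" and is never TRUE. The variable names
below are hypothetical and only mirror the pattern of the loop above.

```r
# Two hypothetical yearly check factors, built like the ones in the loop above
lv <- c(1, 0)
lb <- c("Check 0", "OK")
check_2011 <- factor(c(1, 0, NA, 0), levels = lv, labels = lb)
check_2012 <- factor(c(0, 0, 1, NA), levels = lv, labels = lb)

# Comparing a factor with the number 1 compares its labels with "1",
# so no element is TRUE here
cmp_number <- check_2011 == 1
print(cmp_number)

# Comparing with the label gives the intended logical vectors
flag_2011 <- check_2011 == "Check 0"
flag_2012 <- check_2012 == "Check 0"

# Element-wise OR: TRUE | NA is TRUE, but FALSE | NA stays NA
any_check <- flag_2011 | flag_2012
check_all <- as.integer(any_check)
print(table(check_all, useNA = "ifany"))
```

If NA should count as "no check needed", the flags could be wrapped in
`!is.na(flag) & flag` before combining them; whether that is appropriate
depends on the data.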

I searched the web for information on the treatment of logical expressions 
when the data contains NA. I found:

1. 
https://stat.ethz.ch/R-manual/R-devel/library/base/html/NA.html
Examples
# Some logical operations do not return NA
c(TRUE, FALSE) & NA
c(TRUE, FALSE) | NA

2.
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Logic.html
NA is a valid logical object. Where a component of x or y is NA, the 
result will be NA if the outcome is ambiguous. In other words NA & TRUE 
evaluates to NA, but NA & FALSE evaluates to FALSE. See the examples 
below. 

## construct truth tables :
x <- c(NA, FALSE, TRUE)
names(x) <- as.character(x)
outer(x, x, "&") ## AND table
outer(x, x, "|") ## OR  table
Note: Not very useful. How should it be read?
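
One way to read these tables, as far as I understand them: the row label is
the left operand, the column label is the right operand, and the cell is the
result. The sketch below rebuilds the tables from the documentation and picks
out single cells by position; and_table and or_table are names chosen for
this illustration.

```r
# Truth tables as in the ?Logic example
x <- c(NA, FALSE, TRUE)
names(x) <- as.character(x)
and_table <- outer(x, x, "&")
or_table  <- outer(x, x, "|")
print(and_table)
print(or_table)

# Row 1 is NA; column 2 is FALSE; column 3 is TRUE
print(and_table[1, 2])  # NA & FALSE: FALSE, the NA cannot change the outcome
print(or_table[1, 3])   # NA | TRUE : TRUE, for the same reason
print(and_table[1, 3])  # NA & TRUE : NA, the outcome depends on the unknown value
```

The rule of thumb is that the result is NA only when the missing value could
change the answer.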

3.
http://www.ats.ucla.edu/stat/r/faq/missing.htm
Good explanation for NA in general and in analysis, but no information 
about NA in logical expressions.

Then I made some tests with different data types and variables with NA:

-- cut --

# 2016-04-27-001_truth_table_for_logicals_and_NA.R

# Test 1
var2 <- c(TRUE, FALSE)
var3 <- c(NA, NA)
var1 <- c(1, 1)
ds <- data.frame(var1, var2, var3)
ds

ds$value_and_logical <- ifelse(ds$var1 | ds$var2, TRUE, FALSE)
ds$logical_and_na <- ifelse(ds$var2 | ds$var3, TRUE, FALSE)
ds$value_and_na <- ifelse(ds$var1 | ds$var3, TRUE, FALSE)

print(ds)
# Output
#   var1  var2 var3 value_and_logical logical_and_na value_and_na
# 1    1  TRUE   NA              TRUE           TRUE         TRUE
# 2    1 FALSE   NA              TRUE             NA         TRUE

# Test 2
ds$var1 <- factor(ds$var1, levels = c(0, 1), labels = c("NOT ok", "OK"))
ds$var2 <- factor(ds$var2, levels = c(0, 1), labels = c("NOT ok", "OK"))
ds$var3 <- factor(ds$var3, levels = c(0, 1), labels = c("NOT ok", "OK"))

ds$value_and_logical <- ifelse(ds$var1 | ds$var2, TRUE, FALSE)
ds$logical_and_na <- ifelse(ds$var2 | ds$var3, TRUE, FALSE)
ds$value_and_na <- ifelse(ds$var1 | ds$var3, TRUE, FALSE)

# Output (abbrev.)
# Warning message:
#  In Ops.factor(ds$var1, ds$var3) : '|' not meaningful for factors

print(ds)
# Output
#   var1 var2 var3 value_and_logical logical_and_na value_and_na
# 1   OK <NA> <NA>                NA             NA           NA
# 2   OK <NA> <NA>                NA             NA           NA

-- cut --

I had expected to get the same result in Test 2 as in Test 1.

Where can I find information and documentation about NA handling in 
logical expressions on different variable types?
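
A sketch of what may be going on in Test 2: factor() matches the values as
text against the levels, so TRUE/FALSE never match the levels 0/1 and turn
into NA, and `|` is not defined for factors at all. The names f_lost, f_kept
and logical_or_na are invented for this illustration.

```r
var2 <- c(TRUE, FALSE)
var3 <- c(NA, NA)

# Levels 0/1 do not match the values "TRUE"/"FALSE", so every entry becomes NA
f_lost <- factor(var2, levels = c(0, 1), labels = c("NOT ok", "OK"))
print(f_lost)

# Levels FALSE/TRUE match, so the labels are assigned as intended
f_kept <- factor(var2, levels = c(FALSE, TRUE), labels = c("NOT ok", "OK"))
print(f_kept)

# Logical operators need logical input: convert the factor back first
logical_or_na <- (f_kept == "OK") | var3
print(logical_or_na)
```

Keeping the logical columns as logicals for the computation and converting to
factors only for display would avoid the round trip.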

Kind regards

Georg


[R] Antwort: Re: R Script Template

2016-04-28 Thread G . Maubach
Hi All,

please find enclosed the missing attachment.

Kind regards

Georg

-- cut --

#- [ Header ] ---
# Program   : Framework for R scripts
# Author: Georg Maubach
# Date  : 2016-03-03
# Update: 2016-04-27
# Description   : Foundation for the analysis process
# Source System : R 3.2.5 (64 Bit)
# Target System : R 3.2.5 (64 Bit)
# Release   : 1
# License   : CC-BY-NC-SA
# File Name : 2016-04-27_Template_Scipt.R
#---

#- [ Purpose of the document ] 

# This document provides a framework for a script able to handle real world
# data throughout the complete analysis process. In each step examples or
# prototypes of needed or helpful commands are given. Chapters and sections in
# this document can be regarded as a toolbox. The needed tools shall be adapted
# to the processed data. Commands are ordered in a consistent way to support the
# user in producing high-quality output.
#---

# - [ At hand ] 

# help("function")# Extract or Replace Parts of an Object
# example("function") # Examples on "Extract"
# demo(package = .packages(all.available = TRUE)) # Show demos of packages
#---

# - [ Editing Marks ] ---
# %ROTA% : Result of the analysis in text form if needed to explain further
#          steps
# %ToDo% : ToDo's
#---

# - [ Warranty Disclaimer ] 

# The software is provided "as-is". The author disclaims to the fullest extent
# authorized by law any and all warranties, whether express or implied,
# including, without limitation, any implied warranties of merchantability or
# fitness for a particular purpose. Without limitation of the foregoing, the
# author expressly does not warrant that:
#
# (a) the software will meet your requirements or expectations;
# (b) the software or the software content will be free of bugs, errors,
# viruses or other defects;
# (c) any results, output, or data provided through or generated by the software
# will be accurate, up-to-date, complete or reliable;
# (d) the software will be compatible with third party software;
# (e) any errors in the software will be corrected.
#---

# - [ Limitation of Liability ] 

# In no event will the author be liable for any direct, indirect, consequential,
# incidental, special, exemplary, or punitive damages or liabilities whatsoever
# arising from or relating to the software, the software content or this
# agreement, whether based on contract, tort (including negligence), strict
# liability or other theory, even if the author has been advised of the
# possibility of such damages.
#
# Use of the software is entirely at the user's own risk.
#---

#1-2-3-4-5-6-7-8

#---#
# Setup #
#---#
# Environment
# Please make sure that RTools is installed
Sys.getenv("R_ZIPCMD", "zip")
# needed for openxlsx::write.xlsx()
Sys.setenv(R_ZIPCMD= "C:/R-Project/Rtools/bin/zip")

.libPaths()  # Install directory for libraries
# .libPaths("new path if needed")

# Workplace
sessionInfo()# Environment
list.files(R.home()) # Show R home directory
getwd()  # Get working directory
list.dirs()  # List directories in working directory
list.files() # List files in working directory
library()# List all installed packages
search() # List all loaded packages
ls() # List objects in environment

#---#
# Configure #
#---#
path <- file.path("path", "to","directory")
setwd(path)  # Set working directory
options(width = 65)  # Set output width

#-#
# Install #
#-#
available.packages()

# Desired packages
my_packages <- c(
  "ctv",       # Package to install packages based on themes
  "data.table",# Fast manipulation of large datasets
  "dplyr",     # Data manipulation for data frames
  "geoR", 
  "haven",     # import data from statistical packages
  "Hmisc", 
  "httr",  # package to deal with HTTP requests
  "installr",  # Dependency of openxlsx::write.xlsx()
  "lubridate", 
  "mapdata",   # data for high-quality maps
  "maps",  # draw maps
  "maptools",  # import ESRI data
  "memisc",# package data import 

[R] Antwort: Re: selecting columns from a data frame or data table by type, ie, numeric, integer

2016-05-03 Thread G . Maubach
Hi All,
Hi Carl,

I am not sure if this is useful to you, but I followed your conversation 
and thought of you when I read this:

for (i in 1:ncol(dataset)) {
  if(class(dataset) == "character|numeric|factor|or whatsoever") {
dataset[, i] <- as.factor(dataset[, i])
  }
}
Source: Zumel, Nina / Mount, John: Practical Data Science with R, Manning 
Publications: Shelter Island, 2014, Chapter 2: Loading data into R, p. 25

This way you can select variables of a certain class only and do 
transformations. I found that this approach does not work with functions 
like head(). Transformations worked fine for me.

I found reading the above given source worthwhile.

Kind regards

Georg

PS: I am not related to the above given authors. I am just a reader 
reporting on - at least to me - a valuable resource.



From:    Carl Sutton via R-help 
To:      William Dunlap , 
Cc:      "r-help@r-project.org" 
Date:    29.04.2016 22:08
Subject: Re: [R] selecting columns from a data frame or data table by type, ie, numeric, integer
Sent by: "R-help" 



Thank you Bill Dunlap.  So simple I never tried that approach. Tried 
dozens of others though, read manuals till I was getting headaches, and of 
course the answer was simple when one is competent.   Learning, its a 
struggle, but slowly getting there.
Thanks again
 Carl Sutton CPA
 

On Friday, April 29, 2016 10:50 AM, William Dunlap  
wrote:
 
 

 > dt1[ vapply(dt1, FUN=is.numeric, FUN.VALUE=NA) ]
     a   c
 1   1 1.1
 2   2 1.0
 ...
 10 10 0.2
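
For readers who want to try this, here is a base-R sketch of the same idea on
a plain data frame; df1, is_num and num_cols are names made up for the
example. With a data.table, a logical vector in the first position selects
rows, so there the selection would presumably go into the column position,
for example dt1[, which(is_num), with = FALSE].

```r
# Test data modelled on Carl's example, as a plain data frame
a <- 1:10
b <- letters[1:10]
c <- seq(1.1, 0.2, length.out = 10)
df1 <- data.frame(a, b, c, stringsAsFactors = FALSE)

# One TRUE/FALSE per column; is.numeric() is TRUE for integer and double
is_num <- vapply(df1, is.numeric, FUN.VALUE = logical(1))
print(is_num)

# Indexing a data frame with a single logical vector selects columns
num_cols <- df1[is_num]
str(num_cols)
```

vapply() is used instead of sapply() because it guarantees a logical vector
even for a data frame with zero columns.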


Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Fri, Apr 29, 2016 at 9:19 AM, Carl Sutton via R-help 
 wrote:

Good morning RGuru's
I have a data frame of 575 columns.  I want to extract only those columns 
that are numeric(double) or integer to do some machine learning with.  I 
have searched the web for a couple of days (off and on) and have not found 
anything that shows how to do this.   Lots of ways to extract rows, but 
not columns.  I have attempted to use "(x == y)" indices extraction method 
but that threw error that == was for atomic vectors and lists, and I was 
doing this on a data frame.

My test code is below

#  a technique to get column classes
library(data.table)
a <- 1:10
b <- c("a","b","c","d","e","f","g","h","i","j")
c <- seq(1.1, .2, length = 10)
dt1 <- data.table(a,b,c)
str(dt1)
col.classes <- sapply(dt1, class)
head(col.classes)
dt2 <- subset(dt1, typeof = "double" | "numeric")
str(dt2)
dt2   #  not subset
dt2 <- dt1[, list(typeof = "double")]
str(dt2)
class_data <- dt1[,sapply(dt1,is.integer) | sapply(dt1, is.numeric)]
class_data
sum(class_data)
typeof(class_data)
names(class_data)
str(class_data)
 Any help is appreciated
Carl Sutton CPA



[R] Antwort: Antwort: Re: selecting columns from a data frame or data table by type, ie, numeric, integer

2016-05-04 Thread G . Maubach
Hi Martin,

many thanks for your answer and your detailed explanation. 

I am a newbie to "R" and got help on this list, so I thought I could give 
something back that looked OK to me.

regarding 0)
You're right, it's pseudo code. I assumed that anybody on the list would 
be able to adapt the code to their needs so that it worked. Next time I 
will post runnable code.

regarding 1)
You're right: "[, i]" is missing. My fault. Sorry.

regarding 3)
I got your point and will do better in the future.

One question: Which books do you recommend for getting to know "R" 
better?

Kind regards

Georg




From:    Martin Maechler 
To:      , 
Cc:      Carl Sutton , "r-help@r-project.org" 

Date:    04.05.2016 09:05
Subject: [R] Antwort: Re: selecting columns from a data frame or data table by type, ie, numeric, integer



>   
> on Wed, 4 May 2016 08:30:50 +0200 writes:

> Hi All,
> Hi Carl,
> 
> I am not sure if this is useful to you, but I followed your conversation 

> and thought of you when I read this:
> 
> for (i in 1:ncol(dataset)) {
>   if(class(dataset) == "character|numeric|factor|or whatsoever") {
> dataset[, i] <- as.factor(dataset[, i])
>   }
> }

Ouch -- so many problems in such a short piece of R code !!!

> Source: Zumel, Nina / Mount, John: Practical Data Science with R, Manning
> Publications: Shelter Island, 2014, Chapter 2: Loading data into R, p. 25

Sorry, but after reading the above, I'd strongly recommend getting
better books about R...
   {{maybe do not take those containing "data science" ;-)}}

Compared to the nice and efficient solution of Bill Dunlap,
the above is really bad-bad-bad  in at least four ways :

0) The way you write it above, you cannot use it:
  == "variant1|variant2|..."
   is pseudocode and does not really work

1) Note the missing "[, i]"  in the 2nd line: It should be
 if(class(dataset[, i]) ...

2) A for loop changing each column at a time is really slow for
   largish data sets

3) [last but not at all least!]
   Please ... many of you readers, do learn:
 
 Using checks such as
   if ( class(x) == "numeric" )
 are (almost) always wrong by design !!!

 Instead you really should (almost) always use

  if(inherits(x, "numeric"))

Why?  Because classes in R (S3 or S4) can *extend* other classes.
Example: Many of you know that after   fm <- glm(...)
class(fm) is   c("glm", "lm")   and so

> if(class(fm) == "lm")
+ "yes"
Warning message:
In if (class(fm) == "lm") "yes" :
  the condition has length > 1 and only the first element will be used

Similarly, in your case

y <- 1:10
class(y) <- c("myNumber", "numeric")

when that 'y' is a column in your data frame,
the test for  if(class(dataset[,i]) == "numeric")  will *not*
work but actually produce the above warning.

However, one could also have had

Num <- setClass("Num", contains="numeric")
N <- Num(1:10)

 > Num <- setClass("Num", contains="numeric")
 > N <- Num(1:10)
 > N
 An object of class "Num"
  [1]  1  2  3  4  5  6  7  8  9 10
 > if(class(N) == "numeric") "yes" else "no"
 [1] "no"
 > 

I hope that many of the readers --- including *MANY* authors of
R packages !! --- have understood the above and will fix their R
code -- and even more their books where applicable !!
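
The points above can be tried out with a small sketch. The object y and the
data frame dataset are made up for the illustration, and the column
conversion is only one possible runnable reading of the loop from the book.

```r
# Hypothetical S3 object whose class vector extends "numeric"
y <- structure(1:10, class = c("myNumber", "numeric"))

# class(y) has two elements, so the comparison is not a single TRUE/FALSE
print(length(class(y) == "numeric"))

# inherits() looks at the whole class vector
print(inherits(y, "numeric"))

# A plain integer vector has the implicit class "integer", so is.numeric()
# is the more direct test for "integer or double"
print(inherits(1:10, "numeric"))
print(is.numeric(1:10))

# Runnable variant of the loop idea: turn every character column into a factor
dataset <- data.frame(id  = 1:3,
                      grp = c("a", "b", "a"),
                      val = c(1.5, 2.5, 3.5),
                      stringsAsFactors = FALSE)
is_chr <- vapply(dataset, is.character, FUN.VALUE = logical(1))
dataset[is_chr] <- lapply(dataset[is_chr], as.factor)
str(dataset)
```

Using a predicate such as is.character() per column avoids comparing class
strings altogether.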

Martin Maechler,
ETH Zurich & R Core Team 
 

[R] sink(): Cannot open file

2016-05-10 Thread G . Maubach
Hi All,

I would like to route the output to a file using sink(). When using the 
example from the ?sink documentation:

sink("sink-examp.txt")
i <- 1:10
outer(i, i, "*")
sink()
unlink("sink-examp.txt")

## capture all the output to a file.
zz <- file("all.Rout", open = "wt")
sink(zz)
sink(zz, type = "message")
try(log("a"))
## back to the console
sink(type = "message")
sink()
file.show("all.Rout")

I cannot open the file in Windows Explorer. The error message is:

"Cannot open file. File is in use by another process."

How can I close the file so that I can open it right after it was 
created?

Kind regards

Georg



[R] Antwort: Re: sink(): Cannot open file

2016-05-10 Thread G . Maubach
Hi Jim,

thanks for your reply.

ad 1)
"all.Rout" was created in the correct directory. It exists properly with 
correct file properties on Windows, e.g. creation date and time and file 
size information.

ad 2)
I can not access the file with Notepad.exe directly after it was created 
by R.  The error message is (translated):

"Cannot access file "all.Rout". The file is opened by another process."

ad 3)
If I close R completely, the file is released. Then I can read the 
file using Notepad.exe. Its content is:

Error in log("a") : non-numeric argument to mathematical function

I tried

close(zz)

but the error persists.

To me it looks like R is still accessing the file and not releasing the 
connection for other programs. close(zz) should have solved the problem 
but unfortunately it doesn't.

What else could I try?

Kind regards

Georg




From:    Jim Lemon 
To:      g.maub...@weinwolf.de, 
Cc:      r-help mailing list 
Date:    10.05.2016 12:50
Subject: Re: [R] sink(): Cannot open file



Hi Georg,
I don't suppose that you have:

1) checked that the file "all.Rout" exists somewhere?

2) if so, looked at the file with Notepad, perhaps?

3) let us in on the secret by pasting the contents of "all.Rout" into
your message if it is not too big?

At a guess, trying:

 close(zz)

might get you there.

Jim



[R] Antwort: Re: Re: sink(): Cannot open file

2016-05-10 Thread G . Maubach
Hi Jim,

I tried:

sink("all.Rout")
try(log("a"))
sink()

The program executes without warning or error. The file "all.Rout" is 
created, but nothing is written to it. The file can be opened with 
notepad.exe right after the program has run.

The program

zz <- file("all.Rout", open = "wt")
sink(zz, type = "message")
try(log("a"))
sink()
close(zz)
unlink(zz)

creates the file, does not write anything to it, and the file cannot be 
opened with notepad.exe after the program has run in R.

Any ideas what happens behind the scenes?

Kind regards

Georg




From:    Jim Lemon 
To:      g.maub...@weinwolf.de, 
Cc:      r-help mailing list 
Date:    10.05.2016 13:16
Subject: Re: Re: [R] sink(): Cannot open file



Have you tried:

sink("all.Rout")
try(log("a"))
sink()

Jim



[R] Antwort: Re: Antwort: Re: Re: sink(): Cannot open file

2016-05-10 Thread G . Maubach
Hi Sarah, John, Jim,
Hi All,

I have set my working directory:

path <- file.path("H:", "2016", "Analysis")
setwd(dir = path)

This works well, because the file is created in that directory.

I have tried

close(zz)
unlink(zz)

and neither worked on its own, nor did using them together.

I had this problem before when working with IBM SPSS Statistics. There was a 
workaround for it in SPSS.

Is there one for R?

Kind regards

Georg





From:    Sarah Goslee 
To:      g.maub...@weinwolf.de, 
Cc:      r-help mailing list 
Date:    10.05.2016 17:17
Subject: Re: [R] Antwort: Re: Re: sink(): Cannot open file



Try closing the type of sink you're actually opening:


zz <- file("all.Rout", open = "wt")
sink(zz, type = "message")
try(log("a"))
sink(type = "message")
close(zz)
unlink(zz)


If you look carefully at the example in ?sink, there are two closing
statements, one for each stream being sent to that file.
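
A minimal sketch of the full open/close sequence, assuming a temporary file
instead of "all.Rout" (log_file and captured are names invented here): each
sink that was opened is closed again, messages first, and the connection is
closed last so that the file handle is released. Note also that unlink()
deletes a file, so running the ?sink example unchanged leaves no file behind.

```r
# Temporary file so the sketch does not touch the working directory
log_file <- tempfile(fileext = ".Rout")
zz <- file(log_file, open = "wt")

sink(zz)                    # divert normal output
sink(zz, type = "message")  # divert messages, warnings and errors

print("hello")
try(log("a"))               # the error message goes to the file

sink(type = "message")      # restore the message stream first
sink()                      # then restore normal output
close(zz)                   # finally release the file handle

# The file can now be read back
captured <- readLines(log_file)
print(captured)
```

Whether this also resolves the locking seen on Windows is not something the
sketch can confirm; it only shows the order of calls suggested in the thread.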

Sarah

----- Forwarded by Georg Maubach/WWBO/WW/HAW on 10.05.2016 18:29 -----

From:    "John Sorkin" 
To:      , , 
Cc:      
Date:    10.05.2016 17:20
Subject: Re: [R] Antwort: Re: Re: sink(): Cannot open file



George,
I do not know what operating system you are working with, but when I use 
sink() under windows, I need to specify a valid path which I don't see in 
your code. I might, for example specify:

sink("c:\myfile.txt")
 R code goes here
sink()
with the expectation that I would create a file myfile.txt that would 
contain the output of my R program.
 
John


John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and 
Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing) 





[R] Antwort: Re: Re: Antwort: Re: Re: sink(): Cannot open file

2016-05-11 Thread G . Maubach
Hi Sarah,

yes, I followed your suggestion.

If I do exactly what is in the example of the documentation:

sink("C:/Temp/sink-examp.txt")
i <- 1:10
outer(i, i, "*")
sink()
unlink("C:/Temp/sink-examp.txt")

it does not write anything, i. e. no file is created in "C:/Temp/". The 
script is executed without an error or warning message.

If I run

## capture all the output to a file.
zz <- file("C:/Temp/all.Rout", open = "wt")
sink(zz)
sink(zz, type = "message")
try(log("a"))
## back to the console
sink(type = "message")  # I think ,this was your suggestion
sink()
unlink("C:/Temp/all.Rout")

the script is executed without error or warning message and the file is 
created in "C:/Temp/", but if I try to open it right after the script 
is done, the following message appears:

DE: "Auf das Dokument "C:\Temp\all.Rout" kann nicht zugegriffen werden, da 
es von einer anderen Anwendung verwendet wird."
EN: "Cannot access the document "C:\Temp\all.Rout" because it is used by 
another application."

What am I doing wrong?

Kind regards

Georg




From:    Sarah Goslee 
To:      g.maub...@weinwolf.de, 
Date:    10.05.2016 18:46
Subject: Re: Re: [R] Antwort: Re: Re: sink(): Cannot open file



On Tue, May 10, 2016 at 12:34 PM,   wrote:
> sink(type = "message")


But did you do that ^^ as I suggested?


If you start a message sink with
sink(zz, type="message")
as you did, you need to explicitly close that stream. Just using
sink()
doesn't do it.
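
A minimal sketch of the teardown order described here. It assumes a temporary file in place of the thread's "C:/Temp/all.Rout"; the names out_path and captured are illustrative and do not come from the original posts. Each diversion is ended separately before the connection is closed.

```r
# Illustrative path; the thread uses "C:/Temp/all.Rout" instead.
out_path <- tempfile(fileext = ".Rout")
zz <- file(out_path, open = "wt")

sink(zz)                     # divert regular output
sink(zz, type = "message")   # divert messages, warnings and errors

print(1:3)
try(log("a"))

sink(type = "message")       # end the message diversion explicitly
sink()                       # end the output diversion
close(zz)                    # release the file handle

captured <- readLines(out_path)
print(captured)
```

Once close(zz) has run, R no longer holds the file, so other programs should be able to open it.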



[R] Antwort: Re: Antwort: Re: Re: sink(): Cannot open file

2016-05-11 Thread G . Maubach
Duncan,

thanks for the hint.

I have done it correctly in R fashion

## capture all the output to a file.
zz <- file("C:/Temp/all.Rout", open = "wt")
sink(zz)
sink(zz, type = "message")
try(log("a"))
## back to the console
sink(type = "message")
sink()
unlink("C:/Temp/all.Rout")

But the error persists.

Kind regards

Georg




From:    Duncan Murdoch 
To:      John Sorkin , drjimle...@gmail.com, g.maub...@weinwolf.de, 
Cc:      r-help@r-project.org
Date:    10.05.2016 19:03
Subject: Re: [R] Antwort: Re: Re: sink(): Cannot open file



On 10/05/2016 11:15 AM, John Sorkin wrote:
> George,
> I do not know what operating system you are working with, but when I use
> sink() under Windows, I need to specify a valid path, which I don't see in
> your code. I might, for example, specify:
>
> sink("c:\myfile.txt")

Note that the backslash should be doubled (so it isn't interpreted as an 
escape for the "m" that follows it), or replaced with a forward slash.

Duncan Murdoch
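
A short sketch of the two spellings mentioned in this note. The variable names are illustrative; the path is the one from the quoted message. Written with a single backslash, the literal would not parse in R, because "\m" is not a recognised escape.

```r
# Two equivalent spellings of the same Windows path.
path_doubled <- "c:\\myfile.txt"   # "\\" is one literal backslash
path_forward <- "c:/myfile.txt"    # forward slashes also work on Windows

n_chars <- nchar(path_doubled)     # counts the backslash once

# Replacing the backslash with a forward slash gives the second spelling.
same_path <- identical(gsub("\\", "/", path_doubled, fixed = TRUE),
                       path_forward)

# file.path() joins its arguments with forward slashes.
built <- file.path("c:", "myfile.txt")
```

Either spelling can be passed to sink() or file().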

>   R code goes here
> sink()
>
> with the expectation that I would create a file myfile.txt that would
> contain the output of my R program.
> 
> John
>
>
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and 
Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> >>>  05/10/16 11:10 AM >>>
> Hi Jim,
>
> I tried:
>
> sink("all.Rout")
> try(log("a"))
> sink()
>
> The program executes without warning or error. The file "all.Rout" is
> being created. Nothing is written to it. The file is accessible
> right after the execution of the program by notepad.exe.
>
> The program
>
> zz <- file("all.Rout", open = "wt")
> sink(zz, type = "message")
> try(log("a"))
> sink()
> close(zz)
> unlink(zz)
>
> creates the file, does not write anything to it, and the file is not
> accessible with notepad.exe after the program has run in R.
>
> Any ideas what happens behind the scenes?
>
> Kind regards
>
> Georg
>
>
>
>
> From: Jim Lemon 
> To: g.maub...@weinwolf.de,
> Cc: r-help mailing list 
> Date: 10.05.2016 13:16
> Subject: Re: Re: [R] sink(): Cannot open file
>
>
>
> Have you tried:
>
> sink("all.Rout")
> try(log("a"))
> sink()
>
> Jim
>
> On Tue, May 10, 2016 at 9:05 PM,  wrote:
> > Hi Jim,
> >
> > thanks for your reply.
> >
> > ad 1)
> > "all.Rout" was created in the correct directory. It exists properly with
> > correct file properties on Windows, e.g. creation date and time and file
> > size information.
> >
> > ad 2)
> > I cannot access the file with Notepad.exe directly after it was created
> > by R. The error message is (translated):
> >
> > "Cannot access file "all.Rout". The file is opened by another process."
> >
> > ad 3)
> > If I close R completely the file access is released. Then I can read the
> > file using Notepad.exe. The content is:
> >
> > Error in log("a") : non-numeric argument to mathematical function
> >
> > I tried
> >
> > close(zz)
> >
> > but the error persists.
> >
> > To me it looks like R is still accessing the file and not releasing the
> > connection for other programs. close(zz) should have solved the problem
> > but unfortunately it doesn't.
> >
> > What else could I try?
> >
> > Kind regards
> >
> > Georg
> >
> >
> >
> >
> > From: Jim Lemon 
> > To: g.maub...@weinwolf.de,
> > Cc: r-help mailing list 
> > Date: 10.05.2016 12:50
> > Subject: Re: [R] sink(): Cannot open file
> >
> >
> >
> > Hi Georg,
> > I don't suppose that you have:
> >
> > 1) checked that the file "all.Rout" exists somewhere?
> >
> > 2) if so, looked at the file with Notepad, perhaps?
> >
> > 3) let us in on the secret by pasting the contents of "all.Rout" into
> > your message if it is not too big?
> >
> > At a guess, trying:
> >
> > close(zz)
> >
> > might get you there.
> >
> > Jim
> >
> > On Tue, May 10, 2016 at 5:25 PM,  wrote:
> >> Hi All,
> >>
> >> I would like to route the output to a file using sink(). When using the
> >> example from the ?sink documentation:
> >>
> >> sink("sink-examp.txt")
> >> i <- 1:10
> >> outer(i, i, "*")
> >> sink()
> >> unlink("sink-examp.txt")
> >>
> >> ## capture all the output to a file.
> >> zz <- file("all.Rout", open = "wt")
> >> sink(zz)
> >> sink(zz, type = "message")
> >> try(log("a"))
> >> ## back to the console
> >> sink(type = "message")
> >> sink()
> >> file.show("all.Rout")
> >>
> >> I cannot open the file in Windows Explorer. The error message is:
> >>
> >> "Cannot open file. File is in use by another process."
> >>
> >> How can I close the file in such a way that I can open it right after it
> >> was created?
> >>
> >> Kind regards
> >>
> >> Georg
> >>

[R] Antwort: Re: Antwort: Re: Antwort: Re: Re: sink(): Cannot open file (SOLVED)

2016-05-12 Thread G . Maubach
Hi Henrik, Jim, Sarah, Duncan,
Hi All,

I have tried the built-in solution using PowerShell:

$lockedFile="C:\Windows\System32\wshtcpip.dll"
Get-Process | foreach{$processVar = $_;$_.Modules | foreach{if($_.FileName -eq $lockedFile){$processVar.Name + " PID:" + $processVar.id}}}

It did not show any processes.

Then I tried the solution using Resource Monitor. There I found two 
processes:

rstudio.exe
rsession.exe

Right-clicking on rstudio.exe and selecting "Warteschlange analysieren" 
(roughly "analyze wait chain") showed nothing. Right-clicking on 
rsession.exe and selecting the same entry said (translated from German):

"At least one thread of rsession.exe is waiting for a network I/O 
operation to finish."

Putting rsession.exe into the search field of the Handles tab of 
Resource Monitor gave no results. No handles were identified.

I cannot follow the suggestions that require installing software, 
due to the security rules of the company I work for.

I had a look at different R versions on my machine:

1) R i386 3.2.2
2) R i386 3.2.4 (revised)
3) R i386 3.2.5
4) R x64 3.2.2
5) R x64 3.2.4 (revised)
6) R x64 3.2.5

I did 

## capture all the output to a file.
zz <- file("C:/Temp/all.Rout", open = "wt")
sink(zz)
sink(zz, type = "message")
try(log("a"))
## back to the console
sink(type = "message")
sink()
unlink("C:/Temp/all.Rout")

on R i386 3.2.2 and R x64 3.2.2 directly without RStudio. In both cases 
the file was locked.

Adding

close(zz)

solved the problem in both versions.

Encouraged by this, I tried the following (referred to below as the "complete code")

## capture all the output to a file.
zz <- file("C:/Temp/all.Rout", open = "wt")
sink(zz)
sink(zz, type = "message")
try(log("a"))
## back to the console
sink(type = "message")
sink()
unlink("C:/Temp/all.Rout")
close(zz)

on R i386 3.2.4 (revised) and R x64 3.2.4 (revised) without RStudio. It works 
in both cases. The same holds for R i386 3.2.5 and R x64 3.2.5, each without 
RStudio.

I did the same in RStudio, changing the R version used by the RStudio session 
and running the "complete code". The results are:

R i386 3.2.2: OK
R x64 3.2.2: OK
R i386 3.2.4 (revised): OK
R x64 3.2.4 (revised): OK
R i386 3.2.5: OK
R x64 3.2.5: OK

This confused me. I had tried the complete code a hundred times over the 
last few days. It never worked.

Then I restarted my machine, started RStudio with R x64 3.2.5, ran the 
"complete code" and ... it worked.

I have no idea what was wrong over the last few days.

As far as I can tell today, the example in the ?sink documentation currently reads

## capture all the output to a file.
zz <- file("all.Rout", open = "wt")
sink(zz)
sink(zz, type = "message")
try(log("a"))
## back to the console
sink(type = "message")
sink()
file.show("all.Rout")

and should, in my opinion, be supplemented with

close(zz)

Any thoughts?

Kind regards

Georg




From:    Henrik Bengtsson 
To:      g.maub...@weinwolf.de, 
Cc:      Duncan Murdoch , "r-help@r-project.org" 
Date:    11.05.2016 21:48
Subject: Re: [R] Antwort: Re: Antwort: Re: Re: sink(): Cannot open file



Sounds like it would be helpful to find out exactly which process is
holding on to the file in order to figure out what's going on. From a
quick look, it seems that

http://superuser.com/questions/117902/find-out-which-process-is-locking-a-file-or-folder-in-windows

gives some useful info on how to track down the process that locks the 
file.

/Henrik

On Wed, May 11, 2016 at 9:47 AM,   wrote:
> Duncan,
>
> thanks for the hint.
>
> I have done it correctly in R fashion
>
> ## capture all the output to a file.
> zz <- file("C:/Temp/all.Rout", open = "wt")
> sink(zz)
> sink(zz, type = "message")
> try(log("a"))
> ## back to the console
> sink(type = "message")
> sink()
> unlink("C:/Temp/all.Rout")
>
> But the error persists.
>
> Kind regards
>
> Georg
>
>
>
>
> From:    Duncan Murdoch 
> To:      John Sorkin , drjimle...@gmail.com, g.maub...@weinwolf.de,
> Cc:      r-help@r-project.org
> Date:    10.05.2016 19:03
> Subject: Re: [R] Antwort: Re: Re: sink(): Cannot open file
>
>
>
> On 10/05/2016 11:15 AM, John Sorkin wrote:
>> George,
>> I do not know what operating system you are working with, but when I 
use
> sink() under windows, I need to specify a valid path which I don't see 
in
> your code. I might, for example specify:
>>
>> sink("c:\myfile.txt")
>
> Note that the backslash should be doubled (so it isn't interpreted as an
> escape for the "m" that follows it), or replaced with a forward slash.
>
> Duncan Murdoch
>
>>   R code goes here
>> sink()
>>
>> with the expectation that I would create a file myfile.txt that would
> contain the output of my R program.
>>
>> John
>>
>>
>> John David Sorkin M.D., Ph.D.
>> Professor of Medicine
>> Chief, Biostatistics and Informatics
>> University of Maryland School of Medicine Division of Gerontology and
> Geriatric Medicine
>> Baltimore VA Medical Center
>> 10 North Greene Street
>> GRECC (BT/18/GR)
>> Ba

[R] Antwort: Antwort: Re: Re: Antwort: Re: Re: sink(): Cannot open file

2016-05-12 Thread G . Maubach
Hi Martin,

many thanks for following-up on my question.

I did it again:

## capture all the output to a file.
zz <- file("C:/Temp/all.Rout", open = "wt")
sink(zz)
sink(zz, type = "message")
try(log("a"))
## back to the console
sink(type = "message")
sink()
close(zz)

This works.

I tried several other combinations of the commands, e.g.

## capture all the output to a file.
zz <- file("C:/Temp/all.Rout", open = "wt")
sink(zz)
sink(zz, type = "message")
try(log("a"))
close(zz)

Does not work.

As far as I understand it now, I have to undo both diversions with 
sink(type = "message") and sink() before I can close the file connection 
itself.

If I do it as in the last example, the connection to the file is closed 
while the sink() diversions still point to it, and they cannot be 
recovered. This lasts until the R session is closed and opened again.
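
A small sketch of how the state of the diversions can be inspected with sink.number(), and how they can be undone in the order described above. The temporary file and the variable names are illustrative assumptions, not part of the original post.

```r
depth_before <- sink.number()            # output diversions already active

zz <- file(tempfile(fileext = ".Rout"), open = "wt")
sink(zz)
sink(zz, type = "message")

depth_during   <- sink.number()                  # one more than before
message_target <- sink.number(type = "message")  # connection number in use

# Undo the diversions first, then close the connection.
sink(type = "message")
while (sink.number() > depth_before) sink()
close(zz)

message_after <- sink.number(type = "message")   # 2 means stderr again
```

For messages, sink.number(type = "message") reports the number of the connection currently in use; 2 stands for the standard error stream, that is, no diversion.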

To me it looks like I need to learn more about the operation of R under 
the hood.

Kind regards

Georg




From:    Martin Maechler 
To:      , 
Cc:      Sarah Goslee , 
Date:    12.05.2016 10:40
Subject: [R] Antwort: Re: Re:  Antwort: Re: Re: sink(): Cannot open file




> Hi Sarah,
> yes, I followed your suggestion.

I doubt that you followed it correctly. Sarah's advice is
usually really very sound -- and your code below is *not* :

> If I do exactly what is in the example of the documentation:

> sink("C:/Temp/sink-examp.txt")
> i <- 1:10
> outer(i, i, "*")
> sink()
> unlink("C:/Temp/sink-examp.txt")

> it does not write anything, i. e. no file is created in "C:/Temp/". The 
> script is executed without an error or warning message.

Well, did you ever look up what unlink() does?
I'll save you the time: it *REMOVES* a file.

So no wonder that you don't see any result after executing the
above R code block.

Martin
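
The effect can be checked with a short sketch. It uses a temporary file instead of "C:/Temp/sink-examp.txt", and the variable names are illustrative.

```r
demo_path <- tempfile(fileext = ".txt")   # illustrative file name

sink(demo_path)
i <- 1:10
print(outer(i, i, "*"))
sink()

exists_before <- file.exists(demo_path)   # the file written via sink()
line_count    <- length(readLines(demo_path))

unlink(demo_path)                         # deletes the file
exists_after  <- file.exists(demo_path)
```

Leaving out the unlink() call keeps the file on disk for inspection.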



[R] How to plot a bunch of dichotomous code variables in one plot using ggplot2

2016-10-05 Thread G . Maubach
Hi All,

I have a bunch of dichotomously coded variables which I would like to plot 
in one graph, counting one of their values ("1" in this case).

The dataset looks like this:

-- cut --
var1 <- c(1,0,1,0,0,1,1,1,0,1)
var2 <- c(0,1,1,1,1,0,0,0,0,0)
var3 <- c(1,1,1,1,1,1,1,1,0,1)

ds <- data.frame(var1, var2, var3)
-- cut --

I would like to have a bar plot like this



               *
               *
               *
               *
 *             *
 *             *
 *      *      *
 *      *      *
 *      *      *
 *      *      *
--------------------
var1   var2   var3

Is this possible in R? If so, how can I achieve this?

Kind regards

Georg



[R] Antwort: RE: How to plot a bunch of dichotomous code variables in one plot using ggplot2

2016-10-05 Thread G . Maubach
Hi Bob,
Hi John,
Hi readers,

many thanks for your reply.

I did

barplot(colSums(dataset %>% select(FirstVar:LastVar)))

and it worked fine.

How would I do it with ggplot2?

Kind regards

Georg
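
One possible ggplot2 version, as a sketch rather than the only way to do it. It uses the example data from the original question (ds with var1 to var3) instead of the real dataset, and the helper data frame counts is an illustrative name.

```r
library(ggplot2)

var1 <- c(1,0,1,0,0,1,1,1,0,1)
var2 <- c(0,1,1,1,1,0,0,0,0,0)
var3 <- c(1,1,1,1,1,1,1,1,0,1)
ds <- data.frame(var1, var2, var3)

# One row per variable with the number of 1s in that column.
counts <- data.frame(variable = names(ds), n = unname(colSums(ds)))

# geom_col() draws bars whose heights are taken from the data as given.
p <- ggplot(counts, aes(x = variable, y = n)) +
  geom_col() +
  labs(x = NULL, y = "Number of 1s")
print(p)
```

Since ggplot2 expects one row per bar, the column sums are first put into a small data frame.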




From:    "Fox, John" 
To:      "g.maub...@weinwolf.de" , 
Cc:      "r-help@r-project.org" 
Date:    05.10.2016 15:01
Subject: RE: [R] How to plot a bunch of dichotomous code variables in one plot using ggplot2



Dear Georg,

How about barplot(colSums(ds)) ?

Best,
 John

-
John Fox, Professor
McMaster University
Hamilton, Ontario
Canada L8S 4M4
Web: socserv.mcmaster.ca/jfox


> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: October 5, 2016 8:47 AM
> To: r-help@r-project.org
> Subject: [R] How to plot a bunch of dichotomous code variables in one plot
> using ggplot2
> 
> Hi All,
> 
> I have a bunch of dichotomous code variables which shall be plotted in one
> graph using one of their values, this is "1" in this case.
> 
> The dataset looks like this:
> 
> -- cut --
> var1 <- c(1,0,1,0,0,1,1,1,0,1)
> var2 <- c(0,1,1,1,1,0,0,0,0,0)
> var3 <- c(1,1,1,1,1,1,1,1,0,1)
> 
> ds <- data.frame(var1, var2, var3)
> -- cut --
> 
> I would like to have a bar plot like this
> 
> 
> 
>                *
>                *
>                *
>                *
>  *             *
>  *             *
>  *      *      *
>  *      *      *
>  *      *      *
>  *      *      *
> --------------------
> var1   var2   var3
> 
> Is this possible in R? If so, how can I achieve this?
> 
> Kind regards
> 
> Georg
> 



[R] Documenting a function using roxygen2

2016-10-11 Thread G . Maubach
Hi All,

I began to document my functions using roxygen2. This is an example of a 
function I would like to write for training and testing purposes:

t_simple_table <- function(variable,
   useNA = TRUE,
   print = FALSE) {
#' @title Create a simple table for one variable.
#'
#' @description t_simple_table() creates absolute and relative 
#' frequencies, cumulative sums and column sums for both as well as
#' overall statistics about valid N and missing values.
#' 
#' 
#' @param variable (vector, list, data.frame): variable the table is
#' created for.
#' @param useNA (logical): flag to include or exclude missing values
#' from the computation.
#' @param print (logical): flag to print/not print a table before
#' returning it as an object.
#' 
#' @operation
#' Coerces the given variable to a factor.
#' If useNA = TRUE NA is also transformed to a valid value,
#' if useNA = FALSE it is disregarded in all operations.
#' 
#' @return Returns a table with the following statistics:
#' 
#'              Frequencies   Percent   Cumulative
#'                                      Percent
#' Valid             .           .
#' Missing           .           .
#' Total             .          100
#' Categories
#'   Cat 1           .           .          .
#'   Cat 2           .           .          .
#'   Cat 3           .           .          .
#'   ...             .           .         100
#'   Total           .          100
#'
#' @errorhandling None
#' 
#' @version "0.1"
#' 
#' @created "2016-10-11"
#' @updated "2016-10-11"
#' 
#' @status development
#'
#' @see Manderscheid: Sozialwissenschaftliche Datenanalyse mit R, 
#' p. 79ff
#'
#' @author Georg
#'
#' @license GPL-2
 
# function body to be defined

}

Is this a correct header for a function?

How could I do better?

Kind regards

Georg
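
For comparison, a sketch of a more conventional layout. roxygen2 reads the #' block placed directly above the function definition, not inside its body. To my knowledge, tags such as @operation, @errorhandling, @version, @created, @updated, @status, @see and @license are not roxygen2 tags; the sketch moves that information to @details and @references, and the licence would normally go into the package's DESCRIPTION file. The function body is a placeholder written only so that the example runs; it is an assumption, not the intended implementation.

```r
#' Create a simple table for one variable
#'
#' @description Creates absolute and relative frequencies and cumulative
#'   percentages for one variable.
#'
#' @param variable Vector the table is created for.
#' @param useNA Logical flag: include missing values in the computation?
#' @param print Logical flag: print the table before returning it?
#'
#' @details The variable is tabulated with table(). If useNA is TRUE,
#'   NA is counted as a category of its own; otherwise it is dropped.
#'   Status: development.
#'
#' @return A data.frame with one row per category and the columns
#'   category, frequency, percent and cumulative_percent.
#'
#' @references Manderscheid: Sozialwissenschaftliche Datenanalyse mit R,
#'   p. 79ff
#'
#' @author Georg
#' @export
t_simple_table <- function(variable, useNA = TRUE, print = FALSE) {
  # Placeholder body (assumption): the original leaves it undefined.
  counts <- table(variable, useNA = if (useNA) "ifany" else "no")
  result <- data.frame(
    category  = names(counts),
    frequency = as.vector(counts),
    percent   = as.vector(100 * counts / sum(counts)),
    stringsAsFactors = FALSE
  )
  result$cumulative_percent <- cumsum(result$percent)
  if (print) print(result)
  invisible(result)
}

example_table <- t_simple_table(c("a", "b", "a", NA))
```

Running roxygen2::roxygenise() in a package directory would then turn such a block into an .Rd help file.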



[R] Visibility of libraries called from within functions

2016-10-13 Thread G . Maubach
Hi All,

in my R programs I use different libraries to work with Excel sheets, 
i.e. xlsx and excel.link.

When running chunks of code repeatedly during development, and not always in 
the order in which the program should run, I ran into trouble. There were 
conflicts between the methods of these packages, causing R to crash.

I thought about defining functions for the different tasks and loading the 
libraries locally within these functions. Doing this test

-- cut --

f_test <- function() {
library(xlsx)
cat("Loaded packages AFTER loading library")
print(search())
}

cat("Loaded packages BEFORE function call ")
search()

f_test()

cat("Loaded packages AFTER function call -")
search()

-- cut --

showed that the library "xlsx" was loaded into the global environment and 
stayed there although I had expected R to unload the library when leaving 
the function. Thus conflicts can occur more often.

I had a look into ?library and saw that there is no argument telling R to 
hold the library in the calling environment.

How can I load libraries locally to the calling functions?

Kind regards

Georg



[R] Antwort: Re: Visibility of libraries called from within functions

2016-10-13 Thread G . Maubach
Hi Duncan,

many thanks for your reply.

Your suggestion of using requireNamespace() together with explicit 
namespace calling using the "::" operator is what I was looking for:

-- cut --

f_test <- function() {
requireNamespace("openxlsx")
cat("Loaded packages AFTER loading library")
print(search())
xlsx::read.xlsx(file = "c:/temp/test.xlsx",
sheetName = "test")
}

cat("Loaded packages BEFORE function call ")
search()

f_test()

cat("Loaded packages AFTER function call -")
search()

-- cut  --

When reading ?requireNamespace I did not really understand how R operates 
behind the scenes.

Using "library" attaches the namespace to the search path. Using 
"requireNamespace" does not do that.

But how does R find the namespace then? What kind of list or directory 
does R use to store the namespace and look up the correct functions or 
methods of this namespace?

Kind regards

Georg




From:    Duncan Murdoch 
To:      g.maub...@weinwolf.de, r-help@r-project.org, 
Date:    13.10.2016 10:43
Subject: Re: [R] Visibility of libraries called from within functions



On 13/10/2016 4:18 AM, g.maub...@weinwolf.de wrote:
> Hi All,
>
> in my R programs I use different libraries to work with Excel sheets, i.
> e. xlsx, excel.link.
>
> When running chunks of code repeatedly and not always in the order the
> program should run for development purposes I ran into trouble. There were
> conflicts between the methods within these functions causing R to crash.
>
> I thought about defining functions for the different task and calling the
> libraries locally to their functions. Doing this test
>
> -- cut --
>
> f_test <- function() {
> library(xlsx)
> cat("Loaded packages AFTER loading library")
> print(search())
> }
>
> cat("Loaded packages BEFORE function call ")
> search()
>
> f_test()
>
> cat("Loaded packages AFTER function call -")
> search()
>
> -- cut --
>
> showed that the library "xlsx" was loaded into the global environment and
> stayed there although I had expected R to unload the library when leaving
> the function. Thus conflicts can occur more often.
>
> I had a look into ?library and saw that there is no argument telling R to
> hold the library in the calling environment.
>
> How can I load libraries locally to the calling functions?

You can detach at the end of your function, but that's tricky to get 
right:  the package might have been on the search list before your 
function was called.  It's better not to touch the search list at all.

The best solution is to use :: notation to get functions without putting 
them on the search list.  For example, use

xlsx::write.xlsx(data, file)

If you are not sure if your user has xlsx installed, you can use 
requireNamespace() to check.

Duncan Murdoch
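
A minimal sketch of this pattern. To keep it runnable without the xlsx package it uses tools::file_ext() from the tools package that ships with R; with xlsx installed, the same structure would wrap xlsx::write.xlsx(). The function name f_test follows the thread; everything else is illustrative.

```r
f_test <- function(path = "c:/temp/test.xlsx") {
  search_before <- search()

  # requireNamespace() loads the namespace without attaching it and
  # returns FALSE instead of failing when the package is missing.
  if (!requireNamespace("tools", quietly = TRUE)) {
    stop("Package 'tools' is needed for this function.")
  }

  extension <- tools::file_ext(path)   # explicit namespace prefix

  list(extension        = extension,
       search_unchanged = identical(search_before, search()))
}

result <- f_test()
print(result)
```

The search path is the same before and after the call, which is the behaviour asked for in the original question.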



[R] Antwort: Re: Antwort: Re: Visibility of libraries called from within functions

2016-10-13 Thread G . Maubach
From:    Duncan Murdoch 
To:      g.maub...@weinwolf.de, r-help@r-project.org, 
Date:    13.10.2016 12:34
Subject: Re: Antwort: Re: [R] Visibility of libraries called from within functions



On 13/10/2016 6:21 AM, g.maub...@weinwolf.de wrote:
> Hi Duncan,
>
> many thanks for your reply.
>
> Your suggestion of using requireNamespace() together with explicit
> namespace calling using the "::" operator is what I was looking for:
>
> -- cut --
>
> f_test <- function() {
> requireNamespace("openxlsx")
> cat("Loaded packages AFTER loading library")
> print(search())
> xlsx::read.xlsx(file = "c:/temp/test.xlsx",
> sheetName = "test")
> }

Not sure if that's a typo in your message or a real error, but you 
require "openxlsx" and then use "xlsx".

It's a typo!


>
> cat("Loaded packages BEFORE function call ")
> search()
>
> f_test()
>
> cat("Loaded packages AFTER function call -")
> search()
>
> -- cut  --
>
> When reading ?requireNamespace I did not really get how R operates 
behind
> the scenes.
>
> Using "library" attaches the namespace to the search path. Using
> "requireNamespace" does not do that.
>
> But how does R find the namespace then? What kind of list or directory
> used R to to store the namespace and lookup the correct function or
> methods of this namespace?

R has an internal list of packages that are loaded.  Functions in them 
are only visible to user code if the package is *also* on the search 
list, or if the package name prefix is used with ::.

Can I have a look at this internal list like I can do with search() for 
packages or ls() for objects?
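
As a sketch of one way to look at it: base R provides loadedNamespaces(), which lists the loaded namespaces, while search() lists only what is attached. The example loads the tools package (shipped with R) without attaching it; the variable names are illustrative.

```r
loadNamespace("tools")   # load without attaching

loaded   <- loadedNamespaces()
attached <- sub("^package:", "",
                grep("^package:", search(), value = TRUE))

# Namespaces that are loaded but not on the search path.
loaded_only <- setdiff(loaded, attached)
print(loaded_only)

tools_is_loaded <- isNamespaceLoaded("tools")
```

Functions from namespaces in loaded_only are reachable with the :: prefix but not by their bare names.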

If xlsx is loaded, xlsx::read.xlsx will just use it; if it is not 
loaded, the package will be loaded to make the call.  So you don't need 
the requireNamespace call if you can be sure that xlsx will be found. 
You would normally use its return value (FALSE if the package is not 
found) to test whether it will be safe to make the xlsx::read.xlsx call.

Got it!



Duncan Murdoch

>
> Kind regards
>
> Georg
>
>
>
>
> Von:Duncan Murdoch 
> An: g.maub...@weinwolf.de, r-help@r-project.org,
> Datum:  13.10.2016 10:43
> Betreff:Re: [R] Visibility of libraries called from within
> functions
>
>
>
> On 13/10/2016 4:18 AM, g.maub...@weinwolf.de wrote:
>> Hi All,
>>
>> in my R programs I use different libraries to work with Excel sheets, i.
>> e. xlsx, excel.link.
>>
>> When running chunks of code repeatedly and not always in the order the
>> program should run for development purposes I ran into trouble. There were
>> conflicts between the methods within these functions causing R to crash.
>>
>> I thought about defining functions for the different task and calling the
>> libraries locally to there functions. Doing this test
>>
>> -- cut --
>>
>> f_test <- function() {
>> library(xlsx)
>> cat("Loaded packages AFTER loading library")
>> print(search())
>> }
>>
>> cat("Loaded packages BEFORE function call ")
>> search()
>>
>> f_test()
>>
>> cat("Loaded packages AFTER function call -")
>> search()
>>
>> -- cut --
>>
>> showed that the library "xlsx" was loaded into the global environment and
>> stayed there although I had expected R to unload the library when leaving
>> the function. Thus confilics can occur more often.
>>
>> I had a look into ?library and saw that there is no argument telling R to
>> hold the library in the calling environment.
>>
>> How can I load libraries locally to the calling functions?
>
> You can detach at the end of your function, but that's tricky to get
> right:  the package might have been on the search list before your
> function was called.  It's better not to touch the search list at all.
>
> The best solution is to use :: notation to get functions without putting
> them on the search list.  For example, use
>
> xlsx::write.xlsx(data, file)
>
> If you are not sure if your user has xlsx installed, you can use
> requireNamespace() to check.
>
> Duncan Murdoch
>
>
>
>
>



[R] Reshaping geographic data

2016-10-17 Thread G . Maubach
Hi All,

I need to reshape an ESRI shape file (see http://arnulf.us/PLZ; the download is at 
http://www.metaspatial.net/download/plz.tar.gz).

I found instructions for SQL Server (T-SQL):

https://blog.oraylis.de/2010/05/german-map-spatial-data-for-plz-postal-code-regions/

How can I do this using R?

Kind regards

Georg

-- cut --
Here's my code so far:

download.file(
  url = "http://www.metaspatial.net/download/plz.tar.gz",
  destfile = "C:/temp/plz.tar.gz")

untar(tarfile = "C:/temp/plz.tar.gz",
  exdir = "C:/temp",
  compressed = "gzip")



[R] Storing long string with white space in variable

2016-10-19 Thread G . Maubach
Hi All,

I would like to store a long string with white space in a variable:

-- cut --
  # Create README.md
  readme <- "---
title: "Your project title here"
author: "Author(s) name(s) here"
date: "Current date here"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = FALSE)
```
# Project Context

# Goals

# Approach

# Reference to main program
´´´{r}
source("main_program.R")
´´´

# Information on used system and configuration
```{r}
cat("Gathering system information ...\n)
sessionInfo()
```
"
cat(readme, file = "README.md")

-- cut --

I am looking for an equivalent to Python's """ """ long string feature.

I searched the web and found this:

http://stackoverflow.com/questions/6329962/split-code-over-multiple-lines-in-an-r-script
https://stat.ethz.ch/pipermail/r-help/2006-October/115358.html

But this is not the solution to the problem.

How can I store long strings with white space in a variable?

Kind regards

Georg
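
A sketch of one way this can work. R string literals may span several lines, and a string delimited by single quotes can contain double quotes without escaping. The shortened template text and the temporary output path are illustrative assumptions.

```r
# Single quotes delimit the string, so the double quotes inside
# the YAML header need no escaping.
readme <- '---
title: "Your project title here"
author: "Author(s) name(s) here"
output: html_document
---

# Project Context
'

readme_path <- file.path(tempdir(), "README.md")   # illustrative location
cat(readme, file = readme_path)

readme_lines <- readLines(readme_path)
print(readme_lines)
```

The template must then not contain single quotes, or they have to be escaped as \'. From R 4.0.0 on, raw strings of the form r"(...)" are another option.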

PS: This is a template for a project folder for each project. I would like 
to create it with R script instead of distributing it as a template file. 
This way one needs only the R script to setup a project like this:

#---
# Module: t_setup_project_directory.R
# Author: Georg Maubach
# Date  : 2016-10-19
# Update: 2016-10-19
# Description   : Setup a directory structure for a new analytics
# project
# Source System : R 3.3.0 (64 Bit)
# Target System : R 3.3.0 (64 Bit)
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#1-2-3-4-5-6-7--

t_version = "2016-10-19"
t_module_name = "t_setup_project_directory.R"
t_status = "development"

cat(
paste0(
"\n",
t_module_name,
" (Version: ",
t_version,
", Status: ",
t_status,
")",
"\n",
"\n",
"Copyright (C) Georg Maubach 2016

This software comes with ABSOLUTELY NO WARRANTY.",
"\n",
"\n"
)
)

library(svDialogs)

# If do_test is not defined globally, define it here locally by
# un-commenting it
t_do_test <- FALSE

# [ Function Definition ]
t_setup_project_directory <- function() {
 
#-
  # Setup a directory structure for a new analytics project
  #
  # Args:
  #   None.
  #
  # Operation:
  #   The user can create or select a directory for the projects files.
  #   The function then places all sub directories in this project
  #   folder.
  #   The function saves a RData file with objects containing the path
  #   to project directory and its sub folders.
  #
  # Returns:
  #   Nothing.
  #
  # Error handling:
  #   None.
  #
  # See also:
  #   ./.
 
#-

  # Get and/or create project directory
  v_project_dir <- svDialogs::dlgDir()$res

  # Define names for sub directories
  data  <- "data" # data to be loaded into or
  # saved from R
  documentation <- "documentation"# explanatory material for results
  # (e. g. knitR documents)
  fundamentals  <- "fundamentals" # background knowledge
  input <- "data/input"   # input data eventually manually
  # revised for import
  meta  <- "data/meta"# meta data (e. g. lookup tables)
  output<- "data/output"
  raw   <- "data/raw" # a copy of all input data never
  # touched for safety reasons and
  # not read by R
  program   <- "program"  # all scripts and runnable files
  modules   <- "program/modules"  # project specific packages, files
  # or functions in separate files as
  # well as all other sub routines to
  # be sourced or loaded
  results   <- "results"  # container for all resulting data
  # in an aggregated form
  graphics  <- "results/graphics"
  tables<- "results/tables"
  presentations <- "results/presentations"
  temp  <- "temp"

  v_paths_relative <- list(
project   = v_project_dir,
documentation = documentation,
fundamentals  = fundamentals,
input = input,
meta  = meta,
output= output,
raw   = raw,
program   = program,
modules   = modules,
graphic   = graphics,
table = tables,
presentation  = prese

[R] openxlsx Error: length of rows and cols must be

2016-11-15 Thread G . Maubach
Hi All,

when using 

-- cut --

number_style <- openxlsx::createStyle(
  numFmt = "COMMA"
)

openxlsx::addStyle(
  wb = xlsx_workbook,
  sheet = "Kundenliste",
  style = number_style,
  rows = 2:nrow(customer_list),
  cols = 4:5
  )
--cut --

I get the error

Error in openxlsx::addStyle(wb = xlsx_workbook, sheet = "Kundenliste",  : 
  Length of rows and cols must be equal.

The customer_list can be of any arbitrary length due to subgroup 
definitions. I do not see why the arguments "rows" and "cols" should be of 
the same length. This would mean that number formatting can only be done 
for rectangular areas.

What do I need to change to format my numbers in the given area correctly?

Kind regards

Georg

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
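
A possible way around the error, sketched under assumptions: by default 
openxlsx::addStyle() pairs "rows" and "cols" element by element, which is 
why both must have the same length. Setting gridExpand = TRUE instead 
styles every combination of the given rows and columns. The customer_list 
below is a made-up stand-in, and the output goes to a temporary file.

```r
library(openxlsx)

# Hypothetical stand-in for the real customer list (assumption).
customer_list <- data.frame(
  id      = 1:3,
  name    = c("A", "B", "C"),
  city    = c("X", "Y", "Z"),
  revenue = c(1234567.8, 45636.44, 14847.41),
  profit  = c(234567.1, 5636.4, 847.4)
)

xlsx_workbook <- createWorkbook()
addWorksheet(xlsx_workbook, sheetName = "Kundenliste")
writeData(xlsx_workbook, sheet = "Kundenliste", x = customer_list)

number_style <- createStyle(numFmt = "COMMA")

# Row 1 holds the header, so the data occupy rows 2 to nrow + 1.
data_rows <- 2:(nrow(customer_list) + 1)

addStyle(
  wb         = xlsx_workbook,
  sheet      = "Kundenliste",
  style      = number_style,
  rows       = data_rows,
  cols       = 4:5,
  gridExpand = TRUE   # style every row/column combination
)

out_file <- tempfile(fileext = ".xlsx")
saveWorkbook(xlsx_workbook, file = out_file, overwrite = TRUE)
```

If the header is written in row 1, rows = 2:nrow(customer_list) would leave 
the last data row unformatted, hence the "+ 1" in the sketch.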


[R] Different results when converting a matrix to a data.frame

2016-11-16 Thread G . Maubach
Hi All,

I built an empty data frame to fill with values later. I did the 
following:

-- cut --
> matrix(NA, 2, 2)
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
> data.frame(matrix(NA, 2, 2))
  X1 X2
1 NA NA
2 NA NA
> as.data.frame(matrix(NA, 2, 2))
  V1 V2
1 NA NA
2 NA NA
-- cut --

Why does data.frame deliver different results than as.data.frame with 
regard to the variable names (V instead of X)?

Kind regards

Georg
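
A short sketch of what seems to be going on, as far as the documentation 
suggests: the matrix method of as.data.frame() labels unnamed columns V1, 
V2, ..., whereas data.frame() starts from the column numbers and passes 
them through make.names() (check.names = TRUE), which prepends an "X". 
Giving the columns explicit names avoids the difference; the names "a" and 
"b" are made up for illustration.

```r
m <- matrix(NA, 2, 2)

# Without column names the two functions invent different defaults.
default_df    <- names(data.frame(m))      # "X1" "X2"
default_as_df <- names(as.data.frame(m))   # "V1" "V2"

# Option 1: name the matrix columns before converting.
colnames(m) <- c("a", "b")
df1 <- data.frame(m)
df2 <- as.data.frame(m)

# Option 2: set the names after converting.
df3 <- setNames(as.data.frame(matrix(NA, 2, 2)), c("a", "b"))

print(names(df1))
print(names(df2))
print(names(df3))
```

With explicit names, all three data frames end up with the same columns.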



[R] for loop is looping only once

2016-11-17 Thread G . Maubach
Hi All,

I need to execute a loop on variables to compute several KPIs. 
Unfortunately, the for loop is executed only once, for the last KPI given. 
The code below shows my current solution. Not all of it is needed to spot 
the problem; it is there to give an idea of what I am doing overall. It 
looks like a lot but is not once copied and run in RStudio. The problem 
occurs in function f_create_kpi_table() in lines 150 to 157:

  for (item in length(kpis))  # This loop runs only once!
  {
    print(kpis[[item]])
    ds_kpi <- f_compute_kpi(
      years    = years,
      kpi      = kpis[[item]],
      kpi_base = kpi_bases[[item]])
    print(ds_kpi)
  }

Here is the complete example code with example data:

-- cut --
dataset <-
  structure(
list(
  to_2012 = c(
85,
822,
891,
700,
386,
127,
938,
381,
871,
254,
793,
0,
934,
217,
163,
755,
607,
794,
477
  ),
  to_2013 = c(
289,
0,
963,
243,
608,
47,
0,
941,
998,
775,
326,
0,
0,
470,
248,
439,
212,
0,
0
  ),
  to_2014 = c(0, 0, 71, 0, 0, 434, 0, 282, 0,
  0, 405, 0, 0, 642, 0, 0, 0, 47, 299),
  to_2015 = c(
705,
134,
659,
0,
609,
807,
783,
0,
0,
304,
141,
500,
0,
0,
764,
790,
851,
0,
802
  ),
  kpi1_2013 = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0,
0, 0, 0, 1, 1),
  kpi1_2014 = c(1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0,
1, 1, 0, 1, 1, 1, 0, 0),
  kpi1_2015 = c(0, 0, 0, 1, 0, 0, 0, 1,
1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0),
  kpi1_2016 = c(0, 1, 0, 1, 0,
1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1),
  kpi2_2013 = c(1, 0,
1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0),
  kpi2_2014 = c(0,
0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1),
  kpi2_2015 = c(1,
1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1),
  kpi2_2016 = c(1,
0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0)
),
.Names = c(
  "to_2012",
  "to_2013",
  "to_2014",
  "to_2015",
  "kpi1_2013",
  "kpi1_2014",
  "kpi1_2015",
  "kpi1_2016",
  "kpi2_2013",
  "kpi2_2014",
  "kpi2_2015",
  "kpi2_2016"
),
row.names = c(NA, 19L),
class = "data.frame"
  )

f_compute_kpi <- function(
  years,
  kpi,
  kpi_base)
{
  print(years)
  print(kpi)
  print(kpi_base)

  ds_result <- data.frame()

  for (year in years) {
    current_year  <- year
    previous_year <- year - 1
    result <- sum(dataset[dataset[[paste0(kpi, "_", current_year)]] == 1,
                          paste0(kpi_base, "_", previous_year)],
                  na.rm = TRUE)
    ds_result <- rbind(ds_result, result)
  }

  ds_result   <- t(ds_result)
  rownames(ds_result) <- kpi
  colnames(ds_result) <- years

  invisible(ds_result)
}

f_create_kpi_table <- function(
  years,
  kpis,
  kpi_bases)
{
  print(length(kpis))

  #-- Problematic loop --
  for (item in length(kpis))  # This loop runs only once!
  {
    print(kpis[[item]])
    ds_kpi <- f_compute_kpi(
      years    = years,
      kpi      = kpis[[item]],
      kpi_base = kpi_bases[[item]])
    print(ds_kpi)
  }
  # This for loop is executed only once for kpi2 instead of
  # as many times as given kpis in length(kpis), i. e.
  # kpi1 AND kpi2.
  # Why?
  # What do I do wrong?
}
-- cut --

What do I need to change so that the loop works correctly and iterates 
over two elements instead of one when the function is called as follows?

f_create_kpi_table(years = 2013:2016, kpis = c("kpi1", "kpi2"),
                   kpi_bases = c("to", "to"))

Kind regards

Georg



[R] Reply: Re: for loop is looping only once [SOLVED]

2016-11-17 Thread G . Maubach
Hi Ulrik,

Oh no! What a mistake I made. I simply did not see the error.

Many thanks for helping me.

Kind regards

Georg




From:    Ulrik Stervbo 
To:      g.maub...@weinwolf.de, r-help@r-project.org, 
Date:    17.11.2016 12:24
Subject: Re: [R] for loop is looping only once



Hi Georg,

Your for loop iterates over just one value. To get it to work as you 
intend, use for (item in 1:length(kpis)) {}

HTH
Ulrik
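
To make the difference concrete, here is a small sketch using the kpis and 
kpi_bases vectors from the question. It swaps in seq_along() for 
1:length(), which is a common variation and not what was suggested above: 
it behaves the same for non-empty vectors but yields no iterations at all 
when the vector is empty, where 1:length(x) would count 1, 0. The names 
visited_wrong and visited_right are made up for illustration.

```r
kpis      <- c("kpi1", "kpi2")
kpi_bases <- c("to", "to")

# length(kpis) is the single number 2, so the loop body runs once,
# with item = 2.
visited_wrong <- character(0)
for (item in length(kpis)) {
  visited_wrong <- c(visited_wrong, kpis[[item]])
}

# seq_along(kpis) is the index vector 1, 2, so the body runs once
# per element.
visited_right <- character(0)
for (item in seq_along(kpis)) {
  visited_right <- c(visited_right, paste(kpis[[item]], kpi_bases[[item]]))
}

print(visited_wrong)   # "kpi2"
print(visited_right)   # "kpi1 to" "kpi2 to"
```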


[R] openxlsx: No Formatting of Numbers (TEXT ONLY)

2016-12-05 Thread G . Maubach
Hi All,
Dear Readers,

I am using openxlsx to export data to Microsoft Excel 2013, 32-Bit, German 
Version:

--- snip ---

library("openxlsx")

dataset <- structure(
  list(
    a = c(1126039.81, 45636.44, 14847.41),
    b = c(1194447.5, 88310.53, 18699.68),
    c = c(1560307.73, 34203.73, 24755.99),
    d = c(1068790.67, 67581.86, 12378.55)
  ),
  .Names = c("a", "b", "c", "d"),
  row.names = c(NA, 3L),
  class = "data.frame"
)

xlsx_workbook <- openxlsx::createWorkbook()
openxlsx::addWorksheet(
  wb = xlsx_workbook,
  sheetName = "Numbers")

openxlsx::writeData(
  wb = xlsx_workbook,
  sheet = "Numbers",
  x = dataset,
  rowNames = TRUE,
  colNames = TRUE,
  startRow = 2,
  startCol = 2,
  borders = c("surrounding")
)

myStyle <- openxlsx::createStyle(numFmt = "###.###.##0")

openxlsx::addStyle(
  wb = xlsx_workbook,
  sheet = "Numbers",
  style = myStyle,
  rows = 1:1,
  cols = 10:10,
  gridExpand = TRUE,
  stack = TRUE
)

openxlsx::saveWorkbook(
  wb = xlsx_workbook,
  file = "C:/temp/openxlsx_example.xlsx",
  overwrite = TRUE
)

--- snip ---

The problem is that the number format is not applied to the cells on the 
Excel sheet. Also, the border around the data on the sheet is sometimes 
deleted. I have not yet found the cause of this behaviour.

My sessionInfo() output is:

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Germany.1252 
[2] LC_CTYPE=German_Germany.1252 
[3] LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C 
[5] LC_TIME=German_Germany.1252 

attached base packages:
[1] tools stats graphics  grDevices utils 
[6] datasets  methods   base 

other attached packages:
[1] tidyr_0.5.1stringr_1.1.0  reshape2_1.4.1
[4] openxlsx_3.0.0 dplyr_0.5.0 

loaded via a namespace (and not attached):
[1] lazyeval_0.2.0 plyr_1.8.4 magrittr_1.5 
[4] R6_2.2.0   assertthat_0.1 DBI_0.4-1 
[7] tibble_1.1 Rcpp_0.12.5stringi_1.1.1 

I do not want to round the numbers in R, because my clients would like to 
use them as they are in further calculations.

How can I export a data frame to Excel, draw a border around the complete 
table/dataset (not the single cells) and format the numbers like 
123.456.789 (thousands separator dot ".", all numbers without decimals)?

Kind regards

Georg
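
One possible approach, sketched under assumptions rather than confirmed on 
the list: in the code above the style is applied to row 1, column 10, where 
no data were written, so the style range has to be moved onto the numeric 
cells. The sketch also assumes that the format code stored in the file 
follows the US convention ("#,##0") and that a German Excel then displays 
the separator as a dot. The helper names start_row, start_col, num_rows and 
num_cols are made up, and the file is written to a temporary location.

```r
library(openxlsx)

dataset <- data.frame(
  a = c(1126039.81, 45636.44, 14847.41),
  b = c(1194447.5, 88310.53, 18699.68),
  c = c(1560307.73, 34203.73, 24755.99),
  d = c(1068790.67, 67581.86, 12378.55)
)

start_row <- 2
start_col <- 2

wb <- createWorkbook()
addWorksheet(wb, sheetName = "Numbers")
writeData(
  wb, sheet = "Numbers", x = dataset,
  rowNames = TRUE, colNames = TRUE,
  startRow = start_row, startCol = start_col,
  borders = "surrounding"
)

# The header sits in start_row and the row names in start_col, so the
# numeric cells begin one row below and one column to the right.
num_rows <- (start_row + 1):(start_row + nrow(dataset))
num_cols <- (start_col + 1):(start_col + ncol(dataset))

# "#,##0" is the format code written to the file; the separator shown
# is assumed to follow the locale of the Excel installation.
number_style <- createStyle(numFmt = "#,##0")

addStyle(
  wb, sheet = "Numbers", style = number_style,
  rows = num_rows, cols = num_cols,
  gridExpand = TRUE,  # style the whole block of cells
  stack = TRUE        # merge with the border styles already set
)

out_file <- tempfile(fileext = ".xlsx")
saveWorkbook(wb, file = out_file, overwrite = TRUE)
```

The numbers themselves stay unrounded in the file; only their display 
changes. Keeping stack = TRUE is meant to preserve the surrounding border 
written by writeData() instead of replacing it.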


