from:"Michael Pearmain"

[R] Selecting Variables

2008-08-05 Thread Michael Pearmain

Hi All,

I have a dataset that I want to inspect dynamically for the number of
variables whose names start with "Exposure_", and then, for these variables,
count the non-missing entries in each case, i.e.

ID Exposure_1 Exposure_2 Exposure_3
1  y          y          y
2  y          y          -
3  y          -          -

So the corresponding new variables that would be created are

ID Max_Exposure Unique_Exposure
1   3   3
2   3   2
3   3   1

I know this may seem fairly basic but it will give me the starting point to
develop more advanced things with loop and nat lang

Thanks in advance

Mike
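
A minimal sketch of one way to get both counts, under stated assumptions: the data frame x below is a made-up stand-in for the posted example, with "-" already read in as NA and a hypothetical extra column Other added to show that non-exposure variables are ignored. The exposure columns are selected by name with grep(), and the non-missing entries are counted per row with rowSums().

```r
# Made-up stand-in for the posted data; "-" is represented as NA.
x <- data.frame(ID = 1:3,
                Exposure_1 = c("y", "y", "y"),
                Exposure_2 = c("y", "y", NA),
                Exposure_3 = c("y", NA, NA),
                Other = c(10, 20, 30),          # hypothetical unrelated column
                stringsAsFactors = FALSE)

# Names of all variables that start with "Exposure_"
exposure.cols <- grep("^Exposure_", names(x), value = TRUE)

# Max_Exposure: how many exposure variables exist
# Unique_Exposure: how many of them are non-missing in each row
result <- data.frame(ID = x$ID,
                     Max_Exposure = length(exposure.cols),
                     Unique_Exposure = rowSums(!is.na(x[exposure.cols])))
print(result)
```

Because only the column names are inspected, this avoids reshaping the data, which may matter for large files.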

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Selecting Variables

2008-08-05 Thread Michael Pearmain

Thanks for the help guys,

I think I needed to be a bit more explicit, however (sorry).

There are lots of other variables between each exposure variable, and the
values are nominal with up to 6 levels.
To add to the problem, the datasets I deal with can be anything up to
5G.

My guess is that the melt function would be inefficient in this situation.

I was looking at the agrep function to count the number of Exposure
variables in names(). I wasn't sure how to count whether there was a value in
each one, but y[complete.cases(y),] looks like a nice approach.

Is this a good path to follow?

On Tue, Aug 5, 2008 at 3:09 PM, jim holtman <[EMAIL PROTECTED]> wrote:

> I am not sure where the "Max" comes from, but this might be a start for
> you:
>
> > x <- read.table(textConnection("ID Exposure_1 Exposure_2 Exposure_3
> + 1 y y y
> + 2 y y -
> + 3 y - -"), header=TRUE,
> na.strings='-')
> > closeAllConnections()
> > require(reshape)
> > y <- melt(x, id.var='ID')
> > # get rid of NAs
> > y <- y[complete.cases(y),]
> > y
>  ID   variable value
> 1  1 Exposure_1 y
> 2  2 Exposure_1 y
> 3  3 Exposure_1 y
> 4  1 Exposure_2 y
> 5  2 Exposure_2 y
> 7  1 Exposure_3 y
> > cbind(Unique=tapply(y$ID, y$ID, length))
>  Unique
> 1  3
> 2  2
> 3  1
> >
>
>
> On Tue, Aug 5, 2008 at 9:21 AM, Michael Pearmain <[EMAIL PROTECTED]>
> wrote:
> > Hi All,
> >
> > i have a dataset that i want to dynamically inspect for the number of
> > variables that start with "Exposure_"  and then for these count the
> entries
> > across each case i.e
> >
> > ID Exposure_1 Exposure_2 Exposure_3
> > 1  y          y          y
> > 2  y          y          -
> > 3  y          -          -
> >
> > So the corresponding new variables that would be created are
> >
> > ID Max_Exposure Unique_Exposure
> > 1   3   3
> > 2   3   2
> > 3   3   1
> >
> > I know this may seem fairly basic but it will give me the starting point
> to
> > develop more advanced things with loop and nat lang
> >
> > Thanks in advance
> >
> > Mike
> >
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>

-- 
Michael Pearmain
Senior Statistical Analyst

1st Floor, 180 Great Portland St. London W1W 5QZ
t +44 (0) 2032191684
[EMAIL PROTECTED]
[EMAIL PROTECTED]

Doubleclick is a part of the Google group of companies

"If you received this communication by mistake, please don't forward it to
anyone else (it may contain confidential or privileged information), please
erase all copies of it, including all attachments, and please let the sender
know it went to the wrong person. Thanks."


[R] Flag variable

2008-08-13 Thread Michael Pearmain

Hi All,

I have 4000 cases with string variables in them. I want to do some
fuzzy matching and create a new variable of the same length containing 0s
and 1s.
If I use the code
test <- agrep("web Klick", ETC$Exposure.Type, max = 2, ignore.case = TRUE)
it works, but I get
> length(test)
[1] 3127

This returns the indices of the cases that match. Can someone tell me how to
turn this into a 1/0 flag on the dataset (ETC) that I have?
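
A minimal sketch of one way to build the flag, assuming the goal is a 0/1 vector as long as the data. The vector exposure.type is a made-up stand-in for ETC$Exposure.Type. Since agrep() returns the positions of the matching elements, those positions can be set to 1 in a vector of zeros; agrepl() returns a logical vector directly and can be converted instead.

```r
# Made-up stand-in for ETC$Exposure.Type
exposure.type <- c("Web Klick", "web klick ad", "banner", "WEB KLICK")

# agrep() returns the indices of the elements that match approximately
hits <- agrep("web Klick", exposure.type, max.distance = 2, ignore.case = TRUE)

# Start with all zeros and switch the matching positions to 1
flag <- integer(length(exposure.type))
flag[hits] <- 1L

# Equivalent route: agrepl() gives TRUE/FALSE for every element
flag2 <- as.integer(agrepl("web Klick", exposure.type,
                           max.distance = 2, ignore.case = TRUE))
print(flag)
```

With the real data the result could then be stored as a new column, for example ETC$flag <- flag.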


[R] Help with nls and error messages singular gradient

2009-08-25 Thread Michael Pearmain

Hi All,

I'm trying to run nls on the data from the study by Marske (Biochemical
Oxygen Demand Interpretation Using Sum of Squares Surface. M.S. thesis,
University of Wisconsin, Madison, 1967) and was reported in Bates and Watts
(1988).

Data is as follows, (stored as mydata)

  time  bod
1    1 0.47
2    2 0.74
3    3 1.17
4    4 1.42
5    5 1.60
6    7 1.84
7    9 2.19
8   11 2.17

I then run the following:
# Plot initial curve
plot(mydata$time, mydata$bod, xlab = "Time (in days)",
     ylab = "biochemical oxygen demand (mg/l)")

model <- nls(bod ~ beta1/(1 - exp(beta2*time)), data = mydata,
             start = list(beta1 = 3, beta2 = -0.1), trace = TRUE)

The start values are recommended, (I have used these values in SPSS without
any problems, SPSS returns values of Beta1 = 2.4979 and Beta2 = -2.02 456)

but R returns the error message,
Error in nls(bod ~ beta1/(1 - exp(beta2 * time)), data = mydata, start =
list(beta1 = 3,  : singular gradient

Can anyone offer any advice?

Thanks in advance

Mike
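
A hedged sketch of one thing worth checking, not a confirmed diagnosis: it assumes the intended model is the first-order curve beta1 * (1 - exp(beta2 * time)), with a product where the posted formula has a quotient. Under the quotient form the fitted curve falls with time while the data rise, which may be what leaves nls() with a singular gradient. The data and starting values are those given in the post.

```r
# Data as posted (Marske, reported in Bates and Watts)
mydata <- data.frame(time = c(1, 2, 3, 4, 5, 7, 9, 11),
                     bod  = c(0.47, 0.74, 1.17, 1.42, 1.60, 1.84, 2.19, 2.17))

# Assumed model form: product instead of the quotient used in the post
model <- nls(bod ~ beta1 * (1 - exp(beta2 * time)),
             data = mydata,
             start = list(beta1 = 3, beta2 = -0.1))

# Inspect the estimates and compare them with the values from other software
print(coef(model))
```

If the quotient form really is the intended model, different starting values or a reparameterisation would be the next things to try.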










[R] Viewing Function Code

2009-09-15 Thread Michael Pearmain

Hi All,
I'd like to see the function code behind the barplot2() function in the
gplots package, but I've hit a stumbling block: the method is a hidden
function. Can anyone help?

> library(gplots)
> methods(barplot2)
[1] barplot2.default*

   Non-visible functions are asterisked
> barplot2
function (height, ...)
UseMethod("barplot2")


Mike
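
A minimal sketch of the usual ways to reach a non-visible S3 method. It uses t.test.default from the stats package as a stand-in, since that ships with R and is also asterisked by methods(); the assumption is that barplot2.default can be reached in the same way once gplots is installed and loaded.

```r
# Three ways to retrieve a method that methods() marks with an asterisk.
# t.test.default is only a stand-in for barplot2.default here.
m1 <- getS3method("t.test", "default")   # look up by generic and class
m2 <- getAnywhere("t.test.default")      # search all loaded namespaces
m3 <- stats:::t.test.default             # reach into the namespace directly

# Printing the function object shows its source
print(head(deparse(m1), 5))

# With gplots loaded, the analogous calls would presumably be:
#   getS3method("barplot2", "default")
#   getAnywhere("barplot2.default")
#   gplots:::barplot2.default
```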




[R] List mappings and variable creation

2009-10-12 Thread Michael Pearmain

Hi All,

I have a question about associative list mappings in R, and whether they are
possible.

I have data in the form shown below, and want to make a new 'bucket'
called combined, which is the sum of the control and the exposed metric
values.
This combined variable is a many-to-many matching, as metrics only appear in
the file if they have a value > 0.

conversion.type  filteredID  bucketID  Metric  Value
counter          true        control   a       1
counter          true        control   b       1
counter          true        control   c       2
counter          true        control   d       3

counter          true        exposed   a       4
counter          true        exposed   e       1

ASIDE:

At the minute I read the data into R and then create all the
'missing' rows
(in this case,
counter          true        control   e       0
counter          true        exposed   b       0
counter          true        exposed   c       0
counter          true        exposed   d       0)


and then sort the data, count the number of times control
appears, and use this as an index matcher.

saw.aggr.data <- saw.aggr.data[order(saw.aggr.data$bucketID,
                                     saw.aggr.data$metric), ]
no.of.metrics <- length(saw.aggr.data$bucketID[grep("control",
                                                    saw.aggr.data$bucketID)])

for (i in (1:no.of.metrics)) {
  assign(paste("combined", as.character(saw.aggr.data$metric[i])),
         (saw.aggr.data$value[i] + saw.aggr.data$value[i + no.of.metrics]))
}

This does what I want, but it is very fragile and could be open to large
errors (error handling is currently done by checking that the name of
metric[i] == the name of metric[i + no.of.metrics]).

Is there a more robust way of doing this using some kind of list mapping?
I've looked at the older threads in this area and it looks like something
that should be possible, but I can't figure out how to do it.
Ideally I'd like a final dataset / list of the following form.

conversion.type  filteredID  bucketID  Metric  Value
counter          true        control   a       1
counter          true        control   b       1
counter          true        control   c       2
counter          true        control   d       3

counter          true        exposed   a       4
counter          true        exposed   e       1
counter          true        combined  a       5
counter          true        combined  b       1
counter          true        combined  c       2
counter          true        combined  d       3
counter          true        combined  e       1

So I don't have to create the dummy variables.

does this make sense?

Many thanks in advance

Mike
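
A minimal sketch of one way to build the combined rows without first creating the zero-valued dummy rows. The data frame saw is a hand-typed copy of the posted example, and the column names follow the table header (Metric, Value), which is an assumption about the real file. Summing Value within each metric with aggregate() treats a missing control or exposed row as zero.

```r
# Hand-typed copy of the posted example
saw <- data.frame(conversion.type = "counter",
                  filteredID = TRUE,
                  bucketID = c(rep("control", 4), rep("exposed", 2)),
                  Metric = c("a", "b", "c", "d", "a", "e"),
                  Value = c(1, 1, 2, 3, 4, 1),
                  stringsAsFactors = FALSE)

# Sum Value over buckets for every metric; metrics absent from one bucket
# simply contribute nothing, so no dummy rows are needed
combined <- aggregate(Value ~ conversion.type + filteredID + Metric,
                      data = saw, FUN = sum)
combined$bucketID <- "combined"

# Append the combined rows, with columns in the original order
result <- rbind(saw, combined[names(saw)])
print(result)
```

Grouping by name rather than by row position removes the need to check that the metrics line up after sorting.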




[R] Numeric formatting question

2009-11-10 Thread Michael Pearmain

Hi All,

I am using Sweave and \Sexpr{} to place some numeric variables in
my tex document. I want to format the numbers before they are inserted so
they read slightly more elegantly.
Say I have the following numbers
x <- 0.00487324
y <- 0.00432
z <- 0.567

I would like to have the numbers displayed as follows

x1 <- 0.0049
y1 <- 0.0043
z1 <- 0.57

I've seen I can use sprintf("%.3f", pi), for example, to control the
formatting after the decimal place, but I can't figure out an elegant way to
find the position of the first non-zero digit so that I can substitute this
value into the sprintf command.

Can anyone offer any advice?

Thanks in advance

Mike
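
A minimal sketch of one option, assuming "two significant digits" is what is wanted: signif() rounds to a number of significant digits, so the position of the first non-zero digit never has to be found by hand. The helper name format_sig is made up for this example.

```r
# Hypothetical helper: round to `digits` significant digits and return text
# in fixed (non-scientific) notation, one element at a time
format_sig <- function(v, digits = 2) {
  vapply(v,
         function(one) format(signif(one, digits), scientific = FALSE),
         character(1))
}

x <- 0.00487324
y <- 0.00432
z <- 0.567

print(format_sig(c(x, y, z)))
```

In the Sweave document this could then be used as \Sexpr{format_sig(x)}.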


[R] Title splitting function

2010-01-22 Thread Michael Pearmain

Hi All,

I'm trying to write a function to automatically split long strings so they
will appear nicely in a chart I'm trying to create.

Say I have a string

title <- "some variety of words that are descriptive"

In this instance I want to place a line break at the space just
before a specified number of characters (in this case 15).

title.length <- nchar(title)
no.splits <- floor(title.length / 15)
space.title <- c(gregexpr("[[:space:]]", title)[[1]])

> space.title # This gives the positions of all spaces in the title
[1]  5 13 16 22 27 31
> no.splits # This tells me how many line breaks I will need
[1] 2
> title.length # This tells me the total length of the title string
[1] 42

I can then check where the last space falls before each break point, i.e.
where I should make the break, using multiples of the line width (here 15):
> 15 < space.title  ## (15 * 1 split)
[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> 30 < space.title  ## (15 * 2 splits)
[1] FALSE FALSE FALSE FALSE FALSE  TRUE
>

(I'm guessing i need to create some loop or apply here)

So I know I need to substitute "\n" for " " at positions 13 and 27.

So my final output would appear as
title <- "some variety\nof words that\nare descriptive"

But I'm stuck on finding a way to work out the positions 13 and 27
dynamically and return them for later use.
Can anyone offer any advice?

Thanks All.
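
A minimal sketch of the approach described above, written as a function: for each multiple of the width, take the last space at or before that position and replace it with "\n". The name split_title and its arguments are made up for this example.

```r
# Hypothetical helper implementing the position-based splitting above
split_title <- function(title, width = 15) {
  spaces <- gregexpr("[[:space:]]", title)[[1]]
  n.splits <- floor(nchar(title) / width)
  # Nothing to do for short titles or titles without spaces
  if (n.splits < 1 || spaces[1] == -1) return(title)

  # For each break point (width, 2 * width, ...), the last space before it
  cuts <- vapply(seq_len(n.splits) * width, function(limit) {
    ok <- spaces[spaces <= limit]
    if (length(ok) > 0) max(ok) else NA_integer_
  }, integer(1))
  cuts <- unique(cuts[!is.na(cuts)])

  # Replace the chosen spaces with newlines
  chars <- strsplit(title, "")[[1]]
  chars[cuts] <- "\n"
  paste(chars, collapse = "")
}

title <- "some variety of words that are descriptive"
new.title <- split_title(title, width = 15)
cat(new.title, "\n")
```

Base R's strwrap() combined with paste(..., collapse = "\n") is a ready-made alternative when greedy word wrapping is acceptable, although its break positions may differ from the ones computed here.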


[R] Passing function arguments

2011-02-11 Thread Michael Pearmain

Hi All,

I'm looking for some help with passing function arguments and referencing
them. I've made a simplified replica of my function to show the problem, and
the workaround I've come up with. However, I suspect there is a _FAR_ better
way of doing this.

If i do:
BuildDecayModel <- function(x = "this", y = "that", data = model.data) {
  model <- nls(y ~ SSexp(x, y0, b), data = model.data)
  return(model)
}
...

"Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  0 (non-NA) cases"

This function returns an error because the arguments are passed to the model
as the strings "this" and "that", and so the fit fails (correct?)

If  i do the following:
BuildDecayModel <- function(x = "total.reach", y = "lift", data =
model.data) {
  x <- data[[x]]
  y <- data[[y]]
  model.data <- as.data.frame(cbind(x,y))
  model <- nls(y ~ SSexp(x, y0, b), data = model.data)
  return(model)
}

This works for me, but it seems that I'm missing a trick for manipulating
the arguments directly, rather than making an entire new data.frame to work
from.

Can anyone offer some advice?

Thanks in advance

Mike
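
A minimal sketch of one common alternative: build the model formula from the column names, so the original data frame can be passed to nls() unchanged. Two assumptions are made to keep the example runnable without extra packages: a plain exponential y0 * exp(b * x) stands in for SSexp(), and the data and starting values are simulated and made up.

```r
# Build the formula from strings instead of copying columns around.
# y0 * exp(b * x) is a stand-in for SSexp(); the start values are made up.
BuildDecayModel <- function(x = "total.reach", y = "lift", data,
                            start = list(y0 = 350, b = -0.3)) {
  model.formula <- as.formula(sprintf("%s ~ y0 * exp(b * %s)", y, x))
  nls(model.formula, data = data, start = start)
}

# Simulated data with some noise (nls() dislikes zero-residual data)
set.seed(1)
model.data <- data.frame(total.reach = seq(0.1, 2, by = 0.1))
model.data$lift <- 400 * exp(-0.5 * model.data$total.reach) +
  rnorm(nrow(model.data), sd = 2)

fit <- BuildDecayModel(x = "total.reach", y = "lift", data = model.data)
print(coef(fit))
```

Because the fitted model keeps the real variable names, later calls such as predict() can refer to them directly.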


[R] Passing Arguments in a function

2011-02-15 Thread Michael Pearmain

Hi All,

I'm having some trouble assigning arguments inside a function to
produce a plot from a model

Can anyone help me? Below I've outlined the situation and examples of
failing and working code.

Regards

Mike



## data ##

decay.data <- ...

   behaviors     lift reach.uu estimated.conversions.uu total.reach
1          1 432.0083      770                      770        0.00
2          2 432.0083    29660                    29660        0.03
3  3 429.98643298500.03
4          4 427.8599    30320                    30020        0.03
5          5 424.0105    30680                    30110        0.03
6          6 418.4710    31180                    30200        0.03
7          7 412.0022    31840                    30360        0.03
8          8 405.4553    32340                    30350        0.03
9          9 397.0083    33260                    30560        0.03
10        10 393.2382    33600                    30580        0.03
11        11 385.5431    34940                    31180        0.03
12        12 384.2942    35050                    31170        0.03
13        13 374.0299    36110                    31260        0.03
14        14 363.3057    37290                    31350        0.03
15        15 353.1185    38450                    31420        0.03
1616 342.21904316800.03
17        17 338.9232    40470                    31740        0.04
18        18 328.8666    41880                    31880        0.04
19        19 318.3219    45510                    33530        0.04
20        20 308.1200    47230                    33680        0.04

If i use:

library(nlrwr)
# Build a model.
decay.model <- nls(lift ~ SSexp(total.reach, y0, b), data = decay.data)

# plot the model
plot(decay.data[["total.reach"]], decay.data[["lift"]])
xv <- seq(min(decay.data[["total.reach"]]), max(decay.data[["total.reach"]]), 0.02)
yv <- predict(decay.model, newdata = list(total.reach = xv))
lines(xv,yv)

This works.

If I try to wrap this in a function and pass the argument values, it fails
when I reach "list(total.reach = xv)". I've tried various flavours of paste(),
but again can't figure out where I am going wrong. Any help appreciated.

PlotDecayModel <- function(x = "total.reach",
   y = "lift",
   data) {

  decay.model <- BuildDecayModel(x= "total.reach",
 y = "lift",
 data = data)
  # Plot the lift Vs reach plot.
  plot(data[[x]], data[[y]])
  # Add the model curve to the plot.
  xv <- seq(min(data[[x]]), max(data[[x]]), 0.02)
  yv <- predict(decay.model, newdata = list(x = xv))
  lines(xv,yv)
}

I get the error
Error in xy.coords(x, y) : 'x' and 'y' lengths differ

I can see this is because predict() ignores 'newdata = list(x = xv)',
as it names the new variable x, which doesn't exist in the model.
So how can I use the argument "total.reach" so that
'newdata = list(x = xv)' is evaluated as
 'newdata = list(total.reach = xv)'?

Many thanks in advance

Mike
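
A minimal sketch of one way to name the new data from a string: setNames() gives the prediction grid the column name held in x, so predict() finds the variable the model was fitted with. When no matching variable is present in newdata, predict() tends to fall back on the original data, which would explain the length mismatch. The data are simulated, a plain exponential stands in for SSexp(), and the function name PredictDecayCurve is made up.

```r
# Simulated stand-in data; a plain exponential replaces SSexp()
set.seed(42)
decay.data <- data.frame(total.reach = seq(0.1, 2, by = 0.1))
decay.data$lift <- 400 * exp(-0.5 * decay.data$total.reach) +
  rnorm(nrow(decay.data), sd = 2)

decay.model <- nls(lift ~ y0 * exp(b * total.reach), data = decay.data,
                   start = list(y0 = 350, b = -0.3))

# Hypothetical helper: predict along a grid of the variable named in `x`
PredictDecayCurve <- function(model, x = "total.reach", data, by = 0.02) {
  xv <- seq(min(data[[x]]), max(data[[x]]), by = by)
  # Name the single column after the string in `x`, e.g. "total.reach"
  new.data <- setNames(data.frame(xv), x)
  data.frame(x = xv, y = as.numeric(predict(model, newdata = new.data)))
}

curve.data <- PredictDecayCurve(decay.model, "total.reach", decay.data)

# Plotting then works as before:
#   plot(decay.data[["total.reach"]], decay.data[["lift"]])
#   lines(curve.data$x, curve.data$y)
```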


[R] R code for OptiGrid Clustering

2011-08-26 Thread Michael Pearmain

Hi All,

Has anyone coded up the OptiGrid clustering algorithm for high dimensional
space?
If so is anyone willing to share?

Many thanks in Advance

Mike


[R] Loading List data into R with scan()

2011-06-23 Thread Michael Pearmain

Hi All,

I've been given a data file of the form:
1: 3,4,5,6
2:1,2,3
43: 5,7,8,9,5

and i want to read this data in as a list to create the form:
(guessing final look)
my.list
[[1]]
[1] 3 4 5 6

[[2]]
[1] 1 2 3

[[43]]
[1] 5 7 8 9 5

I can get to a stage using scan:
scan("my.data", what = character(0), sep = "\n", quiet = TRUE)
to load
[1] "1: 3,4,5,6"
[2] "2:1,2,3"
[3] "43: 5,7,8,9,5"

but I'm not sure how to proceed next to arrange this into list form. Can
anyone offer some advice?

Thanks in advance

Mike
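
A minimal sketch of one way to go from the lines to a named list, assuming everything before the colon should become the element name. The vector d stands in for the lines read from the file; with a real file, readLines("my.data") would give the same kind of vector.

```r
# Stand-in for the lines of the file
d <- c("1: 3,4,5,6", "2:1,2,3", "43: 5,7,8,9,5")

# Split each line into the key (before ":") and the values (after ":")
parts <- strsplit(d, ":", fixed = TRUE)
keys <- trimws(vapply(parts, function(p) p[1], character(1)))

# Turn the comma-separated values into numeric vectors
my.list <- lapply(parts, function(p) {
  as.numeric(strsplit(p[2], ",", fixed = TRUE)[[1]])
})
names(my.list) <- keys

print(my.list)
```

Elements are then accessed by name, for example my.list[["43"]], which avoids creating empty elements for the unused positions.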


Re: [R] Loading List data into R with scan()

2011-06-23 Thread Michael Pearmain

Thanks Uwe,

The list element indexing was a mistake on my part; I just wanted everything
before the : to be the name of the element.
Thanks for the help, I can play around with this to get what I want.

M


2011/6/23 Uwe Ligges 

>
>
> On 23.06.2011 16:39, Michael Pearmain wrote:
>
>> Hi All,
>>
>> I've been given a data file of the form:
>> 1: 3,4,5,6
>> 2:1,2,3
>> 43: 5,7,8,9,5
>>
>> and i want to read this data in as a list to create the form:
>> (guessing final look)
>> my.list
>> [[1]]
>> [1] 3 4 5 6
>>
>> [[2]]
>> [1] 1 2 3
>>
>> [[43]]
>> [1] 5 7 8 9 5
>>
>> I can get to a stage using scan:
>> scan("my.data", what = character(0), quiet = TRUE)
>> to load
>> [1] "1: 3,4,5,6"
>> [2] "2:1,2,3"
>> [3] "43: 5,7,8,9,5"
>>
>
>
> I don't understand why you want 40 empty list elements, but here is what
> you asked for (not optimized, just hacked in few seconds):
>
> temp <- strsplit(d, ":")
> num <- as.numeric(sapply(temp, "[[", 1))
> L <- vector(mode = "list", length = max(num))
> for(i in seq_along(temp)){
>    L[[num[i]]] <- as.numeric(unlist(strsplit(temp[[i]][2], ",")))
> }
> L
>
> Uwe Ligges
>
>
>
>  but im not sure on how next to proceed to arrange this into a list form,
>> can
>> anyone offer some advise?
>>
>> Thanks in advance
>>
>> Mike
>>
>>
>


Re: [R] Loading List data into R with scan()

2011-06-23 Thread Michael Pearmain

Thanks All,

Henrique gave me the solution I was looking for; the indexing was a
mistake on my part.

Thanks again

On 23 June 2011 16:37, David Winsemius  wrote:

>
> On Jun 23, 2011, at 11:19 AM, Uwe Ligges wrote:
>
>
>>
>> On 23.06.2011 16:39, Michael Pearmain wrote:
>>
>>> Hi All,
>>>
>>> I've been given a data file of the form:
>>> 1: 3,4,5,6
>>> 2:1,2,3
>>> 43: 5,7,8,9,5
>>>
>>> and i want to read this data in as a list to create the form:
>>> (guessing final look)
>>> my.list
>>> [[1]]
>>> [1] 3 4 5 6
>>>
>>> [[2]]
>>> [1] 1 2 3
>>>
>>> [[43]]
>>> [1] 5 7 8 9 5
>>>
>>> I can get to a stage using scan:
>>> scan("my.data", what = character(0), quiet = TRUE)
>>> to load
>>> [1] "1: 3,4,5,6"
>>> [2] "2:1,2,3"
>>> [3] "43: 5,7,8,9,5"
>>>
>>
>>
>> I don't understand why you want 40 empty list elements, but here is what
>> you asked for (not optimized, just hacked in few seconds):
>>
>> temp <- strsplit(d, ":")
>> num <- as.numeric(sapply(temp, "[[", 1))
>> L <- vector(mode = "list", length = max(num))
>> for(i in seq_along(temp)){
>>   L[[num[i]]] <- as.numeric(unlist(strsplit(temp[[i]][2], ",")))
>> }
>> L
>>
>
> I wondered about that too. Perhaps he would be satisfied with alpha
> indexing:
>
> d <- c( "1: 3,4,5,6", "2:1,2,3", "43: 5,7,8,9,5")
>  temp <- strsplit(d, ":")
>  num <- sapply(temp, "[[", 1)
>  L <- vector(mode = "list")
>
>  for(i in seq_along(temp)){
>    L[[num[i]]] <- as.numeric(unlist(strsplit(temp[[i]][2], ",")))
>  }
>
> > L
> $`1`
>
> [1] 3 4 5 6
>
> $`2`
> [1] 1 2 3
>
> $`43`
> [1] 5 7 8 9 5
>
>
>  Uwe Ligges
>>
>>
>>
>>  but im not sure on how next to proceed to arrange this into a list form,
>>> can
>>> anyone offer some advise?
>>>
>>> Thanks in advance
>>>
>>> Mike
>>>
>>
>
>
> David Winsemius, MD
> West Hartford, CT
>
>


[R] Using Match in a lookup table

2011-06-28 Thread Michael Pearmain

Hi All,

I'm having a few problems using match and a lookup table. Previous Googling
shows numerous solutions for matching a lookup table to a dataset.
My situation is slightly different, as I have multiple lookup tables (that I
cannot merge, for integrity reasons) that I wish to match against my data,
and each of these files is large, so lots of for / if conditions are not
ideal (notwithstanding that I have multiple files, of course).


For example,
I have data:

> v <- c("foo", "foo", "bar", "bar", "baz")
> id <- c(1,2)
> id2 <- c(3)
> name <- c("foo", "bar")
> name2 <- c("baz")
> df1 <- data.frame(id, name)
> df2 <- data.frame(id2, name2)

> v <- df1$id[match(v,df1$name)]
> v
[1]  1  1  2  2 NA

So here i actually want to return
> v
[1]  1  1  2  2 "baz"

so next time i can run
v <- df2$id2[match(v,df2$name2)]

and return:
> v
[1]  1  1  2  2 3

Any help very much appreciated

Mike
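
A minimal sketch of one way to avoid nested conditions: keep the original names untouched, store the ids in a separate vector, and let each lookup table fill in only the positions that are still missing. For simplicity the example assumes every table uses the same column names (id, name), which differs from the id2/name2 naming in the post.

```r
v <- c("foo", "foo", "bar", "bar", "baz")

# Lookup tables kept separate; common column names are an assumption
lookups <- list(
  data.frame(id = c(1, 2), name = c("foo", "bar"), stringsAsFactors = FALSE),
  data.frame(id = 3, name = "baz", stringsAsFactors = FALSE)
)

# Start with no ids resolved, then let each table fill the remaining gaps
ids <- rep(NA_real_, length(v))
for (tab in lookups) {
  todo <- is.na(ids)
  ids[todo] <- tab$id[match(v[todo], tab$name)]
}

print(ids)
```

The loop runs once per table regardless of how many tables there are, and the vector stays numeric because the unmatched names are never written into it.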


Re: [R] Using Match in a lookup table

2011-06-28 Thread Michael Pearmain

Thanks for the idea David,
My problem comes from having (say) up to 10 different match files, so nested
ifelse calls, whilst they would work, don't seem an elegant solution.

However, if needs must...

Mike

On 28 June 2011 14:39, David Winsemius  wrote:

>
> On Jun 28, 2011, at 6:18 AM, Michael Pearmain wrote:
>
>  Hi All,
>>
>> I'm having a few problems using match and a lookup table, previous
>> Googling
>> show numerous solutions to matching a lookup table to a dataset,
>> My situation is slightly different as i have multiple lookup tables, (that
>> i
>> cannot merge - for integrity reasons) that i wish to match against my
>> data,
>> and each of these files is large, so lots of for / if conditions are not
>> ideal. (withstanding that i have multiple files of course)
>>
>>
>> For example,
>> I have data:
>>
>>  v <- c("foo", "foo", "bar", "bar", "baz")
>>> id <- c(1,2)
>>> id2 <- c(3)
>>> name <- c("foo", "bar")
>>> name2 <- c("baz")
>>> df1 <- data.frame(id, name)
>>> df2 <- data.frame(id2, name2)
>>>
>>
>>  v <- df1$id[match(v,df1$name)]
>>> v
>>>
>> [1]  1  1  2  2 NA
>>
>
> A numeric vector.
>
>
>> So here i actually want to return
>>
>>> v
>>>
>> [1]  1  1  2  2 "baz"
>>
>
> Not possibly a numeric vector.
>
>
>
>> so next time i can run
>> v <- df2$id[match(v,df2$name)]
>>
>> and return:
>>
>>> v
>>>
>> [1]  1  1  2  2 3
>>
>
> You need to keep track of the successful matches in df1 and then you
> probably want to interleave them with matches in df2. Perhaps instead use
> ifelse to do the work for you:
>
> > ifelse(!is.na(match(v,df1$name)), match(v,df1$name),
> match(v,df2$name2) )
>
> [1] 1 1 2 2 1
>
>
>
>> Any help very much appreciated
>>
>> Mike
>>
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>


[R] Convert ragged list to structured matrix efficiently

2011-12-20 Thread Michael Pearmain

Hi All,

I want to convert a ragged list of values into a structured matrix for
further analysis later on. I have a solution to this problem (below), but
I'm dealing with datasets up to 1GB in size (I have 24GB of memory, so I can
load them), and the code takes a LONG time to run on a large dataset. I
was wondering if anyone had any tips or tricks that may make this run
faster?

Below is some sample code of what I've been doing (in the full version I
use snowfall to spread the work via sfSapply).

bhvs <- c(1,2,3,4,5,6)
ragged.list <- list('23' = c(13,4,5,6,3,65,67,2),
'34' = c(1,2,3,4,56,7,8),
'45' = c(5,6,89,87,56))

# Define the matrix to store results
cluster.data <- as.data.frame(matrix(0, length(bhvs), nrow =
length(ragged.list)))
# Keep the names of the bhvs,
names(cluster.data) <- bhvs
cluster.data <- t(sapply(rep(1:length(ragged.list)), function (i) {
  cluster.data[i,] <- as.numeric(names(cluster.data) %in%
(ragged.list[[i]][]))
  return(cluster.data[i,])
}))
cluster.data <- matrix(unlist(cluster.data),
   ncol = ncol(cluster.data),
   dimnames = list(NULL, colnames(cluster.data)))
> cluster.data
 1 2 3 4 5 6
[1,] 0 1 1 1 1 1
[2,] 1 1 1 1 0 0
[3,] 0 0 0 0 1 1
>

The returned matrix is as I want it, with the bhvs as the colnames and
a binary value in each row indicating whether it was present in that list
element or not.

Many thanks in advance

Mike
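
A minimal sketch of a leaner way to build the same indicator matrix, using the sample data from the post. It skips the intermediate data frame and the row-by-row assignment, and lets vapply() fill a numeric matrix directly; whether this is fast enough at the 1GB scale is not something the example can show.

```r
bhvs <- c(1, 2, 3, 4, 5, 6)
ragged.list <- list('23' = c(13, 4, 5, 6, 3, 65, 67, 2),
                    '34' = c(1, 2, 3, 4, 56, 7, 8),
                    '45' = c(5, 6, 89, 87, 56))

# One column of 0/1 indicators per list element, then transpose so that
# each list element becomes a row and each bhv a column
cluster.data <- t(vapply(ragged.list,
                         function(v) as.numeric(bhvs %in% v),
                         numeric(length(bhvs))))
colnames(cluster.data) <- bhvs

print(cluster.data)
```

The row names carry the names of the list elements, which may be convenient for joining the result back to other data.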


[R] missing value where TRUE/FALSE needed

2011-12-23 Thread Michael Pearmain

Merry Xmas to all,

I am writing a function which, curiously, runs fine on one data set
and fails on another, and I cannot figure out why.
Any help much appreciated.

If I run the code below with
data <- iris[ ,1:4]
the code runs fine, but if I run it on a large dataset I get the following
error (showing the data structures, as the matrix is large)

> str(cluster.data)
 num [1:9985, 1:811] 0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:811] "1073949105" "1073930585" "1073843224" "1073792624" ...
#(This is intended to be chr)
> str(iris)
'data.frame': 150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1
1 1 1 ...
> str(as.matrix(iris[,1:4]))
 num [1:150, 1:4] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

  n.cols <- ncol(data)
  n.rows <- nrow(data)
  X <- as.matrix(data)
  stepsize <- 0.05
  c1 <- (2 * pi) ** (n.cols / 2)
  c2 <- n.rows * (smoothing ** (n.cols + 2))
  c3 <- n.rows * (smoothing ** n.cols)

  Kexp <- function(sqs){
    return (exp((-1 * sqs) / (2 * smoothing ** 2)))
  }

  FindGradient <- function(x){
    XmY <- t(x - t(X))
    sqsum <- rowSums(XmY * XmY)
    K <- sapply(sqsum, Kexp)
    dens <- ((c1 * c3) ** -1) * sum(K)
    grad <- -1 * ((c1 * c2) ** -1) * colSums(K * XmY)
    return (list(gradient = grad,
                 density = dens))
  }

  attractors <- matrix(0, n.rows, n.cols)
  densities <- matrix(0, n.rows)


> density.attractors <-
    sapply(rep(1:n.rows), function(i) {
      notconverged <- TRUE
      # For each row loop through and find the attractor and density value.
      x <- (X[i, ])
      iters <- as.integer(1)
      # Run gradient ascent for each point to obtain x*
      while(notconverged == TRUE) {
        find.gradient <- FindGradient(x)
        next.x <- x + stepsize * find.gradient$gradient
        change <- sqrt(sum((next.x - x) * (next.x - x)))
        notconverged <- ifelse(change > tol, TRUE, FALSE)
        x <- next.x
        iters <- iters + 1
      }

      # store the attractor and density value
      return(c(densities[i, ] <- find.gradient$density,
               attractors[i, ] <- x))
    })

Error in while (notconverged == TRUE) { :
  missing value where TRUE/FALSE needed
>

Any help would be great

Mike
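
A hedged sketch of one possible cause, not a confirmed diagnosis: with 811 columns the constant c1 overflows to Inf, and if smoothing is below 1 the constant c2 underflows to 0, so their product is NaN and every comparison against tol becomes NA. The value of smoothing below is made up, since the post does not give it; with the 4 iris columns both constants stay finite.

```r
n.cols <- 811
n.rows <- 9985
smoothing <- 0.3     # hypothetical value; not given in the post
tol <- 1e-4

c1 <- (2 * pi)^(n.cols / 2)                # too large for a double: Inf
c2 <- n.rows * smoothing^(n.cols + 2)      # too small for a double: 0
grad.scale <- (c1 * c2)^-1                 # Inf * 0 is NaN, and stays NaN

# Any step built from the NaN scale gives a NaN change ...
change <- sqrt(sum((grad.scale * rep(1, n.cols))^2))
# ... and comparing NaN with tol yields NA, which while() cannot use
notconverged <- change > tol

# isTRUE() turns NA into FALSE, so the loop condition is always valid;
# checking is.finite() on the constants reports the problem much earlier
safe.flag <- isTRUE(change > tol)
print(c(c1 = c1, c2 = c2, grad.scale = grad.scale))
```

If this is what happens, computing the constants on the log scale, or dropping factors that only rescale the gradient, would be worth trying.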


Re: [R] missing value where TRUE/FALSE needed

2011-12-23 Thread Michael Pearmain

Apologies, I was using tol = 0.0001.

I had looked at browser and it did show notconverged = NA, but I couldn't
understand why it worked for one dataset and not the other.



On Friday, 23 December 2011, jim holtman  wrote:
> Does this look similar to the error you are getting:
>
>> while(NA == TRUE) 1
> Error in while (NA == TRUE) 1 : missing value where TRUE/FALSE needed
>
> SO 'notconverged' is probably equal to NA.  BTW, what is the value of
> 'tol'; I do not see it defined.  So when computing 'notconverged' you
> have generated an NA.  You can test it to see if this is true.
>
> You can use the following command:
>
> options(error=utils::recover)
>
> and then learn how to use the 'browser' to examine variables when the
> error occurs.
>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>

[R] Assign and cmpfun

2012-01-06 Thread Michael Pearmain

Hi All,

I've just recently discovered the cmpfun function, and wanted to
create a function to apply this to all the functions I have created, but
without explicitly naming them.

I've achieved this with:

foo <- function(x) { print(x)}
bar <- function(x) { print(x + 1)}

> foo <- function(x) { print(x)}
> foo
function(x) { print(x)}
> cmpfun(foo)
function(x) { print(x)}

>

find.all.functions <- ls.str(mode = 'function')
for(i in seq_along(find.all.functions)) {
  assign(find.all.functions[i], cmpfun(get(find.all.functions[i])))
}

But I remember being told that using assign is generally a bad idea, and ideally I
want to functionalize this to say something like:

CreateCompiledFunctions <- function() {
  find.all.functions <- ls.str(mode = 'function')
  for(i in seq_along(find.all.functions)) {
    assign(find.all.functions[i], cmpfun(get(find.all.functions[i])))
  }
}


Does anyone have a better solution?

Thanks in advance

Mike
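
A minimal sketch of one way to wrap this in a function. Inside a function, ls.str(), get() and assign() all default to the function's own frame, so the wrapped version above would not see or replace the functions in the workspace; passing the environment explicitly avoids that. The name compile_all and the demo environment are hypothetical. Note also that since R 3.4.0 the byte-code JIT is enabled by default, so explicit compilation may matter less on recent versions.

```r
library(compiler)

# Hypothetical helper: byte-compile every function found in `env`
# and store the compiled version back under the same name.
compile_all <- function(env = globalenv()) {
  for (nm in ls(envir = env)) {
    obj <- get(nm, envir = env)
    if (is.function(obj)) {
      assign(nm, cmpfun(obj), envir = env)
    }
  }
  invisible(env)
}

# Demonstration on a scratch environment rather than the workspace.
demo_env <- new.env()
assign("foo", function(x) x + 1, envir = demo_env)
assign("not_a_function", 42, envir = demo_env)
compile_all(demo_env)
demo_env$foo(1)
```

Calling compile_all() with no argument would act on the global environment, which is the behaviour asked for above.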

[R] Merging two dataframes

2008-06-06 Thread Michael Pearmain

Hi All,

Newbie question for you all, but I have been looking at the archives and the
help stuff to get a rough idea of what I want to do.

I would like to merge two dataframes together based on a keyed variable in
one dataframe linking to the other dataframe.  Only some of the cases will
match but I would like to keep the others as well.

My dataframes have 67 and 28 cases respectively and I would like to end up
with one file 67 cases long (all 28 are matched cases).


I can use the merge command to merge the two datasets together, but I still
get some odd results. I'm using the code below:

ETC <- read.csv(file="CSV_Data2.csv",head=TRUE,sep=",")
SURVEY <- read.csv(file="survey.csv",head=TRUE,sep=",")
FullData <- merge(ETC, SURVEY, by.SURVEY = "uid", by.ETC = "ord")

The merged file seems to have 1800 cases while the ETC data file only
has 67 and the SURVEY file only has 28.  (Reading the help it looks as if it
merges 1 case with all cases in the other file, which is not what i want)

The matching variables fields are the 'ord' field and the 'uid' field
Can anyone advise please?
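
A hedged sketch of what may be going on: merge() has no by.SURVEY or by.ETC arguments; the documented names are by.x and by.y. Unrecognised arguments appear to be ignored, and when the two data frames share no column names merge() returns the Cartesian product, which for 67 and 28 rows is 1876, close to the 1800 reported. The toy data below are invented for illustration; only the column names ord and uid come from the post.

```r
# Invented stand-ins for the two CSV files.
ETC    <- data.frame(ord = 1:5, clicks = c(10, 20, 30, 40, 50))
SURVEY <- data.frame(uid = c(2, 4), answer = c("yes", "no"))

# The posted call: no usable key, so every row pairs with every row.
cross <- merge(ETC, SURVEY, by.SURVEY = "uid", by.ETC = "ord")
nrow(cross)

# Keyed merge that keeps all ETC rows, matched or not (a left join).
FullData <- merge(ETC, SURVEY, by.x = "ord", by.y = "uid", all.x = TRUE)
nrow(FullData)
FullData
```

all.x = TRUE is what keeps the unmatched ETC cases; without it only the matched rows would be returned.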

-- 
Michael Pearmain

[R] Missing Data and applying

2008-06-09 Thread Michael Pearmain

Hi All,

Newbie question that I'm sure is easy, but I can't seem to apply it properly.

I read in a dataframe from a CSV file and I want to tell R that from column
"n_0" to "n_32" the value "-1" is missing data.
I was looking at the
is.na(xx) <- c(..,...,) idea but I can't seem to apply it properly; can
anyone offer advice?

On a side issue, while I'm asking: I have an XML file that I intend to use to add
value labels and variable labels to the dataframe (using a python script)
but I can't seem to find the syntax for adding value labels, i.e. 1=Male
2=Female.

The labels command doesn't look like the one I want to use, and I've
searched the archives but to no avail (maybe it's too simple, but I have
looked).

Any help willingly accepted
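
A minimal sketch of both steps, on invented data. It assumes the columns really are named n_0, n_1, ... in sequence (only two are used here); if they are not, the range could be taken by position instead. Value labels in base R are usually expressed by turning the column into a factor.

```r
# Invented example data; -1 marks a missing answer.
df <- data.frame(id  = 1:4,
                 n_0 = c(1, -1, 2, 1),
                 n_1 = c(-1, 2, 2, 1))

# Columns to recode (the real data would use 0:32 here).
cols <- paste0("n_", 0:1)

# Replace -1 with NA in those columns only.
df[cols] <- lapply(df[cols], function(v) replace(v, which(v == -1), NA))

# Value labels: a factor maps the codes 1 and 2 to Male and Female.
df$n_0 <- factor(df$n_0, levels = c(1, 2), labels = c("Male", "Female"))
df
```

If the missing code is known before reading the file, read.csv(..., na.strings = "-1") is another option, though it applies to every column.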

[R] Calling functions

2008-06-17 Thread Michael Pearmain

Another newbie question.

I've written a function and saved the file as Xtabs.R, in a central place on
a network so others will be able to use the function.
My question is: how do I call this function?

I've tried to change the working directory, and tried to load it via:
> library(Xtabs, lib.loc="//filer/common/technical/surveys/R_test")

but neither seems to work. The function inside is called CrossTable.

Many thanks in advance
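
A short sketch of the usual approach: library() loads installed packages, not script files, so a plain .R file is read with source() instead. The example below writes a stand-in Xtabs.R to a temporary directory so that it runs anywhere; the body of CrossTable is invented. On the network described above the call would presumably be source("//filer/common/technical/surveys/R_test/Xtabs.R").

```r
# Create a stand-in script file (the real one already exists on the share).
script <- file.path(tempdir(), "Xtabs.R")
writeLines("CrossTable <- function(x, y) table(x, y)", script)

# source() evaluates the file, which defines CrossTable in the workspace.
source(script)

tab <- CrossTable(c("a", "b", "a"), c("u", "u", "v"))
tab
```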

[R] Z test and proportions

2008-06-17 Thread Michael Pearmain

Hi All,

I have a table based on ordinal data and I want to compare proportions, and
I've seen in the pwr package I can use
power.prop.test

However, I want to find out what the sig. value is based on n1,n2,p1,p2 and
this package doesn't contain this.
Does anyone know of a package that does, or is it a case of writing a
function specifically for this?

Many thanks in advance

Re: [R] Z test and proportions

2008-06-17 Thread Michael Pearmain

Yes my mistake,

I looked at pwr.2p2n.test but I cannot supply both n's and both p values
to determine the sig value,
e.g. pwr.2p2n.test(h = , n1 = , n2 = , sig.level = , power = )

or am I missing something obvious?

I did the same in SPSS using a macro and the following code:

COMPUTE n1 = Control_MAX .
COMPUTE n2 = Exposed_max.
COMPUTE x1 = Control.
COMPUTE x2 = Exposed.

COMPUTE p1 = x1/n1.
COMPUTE p2 = x2/n2.
COMPUTE phat = (x1 + x2) / (n1 + n2).
COMPUTE SE_phat = SQRT(phat * (1 - phat) * ((1/n1) + (1/n2))).
COMPUTE z = (p1 - p2) /SE_phat.
COMPUTE SIGz_2TL = 2 * (1 - CDFNORM(ABS(z))).
COMPUTE SIGz_LTL = CDFNORM(Z).
COMPUTE SIGz_UTL = 1 - CDFNORM(Z).
COMPUTE SIG_Level = ABS(1-(1-CDFNORM(z))*2).
Compute p1p = p1*100.
Compute p2p = p2*100.
compute diff = p2p-p1p.
EXE.
Var lab p1p "Control Group %".
Var lab p2p "Exposed Group %".
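
For comparison, a sketch of the same pooled two-proportion z-test in R. The counts are invented for illustration; the variable names follow the SPSS code. prop.test() from the stats package, with the continuity correction switched off, gives a chi-squared statistic equal to z squared and the same two-sided p-value.

```r
# Invented counts: successes (x) out of group sizes (n).
x1 <- 45; n1 <- 200   # control
x2 <- 70; n2 <- 220   # exposed

p1 <- x1 / n1
p2 <- x2 / n2
phat <- (x1 + x2) / (n1 + n2)                       # pooled proportion
se   <- sqrt(phat * (1 - phat) * (1 / n1 + 1 / n2)) # pooled standard error
z    <- (p1 - p2) / se

p_two_sided <- 2 * (1 - pnorm(abs(z)))
p_lower     <- pnorm(z)
p_upper     <- 1 - pnorm(z)

# Built-in equivalent of the two-sided test.
pt <- prop.test(c(x1, x2), c(n1, n2), correct = FALSE)
pt$p.value
```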



On Tue, Jun 17, 2008 at 5:13 PM, Peter Dalgaard <[EMAIL PROTECTED]>
wrote:

> I think your wires got crossed somewhere:
>
> power.prop.test is not from the pwr package; however, pwr does contain
> pwr.2p2n.test, which looks like it does exactly what you want!
>
>
> --
>   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
>  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
> ~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907
>
>
>


-- 
Michael Pearmain
Senior Statistical Analyst


1st Floor, 180 Great Portland St. London W1W 5QZ
t +44 (0) 2032191684
[EMAIL PROTECTED]
[EMAIL PROTECTED]


Doubleclick is a part of the Google group of companies

[R] Problems with basic loop

2008-06-20 Thread Michael Pearmain

I'm having trouble creating a looping variable and I can't see where the
problem arises from; any help gratefully appreciated.

First create a table

x<-table(SURVEY$n_0,exposed)
> x
          exposed
           False True
  Under 16    24    1
  16-19       68    9
  20-24      190   37
  25-34      555  204
  35-44      330   87
  45-54      198   65
  55-64       67   35
  65+         10    8

Now vectors to store counts and column proportions

> xT<-x[,"True"]
> xF<-x[,"False"]
> yT<-x[,"True"]/colSums(x)
> yF<-x[,"False"]/colSums(x)

check length for dynamic looping
> length(yT)
[1] 8

now create loop
> for(i in 1:length(yT)){
+ pwr.2p2n.test(2*(asin(sqrt(yT[i]))-asin(sqrt(yF[i]))),n1=xT[i],n2=xF[i])
+ }
Error in pwr.2p2n.test(2 * (asin(sqrt(yT[i])) - asin(sqrt(yF[i]))), n1 =
xT[i],  :
  number of observations in the first group must be at least 2

this confuses me as if i enter the data as values the procedure works?

Thanks in advance
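
A hedged note on the proportion step, with illustrative counts: colSums(x) has one value per column (two here), so dividing a column of row counts by it recycles the two sums along the rows instead of dividing by that column's own total. Some of the resulting "proportions" can exceed 1, and asin(sqrt()) of those is NaN. prop.table() with margin = 2 gives proper column proportions. Separately, results computed inside a for loop are not displayed unless wrapped in print().

```r
# Illustrative counts for four age groups.
x <- matrix(c(24, 68, 190, 555,
               1,  9,  37, 204),
            ncol = 2,
            dimnames = list(age = c("Under 16", "16-19", "20-24", "25-34"),
                            exposed = c("False", "True")))

# As posted: the two column sums are recycled down the rows.
wrongF <- x[, "False"] / colSums(x)
wrongF

# Column proportions: each column now sums to 1.
props <- prop.table(x, margin = 2)
colSums(props)

# Inside a loop, print() is needed to see each result.
for (i in seq_len(nrow(props))) {
  print(props[i, ])
}
```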

Re: [R] Problems with basic loop

2008-06-20 Thread Michael Pearmain

Thanks for the reply Peter,

> I did just see that I had put in the first error message (agreed, a rather
> obvious error) and not the second one I received:
>
> Warning message:
> In asin(sqrt(yF[i])) : NaNs produced
>
> The reason I'm looking at this is advert exposure, True and False.
>
> I'm inspecting age to assess whether or not to weight the data in order to
> normalise groups for later questions.
> The questions that I am looking at later on are not scale-based questions,
> so I cannot perform t-tests on these; I thought the only viable way was
> to look at z-tests for proportions to check for post-hoc differences.
>
> Any advice on other methods would be gratefully taken
>
>
>
> On Fri, Jun 20, 2008 at 11:14 AM, Peter Dalgaard <[EMAIL PROTECTED]>
> wrote:
>
>> Er, the first row "under 16" has a count of 1 in the "True" column and
>> it confuses you that you get an error saying that you need at least 2??
>>
>> But what looks _really_ confused is what you are trying to do in the
>> first place: The p's you are passing to pwr.2p2n are the empirical
>> relative frequencies of the individual age groups. This sort of reverses
>> cause and effect (presumably the exposure does not cause middle age) and
>> it is pretty odd to compare a particular  row in a table with everything
>> else jumbled together but worse, it is post-hoc power calculation, which
>> is just a plain Bad Idea (as several people have pointed out before).
>>
>> --
>>   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
>>  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
>>  (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
>> ~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907
>>
>>
>>
>
>
> --
> Michael Pearmain
> Senior Statistical Analyst
>
>
> 1st Floor, 180 Great Portland St. London W1W 5QZ
> t +44 (0) 2032191684
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
>
>
> Doubleclick is a part of the Google group of companies

[R] Deleting multiple variables

2008-09-22 Thread Michael Pearmain

Hi All,
I have searched the web for a simple solution but have been unable to find
one.  Can anyone recommend a neat way of deleting multiple variables?
I see I need to use dataframe$VAR<-NULL to get rid of one variable.
In my situation I need to delete all vars between two points.

I've used the 'which' function to find these out and have assigned them to myvars
>myvars
[1]  2 17

but I can't figure out how I should apply this.

Should I loop through the values? (Pseudo code below)

for (x in c(myvars[1]:myvars[2]))
(M_UC$x<-NULL))

Any help gratefully received

Mike
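
A minimal sketch that avoids the loop: a data frame can be indexed by a negative vector of column positions, which drops all of them in one step. The data frame here is invented (20 numbered columns); myvars matches the values shown above.

```r
# Invented data frame with columns V1 ... V20.
M_UC <- as.data.frame(matrix(1:40, nrow = 2))
myvars <- c(2, 17)

# Positions of every column between the two end points, inclusive.
drop_cols <- myvars[1]:myvars[2]

# Negative indexing removes those columns; drop = FALSE keeps a data frame.
M_UC <- M_UC[, -drop_cols, drop = FALSE]
names(M_UC)
```

The loop in the pseudo code would not work as written in any case, because M_UC$x looks for a column literally named "x" rather than using the value of x.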

[R] Contional

2008-09-23 Thread Michael Pearmain

Hi All,

I'm having trouble selecting rows to delete, that i can't seem to overcome.

Below is some sample data, i am trying to dedup the data based on each user,
and simultaneously the timestamp (at the side i have highlighted expected
row to be removed)

I've looked at the lag function but can't seem to make it work?

My logic ran along the lines of an ifelse statement and then remove after
that, but it doesn't seem to work? Any help appreciated

Let's call the data test

test$lag <- ifelse(test$user_id==lag(test$user_id)
& test$timestamp==lag(test$timestamp),1,0)

Can anyone help on this?

Mike



      Source_type timestamp           user_id
75381 0           07-07-2008-21:03:55 848307909687
75379 1           07-07-2008-19:52:55 848307838407
75380 2           07-07-2008-19:54:14 848307838407
75378 1           07-07-2008-15:24:01 848285633277
75374 1           07-07-2008-13:39:17 848273633667
75377 2           07-07-2008-13:39:55 848273633667
75376 2           07-07-2008-13:39:55 848273633667  Remove
75375 2           07-07-2008-13:56:05 848273633667
75373 1           07-07-2008-17:11:00 848272661427
75371 1           07-07-2008-13:19:00 848270431847
75372 2           07-07-2008-13:19:14 848270431847
75369 1           07-07-2008-12:49:16 848269676907  Remove
75370 2           07-07-2008-12:49:16 848269676907
75366 1           07-07-2008-13:29:15 848263484847
75368 2           07-07-2008-13:29:44 848263484847

Thanks in advance
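
A sketch of one way to do this without lag(). In base R, lag() shifts the time base of a time series and does not shift a plain vector, which may be why the ifelse() did not behave as hoped. duplicated() applied to the user_id and timestamp columns flags repeats directly. The rows below are a subset of the sample data, with user_id kept as character to avoid precision issues.

```r
test <- data.frame(
  Source_type = c(1, 2, 2, 2),
  timestamp   = c("07-07-2008-13:39:17", "07-07-2008-13:39:55",
                  "07-07-2008-13:39:55", "07-07-2008-13:56:05"),
  user_id     = rep("848273633667", 4),
  stringsAsFactors = FALSE
)

# TRUE for every row whose (user_id, timestamp) pair was already seen.
dup <- duplicated(test[c("user_id", "timestamp")])
dup

deduped <- test[!dup, ]
deduped
```

duplicated() keeps the first row of each repeated pair; fromLast = TRUE would keep the last one instead, which matters if Source_type should decide which copy survives.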

[R] Help with aggregation

2008-10-02 Thread Michael Pearmain

Hi All,
I seem to be having a few troubles with aggregating data back onto the
dataframe.
I want to take the max value of a user, and then apply this max value back
against all ids that match (i.e. a one-to-many matching).
Can anyone offer any advice?  Is there a better way of doing this?
Dummy data and code are listed below:

dataset is called Mcookie

user_id  c_we_conversion
1        1
1        0
1        0
2        1
2        1
3        0
3        0

new data

user_id  c_we_conversion  c_we_conversion
1        1                1
1        0                1
1        0                1
2        1                1
2        1                1
3        0                0
3        0                0

library(Hmisc)
myAgg<-summarize(Mcookie$c_we_conversion, by=Mcookie$user_id, FUN=max,
na.rm=TRUE)
names(myAgg)<- c("user_id","c_we_converter")
Mcookie <- merge(Mcookie, myAgg, by.x = "user_id", by.y = "user_id")


Thanks in advance,

Mike
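
A minimal sketch using ave() from base R, which applies a function within groups and returns a vector of the same length as the input, so no separate summarise-and-merge step is needed. The data are the dummy values above; the new column name follows the names() call in the posted code.

```r
Mcookie <- data.frame(user_id         = c(1, 1, 1, 2, 2, 3, 3),
                      c_we_conversion = c(1, 0, 0, 1, 1, 0, 0))

# Per-user maximum, repeated on every row of that user.
Mcookie$c_we_converter <- ave(Mcookie$c_we_conversion,
                              Mcookie$user_id,
                              FUN = function(v) max(v, na.rm = TRUE))
Mcookie
```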

[R] Time conversion

2008-10-02 Thread Michael Pearmain

I'm trying to convert a variable that is imported from CSV into a datetime. I'm
trying to use the strptime function but with no joy; can anyone offer any
advice?

i have a vector
timestamp
07-07-2008-21:03:55
07-07-2008-19:52:55
07-07-2008-19:54:14
07-07-2008-15:24:01
07-07-2008-13:39:17
07-07-2008-13:39:55


timestamp<-strptime(timestamp,"%d-%m-%y-%H:%M:%S")
## then filter on the datetime
time<-ifelse(timestamp> "07-08-2008-00:00:00", TRUE, FALSE)
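
A hedged sketch of a working version. Two things seem likely to cause trouble in the code above: %y expects a two-digit year, whereas the data have four digits (%Y), and the comparison is made against a plain string rather than a date-time. The day-month order follows the posted format string; the cutoff value and the UTC time zone are assumptions made for the example.

```r
timestamp <- c("07-07-2008-21:03:55",
               "07-07-2008-19:52:55",
               "07-07-2008-13:39:17")

# %Y is the four-digit year; tz is fixed so the example is reproducible.
parsed <- as.POSIXct(timestamp, format = "%d-%m-%Y-%H:%M:%S", tz = "UTC")

# Compare against a date-time object, not a character string.
cutoff <- as.POSIXct("2008-07-07 15:00:00", tz = "UTC")
after_cutoff <- parsed > cutoff
after_cutoff
```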



-- 
Michael Pearmain
Senior Statistical Analyst


Google UK Ltd
Belgrave House
76 Buckingham Palace Road
London SW1W 9TQ
United Kingdom
t +44 (0) 2032191684
[EMAIL PROTECTED]

If you received this communication by mistake, please don't forward it to
anyone else (it may contain confidential or privileged information), please
erase all copies of it, including all attachments, and please let the sender
know it went to the wrong person. Thanks.

[R] Timestamps and manipulations

2008-10-13 Thread Michael Pearmain

Hi All,
I've a couple of questions i've been struggling with using the time
features, can anyone help? sample data

Timestamp       user_id
27/05/08 22:57  763830873067
27/05/08 23:00  763830873067
27/05/08 23:01  763830873067
27/05/08 23:01  763830873067
05/06/08 11:34  763830873067
29/05/08 23:08  765253440317
29/05/08 23:06  765253440317
29/05/08 22:52  765253440317
29/05/08 22:52  765253440317
29/05/08 23:04  765253440317
27/06/08 19:34  765253440317
09/07/08 15:45  765329002557
06/07/08 19:24  765329002557
09/07/08 15:46  765329002557
07/07/08 13:05  765329002557
16/05/08 22:40  765329002557
08/06/08 11:24  765329002557
08/06/08 12:33  765329002557

My first question is: how can I create a new variable that acts as a filter based on a
date?

I've tried as.POSIXct with strptime, as below, but to no avail. Can anyone
give any advice?

>Mcookie$timestamp <- as.POSIXct(strptime(Mcookie$timestamp,"%m/%d/%Y %H:%M"))
>Mcookie$time <- ifelse(Mcookie$timestamp > strptime("07-08-2008-00:00","%m-%d-%Y-%H:%M"),1,0)

My second question refers to finding the time difference in seconds between
the first time a user sees something and the last, an engagement time
essentially.
I see there is the difftime function; is there a more elegant way of
working this out than my thoughts (pseudo code below)?

sort data by user_id and Timestamp
take the head of user_id as new_time_var
take the tail of user_id as new_time_var2
use difftime(new_time_var, new_time_var2, units="secs")

Mike
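
A sketch covering both questions on a few of the sample rows. The sample timestamps are day/month/two-digit-year (27/05/08 cannot be month-first), so the format string would be "%d/%m/%y %H:%M" rather than "%m/%d/%Y %H:%M". The cutoff date and the UTC time zone are assumptions for the example. The engagement time is taken as latest minus earliest timestamp per user, which needs no sorting.

```r
Mcookie <- data.frame(
  timestamp = c("27/05/08 22:57", "27/05/08 23:01", "05/06/08 11:34",
                "29/05/08 22:52", "29/05/08 23:08"),
  user_id   = c("763830873067", "763830873067", "763830873067",
                "765253440317", "765253440317"),
  stringsAsFactors = FALSE
)

# Question 1: parse, then flag rows after an (assumed) cutoff date.
Mcookie$timestamp <- as.POSIXct(Mcookie$timestamp,
                                format = "%d/%m/%y %H:%M", tz = "UTC")
cutoff <- as.POSIXct("2008-06-01 00:00", tz = "UTC")
Mcookie$time <- as.integer(Mcookie$timestamp > cutoff)

# Question 2: seconds between first and last event for each user.
engagement_secs <- tapply(as.numeric(Mcookie$timestamp), Mcookie$user_id,
                          function(t) max(t) - min(t))
engagement_secs
```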

[R] comparing with lead function

2008-10-15 Thread Michael Pearmain

Hi All,
I've been trying to compare if the previous value in a variable is equal to
a binary value..(i.e i want to check if the last event was a yes or no)

i've been trying to write some code for this, but it seems overly elaborate,
can anyone suggest a better / shorter / neater way?
The below doesn't quite work but shows my idea of splitting by the factor
id, then creating a new vector that is lead, then i was going to use an
ifelse clause..

But as i suggested this seem very elaborate.. my sample code below


DF <- read.table(textConnection("timestamp;   id
2008-05-27 22:57:00; 763830873067
2008-05-27 23:00:00; 763830873067
2008-05-27 23:01:00; 763830873067
2008-05-27 23:01:00; 763830873067
2008-06-05 11:34:00; 763830873067
2008-05-29 23:08:00; 765253440317
2008-05-29 23:06:00; 765253440317
2008-05-29 22:52:00; 765253440317
2008-05-29 22:52:00; 765253440317
2008-05-29 23:04:00; 765253440317
2008-06-27 19:34:00; 765253440317
2008-07-09 15:45:00; 765329002557
2008-07-06 19:24:00; 765329002557
2008-07-09 15:46:00; 765329002557
2008-07-07 13:05:00; 765329002557
2008-05-16 22:40:00; 765329002557
2008-06-08 11:24:00; 765329002557
2008-06-08 12:33:00; 765329002557"),as.is=TRUE,sep=";",strip.white=TRUE,header=TRUE)

closeAllConnections()

DF$time <- ifelse(DF$timestamp > as.POSIXct("2008-07-01"), 1, 0)

last_event <- lapply(split(test, test$ID), function(.df){
  lead_func_temp <- c(NA,.df$TIME [ - length(.df$TIME)])
  temp <-
    data.frame(ID=as.character(.df$ID),TIME=.df$TIME,
               DIFF=rep(lead_func_temp,nrow(.df)))
  return(temp)
})

DF$last_event <- do.call(rbind, last_event)


Mike
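
A shorter sketch of the same idea, using ave() to shift the binary variable by one position within each id. The data are invented; the column names id and time follow the DF object above (the posted lapply code refers to test, ID and TIME, which do not match it). The first event of each id has no predecessor and gets NA.

```r
DF <- data.frame(id   = c("a", "a", "a", "b", "b"),
                 time = c(0, 1, 1, 1, 0))

# Previous value of 'time' within each id; NA for the first row of an id.
DF$last_event <- ave(DF$time, DF$id,
                     FUN = function(v) c(NA, v[-length(v)]))
DF

# The comparison asked about: was the previous event a 1?
DF$prev_was_yes <- DF$last_event == 1
```

This assumes the rows are already in time order within each id; if not, the data would need sorting first.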

[R] Paste in a FOR loop

2008-12-31 Thread Michael Pearmain

Hi All,
I've been having a little trouble using R2HTML and a loop, but can't figure
out where the problem lies, any hints gratefully received.

My code at the minute, (Which does work) is in the following:

library(R2HTML)
HTMLStart(outdir =
file.path("C://Example_work","R_projects","Dynamic_creative"),filename =
"RMDC_mockup",Title="Mock up for RMDC")

summary(z.out.1)
summary(s.out.1)
hist(s.out.1$qi$ev)
HTMLplot()
.
.
.
summary(z.out.3)
summary(s.out.3)
hist(s.out.3$qi$ev)
HTMLplot()
HTMLStop()

This seemed a rather long winded way of doing things to me and a simple for
loop should handle this, as later i want it to be dynamic for a number of
groups so my new code is(not working):

library(R2HTML)
HTMLStart(outdir =
file.path("C://Example_work","R_projects","Dynamic_creative"),filename =
"RMDC_mockup",Title="Mock up for RMDC")

for(group in 1:3){
paste("summary(z.out.", group, sep = "")
paste("summary(s.out.", group, sep = "")
paste("s.out.",group,"$qi$ev", sep = "")
HTMLplot()
}
HTMLStop()

Which returns the error
Error in dev.print(device = png, file = AbsGraphFileName, width = Width,  :
  no device to print from

Can anyone offer some advice here?

Mike
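
A hedged reading of the error: paste() only builds character strings such as "summary(z.out.1"; it does not run them, so no histogram is ever drawn and HTMLplot() finds no graphics device to copy from. One common alternative is to keep the objects in a list and index it in the loop. The list below uses invented numbers in place of the real s.out objects; R2HTML itself is left out so that the sketch runs on its own.

```r
# Invented stand-ins for s.out.1, s.out.2, s.out.3.
s.out <- list(
  list(qi = list(ev = c(0.21, 0.25, 0.23))),
  list(qi = list(ev = c(0.40, 0.38, 0.44))),
  list(qi = list(ev = c(0.10, 0.12, 0.09)))
)

mean_ev <- numeric(length(s.out))
for (group in seq_along(s.out)) {
  ev <- s.out[[group]]$qi$ev
  print(summary(ev))            # print() is needed inside a loop
  mean_ev[group] <- mean(ev)
  # With R2HTML, draw the plot here, e.g. hist(ev), before HTMLplot().
}

# If the objects must stay as separate variables, get() finds one by name.
s.out.2 <- s.out[[2]]
same <- identical(get(paste0("s.out.", 2)), s.out[[2]])
```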

[R] t.test in a loop

2009-01-28 Thread Michael Pearmain

Hi All,
I've been having a little trouble with creating a loop that will run a
series of t.tests for inspection.
Below is the code I've tried, and some checks I've looked at.

I've used the get(paste()) idea as I was told previously that the use of
eval should be avoided.


I've run a single call to check that my syntax is correct and works
without any problems
> t.test(channel.data.train$News~channel.data.train$power)

Can anyone offer any advice?

Many thanks

Mike

> str(channel.data.train$power)
 num [1:9913] 0 0 0 0 0 0 0 0 0 0 ...
> summary(channel.data.train$power)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.0000  0.0000  0.2368  0.0000  1.0000
> names(channel.data.train)
 [1] "News"  "Entertainment" "Communicate"
 [4] "Lifestyle" "Games" "Music"
 [7] "Money" "Celebrity" "Shopping"
[10] "Sport" "Film"  "Travel"
[13] "Cars"  "Property"  "Chat"
[16] "Bet.Play.Win"  "config""exposed"
[19] "site"  "referrer"  "started"
[22] "last_viewed"   "num_views" "secs_since_viewed"
[25] "register"  "secs.na"   "power"
[28] "tt"
> for(i in names(channel.data.train[,c(1:16)])){
+ t.test(get(paste("channel.data.train$",i,"~channel.data.train$power",sep="")))
+ }
Error in get(paste("channel.data.train$", i, "~channel.data.train$power",  :
  variable "channel.data.train$News~channel.data.train$power" was not found
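
A sketch of one way around this. get() looks up a single object by name, so it cannot resolve a whole expression such as "channel.data.train$News~channel.data.train$power". Indexing the data frame with [[ ]] by column name needs neither get() nor eval(). The data below are simulated, with two columns standing in for the sixteen; storing the results in a list also sidesteps the fact that a for loop does not print them.

```r
set.seed(1)
# Simulated stand-in for channel.data.train.
channel.data.train <- data.frame(News  = rnorm(40),
                                 Sport = rnorm(40),
                                 power = rep(0:1, each = 20))

vars <- c("News", "Sport")   # the real data would use the first 16 names

# One t.test per column, kept in a named list.
results <- lapply(setNames(vars, vars), function(v) {
  t.test(channel.data.train[[v]] ~ channel.data.train$power)
})

# Pull out the p-values for a quick overview.
pvals <- sapply(results, function(r) r$p.value)
pvals
```

results$News then holds the full test output for that channel.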



-- 
Michael Pearmain
Senior Analytics Research Specialist


Google UK Ltd
Belgrave House
76 Buckingham Palace Road
London SW1W 9TQ
United Kingdom
t +44 (0) 2032191684
mpearm...@google.com

[R] Forecasting with dlm

2009-03-11 Thread Michael Pearmain

Hi All,
I have a problem trying to forecast using the dlm package; can anyone offer
any advice?

I setup my problem as follows, (following the manual as much as possible)

data for example to run code

CostUSD <- c(27.24031,32.97051, 38.72474, 22.78394, 28.58938, 49.85973,
42.93949, 35.92468)
library(dlm)

buildFun <- function(x) {
dlmModPoly(1, dV = exp(x[1]), dW = exp(x[2]))
}
fit <- dlmMLE(CostUSD, parm = c(0,0), build = buildFun)
fit$conv
dlmCostUSD <- buildFun(fit$par)
V(dlmCostUSD)
W(dlmCostUSD)
#For comparison
StructTS(CostUSD, "level")

CostUSDFilt <- dlmFilter(CostUSD, dlmCostUSD)
CostUSDFore <- dlmForecast(CostUSDFilt, nAhead = 1)

after which i return the error message:

Error in mod$m[lastObsIndex, ] : incorrect number of dimensions

Can anyone offer any insight to this problem?

Thanks in advance

Mike

[R] Loop swith String replacement

2008-12-05 Thread Michael Pearmain

Hi All,
I'm trying to split my dataset into multiple datasets that I'll analyse
later. I wanted to do this dynamically as I might need to rerun the code
later.
I was looking at doing this via a loop. (Are other methods more appropriate?
Would a function be better?)

However, I'm not sure how to do string replacement within the loop in R in
order to create unique dataset names based on the number of 'Groups' I have.
Many thanks

Mike

my code is as follows:

no.groups <-names(table(Conv$Group))
for (i in length(Conv$no.groups))
{
groupi <- subset(Conv, Conv$Group == i)
}


    Metal  Secs cost  Income stable Group
1  Chrome    60   14  3.3458      1     2
2  Chrome    51   10  1.8561      0     1
3  Chrome    24   12  0.6304      0     1
4  Chrome    38    8  3.4183      1     2
5  Chrome    25   12  2.7852      1     3
6  Chrome    67   12  2.3866      1     1
7  Chrome    40   12  4.2857      0     1
8  Chrome    56   10  9.3205      1     1
9  Chrome    32   12  3.8797      1     3
10 Chrome    75   16  2.7031      1     3
11 Chrome    46   15 11.2307      1     2
12 Chrome    52   12  8.6696      1     2
13 Chrome    22   12  1.7443      0     2
14 Chrome    60   12  0.2253      0     2
15 Chrome    24   14  4.3348      1     3
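
A minimal sketch that avoids building variable names in a loop: split() returns a list with one data frame per group, and the list elements can be named group1, group2 and so on. Only a few rows and columns of the data are reproduced here.

```r
Conv <- data.frame(Metal = "Chrome",
                   Secs  = c(60, 51, 24, 38, 25),
                   Group = c(2, 1, 1, 2, 3))

# One data frame per value of Group, held together in a list.
by_group <- split(Conv, Conv$Group)
names(by_group) <- paste0("group", names(by_group))

# Access a single group, or apply the same analysis to all of them.
by_group$group1
sapply(by_group, nrow)
```

Keeping the pieces in a list means the code does not need to change when the number of groups does.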




-- 
Michael Pearmain
Senior Analytics Research Specialist


Google UK Ltd
Belgrave House
76 Buckingham Palace Road
London SW1W 9TQ
United Kingdom
t +44 (0) 2032191684
[EMAIL PROTECTED]
