Re: [R] R with openblas and atlas

2013-11-01 Thread Simon Zehnder
There is no R code following
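For the threading question itself: an empirical check of whether the linked BLAS runs multithreaded is to time a large matrix product while watching the R process in top; with a threaded BLAS, CPU usage should climb well above 100%. A minimal sketch (the matrix size is arbitrary):

```r
# Time a BLAS-heavy operation; with a threaded BLAS (e.g. OpenBLAS or a
# parallel ATLAS build) this should use more than one core -- watch the
# R process in `top` while it runs.
set.seed(1)
n <- 2000
A <- matrix(rnorm(n * n), n, n)
system.time(B <- crossprod(A))  # t(A) %*% A, dispatched to the linked BLAS
```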


On 01 Nov 2013, at 05:34, Li Bowen  wrote:

> Hi,
> 
> I have been trying to build R with optimized BLAS library.
> 
> I am using a Ubuntu 13.10 x86_64 desktop, on which I am able to build R
> with openblas without any problem:
> 
> #BEGIN_SRC sh
> ./configure --enable-BLAS-shlib --enable-R-shlib LIBnn=lib --disable-nls 
> --with-blas="-L/usr/lib/openblas-base/ -lopenblas" --enable-memory-profiling
> make
> sudo make install
> #END_SRC
> 
> However, on Red Hat 5.9, I am not able to install OpenBLAS.
> Firstly, there is no pre-built package, even for later versions of Red
> Hat.
> Secondly, I am not able to build OpenBLAS from source; in fact, I am not
> even able to install a newer gcc from source.
> 
> I then tried to install ATLAS on Red Hat and build R with it (-t 2: use 2
> threads):
> #BEGIN_SRC sh
> ../configure --shared -t 2 -b 64 -D c -DPentiumCPS=1600
> --with-netlib-lapack-tarfile=/path-to-lapack-3.4.2.tgz
> make build
> make check
> make ptcheck
> make time
> make install
> 
> # R
> ../configure --enable-BLAS-shlib --enable-R-shlib --disable-nls
> --enable-memory-profiling --with-blas="-L/usr/local/atlas/lib
> -lptf77blas -lpthread -latlas"
> make
> sudo make install
> #END_SRC
> 
> Installation is successful. However when I run the following code, only
> one thread is used.
> 
> I have looked through lots of manuals and forums and couldn't find an
> answer. Please advise. Thanks a lot.
> 
> -- 
> Sincerely,
> Bowen
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



[R] Replace element with pattern

2013-11-01 Thread mohan . radhakrishnan
Hi,
 I have a data frame with one column and several rows of the form:

"Peak Usage: init:2359296, used:15859328, committed:15892480, 
max:50331648Current Usage : init:2359296, used:15857920, 
committed:15892480, max:50331648|---|"

I tested the regex 

 Current.*?[\|]

in an online tester, which matches (non-greedily) up to the first 'pipe' character

Current Usage : init:2359296, used:15857920, committed:15892480, 
max:50331648|

This is what I want.

I tried to replace the entire rows using 

apply( y, 1, function(x) gsub(x,"Current.*?[/|]",x)) which didn't work.

How is this done? I also want to apply some more patterns one by one on the 
rows until I reduce the text to exactly what I want. Is there a way to do 
this without loops?

Thanks,
Mohan


This e-Mail may contain proprietary and confidential information and is sent 
for the intended recipient(s) only.  If by an addressing or transmission error 
this mail has been misdirected to you, you are requested to delete this mail 
immediately. You are also hereby notified that any use, any form of 
reproduction, dissemination, copying, disclosure, modification, distribution 
and/or publication of this e-mail message, contents or its attachment other 
than by its intended recipient/s is strictly prohibited.

Visit us at http://www.polarisFT.com



[R] Combinations of values in two columns

2013-11-01 Thread Thomas

I have data that looks like this:

Friend1, Friend2
A, B
A, C
B, A
C, D

And I'd like to generate some more rows and another column. In the new  
column I'd like to add a 1 beside all the existing rows. That bit's  
easy enough.


Then I'd like to add rows for all the possible directed combinations  
of rows not included in the existing data. So for the above I think  
that would be:


A, D
D, A
B, C
C, B
B, D
C, A
D, B
D, C

and then put a 0 in the column beside these.

Can anyone suggest how to do this?

I'm using R version 2.15.3.

Thank you,

Thomas Chesney
This message and any attachment are intended solely for the addressee and may 
contain confidential information. If you have received this message in error, 
please send it back to me, and immediately delete it.   Please do not use, copy 
or disclose the information contained in this message or in any attachment.  
Any views or opinions expressed by the author of this email do not necessarily 
reflect the views of the University of Nottingham.

This message has been checked for viruses but the contents of an attachment
may still contain software viruses which could damage your computer system, you 
are advised to perform your own checks. Email communications with the 
University of Nottingham may be monitored as permitted by UK legislation.



Re: [R] Count number of consecutive zeros by group

2013-11-01 Thread PIKAL Petr
Hi

Another option is an sapply/split/sum construction:

with(data, sapply(split(x, ID), function(x) sum(x==0)))

Regards
Petr
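Note that sum(x == 0) counts the total number of zeros per group, while the original question asks for the longest consecutive run. rle() handles runs; one just has to keep only the lengths of the runs where the value is zero. A sketch on the test data from this thread (the guard for groups without zeros is my addition):

```r
ID <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
x  <- c(1, 0, 0, 0, 0, 1, 1, 0, 1)
data <- data.frame(ID = ID, x = x)

# Longest run of zeros: rle() encodes runs of x == 0; keep only the
# lengths of the TRUE runs, and return 0 if a group has no zeros.
f3 <- function(x) {
  r <- rle(x == 0)
  zeros <- r$lengths[r$values]
  if (length(zeros) > 0) max(zeros) else 0L
}
with(data, tapply(x, ID, f3))
# 1 2 3
# 2 2 1
```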


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Carlos Nasher
> Sent: Thursday, October 31, 2013 6:46 PM
> To: S Ellison
> Cc: r-help@r-project.org
> Subject: Re: [R] Count number of consecutive zeros by group
> 
> If I apply your function to my test data:
> 
> ID <- c(1,1,1,2,2,3,3,3,3)
> x <- c(1,0,0,0,0,1,1,0,1)
> data <- data.frame(ID=ID,x=x)
> rm(ID,x)
> 
> f2 <-   function(x) {
>   max( rle(x == 0)$lengths )
> }
> with(data, tapply(x, ID, f2))
> 
> the result is
> 1 2 3
> 2 2 2
> 
> which is not what I'm aiming for. It should be
> 1 2 3
> 2 2 1
> 
> I think f2 does not return the max of consecutive zeros, but the max of
> any consecutive run... Any idea how to fix this?
> 
> 
> 2013/10/31 S Ellison 
> 
> >
> >
> > > -Original Message-
> > > So I want to get the max number of consecutive zeros of variable x
> > > for
> > each
> > > ID. I found rle() to be helpful for this task; so I did:
> > >
> > > FUN <- function(x) {
> > >   rles <- rle(x == 0)
> > > }
> > > consec <- lapply(split(df[,2],df[,1]), FUN)
> >
> > You're probably better off with tapply and a function that returns
> > what you want. You're probably also better off with a data frame name
> > that isn't a function name, so I'll use dfr instead of df...
> >
> > dfr<- data.frame(x=rpois(500, 1.5), ID=gl(5,100)) #5 ID groups
> > numbered 1-5, equal size but that doesn't matter for tapply
> >
> > f2 <-   function(x) {
> > max( rle(x == 0)$lengths )
> > }
> > with(dfr, tapply(x, ID, f2))
> >
> >
> > S Ellison
> >
> >


Re: [R] Combinations of values in two columns

2013-11-01 Thread Simon Zehnder
You could use the data.table package
require(data.table)

DT <- data.table(Friend1 = sample(LETTERS, 10, replace = TRUE), Friend2 = 
sample(LETTERS, 10, replace = TRUE), Indicator = 1)
ALL <- data.table(unique(expand.grid(DT)))
setkey(ALL) 
OTHERS <- ALL[!DT]
OTHERS[, Indicator := 0]

RESULT <- rbind(DT, OTHERS)

Best

Simon



On 01 Nov 2013, at 10:32, Thomas  wrote:

> I have data that looks like this:
> 
> Friend1, Friend2
> A, B
> A, C
> B, A
> C, D
> 
> And I'd like to generate some more rows and another column. In the new column 
> I'd like to add a 1 beside all the existing rows. That bit's easy enough.
> 
> Then I'd like to add rows for all the possible directed combinations of rows 
> not included in the existing data. So for the above I think that would be:
> 
> A, D
> D, A
> B, C
> C, B
> B, D
> C, A
> D, B
> D, C
> 
> and then put a 0 in the column beside these.
> 
> Can anyone suggest how to do this?
> 
> I'm using R version 2.15.3.
> 
> Thank you,
> 
> Thomas Chesney


Re: [R] Combinations of values in two columns

2013-11-01 Thread Chris Campbell
Hi Thomas,
   
It depends whether you'd like to include all levels from both columns in 
every column. To include all values you could try something like this:

isAllDifferent <- function(z) !any(duplicated(z))   

myData <- data.frame(Friend1=c("a", "a", "b", "c"), Friend2=c("b", "c", "a", 
"d"), stringsAsFactors=FALSE)   
   
friends <- unique(unlist(myData, use.names=FALSE))   
  
allCombs <- do.call(expand.grid, rep(list(friends), ncol(myData)))
   
colnames(allCombs) <- colnames(myData)   
  
allCombs <- allCombs[apply(allCombs, 1, isAllDifferent),]
  
output <- cbind(allCombs, included=1*do.call(paste, allCombs)%in%do.call(paste, 
myData)) 
  
output[order(output$included, decreasing=TRUE),]
   Friend1 Friend2 included
2        b       a        1
5        a       b        1
9        a       c        1
15       c       d        1
3        c       a        0
4        d       a        0
7        c       b        0
8        d       b        0
10       b       c        0
12       d       c        0
13       a       d        0
14       b       d        0
   

If you only want each column to contain its corresponding values, you could try 
something like this:   
   
myData <- data.frame(Friend1=c("a", "a", "b", "c"),   
Friend2=c("b", "c", "a", "d"), new = 1)  
   
newData <- expand.grid(Friend1 = unique(myData$Friend1),   
Friend2 = unique(myData$Friend2))  
   
output <- merge(myData, newData, all = TRUE)  
output$new[is.na(output$new)] <- 0
  
output
   Friend1 Friend2 new
1        a       a   0
2        a       b   1
3        a       c   1
4        a       d   0
5        b       a   1
6        b       b   0
7        b       c   0
8        b       d   0
9        c       a   0
10       c       b   0
11       c       c   0
12       c       d   1
   
I hope this helps.   
   
Best wishes   
  
Chris   

Chris Campbell, PhD   
Tel. +44 (0) 1249 705 450 | Mobile. +44 (0) 7929 628349   
ccampb...@mango-solutions.com | http://www.mango-solutions.com   
Data Analysis that Delivers   
Mango Solutions 
2 Methuen Park, Chippenham, Wiltshire. SN14 OGB UK
   
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Thomas
Sent: 01 November 2013 09:32
To: r-help@r-project.org
Subject: [R] Combinations of values in two columns

I have data that looks like this:

Friend1, Friend2
A, B
A, C
B, A
C, D

And I'd like to generate some more rows and another column. In the new column 
I'd like to add a 1 beside all the existing rows. That bit's easy enough.

Then I'd like to add rows for all the possible directed combinations of rows 
not included in the existing data. So for the above I think that would be:

A, D
D, A
B, C
C, B
B, D
C, A
D, B
D, C

and then put a 0 in the column beside these.

Can anyone suggest how to do this?

I'm using R version 2.15.3.

Thank you,

Thomas Chesney


Re: [R] help with ggplot legend specification

2013-11-01 Thread Ista Zahn
You can override the legend aesthetics, e.g.,

ggplot(df,aes(x=Importance,y=Performance,fill=PBF,size=gapsize))+
geom_point(shape=21,colour="black")+
scale_size_area(max_size=pointsizefactor) +
scale_fill_discrete(guide = guide_legend(override.aes = list(size = 4)))

Best,
Ista

On Thu, Oct 31, 2013 at 4:08 PM, Conklin, Mike (GfK)
 wrote:
> I am creating a scatterplot with the following code.
>
>   pl<-ggplot(df,aes(x=Importance,y=Performance,fill=PBF,size=gapsize))+
>   
> geom_point(shape=21,colour="black")+scale_size_area(max_size=pointsizefactor)
>
> points are plotted where the size of the point is related to a metric 
> variable gapsize and the fill color on the point is related to the variable 
> PBF which is a 4 level factor.  This works exactly as I want with the points 
> varying in size based on the metric and being color coded.  I get 2 legends 
> on the side of the plot, one related to the size of the dot and the other 
> showing the color coding. The problem is that the dots on the color coding 
> legend are so small that it is impossible to discern what color they are. The 
> dots in the plot are large, so it is clear what colors they are, but the 
> legend is useless.  How can I increase the size of the points in the color 
> legend.
>
> pointsizefactor<-5
>
> df
>
> Importance Performance gapsize labels   PBF
> q50451   0.7079463  -0.7213622   2  a W
> q50452   0.4489164  -0.5552116   1  b G
> q50453   0.7714138  -0.6940144   5  c F
> q50454   0.6284830  -0.6011352   3  d S
> q50455   0.7131063  -0.6800826   4  e G
> q50456   0.7038184  -0.6026832   6  f S
> q50457   0.5201238  -0.3539732   8  g G
> q50458   0.9195046  -0.8214654   2  h F
> q50459   0.3797730  -0.4184727   1  i W
> q504510  0.8065015  -0.6305470   7  j G
> q504511  0.6062951  -0.4442724   6  k S
> q504512  0.6253870  -0.4478844   8  l G
> q504513  0.3813209  -0.4102167   2  m W
> q504514  0.3813209  -0.3436533   3  n F
> q504515  0.5185759  -0.4365325   5  o G
> q504516  0.5872033  -0.4556244   6  p S
> q504518  0.5397317  -1.000   1  q S
> q504519  0.5882353  -0.4674923   9  r S
> q504520  0.4205366  -0.4164087   4  s W
> q504521  0.7616099  -0.3323013  10  t F
> q504522  0.7213622  -0.6088751   7  u G
> q504523  0.6780186  -0.6130031   8  v G
> q504524  0.6904025  -0.3937049  10  w W
> q504525  0.4143447  -0.4669763   4  x W
> q504526  0.5779154  -0.2982456   9  y F
> q504527  0.6718266  -0.3457172  10  z G
>
>
> Thanks all
>
> //Mike
>
> W. Michael Conklin
> Executive Vice President | Marketing Science
> GfK Custom Research, LLC | 8401 Golden Valley Road | Minneapolis, MN, 55427
> T +1 763 417 4545 | M +1 612 567 8287
> www.gfk.com
>
>


Re: [R] Extracting values from a ecdf (empirical cumulative distribution function) curve

2013-11-01 Thread Manoranjan Muthusamy
Thanks, Bill & Duncan. Actually I tried values that are inside the defined
region. Please find the extracted script below:

> xnew <- rlnorm(seq(0, 400, 1), meanlog = 9.7280055, sdlog = 2.0443945)
> f <- ecdf(xnew)
> y <- f(x)
> y1 <- f(200)   ## finding y for a given xnew value of 200
> y1
[1] 0.9950125   ## It works.

> inv_ecdf <- function(f){
+ xnew <- environment(f)$xnew
+ y <- environment(f)$y
+ approxfun(y, xnew)
+ }
## Interpolation to find xnew for a known y value.

> g <- inv_ecdf(f)
> g(0.9950125)
[1] NA
> g(0.99)   ## It doesn't
[1] NA
> g(0.5)
[1] NA   ## again
> g(0.2)
[1] NA   ## and again


I am stuck here. Any help is appreciated.

Mano.
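The NAs come from the renamed lookup: ecdf() stores its knots in the closure environment under the names x and y, whatever the input vector was called, so environment(f)$xnew is NULL and approxfun() receives no data. Keeping Rui's original names restores the inverse; a sketch on Bill's small example:

```r
# ecdf() builds its step function via approxfun(x, y, ...), so the sorted
# data values and cumulative probabilities live in its closure as `x` and
# `y` -- look those up rather than `xnew`.
inv_ecdf <- function(f) {
  x <- environment(f)$x  # sorted data values
  y <- environment(f)$y  # cumulative probabilities
  approxfun(y, x)
}

f <- ecdf(c(101, 103, 107, 111))
g <- inv_ecdf(f)
g(0.5)  # 103: linear interpolation between the stored knots
```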


On Fri, Nov 1, 2013 at 2:48 AM, William Dunlap  wrote:

> > it gives 'NA' (for whatever y value).
>
> What 'y' values were you using?  inv_f maps probabilities (in [0,1]) to
> values in the range of the original data, x, but it will have problems for
> a probability below 1/length(x) because the original data didn't tell
> you anything about the ecdf in that region.
>
>> X <- c(101, 103, 107, 111)
>> f <- ecdf(X)
>> inv_f <- inv_ecdf(f)
>> inv_f(seq(0, 1, by=1/8))
>[1]  NA  NA 101 102 103 105 107 109 111
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf
> > Of Manoranjan Muthusamy
> > Sent: Thursday, October 31, 2013 6:18 PM
> > To: Rui Barradas
> > Cc: r-help@r-project.org
> > Subject: Re: [R] Extracting values from a ecdf (empirical cumulative
> distribution function)
> > curve
> >
> > Thank you, Barradas. It works when finding y, but when I tried to find x
> > using interpolation for a known y it gives 'NA' (for whatever y value). I
> > couldn't find out the reason. Any help is really appreciated.
> >
> > Thanks,
> > Mano
> >
> >
> > On Thu, Oct 31, 2013 at 10:53 PM, Rui Barradas 
> wrote:
> >
> > > Hello,
> > >
> > > As for the problem of finding y given the ecdf and x, it's very easy,
> just
> > > use the ecdf:
> > >
> > > f <- ecdf(rnorm(100))
> > >
> > > x <- rnorm(10)
> > > y <- f(x)
> > >
> > > If you want to get the x corresponding to given y, use linear
> > > interpolation.
> > >
> > > inv_ecdf <- function(f){
> > > x <- environment(f)$x
> > > y <- environment(f)$y
> > > approxfun(y, x)
> > > }
> > >
> > > g <- inv_ecdf(f)
> > > g(0.5)
> > >
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > > Em 31-10-2013 12:25, Manoranjan Muthusamy escreveu:
> > >
> > >> Hi R users,
> > >>
> > >> I am a new user, still learning basics of R. Is there anyway to
> extract y
> > >> (or x) value for a known x (or y) value from ecdf (empirical
> cumulative
> > >> distribution function) curve?
> > >>
> > >> Thanks in advance.
> > >> Mano.
> > >>


Re: [R] Replace element with pattern

2013-11-01 Thread jim holtman
try this:

> x <- rbind("Peak Usage: init:2359296, used:15859328, 
> committed:15892480,max:50331648Current Usage : init:2359296, 
> used:15857920,committed:15892480, max:50331648|---|")
> apply(x, 1, function(a) sub("(Current.*?[/|]).*", "\\1", a))
[1] "Peak Usage: init:2359296, used:15859328,
committed:15892480,max:50331648Current Usage : init:2359296,
used:15857920,committed:15892480, max:50331648|"
>
>

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Nov 1, 2013 at 4:09 AM,   wrote:
> Hi,
>  I have a data frame with one column and several rows of the form.
>
> "Peak Usage: init:2359296, used:15859328, committed:15892480,
> max:50331648Current Usage : init:2359296, used:15857920,
> committed:15892480, max:50331648|---|"
>
> I tested the regex
>
>  Current.*?[\|]
>
> in an online tester which greedily matches upto the first 'pipe' character
>
> Current Usage : init:2359296, used:15857920, committed:15892480,
> max:50331648|
>
> This is what I want.
>
> I tried to replace the entire rows using
>
> apply( y, 1, function(x) gsub(x,"Current.*?[/|]",x)) which didn't work.
>
> How is this done ? I also want to recursively apply some more patterns one
> by one on the rows till I reduce it to exactly what I want. Is there a way
> to do this without loops ?
>
> Thanks,
> Mohan
>
>


Re: [R] Replace element with pattern

2013-11-01 Thread mohan . radhakrishnan
Thanks.

I converted my data structure (that is most of the confusion in my case)
into a data frame and then applied this function

y <- apply( y, 1, function(z) str_extract(z,"Current.*?[/|]"))

to get

"Current Usage : init:2359296, used:15857920, committed:15892480, 
max:50331648|"

Mohan
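As a side note, apply() is not needed here: base R's regex functions are vectorized over character vectors. A sketch with regexpr()/regmatches() extracting the same piece (perl = TRUE so the lazy quantifier .*? is honoured):

```r
x <- c(paste0("Peak Usage: init:2359296, used:15859328, committed:15892480, ",
              "max:50331648Current Usage : init:2359296, used:15857920, ",
              "committed:15892480, max:50331648|---|"))
# Extract from "Current" up to and including the first pipe character.
m <- regmatches(x, regexpr("Current.*?\\|", x, perl = TRUE))
m
# [1] "Current Usage : init:2359296, used:15857920, committed:15892480, max:50331648|"
```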



From:   jim holtman 
To: mohan.radhakrish...@polarisft.com
Cc: R mailing list 
Date:   11/01/2013 05:17 PM
Subject:Re: [R] Replace element with pattern



try this:

> x <- rbind("Peak Usage: init:2359296, used:15859328, 
committed:15892480,max:50331648Current Usage : init:2359296, 
used:15857920,committed:15892480, max:50331648|---|")
> apply(x, 1, function(a) sub("(Current.*?[/|]).*", "\\1", a))
[1] "Peak Usage: init:2359296, used:15859328,
committed:15892480,max:50331648Current Usage : init:2359296,
used:15857920,committed:15892480, max:50331648|"
>
>

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Nov 1, 2013 at 4:09 AM,   
wrote:
> Hi,
>  I have a data frame with one column and several rows of the 
form.
>
> "Peak Usage: init:2359296, used:15859328, committed:15892480,
> max:50331648Current Usage : init:2359296, used:15857920,
> committed:15892480, max:50331648|---|"
>
> I tested the regex
>
>  Current.*?[\|]
>
> in an online tester which greedily matches upto the first 'pipe' 
character
>
> Current Usage : init:2359296, used:15857920, committed:15892480,
> max:50331648|
>
> This is what I want.
>
> I tried to replace the entire rows using
>
> apply( y, 1, function(x) gsub(x,"Current.*?[/|]",x)) which didn't work.
>
> How is this done ? I also want to recursively apply some more patterns 
one
> by one on the rows till I reduce it to exactly what I want. Is there a 
way
> to do this without loops ?
>
> Thanks,
> Mohan
>
>






Re: [R] ggplot2 - how to get rid of bar border lines

2013-11-01 Thread Dimitri Liakhovitski
Thank you very much, guys - it worked beautifully.


On Thu, Oct 31, 2013 at 7:55 AM, John Kane  wrote:

> At a guess, don't use colour.
>
> John Kane
> Kingston ON Canada
>
>
> > -Original Message-
> > From: dimitri.liakhovit...@gmail.com
> > Sent: Wed, 30 Oct 2013 14:11:37 -0400
> > To: r-help@r-project.org
> > Subject: [R] ggplot2 - how to get rid of bar border lines
> >
> > Hello!
> >
> > I am using ggplot2:
> >
> > ggplot(myplotdata, aes(x=att_levels, y=WTP)) +
> > geom_bar(stat="identity",fill="dark
> > orange",colour="black",
> > alpha = 1,position = "identity") +
> >
> geom_text(aes(label=WTP),colour="black",size=4,hjust=1.1,position='dodge')
> > +
> > coord_flip() +
> > xlab("") +
> > ylab("")
> >
> > How could I get rid of the border lines on the bars (just leave the fill,
> > but no border)?
> > Thank you!
> >
> > --
> > Dimitri Liakhovitski
> >
>
>
>
>


-- 
Dimitri Liakhovitski



[R] computation of hessian matrix

2013-11-01 Thread IZHAK shabsogh
Below is code to compute a Hessian matrix. I need to generate 29 different 
matrices: for example, the first elements of x1 and x2 are used to generate, 
say, matrix M1, the second elements of x1 and x2 give matrix M2, and so on up 
to matrix M29, corresponding to the total number of observations; b1 and b2 
are constants.
Can someone guide me or help me implement this, please? I did not understand 
how to construct the loop, which I think should be about 3 different loops:

    i = 1 to 29: number of matrices
    j1 = 1 to 2: rows of the matrix
    j2 = 1 to 2: columns of the matrix


x1<-c(5.548,4.896,1.964,3.586,3.824,3.111,3.607,3.557,2.989,18.053,3.773,1.253,2.094,2.726,1.758,5.011,2.455,0.913,0.890,2.468,4.168,4.810,34.319,1.531,1.481,2.239,4.204,3.463,1.727)
y<-c(2.590,3.770,1.270,1.445,3.290,0.930,1.600,1.250,3.450,1.096,1.745,1.060,0.890,2.755,1.515,4.770,2.220,0.590,0.530,1.910,4.010,1.745,1.965,2.555,0.770,0.720,1.730,2.860,0.760)
x2<-c(0.137,2.499,0.419,1.699,0.605,0.677,0.159,1.699,0.340,2.899,0.082,0.425,0.444,0.225,0.241,0.099,0.644,0.266,0.351,0.027,0.030,3.400,1.499,0.351,0.082,0.518,0.471,0.036,0.721)
b1 <- 4.286
b2 <- 1.362

n <- 29
for (i in 1:n) {
  gh <- matrix(0, 2, 2)
  exp0 <- (1 + b1*x2^b2)
  exp1 <- x1*x2^b2*log(x2)
  exp3 <- x1*b1*x2^b2*(log(x2))^2
  gh[1,1] <- 2*x2^(2*b2)*exp0/exp0^4
  gh[1,2] <- -(exp0^2*exp1 - 2*b1*x2^b2*exp0*exp1)/exp0^4
  gh[2,1] <- -(exp3*exp0^2 - 2*exp0*b1^2*x2^b2*log(x2)*exp1)/exp0^4
  gh[2,2] <- -(exp1*exp0^2 - 2*exp0*x2^b2*b1*exp1)/exp0^4
}
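A sketch of the loop being asked for: index the data vectors with [i] so that each observation contributes scalar pieces, and collect the 29 matrices in a list. The derivative expressions are copied from the post as written; I have not verified them against the underlying model.

```r
x1 <- c(5.548,4.896,1.964,3.586,3.824,3.111,3.607,3.557,2.989,18.053,3.773,
        1.253,2.094,2.726,1.758,5.011,2.455,0.913,0.890,2.468,4.168,4.810,
        34.319,1.531,1.481,2.239,4.204,3.463,1.727)
x2 <- c(0.137,2.499,0.419,1.699,0.605,0.677,0.159,1.699,0.340,2.899,0.082,
        0.425,0.444,0.225,0.241,0.099,0.644,0.266,0.351,0.027,0.030,3.400,
        1.499,0.351,0.082,0.518,0.471,0.036,0.721)
b1 <- 4.286
b2 <- 1.362

n <- length(x1)                 # 29 observations
hessians <- vector("list", n)   # one 2x2 matrix per observation

for (i in 1:n) {
  # Scalar pieces for observation i (expressions as given in the post).
  e0 <- 1 + b1 * x2[i]^b2
  e1 <- x1[i] * x2[i]^b2 * log(x2[i])
  e3 <- x1[i] * b1 * x2[i]^b2 * (log(x2[i]))^2

  gh <- matrix(0, 2, 2)
  gh[1, 1] <- 2 * x2[i]^(2 * b2) * e0 / e0^4
  gh[1, 2] <- -(e0^2 * e1 - 2 * b1 * x2[i]^b2 * e0 * e1) / e0^4
  gh[2, 1] <- -(e3 * e0^2 - 2 * e0 * b1^2 * x2[i]^b2 * log(x2[i]) * e1) / e0^4
  gh[2, 2] <- -(e1 * e0^2 - 2 * e0 * x2[i]^b2 * b1 * e1) / e0^4

  hessians[[i]] <- gh           # M1 ... M29
}

length(hessians)  # 29
```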


[R] Load Tawny package on R 2.15.3

2013-11-01 Thread Tstudent


I have R version 2.15.3. When I try to load it:

library(tawny)

I receive this response:

package ‘parser’ could not be loaded

The parser package is no longer on CRAN; it seems to be a dead project:
http://cran.r-project.org/web/packages/parser/index.html

If I try to manually install parser_0.1.tar.gz I receive an error and can't
install it.

The question is: is it impossible to use the tawny package today? Is there a
way to solve this problem, which seems to be caused by the parser package?

Thank you very much



[R] aggregate function output

2013-11-01 Thread Daniel Fernandes
 

Hello,

 

I'm using the aggregate function in R 3.0.2. If I run the instruction
x <- aggregate(cbind(mpg,hp) ~ cyl + gear, data = mtcars, quantile) I get
the following data.frame:

 

cyl gear mpg.0% mpg.25% mpg.50% mpg.75% mpg.100% hp.0% hp.25% hp.50% hp.75% hp.100%
  4    3   21.5    21.5    21.5    21.5     21.5    97     97     97     97      97
  6    3   18.1  18.925   19.75  20.575     21.4   105 106.25  107.5 108.75     110
  8    3   10.4   14.05    15.2  16.625     19.2   150    175    180 218.75     245
  4    4   21.4    22.8   25.85    30.9     33.9    52  64.25     66   93.5     109
  6    4   17.8   18.85    20.1      21       21   110    110  116.5    123     123
  4    5     26    27.1    28.2    29.3     30.4    91   96.5    102  107.5     113
  6    5   19.7    19.7    19.7    19.7     19.7   175    175    175    175     175
  8    5     15    15.2    15.4    15.6     15.8   264 281.75  299.5 317.25     335

So far so good; however, the strange part happens when I run dim(x) or
names(x): the results are 8 4 for dim(x) and "cyl" "gear" "mpg" "hp" for
names(x). Why does this occur, and how do I transform it into a regular
data.frame with 12 columns?

 

Thank you in advance, 

 

Daniel

 

> sessionInfo()

R version 3.0.2 (2013-09-25)

Platform: i386-w64-mingw32/i386 (32-bit)

 

locale:

[1] LC_COLLATE=Portuguese_Portugal.1252
LC_CTYPE=Portuguese_Portugal.1252
LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C   


[5] LC_TIME=Portuguese_Portugal.1252

 

attached base packages:

[1] grDevices datasets  splines   graphics  stats tcltk utils   
methods   base 

 

other attached packages:

[1] svSocket_0.9-55 TinnR_1.0-5 R2HTML_2.2.1Hmisc_3.12-2
Formula_1.1-1   survival_2.37-4

 

loaded via a namespace (and not attached):

[1] cluster_1.14.4  grid_3.0.2  lattice_0.20-23 rpart_4.1-3
svMisc_0.9-69   tools_3.0.2   

 



"Confidencialidade: Esta mensagem (e eventuais ficheiros anexos) é destinada 
exclusivamente às pessoas nela indicadas e tem natureza confidencial. Se 
receber esta mensagem por engano, por favor contacte o remetente e elimine a 
mensagem e ficheiros, sem tomar conhecimento do respectivo conteúdo e sem 
reproduzi-la ou divulgá-la.

Confidentiality Warning: This e-mail message (and any attached files) is 
confidential and is intended solely for the use of the individual or entity to 
whom it is addressed. lf you are not the intended recipient of this message 
please notify the sender and delete and destroy all copies immediately."





Re: [R] Replace element with pattern

2013-11-01 Thread arun


Hi,

Try this:

Lines1 <- readLines(textConnection("Peak Usage    : init:2359296, 
used:15859328, committed:15892480,max:50331648Current Usage : init:2359296 
used:15857920,committed:15892480,max:50331648|---|
Peak Usage    : init:2359296, used:15859328, 
committed:15892480,max:50331648Current Usage : init:2359296 
used:15857920,committed:15892480,max:50331648|---|"))

data.frame(Col1=gsub("^(.*?\\|).*", "\\1", Lines1)) #Assuming that you 
want to keep everything from Peak Usage up to the first "|":

#If it is from Current Usage to "|"

data.frame(Col1=gsub("^.*?(Current.*?\\|).*", "\\1", Lines1))
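As a runnable sketch of the vectorized approach (the sample rows below are abbreviated from the question; `\\|` matches a literal pipe, and `sub()` is vectorized over the rows, so no `apply()` or loop is needed):

```r
rows <- c(
  "Peak Usage : init:2359296, used:15859328Current Usage : init:2359296, used:15857920|---|",
  "Peak Usage : init:2359296, used:15859328Current Usage : init:2359296, used:15857920|---|"
)
# Keep the text from the first "Current" up to and including the first "|"
out <- sub("^.*?(Current.*?\\|).*$", "\\1", rows)
out
# both elements: "Current Usage : init:2359296, used:15857920|"
```

Further patterns can then be applied to `out` the same way, one `sub()`/`gsub()` call per pattern, still without looping over rows.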


A.K.


On Friday, November 1, 2013 4:11 AM, "mohan.radhakrish...@polarisft.com" 
 wrote:
Hi,
         I have a data frame with one column and several rows of the form.

"Peak Usage    : init:2359296, used:15859328, committed:15892480, 
max:50331648Current Usage : init:2359296, used:15857920, 
committed:15892480, max:50331648|---|"

I tested the regex 

Current.*?[\|]

in an online tester, where it lazily matches up to the first 'pipe' character

Current Usage : init:2359296, used:15857920, committed:15892480, 
max:50331648|

This is what I want.

I tried to replace the entire rows using 

apply( y, 1, function(x) gsub(x,"Current.*?[/|]",x)) which didn't work.

How is this done ? I also want to recursively apply some more patterns one 
by one on the rows till I reduce it to exactly what I want. Is there a way 
to do this without loops ?

Thanks,
Mohan


This e-Mail may contain proprietary and confidential information and is sent 
for the intended recipient(s) only.  If by an addressing or transmission error 
this mail has been misdirected to you, you are requested to delete this mail 
immediately. You are also hereby notified that any use, any form of 
reproduction, dissemination, copying, disclosure, modification, distribution 
and/or publication of this e-mail message, contents or its attachment other 
than by its intended recipient/s is strictly prohibited.

Visit us at http://www.polarisFT.com






Re: [R] Count number of consecutive zeros by group

2013-11-01 Thread arun
I think this gives a different result from the one the OP asked for:

df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), x = c(1, 0, 
0, 1, 0, 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0)), .Names = c("ID", 
"x"), row.names = c(NA, -22L), class = "data.frame")

with(df1, sapply(split(x, ID), function(x) sum(x==0)))

with(df1,tapply(x,list(ID),function(y) {rl <- rle(!y); 
max(c(0,rl$lengths[rl$values]))}))
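As a self-contained check on the OP's example data from later in the thread (the helper name `max_zero_run` is ours):

```r
ID <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
x  <- c(1, 0, 0, 0, 0, 1, 1, 0, 1)

# Longest run of consecutive zeros per ID; 0 if a group has no zeros.
# rle() collapses x == 0 into runs; keeping only the TRUE runs restricts
# the maximum to runs of zeros rather than runs of any value.
max_zero_run <- function(v) {
  r <- rle(v == 0)
  max(c(0, r$lengths[r$values]))
}
res <- tapply(x, ID, max_zero_run)
res
#> 1 2 3
#> 2 2 1
```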


A.K.


On Friday, November 1, 2013 6:01 AM, PIKAL Petr  wrote:
Hi

Another option is sapply/split/sum construction

with(data, sapply(split(x, ID), function(x) sum(x==0)))

Regards
Petr


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Carlos Nasher
> Sent: Thursday, October 31, 2013 6:46 PM
> To: S Ellison
> Cc: r-help@r-project.org
> Subject: Re: [R] Count number of consecutive zeros by group
> 
> If I apply your function to my test data:
> 
> ID <- c(1,1,1,2,2,3,3,3,3)
> x <- c(1,0,0,0,0,1,1,0,1)
> data <- data.frame(ID=ID,x=x)
> rm(ID,x)
> 
> f2 <-   function(x) {
>   max( rle(x == 0)$lengths )
> }
> with(data, tapply(x, ID, f2))
> 
> the result is
> 1 2 3
> 2 2 2
> 
> which is not what I'm aiming for. It should be
> 1 2 3
> 2 2 1
> 
> I think f2 does not return the max of consecutive zeros, but the max of
> any consecutive run. Any idea how to fix this?
> 
> 
> 2013/10/31 S Ellison 
> 
> >
> >
> > > -Original Message-
> > > So I want to get the max number of consecutive zeros of variable x
> > > for
> > each
> > > ID. I found rle() to be helpful for this task; so I did:
> > >
> > > FUN <- function(x) {
> > >   rles <- rle(x == 0)
> > > }
> > > consec <- lapply(split(df[,2],df[,1]), FUN)
> >
> > You're probably better off with tapply and a function that returns
> > what you want. You're probably also better off with a data frame name
> > that isn't a function name, so I'll use dfr instead of df...
> >
> > dfr<- data.frame(x=rpois(500, 1.5), ID=gl(5,100)) #5 ID groups
> > numbered 1-5, equal size but that doesn't matter for tapply
> >
> > f2 <-   function(x) {
> >         max( rle(x == 0)$lengths )
> > }
> > with(dfr, tapply(x, ID, f2))
> >
> >
> > S Ellison
> >
> >
> > ***
> > This email and any attachments are confidential. Any
> > u...{{dropped:24}}
> 






Re: [R] Download CSV Files from EUROSTAT Website

2013-11-01 Thread Adams, Jean
Lorenzo,

I may be able to help you get started.  You can use the XML package to grab
the information off the internet.

library(XML)

mylines <- readLines(url("http://bit.ly/1coCohq"))
closeAllConnections()
mylist <- readHTMLTable(mylines, asText=TRUE)
mytable <- mylist$xTable

However, when I look at the resulting object, mytable, it doesn't have
informative row or column headings.  Perhaps someone else can figure out
how to get that information.
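If a direct CSV link for the extraction can be obtained, the generic fetch-and-read pattern is straightforward. A sketch (the downloaded file is simulated locally here so the example is self-contained; with a real endpoint you would replace the `writeLines()` step with `download.file()`):

```r
# Simulate a downloaded extraction; with a real URL you would instead do:
#   download.file("http://.../extraction.csv", tmp, mode = "wb")
tmp <- tempfile(fileext = ".csv")
writeLines(c("geo,time,value", "AT,2012,1.5", "BE,2012,2.3"), tmp)

# Read the CSV into a data frame for further manipulation
mydata <- read.csv(tmp, stringsAsFactors = FALSE)
str(mydata)
```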

Jean





On Thu, Oct 31, 2013 at 10:38 AM, Lorenzo Isella
wrote:

> Dear All,
> I often need to do some work on some data which is publicly available on
> the EUROSTAT website.
> I saw several ways to download automatically mainly the bulk data from
> EUROSTAT to later on postprocess it with R, for instance
>
> http://bit.ly/HrDICj
> http://bit.ly/HrDL10
> http://bit.ly/HrDTgT
>
> However, what I would like to do is to be able to download directly the
> csv file corresponding to a properly formatted dataset (typically a dynamic
> dataset) from EUROSTAT.
> To fix the ideas, please consider the dataset at the following link
>
> http://bit.ly/1coCohq
>
> what I would like to do is to automatically read its content into R, or at
> least to automatically download it as a csv file (full extraction, single
> file, no flags and footnotes) which I can then manipulate easily.
> Any suggestion is appreciated.
> Cheers
>
> Lorenzo
>
>




Re: [R] aggregate function output

2013-11-01 Thread Adams, Jean
Daniel,

You can see better what is going on if you look at

as.list(x)

There you can see that cyl and gear are vectors but mpg and hp are matrices.
You can rearrange them using the do.call() function

x2 <- do.call(cbind, x)
dim(x2)

Jean


On Fri, Nov 1, 2013 at 7:08 AM, Daniel Fernandes wrote:

>
>
> Hello,
>
>
>
> I´m using function aggregate in R 3.0.2.  If I run the instruction
> x<-aggregate(cbind(mpg,hp)~cyl+gear,data=mtcars,quantile) I get the
> result the following data.frame:
>
>
>
> cyl gear mpg.0% mpg.25% mpg.50% mpg.75% mpg.100% hp.0% hp.25%  hp.50% hp.75%  hp.100%
>   4    3   21.5   21.5    21.5    21.5     21.5    97    97      97     97      97
>   6    3   18.1   18.925  19.75   20.575   21.4   105   106.25  107.5  108.75  110
>   8    3   10.4   14.05   15.2    16.625   19.2   150   175     180    218.75  245
>   4    4   21.4   22.8    25.85   30.9     33.9    52    64.25   66     93.5   109
>   6    4   17.8   18.85   20.1    21       21     110   110     116.5  123     123
>   4    5   26     27.1    28.2    29.3     30.4    91    96.5   102    107.5   113
>   6    5   19.7   19.7    19.7    19.7     19.7   175   175     175    175     175
>   8    5   15     15.2    15.4    15.6     15.8   264   281.75  299.5  317.25  335
>
>
>
> So far so good, however the strange part happens when I run dim(x) or
> names(x), because the results are 8 4 (dim(x)) and "cyl"  "gear" "mpg"
> "hp" (names(x)). Why this occurs and how do I transform it in a regular
> data.frame with 12 columns?
>
>
>
> Thank you in advance,
>
>
>
> Daniel
>
>
>
> > sessionInfo()
>
> R version 3.0.2 (2013-09-25)
>
> Platform: i386-w64-mingw32/i386 (32-bit)
>
>
>
> locale:
>
> [1] LC_COLLATE=Portuguese_Portugal.1252
> LC_CTYPE=Portuguese_Portugal.1252
> LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
>
>
> [5] LC_TIME=Portuguese_Portugal.1252
>
>
>
> attached base packages:
>
> [1] grDevices datasets  splines   graphics  stats tcltk utils
> methods   base
>
>
>
> other attached packages:
>
> [1] svSocket_0.9-55 TinnR_1.0-5 R2HTML_2.2.1Hmisc_3.12-2
> Formula_1.1-1   survival_2.37-4
>
>
>
> loaded via a namespace (and not attached):
>
> [1] cluster_1.14.4  grid_3.0.2  lattice_0.20-23 rpart_4.1-3
> svMisc_0.9-69   tools_3.0.2
>
>
>
>
>
>
>
>
>
>
>




[R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-01 Thread Magnus Thor Torfason

Pretty much what the subject says:

I used an env as the basis for a Hashtable in R, based on information 
that this is in fact the way environments are implemented under the hood.


I've been experimenting with doubling the number of entries, and so far 
it has seemed to be scaling more or less linearly, as expected.


But as I went from 17 million entries to 34 million entries, the 
completion time has gone from 18 hours, to 5 days and counting.



The keys and values are in all cases strings of equal length.

One might suspect that the slow-down might have to do with the memory 
being swapped to disk, but from what I know about my computing 
environment, that should not be the case.


So my first question:
Is anyone familiar with anything in the implementation of environments 
that would limit their use or slow them down (faster than O(nlog(n)) as 
the number of entries is increased?


And my second question:
I realize that this is not strictly what R environments were designed 
for, but this is what my algorithm requires: I must go through these 
millions of entries, storing them in the hash table and sometimes 
retrieving them along the way, in a more or less random manner, which is 
contingent on the data I am encountering, and on the contents of the 
hash table at each moment.


Does anyone have a good recommendation for alternatives to implement 
huge, fast, table-like structures in R?


Best,
Magnus
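One knob that may be worth ruling out (a sketch; the size value and key names are illustrative): an environment's hash table can be pre-sized at creation time via the `size` argument of `new.env()`, which reduces the need for R to grow and rehash the table as millions of entries are added.

```r
# Pre-size the hash table instead of letting it grow from the default
h <- new.env(hash = TRUE, size = 1000000L)

# Standard environment-as-hashtable operations
assign("key0001", "value0001", envir = h)
get("key0001", envir = h)        # "value0001"
exists("nosuchkey", envir = h)   # FALSE
```

Whether this helps at the 34-million-entry scale described above is untested here, but it is cheap to try before moving to an external store.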



Re: [R] aggregate function output

2013-11-01 Thread arun
Hi,
Try:
do.call(data.frame,c(x,check.names=FALSE))
A.K.
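A quick reproducible check of the flattening (a sketch assuming the mtcars-based call from the question):

```r
# Reproduce the aggregate call from the question
x <- aggregate(cbind(mpg, hp) ~ cyl + gear, data = mtcars, quantile)

# mpg and hp are 5-column matrices embedded in a 4-column data frame,
# which is why dim(x) reports 8 4
is.matrix(x$mpg)

# Flatten to an ordinary data frame with 12 separate columns
x_flat <- do.call(data.frame, c(x, check.names = FALSE))
dim(x_flat)   # 8 12
```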


Hello, 

  

I´m using function aggregate in R 3.0.2.  If I run the instruction 
x<-aggregate(cbind(mpg,hp)~cyl+gear,data=mtcars,quantile) I get the 
result the following data.frame: 

  

cyl gear mpg.0% mpg.25% mpg.50% mpg.75% mpg.100% hp.0% hp.25%  hp.50% hp.75%  hp.100%
  4    3   21.5   21.5    21.5    21.5     21.5    97    97      97     97      97
  6    3   18.1   18.925  19.75   20.575   21.4   105   106.25  107.5  108.75  110
  8    3   10.4   14.05   15.2    16.625   19.2   150   175     180    218.75  245
  4    4   21.4   22.8    25.85   30.9     33.9    52    64.25   66     93.5   109
  6    4   17.8   18.85   20.1    21       21     110   110     116.5  123     123
  4    5   26     27.1    28.2    29.3     30.4    91    96.5   102    107.5   113
  6    5   19.7   19.7    19.7    19.7     19.7   175   175     175    175     175
  8    5   15     15.2    15.4    15.6     15.8   264   281.75  299.5  317.25  335

  

So far so good, however the strange part happens when I run dim(x) or 
names(x), because the results are 8 4 (dim(x)) and "cyl"  "gear" "mpg" 
"hp" (names(x)). Why this occurs and how do I transform it in a regular 
data.frame with 12 columns? 

  

Thank you in advance, 

  

Daniel 

  

> sessionInfo() 

R version 3.0.2 (2013-09-25) 

Platform: i386-w64-mingw32/i386 (32-bit) 

  

locale: 

[1] LC_COLLATE=Portuguese_Portugal.1252 
LC_CTYPE=Portuguese_Portugal.1252 
LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C                       


[5] LC_TIME=Portuguese_Portugal.1252     

  

attached base packages: 

[1] grDevices datasets  splines   graphics  stats     tcltk     utils   
methods   base     

  

other attached packages: 

[1] svSocket_0.9-55 TinnR_1.0-5     R2HTML_2.2.1    Hmisc_3.12-2 
Formula_1.1-1   survival_2.37-4 

  

loaded via a namespace (and not attached): 

[1] cluster_1.14.4  grid_3.0.2      lattice_0.20-23 rpart_4.1-3 
svMisc_0.9-69   tools_3.0.2   

  






Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-01 Thread jim holtman
It would be nice if you followed the posting guidelines and at least
showed the script that was creating your entries now so that we
understand the problem you are trying to solve.  A bit more
explanation of why you want this would be useful.  This gets to the
second part of my tag line:  Tell me what you want to do, not how you
want to do it.  There may be other solutions to your problem.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Nov 1, 2013 at 9:32 AM, Magnus Thor Torfason
 wrote:
> Pretty much what the subject says:
>
> I used an env as the basis for a Hashtable in R, based on information that
> this is in fact the way environments are implemented under the hood.
>
> I've been experimenting with doubling the number of entries, and so far it
> has seemed to be scaling more or less linearly, as expected.
>
> But as I went from 17 million entries to 34 million entries, the completion
> time has gone from 18 hours, to 5 days and counting.
>
>
> The keys and values are in all cases strings of equal length.
>
> One might suspect that the slow-down might have to do with the memory being
> swapped to disk, but from what I know about my computing environment, that
> should not be the case.
>
> So my first question:
> Is anyone familiar with anything in the implementation of environments that
> would limit their use or slow them down (faster than O(nlog(n)) as the
> number of entries is increased?
>
> And my second question:
> I realize that this is not strictly what R environments were designed for,
> but this is what my algorithm requires: I must go through these millions of
> entries, storing them in the hash table and sometimes retrieving them along
> the way, in a more or less random manner, which is contingent on the data I
> am encountering, and on the contents of the hash table at each moment.
>
> Does anyone have a good recommendation for alternatives to implement huge,
> fast, table-like structures in R?
>
> Best,
> Magnus
>



Re: [R] Load Tawny package on R 2.15.3

2013-11-01 Thread R. Michael Weylandt
The release version of tawny has no such dependency and builds just fine on 
CRAN. Try updating that instead. 

Michael

On Nov 1, 2013, at 7:10, Tstudent  wrote:

> 
> 
> I have R version 2.15.3 When i try to load it:
> 
> library (tawny)
> 
> i receive this response:
> 
> package ‘parser’ could not be loaded
> 
> The package Parser in not on Cran anymore, it seems a dead project!
> http://cran.r-project.org/web/packages/parser/index.html
> 
> If i try to manual install parser_0.1.tar.gz i receive an error and can't
> install it.
> 
> The question is. Is today impossible to use tawny package? Is there a way to
> solve this problem that seems caused by parser package?
> 
> Thank you very much
> 



Re: [R] Load Tawny package on R 2.15.3

2013-11-01 Thread Uwe Ligges

Install a recent version of tawny that does not depend on the other package?

Best,
Uwe Ligges




On 01.11.2013 12:10, Tstudent wrote:



I have R version 2.15.3 When i try to load it:

library (tawny)

i receive this response:

package ‘parser’ could not be loaded

The package Parser in not on Cran anymore, it seems a dead project!
http://cran.r-project.org/web/packages/parser/index.html

If i try to manual install parser_0.1.tar.gz i receive an error and can't
install it.

The question is. Is today impossible to use tawny package? Is there a way to
solve this problem that seems caused by parser package?

Thank you very much






Re: [R] aggregate function output

2013-11-01 Thread arun
You could also try:
library(plyr)
 newdf <- function(.data, ...) {
   eval(substitute(data.frame(...)), .data, parent.frame())
 }

x1 <- ddply(mtcars, .(cyl, gear), newdf, mpg = t(quantile(mpg)), hp = t(quantile(hp)))
#(found in one of the google group discussions)
#or



library(data.table)
dt1 <- data.table(mtcars,key=c('cyl','gear'))
 dt2 <- dt1[,c(as.list(quantile(mpg)),as.list(quantile(hp))),by=key(dt1)]
 indx <- grep("%",names(dt2))
 x2 <- as.data.frame(dt2)
names(x2)[indx] <- paste(rep(c("mpg", "hp"),each=5), names(x2)[indx],sep=".")
A.K.





On Friday, November 1, 2013 9:35 AM, arun  wrote:
Hi,
Try:
do.call(data.frame,c(x,check.names=FALSE))
A.K.


Hello, 

  

I´m using function aggregate in R 3.0.2.  If I run the instruction 
x<-aggregate(cbind(mpg,hp)~cyl+gear,data=mtcars,quantile) I get the 
result the following data.frame: 

  

cyl gear mpg.0% mpg.25% mpg.50% mpg.75% mpg.100% hp.0% hp.25%  hp.50% hp.75%  hp.100%
  4    3   21.5   21.5    21.5    21.5     21.5    97    97      97     97      97
  6    3   18.1   18.925  19.75   20.575   21.4   105   106.25  107.5  108.75  110
  8    3   10.4   14.05   15.2    16.625   19.2   150   175     180    218.75  245
  4    4   21.4   22.8    25.85   30.9     33.9    52    64.25   66     93.5   109
  6    4   17.8   18.85   20.1    21       21     110   110     116.5  123     123
  4    5   26     27.1    28.2    29.3     30.4    91    96.5   102    107.5   113
  6    5   19.7   19.7    19.7    19.7     19.7   175   175     175    175     175
  8    5   15     15.2    15.4    15.6     15.8   264   281.75  299.5  317.25  335

  

So far so good, however the strange part happens when I run dim(x) or 
names(x), because the results are 8 4 (dim(x)) and "cyl"  "gear" "mpg" 
"hp" (names(x)). Why this occurs and how do I transform it in a regular 
data.frame with 12 columns? 

  

Thank you in advance, 

  

Daniel 

  

> sessionInfo() 

R version 3.0.2 (2013-09-25) 

Platform: i386-w64-mingw32/i386 (32-bit) 

  

locale: 

[1] LC_COLLATE=Portuguese_Portugal.1252 
LC_CTYPE=Portuguese_Portugal.1252 
LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C                       


[5] LC_TIME=Portuguese_Portugal.1252     

  

attached base packages: 

[1] grDevices datasets  splines   graphics  stats     tcltk     utils   
methods   base     

  

other attached packages: 

[1] svSocket_0.9-55 TinnR_1.0-5     R2HTML_2.2.1    Hmisc_3.12-2 
Formula_1.1-1   survival_2.37-4 

  

loaded via a namespace (and not attached): 

[1] cluster_1.14.4  grid_3.0.2      lattice_0.20-23 rpart_4.1-3 
svMisc_0.9-69   tools_3.0.2   

  






[R] spsurvey analysis

2013-11-01 Thread Tim Howard
All,
I've used the excellent package, spsurvey, to create spatially balanced samples 
many times in the past. I'm now attempting to use the analysis portion of the 
package, which compares CDFs among sub-populations to test for differences in 
sub-population metrics. 
 
- My data (count data) have many zeros, following a negative binomial or even 
zero-inflated negative binomial distribution.
- Samples are within polygons of varying sizes
- I want to test whether a sample at time 1 is different from a sample at time 
2. Essentially the same sample areas and number of samples.

The problem:
- cont.cdftest  throws a warning and does not complete for most (but not all) 
species sampled. Warning message: "The combined number of values in at least 
one class is less than five. Action: The user should consider using a smaller 
number of classes."

- There are plenty of samples in my two time periods (the dummy set below: 
Yr1=27, Yr2=31 non-zero values). 
 
My Question:
Why is it throwing this error and is there a way to get around it?



Reproducible example (change the path to the spsurvey sample data); it requires
spsurvey to generate the sample points:

### R code tweaked from vignettes 'Area_Design' and 'Area_Analysis'
library(spsurvey)
### Analysis set up
setwd("C:/Program Files/R/R-3.0.2/library/spsurvey/doc")
att <- read.dbf("UT_ecoregions")
shp <- read.shape("UT_ecoregions")

set.seed(4447864)

# Create the design list
Stratdsgn <- list("Central Basin and Range"=list(panel=c(PanelOne=25), 
seltype="Equal"),
  "Colorado Plateaus"=list(panel=c(PanelOne=25), 
seltype="Equal"),
  "Mojave Basin and Range"=list(panel=c(PanelOne=10), 
seltype="Equal"),
  "Northern Basin and Range"=list(panel=c(PanelOne=10), 
seltype="Equal"),
  "Southern Rockies"=list(panel=c(PanelOne=14), 
seltype="Equal"),
  "Wasatch and Uinta Mountains"=list(panel=c(PanelOne=10), 
seltype="Equal"),
  "Wyoming Basin"=list(panel=c(PanelOne=6), seltype="Equal"))

# Select the sample design for each year
Stratsites_Yr1 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
   type.frame="area", src.frame="sp.object",
   sp.object=shp, att.frame=att, stratum="Level3_Nam", 
shapefile=FALSE)

Stratsites_Yr2 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
   type.frame="area", src.frame="sp.object",
   sp.object=shp, att.frame=att, stratum="Level3_Nam", 
shapefile=FALSE)

#extract the core information, add year as a grouping variable, add a plot ID 
to link with dummy data
Yr1 <- cbind(pltID = 1001:1100, Stratsites_Yr1@data[,c(1,2,3,5)], grp = "Yr1")
Yr2 <- cbind(pltID = 2001:2100, Stratsites_Yr2@data[,c(1,2,3,5)], grp = "Yr2")  
   
sitedat <- rbind(Yr1, Yr2)

# create dummy sampling data. Lots of zeros!
bn.a <- rnbinom(size = 0.06, mu = 19.87, n=100)
bn.b <- rnbinom(size = 0.06, mu = 20.15, n=100)
dat.a <- data.frame(pltID = 1001:1100, grp = "Yr1",count = bn.a)
dat.b <- data.frame(pltID = 2001:2100, grp = "Yr2",count = bn.b)
dat <- rbind(dat.a, dat.b)


## Analysis begins here

data.cont <- data.frame(siteID = dat$pltID, Density=dat$count)
sites <- data.frame(siteID = dat$pltID, Use=rep(TRUE, nrow(dat)))
subpop <- data.frame(siteID = dat$pltID, 
All_years=(rep("allYears",nrow(dat))),
Year = dat$grp)
design <- data.frame(siteID = sitedat$pltID,
wgt = sitedat$wgt,
xcoord = sitedat$xcoord,
ycoord = sitedat$ycoord)
framesize <- c("Yr1"=888081202000, "Yr2"=888081202000)

## There seem to be pretty good estimates
CDF_Estimates <- cont.analysis(sites, subpop, design, data.cont, 
popsize = list(All_years=sum(framesize),
Year = as.list(framesize)))

print(CDF_Estimates$Pct)

## this test fails
CDF_Tests <- cont.cdftest(sites, subpop[,c(1,3)], design, data.cont,
   popsize=list(Year=as.list(framesize)))
warnprnt()

## how many records have values greater than zero, by year?   Probably 
irrelevant!
notZero <- dat[dat$count > 0,]
table(notZero$grp)

### end

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
 
locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C  
[5] LC_TIME=English_United States.1252

Thanks in advance.
Tim



Re: [R] Extracting values from a ecdf (empirical cumulative distribution function) curve

2013-11-01 Thread William Dunlap
You are not using the inv_ecdf function that Rui sent.  His was
   inv_ecdf_orig <-
   function (f)
   {
   x <- environment(f)$x
   y <- environment(f)$y
   approxfun(y, x)
   }
(There is no 'xnew' in the environment of f.)
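Concretely, a minimal runnable sketch of the fix (the lognormal parameters are taken from the question; `ecdf()` always stores its sorted data and cumulative probabilities as `x` and `y` in the returned function's environment, whatever the caller's variable was named):

```r
set.seed(1)
xnew <- rlnorm(401, meanlog = 9.7280055, sdlog = 2.0443945)
f <- ecdf(xnew)

# Invert the ecdf by interpolating x as a function of y
inv_ecdf <- function(f) {
  x <- environment(f)$x   # sorted data -- stored as 'x', not 'xnew'
  y <- environment(f)$y   # cumulative probabilities
  approxfun(y, x)
}
g <- inv_ecdf(f)
g(0.5)   # a finite median estimate, not NA
```

Probabilities below `1/length(xnew)` still return NA, as Bill notes, because the data say nothing about the ecdf in that region.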

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

From: Manoranjan Muthusamy [mailto:ranjanmano...@gmail.com]
Sent: Friday, November 01, 2013 4:38 AM
To: William Dunlap; dulca...@bigpond.com
Cc: Rui Barradas; r-help@r-project.org
Subject: Re: [R] Extracting values from a ecdf (empirical cumulative 
distribution function) curve

Thanks, Bill & Duncan. Actually I tried values that are inside the defined
region. Please find the extracted script below:

> xnew<-rlnorm(seq(0,400,1), meanlog=9.7280055, sdlog=2.0443945)
> f <- ecdf(xnew)
> y <- f(x)
> y1 <- f(200)   ## finding y for a given xnew value of 200
> y1
[1] 0.9950125    ## It works.

> inv_ecdf <- function(f){
+ xnew <- environment(f)$xnew
+ y <- environment(f)$y
+ approxfun(y, xnew)
+ }
## Interpolation to find xnew for a known y value.

> g <- inv_ecdf(f)
> g(0.9950125)
[1] NA
> g(0.99)  ## It doesn't
[1] NA
> g(0.5)
[1] NA ## again
> g(0.2)
[1] NA ## and again


I am stuck here. Any help is appreciated.

Mano.

On Fri, Nov 1, 2013 at 2:48 AM, William Dunlap 
mailto:wdun...@tibco.com>> wrote:
> it gives 'NA' (for whatever y value).
What 'y' values were you using?  inv_f maps probabilities (in [0,1]) to
values in the range of the original data, x, but it will have problems for
a probability below 1/length(x) because the original data didn't tell
you anything about the ecdf in that region.

   > X <- c(101, 103, 107, 111)
   > f <- ecdf(X)
   > inv_f <- inv_ecdf(f)
   > inv_f(seq(0, 1, by=1/8))
   [1]  NA  NA 101 102 103 105 107 109 111

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On 
> Behalf
> Of Manoranjan Muthusamy
> Sent: Thursday, October 31, 2013 6:18 PM
> To: Rui Barradas
> Cc: r-help@r-project.org
> Subject: Re: [R] Extracting values from a ecdf (empirical cumulative 
> distribution function)
> curve
>
> Thank you, Barradas. It works when finding y, but when I tried to find x
> using interpolation for a known y it gives 'NA' (for whatever y value). I
> couldn't find out the reason. Any help is really appreciated.
>
> Thanks,
> Mano
>
>
> On Thu, Oct 31, 2013 at 10:53 PM, Rui Barradas 
> mailto:ruipbarra...@sapo.pt>> wrote:
>
> > Hello,
> >
> > As for the problem of finding y given the ecdf and x, it's very easy, just
> > use the ecdf:
> >
> > f <- ecdf(rnorm(100))
> >
> > x <- rnorm(10)
> > y <- f(x)
> >
> > If you want to get the x corresponding to given y, use linear
> > interpolation.
> >
> > inv_ecdf <- function(f){
> > x <- environment(f)$x
> > y <- environment(f)$y
> > approxfun(y, x)
> > }
> >
> > g <- inv_ecdf(f)
> > g(0.5)
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > Em 31-10-2013 12:25, Manoranjan Muthusamy escreveu:
> >
> >> Hi R users,
> >>
> >> I am a new user, still learning basics of R. Is there anyway to extract y
> >> (or x) value for a known x (or y) value from ecdf (empirical cumulative
> >> distribution function) curve?
> >>
> >> Thanks in advance.
> >> Mano.
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> __**
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/**listinfo/r-
> help
> >> PLEASE do read the posting guide http://www.R-project.org/**
> >> posting-guide.html 
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ggplot2 - value labels + "adjusting position using y instead"

2013-11-01 Thread Dimitri Liakhovitski
I am building a horizontal bar plot using ggplot2 - see the code below.

A couple of questions:
1. On the right side of the graph the value labels are inside the bars. How
could I move them to be outside the bars - the way they are on the left
side?
2. How can I make sure that the scale on my X axis is such that I can
always see the full value labels (even for the smallest or the largest values)?
3. What can I do to avoid the following warning: "ymax not defined:
adjusting position using y instead"? My challenge is that I never know in
advance what the values of the chart will be, but I need to make a change
so that the warning does not come up.

Thanks a lot!

# My data set:

multiplier<-sample(c(1,2,10),20, replace=T)
set.seed(123)
DF <- data.frame(x = sample(LETTERS,20, replace = F),y=rnorm(20)*multiplier)
(DF)

# My plot:
library(ggplot2)
ggplot(DF, aes(x = factor(x,levels=rev(unique(x))),y=y)) +
geom_bar(stat="identity",fill="dark orange", color = NA,
alpha=1,position="identity")+
geom_text(aes(label=round(y,2)),colour="black",size=4,hjust=1.1,position='dodge')+
coord_flip()+
xlab("") +
ylab("")
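
For reference, one common approach (a hedged sketch, not from the thread, and not tested against every ggplot2 version): flip hjust with the sign of y so labels sit outside bars on both sides, drop position = 'dodge' from geom_text to silence the ymax warning, and pad the value axis so long labels are never clipped.

```r
library(ggplot2)

set.seed(123)
multiplier <- sample(c(1, 2, 10), 20, replace = TRUE)
DF <- data.frame(x = sample(LETTERS, 20), y = rnorm(20) * multiplier)

ggplot(DF, aes(x = factor(x, levels = rev(unique(x))), y = y)) +
  geom_bar(stat = "identity", fill = "dark orange") +
  ## hjust flips with the sign of y, so labels land outside the bar ends;
  ## omitting position = "dodge" avoids the "ymax not defined" warning
  geom_text(aes(label = round(y, 2), hjust = ifelse(y < 0, 1.1, -0.1)),
            colour = "black", size = 4) +
  ## 15% multiplicative padding on the value axis leaves room for labels
  scale_y_continuous(expand = c(0.15, 0)) +
  coord_flip() + xlab("") + ylab("")
```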



-- 
Dimitri Liakhovitski

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-01 Thread Magnus Thor Torfason

Sure,

I was attempting to be concise and boiling it down to what I saw as the 
root issue, but you are right, I could have taken it a step further. So 
here goes.


I have a set of around 20M string pairs. A given string (say, A) 
can either be equivalent to another string (B) or not. If A and B occur 
together in the same pair, they are equivalent. But equivalence is 
transitive, so if A and B occur together in one pair, and A and C occur 
together in another pair, then A and C are also equivalent. I need a way 
to quickly determine if any two strings from my data set are equivalent 
or not.


The way I do this currently is to designate the smallest 
(alphabetically) string in each known equivalence set as the "main" 
entry. For each pair, I therefore insert two entries into the hash 
table, both pointing at the main value. So assuming the input data:


A,B
B,C
D,E

I would then have:

A->A
B->A
C->B
D->D
E->D

Except that I also follow each chain until I reach the end (key==value), 
and insert pointers to the "main" value for every value I find along the 
way. After doing that, I end up with:


A->A
B->A
C->A
D->D
E->D

And I can very quickly check equivalence, either by comparing the hash 
of two strings, or simply by transforming each string into its hash, and 
then I can use simple comparison from then on. The code for generating 
the final hash table is as follows:


h : Empty hash table created with hash.new()
d : Input data
hash.deep.get : Function that iterates through the hash table until it 
finds a key whose value is equal to itself (until hash.get(X)==X), then 
returns all the values in a vector



h = hash.new()
for ( i in 1:nrow(d) )
{
deep.a  = hash.deep.get(h, d$a[i])
deep.b  = hash.deep.get(h, d$b[i])
equivalents = sort(unique(c(deep.a,deep.b)))
equiv.id= min(equivalents)
for ( equivalent in equivalents )
{
hash.put(h, equivalent, equiv.id)
}
}


I would so much appreciate if there was a simpler and faster way to do 
this. Keeping my fingers crossed that one of the R-help geniuses who 
sees this is sufficiently interested to crack the problem


Best,
Magnus
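
[The task described above is the classic union-find (disjoint-set) problem. A minimal base-R sketch, not from the thread, again using an environment as the hash table: path compression points every visited key straight at its root, so each pair is processed in near-constant amortized time instead of rewriting whole chains.]

```r
## Union-find over strings, with an environment as the hash table.
h <- new.env(hash = TRUE)

uf_find <- function(k) {
  p <- if (exists(k, envir = h, inherits = FALSE)) get(k, envir = h) else k
  if (identical(p, k)) return(k)
  root <- uf_find(p)
  assign(k, root, envir = h)   # path compression: point k straight at the root
  root
}

uf_union <- function(a, b) {
  ra <- uf_find(a)
  rb <- uf_find(b)
  main <- min(ra, rb)          # smallest string becomes the "main" entry
  assign(ra, main, envir = h)
  assign(rb, main, envir = h)
}

pairs <- data.frame(a = c("A", "B", "D"), b = c("B", "C", "E"),
                    stringsAsFactors = FALSE)
for (i in seq_len(nrow(pairs))) uf_union(pairs$a[i], pairs$b[i])

uf_find("C") == uf_find("A")   # TRUE: A, B, C are one equivalence set
uf_find("E") == uf_find("A")   # FALSE: D, E form their own set
```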

On 11/1/2013 1:49 PM, jim holtman wrote:

It would be nice if you followed the posting guidelines and at least
showed the script that was creating your entries now so that we
understand the problem you are trying to solve.  A bit more
explanation of why you want this would be useful.  This gets to the
second part of my tag line:  Tell me what you want to do, not how you
want to do it.  There may be other solutions to your problem.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Nov 1, 2013 at 9:32 AM, Magnus Thor Torfason
 wrote:

Pretty much what the subject says:

I used an env as the basis for a Hashtable in R, based on information that
this is in fact the way environments are implemented under the hood.

I've been experimenting with doubling the number of entries, and so far it
has seemed to be scaling more or less linearly, as expected.

But as I went from 17 million entries to 34 million entries, the completion
time has gone from 18 hours, to 5 days and counting.


The keys and values are in all cases strings of equal length.

One might suspect that the slow-down might have to do with the memory being
swapped to disk, but from what I know about my computing environment, that
should not be the case.

So my first question:
Is anyone familiar with anything in the implementation of environments that
would limit their use or slow them down (faster than O(n log n)) as the
number of entries is increased?

And my second question:
I realize that this is not strictly what R environments were designed for,
but this is what my algorithm requires: I must go through these millions of
entries, storing them in the hash table and sometimes retrieving them along
the way, in a more or less random manner, which is contingent on the data I
am encountering, and on the contents of the hash table at each moment.

Does anyone have a good recommendation for alternatives to implement huge,
fast, table-like structures in R?

Best,
Magnus

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] forecast.lm() and NEWDATA

2013-11-01 Thread Ryan

Good day all.

I am hoping you can help me (and I did this right). I've been working in 
R for a week now, and have encountered a problem with forecast.lm().


I have a list of 12 variables, all type = double, with 15 data entries.
(I imported them from tab delimited text files, and then formatted 
as.numeric to change from list to double)
(I understand that this leaves me rather limited in my degrees of 
freedom, but working with what I have, sadly. )


I have a LM model, such that
REGGY = lm(formula=Y~A,B,C,...,I,J)
which I am happy with.

I have
NEWDATA = data.frame(A+B+C+D+I+J)

When i try to run

forecast.lm(REGGY, h=5)

i receive the following error
"Error in as.data.frame(newdata) :
  argument "newdata" is missing, with no default"

When I run
forecast.lm(REGGY, NEWDATA, h=5)
I receive the confidence intervals of the 15 data entries I already 
possess. I understand that by including NEWDATA, the "h=5" is ignored, 
but without NEWDATA, I receive the error message.


Can anyone help me please?

Regards
Ryan

P.S. The forecast is trying to predict the next 5 values for Y from the 
regression model pasted above. I'm a bit rusty with regressions, but I 
think I've covered my bases as well as I can, and from what I understand 
of the R code, I'm following the right steps.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Load Tawny package on R 2.15.3

2013-11-01 Thread Tstudent

Uwe Ligges  statistik.tu-dortmund.de> writes:

> 
> Install a recent version of tawny that does not depend on the other package?



The most recent version is this:
http://cran.r-project.org/web/packages/tawny/index.html

I can install it, but can't load it without the parser package.
This seems to be true for R 2.15.3.

Someone says that with R 3.0 there is no problem.

My question is whether there is some way to have a working tawny on R 2.15.3.

I don't want to upgrade to R 3.0 because I have one package that doesn't work on
R 3.0.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Irregular time series frequencies

2013-11-01 Thread sartene
Thanks a lot Achim!

This helped a lot. I do not have exactly what I want yet, but I now have
promising ideas to gather my data and find what I'm looking for (especially
as.numeric(x, units = "hours")).

Regards,


Sartene Bel


> Message du 31/10/13 à 08h48
> De : "Achim Zeileis" 
> A : sart...@voila.fr
> Copie à : r-help@r-project.org
> Objet : Re: [R] Irregular time series frequencies
> 
> On Wed, 30 Oct 2013, sart...@voila.fr wrote:
> 
> > Hi everyone,
> >
> > I have a data frame with email addresses in the first column and in the 
> > second column a list of times (of different lengths) at which an email was 
> > sent from 
the 
> > user in the first column.
> >
> > Here is an example of my data:
> >
> > Email Email_sent
> > j...@doe.com "2013-09-26 15:59:55" "2013-09-27 09:48:29" "2013-09-27 
> > 10:00:02" "2013-09-27 10:12:54" 
> > j...@shoe.com "2013-09-26 09:50:28" "2013-09-26 14:41:24" "2013-09-26 
> > 14:51:36" "2013-09-26 17:50:10" "2013-09-27 13:34:02" "2013-09-27 14:41:10" 
> > "2013-09-27 15:37:36"
> > ...
> >
> > I cannot find any way to calculate the frequencies between each email sent 
> > for each user:
> > j...@doe.com 0.02 email / hour
> > j...@shoe.com 0.15 email / hour
> > ...
> >
> > Can anyone help me on this problem?
> 
> You could do something like this:
> 
> ## scan your data file
> d <- scan(, what = "character")
> 
> ## here I use the data from above
> d <- scan(textConnection('j...@doe.com "2013-09-26 15:59:55"
> "2013-09-27 09:48:29" "2013-09-27 10:00:02" "2013-09-27 10:12:54"
> j...@shoe.com "2013-09-26 09:50:28" "2013-09-26 14:41:24"
> "2013-09-26 14:51:36" "2013-09-26 17:50:10" "2013-09-27 13:34:02"
> "2013-09-27 14:41:10" "2013-09-27 15:37:36"'), what = "character")
> 
> ## find position of e-mail addresses
> n <- grep("@", d, fixed = TRUE)
> 
> ## extract list of dates
> n <- c(n, length(d) + 1)
> x <- lapply(1:(length(n) - 1),
> function(i) as.POSIXct(d[(n[i] + 1):(n[i+1] - 1)]))
> 
> ## add e-mail addresses as names
> names(x) <- d[head(n, -1)]
> 
> ## functions that could extract quantities of interest such as
> ## number of mails per hour or mean time difference etc.
> meantime <- function(timevec)
> mean(as.numeric(diff(timevec), units = "hours"))
> numperhour <- function(timevec)
> length(timevec) / as.numeric(diff(range(timevec)), units = "hours")
> 
> ## apply to full list
> sapply(x, numperhour)
> sapply(x, meantime)
> 
> ## apply to list by date
> sapply(x, function(timevec) tapply(timevec, as.Date(timevec), numperhour))
> sapply(x, function(timevec) tapply(timevec, as.Date(timevec), meantime))
> 
> hth,
> Z
> 
> > The ultimate goal (which seems amibitious at this time) is to calculate, 
> > for each user, the frequencies between each mail per day, between the first 
> > email sent 
> > and the last email sent each day (to avoid taking nights into account), 
> > i.e.:
> >
> > 2013-09-26 2013-09-27
> > j...@doe.com 1.32 emails / hour 0.56 emails / hour
> > j...@shoe.com 10.57 emails / hour 2.54 emails / hour
> > ...
> >
> > At this time it seems pretty impossible, but I guess I will eventually find 
> > a way :-)
> >
> > Thanks a lot,
> >
> >
> > Sartene Bel
> > R learner
> > ___
> > Qu'y a-t-il ce soir à la télé ? D'un coup d'œil, visualisez le programme 
> > sur Voila.fr http://tv.voila.fr/programmes/chaines-tnt/ce-soir.html
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Load Tawny package on R 2.15.3

2013-11-01 Thread Bert Gunter
(Inline)

On Fri, Nov 1, 2013 at 7:33 AM, Tstudent  wrote:
>
> Uwe Ligges  statistik.tu-dortmund.de> writes:
>
>>
>> Install a recent version of tawny that does not depend on the other package?
>
>
>
> The most recent version is this:
> http://cran.r-project.org/web/packages/tawny/index.html
>
> I can install, but can't load without parser package.
> It seems true for 2.15.3 version of R
>
> Someone says that with R 3.0 no problem.
>
> My question is if there is some possibility to have a working tawny on R 
> 2.15.3
>
> I don't want upgrade to R 3.0 because have one package that doesn't work on
> R 3.0

I have no specific expertise here, but I just wanted to point out that
this sounds like a losing strategy long term: As new packages and
newer versions of packages come out that fix bugs and add features,
you'll be unable to use them because you'll be stuck with 2.15.3 . I
suggest you bite the bullet and follow the experts' advice to get
things working with the current R version now.

Cheers,
Bert

>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

(650) 467-7374

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Count number of consecutive zeros by group

2013-11-01 Thread PIKAL Petr
Hi

Yes, you are right. This gives the number of zeroes, not the max number of
consecutive zeroes.

Regards
Petr
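
[For illustration, a sketch showing the difference on arun's df1 from the message below, where the two quantities actually diverge:]

```r
## Same data as arun's df1: IDs 1..3 with 7, 5, and 10 observations
df1 <- data.frame(ID = rep(1:3, c(7, 5, 10)),
                  x  = c(1,0,0,1,0,0,0,  1,2,0,1,0,  0,0,0,0,1,0,0,0,1,0))

## total zeroes per ID -- what the sapply/split/sum construction computes
with(df1, tapply(x == 0, ID, sum))                      # 5 2 8

## max run of *consecutive* zeroes per ID -- what the OP asked for
with(df1, tapply(x, ID, function(v) {
  r <- rle(v == 0)
  max(c(0, r$lengths[r$values]))                        # runs of TRUE only
}))                                                     # 3 1 4
```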


> -Original Message-
> From: arun [mailto:smartpink...@yahoo.com]
> Sent: Friday, November 01, 2013 2:17 PM
> To: R help
> Cc: PIKAL Petr; Carlos Nasher
> Subject: Re: [R] Count number of consecutive zeros by group
> 
> I think this gives a different result than the one OP asked for:
> 
> df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
> 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), x = c(1, 0, 0, 1, 0,
> 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0)), .Names = c("ID",
> "x"), row.names = c(NA, -22L), class = "data.frame")
> 
> with(df1, sapply(split(x, ID), function(x) sum(x==0)))
> 
> with(df1,tapply(x,list(ID),function(y) {rl <- rle(!y);
> max(c(0,rl$lengths[rl$values]))}))
> 
> 
> A.K.
> 
> 
> On Friday, November 1, 2013 6:01 AM, PIKAL Petr
>  wrote:
> Hi
> 
> Another option is sapply/split/sum construction
> 
> with(data, sapply(split(x, ID), function(x) sum(x==0)))
> 
> Regards
> Petr
> 
> 
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> > project.org] On Behalf Of Carlos Nasher
> > Sent: Thursday, October 31, 2013 6:46 PM
> > To: S Ellison
> > Cc: r-help@r-project.org
> > Subject: Re: [R] Count number of consecutive zeros by group
> >
> > If I apply your function to my test data:
> >
> > ID <- c(1,1,1,2,2,3,3,3,3)
> > x <- c(1,0,0,0,0,1,1,0,1)
> > data <- data.frame(ID=ID,x=x)
> > rm(ID,x)
> >
> > f2 <-   function(x) {
> >   max( rle(x == 0)$lengths )
> > }
> > with(data, tapply(x, ID, f2))
> >
> > the result is
> > 1 2 3
> > 2 2 2
> >
> > which is not what I'm aiming for. It should be
> > 1 2 3
> > 2 2 1
> >
> > I think f2 does not return the max of consecutive zeros, but the max
> > of any consecutive number... Any idea how to fix this?
> >
> >
> > 2013/10/31 S Ellison 
> >
> > >
> > >
> > > > -Original Message-
> > > > So I want to get the max number of consecutive zeros of variable
> x
> > > > for
> > > each
> > > > ID. I found rle() to be helpful for this task; so I did:
> > > >
> > > > FUN <- function(x) {
> > > >   rles <- rle(x == 0)
> > > > }
> > > > consec <- lapply(split(df[,2],df[,1]), FUN)
> > >
> > > You're probably better off with tapply and a function that returns
> > > what you want. You're probably also better off with a data frame
> > > name that isn't a function name, so I'll use dfr instead of df...
> > >
> > > dfr<- data.frame(x=rpois(500, 1.5), ID=gl(5,100)) #5 ID groups
> > > numbered 1-5, equal size but that doesn't matter for tapply
> > >
> > > f2 <-   function(x) {
> > >         max( rle(x == 0)$lengths )
> > > }
> > > with(dfr, tapply(x, ID, f2))
> > >
> > >
> > > S Ellison
> > >
> > >
> > > ***
> > > This email and any attachments are confidential. Any
> > > u...{{dropped:24}}
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html and provide commented, minimal, self-contained,
> > reproducible code.
> 
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lattice Legend/Key by row instead of by column

2013-11-01 Thread Richard Kwock
Hi Duncan,

Thanks for that template. Not quite the solution I was hoping for, but
that works!

Richard

On Thu, Oct 31, 2013 at 3:47 PM, Duncan Mackay  wrote:
> Hi Richard
>
> If you cannot get a better suggestion this example from Deepayan Sarkar may
> help.
> It is way back in the archives and I do not have a reference for it.
>
> I have used it about a year ago as a template to do a complicated key
>
> fl <- grid.layout(nrow = 2, ncol = 6,
>   heights = unit(rep(1, 2), "lines"),
>   widths = unit(c(2, 1, 2, 1, 2, 1),
>
> c("cm","strwidth","cm","strwidth","cm","strwidth"),
>   data = list(NULL,"John",NULL,"George",NULL,"The
> Beatles")))
>
> foo <- frameGrob(layout = fl)
> foo <- placeGrob(foo,
>  pointsGrob(.5, .5, pch=19,
> gp = gpar(col="red", cex=0.5)),
>  row = 1, col = 1)
> foo <- placeGrob(foo,
>  linesGrob(c(0.2, 0.8), c(.5, .5),
>gp = gpar(col="blue")),
>  row = 2, col = 1)
> foo <- placeGrob(foo,
>  linesGrob(c(0.2, 0.8), c(.5, .5),
>gp = gpar(col="green")),
>  row = 1, col = 3)
> foo <- placeGrob(foo,
>  linesGrob(c(0.2, 0.8), c(.5, .5),
>gp = gpar(col="orange")),
>  row = 2, col = 3)
> foo <- placeGrob(foo,
>  rectGrob(width = 0.6,
>   gp = gpar(col="#CC",
>   fill = "#CC")),
>  row = 1, col = 5)
> foo <- placeGrob(foo,
>  textGrob(lab = "John"),
>  row = 1, col = 2)
> foo <- placeGrob(foo,
>  textGrob(lab = "Paul"),
>  row = 2, col = 2)
> foo <- placeGrob(foo,
>  textGrob(lab = "George"),
>  row = 1, col = 4)
> foo <- placeGrob(foo,
>  textGrob(lab = "Ringo"),
>  row = 2, col = 4)
> foo <- placeGrob(foo,
>  textGrob(lab = "The Beatles"),
>  row = 1, col = 6)
>
> xyplot(1 ~ 1, legend = list(top = list(fun = foo)))
>
> In my case I changed  "strwidth" to "cm" for the text as I was cramped for
> space
>
> HTH
>
> Duncan
>
> Duncan Mackay
> Department of Agronomy and Soil Science
> University of New England
> Armidale NSW 2351
> Email: home: mac...@northnet.com.au
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Richard Kwock
> Sent: Friday, 1 November 2013 06:42
> To: R help
> Subject: [R] Lattice Legend/Key by row instead of by column
>
> Hi All,
>
> I am having some trouble getting lattice to display the legend names by row
> instead of by column (default).
>
> Example:
>
> library(lattice)
> set.seed(456846)
> data <- matrix(c(1:10) + runif(50), ncol = 5, nrow = 10)
> dataset <- data.frame(data = as.vector(data), group = rep(1:5, each = 10),
> time = 1:10)
>
> xyplot(data ~ time, group = group, dataset, t = "l",
>   key = list(text = list(paste("group", unique(dataset$group)) ),
> lines = list(col = trellis.par.get()$superpose.symbol$col[1:5]),
> columns = 4
>   )
> )
>
> What I'm hoping for are 4 columns in the legend, like this:
> Legend row 1: "group 1", "group 2", "group 3", "group 4"
> Legend row 2: "group 5"
>
> However, I'm getting:
> Legend row 1: "group 1", "group 3", "group 5"
> Legend row 2: "group 2", "group 4"
>
> I can see how this might work if I include blanks/NULLs in the legend as
> placeholders, but that might get messy in data sets with many groups.
>
> Any ideas on how to get around this?
>
> Thanks,
> Richard
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-01 Thread William Dunlap
Have you looked into the 'igraph' package?

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
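
[For illustration, a hedged sketch of the igraph route, assuming a recent igraph (older versions name these graph.data.frame() and clusters()): each equivalence set is a connected component of the pair graph, and the "main" entry is the alphabetically smallest member of each component.]

```r
library(igraph)

pairs <- data.frame(a = c("A", "B", "D"), b = c("B", "C", "E"))

## build an undirected graph from the string pairs; equivalence sets
## are its connected components
g  <- graph_from_data_frame(pairs, directed = FALSE)
cl <- components(g)

## map every string to the smallest member of its component,
## matching the "main entry" convention in the original post
main <- tapply(names(cl$membership), cl$membership, min)
setNames(main[as.character(cl$membership)], names(cl$membership))
## e.g. A, B, C all map to "A"; D, E map to "D"
```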


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf
> Of Magnus Thor Torfason
> Sent: Friday, November 01, 2013 8:23 AM
> To: r-help@r-project.org
> Subject: Re: [R] Inserting 17M entries into env took 18h, inserting 34M 
> entries taking 5+
> days
> 
> Sure,
> 
> I was attempting to be concise and boiling it down to what I saw as the
> root issue, but you are right, I could have taken it a step further. So
> here goes.
> 
> I have a set of around 20M string pairs. A given string (say, A)
> can either be equivalent to another string (B) or not. If A and B occur
> together in the same pair, they are equivalent. But equivalence is
> transitive, so if A and B occur together in one pair, and A and C occur
> together in another pair, then A and C are also equivalent. I need a way
> to quickly determine if any two strings from my data set are equivalent
> or not.
> 
> The way I do this currently is to designate the smallest
> (alphabetically) string in each known equivalence set as the "main"
> entry. For each pair, I therefore insert two entries into the hash
> table, both pointing at the main value. So assuming the input data:
> 
> A,B
> B,C
> D,E
> 
> I would then have:
> 
> A->A
> B->A
> C->B
> D->D
> E->D
> 
> Except that I also follow each chain until I reach the end (key==value),
> and insert pointers to the "main" value for every value I find along the
> way. After doing that, I end up with:
> 
> A->A
> B->A
> C->A
> D->D
> E->D
> 
> And I can very quickly check equivalence, either by comparing the hash
> of two strings, or simply by transforming each string into its hash, and
> then I can use simple comparison from then on. The code for generating
> the final hash table is as follows:
> 
> h : Empty hash table created with hash.new()
> d : Input data
> hash.deep.get : Function that iterates through the hash table until it
> finds a key whose value is equal to itself (until hash.get(X)==X), then
> returns all the values in a vector
> 
> 
> h = hash.new()
> for ( i in 1:nrow(d) )
> {
>  deep.a  = hash.deep.get(h, d$a[i])
>  deep.b  = hash.deep.get(h, d$b[i])
>  equivalents = sort(unique(c(deep.a,deep.b)))
>  equiv.id= min(equivalents)
>  for ( equivalent in equivalents )
>  {
>  hash.put(h, equivalent, equiv.id)
>  }
> }
> 
> 
> I would so much appreciate if there was a simpler and faster way to do
> this. Keeping my fingers crossed that one of the R-help geniuses who
> sees this is sufficiently interested to crack the problem
> 
> Best,
> Magnus
> 
> On 11/1/2013 1:49 PM, jim holtman wrote:
> > It would be nice if you followed the posting guidelines and at least
> > showed the script that was creating your entries now so that we
> > understand the problem you are trying to solve.  A bit more
> > explanation of why you want this would be useful.  This gets to the
> > second part of my tag line:  Tell me what you want to do, not how you
> > want to do it.  There may be other solutions to your problem.
> >
> > Jim Holtman
> > Data Munger Guru
> >
> > What is the problem that you are trying to solve?
> > Tell me what you want to do, not how you want to do it.
> >
> >
> > On Fri, Nov 1, 2013 at 9:32 AM, Magnus Thor Torfason
> >  wrote:
> >> Pretty much what the subject says:
> >>
> >> I used an env as the basis for a Hashtable in R, based on information that
> >> this is in fact the way environments are implemented under the hood.
> >>
> >> I've been experimenting with doubling the number of entries, and so far it
> >> has seemed to be scaling more or less linearly, as expected.
> >>
> >> But as I went from 17 million entries to 34 million entries, the completion
> >> time has gone from 18 hours, to 5 days and counting.
> >>
> >>
> >> The keys and values are in all cases strings of equal length.
> >>
> >> One might suspect that the slow-down might have to do with the memory being
> >> swapped to disk, but from what I know about my computing environment, that
> >> should not be the case.
> >>
> >> So my first question:
> >> Is anyone familiar with anything in the implementation of environments that
> >> would limit their use or slow them down (faster than O(n log n)) as the
> >> number of entries is increased?
> >>
> >> And my second question:
> >> I realize that this is not strictly what R environments were designed for,
> >> but this is what my algorithm requires: I must go through these millions of
> >> entries, storing them in the hash table and sometimes retrieving them along
> >> the way, in a more or less random manner, which is contingent on the data I
> >> am encountering, and on the contents of the hash table at each moment.
> >>
> >> Does anyone have a good recommendation for alternatives to implement huge,
> >> fast, table-like structures in R?
> >>
> >> Best,
> >> Magnus
> >>
> >> 

[R] find max value in each row and return column number and column name

2013-11-01 Thread Gary Dong
Dear R users,

I wonder how I can use R to identify the max value of each row, along with the
column number and column name:

For example:

a <- data.frame(x = rnorm(4), y = rnorm(4), z = rnorm(4))

> a
           x          y          z
1 -0.7289964  0.2194702 -2.4674780
2  1.0889353  0.3167629 -0.9208548
3 -0.6374692 -1.7249049  0.6567313
4 -0.1348642  0.4507473 -1.7309010

In this data frame, I compare y and z only.

What I need:

           x          y          z       max max.col.num max.col.name
1 -0.7289964  0.2194702 -2.4674780 0.2194702           2            y
2  1.0889353  0.3167629 -0.9208548 0.3167629           2            y
3 -0.6374692 -1.7249049  0.6567313 0.6567313           3            z
4 -0.1348642  0.4507473 -1.7309010 0.4507473           2            y


Any suggestion will be greatly appreciated!

Thank you!

Gary

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] forecast.lm() and NEWDATA

2013-11-01 Thread David Winsemius

On Nov 1, 2013, at 6:50 AM, Ryan wrote:

> Good day all.
> 
> I am hoping you can help me (and I did this right). I've been working in R 
> for a week now, and have encountered a problem with forecast.lm().
> 
> I have a list of 12 variables, all type = double, with 15 data entries.
> (I imported them from tab delimited text files, and then formatted as.numeric 
> to change from list to double)
> (I understand that this leaves me rather limited in my degrees of freedom, 
> but working with what I have, sadly. )
> 
> I have a LM model, such that
> REGGY = lm(formula=Y~A,B,C,...,I,J)

This looks wrong. Separating independent predictors with commas would be highly 
unusual.
> which I am happy with.
> 
> I have
> NEWDATA = data.frame(A+B+C+D+I+J)

This also looks wrong. Separating arguments to data.frame with "+"-signs is 
surely wrong.
> 
> When i try to run
> 
> forecast.lm(REGGY, h=5)
> 
> i receive the following error
> "Error in as.data.frame(newdata) :
>  argument "newdata" is missing, with no default"

If your code prior to calling forecast on the REGGY-object was really what you 
showed here, I am not surprised. You should post the output of str() on the 
data-object that holds the 12 variables and, if it was modified, the data 
argument passed to `lm()` when you made REGGY. (Beginners should name their data 
arguments.)

> 
> When I run
> forecast.lm(REGGY, NEWDATA, h=5)
> I receive the confidence intervals of the 15 data entries I already possess. 
> I understand that by including NEWDATA, the "h=5" is ignored, but without 
> NEWDATA, I receive the error message.
> 
> Can anyone help me please?
> 
> Regards
> Ryan
> 
> P.S The forecast is trying to predict the next 5 values for Y from the 
> regression model pasted above. I'm a bit rusty with regressions, but I think 
> I've covered my bases as well as I can, and from what I understand of the R 
> code, I'm following the right steps.

Not if what you posted here was your code. I think you missed a few crucial 
points about R syntax.
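For reference, a minimal runnable sketch of the syntax being described here, using hypothetical predictors A, B, C and base R's predict() in place of forecast.lm() (which accepts the same kind of newdata frame):

```r
# Hypothetical data: 15 observations of a response Y and three predictors
set.seed(42)
dat <- data.frame(Y = rnorm(15), A = rnorm(15), B = rnorm(15), C = rnorm(15))

# Predictors are joined with '+' in the formula, not commas
REGGY <- lm(Y ~ A + B + C, data = dat)

# newdata is a data.frame whose columns are named like the predictors
NEWDATA <- data.frame(A = rnorm(5), B = rnorm(5), C = rnorm(5))
pred <- predict(REGGY, newdata = NEWDATA, interval = "prediction")
nrow(pred)  # 5 forecasts, one per new row
```

With a properly named data argument, forecast(REGGY, newdata = NEWDATA) from the forecast package should behave analogously.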
> 

-- 

David Winsemius
Alameda, CA, USA



Re: [R] find max value in each row and return column number and column name

2013-11-01 Thread Clint Bowman

?which.max should start you down the right path
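A minimal illustration of that hint over all three columns (the poster only wants y and z compared, which would mean applying the same idea to a[c("y", "z")]):

```r
# Stand-in data (rounded versions of the values in the question)
a <- data.frame(x = c(-0.73, 1.09, -0.64, -0.13),
                y = c( 0.22, 0.32, -1.72,  0.45),
                z = c(-2.47, -0.92,  0.66, -1.73))

idx <- apply(a, 1, which.max)      # column index of each row's maximum
res <- data.frame(max = apply(a, 1, max),
                  max.col.num = idx,
                  max.col.name = names(a)[idx])
res$max.col.num   # 2 1 3 2: x wins in row 2, z in row 3, y elsewhere
```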

Clint BowmanINTERNET:   cl...@ecy.wa.gov
Air Quality Modeler INTERNET:   cl...@math.utah.edu
Department of Ecology   VOICE:  (360) 407-6815
PO Box 47600FAX:(360) 407-7534
Olympia, WA 98504-7600

USPS:   PO Box 47600, Olympia, WA 98504-7600
Parcels:300 Desmond Drive, Lacey, WA 98503-1274

On Fri, 1 Nov 2013, Gary Dong wrote:


Dear R users,

I wonder how I can use R to identify the max value of each row, the column
number column name:

For example:

a <- data.frame(x = rnorm(4), y = rnorm(4), z = rnorm(4))


> a
           x          y          z
1 -0.7289964  0.2194702 -2.4674780
2  1.0889353  0.3167629 -0.9208548
3 -0.6374692 -1.7249049  0.6567313
4 -0.1348642  0.4507473 -1.7309010

In this data frame, I compare y and z only.

What I need:

           x          y          z       max max.col.num max.col.name
1 -0.7289964  0.2194702 -2.4674780 0.2194702           2            y
2  1.0889353  0.3167629 -0.9208548 0.3167629           2            y
3 -0.6374692 -1.7249049  0.6567313 0.6567313           3            z
4 -0.1348642  0.4507473 -1.7309010 0.4507473           2            y


Any suggestion will be greatly appreciated!

Thank you!

Gary

[[alternative HTML version deleted]]



Re: [R] find max value in each row and return column number and column name

2013-11-01 Thread arun
Hi,
Try:

cbind(a,
      do.call(rbind,
              apply(a, 1, function(x) {
                data.frame(max = max(x),
                           max.col.num = which.max(x),
                           max.col.name = names(a)[which.max(x)],
                           stringsAsFactors = FALSE)
              })))   ## assuming a unique max in each row
A.K.


On Friday, November 1, 2013 1:05 PM, Gary Dong  wrote:
Dear R users,

I wonder how I can use R to identify the max value of each row, along with its
column number and column name:

For example:

a <- data.frame(x = rnorm(4), y = rnorm(4), z = rnorm(4))

> a
           x          y          z
1 -0.7289964  0.2194702 -2.4674780
2  1.0889353  0.3167629 -0.9208548
3 -0.6374692 -1.7249049  0.6567313
4 -0.1348642  0.4507473 -1.7309010

In this data frame, I compare y and z only.

What I need:

            x          y          z       max max.col.num max.col.name
1  -0.7289964  0.2194702 -2.4674780 0.2194702           2            y
2   1.0889353  0.3167629 -0.9208548 0.3167629           2            y
3  -0.6374692 -1.7249049  0.6567313 0.6567313           3            z
4  -0.1348642  0.4507473 -1.7309010 0.4507473           2            y


Any suggestion will be greatly appreciated!

Thank you!

Gary

    [[alternative HTML version deleted]]



Re: [R] Extracting values from a ecdf (empirical cumulative distribution function) curve

2013-11-01 Thread Manoranjan Muthusamy
Yeah, now it works. Thanks a lot, William and everyone who helped me. This
forum is really helpful for beginners like me. :)

Mano.


On Fri, Nov 1, 2013 at 3:54 PM, William Dunlap  wrote:

>  You are not using the inv_ecdf function that Rui sent.  His was
>
>inv_ecdf_orig <-
>
>function (f) 
>
>{
>
>x <- environment(f)$x
>
>y <- environment(f)$y
>
>approxfun(y, x)
>
>}
>
> (There is no 'xnew' in the environment of f.)
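A quick check (a sketch, assuming a current version of base R) that the ecdf closure stores its sorted data under the name `x`, whatever the original variable was called:

```r
xnew <- rlnorm(400, meanlog = 9.73, sdlog = 2.04)
f <- ecdf(xnew)
"x" %in% ls(environment(f))   # TRUE: the data live in the closure as 'x'

inv_ecdf <- function(f) {
  x <- environment(f)$x       # not 'xnew': that name does not exist here
  y <- environment(f)$y
  approxfun(y, x)
}
g <- inv_ecdf(f)
g(0.5)                        # interpolated median, now a finite number
```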
>
>
> Bill Dunlap
>
> Spotfire, TIBCO Software
>
> wdunlap tibco.com
>
>
> *From:* Manoranjan Muthusamy [mailto:ranjanmano...@gmail.com]
> *Sent:* Friday, November 01, 2013 4:38 AM
> *To:* William Dunlap; dulca...@bigpond.com
> *Cc:* Rui Barradas; r-help@r-project.org
>
> *Subject:* Re: [R] Extracting values from a ecdf (empirical cumulative
> distribution function) curve
>
>
> Thanks, Bill & Duncan. Actually I tried values which are inside the
> defined region. please find below the extracted script
>
>
> > xnew<-rlnorm(seq(0,400,1), meanlog=9.7280055, sdlog=2.0443945)
>
> > f <- ecdf(xnew)
>
> > y <- f(x)
>
> > y1<-f(200)   ## finding y for a given xnew value of 200
>
> > y1
>
> [1] 0.9950125   ## It works.
>
>
> > inv_ecdf <- function(f){
>
> + xnew <- environment(f)$xnew
>
> + y <- environment(f)$y
>
> + approxfun(y, xnew)
>
> + }
>
> ## Interpolation to find xnew for a known y value.
>
>
> > g <- inv_ecdf(f)
>
> > g(0.9950125)
>
> [1] NA
>
> > g(0.99)  ## It doesn't
>
> [1] NA
>
> > g(0.5)
>
> [1] NA ## again
>
> > g(0.2)
>
> [1] NA ## and again
>
>
>
> I am stuck here. Any help is appreciated.
>
> Mano.
>
>
> On Fri, Nov 1, 2013 at 2:48 AM, William Dunlap wrote:
>
> > it gives 'NA' (for whatever y value).
>
> What 'y' values were you using?  inv_f maps probabilities (in [0,1]) to
> values in the range of the original data, x, but it will have problems for
> a probability below 1/length(x) because the original data didn't tell
> you anything about the ecdf in that region.
>
>> X <- c(101, 103, 107, 111)
>> f <- ecdf(X)
>> inv_f <- inv_ecdf(f)
>> inv_f(seq(0, 1, by=1/8))
>[1]  NA  NA 101 102 103 105 107 109 111
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf
> > Of Manoranjan Muthusamy
> > Sent: Thursday, October 31, 2013 6:18 PM
> > To: Rui Barradas
> > Cc: r-help@r-project.org
> > Subject: Re: [R] Extracting values from a ecdf (empirical cumulative
> distribution function)
> > curve
> >
> > Thank you, Barradas. It works when finding y, but when I tried to find x
> > using interpolation for a known y it gives 'NA' (for whatever y value). I
> > couldn't find out the reason. Any help is really appreciated.
> >
> > Thanks,
> > Mano
> >
> >
> > On Thu, Oct 31, 2013 at 10:53 PM, Rui Barradas 
> wrote:
> >
> > > Hello,
> > >
> > > As for the problem of finding y given the ecdf and x, it's very easy,
> just
> > > use the ecdf:
> > >
> > > f <- ecdf(rnorm(100))
> > >
> > > x <- rnorm(10)
> > > y <- f(x)
> > >
> > > If you want to get the x corresponding to given y, use linear
> > > interpolation.
> > >
> > > inv_ecdf <- function(f){
> > > x <- environment(f)$x
> > > y <- environment(f)$y
> > > approxfun(y, x)
> > > }
> > >
> > > g <- inv_ecdf(f)
> > > g(0.5)
> > >
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > > Em 31-10-2013 12:25, Manoranjan Muthusamy escreveu:
> > >
> > >> Hi R users,
> > >>
> > >> I am a new user, still learning basics of R. Is there anyway to
> extract y
> > >> (or x) value for a known x (or y) value from ecdf (empirical
> cumulative
> > >> distribution function) curve?
> > >>
> > >> Thanks in advance.
> > >> Mano.
> > >>
> > >> [[alternative HTML version deleted]]
> > >>
>
> > >>
> > >>
> >
> >   [[alternative HTML version deleted]]
> >
>
>

[[alternative HTML version deleted]]


Re: [R] find max value in each row and return column number and column name

2013-11-01 Thread David Winsemius

On Nov 1, 2013, at 10:03 AM, Gary Dong wrote:

> Dear R users,
> 
> I wonder how I can use R to identify the max value of each row, the column
> number column name:
> 
> For example:
> 
> a <- data.frame(x = rnorm(4), y = rnorm(4), z = rnorm(4))
> 
>> a
>            x          y          z
> 1 -0.7289964  0.2194702 -2.4674780
> 2  1.0889353  0.3167629 -0.9208548
> 3 -0.6374692 -1.7249049  0.6567313
> 4 -0.1348642  0.4507473 -1.7309010
> 
> In this data frame, I compare y and z only.
> 
> What I need:
> 
>            x          y          z       max max.col.num max.col.name
> 1 -0.7289964  0.2194702 -2.4674780 0.2194702           2            y
> 2  1.0889353  0.3167629 -0.9208548 0.3167629           2            y
> 3 -0.6374692 -1.7249049  0.6567313 0.6567313           3            z
> 4 -0.1348642  0.4507473 -1.7309010 0.4507473           2            y
> 
> 
> Any suggestion will be greatly appreciated!

cbind(a, max = apply(a, 1, max),
      max.col.num  = apply(a, 1, which.max),
      max.col.name = names(a)[apply(a, 1, which.max)])
> 
> Thank you!
> 
> Gary
> 
>   [[alternative HTML version deleted]]

You can express your appreciation by posting in plain-text in the future.

> 

David Winsemius
Alameda, CA, USA



[R] extraction of roots in R

2013-11-01 Thread Gary Dong
Dear R users,

I wonder if R has a built-in function that I can use to extract roots.

Here is an example:

X    N
2.5  5
3.4  7
8.9  9
6.4  1
2.1  0
1.1  2

I want to calculate Y = root(X)^N, where N represents the power. what is
the easy way to do this?

Thank you!

Gary

[[alternative HTML version deleted]]



Re: [R] extraction of roots in R

2013-11-01 Thread Don McKenzie

If you just want the nth root of X, use X^(1/n)

> x <- 256
> x^(1/8)
[1] 2

> x <- -256
> x^(1/8)
[1] NaN

It appears that you get the positive real root.

Is this all you wanted?
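Applied to the posted table, that looks like the sketch below (note that the N = 0 row gives X^(1/0) = X^Inf, which is Inf for X > 1):

```r
d <- data.frame(X = c(2.5, 3.4, 8.9, 6.4, 2.1, 1.1),
                N = c(5, 7, 9, 1, 0, 2))
d$Y <- d$X^(1 / d$N)   # element-wise Nth root of X
d$Y[4]                 # 6.4: the first root is the number itself
```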


On 1-Nov-13, at 11:11 AM, Gary Dong wrote:


Dear R users,

I wonder if R has a default function that I can use to do extraction of roots.

Here is an example:

X    N
2.5  5
3.4  7
8.9  9
6.4  1
2.1  0
1.1  2

I want to calculate Y = root(X)^N, where N represents the power. What is the
easy way to do this?

Thank you!

Gary

[[alternative HTML version deleted]]







Don McKenzie
Pacific Wildland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Environmental and Forest Sciences
University of Washington

d...@uw.edu



Re: [R] extraction of roots in R

2013-11-01 Thread Don McKenzie
If you want complex roots, there is a post by Ravi Varadhan from  
2010, a reprint of which I found quickly by a google search at


http://r.789695.n4.nabble.com/finding-complex-roots-in-R-td2541514.html


On 1-Nov-13, at 11:20 AM, Don McKenzie wrote:


If you just want the nth root of X, use X^(1/n)

> x <- 256
> x^(1/8)
[1] 2

> x <- -256
> x^(1/8)
[1] NaN

It appears that you get the positive real root.

Is this all you wanted?


On 1-Nov-13, at 11:11 AM, Gary Dong wrote:


Dear R users,

I wonder if R has a default function that I can use to do extraction of roots.

Here is an example:

X    N
2.5  5
3.4  7
8.9  9
6.4  1
2.1  0
1.1  2

I want to calculate Y = root(X)^N, where N represents the power. What is the
easy way to do this?

Thank you!

Gary

[[alternative HTML version deleted]]







Don McKenzie
Pacific Wildland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Environmental and Forest Sciences
University of Washington

d...@uw.edu







Don McKenzie
Pacific Wildland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Environmental and Forest Sciences
University of Washington

d...@uw.edu



Re: [R] spsurvey analysis

2013-11-01 Thread Law, Jason
I use the spsurvey package a decent amount.  The cont.cdftest function bins the 
cdf in order to perform the test, which I think is the root of the problem.  
Unfortunately, the default is 3, which is the minimum number of bins.

I would contact Tom Kincaid or Tony Olsen at NHEERL WED directly to ask about 
this problem.

Another option would be to take a different analytical approach (e.g., a mixed 
effects model) which would allow you a lot more flexibility.
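As one hedged sketch of such an alternative (not the spsurvey method, and it ignores the survey design weights, so it is only a rough screen): a negative binomial GLM on dummy counts shaped like the original post, testing the year effect.

```r
library(MASS)  # for glm.nb

set.seed(1)
# Dummy counts like the original post: heavily zero-inflated (size = 0.06)
dat <- data.frame(grp = rep(c("Yr1", "Yr2"), each = 100),
                  count = c(rnbinom(100, size = 0.06, mu = 19.87),
                            rnbinom(100, size = 0.06, mu = 20.15)))

fit <- glm.nb(count ~ grp, data = dat)
summary(fit)$coefficients["grpYr2", ]  # Wald test of Yr2 vs Yr1
```

A mixed-effects extension (e.g. glmmTMB with a polygon random effect) would be the next step for the real, clustered data.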

Jason Law
Statistician
City of Portland
Bureau of Environmental Services
Water Pollution Control Laboratory
6543 N Burlington Avenue
Portland, OR 97203-5452
503-823-1038
jason@portlandoregon.gov


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Tim Howard
Sent: Friday, November 01, 2013 7:49 AM
To: r-help@r-project.org
Subject: [R] spsurvey analysis

All,
I've used the excellent package, spsurvey, to create spatially balanced samples 
many times in the past. I'm now attempting to use the analysis portion of the 
package, which compares CDFs among sub-populations to test for differences in 
sub-population metrics. 
 
- My data (count data) have many zeros, following a negative binomial or even 
zero-inflated negative binomial distribution.
- Samples are within polygons of varying sizes
- I want to test whether a sample at time 1 is different from a sample at time 
2. Essentially the same sample areas and number of samples.

The problem:
- cont.cdftest  throws a warning and does not complete for most (but not all) 
species sampled. Warning message: "The combined number of values in at least 
one class is less than five. Action: The user should consider using a smaller 
number of classes."

- There are plenty of samples in my two time periods (the dummy set below: 
Yr1=27, Yr2=31 non-zero values). 
 
My Question:
Why is it throwing this error and is there a way to get around it?



Reproducible example (change the path to the spsurvey sample data); requires 
spsurvey to generate the sample points:

### R code tweaked from vignettes 'Area_Design' and 'Area_Analysis'
library(spsurvey)
### Analysis set up
setwd("C:/Program Files/R/R-3.0.2/library/spsurvey/doc")
att <- read.dbf("UT_ecoregions")
shp <- read.shape("UT_ecoregions")

set.seed(4447864)

# Create the design list
Stratdsgn <- list("Central Basin and Range"=list(panel=c(PanelOne=25), 
seltype="Equal"),
  "Colorado Plateaus"=list(panel=c(PanelOne=25), 
seltype="Equal"),
  "Mojave Basin and Range"=list(panel=c(PanelOne=10), 
seltype="Equal"),
  "Northern Basin and Range"=list(panel=c(PanelOne=10), 
seltype="Equal"),
  "Southern Rockies"=list(panel=c(PanelOne=14), 
seltype="Equal"),
  "Wasatch and Uinta Mountains"=list(panel=c(PanelOne=10), 
seltype="Equal"),
  "Wyoming Basin"=list(panel=c(PanelOne=6), seltype="Equal"))

# Select the sample design for each year
Stratsites_Yr1 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
   type.frame="area", src.frame="sp.object",
   sp.object=shp, att.frame=att, stratum="Level3_Nam", 
shapefile=FALSE)

Stratsites_Yr2 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
   type.frame="area", src.frame="sp.object",
   sp.object=shp, att.frame=att, stratum="Level3_Nam", 
shapefile=FALSE)

#extract the core information, add year as a grouping variable, add a plot ID 
to link with dummy data
Yr1 <- cbind(pltID = 1001:1100, Stratsites_Yr1@data[,c(1,2,3,5)], grp = "Yr1")
Yr2 <- cbind(pltID = 2001:2100, Stratsites_Yr2@data[,c(1,2,3,5)], grp = "Yr2")  
   
sitedat <- rbind(Yr1, Yr2)

# create dummy sampling data. Lots of zeros!
bn.a <- rnbinom(size = 0.06, mu = 19.87, n = 100)
bn.b <- rnbinom(size = 0.06, mu = 20.15, n = 100)
dat.a <- data.frame(pltID = 1001:1100, grp = "Yr1", count = bn.a)
dat.b <- data.frame(pltID = 2001:2100, grp = "Yr2", count = bn.b)
dat <- rbind(dat.a, dat.b)


## Analysis begins here

data.cont <- data.frame(siteID = dat$pltID, Density = dat$count)
sites <- data.frame(siteID = dat$pltID, Use = rep(TRUE, nrow(dat)))
subpop <- data.frame(siteID = dat$pltID,
                     All_years = rep("allYears", nrow(dat)),
                     Year = dat$grp)
design <- data.frame(siteID = sitedat$pltID,
wgt = sitedat$wgt,
xcoord = sitedat$xcoord,
ycoord = sitedat$ycoord)
framesize <- c("Yr1"=888081202000, "Yr2"=888081202000)

## There seem to be pretty good estimates
CDF_Estimates <- cont.analysis(sites, subpop, design, data.cont,
                               popsize = list(All_years = sum(framesize),
                                              Year = as.list(framesize)))

print(CDF_Estimates$Pct)

## this test fails
CDF_Tests <- cont.cdftest(sites, subpop[,c(1,3)], design, data.cont,
   popsize=list(Year=as.list(framesize)))
warn

Re: [R] spsurvey analysis

2013-11-01 Thread Tim Howard
Jason,
Thank you for your reply. Interesting ... so you think the 'classes' in the 
error message "The combined number of values in at least one class..."  is 
referring to the CDF bins rather than the sub-population classes that I 
defined. 
 
That makes sense as I only defined two classes (!). I was worried it was 
detecting and treating polygons as classes, somehow (ecoregions in example 
below).
 
I had already reached out to Kincaid and Olsen but had not received an answer 
yet so I moved on to R-help.   I'll go back to them. 
 
Thanks again.
Best, 
Tim

>>> "Law, Jason"  11/1/2013 2:47 PM >>>
I use the spsurvey package a decent amount.  The cont.cdftest function bins the 
cdf in order to perform the test, which I think is the root of the problem.  
Unfortunately, the default is 3, which is the minimum number of bins.

I would contact Tom Kincaid or Tony Olsen at NHEERL WED directly to ask about 
this problem.

Another option would be to take a different analytical approach (e.g., a mixed 
effects model) which would allow you a lot more flexibility.

Jason Law
Statistician
City of Portland
Bureau of Environmental Services
Water Pollution Control Laboratory
6543 N Burlington Avenue
Portland, OR 97203-5452
503-823-1038
jason@portlandoregon.gov


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Tim Howard
Sent: Friday, November 01, 2013 7:49 AM
To: r-help@r-project.org
Subject: [R] spsurvey analysis

All,
I've used the excellent package, spsurvey, to create spatially balanced samples 
many times in the past. I'm now attempting to use the analysis portion of the 
package, which compares CDFs among sub-populations to test for differences in 
sub-population metrics. 

- My data (count data) have many zeros, following a negative binomial or even 
zero-inflated negative binomial distribution.
- Samples are within polygons of varying sizes
- I want to test whether a sample at time 1 is different from a sample at time 
2. Essentially the same sample areas and number of samples.

The problem:
- cont.cdftest  throws a warning and does not complete for most (but not all) 
species sampled. Warning message: "The combined number of values in at least 
one class is less than five. Action: The user should consider using a smaller 
number of classes."

- There are plenty of samples in my two time periods (the dummy set below: 
Yr1=27, Yr2=31 non-zero values). 

My Question:
Why is it throwing this error and is there a way to get around it?



Reproducible example (change the path to the spsurvey sample data); requires 
spsurvey to generate the sample points:

### R code tweaked from vignettes 'Area_Design' and 'Area_Analysis'
library(spsurvey)
### Analysis set up
setwd("C:/Program Files/R/R-3.0.2/library/spsurvey/doc")
att <- read.dbf("UT_ecoregions")
shp <- read.shape("UT_ecoregions")

set.seed(4447864)

# Create the design list
Stratdsgn <- list("Central Basin and Range"=list(panel=c(PanelOne=25), 
seltype="Equal"),
  "Colorado Plateaus"=list(panel=c(PanelOne=25), 
seltype="Equal"),
  "Mojave Basin and Range"=list(panel=c(PanelOne=10), 
seltype="Equal"),
  "Northern Basin and Range"=list(panel=c(PanelOne=10), 
seltype="Equal"),
  "Southern Rockies"=list(panel=c(PanelOne=14), 
seltype="Equal"),
  "Wasatch and Uinta Mountains"=list(panel=c(PanelOne=10), 
seltype="Equal"),
  "Wyoming Basin"=list(panel=c(PanelOne=6), seltype="Equal"))

# Select the sample design for each year
Stratsites_Yr1 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
   type.frame="area", src.frame="sp.object",
   sp.object=shp, att.frame=att, stratum="Level3_Nam", 
shapefile=FALSE)

Stratsites_Yr2 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
   type.frame="area", src.frame="sp.object",
   sp.object=shp, att.frame=att, stratum="Level3_Nam", 
shapefile=FALSE)

#extract the core information, add year as a grouping variable, add a plot ID 
to link with dummy data
Yr1 <- cbind(pltID = 1001:1100, Stratsites_Yr1@data[,c(1,2,3,5)], grp = "Yr1")
Yr2 <- cbind(pltID = 2001:2100, Stratsites_Yr2@data[,c(1,2,3,5)], grp = "Yr2")  
   
sitedat <- rbind(Yr1, Yr2)

# create dummy sampling data. Lots of zeros!
bn.a <- rnbinom(size = 0.06, mu = 19.87, n = 100)
bn.b <- rnbinom(size = 0.06, mu = 20.15, n = 100)
dat.a <- data.frame(pltID = 1001:1100, grp = "Yr1", count = bn.a)
dat.b <- data.frame(pltID = 2001:2100, grp = "Yr2", count = bn.b)
dat <- rbind(dat.a, dat.b)


## Analysis begins here

data.cont <- data.frame(siteID = dat$pltID, Density = dat$count)
sites <- data.frame(siteID = dat$pltID, Use = rep(TRUE, nrow(dat)))
subpop <- data.frame(siteID = dat$pltID,
                     All_years = rep("allYears", nrow(dat)),
                     Year = dat$grp)
design <- data.frame(siteID = sitedat$pltID,
                     wgt 

Re: [R] mapping data to a geographic map of Europe

2013-11-01 Thread Adams, Jean
Claudia,

I have not worked through the example myself.  Since you seem to be getting
errors, perhaps a different example would help.  Here are some more
choropleth maps (although these use US states rather than European
countries).

http://blog.revolutionanalytics.com/2009/11/choropleth-challenge-result.html

Jean
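For what it is worth, the "undefined column selected" error in this thread is consistent with merging on a column that does not exist yet: in the r-bloggers example, `eurMapDf` is created from the shapefile first (via ggplot2's fortify(), which adds the `id` column), and only then merged. A toy, runnable sketch of just the merge step, with made-up stand-ins for the map and education tables:

```r
# Stand-ins: fortify() output carries an 'id' column; Eurostat tables use 'GEO'
mapDf  <- data.frame(id = c("AT", "BE", "DE"), long = 1:3, lat = 4:6)
eurEdu <- data.frame(GEO = c("AT", "BE", "DE"), value = c(10, 20, 30))

eurEduMapDf <- merge(mapDf, eurEdu, by.x = "id", by.y = "GEO")
nrow(eurEduMapDf)   # 3: one row per matched country
```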



On Fri, Nov 1, 2013 at 12:04 PM,  wrote:

> Hi Jean,
> thanks again for your response. As I told you  I did the downloads and
> double checked if I selected the right directory.
> But I noticed just now what happened:
> The command in the example is :
>
> eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/Shape/data/NUTS_RG_60M_2010")
>
> But it should be :
> eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/data/ggg/NUTS_RG_60M_2010")
>
> Because if you do the downloads and unzip these data there is no such
> thing as a "Shape" directory.
>
> Now eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/data/NUTS_RG_60M_2010")
> works.
>
>
> What happens now is that after typing:
> eurEduMapDf <- merge(eurMapDf, eurEdu, by.x="id", by.y="GEO")
> I get another error message because "eurMapDf" is unknown.
>
> So I supposed it should be:
> eurEduMapDf <- merge(eurMap, eurEdu, by.x="id", by.y="GEO")
>
> But this doesn't work either. The error message this time is: undefined
> column selected.
>
> Did I do something wrong?
>
>
>
> Best regards
>
> Claudia
>
>
>
>
>
>
>
>
> Zitat von "Adams, Jean" :
>
>  Claudia,
>>
>> You should cc r-help on all correspondence so that others can follow the
>> thread.
>>
>> In the second paragraph of the link I sent you
>> http://www.r-bloggers.com/maps-in-r-choropleth-maps/
>> a link is provided for the NUTS data,
>>  "The polygons for drawing the administrative boundaries were obtained
>> from this link. In particular, the NUTS 2010 shapefile in the 1:60 million
>> scale was downloaded and used. The other available scales would allow the
>> drawing of better defined maps, but at a computational cost. The zipped
>> file has to be extracted in a folder of choice for using it later."
>>
>> http://epp.eurostat.ec.europa.eu/portal/page/portal/gisco_
>> Geographical_information_maps/popups/references/administrative_units_
>> statistical_units_1
>>
>> If you want to follow the example, you will need to download this data to
>> your computer and then make sure that you refer to the appropriate
>> directory when using the readShapePoly() function.
>>
>> Jean
>>
>>
>>
>> On Wed, Oct 30, 2013 at 11:14 AM,  wrote:
>>
>>  Hi Jean,
>>> thank you for your advice.
>>> The page looks quite interesting and I tried the example in  GNU R. I did
>>> all the downloads.
>>> But
>>> Just in the beginnig after typing
>>>
>>> eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/Shape/data/NUTS_RG_60M_2010")
>>> I get the following error message:
>>>
>>> Error in getinfo.shape(filen) : Error opening SHP file
>>>
>>> To you have an idea what I did wrong?
>>>
>>> Thanks a lot and best regards
>>>
>>> Claudia
>>>
>>>
>>>
>>>
>>> Zitat von "Adams, Jean" :
>>>
>>>  Check out this link for some examples
>>>
 
 http://www.r-bloggers.com/maps-in-r-choropleth-maps/


 Jean


 On Tue, Oct 29, 2013 at 12:02 PM,  wrote:

  Hello,

> I would like to draw a map of Europe. Each country should be colored
> depending on how it scores in an index called GPIndex.
> Say a dark red for real bad countries a light red for those which are
> not
> so bad, light blue for the fairly good ones and so on up to the really
> good
> ones in a dark blue.
> I never worked with geographic maps before so I tried library maps but
> I
> didn't get far,- especially because all examples I found only seem to
> work
> for the United states. So I'm a bit lost.
> I would be nice if somebody could help me.
>
> Thanking you in anticipation!
>
> Best regards
>
> Claudia
>

Re: [R] mapping data to a geographic map of Europe

2013-11-01 Thread Bert Gunter
... but you may be interested in this:

http://andywoodruff.com/blog/why-are-choropleth-mercator-maps-bad-because-we-said-so/

Cheers,
Bert

On Fri, Nov 1, 2013 at 12:18 PM, Adams, Jean  wrote:
> Claudia,
>
> I have not worked through the example myself.  Since you seem to be getting
> errors, perhaps a different example would help.  Here are some more
> choropleth maps (although these use US states rather than European
> countries).
>
> http://blog.revolutionanalytics.com/2009/11/choropleth-challenge-result.html
>
> Jean
>
>
>
> On Fri, Nov 1, 2013 at 12:04 PM,  wrote:
>
>> Hi Jean,
>> thanks again for your response. As I told you  I did the downloads and
>> double checked if I selected the right directory.
>> But I noticed just now what happened:
>> The command in the example is :
>>
>> eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/Shape/data/NUTS_RG_60M_2010")
>>
>> But it should be :
>> eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/data/ggg/NUTS_RG_60M_2010")
>>
>> Because if you do the downloads and unzip these data there is no such
>> thing as a "Shape" directory.
>>
>> Now eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/data/NUTS_RG_60M_2010")
>> works.
>>
>>
>> What happens now is that after typing:
>> eurEduMapDf <- merge(eurMapDf, eurEdu, by.x="id", by.y="GEO")
>> I get another error message because "eurMapDf" is unknown.
>>
>> So I supposed it should be:
>> eurEduMapDf <- merge(eurMap, eurEdu, by.x="id", by.y="GEO")
>>
>> But this doesn't work either. The error message this time is: undefined
>> column selected.
>>
>> Did I do something wrong?
>>
>>
>>
>> Best regards
>>
>> Claudia
>>
>>
>>
>>
>>
>>
>>
>>
>> Zitat von "Adams, Jean" :
>>
>>  Claudia,
>>>
>>> You should cc r-help on all correspondence so that others can follow the
>>> thread.
>>>
>>> In the second paragraph of the link I sent you
>>> http://www.r-bloggers.com/maps-in-r-choropleth-maps/
>>> a link is provided for the NUTS data,
>>>  "The polygons for drawing the administrative boundaries were obtained
>>> from this link. In particular, the NUTS 2010 shapefile in the 1:60 million
>>> scale was downloaded and used. The other available scales would allow the
>>> drawing of better defined maps, but at a computational cost. The zipped
>>> file has to be extracted in a folder of choice for using it later."
>>>
>>> http://epp.eurostat.ec.europa.eu/portal/page/portal/gisco_
>>> Geographical_information_maps/popups/references/administrative_units_
>>> statistical_units_1
>>>
>>> If you want to follow the example, you will need to download this data to
>>> your computer and then make sure that you refer to the appropriate
>>> directory when using the readShapePoly() function.
>>>
>>> Jean
>>>
>>>
>>>
>>> On Wed, Oct 30, 2013 at 11:14 AM,  wrote:
>>>
>>>  Hi Jean,
 thank you for your advice.
 The page looks quite interesting and I tried the example in  GNU R. I did
 all the downloads.
 But
 Just in the beginnig after typing

 eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/Shape/data/NUTS_RG_60M_2010")
 I get the following error message:

 Error in getinfo.shape(filen) : Error opening SHP file

Do you have an idea what I did wrong?

 Thanks a lot and best regards

 Claudia




 Zitat von "Adams, Jean" :

  Check out this link for some examples

> http://www.r-bloggers.com/maps-in-r-choropleth-maps/
>
>
> Jean
>
>
> On Tue, Oct 29, 2013 at 12:02 PM,  wrote:
>
>  Hello,
>
>> I would like to draw a map of Europe. Each country should be colored
>> depending on how it scores in an index called GPIndex.
>> Say a dark red for real bad countries a light red for those which are
>> not
>> so bad, light blue for the fairly good ones and so on up to the really
>> good
>> ones in a dark blue.
>> I never worked with geographic maps before so I tried library maps but
>> I
>> didn't get far,- especially because all examples I found only seem to
>> work
>> for the United states. So I'm a bit lost.
>> I would be nice if somebody could help me.
>>
>> Thanking you in anticipation!
>>
>> Best regards
>>
>> Claudia
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

[R] constructing a sub-network based on time period

2013-11-01 Thread Jinie Pak
A sample of my data looks like this.

Header: Time Sender Receiver

1  1  2
1  1  3
2  2  1
2  2  1
3  1  2
3  1  2
There are 3 time periods (sessions) and the edgelists between nodes.



I tried to write the code for subsetting data (for constructing a network)
based on time  period as follows:



uniq <- unique(unlist(df$Time))
uniq          # [1] 1 2 3

t   <- list()
net <- list()
g   <- list()

for (i in 1:length(uniq)) {
  t[[i]]   <- subset(df, Time == uniq[i])
  t[[i]]   <- as.matrix(t[[i]])
  net[[i]] <- t[[i]][, -1]   # removing time column
  # getting edgelist
  net[[i]][, 1] <- as.character(net[[i]][, 1])
  net[[i]][, 2] <- as.character(net[[i]][, 2])
  g[[i]] <- graph.edgelist(net[[i]], directed = TRUE)
  g[[i]]
}



However, I got an error message ("expected two columns"). I am kind of new to
R, so it is hard to figure out. I guess t[i] is the problem.

Is there anyone who can find my logical or syntax error in the code?
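Not the poster's exact code, but a minimal self-contained sketch of the intended approach (assuming the igraph package and the Time/Sender/Receiver columns from the sample data). graph.edgelist() needs a two-column character matrix, so subsetting with drop = FALSE keeps the matrix shape even when a period has a single edge:

```r
library(igraph)

df <- data.frame(Time     = c(1, 1, 2, 2, 3, 3),
                 Sender   = c(1, 1, 2, 2, 1, 1),
                 Receiver = c(2, 3, 1, 1, 2, 2))

# one graph per time period; split() replaces the manual subset loop
g <- lapply(split(df, df$Time), function(d) {
  el <- as.matrix(d[, c("Sender", "Receiver"), drop = FALSE])
  mode(el) <- "character"       # graph.edgelist() expects character entries
  graph.edgelist(el, directed = TRUE)
})

sapply(g, ecount)               # number of edges in each period's graph
```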



Jinie

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Plot of coxph tt effects

2013-11-01 Thread cesar garcia perez de leon
Dear all, 

We are conducting a study with a set of covariates and a time-to-event 
outcome. 
Covariates b1 and b3 violate proportionality. We applied a coxph with a "tt" 
term to evaluate the nature of time dependence. 

Call: 
coxph(formula = Surv(start - 1, stop, outcome) ~ tt(b1) + 
b5 + tt(b3) + b4, data = data, tt = function(x,  t, ...) pspline(x + t/90)) 

We would like to plot the predicted effects. Following the logic of the tt 
function, does it make sense to plot the residuals against the results from the 
model with tt for the upsetting covariates? 

As far as we know there have not been updates of survfit for the tt 
function. So, we are unsure as to how to plot our curves. 
  
I would appreciate it if someone could give me advice or a publication reference 
that can help us in this matter. 

Many thanks, 

Cesar  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mapping data to a geographic map of Europe

2013-11-01 Thread paladini

Hi Jean,
thanks again for your response. As I told you  I did the downloads and  
double checked if I selected the right directory.

But I noticed right now what happened:
The command in the example is :
eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/Shape/data/NUTS_RG_60M_2010")

But it should be :
eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/data/ggg/NUTS_RG_60M_2010")

Because if you do the downloads and unzip these data there is no such  
thing as a "Shape" directory.


Now eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/data/NUTS_RG_60M_2010")
works.


What happens now is that after typing:
eurEduMapDf <- merge(eurMapDf, eurEdu, by.x="id", by.y="GEO")
I get another error message because "eurMapDf" is unknown.

So I supposed it should be:
eurEduMapDf <- merge(eurMap, eurEdu, by.x="id", by.y="GEO")

But this doesn't work either. The error message this time is: undefined  
column selected.


Did I do something wrong?


Best regards

Claudia
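For what it's worth, in the r-bloggers tutorial the missing eurMapDf is normally produced from eurMap with ggplot2's fortify() before the merge. A hedged sketch; the region name "NUTS_ID" is an assumption about the NUTS shapefile's attribute column:

```r
library(maptools)
library(ggplot2)

eurMap <- readShapePoly(fn = "NUTS_2010_60M_SH/data/NUTS_RG_60M_2010")

# flatten the SpatialPolygonsDataFrame into a plain data frame;
# the assumed 'NUTS_ID' attribute becomes the 'id' column used in the merge
eurMapDf <- fortify(eurMap, region = "NUTS_ID")

eurEduMapDf <- merge(eurMapDf, eurEdu, by.x = "id", by.y = "GEO")
```

Skipping the fortify() step would explain both errors: "eurMapDf" unknown, and "undefined column selected" when merging the spatial object directly.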








Zitat von "Adams, Jean" :


Claudia,

You should cc r-help on all correspondence so that others can follow the
thread.

In the second paragraph of the link I sent you
 http://www.r-bloggers.com/maps-in-r-choropleth-maps/
a link is provided for the NUTS data,
 "The polygons for drawing the administrative boundaries were obtained
from this link. In particular, the NUTS 2010 shapefile in the 1:60 million
scale was downloaded and used. The other available scales would allow the
drawing of better defined maps, but at a computational cost. The zipped
file has to be extracted in a folder of choice for using it later."

http://epp.eurostat.ec.europa.eu/portal/page/portal/gisco_Geographical_information_maps/popups/references/administrative_units_statistical_units_1

If you want to follow the example, you will need to download this data to
your computer and then make sure that you refer to the appropriate
directory when using the readShapePoly() function.

Jean



On Wed, Oct 30, 2013 at 11:14 AM,  wrote:


Hi Jean,
thank you for your advice.
The page looks quite interesting and I tried the example in  GNU R. I did
all the downloads.
But
Just in the beginning after typing

eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/Shape/data/NUTS_RG_60M_2010")
I get the following error message:

Error in getinfo.shape(filen) : Error opening SHP file

Do you have an idea what I did wrong?

Thanks a lot and best regards

Claudia




Zitat von "Adams, Jean" :

 Check out this link for some examples
 
http://www.r-bloggers.com/maps-in-r-choropleth-maps/


Jean


On Tue, Oct 29, 2013 at 12:02 PM,  wrote:

 Hello,

I would like to draw a map of Europe. Each country should be colored
depending on how it scores in an index called GPIndex.
Say a dark red for real bad countries a light red for those which are not
so bad, light blue for the fairly good ones and so on up to the really
good
ones in a dark blue.
I never worked with geographic maps before so I tried library maps but I
didn't get far,- especially because all examples I found only seem to
work
for the United states. So I'm a bit lost.
I would be nice if somebody could help me.

Thanking you in anticipation!

Best regards

Claudia

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mapping data to a geographic map of Europe

2013-11-01 Thread paladini

Hi Jim,
that works nice.
Thanks again!

Have a nice weekend, best regards

Claudia


Zitat von Jim Lemon :


On 10/31/2013 03:04 AM, palad...@trustindata.de wrote:

Hi Jim,
that's the second time that you helped me in a short while so thanks a lot!

But it seems to me quite laborious and error-prone to first select all
the relevant countries in this long list and then to create a color vector.
But perhaps I get it all wrong.


For the color vector I first did this

imagecolors<-color.scale(mydata$GPIndex ,c(1,0,0),0,c(0,0,1))

because I wanted the colors to scale from dark red (bad ones) to dark
blue (good ones).
But it went somehow wrong. By the way can you tell me what I did wrong?

Nevertheless I then created a color vector looking like this:

eurocol=c("#FFFF",8,"#71FF","#39FF",8,8,"#39FF",rep(8,10),"#2FFF"

,8,"#00FF",8,"#00FF","#00FF" ,"#55FF",8,"#64FF",2,
"#83FF",8,8,"#8BFF" ,"#F0FF" ,rep(8,20),"#F7FF"
,rep(8,18),"#", rep(8,120))


And then

world.map<-map('world', fill = TRUE,col =eurocol
,xlim=c(-12,35),ylim=c(37,70))

Beside the wrong colors it worked okay.
But I am not really happy with this solution.

Did I misapprehend you?


Hi Claudia,
Maybe. You write that the transformation of GPIndex to colors "went  
wrong". Let's see:


# make up GPIndex
GPIndex<-c(sample(1:100,33),rep(NA,165))
# transform to colors
eurocol<-color.scale(GPIndex,c(1,0),0,c(0,1))
world.map<-map('world',fill=TRUE,
 col=eurocol,xlim=c(-12,35),ylim=c(37,70))

This gives me what I would expect, and checking the colors against  
the country names (world.map$names) looks like the correct colors  
have been displayed. Obviously I left a lot of areas out (missed UK  
and Ireland for example) as I didn't want to overplot individual  
countries with areas. Does this look okay to you?


Jim


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Impose constraint on first order derivative at a point for cubic smoothing spline

2013-11-01 Thread Victor hyk
Hello, 
      Dr. Simon Wood told me how to force a cubic spline to pass through a
point. The code is as follows. Does anyone know how I can change the code
to force the first derivative to be a certain value? For example, the first
derivative of the constrained cubic spline should equal 2 at the point (0, 0.6). 
      I really appreciate your help! 
      Thanks! 
        
      Best      
      Victor 
      
      Here is the initial reply and code provided by Dr. Simon Wood: 

"Actually, you might as well use "gam" directly for this (has the 
advantage that the smoothing parameter will be chosen correctly subject 
to the constraint). Here is some code. Key idea is to set the basis and 
penalty for the spline up first, apply the constraint, and then use gam 
to fit it..." 

best, 
Simon 

## Example constraining spline to pass through a 
## particular point (0,.6)... 

## Fake some data... 

library(mgcv) 
set.seed(0) 
n <- 100 
x <- runif(n)*4-1;x <- sort(x); 
f <- exp(4*x)/(1+exp(4*x));y <- f+rnorm(100)*0.1;plot(x,y) 
dat <- data.frame(x=x,y=y) 

## Create a spline basis and penalty, making sure there is a knot 
## at the constraint point, (0 here, but could be anywhere) 
knots <- data.frame(x=seq(-1,3,length=9)) ## create knots 
## set up smoother... 
sm <- smoothCon(s(x,k=9,bs="cr"),dat,knots=knots)[[1]] 

## 3rd parameter is value of spline at knot location 0, 
## set it to 0 by dropping... 
X <- sm$X[,-3]        ## spline basis 
S <- sm$S[[1]][-3,-3] ## spline penalty 
off <- y*0 + .6      ## offset term to force curve through (0, .6) 

## fit spline constrained through (0, .6)... 
b <- gam(y ~ X - 1 + offset(off),paraPen=list(X=list(S))) 
lines(x,predict(b)) 

## compare to unconstrained fit... 

b.u <- gam(y ~ s(x,k=9),data=dat,knots=knots) 
lines(x,predict(b.u))  
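Not an answer to the constraint question, but once a fit is in hand the derivative at the target point can at least be checked numerically. A hedged sketch using central differences on the unconstrained fit b.u from the code above:

```r
## central-difference estimate of the fitted spline's slope at x = 0
eps   <- 1e-4
newd  <- data.frame(x = c(-eps, eps))
slope <- diff(predict(b.u, newdata = newd)) / (2 * eps)
slope  # compare against the target derivative (2 in the question)
```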

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Load Tawny package on R 2.15.3

2013-11-01 Thread Tstudent

> I have no specific expertise here, but I just wanted to point out that
> this sounds like a losing strategy long term: As new packages and
> newer versions of packages come out that fix bugs and add features,
> you'll be unable to use them because you'll be stuck with 2.15.3 . I
> suggest you bite the bullet and follow the experts' advice to get
> things working with the current R version now.
> 
> Cheers,
> Bert
> 
> >
> > __
> > R-help  r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 


It seems that the only possibility for me is to install R 3.0.
So I have a question.
Now I use R 2.15.3 and RStudio (which is linked to R 2.15.3).
Can I install R 3.0 in another directory but leave R 2.15.3 as the default or
primary R? Are there any problems with doing this? Anything to be careful about?
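On Linux this is routine when building from source: configure each version with its own prefix so neither install touches the other. The paths below are illustrative assumptions, not a prescription:

```sh
# build R 3.0.x into its own prefix, leaving the system R 2.15.3 untouched
./configure --prefix=$HOME/opt/R-3.0.2 --enable-R-shlib
make && make install

# run the new version explicitly when needed
$HOME/opt/R-3.0.2/bin/R --version
```

If memory serves, RStudio on Linux can be pointed at a specific build via the RSTUDIO_WHICH_R environment variable, so the two versions can coexist without changing the default.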

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mapping data to a geographic map of Europe

2013-11-01 Thread paladini

Hi Jean,
nevertheless this page "R-bloggers" looks really interesting so I'll  
work through the tutorial.

Thanks again for recommending this web-site.

Best regards

Claudia


Zitat von "Adams, Jean" :


Claudia,

I have not worked through the example myself.  Since you seem to be getting
errors, perhaps a different example would help.  Here are some more
choropleth maps (although these use US states rather than European
countries).

http://blog.revolutionanalytics.com/2009/11/choropleth-challenge-result.html

Jean



On Fri, Nov 1, 2013 at 12:04 PM,  wrote:


Hi Jean,
thanks again for your response. As I told you  I did the downloads and
double checked if I selected the right directory.
But I noticed right now what happened:
The command in the example is :

eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/Shape/data/NUTS_RG_60M_2010")

But it should be:
eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/data/ggg/NUTS_RG_60M_2010")

Because if you do the downloads and unzip these data there is no such
thing as a "Shape" directory.

Now eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/data/NUTS_RG_60M_2010")
works.


What happens now is that after typing:
eurEduMapDf <- merge(eurMapDf, eurEdu, by.x="id", by.y="GEO")
I get another error message because "eurMapDf" is unknown.

So I supposed it should be:
eurEduMapDf <- merge(eurMap, eurEdu, by.x="id", by.y="GEO")

But this doesn't work either. The error message this time is: undefined
column selected.

Did I do something wrong?



Best regards

Claudia








Zitat von "Adams, Jean" :

 Claudia,


You should cc r-help on all correspondence so that others can follow the
thread.

In the second paragraph of the link I sent you
  
http://www.r-bloggers.com/maps-in-r-choropleth-maps/

a link is provided for the NUTS data,
 "The polygons for drawing the administrative boundaries were obtained
from this link. In particular, the NUTS 2010 shapefile in the 1:60 million
scale was downloaded and used. The other available scales would allow the
drawing of better defined maps, but at a computational cost. The zipped
file has to be extracted in a folder of choice for using it later."

http://epp.eurostat.ec.europa.eu/portal/page/portal/gisco_Geographical_information_maps/popups/references/administrative_units_statistical_units_1

If you want to follow the example, you will need to download this data to
your computer and then make sure that you refer to the appropriate
directory when using the readShapePoly() function.

Jean



On Wed, Oct 30, 2013 at 11:14 AM,  wrote:

 Hi Jean,

thank you for your advice.
The page looks quite interesting and I tried the example in  GNU R. I did
all the downloads.
But
Just in the beginning after typing

eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/Shape/data/NUTS_RG_60M_2010")
I get the following error message:

Error in getinfo.shape(filen) : Error opening SHP file

Do you have an idea what I did wrong?

Thanks a lot and best regards

Claudia




Zitat von "Adams, Jean" :

 Check out this link for some examples

 
http://www.r-bloggers.com/maps-in-r-choropleth-maps/


Jean


On Tue, Oct 29, 2013 at 12:02 PM,  wrote:

 Hello,


I would like to draw a map of Europe. Each country should be colored
depending on how it scores in an index called GPIndex.
Say a dark red for real bad countries a light red for those which are
not
so bad, light blue for the fairly good ones and so on up to the really
good
ones in a dark blue.
I never worked with geographic maps before so I tried library maps but
I
didn't get far,- especially because all examples I found only seem to
work
for the United states. So I'm a bit lost.
I would be nice if somebody could help me.

Thanking you in anticipation!

Best regards

Claudia

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.











__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help

Re: [R] Plot of coxph tt effects

2013-11-01 Thread David Winsemius

On Nov 1, 2013, at 11:16 AM, cesar garcia perez de leon wrote:

> Dear all, 
> 
> We are conducting a study in with a set of covariates and a time to event 
> outcome. 
> Covariates b1 and b3 violate proportionality.

Can you describe the basis for that statement?

> We applied a coxph with a "tt" term to evaluate the nature of time 
> dependence. 
> 
> Call: 
> coxph(formula = Surv(start - 1, stop, outcome) ~ tt(b1) + 
>b5 + tt(b3) + b4, data = data, tt = function(x,  t, ...) pspline(x + 
> t/90)) 
> 
> We would like to plot the predicted effects. Following the logic of the tt 
> function, does it make sense to plot the residuals against the results from 
> the model with tt for the upsetting covariates? 
> 
> As far as we know there have not been updates of survfit for the tt 
> function. So, we are unsure as to how to plot our curves. 
> 
> I would appreciate it if someone could give me advice or a publication 
> reference that can help us in this matter. 

Is there a reason not to use `cox.zph` and its `print` and `plot` methods for 
this purpose? I think it is already offering what you are trying to re-invent.

-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2013-11-01 Thread arun


Hi,

Check whether this works:


vec1 <- c( 'eric', 'JOHN', 'eric', 'JOHN', 'steve', 'scott', 'steve', 'scott', 
'JOHN', 'eric')
vec2 <- c( 'eric', 'JOHN', 'eric', 'eric', 'JOHN', 'JOHN', 'steve', 'steve', 
'scott', 'scott')
vec3 <- c( 'eric', 'eric', 'JOHN', 'eric', 'JOHN', 'JOHN', 'steve', 'steve', 
'scott', 'scott')

vec4 <- c( 'eric', 'eric', 'JOHN', 'eric', 'JOHN', 'steve', 'steve', 'scott', 
'scott','JOHN')
vec5 <- c('JOHN', 'JOHN', 'eric', 'eric', 'JOHN', 'eric', 'steve', 'steve', 
'scott', 'scott')
vec6 <- c( 'eric', 'eric',  'eric', 'JOHN','JOHN', 'JOHN', 'steve', 
'steve','scott', 'scott')
vec7 <- c( 'eric', 'eric',  'eric', 'JOHN','JOHN', 'JOHN', 'steve', 'scott', 
'scott', 'steve')
fun1 <- function(vec) {
 sum(unlist(sapply(unique(vec),function(x) {x1 <- diff(which(vec %in% x)); 
ifelse(x1==1, 1, -x1)}),use.names=FALSE))
 }


A.K.
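To illustrate, applying fun1 to the definitions above: adjacent repeats each add 1, while separated repeats subtract their gap, so the ordering with the split pair scores lower. (vec6 and vec7 are the two vectors compared in the quoted message below.)

```r
fun1(vec6)  # adjacent pairs only: 1+1 + 1+1 + 1 + 1 = 6
fun1(vec7)  # 'steve' split by a gap of 3: 2 + 2 - 3 + 1 = 2
```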






I took a steve and put it at the end of the vector. The score should be lower, as 
the steves are farther apart.

> vec3 <- c( 'eric', 'eric',  'eric', 'JOHN','JOHN', 'JOHN', 'steve', 
> 'steve','scott', 'scott')
> sum(diff(vec3[-1]!=vec3[-length(vec3)])) +  sum(vec3[-1]== 
> vec3[-length(vec3)])
[1] 6

> vec4 <- c( 'eric', 'eric',  'eric', 'JOHN','JOHN', 'JOHN', 'steve', 'scott', 
> 'scott', 'steve')
> sum(diff(vec4[-1]!=vec4[-length(vec4)])) +  sum(vec4[-1]== 
> vec4[-length(vec4)])
[1] 6

S

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] quickly extract response from formula

2013-11-01 Thread Andreas Leha
Hi David,

thanks for your quick answer!

David Winsemius  writes:

> On Oct 31, 2013, at 1:27 PM, Andreas Leha wrote:
>
>> Hi all,
>> 
>> what is the recommended way to quickly (and without much burden on the
>> memory) extract the response from a formula?
>
> If you want its expression value its just form[[2]]
>
> If you want it evaluated in the environment of a dataframe then this should 
> be fairly efficient:
>
> x <- stats::runif(20)
> y <- stats::runif(20)
> dfrm <- data.frame(x=x,y=y)
> extractResponse <- function(frm, dat) { resp <- frm[[2]]; print(resp) # 
> that's optional
>  fdat <- eval(resp,
>  envir=dat); return(fdat) }

This is what I'll be using.  Thanks again!

[...]

Regards,
Andreas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] quickly extract response from formula

2013-11-01 Thread William Dunlap
You can bullet-proof it a bit by making sure that length(formula)==3
before assuming that formula[[2]] is the response.   If length(formula)==2
then there is no response term, only predictor terms.  E.g., replace
   resp <- frm[[2]]
with
   resp <- if (length(frm)==3) frm[[2]] else NULL
(or call stop(), or warning(), ...)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
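Putting David's extractor and the length check together, a hedged sketch of a response extractor that tolerates one-sided formulas (the function name is illustrative):

```r
extractResponse <- function(frm, dat) {
  # a formula of length 3 is 'lhs ~ rhs'; length 2 means no response term
  if (length(frm) != 3) return(NULL)
  eval(frm[[2]], envir = dat)
}

dfrm <- data.frame(x = runif(20), y = runif(20))
extractResponse(y ~ x, dfrm)   # the y column, evaluated in dfrm
extractResponse(~ x, dfrm)     # NULL: one-sided formula
```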


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf
> Of Andreas Leha
> Sent: Friday, November 01, 2013 2:50 PM
> To: r-h...@stat.math.ethz.ch
> Subject: Re: [R] quickly extract response from formula
> 
> Hi David,
> 
> thanks for your quick answer!
> 
> David Winsemius  writes:
> 
> > On Oct 31, 2013, at 1:27 PM, Andreas Leha wrote:
> >
> >> Hi all,
> >>
> >> what is the recommended way to quickly (and without much burden on the
> >> memory) extract the response from a formula?
> >
> > If you want its expression value its just form[[2]]
> >
> > If you want it evaluated in the environment of a dataframe then this should 
> > be fairly
> efficient:
> >
> > x <- stats::runif(20)
> > y <- stats::runif(20)
> > dfrm <- data.frame(x=x,y=y)
> > extractResponse <- function(frm, dat) { resp <- frm[[2]]; print(resp) # 
> > that's optional
> >  fdat <- eval(resp,
> >  envir=dat); return(fdat) }
> 
> This is what I'll be using.  Thanks again!
> 
> [...]
> 
> Regards,
> Andreas
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inserting 17M entries into env took 18h, inserting 34M entries taking 5+ days

2013-11-01 Thread Martin Morgan

On 11/01/2013 08:22 AM, Magnus Thor Torfason wrote:

Sure,

I was attempting to be concise and boiling it down to what I saw as the root
issue, but you are right, I could have taken it a step further. So here goes.

I have a set of around around 20M string pairs. A given string (say, A) can
either be equivalent to another string (B) or not. If A and B occur together in
the same pair, they are equivalent. But equivalence is transitive, so if A and B
occur together in one pair, and A and C occur together in another pair, then A
and C are also equivalent. I need a way to quickly determine if any two strings
from my data set are equivalent or not.


Do you mean that if A,B occur together and B,C occur together, then A,B and A,C 
are equivalent?


Here's a function that returns a unique identifier (not well tested!), allowing 
for transitive relations but not circularity.


 uid <- function(x, y)
{
i <- seq_along(x)   # global index
xy <- paste0(x, y)  # make unique identifiers
idx <- match(xy, xy)

repeat {
## transitive look-up
y_idx <- match(y[idx], x)   # look up 'y' in 'x'
keep <- !is.na(y_idx)
if (!any(keep)) # no transitive relations, done!
break
x[idx[keep]] <- x[y_idx[keep]]
y[idx[keep]] <- y[y_idx[keep]]

## create new index of values
xy <- paste0(x, y)
idx <- match(xy, xy)
}
idx
}

Values with the same index are identical. Some tests

> x <- c(1, 2, 3, 4)
> y <- c(2, 3, 5, 6)
> uid(x, y)
[1] 1 1 1 4
> i <- sample(x); uid(x[i], y[i])
[1] 1 1 3 1
> uid(as.character(x), as.character(y))  ## character() ok
[1] 1 1 1 4
> uid(1:10, 1 + 1:10)
 [1] 1 1 1 1 1 1 1 1 1 1
> uid(integer(), integer())
integer(0)
> x <- c(1, 2, 3)
> y <- c(2, 3, 1)
> uid(x, y)  ## circular!
  C-c C-c

I think this will scale well enough, but the worst-case scenario can be made to 
be log(longest chain) and copying can be reduced by using an index i and 
subsetting the original vector on each iteration. I think you could test for 
circularity by checking that the updated x are not a permutation of the kept x, 
all(x[y_idx[keep]] %in% x[keep]))


Martin



The way I do this currently is to designate the smallest (alphabetically) string
in each known equivalence set as the "main" entry. For each pair, I therefore
insert two entries into the hash table, both pointing at the main value. So
assuming the input data:

A,B
B,C
D,E

I would then have:

A->A
B->A
C->B
D->D
E->D

Except that I also follow each chain until I reach the end (key==value), and
insert pointers to the "main" value for every value I find along the way. After
doing that, I end up with:

A->A
B->A
C->A
D->D
E->D

And I can very quickly check equivalence, either by comparing the hash of two
strings, or simply by transforming each string into its hash, and then I can use
simple comparison from then on. The code for generating the final hash table is
as follows:

h : Empty hash table created with hash.new()
d : Input data
hash.deep.get : Function that iterates through the hash table until it finds a
key whose value is equal to itself (until hash.get(X)==X), then returns all the
values in a vector


h = hash.new()
for ( i in 1:nrow(d) )
{
 deep.a  = hash.deep.get(h, d$a[i])
 deep.b  = hash.deep.get(h, d$b[i])
 equivalents = sort(unique(c(deep.a,deep.b)))
 equiv.id= min(equivalents)
 for ( equivalent in equivalents )
 {
 hash.put(h, equivalent, equiv.id)
 }
}


I would so much appreciate if there was a simpler and faster way to do this.
Keeping my fingers crossed that one of the R-help geniuses who sees this is
sufficiently interested to crack the problem

Best,
Magnus
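For the record, this transitive-equivalence problem is the classic disjoint-set (union-find) structure, which avoids re-walking chains on every insert. A hedged sketch using a plain environment as the hash, with path compression so chains stay short (function names are illustrative, not the poster's hash API; this toy `union` masks base::union):

```r
ds <- new.env(hash = TRUE)

parent <- function(key)           # a key with no entry is its own root
  if (exists(key, envir = ds, inherits = FALSE)) get(key, envir = ds) else key

find <- function(key) {
  root <- key
  while (!identical(parent(root), root)) root <- parent(root)
  while (!identical(key, root)) {  # path compression: repoint chain at root
    nxt <- parent(key)
    assign(key, root, envir = ds)
    key <- nxt
  }
  root
}

union <- function(a, b) assign(find(b), find(a), envir = ds)

union("A", "B"); union("B", "C"); union("D", "E")
identical(find("C"), find("A"))   # TRUE: A, B, C are equivalent
identical(find("A"), find("D"))   # FALSE: separate sets
```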

On 11/1/2013 1:49 PM, jim holtman wrote:

It would be nice if you followed the posting guidelines and at least
showed the script that was creating your entries now so that we
understand the problem you are trying to solve.  A bit more
explanation of why you want this would be useful.  This gets to the
second part of my tag line:  Tell me what you want to do, not how you
want to do it.  There may be other solutions to your problem.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Nov 1, 2013 at 9:32 AM, Magnus Thor Torfason
 wrote:

Pretty much what the subject says:

I used an env as the basis for a Hashtable in R, based on information that
this is in fact the way environments are implemented under the hood.

I've been experimenting with doubling the number of entries, and so far it
has seemed to be scaling more or less linearly, as expected.

But as I went from 17 million entries to 34 million entries, the completion
time has gone from 18 hou

Re: [R] Lattice Legend/Key by row instead of by column

2013-11-01 Thread Duncan Mackay
Hi Richard

Untested: perhaps adding some dummy factors with NA and then having their
labels as " " and the color of lines as 0 or "transparent".

I think that I used it partly for the same reason and in addition I was
combining 2 purposes with  the groups and wanted to split them 

Duncan

-Original Message-
From: Richard Kwock [mailto:richardkw...@gmail.com] 
Sent: Saturday, 2 November 2013 02:31
To: Duncan Mackay
Cc: R
Subject: Re: [R] Lattice Legend/Key by row instead of by column

Hi Duncan,

Thanks for that template. Not quite the solution I was hoping for, but that
works!

Richard
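A concrete version of the blank-padding workaround, for the archives: pad the labels and colours to a full rectangle, then reorder so lattice's column-major key filling displays them row by row (built on the example from the original post):

```r
library(lattice)
set.seed(456846)
data <- matrix(c(1:10) + runif(50), ncol = 5, nrow = 10)
dataset <- data.frame(data = as.vector(data),
                      group = rep(1:5, each = 10), time = 1:10)

# pad to a full 2 x 4 rectangle with blank labels / invisible lines
labs <- c(paste("group", 1:5), rep(" ", 3))
cols <- c(trellis.par.get()$superpose.symbol$col[1:5], rep("transparent", 3))

# lattice fills the key column by column; this permutation makes the
# column-major layout read row by row: 1 5 2 6 3 7 4 8
ord <- as.vector(matrix(1:8, nrow = 2, byrow = TRUE))

xyplot(data ~ time, group = group, dataset, t = "l",
       key = list(text = list(labs[ord]),
                  lines = list(col = cols[ord]),
                  columns = 4))
```

The first row then reads group 1..4 and the second row shows group 5 followed by blank placeholders.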

On Thu, Oct 31, 2013 at 3:47 PM, Duncan Mackay  wrote:
> Hi Richard
>
> If you cannot get a better suggestion this example from Deepayan 
> Sarkar may help.
> It is way back in the archives and I do not have a reference for it.
>
> I have used it about a year ago as a template to do a complicated key
>
> fl <- grid.layout(nrow = 2, ncol = 6,
>   heights = unit(rep(1, 2), "lines"),
>   widths = unit(c(2, 1, 2, 1, 2, 1),
>
> c("cm","strwidth","cm","strwidth","cm","strwidth"),
>   data = list(NULL,"John",NULL,"George",NULL,"The
> Beatles")))
>
> foo <- frameGrob(layout = fl)
> foo <- placeGrob(foo,
>  pointsGrob(.5, .5, pch=19,
> gp = gpar(col="red", cex=0.5)),
>  row = 1, col = 1)
> foo <- placeGrob(foo,
>  linesGrob(c(0.2, 0.8), c(.5, .5),
>gp = gpar(col="blue")),
>  row = 2, col = 1)
> foo <- placeGrob(foo,
>  linesGrob(c(0.2, 0.8), c(.5, .5),
>gp = gpar(col="green")),
>  row = 1, col = 3)
> foo <- placeGrob(foo,
>  linesGrob(c(0.2, 0.8), c(.5, .5),
>gp = gpar(col="orange")),
>  row = 2, col = 3)
> foo <- placeGrob(foo,
>  rectGrob(width = 0.6,
>   gp = gpar(col="#CC",
>   fill = "#CC")),
>  row = 1, col = 5)
> foo <- placeGrob(foo,
>  textGrob(lab = "John"),
>  row = 1, col = 2)
> foo <- placeGrob(foo,
>  textGrob(lab = "Paul"),
>  row = 2, col = 2)
> foo <- placeGrob(foo,
>  textGrob(lab = "George"),
>  row = 1, col = 4)
> foo <- placeGrob(foo,
>  textGrob(lab = "Ringo"),
>  row = 2, col = 4)
> foo <- placeGrob(foo,
>  textGrob(lab = "The Beatles"),
>  row = 1, col = 6)
>
> xyplot(1 ~ 1, legend = list(top = list(fun = foo)))
>
> In my case I changed  "strwidth" to "cm" for the text as I was cramped 
> for space
>
> HTH
>
> Duncan
>
> Duncan Mackay
> Department of Agronomy and Soil Science University of New England 
> Armidale NSW 2351
> Email: home: mac...@northnet.com.au
>
> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Richard Kwock
> Sent: Friday, 1 November 2013 06:42
> To: R help
> Subject: [R] Lattice Legend/Key by row instead of by column
>
> Hi All,
>
> I am having some trouble getting lattice to display the legend names 
> by row instead of by column (default).
>
> Example:
>
> library(lattice)
> set.seed(456846)
> data <- matrix(c(1:10) + runif(50), ncol = 5, nrow = 10) dataset <- 
> data.frame(data = as.vector(data), group = rep(1:5, each = 10), time = 
> 1:10)
>
> xyplot(data ~ time, group = group, dataset, t = "l",
>   key = list(text = list(paste("group", unique(dataset$group)) ),
> lines = list(col = trellis.par.get()$superpose.symbol$col[1:5]),
> columns = 4
>   )
> )
>
> What I'm hoping for are 4 columns in the legend, like this:
> Legend row 1: "group 1", "group 2", "group 3", "group 4"
> Legend row 2: "group 5"
>
> However, I'm getting:
> Legend row 1: "group 1", "group 3", "group 5"
> Legend row 2: "group 2", "group 4"
>
> I can see how this might work if I include blanks/NULLs in the legend 
> as placeholders, but that might get messy in data sets with many groups.
>
> Any ideas on how to get around this?
>
> Thanks,
> Richard
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Combinations of values in two columns

2013-11-01 Thread arun


Hi,
You may try:
dat1 <- read.table(text="
Friend1,Friend2
A,B
A,C
B,A
C,D",sep=",",header=TRUE,stringsAsFactors=FALSE)
indx <- as.vector(outer(unique(dat1[,1]),unique(dat1[,2]),paste))
res <- cbind(setNames(read.table(text = indx, sep = "", header = FALSE,
                                 stringsAsFactors = FALSE),
                      paste0("Friend", 1:2)),
             New = 1*(indx %in% as.character(interaction(dat1, sep = " "))))
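An equivalent, perhaps more transparent, sketch using expand.grid (the column name `New` and the 1 = existing / 0 = added coding follow Thomas's description):

```r
dat1 <- read.table(text = "
Friend1,Friend2
A,B
A,C
B,A
C,D", sep = ",", header = TRUE, stringsAsFactors = FALSE)

# All ordered pairs of distinct friends, flagged 1 if the pair
# already appears in the data and 0 otherwise.
friends   <- sort(unique(unlist(dat1)))
all.pairs <- expand.grid(Friend1 = friends, Friend2 = friends,
                         stringsAsFactors = FALSE)
all.pairs <- all.pairs[all.pairs$Friend1 != all.pairs$Friend2, ]
seen <- paste(dat1$Friend1, dat1$Friend2)
all.pairs$New <- as.integer(paste(all.pairs$Friend1, all.pairs$Friend2) %in% seen)
```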

A.K.



On Friday, November 1, 2013 5:56 AM, Thomas  
wrote:
I have data that looks like this:

Friend1, Friend2
A, B
A, C
B, A
C, D

And I'd like to generate some more rows and another column. In the new  
column I'd like to add a 1 beside all the existing rows. That bit's  
easy enough.

Then I'd like to add rows for all the possible directed combinations  
of rows not included in the existing data. So for the above I think  
that would be:

A, D
D, A
B, C
C, B
B, D
C, A
D, B
D, C

and then put a 0 in the column beside these.

Can anyone suggest how to do this?

I'm using R version 2.15.3.

Thank you,

Thomas Chesney





[R] Package(s) for making waffle plot-like figures?

2013-11-01 Thread Zhao Jin
Dear all,

I am trying to make a series of waffle plot-like figures for my data to
visualize the ratios of amino acid residues at each position. For each one
of 37 positions, there may be one to four different amino acid residues. So
the data consist of the positions, what residues are there, and the ratios
of residues. The ratios of residues at a position add up to 100, or close
to 100 (more on this soon)*. I am hoping to make a *square* waffle
plot-like figure for each position, and fill the 10 X 10 grids with colors
representing each amino acid residue and areas for grids of a certain color
corresponding to the ratio of that residue. Then I could line up all the
plots in one row from position 1 to position 37.
*: if the sum of the ratios is less than 100 at a position, that's because
of an unknown residue which I did not include in the table.

I am attaching the dput output for my data here:
df <- structure(list(position = c(1L, 2L, 3L, 4L, 4L, 5L, 6L, 7L, 7L,
8L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L,
15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 22L, 23L, 24L, 25L, 26L,
26L, 27L, 28L, 29L, 29L, 30L, 31L, 32L, 33L, 34L, 34L, 35L, 35L,
36L, 36L, 36L, 37L, 37L), residue = structure(c(9L, 4L, 18L,
7L, 9L, 7L, 12L, 3L, 4L, 1L, 7L, 9L, 12L, 1L, 4L, 4L, 13L, 5L,
14L, 2L, 18L, 3L, 16L, 9L, 17L, 15L, 7L, 5L, 5L, 7L, 17L, 13L,
15L, 11L, 6L, 13L, 16L, 14L, 10L, 13L, 17L, 1L, 1L, 17L, 1L,
12L, 1L, 5L, 3L, 6L, 8L, 7L, 9L), .Label = c("A", "C", "D", "E",
"G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V",
"Y"), class = "factor"), ratio = c(99L, 100L, 100L, 1L, 99L,
100L, 100L, 1L, 98L, 100L, 10L, 87L, 3L, 79L, 9L, 12L, 84L, 99L,
1L, 83L, 13L, 100L, 100L, 100L, 100L, 99L, 100L, 100L, 100L,
98L, 2L, 100L, 100L, 100L, 2L, 98L, 100L, 100L, 1L, 99L, 100L,
100L, 98L, 100L, 95L, 5L, 98L, 2L, 3L, 95L, 1L, 1L, 98L)), .Names =
c("position",
"residue", "ratio"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "10", "11", "12", "13", "14", "15",
"17", "18", "19", "20", "23", "25", "27", "28", "29", "30", "31",
"32", "33", "34", "36", "37", "38", "39", "40", "42", "43", "44",
"45", "46", "47", "48", "50", "51", "52", "53", "54", "56", "57",
"58", "59", "60", "61", "62", "63", "64", "65"))

Inspired by a Stack Exchange post, I am using this script to make the plots:
library(ggplot2)
col4 <- c('#E66101', '#FDB863', '#B2ABD2', '#5E3C99')
dflist <- list()
for (i in 1:37) {
  residue_num <- length(which(df$position == i))
  dflist[[i]] <- df[df$position == i, 2:3]
  waffle <- expand.grid(y = 1:residue_num,
                        x = seq_len(ceiling(sum(dflist[[i]]$ratio) / residue_num)))
  residuevec <- rep(dflist[[i]]$residue, dflist[[i]]$ratio)
  waffle$residue <- c(as.vector(residuevec),
                      rep(NA, nrow(waffle) - length(residuevec)))
  png(paste('plot', i, '.png', sep = ''))
  print(ggplot(waffle, aes(x = x, y = y, fill = residue)) +
          geom_tile(color = "white") +
          scale_fill_manual("residue", values = col4) +
          coord_equal() +
          theme(panel.grid.minor = element_blank(),
                panel.grid.major = element_blank(),
                axis.ticks = element_blank(),
                axis.text.x = element_blank(), axis.text.y = element_blank(),
                axis.title.x = element_blank(), axis.title.y = element_blank()))
  dev.off()
}

With my scripts, I could make a waffle plot, but not a *square* 10 X 10
waffle plot. Also, the grid size differs for positions with different
numbers of residues. I suspect that I didn't use coord_equal()
correctly.

So I wonder how I can make the plots as described above in ggplot2 or
with some other package. Also, is there a way to assign a color to
different residues, say, purple for alanine, blue for glycine, etc., and
incorporate that information in the for loop?

Many thanks for any suggestion you may give me!

Zhao
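For the archive: one way to get a fixed square grid and stable residue colours is to lay out exactly 100 cells (one cell = 1%) per position and pass a *named* colour vector to scale_fill_manual(), which maps colours by residue name rather than by factor order. A sketch (the palette entries and the tiny demo data frame are arbitrary examples):

```r
library(ggplot2)

# Named palette: colours are looked up by residue letter, so a residue
# keeps its colour across positions.  Extend to all residues present.
pal <- c(A = "purple", G = "blue", L = "orange", M = "darkgreen")

waffle_pos <- function(pos, df) {
  d     <- df[df$position == pos, ]
  cells <- rep(as.character(d$residue), d$ratio)
  cells <- c(cells, rep(NA, 100 - length(cells)))   # pad unknowns to 100
  grid  <- expand.grid(x = 1:10, y = 1:10)          # fixed 10 x 10 square
  grid$residue <- cells
  ggplot(grid, aes(x, y, fill = residue)) +
    geom_tile(colour = "white") +
    scale_fill_manual(values = pal, na.value = "grey90") +
    coord_equal() +
    theme_void()
}

# Tiny demo data in the same shape as the dput above
df <- data.frame(position = c(1, 1), residue = c("A", "G"), ratio = c(79, 9))
p  <- waffle_pos(1, df)
```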




Re: [R] quickly extract response from formula

2013-11-01 Thread Andreas Leha
William Dunlap  writes:

> You can bullet-proof it a bit by making sure that length(formula)==3
> before assuming that formula[[2]] is the response.   If length(formula)==2
> then there is no response term, only predictor terms.  E.g., replace
>    resp <- frm[[2]]
> with
>    resp <- if (length(frm) == 3) frm[[2]] else NULL
> (or call stop(), or warning(), ...)

Will do.  Thanks.

- Andreas
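Bill's length check, as a tiny self-contained sketch:

```r
# Return the response of a two-sided formula, or NULL for a
# one-sided formula (which has no response term).
get_response <- function(frm) {
  if (length(frm) == 3) frm[[2]] else NULL
}

get_response(y ~ x)   # the symbol `y`
get_response(~ x)     # NULL
```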

>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
>> Behalf
>> Of Andreas Leha
>> Sent: Friday, November 01, 2013 2:50 PM
>> To: r-h...@stat.math.ethz.ch
>> Subject: Re: [R] quickly extract response from formula
>> 
>> Hi David,
>> 
>> thanks for your quick answer!
>> 
>> David Winsemius  writes:
>> 
>> > On Oct 31, 2013, at 1:27 PM, Andreas Leha wrote:
>> >
>> >> Hi all,
>> >>
>> >> what is the recommended way to quickly (and without much burden on the
>> >> memory) extract the response from a formula?
>> >
>> > If you want its expression value, it's just form[[2]]
>> >
>> > If you want it evaluated in the environment of a data frame, then this
>> > should be fairly efficient:
>> >
>> > x <- stats::runif(20)
>> > y <- stats::runif(20)
>> > dfrm <- data.frame(x=x,y=y)
>> > extractResponse <- function(frm, dat) {
>> >   resp <- frm[[2]]; print(resp)  # that's optional
>> >   fdat <- eval(resp, envir = dat)
>> >   return(fdat)
>> > }
>> 
>> This is what I'll be using.  Thanks again!
>> 
>> [...]
>> 
>> Regards,
>> Andreas
>> 



[R] plot time series data in wide format

2013-11-01 Thread Gary Dong
Dear R users,

I wonder if there is a way to plot time series data that is in a wide
format like this:

CITY_NAME    2000Q1    2000Q2     2000Q3    2000Q4    2001Q1    2001Q2    2001Q3    2001Q4    2002Q1    2002Q2
CITY1      100.5210  101.9667  103.24933  104.0506  104.4317  105.3921  106.7643  107.5202  107.2561  107.8184
CITY2      100.0412  100.6146  103.20293  104.0867  104.6612  106.6126  109.3514  110.1943  110.9480  113.0071
CITY3       99.5895   99.2298   99.26947   99.4101  100.5776  101.3719  101.5957  102.2411  103.4390  105.1745
CITY4       99.6491  101.5386  104.90953  106.1065  108.1785  110.6845  113.3746  114.1254  116.2121  119.1033
CITY5      100.9828  103.6847  105.04793  106.5925  108.7437  110.5549  111.9343  112.6704  113.6201  115.3020

Ideally, each of the five cities would be represented by a line in the plot.

Any suggestion is appreciated!

Thanks!
Gary




Re: [R] computation of hessian matrix

2013-11-01 Thread Suzen, Mehmet
On 1 November 2013 11:06, IZHAK shabsogh wrote:
> below is code to compute the Hessian matrix; I need to generate 29
> different matrices, for example the first

You may consider using the numDeriv (numerical derivatives) package for that
instead; see its vignette:
http://cran.r-project.org/web/packages/numDeriv/vignettes/Guide.pdf
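A minimal numDeriv sketch (the function f and the evaluation point are arbitrary examples):

```r
library(numDeriv)

# Hessian of f(x) = x1^2 + x2^2 + x1*x2 at (1, 2); the exact answer
# is the constant matrix rbind(c(2, 1), c(1, 2)).
f <- function(x) sum(x^2) + prod(x)
H <- hessian(f, x = c(1, 2))
```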



Re: [R] plot time series data in wide format

2013-11-01 Thread Pete Brecknock
wudadan wrote
> Dear R users,
> 
> I wonder if there is a way to plot time series data that is in a wide
> format like this:
> 
> CITY_NAME    2000Q1    2000Q2     2000Q3    2000Q4    2001Q1    2001Q2    2001Q3    2001Q4    2002Q1    2002Q2
> CITY1      100.5210  101.9667  103.24933  104.0506  104.4317  105.3921  106.7643  107.5202  107.2561  107.8184
> CITY2      100.0412  100.6146  103.20293  104.0867  104.6612  106.6126  109.3514  110.1943  110.9480  113.0071
> CITY3       99.5895   99.2298   99.26947   99.4101  100.5776  101.3719  101.5957  102.2411  103.4390  105.1745
> CITY4       99.6491  101.5386  104.90953  106.1065  108.1785  110.6845  113.3746  114.1254  116.2121  119.1033
> CITY5      100.9828  103.6847  105.04793  106.5925  108.7437  110.5549  111.9343  112.6704  113.6201  115.3020
> 
> Ideally, each of the five cities would be represented by a line in the plot.
> 
> Any suggestion is appreciated!
> 
> Thanks!
> Gary
> 

How about using the zoo package 

library(zoo)

# Read Data
text <- "CITY_NAME 2000Q1 2000Q2 2000Q3 2000Q4 2001Q1 2001Q2 2001Q3 2001Q4 2002Q1 2002Q2
CITY1 100.5210 101.9667 103.24933 104.0506 104.4317 105.3921 106.7643 107.5202 107.2561 107.8184
CITY2 100.0412 100.6146 103.20293 104.0867 104.6612 106.6126 109.3514 110.1943 110.9480 113.0071
CITY3  99.5895  99.2298  99.26947  99.4101 100.5776 101.3719 101.5957 102.2411 103.4390 105.1745
CITY4  99.6491 101.5386 104.90953 106.1065 108.1785 110.6845 113.3746 114.1254 116.2121 119.1033
CITY5 100.9828 103.6847 105.04793 106.5925 108.7437 110.5549 111.9343 112.6704 113.6201 115.3020"

df <- read.table(textConnection(text), header=TRUE, check.names=FALSE)

#Create zoo object
d <- t(df[,-1])
ind <- as.yearqtr(names(df)[-1]) 
z <- zoo(d,ind)

# Plot
plot(z, plot.type="single", col=1:5, lwd=2)
legend("topleft",legend=c("City1","City2","City3","City4","City5"),lty=1,
lwd=2, col=1:5)

HTH

Pete
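A base-R alternative, for readers without zoo, is matplot on the transposed value matrix (shown here with a trimmed-down df, just two quarters of two cities; the full data above works the same way):

```r
# Minimal demo data in the same wide shape as the question
df <- read.table(text = "CITY_NAME 2000Q1 2000Q2
CITY1 100.5210 101.9667
CITY2 100.0412 100.6146", header = TRUE, check.names = FALSE)

m <- t(as.matrix(df[ , -1]))   # rows = quarters, columns = cities
matplot(m, type = "l", lty = 1, lwd = 2, col = seq_len(ncol(m)),
        xaxt = "n", xlab = "Quarter", ylab = "Index")
axis(1, at = seq_len(nrow(m)), labels = rownames(m))
legend("topleft", legend = df$CITY_NAME, lty = 1, lwd = 2,
       col = seq_len(ncol(m)))
```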







[R] Installing RCurl: 'configure' exists but is not executable

2013-11-01 Thread Rainer Schuermann
I'm trying to install.packages( "RCurl" ) as root but get
ERROR: 'configure' exists but is not executable

I remember having had something like this before on another machine, and
tried in bash what is described here (it helped me before):
http://mazamascience.com/WorkingWithData/?p=1185
# mkdir ~/tmp
# export TMPDIR=~/tmp
and added, just in case,
# chmod u+x $TMPDIR

which apparently does what it should:
# ls -ld $TMPDIR
drwxrwxrwx 2 root root 4096 Nov  1 08:59 /root/tmp

but it doesn't help, I get the same error.

What else can I try?

Thanks in advance,
Rainer


> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.2



Re: [R] Installing RCurl: 'configure' exists but is not executable

2013-11-01 Thread Michael Hannon
The error message doesn't seem to refer to the tmp directory.  What do you
get from:

ls -l `which curl-config`

-- Mike


On Fri, Nov 1, 2013 at 7:43 PM, Rainer Schuermann  wrote:

> I'm trying to install.packages( "RCurl" ) as root but get
> ERROR: 'configure' exists but is not executable
>
> I remember having had something like that before on another machine and
> tried in bash what is described here
> http://mazamascience.com/WorkingWithData/?p=1185
> and helped me before:
> # mkdir ~/tmp
> # export TMPDIR=~/tmp
> and added, just in case,
> # chmod u+x $TMPDIR
>
> which apparently does what it should
> # ls -ld $TMPDIR
> drwxrwxrwx 2 root root 4096 Nov  1 08:59 /root/tmp
>
> but it doesn't help, I get the same error.
>
> What else can I try?
>
> Thanks in advance,
> Rainer
>
>
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.2
>


Re: [R] Installing RCurl: 'configure' exists but is not executable

2013-11-01 Thread Rainer Schuermann
#  ls -l `which curl-config`
-rwxr-xr-x 1 root root 6327 Oct 20 15:25 /usr/bin/curl-config
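For archive readers: with curl-config itself healthy, a common cause of "'configure' exists but is not executable" is that R's temporary directory sits on a filesystem mounted noexec, so the unpacked configure script cannot be run even with the execute bit set. A quick bash check and workaround (paths are examples):

```shell
# Which filesystem holds the temp dir, and is anything mounted noexec?
tmp="${TMPDIR:-/tmp}"
df -P "$tmp" | tail -1
mount | grep noexec || echo "no noexec mounts"

# Workaround: point TMPDIR at an exec-mounted location, then reinstall.
mkdir -p "$HOME/rtmp"
export TMPDIR="$HOME/rtmp"
# R -e 'install.packages("RCurl")'
```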



On Friday 01 November 2013 20:21:36 Michael Hannon wrote:
> The error message doesn't seem to refer to the tmp directory.  What do you
> get from:
> 
> ls -l `which curl-config`
> 
> -- Mike
> 
> 
> On Fri, Nov 1, 2013 at 7:43 PM, Rainer Schuermann wrote:
> 
> > I'm trying to install.packages( "RCurl" ) as root but get
> > ERROR: 'configure' exists but is not executable
> >
> > I remember having had something like that before on another machine and
> > tried in bash what is described here
> > http://mazamascience.com/WorkingWithData/?p=1185
> > and helped me before:
> > # mkdir ~/tmp
> > # export TMPDIR=~/tmp
> > and added, just in case,
> > # chmod u+x $TMPDIR
> >
> > which apparently does what it should
> > # ls -ld $TMPDIR
> > drwxrwxrwx 2 root root 4096 Nov  1 08:59 /root/tmp
> >
> > but it doesn't help, I get the same error.
> >
> > What else can I try?
> >
> > Thanks in advance,
> > Rainer
> >
> >
> > > sessionInfo()
> > R version 3.0.2 (2013-09-25)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
> >  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
> >  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
> >  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
> >  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics  grDevices utils datasets  methods   base
> >
> > loaded via a namespace (and not attached):
> > [1] tools_3.0.2
> >
> 
