date:20130806

Re: [R] HiPLARb installation failed when magma-lib included

2013-08-06 Thread Clara Antón Fernández

Just in case it is helpful for someone else, this is what did the trick for me. 
I reinstalled CUDA 5.5 (and gcc g++ and build-essential, just in case), removed 
the previous installation of all the related libraries and reinstalled 
everything and HiPLARb works perfectly now. 
Clara

From: r-sig-hpc-boun...@r-project.org [r-sig-hpc-boun...@r-project.org] on 
behalf of Clara Antón Fernández [c...@skogoglandskap.no]
Sent: Saturday, August 03, 2013 14:41
To: r-sig-...@r-project.org
Subject: [R-sig-hpc] HiPLARb installation failed when magma-lib included

Hi all,
I am trying to install HiPLARb package but I am running into an error. Any help 
would be greatly appreciated.
I used the installer available from their website 
(http://www.hiplar.org/hiplar-b-installation.html) and followed their
instructions. The installer finished without showing warnings or errors, but 
HiPLARb is not installed when I try to load it in R. So, I tried to install 
HiPLARb package manually.
If I do
R CMD INSTALL --configure-args="--with-lapack=-L/home/caf/mylibs/lib\ -lopenblas --with-plasma-lib=/home/caf/mylibs --with-cuda-home=/usr/local/cuda" HiPLARb_0.1.3.tar.gz

the installation is successful, but if I try to include magma

R CMD INSTALL --configure-args="--with-lapack=-L/home/caf/mylibs/lib\ -lopenblas --with-magma-lib=/home/caf/mylibs --with-plasma-lib=/home/caf/mylibs --with-cuda-home=/usr/local/cuda" HiPLARb_0.1.3.tar.gz

I get the error
Error in dyn.load(file, DLLpath = DLLpath, ...) :  unable to load shared object
'/home/caf/R.2.15.2-patched/lib/R/library/HiPLARb/libs/HiPLARb.so':
  /home/caf/mylibs/lib/libmagmablas.so: undefined symbol: cudaMemcpyFromSymbol

Info about the system
Ubuntu 12.04.2 LTS
NVIDIA Quadro K4000
CUDA 5.5
R 2.15.2, hwloc, MAGMA and PLASMA were both downloaded and installed
by the HiPLARb installer

Any help much appreciated,
Clara
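
A minimal sketch for narrowing down a load error like the one above, assuming the library path from the post: try loading the shared object directly from R and capture the loader's message. The helper name try_dyn_load is made up for illustration; dyn.load itself is base R.

```r
# Hypothetical helper (not part of HiPLARb): try to load a shared object
# and return the loader's message instead of stopping with an error.
try_dyn_load <- function(path) {
  tryCatch({
    dyn.load(path)
    list(ok = TRUE, message = "")
  }, error = function(e) list(ok = FALSE, message = conditionMessage(e)))
}

# Path taken from the error message in the post; adjust to your system.
res <- try_dyn_load("/home/caf/mylibs/lib/libmagmablas.so")
if (!res$ok) cat("load failed:\n", res$message, "\n")
```

If the library fails on its own with the same undefined CUDA symbol, that would suggest a mismatch between the CUDA runtime it was built against and the one found at load time, which would be consistent with the reinstall-and-rebuild fix reported in this thread.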

___
R-sig-hpc mailing list
r-sig-...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] cannot base64decode string which is base64encode in R

2013-08-06 Thread Qiang Wang

Thanks for your detailed explanation. If I understand correctly, "ߟ"
belongs to the characters that CAN be interpreted as UTF-8. Others, such as
"\xe4" and "\xac", are left as they are. So the following code will
show an error message, but it won't affect the use of x?
x <- "\xe4"

I have a question that may be off topic, but it has bothered me a lot and I
can't find the answer anywhere:
In R, how do I add a null character to a string? Even storing a single null
character does not seem possible:
x <- "\0". The question arose from a web API which requires submitted
strings to contain a null character.


On Tue, Aug 6, 2013 at 1:43 AM, Enrico Schumann wrote:

> On Mon, 05 Aug 2013, Qiang Wang  writes:
>
> >> On Sat, Aug 3, 2013 at 3:49 PM, Enrico Schumann  >wrote:
> >>
> >>> On Fri, 02 Aug 2013, Qiang Wang  writes:
> >>>
> >>> > Hi,
> >>> >
> >>> > I'm struggling with encode/decode strings in R. Don't know why the second
> >>> > example below would fail. Thanks in advance for your help.
> >>> > succeed: s <- "saf" x <- base64encode(s) y <- base64decode(x, "character")
> >>> > fail: s <- "safs" x <- base64encode(s) y <- base64decode(x, "character")
> >>> >
> >>>
> >>> And the first example works for you?
> >>>
> >>>   require("base64enc")
> >>>   s <- "saf"
> >>>   x <- base64encode(s)
> >>>
> >>> ## Error in file(what, "rb") : cannot open the connection
> >>> ## In addition: Warning message:
> >>> ## In file(what, "rb") : cannot open file 'saf': No such file or directory
> >>>
> >>> ?base64encode says that its first argument is
> >>>
> >>> "data to be encoded/decoded. For ‘base64encode’ it can be a raw
> >>>  vector, text connection or file name. For ‘base64decode’ it can be
> >>>  a string or a binary connection."
> >>>
> >>> Try this:
> >>>
> >>>   rawToChar(base64decode(base64encode(charToRaw("saf"))))
> >>>
> >>> ## [1] "saf"
> >>>
> >>> --
> >>> Enrico Schumann
> >>> Lucerne, Switzerland
> >> http://enricoschumann.net
> >>
> >
> > Thanks for your reply!
> >
> > Sorry I did not clarify that I was using the base64encode and base64decode
> > functions provided by the "caTools" package. It seems that if I convert the
> > string to the raw type first, it still solves my problem.
> >
> > My original problem actually is that I have a string:
> > secret <-
> >
> '5Kwug+Byrq+ilULMz3IBD5tquNt5CcdYi3XPc8jnKwtXvIgHw/vcSGU1VCIo4b/OfcRDm7uH359syfhWzXFrNg=='
> >
> > It was claimed to be encoded in Base64. So I tried to decode it:
> >
> > require("base64enc")
> > rawToChar(base64decode(secret))
> >
> > Then, I got
> >
> "\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\001\017\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\vW\xbc\x88\a\xc3\xfb\xdcHe5T\"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87ßl\xc9\xf8V\xcdqk6"
> >
> > But what I am supposed to get is:
> >
> '\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\x01\x0f\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\x0bW\xbc\x88\x07\xc3\xfb\xdcHe5T"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87\xdf\x9fl\xc9\xf8V\xcdqk6'
> >
> > Most of the result is correct except for several characters near the end.
> > I don't know where the problem is.
> >
>
> See the help page of 'rawToChar': the function transforms raw bytes into
> characters.  But, depending on your locale, one character may be more
> than one byte.  On my computer, with a UTF-8 locale (see my
> '?sessionInfo' below),
>
>   rawToChar(base64decode(secret), TRUE)
>
> gives me
>
>   ##  [1] "\xe4" "\xac" "."    "\x83" "\xe0" "r"    "\xae"
>   ##  [8] "\xaf" "\xa2" "\x95" "B"    "\xcc" "\xcf" "r"
>   ## [15] "\001" "\017" "\x9b" "j"    "\xb8" "\xdb" "y"
>   ## [22] "\t"   "\xc7" "X"    "\x8b" "u"    "\xcf" "s"
>   ## [29] "\xc8" "\xe7" "+"    "\v"   "W"    "\xbc" "\x88"
>   ## [36] "\a"   "\xc3" "\xfb" "\xdc" "H"    "e"    "5"
>   ## [43] "T"    "\""   "("    "\xe1" "\xbf" "\xce" "}"
>   ## [50] "\xc4" "C"    "\x9b" "\xbb" "\x87" "\xdf" "\x9f"
>   ## [57] "l"    "\xc9" "\xf8" "V"    "\xcd" "q"    "k"
>   ## [64] "6"
>
> That is, every *single* byte is converted into character.  For example:
>
>   rawToChar(base64decode(secret), TRUE)[55:56]
>
> gives
>
>   ## [1] "\xdf" "\x9f"
>
> which probably is what you expected.  But if I paste those two
> characters together,
>
>   paste(rawToChar(base64decode(secret), TRUE)[55:56], collapse = "")
>
> they will be shown like so:
>
>   ## [1] "ߟ"
>
> because this is how this byte pattern will be interpreted in UTF-8.
>
>
>
>
> Abbreviated 'sessionInfo':
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_GB.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_GB.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>
>
> --
> Enrico Schumann
> Lucerne, Switzerland
> http://enricoschumann.net
>
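
The byte-versus-character distinction described in the quoted reply can be reproduced with base R alone. A minimal sketch, using only the two bytes discussed there (0xdf and 0x9f); the variable names are illustrative and no base64 package is needed:

```r
bytes <- as.raw(c(0xdf, 0x9f))

# One string per byte, as with rawToChar(..., TRUE) in the reply
per_byte <- rawToChar(bytes, multiple = TRUE)
length(per_byte)

# One string holding both bytes
s <- rawToChar(bytes)
nchar(s, type = "bytes")

# Read as UTF-8, the two bytes together encode a single code point
Encoding(s) <- "UTF-8"
code_point <- utf8ToInt(s)
sprintf("U+%04X", code_point)
```

The bytes themselves are unchanged in both cases; only the way the console displays them differs, so comparing results as raw vectors avoids the confusion.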


Re: [R] cannot base64decode string which is base64encode in R

2013-08-06 Thread Prof Brian Ripley


On 06/08/2013 08:34, Qiang Wang wrote:

Thanks for your detailed explanation. If I understand correctly, "ߟ"
belongs to the characters that CAN be interpreted as UTF-8. Others, such as
"\xe4" and "\xac", are left as they are. So the following code will
show an error message, but it won't affect the use of x?
x <- "\xe4"

I have a question that may be off topic, but it has bothered me a lot and I
can't find the answer anywhere:
In R, how do I add a null character to a string? Even storing a single null
character does not seem possible:
x <- "\0". The question arose from a web API which requires submitted
strings to contain a null character.


It is not possible.  Character strings in R cannot contain nuls (not 
nulls, sic).  Use raw vectors instead.


This is documented, so time to read some manuals 
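
A minimal sketch of the raw-vector approach suggested above, using base R only. The payload contents ("key", "value") and the temporary file are made up for illustration; how the bytes are then sent to a web API is outside this sketch.

```r
# Build a payload with an embedded nul byte as a raw vector
payload <- c(charToRaw("key"), as.raw(0x00), charToRaw("value"))
length(payload)

# Converting it to a single character string is refused by R
res <- tryCatch(rawToChar(payload), error = function(e) e)
inherits(res, "error")

# Raw vectors can be written to and read from a binary file unchanged
tmp <- tempfile()
writeBin(payload, tmp)
round_trip <- readBin(tmp, what = "raw", n = length(payload))
identical(round_trip, payload)
```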





[R] MDA

2013-08-06 Thread Tjun Kiat Teo

I am trying to use the package mda

And this is my command

mdfit<-mda(factor(forsen[,f]) ~ .,data=forsen[,-f],subclasses=sc)

But I keep getting this error message on a particular data set

Error in maxdist[l] <- x[l, i] :
  NAs are not allowed in subscripted assignments

Can anyone help ? Thanks


Regards

Tjun Kiat


Re: [R] Horizontal bar plot for lengthy data

2013-08-06 Thread Jim Lemon


On 08/06/2013 07:01 AM, Christofer Bogaso wrote:

Hi David,

Thanks for your answer.

However, I was wondering whether it would be possible to have a vertical
scroll bar, so that the user can scroll the screen while still seeing all
the bars on the plot clearly.

Is there any possibility?


Hi Christofer,
I had to solve a similar problem when I had to represent the movement of 
a pointer over an experimental session in which there could be a few 
thousand records of the pointer position. I output the plot as a very 
wide PNG graphic and you can then zoom in and scroll through this either 
in an image display program or in a web browser. You can do the same 
thing with Postscript or PDF graphics.
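
The approach above can be sketched in base R as follows. This is only an illustration under assumptions: the data are made up, the file name is arbitrary, and pdf() is used here (sizes in inches); png() works the same way with sizes in pixels. The page is made tall rather than wide because the bars in this thread are horizontal.

```r
# Made-up data standing in for a long list of bars
set.seed(1)
values <- setNames(runif(300), paste0("item", seq_len(300)))

# Let the page height grow with the number of bars (inches for pdf())
out_file <- file.path(tempdir(), "long_barplot.pdf")
pdf(out_file, width = 7, height = 0.2 * length(values) + 2)
op <- par(mar = c(4, 6, 2, 1))
barplot(values, horiz = TRUE, las = 1, cex.names = 0.6)
par(op)
dev.off()

out_file
```

The resulting file can then be scrolled and zoomed in any PDF viewer or browser.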


Jim


Re: [R] timereg

2013-08-06 Thread Uwe Ligges




On 05.08.2013 20:27, Silvano wrote:

Hi,

I tried to fit a model using a timecox function in R version 3.0.0,
using sTRACE data  present in survival package.

If I use the 2.9.0 version, I don't have problems, but in 3.0.0 version,
I get the following error message:


out <- timecox(Surv(time/365, status==9) ~ age+sex+diabetes+chf+vf,
sTRACE, max.time=7, n.sim=500)


Error in .C("OStimecox", as.double(times), as.integer(Ntimes),
as.double(designX),  :
  "OStimecox" not available for .C() for package "timereg"

I don't know what happened.


Perhaps an old package that was not built for R-3.0.0 or some old code 
in your workspace?


Uwe Ligges
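
One way to check the first possibility, sketched with base R only: look at the "Built" field that installed.packages() reports for each package. The helper name built_under is made up for illustration, and it is applied to two base packages here so that it runs anywhere; in the situation above one would pass "timereg" instead.

```r
# Hypothetical helper: report which R version installed packages were built under
built_under <- function(pkgs) {
  ip <- installed.packages()[pkgs, c("Package", "Version", "Built"), drop = FALSE]
  data.frame(ip, stringsAsFactors = FALSE, row.names = NULL)
}

info <- built_under(c("stats", "utils"))
# Compare the major version of the build with the running R
info$same_major <- sub("\\..*$", "", info$Built) == R.version$major
info

# To reinstall packages built under an older R (not run here):
# update.packages(checkBuilt = TRUE, ask = FALSE)
```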



I would like to use version 3.0.0, but I don't know what's wrong. Can somebody
help me?


Also, when I use the aalen model I get different results in versions 3.0.0 and
2.9.0. Why?

Version 3.0.0:


fit1.semi <- aalen(Surv(time/365, status==9)~age+sex+diabetes+chf+vf,
sTRACE,max.time=7,n.sim=500)
summary(fit1.semi)


Additive Aalen Model

Test for nonparametric terms

Test for non-significant effects
            Supremum-test of significance p-value H_0: B(t)=0
(Intercept)                          7.29               0.000
age                                  8.63               0.000
sex                                  2.95               0.060
diabetes                             2.31               0.240
chf                                  5.30               0.000
vf                                   2.95               0.042

Test for time invariant effects
            Kolmogorov-Smirnov test p-value H_0:constant effect
(Intercept)                 0.68600                       0.006
age                         0.00934                       0.008
sex                         0.16900                       0.078
diabetes                    0.22100                       0.184
chf                         0.14800                       0.176
vf                          0.46100                       0.008


in 2.9.0 version:


fit1.semi <- aalen(Surv(time/365, status==9)~age+sex+diabetes+chf+vf,
sTRACE,max.time=7,n.sim=500)
summary(fit1.semi)

Additive Aalen Model

Test for nonparametric terms

Test for non-significant effects
            sup|  hat B(t)/SD(t) | p-value H_0: B(t)=0
(Intercept)                   7.29               0.000
age                           8.63               0.000
sex                           2.95               0.052
diabetes                      2.31               0.246
chf                           5.30               0.000
vf                            2.95               0.026

Test for time invariant effects
            sup| B(t) - (t/tau)B(tau)| p-value H_0: B(t)=b t
(Intercept)                    0.68600                 0.004
age                            0.00934                 0.004
sex                            0.16900                 0.084
diabetes                       0.22100                 0.208
chf                            0.14800                 0.158
vf                             0.46100                 0.004

part of them...
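
In the two outputs above the test statistics agree and only the p-values differ. One possible explanation, which is an assumption and not confirmed in this thread, is that the p-values come from simulation (the calls use n.sim=500) and therefore vary from run to run unless the random seed is fixed. The sketch below illustrates that effect with a stand-in statistic in base R; it does not use timereg, and mc_pvalue is a made-up helper.

```r
# Hypothetical stand-in: Monte Carlo p-value for an observed supremum statistic
mc_pvalue <- function(observed, n.sim = 500) {
  # Simulate the null distribution of a maximum of absolute normal deviates
  sims <- replicate(n.sim, max(abs(rnorm(50))))
  mean(sims >= observed)
}

set.seed(1); p1 <- mc_pvalue(2.95)
set.seed(2); p2 <- mc_pvalue(2.95)  # different seed, typically a different p-value
set.seed(1); p3 <- mc_pvalue(2.95)  # same seed as p1, same p-value

c(p1 = p1, p2 = p2, p3 = p3)
```

Calling set.seed() before the model fit in both R versions would be one way to test whether this explains the differences.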


Thanks a lot,

--
Silvano Cesar da Costa
Departamento de Estatística
Universidade Estadual de Londrina
Fone: 3371-4346


[R] Compiling a C++ file in RStudio?

2013-08-06 Thread jpm miao

Hi,

   I am wondering whether C++ programs can be compiled in RStudio. I searched
the web and found many yeses and nos. It looks like Rcpp can do it (or
partially?). I just wonder whether it can replace a C++ IDE, e.g., Eclipse or
Visual Studio, since RStudio is much easier to use.

   One says yes:

http://learndataanalysis.com/you-can-now-source-c-file-rstudio

   Thanks,

Miao


Re: [R] Compiling a C++ file in RStudio?

2013-08-06 Thread Berend Hasselman

On 06-08-2013, at 11:14, jpm miao  wrote:

> Hi,
> 
>   I am wondering if C++ programs could be compiled in RStudio. I search on
> the web, and I find many yes's and no's. It looks like Rcpp can do it (or
> partially?). I just wonder if it can replace the C++ IDE, e.g., Eclipse or
> Visual Studio since RStudio is much easier to use.
> 

This does not belong on the R-help mailing list.
Ask on the RStudio forums.

Berend

>   One says yes:
> 
> http://learndataanalysis.com/you-can-now-source-c-file-rstudio
> 
>   Thanks,
> 
> Miao
> 

[R] R plot

2013-08-06 Thread Mª Teresa Martinez Soriano

Hi to everyone, first of all, thanks for this excellent service.

I have a question about R:

I want to plot my data.frame, but I have used the function split on this
data.frame and I don't know whether there is a function that could help me;
I was using a for loop. The problem is that I get one plot with all the data
together, and I want a separate plot for each data.frame I get after using
the split function.

x<-split(cast1,cast1$SECTOR)
y<-split(cast2,cast2$SECTOR)


for (i in 1:length(a))
{
datos[[i]]<-y[[i]]
}



plot(datos[[2]][,6], type="l", main= "SECTOR" )
for(j in 1: length(datos))for(i in 6:13)(lines(datos[[j]][,i], type="l", col=i))

What can I do in order to get this plot separately?

I have tried with this:

apply(datos,function(x)lines(x[[]][,i], type='l',col=i))

but I get nothing, I have no idea about using apply, mapply..functions with plot


Thanks in advance 

Re: [R] problem installing RDCOMClient on Windows 7

2013-08-06 Thread Kishor Tappita

Dear Uwe,

Thank you for your response. I installed the latest version of R
(R-3.0.1). I get the errors below when I try to install RDCOMClient.

g++ -m32 -I"D:/R-3.0.1/include" -DNDEBUG -D_GNU_ -DNO_PYCOM_IPROVIDECLASSINFO -
.-I"d:/RCompile/CRANpkg/extralibs64/local/include"  -Wno-deprecated-O2
Wall  -mtune=core2 -c connect.cpp -o connect.o
In file included from d:\mingw\bin\../lib/gcc/mingw32/4.6.2/include/c++/cstdio:
4:0,
 from D:/R-3.0.1/include/Rinternals.h:25,
 from D:/R-3.0.1/include/Rdefines.h:29,
 from RUtils.h:1,
 from connect.cpp:20:
d:\mingw\bin\../lib/gcc/mingw32/4.6.2/../../../../include/stdio.h:573:9:
error: 'wint_t' does not name a type
d:\mingw\bin\../lib/gcc/mingw32/4.6.2/../../../../include/stdio.h:574:9:
error:'wint_t' does not name a type
d:\mingw\bin\../lib/gcc/mingw32/4.6.2/../../../../include/stdio.h:575:9:
error:'wint_t' does not name a type
d:\mingw\bin\../lib/gcc/mingw32/4.6.2/../../../../include/stdio.h:586:9:
error:'wint_t' does not name a type
d:\mingw\bin\../lib/gcc/mingw32/4.6.2/../../../../include/stdio.h:587:9: error:
'wint_t' does not name a type
d:\mingw\bin\../lib/gcc/mingw32/4.6.2/../../../../include/stdio.h:589:9: error:
'wint_t' does not name a type
d:\mingw\bin\../lib/gcc/mingw32/4.6.2/../../../../include/stdio.h:591:9: error:
'wint_t' does not name a type
d:\mingw\bin\../lib/gcc/mingw32/4.6.2/../../../../include/stdio.h:632:9: error:
'wint_t' does not name a type
d:\mingw\bin\../lib/gcc/mingw32/4.6.2/../../../../include/stdio.h:633:9: error:
'wint_t' does not name a type
d:\mingw\bin\../lib/gcc/mingw32/4.6.2/../../../../include/stdio.h:638:9: error:
'wint_t' does not name a type
d:\mingw\bin\../lib/gcc/mingw32/4.6.2/../../../../include/stdio.h:639:9: error:
'wint_t' does not name a type
make: *** [connect.o] Error 1
ERROR: compilation failed for package 'RDCOMClient'
* removing 'D:/R-3.0.1/library/RDCOMClient'



> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base
>

Thanks,
Regards,
Kishor
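
The include paths in the log above point at d:\mingw\bin (mingw32 4.6.2). Whether that is the toolchain this R build expects is not stated in the thread, so the following is only a diagnostic sketch, using base R, for seeing which compiler and make R would find first on the search path:

```r
# Which executables would be found first on the PATH (empty string if none)
tools <- Sys.which(c("g++", "gcc", "make"))
print(tools)

# The PATH entries in search order, to spot a toolchain that shadows another
path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
head(path_entries, 10)
```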



On Mon, Aug 5, 2013 at 9:03 PM, Uwe Ligges
 wrote:
> Please try with a recent version of R. Yours is 6 major updates behind
>
> Best,
> Uwe Ligges
>
>
>
> On 05.08.2013 16:11, Kishor Tappita wrote:
>>
>> Dear R-Users,
>>
>> I am trying to install RDCOMClient package as it is a dependency for
>> installing excel.link package. I get the below error while trying to
>> install RDCOMClient on 64-bit Windows 7 operating system.
>>
>>
>> g++ -I"D:/R-2.10.0/include" -D_GNU_ -DNO_PYCOM_IPROVIDECLASSINFO -I.
>>   -Wno-deprecated-O2 -Wall  -c RUtils.c -o RUtils.o
>>
>> RUtils.c: In function 'SEXPREC* R_createRCOMUnknownObject(void*, const
>> char*)':
>> RUtils.c:151:52: error: invalid conversion from 'int' to 'Rboolean'
>> [-fpermissive]
>> D:/R-2.10.0/include/Rinternals.h:692:6: error:   initializing argument
>> 3 of 'void R_RegisterCFinalizerEx(SEXP, R_CFinalizer_t, Rboolean)'
>> [-fpermissive]
>> RUtils.c: In function 'Rboolean ISSInstanceOf(SEXP, const char*)':
>> RUtils.c:245:26: error: invalid conversion from 'int' to 'Rboolean'
>> [-fpermissive]
>>
>> make: *** [RUtils.o] Error 1
>>
>> Please help me to resolve this issue. RDCOMClient_0.93-0.tar.gz is the
>> source file that I am using. Below is my session information.
>>
>>> sessionInfo()
>>
>> R version 2.10.0 (2009-10-26)
>> i386-pc-mingw32
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods   base
>>
>> Thanks,
>> Kishor
>>

[R] read.table.ffdf and fixed width files

2013-08-06 Thread christian.kamenik

Dear all

I am working on Windows 7 32-bit, and the ff package is my daily life-saver for 
overcoming the inherent memory limitations. Recently, I tried using 
read.table.ffdf to import data from a fixed-width ASCII file (file size: 
1'440'865'015 bytes) with 6'079'455 lines and 32 variables using the command
read.table.ffdf(file=my.filename, FUN="read.fwf", width=my.format, 
asffdf_args=list(col_args=list(pattern = my.pattern)))

The command generates a temporary file, which has 1'629'328'120 Bytes, plus 32 
ff files following my.pattern. The latter 32 files, however, only take up 
136'000 Bytes. And the resulting R object has a dimension of 1000 x 32. To me, 
it seems that read.table.ffdf aborts the data import after 1000 lines, instead 
of importing the entire file.

I tried running read.table.ffdf with different parameter settings, I was 
browsing the help pages and the mailing lists, but I did not find any hint on 
why read.table.ffdf aborts the data import. (Does it really? - The file size of 
the temporary file suggests that all data were read.)

Any help would be highly appreciated

Best regards

Christian Kamenik
Project Manager

Federal Department of the Environment, Transport, Energy and Communications 
DETEC
Federal Roads Office FEDRO
Division Road Traffic
Road Accident Statistics

Mailing Address: 3003 Bern
Location: Weltpoststrasse 5, 3015 Bern

Tel +41 31 323 14 89
Fax +41 31 323 43 21

christian.kame...@astra.admin.ch
www.astra.admin.ch



Re: [R] R plot

2013-08-06 Thread Jose Iparraguirre

Hola Maria Teresa!

I'd need some data to fully understand your problem, but as far as I can tell 
you are overdoing things.
Again, without the data I can't help you but I'd use the ggplot2 package and 
forget about splitting the data into vectors and thus creating additional 
objects which look redundant if they are only meant for plotting.
I'd suggest you include some of your data and re-send the request.
Regards,

José

Prof. José Iparraguirre
Chief Economist
Age UK
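
Since no data were posted, the following is only a base-R sketch with made-up data of how one plot per SECTOR could be produced from the result of split(). The column layout (one SECTOR column plus eight numeric columns) and the output file name are assumptions for illustration.

```r
# Made-up stand-in for cast2: a SECTOR column plus eight numeric series
set.seed(42)
cast2 <- data.frame(SECTOR = rep(c("A", "B", "C"), each = 10),
                    matrix(rnorm(30 * 8), ncol = 8))

# split() returns a named list with one data.frame per sector
datos <- split(cast2, cast2$SECTOR)

# One page per sector; each numeric column becomes one line
out_file <- file.path(tempdir(), "sectors.pdf")
pdf(out_file)
for (sector in names(datos)) {
  d <- datos[[sector]]
  num <- as.matrix(d[, sapply(d, is.numeric), drop = FALSE])
  matplot(num, type = "l", lty = 1, col = seq_len(ncol(num)),
          main = paste("SECTOR", sector), xlab = "row", ylab = "value")
}
dev.off()
```

On an interactive device the same loop draws the plots one after another; par(mfrow = ...) could be set beforehand to place several sectors on one page.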




[R] breaks argument in heatmap.2 doesn't do what it should I think

2013-08-06 Thread Witold E Wolski

I set the breaks parameter in heatmap.2, and I would expect the color key and
the histogram (the thing in the top left of the plot) to be aligned.
So that everyone can reproduce the problem:


library(gplots)        # provides heatmap.2
library(RColorBrewer)  # provides brewer.pal

mypalette <- brewer.pal(11, "RdYlBu")
ddd <- rnorm(400, 0, 0.1)
mdd <- matrix(ddd, ncol = 50)
# this call is used to produce the breaks (and here it works fine)
hm <- heatmap.2(mdd, col = mypalette, scale = "row")

mdd <- t(scale(t(mdd)))
heatmap.2(mdd, col = mypalette, scale = "none", breaks = hm$breaks)

# Take a look and you will see that the colors are not aligned with the breaks,
# although the documentation states:
breaks: (optional) Either a numeric vector indicating the splitting
  points for binning ‘x’ into colors, or a integer number of
  break points to be used, in which case the break points will
  be spaced equally between ‘min(x)’ and ‘max(x)’.
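
The documented meaning of a numeric breaks vector (splitting points for binning x into colors) can be illustrated with base R. This sketch does not reproduce heatmap.2's internals; heat.colors() stands in for the brewer.pal palette and the break range is arbitrary. It only shows that n colors need n + 1 break points and how values map to bins.

```r
n_colors <- 11
cols <- heat.colors(n_colors)                    # stand-in for brewer.pal(11, "RdYlBu")
breaks <- seq(-3, 3, length.out = n_colors + 1)  # 12 break points for 11 colors

# Assign each value to the interval between consecutive breaks
x <- c(-2.5, -0.1, 0.4, 2.9)
bin <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
data.frame(x = x, bin = bin, color = cols[bin])
```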



-- 
Witold Eryk Wolski


[R] model.frame.default variable length differ in coxph

2013-08-06 Thread E Joffe

Hello,

 

I am trying to run a  coxph model but get an error

 
Error in model.frame.default(formula = Surv(time, status) ~
selectedVarnames,  : 
  variable lengths differ (found for 'selectedVarnames')

 

Of note the dataset is generated as part of using the glmnet for Lasso
regularization.

Glmnet takes in as input a matrix where all categorical variables have been
converted to binary factors with dummies.

As a result I have had to do some data manipulations that are probably the
origin of the error.

Original dataframe -> transform categorical to binary -> transform to matrix
-> run glmnet -> get best features -> selected features + status + time ->
transform to dataframe -> run coxph

 

The coxph code (entire code below):

glmnet.cox <- coxph(Surv(time,status) ~ selectedVarnames,
data=reformat_dataSet, init=selectedBeta,iter=200)

 

 

Here is a description of the data (I can't attach the real thing as it is
protected health data):

 

selectedVar: 394X81 Double Matrix

reformat_dataSet: 394 obs. 83 varaibles (dataframe including 59 predicting
variables, time and status)

selectedBeta: numeric [81]

selectedVarnames:
"FABM0+FABM1+FABM2+FABM4+FABM4EOS+SEXF+SEXM+Age_at_Dx+Performance_Status0+Pe
rformance_Status2+Performance_Status3+Performance_Status4+AHD_typeabnormal
counts_MDS+AHD_typeALL_MDS+AHD_typecytopenia+AHD_typelow PLT+AHD_typelow
white
count+AHD_typeMPD+AHD_typepancytopenia+PRIOR_MALNO+PRIOR_MALYES+PRIOR_CHEMON
O+PRIOR_XRTNO+PRIOR_XRTYES+cyto_grpUNFAVORABLE+Cyto5+Cyto7+Cyto8+Cyto10+Cyto
11+Cyto12+ITDPOS+D835POS+ABS_BLST+BM_PROM+PB_MONO+PB_PROM+HGB+PLT+ALBUMIN+CR
EATININE+CD13+CD34+CD19+Fresh_or__CryoCryo+AKT1_2_3_pT308+ARC+ATF3+BECN1+BIR
C2+CAV1+CDKN2A+CTNNB1+FOXO3_S318_321+GAB2+GRP78+IGFBP2+INPP5D+JMJD6+JUN_pS73
+KIT+LYN+MAP2K1_2_pS217_221+NRP1+PIK3CA+PLAC1+PRKCA+PRKCD_pT507+PTEN_pS380T3
82T383+RAC1_2_3+RPS6KB1+SMAD1+SMAD2_pS245+SMAD5+SMAD5_pS463+STAT3+STMN1+TP53
+TRIM62+VASP+XPO1"

dataset: 394 obs. 270 variables

predict_matrix: 394X392 double matrix

 

 

THANK YOU!!!

 

 

 

The entire code (without the data though):

library("survival")
library("pec")
library("glmnet")
library("peperr")
library("Hmisc")

cIndexCoxglmnet <- list()

for (i in 1:50){
  train <- sample(1:nrow(dataset), nrow(dataset), replace = TRUE) ## random sampling with replacement
  trainSet <- dataset[train,]
  testSet <- dataset[-train,]
  cat("\n","ITERATION:",i,"\n")

  # create Y (survival matrix) for glmnet
  surv_obj <- Surv(trainSet$time,trainSet$status)

  ## transform categorical variables into binary variables with dummy for trainSet
  predict_matrix <- model.matrix(~ ., data=trainSet,
    contrasts.arg = lapply(trainSet[,sapply(trainSet, is.factor)], contrasts, contrasts=FALSE))

  ## remove the status/time variables from the predictor matrix (x) for glmnet
  predict_matrix <- subset(predict_matrix, select=c(-time,-status))

  ## create a glmnet cox object using lasso regularization and cross validation
  glmnet.cv <- cv.glmnet(predict_matrix, surv_obj, family="cox")

  ## get the glmnet model on the full dataset
  glmnet.obj <- glmnet.cv$glmnet.fit

  # find lambda index for the models with least partial likelihood deviance (by cv.glmnet)
  optimal.lambda <- glmnet.cv$lambda.min  # For a more parsimonious model use lambda.1se
  lambda.index <- which(glmnet.obj$lambda==optimal.lambda)

  # take beta for optimal lambda
  optimal.beta <- glmnet.obj$beta[,lambda.index]

  # find non zero beta coef
  nonzero.coef <- abs(optimal.beta)>0
  selectedBeta <- optimal.beta[nonzero.coef]

  # take only covariates for which beta is not zero
  selectedVar <- predict_matrix[,nonzero.coef]

  # create a dataframe for trainSet with time, status and selected variables
  # in binary representation for evaluation in pec
  reformat_dataSet <- as.data.frame(cbind(surv_obj,selectedVar))

  # names of selectedVars
  selectedVarnames<-paste(colnames(selectedVar),collapse="+")

  # create coxph object with pre-defined coefficients
  glmnet.cox <- coxph(Surv(time,status) ~ selectedVarnames,
    data=reformat_dataSet, init=selectedBeta,iter=200)

  ## create datasets based on the testSet fit for glmnet models for testing in pec function
  ## !!!encountered tech problems in variables selection for unknown reason so the code is somewhat redundant !!!

  # create Y (survival matrix) for glmnet
  surv_obj_test <- Surv(testSet$time,testSet$status)

  ## transform categorical variables into binary variables with dummy for testSet
  predict_matrix_test <- model.matrix(~ ., data=testSet,
    contrasts.arg = lapply(testSet[,sapply(testSet, is.factor)], contrasts, contrasts=FALSE))

  ## remove the status/time variables from the predictor matrix (x) for glmnet
  predict_matrix_test <- subset(predict_matrix_test, select=c(-time,-status))

  ## remove the
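
The error message above is what R reports when a formula term has a different length from the response: selectedVarnames is a single character string, so model.frame treats it as a length-1 variable rather than as 81 predictors. Below is a minimal sketch of one way to build the formula from the column names instead. The column names are invented stand-ins, and the backticks are added on the assumption that some of the real names (for example "AHD_typelow PLT") are not syntactic and are kept unchanged in the data frame.

```r
# Hypothetical stand-ins for colnames(selectedVar).
vars <- c("Age_at_Dx", "SEXF", "AHD_typelow PLT")

# Backtick-quote each name so names with spaces survive parsing,
# then paste the terms into a formula string and convert it.
rhs <- paste(sprintf("`%s`", vars), collapse = " + ")
cox_formula <- as.formula(paste("Surv(time, status) ~", rhs))
print(cox_formula)

# The formula object, not the string, is then what would go to coxph(), e.g.
# coxph(cox_formula, data = reformat_dataSet, init = selectedBeta)
```

Building the formula this way only needs base R; the commented coxph() call is left unevaluated because it depends on the poster's data.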

Re: [R] breaks argument in heatmap.2 doesn't do what it should I think

2013-08-06 Thread Adams, Jean

If you look at the help for heatmap.2 (in the gplots package) you will see
the following explanation for breaks

   # mapping data to colors
   breaks,
   symbreaks=min(x < 0, na.rm=TRUE) || scale!="none",

If you run your code with the second call to the heatmap.2() function with
the argument scale="row" instead of scale="none" the colors are aligned as
in your first call to heatmap.2().

mdd <- t(scale(t(mdd)))
heatmap.2(mdd,col=mypalette, scale="row", breaks=hm$breaks)

Does that help?

Jean



On Tue, Aug 6, 2013 at 6:27 AM, Witold E Wolski  wrote:

> I do set the breaks parameter in heatmap.2
> I would expect that the color.key and the histogram (the thing in the
> top left of the plot) are aligned.
> Just that everyone can reproduce the problem:
>
>
> mypalette<-brewer.pal(11,"RdYlBu")
> ddd <- rnorm(400,0,0.1)
> mdd <- matrix(ddd,ncol=50)
> hm <- heatmap.2(mdd,col=mypalette,scale="row") #this is used to
> produce the breaks, (and here it works fine)
>
> mdd <- t(scale(t(mdd)))
> heatmap.2(mdd,col=mypalette,scale="none",breaks=hm$breaks)
>
> # take a look and you will see that the colors are not aligned with the
> breaks
> although the documentation states
> breaks: (optional) Either a numeric vector indicating the splitting
>   points for binning x into colors, or a integer number of
>   break points to be used, in which case the break points will
>   be spaced equally between min(x) and max(x).
>
>
>
> --
> Witold Eryk Wolski
>

Re: [R] Distribution Fitting with R

2013-08-06 Thread S Ellison

> I want to actually fit a formal 
> statistical distribution to my data using the classical 
> methodologies (either Shapiro Wilk, Anderson Darling, 
> Chi-Square, etc.). 

You probably need to go find out what those things do. Two of them are tests 
for normality, not fitting methods or distributions, and will not give you 
distributions to simulate from.
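
The distinction can be sketched in base R: a normality test returns a statistic and a p-value, whereas fitting means estimating parameters that can then be used for simulation. The data here are simulated stand-ins, and a normal distribution is assumed purely for illustration.

```r
set.seed(42)
x <- rnorm(200, mean = 10, sd = 2)   # simulated stand-in for real data

# A normality test gives a statistic and a p-value, not a distribution.
sw <- shapiro.test(x)
print(sw$p.value)

# Fitting a normal: estimate its parameters, then simulate from the fit.
fit <- c(mean = mean(x), sd = sd(x))
sim <- rnorm(1000, mean = fit[["mean"]], sd = fit[["sd"]])
print(fit)
```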

S Ellison



Re: [R] algorithm for clustering categorical data

2013-08-06 Thread Li, Yan

Hi David and other R helpers,

If I rescale the numerical fields to [0,1] and represent the categorical fields 
as 1:k, which is the same starting point as Gower's measure, but use 
Euclidean distance instead of Gower's distance for k-means clustering, how 
much difference does it make? What is the drawback? 

Thank you,
Yan

-Original Message-
From: David Carlson [mailto:dcarl...@tamu.edu] 
Sent: Thursday, August 01, 2013 12:08 PM
To: Li, Yan; r-help@r-project.org
Subject: RE: [R] algorithm for clustering categorical data

Read up on Gower's Distance measures (available in the ecodist
package) which can combine numeric and categorical data. You didn't give us any 
information about how you numerically transformed the categorical variables, 
but the usual approach is to create indicator variables that code 
presence/absence for each category within a categorical variable. Different 
variances between variables can be reduced by standardizing the variables.

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Li, Yan
Sent: Thursday, August 1, 2013 11:00 AM
To: r-help@r-project.org
Subject: [R] algorithm for clustering categorical data

Hi All,

Does anyone know of an algorithm for clustering categorical variables? Any R 
packages? Which is the best?

If a data set has both numeric and categorical data, what is the best clustering 
algorithm and R package to use?

I tried a numeric transformation of all categorical fields and clustering 
afterwards. But the transformed fields have values from 1...10, and my other 
fields are on a bigger scale:
1-...This will make the categorical fields have less effect on the distance 
calculation...

Thank you!
Yan


[R] help.start

2013-08-06 Thread Witold E Wolski

Does anyone else observe with R 3.1 (on linux) that the help.start
function frequently blocks the R session (never returns)?
Can I do anything, or should I just wait for R 3.1.1?




-- 
Witold Eryk Wolski


Re: [R] help.start

2013-08-06 Thread Martin Maechler

> Witold E Wolski 
> on Tue, 6 Aug 2013 16:40:42 +0200 writes:

> Does anyone else observe with R 3.1 (on linux) that the help.start

There's no R 3.1.  What do you really mean?

R 3.0.1 ?

or do you mean the current "R under development"
which (quite rarely) also shows as 3.1.0 and will be changed
quite a bit before eventually being released as 3.1.0  in 2014 ??

Witold, really, you should know better,
I think 

> function frequently blocks the R session (never returns)?
> Can I do anything, or should I just wait for R 3.1.1?

> -- 
> Witold Eryk Wolski


Re: [R] algorithm for clustering categorical data

2013-08-06 Thread David Carlson

What do you mean by representing the categorical fields by 1:k?

a <- c("red", "green", "blue", "orange", "yellow")

becomes

a <- c(1, 2, 3, 4, 5)

That guarantees your results are worthless unless your categories
have an inherent order (e.g. tiny, small, medium, big, giant).
Otherwise it should be four (k-1) indicator/dummy variables (e.g.):

a.red <- c(1, 0, 0, 0, 0)
a.green <- c(0, 1, 0, 0, 0)
a.blue <- c(0, 0, 1, 0, 0)
a.orange <- c(0, 0, 0, 1, 0)

Then you can use Euclidean distance.
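
A minimal base-R sketch of the indicator coding and the resulting Euclidean distances. Note that it keeps all k indicator columns rather than the k-1 shown above; that is a choice made for this sketch, so that every pair of distinct categories ends up the same distance apart.

```r
a <- factor(c("red", "green", "blue", "orange", "yellow"))

# One 0/1 column per level (the "- 1" drops the intercept so that
# no level is absorbed into a baseline).
ind <- model.matrix(~ a - 1)

# Euclidean distances between the five observations.
d <- dist(ind)
print(round(as.matrix(d), 3))
```

Every off-diagonal distance is sqrt(2), which reflects that unordered categories are simply "different", unlike the 1:k coding where red and yellow would be four units apart.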

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: Li, Yan [mailto:yan...@ibi.com] 
Sent: Tuesday, August 6, 2013 9:36 AM
To: dcarl...@tamu.edu; r-help@r-project.org
Subject: RE: [R] algorithm for clustering categorical data

Hi David and other R helpers,

If I rescale the numerical fields to [0,1] and represent the
categorical fields as 1:k, which is the same starting point as
Gower's measure, but use Euclidean distance instead of Gower's
distance for k-means clustering, how much difference does it make?
What is the drawback? 

Thank you,
Yan


Re: [R] algorithm for clustering categorical data

2013-08-06 Thread Martin Maechler

> "DC" == David Carlson 
> on Tue, 6 Aug 2013 10:26:56 -0500 writes:

> What do you mean by representing the categorical fields by 1:k?
> a <- c("red", "green", "blue", "orange", "yellow")

> becomes

> a <- c(1, 2, 3, 4, 5)

> That guarantees your results are worthless 
worthless indeed!

> unless your categories
> have an inherent order (e.g. tiny, small, medium, big, giant).
> Otherwise it should be four (k-1) indicator/dummy variables (e.g.):

> a.red <- c(1, 0, 0, 0, 0)
> a.green <- c(0, 1, 0, 0, 0)
> a.blue <- c(0, 0, 1, 0, 0)
> a.orange <- c(0, 0, 0, 1, 0)

> Then you can use Euclidean distance.

Yes, ... or use Gower's or other similarly sophisticated
distances, as you (David) mentioned earlier in this thread.

Do also note that a generalized Gower's distance (+ weighting of
variables) is available from the ('recommended' hence always
installed) package 'cluster' :

  require("cluster")
  ?daisy
  ## notably  daisy(*,  metric="gower")

Note that daisy() is more sophisticated than most users know:
its 'type = *' specification allows, notably for binary
variables (such as your a.* dummies above), asymmetric
behavior, which may be quite important in "rare event" and similar
cases.
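
A minimal sketch of the daisy() call mentioned above, on a made-up data frame with one numeric and one categorical column. It assumes the 'cluster' package is installed, which is normally the case since it is a recommended package.

```r
library(cluster)   # recommended package, normally shipped with R

# Made-up mixed data: one numeric and one categorical variable.
dat <- data.frame(size   = c(1.2, 3.4, 2.2),
                  colour = factor(c("red", "green", "red")))

# Gower's coefficient: numeric columns are range-scaled, factor columns
# contribute 0 (same level) or 1 (different), and the results are averaged.
d <- daisy(dat, metric = "gower")
print(d)
```

Rows 1 and 2 differ maximally on both variables, so their dissimilarity is 1; rows 1 and 3 share a colour and differ only in size, so theirs is much smaller.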

Martin



[R] creating a quantile variable based on subsets of a dataframe

2013-08-06 Thread Gavin Rudge

#some sample data:
library(Hmisc)
set.seed(33)
df<-data.frame(x=(sample(letters[1:10],1000,replace=TRUE)),y=rnorm(1000,mean=20,sd=15))

x is a category from a to j, say a geographical area, into which an observation 
y falls, y being a score.  Now if I want to put my score into quantiles 
(quintiles in this case) across the whole population of observations and then 
make a quintile variable I do the following:

#make a quintile variable
df<- within(df,z<-as.integer(cut2(y,quantile(y,probs=seq(0,1,0.2)))))

I'm using cut2 here as I want the extremes of my ranges to be included in the 
upper and lower bins.

So far so good, but I would also like another variable to indicate the quintile 
of the score within the areas indicated by the x variable, so all of the scores 
where x=a, binned into quintiles for area a, the same for scores in areas b, c 
and so on.

I see that I could put my quintile variable code into a function and then split 
my data frame by x, apply the function in each of the ten groups and stitch the 
whole thing back together again (not sure I could write it though), but is 
there a much simpler solution?

Thanks,

GavinR


Re: [R] creating a quantile variable based on subsets of a dataframe

2013-08-06 Thread David Winsemius


On Aug 6, 2013, at 10:17 AM, Gavin Rudge wrote:

> #some sample data:
> library(Hmisc)
> set.seed(33)
> df<-data.frame(x=(sample(letters[1:10],1000,replace=TRUE)),y=rnorm(1000,mean=20,sd=15))
> 
> x is a category from a to j, say a geographical area, into which an 
> observation y falls, y being a score.  Now if I want to put my score into 
> quantiles (quintiles in this case) across the whole population of 
> observations and then make a quintile variable I do the following:
> 
> #make a quintile variable
> df<- within(df,z<-as.integer(cut2(y,quantile(y,probs=seq(0,1,0.2)))))
> 
> I'm using cut2 here as I want the extremes of my ranges to be included in the 
> upper and lower bins.
> 
> So far so good, but I would also like another variable to indicate the 
> quintile of the score within the areas indicated by the x variable, so all of 
> the scores where x=a, binned into quintiles for area a, the same for scores 
> in areas b, c and so on.
> 
> I see that I could put my quintile variable code into a function and then 
> split my data frame by x, apply the function in each of the ten groups and 
> stitch the whole thing back together again (not sure I could write it 
> though), but is there a much simpler solution?
> 

Generally, questions involving the distribution of a single variate grouped 
within categories, where the desired result is as long as the original variate, 
are well handled with the `ave` function:

> df$c2.grp <- ave(df$y, df$x, FUN=function(z) cut2(z, 
> quantile(z,probs=seq(0,1,0.2)) ) )
> str(df)
'data.frame':   1000 obs. of  3 variables:
 $ x : Factor w/ 10 levels "a","b","c","d",..: 5 4 5 10 9 6 5 4 1 2 ...
 $ y : num  15 45.3 29.9 45.2 23.3 ...
 $ c2.grp: num  2 5 4 5 3 4 2 4 3 2 ...

I was a bit surprised that the resulting column in df was numeric rather than 
factor, but I suspect it was the fact that the levels of the intra-groups 
splits could not be reconciled. You didn't apparently consider that issue in 
your problem specification. The result could be "cleaned up" with:

> df$c2.grp <- factor(df$c2.grp, labels=paste0("Q", 1:5) )
> with(df, table(x, c2.grp))
   c2.grp
x   Q1 Q2 Q3 Q4 Q5
  a 22 23 22 22 22
  b 19 19 18 19 19
  c 21 20 20 20 21
  d 20 19 19 19 20
  e 19 20 21 20 20
  f 21 21 21 21 22
  g 21 21 21 21 22
  h 19 19 19 19 19
  i 18 18 17 18 18
  j 20 20 19 20 20
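
The same ave() idea can be sketched without Hmisc, assuming base cut() with include.lowest = TRUE is an acceptable stand-in for cut2() (it closes the lowest bin so the minimum is not dropped). The helper name quintile and the column names q_all and q_grp are made up for this sketch.

```r
set.seed(33)
df <- data.frame(x = sample(letters[1:10], 1000, replace = TRUE),
                 y = rnorm(1000, mean = 20, sd = 15))

# Hypothetical helper: quintile number (1-5) of each element of v.
quintile <- function(v) {
  cut(v, breaks = quantile(v, probs = seq(0, 1, 0.2)),
      include.lowest = TRUE, labels = FALSE)
}

df$q_all <- quintile(df$y)                    # across all observations
df$q_grp <- ave(df$y, df$x, FUN = quintile)   # within each level of x
print(with(df, table(x, q_grp)))
```

Because the helper returns plain integers, the new columns need no relabelling; wrapping them in factor() afterwards is optional.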
-- 

David Winsemius
Alameda, CA, USA


Re: [R] problem with mahal function Package dismo

2013-08-06 Thread Ernesto Villarino

Hi all,
I want to apply mahal function using data.frame instead of raster data
but I am having problems (see error message below). I want to use
data.frame since we have seasonal data (the species distribute
differently as a function of months).

> head (predictor)

   OCPT x1XM zPc pHxM  MLD
38 21.23519 36.24476 -3164  8.836913 8.082310 68.09159
39 21.13811 36.25013 -2487  8.451318 8.077561 57.78384
40 21.03920 36.25259 -2025  8.132195 8.073292 62.59614
41 20.94312 36.25257 -3409  7.851401 8.069450 55.83329
79 21.22135 36.10911   -40 18.707443 8.108031 42.55479
80 21.14884 36.13638 -2800 21.133693 8.063561 64.28003

> head (Cfin)

   Lat Long
38  35  -38
39  35  -37
40  35  -36
41  35  -35
79  36  -75
80  36  -74

> mm<-mahal (predictor,Cfin)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function 'mahal' for signature
'"data.frame", "data.frame"'

Can you help me ??
Thanks,
Regards,
Ernesto


[R] laf_open_fwf

2013-08-06 Thread christian.kamenik

Dear all

I was trying the (fairly new) LaF package, and came across the following 
problem:

I opened a connection to a fixed width ASCII file using
laf_open_fwf(my.filename, my.column_types, my.column_widths, my.column_names)

When looking at the data, it turned out that \n (newline) and \r (carriage 
return) were considered as characters, thus destroying the structure in my data 
(the second column does not include any numbers):

> my.data[1565:1575,1:3]

   MF_FARZ1  Fahrzeugarttext MF_MARKE
1 \n043 Landwirt. Traktor2140
2 \n043 Landwirt. Traktor6206
3 \n001 Personenwagen2026
4 \n001 Personenwagen2026
5\r\n00 1Personenwagen404
6\r\n02 0Gesellschaftswagen   710
7\r\n00 1Personenwagen505
8\r\n00 1Personenwagen505
9\r\n00 1Personenwagen301
10   \r\n00 1Personenwagen553
11   \r\n04 3Landwirt. Traktor257

I am working on Windows 7 32-bit.

Any help would be highly appreciated.
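
One possible explanation, offered as an assumption rather than a diagnosis, is that the file mixes Unix (\n) and Windows (\r\n) line endings, so a reader that expects records of one fixed length drifts by a character or two. Below is a minimal base-R sketch of normalising the line endings before splitting fixed-width fields. It does not use LaF, and the sample records and widths are invented to resemble the output above.

```r
# Invented sample: three 25-character records with mixed line endings.
raw <- paste0("001Personenwagen     2026\n",
              "043Landwirt. Traktor 2140\r\n",
              "020Gesellschaftswagen0710\r\n")

# Drop carriage returns, then split on the remaining newlines.
lines <- strsplit(gsub("\r", "", raw, fixed = TRUE), "\n", fixed = TRUE)[[1]]

# Cut each record at fixed column positions.
widths <- c(3, 18, 4)
ends   <- cumsum(widths)
starts <- ends - widths + 1
fields <- sapply(seq_along(widths),
                 function(j) trimws(substring(lines, starts[j], ends[j])))
colnames(fields) <- c("MF_FARZ1", "Fahrzeugarttext", "MF_MARKE")
print(fields)
```

If the real file does turn out to have mixed endings, rewriting it once with consistent endings would let any fixed-width reader work on it unchanged.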

Best regards

Christian Kamenik
Project Manager

Federal Department of the Environment, Transport, Energy and Communications 
DETEC
Federal Roads Office FEDRO
Division Road Traffic
Road Accident Statistics

Mailing Address: 3003 Bern
Location: Weltpoststrasse 5, 3015 Bern

Tel +41 31 323 14 89
Fax +41 31 323 43 21

christian.kame...@astra.admin.ch
www.astra.admin.ch



[R] R help

2013-08-06 Thread Mª Teresa Martinez Soriano

Hi everyone, I'm sorry for my questions. I'm sure they are totally stupid, but 
I am completely new to this program and I am facing this "danger" alone.

I have done imputation for one part of my data set; however, I am not able to 
do it in general.
This is part of my data set (cast2):
cast2[1:30,]X. Fecha1 Fecha2 CEES.NUMERO SECTOR IE.2003 IE.2004 
IE.2005 IE.2006 IE.2007 IE.2008 IE.2009 IE.2010 rS2   15 17/05/1999 10/02/2011  
  7420   APCT 173 125 155  74  NA  NA  13  
NA  35   23 27/06/1998 18/06/20134941 TA135818032115
23902506232020082007  011  58  4/12/1997 18/06/20134772 
   CRV  93 179 221 196 297 191 126 112  015  87 
30/09/2004 18/06/20134121  C  NA  31 3901246
37621430  NA  NA  316  94  1/03/2006 18/06/20134121  C  
NA  NA  NA 212 513 706 202 127  317  97 
20/12/2005 18/06/20134110  C  NA  NA  NA  64  
98 251  79 176  320 133 30/09/2002 18/06/20137112   APCT
 153 279 289 370 412 262 115  75  021 138 
11/07/2002 13/05/20094121  C546078638365   12009   
16763  NA  NA  NA  323 152 27/05/1999 18/06/20137490   APCT 
 NA  80  77  60  89 137 144 146  124 154 
21/12/2004 18/06/20136820 AI  NA  NA 148 186 
302 233 194 204  226 177 20/02/1996 18/06/20137490   APCT   
   16   4  NA   3   3  NA   5   5  227 185  
6/03/1992 12/08/20116820 AI  26  NA  21  21  NA 
 21  21  16  232 231 14/03/2001 27/06/20116810 AI  
NA  63  76  79  72   5  NA  NA  338 272 28/03/2001 
18/06/20134110  C24625571588061596951 927   
 11021289  040 288 12/02/1997 18/06/20135630  H 307 671 
805 979  NA 558 238 449  141 306  1/01/2000 18/06/2013  
  7311   APCT 161 200 250 250 263 161  43  
50  042 311 21/02/2001 18/06/20136831 AI  NA  51  89
  69 135  28  11  12  147 373 18/07/1995 18/06/20134619 
   CRV 159  NA  NA  NA 161 192 208 230  349 389 
27/07/1990 18/06/20135610  H 686 750 749 783 
795 645 514 415  054 410 19/11/1992 18/06/20136920   APCT   
  330 290 290 342 387 415 465 421  055 420  
9/01/2004 18/06/20135610  H  NA 205 335 267 234 
211 194 204  159 443 18/01/2005 18/06/20134110  C  
NA  NA   7 702 957 1951489   5  263 463 13/03/2006 
18/06/20137311   APCT  NA  NA  NA  71 190 219   
  172 109  364 465 16/01/1995 18/06/20136920   APCT   7  42 
 42  42  90  60  36  12  071 503  8/06/1992 18/06/2013  
  2512 IM 470 551 549 582 638 618 510 
472  073 510 12/02/1997 18/06/20134759CRV 182 212 293   
  299 322 226 231  NA  176 527 26/09/2003 18/06/2013
7111   APCT  30 112 144  73  NA 171  51  68  
178 548 19/07/2002 18/06/20134673CRV 158 9511025 
301 112 358  18   8  079 552  4/11/1997 07/09/20114675  
  CRV78689420   10772   15140   14843   126829704   14077  082 603  
1/01/1996 18/06/20134334  C  47  49  69  NA  NA 
 80  96  76  2

setwd("C:/rprueba")  # set where our data are
castellon<-read.delim("clipboard", header=T, dec=",",check.names=T)

# STEP 1: count missing values; we do this first to drop the companies
# that we are not going to use
rS<-rowSums(is.na(castellon[,18:24]))  # number of NAs in each row
castellon["rS"]<-rowSums(is.na(castellon))  # add column rS = number of NAs
d<-dim(castellon)[2]  # index of the last column of the data frame
p<-which(castellon[ ,d]<=3,arr.ind=T)  # select rows with fewer than 4 missing values
cast<-castellon[p,]  # save those rows as a new data.frame
cast[1:20,]

# STEP 2: for the companies with mis.val<=3, split the sample by the columns
# we are interested in
# cast1 holds the first columns, which give us information
# cast2 holds the company number plus the variables to impute
cast1<-cast[,c(1:12,14:16)]
cast2<-cast[,c(1,8,12,15:25)]  # with the dates

x<-split(cast1,cast1$SECTOR)
y<-split(cast2,cast2$SECTOR)
for(i in 1:length(x)){
  write.table(x[i],paste(paste("cast1_sector", i), ".csv"),col.names=T,row.names=FALSE)
  write.table(y[i],paste(paste("cast2_secto

[R] Temporal Correlation in Logistic Regression

2013-08-06 Thread Worthington, Thomas A

I am attempting to create a logistic regression model to examine the factors 
that determine the emergence of four species of aquatic invertebrates. The 
invertebrates were trapped at two sites over a period of two years. The traps 
were emptied at irregular intervals (with an extended gap over the winter 
period), and the two sites were not always visited on the same day. I have two 
covariates I would like to test: discharge, which is the same at both sites, and 
temperature, which differs between the sites. The main aim of the analysis 
is to see whether the different temperature regimes at the two sites 
alter the probability that the invertebrates will emerge. I had planned to test 
this using a GLM of the form

model1<-glm(B.rhodani_Pres ~ Temp+ Discharge + Temp:Site_Code, family = 
binomial())

However examination of the Durbin Watson statistic suggests the residuals for 
the four models (one for each species) are highly autocorrelated.

Does anyone have any ideas how I can incorporate the temporal autocorrelation 
into the models?

Any advice would be greatly appreciated

Tom


[R] lattice yscale.components: use multiple convenience functions

2013-08-06 Thread Taylor, Sean D

Good morning,

I really enjoy some of the recent convenience functions in lattice_0.20-15 and 
latticeExtra_0.6-24. I am wondering if there is a way to use multiple functions 
in the same call? Specifically, I would like to be able to use 
yscale.components.log10ticks (to get the major tick marks at powers of 10 and 
minor tick marks in between) and also label the major tick marks smartly using 
superscripts for the power. Something along the lines of this:

##Pseudocode, does not work
xyplot((1:200)/20 ~ (1:200)/20, type = c("p", "g"),
   scales = list(x = list(log = 2), y = list(log = 10)),
   xscale.components = xscale.components.fractions,
   yscale.components = list(yscale.components.log10ticks,
yscale.components.logpower))

or this:
##Does not work
xyplot((1:200)/20 ~ (1:200)/20, type = c("p", "g"),
   scales = list(x = list(log = 2), y = list(log = 10)),
   xscale.components = xscale.components.fractions,
   yscale.components = function(...){
 yscale.components.log10ticks
 yscale.components.logpower}
   )

Thanks!
Sean

Sean Taylor
Post-doctoral Fellow
Fred Hutchinson Cancer Research Center
206-667-5544



[R] Structural equation models (SEM) for count data / poisson distribution

2013-08-06 Thread Stella Copeland


Dear R community,

I am constructing structural equation models in R and I have tried both 
the sem and lavaan packages. I have count data (numbers of plants in 
this case) that I would like to use as an endogenous variable. The 
Poisson distribution seems appropriate for these data, but I can't seem 
to figure out how best to handle this within either package, lavaan or 
sem, though I have noticed that both packages are adding lots of 
options/capability to their core sem functions, so perhaps I have missed 
something.


Could someone recommend a reasonable approach for handling count data 
with SEM in R, in either lavaan or sem?


Thanks in advance for comments and suggestions,
Stella

--
Stella Copeland
PhD Candidate
Environmental Science & Policy
UC Davis


Re: [R] problem with mahal function Package dismo

2013-08-06 Thread David Carlson

It appears that you are trying to pass a data.frame to the function
and it is complaining. You didn't give us enough information to know
for sure (e.g. str(predictor) and str(Cfin)), but you could try

mm<-mahal (as.matrix(predictor), as.matrix(Cfin))

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Ernesto Villarino
Sent: Tuesday, August 6, 2013 5:51 AM
To: r-help@r-project.org
Subject: Re: [R] problem with mahal function Package dismo

Hi all,
I want to apply the mahal function to data.frames instead of raster
data,
but I am having problems (see the error message below). I want to use
data.frames since we have seasonal data (the species are distributed
differently depending on the month).

> head (predictor)

   OCPT x1XM zPc pHxM  MLD
38 21.23519 36.24476 -3164  8.836913 8.082310 68.09159
39 21.13811 36.25013 -2487  8.451318 8.077561 57.78384
40 21.03920 36.25259 -2025  8.132195 8.073292 62.59614
41 20.94312 36.25257 -3409  7.851401 8.069450 55.83329
79 21.22135 36.10911   -40 18.707443 8.108031 42.55479
80 21.14884 36.13638 -2800 21.133693 8.063561 64.28003

> head (Cfin)

   Lat Long
38  35  -38
39  35  -37
40  35  -36
41  35  -35
79  36  -75
80  36  -74

> mm<-mahal (predictor,Cfin)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function 'mahal' for signature '"data.frame", "data.frame"'

Can you help me ??
Thanks,
Regards,
Ernesto


Re: [R] lattice yscale.components: use multiple convenience functions

2013-08-06 Thread David Winsemius


On Aug 6, 2013, at 9:09 AM, Taylor, Sean D wrote:

> Good morning,
> 
> I really enjoy some of the recent convenience functions in lattice_0.20-15 
> and latticeExtra_0.6-24. I am wondering if there is a way to use multiple 
> functions in the same call? Specifically, I would like to be able to use 
> yscale.components.log10ticks (to get the major tick marks at powers of 10 and 
> minor tick marks in between) and also label the major tick marks smartly 
> using superscripts for the power. Something along the lines of this:

See the code supporting figures 8.4 and 8.5 in Sarkar's Lattice book.

-- 
David.
> 
> ##Pseudocode, does not work
> xyplot((1:200)/20 ~ (1:200)/20, type = c("p", "g"),
>   scales = list(x = list(log = 2), y = list(log = 10)),
>   xscale.components = xscale.components.fractions,
>   yscale.components = list(yscale.components.log10ticks,
>yscale.components.logpower))
> 
> or this:
> ##Does not work
> xyplot((1:200)/20 ~ (1:200)/20, type = c("p", "g"),
>   scales = list(x = list(log = 2), y = list(log = 10)),
>   xscale.components = xscale.components.fractions,
>   yscale.components = function(...){
> yscale.components.log10ticks
> yscale.components.logpower}
>   )
> 
> Thanks!
> Sean
> 
> Sean Taylor
> Post-doctoral Fellow
> Fred Hutchinson Cancer Research Center
> 206-667-5544
> 
> 

David Winsemius
Alameda, CA, USA


[R] How to retrieve pairwise distances between clusters after cutting the tree?

2013-08-06 Thread Naxerova, Kamila

Dear all,

what would be the best way of retrieving distances between individual clusters 
after cutting my tree of interest? $height from the hclust object will give me 
the distance between clusters at each agglomeration step, but let's say I 
have a situation where I have six observations A, B, C, D, E, F. The clustering 
proceeds 

1) {A,B}
2) {C,D}
3) {E,F}
4) {C,D,E,F}
5) {A,B,C,D,E,F}

but now I want to know the distance between {A,B} and {E,F} which is not 
directly recorded in $height?

I could find the distance by locating cluster members in the original distance 
matrix, but is there a more direct way that I might not be aware of? Something 
along the lines of calc.pairwise.dist(cutree(hclust(dist),k=3))?

Many thanks in advance.
Kamila

Re: [R] help.start

2013-08-06 Thread Witold E Wolski

Dear Martin

I mean
R version 3.0.1 (2013-05-16) -- "Good Sport"

On 6 August 2013 16:55, Martin Maechler  wrote:
>> Witold E Wolski 
>> on Tue, 6 Aug 2013 16:40:42 +0200 writes:
>
> > Does anyone else observe with R 3.1 (on Linux) that the help.start
>
> there's no R 3.1 .  What do you mean really?
>
> R 3.0.1 ?
>
> or do you mean the current "R under development"
> which (quite rarely) also shows as 3.1.0 and will be changed
> quite a bit before eventually being released as 3.1.0  in 2014 ??
>
> Witold, really, you should know better,
> I think 
>
> > function frequently blocks the R session (never returns)?
> > Can I do anything, or should I just wait for R 3.1.1?
>
> > --
> > Witold Eryk Wolski



-- 
Witold Eryk Wolski


Re: [R] Horizontal bar plot for lengthy data

2013-08-06 Thread Christofer Bogaso

Hello Jim,

Thanks for your pointer. Could you be more specific about how I can implement
your strategy?

Thanks and regards,

On Tue, Aug 6, 2013 at 2:49 PM, Jim Lemon  wrote:

> On 08/06/2013 07:01 AM, Christofer Bogaso wrote:
>
>> Hi David,
>>
>> Thanks for your answer.
>>
>> However, I was wondering whether it would be possible to have a vertical scroll
>> bar, so that the user can scroll the screen while still seeing all the bars on
>> the plot clearly.
>>
>> Is there any possibility?
>>
> Hi Christofer,
> I had to solve a similar problem when I had to represent the movement of a
> pointer over an experimental session in which there could be a few thousand
> records of the pointer position. I output the plot as a very wide PNG
> graphic and you can then zoom in and scroll through this either in an image
> display program or in a web browser. You can do the same thing with
> Postscript or PDF graphics.
>
> Jim
>
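
A minimal sketch of the approach Jim describes, assuming a horizontal bar plot
with a few hundred bars. The data, the file name and the 0.2 inch-per-bar
scaling are hypothetical choices made for illustration, not part of the
original exchange. PDF is used here because Jim notes the same idea works for
PDF graphics; a PNG device would follow the same pattern with png().

```r
set.seed(1)
n <- 300                                    # hypothetical number of bars
vals <- setNames(runif(n), paste0("item", seq_len(n)))

# Write to a temporary file; choose any path you like in practice
out <- file.path(tempdir(), "long_barplot.pdf")

# Make the page tall enough that every bar gets about 0.2 inches (assumption)
pdf(out, width = 7, height = max(7, 0.2 * n))
par(mar = c(4, 6, 2, 1))                    # extra left margin for bar labels
barplot(vals, horiz = TRUE, las = 1, cex.names = 0.6, xlab = "value")
dev.off()
```

Opening the resulting file in a PDF viewer or web browser gives the scrolling
and zooming for free, which is the substitute for a scroll bar in the R
graphics window.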


Re: [R] algorithm for clustering categorical data

2013-08-06 Thread Li, Yan

Thanks David, This is very useful!

-Original Message-
From: David Carlson [mailto:dcarl...@tamu.edu] 
Sent: Tuesday, August 06, 2013 11:27 AM
To: Li, Yan; r-help@r-project.org
Subject: RE: [R] algorithm for clustering categorical data

What do you mean by representing the categorical fields by 1:k?

a <- c("red", "green", "blue", "orange", "yellow")

becomes

a <- c(1, 2, 3, 4, 5)

That guarantees your results are worthless unless your categories have an 
inherent order (e.g. tiny, small, medium, big, giant).
Otherwise it should be four (k-1) indicator/dummy variables (e.g.):

a.red <- c(1, 0, 0, 0, 0)
a.green <- c(0, 1, 0, 0, 0)
a.blue <- c(0, 0, 1, 0, 0)
a.orange <- c(0, 0, 0, 1, 0)

Then you can use Euclidean distance.
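
A small sketch of the indicator-variable approach described above, assuming one
categorical field and one numeric field that has already been rescaled to
[0, 1]. The `size` values are invented for illustration. model.matrix() builds
the k-1 dummy columns; note that it drops the alphabetically first level
("blue") as the reference rather than "yellow" as in the hand-written example,
which does not change the idea.

```r
a <- factor(c("red", "green", "blue", "orange", "yellow"))
size <- c(0.1, 0.4, 0.5, 0.9, 1.0)    # hypothetical numeric field in [0, 1]

# k - 1 = 4 indicator columns; drop the intercept column
dummies <- model.matrix(~ a)[, -1]

X <- cbind(size, dummies)
d <- dist(X)                          # Euclidean distance is the default

set.seed(1)
km <- kmeans(X, centers = 2)          # k-means also works on X directly
km$cluster
```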

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: Li, Yan [mailto:yan...@ibi.com]
Sent: Tuesday, August 6, 2013 9:36 AM
To: dcarl...@tamu.edu; r-help@r-project.org
Subject: RE: [R] algorithm for clustering categorical data

Hi David and other R helpers,

If I rescale the numerical fields to [0,1] and represent the categorical fields 
as 1:k, which is the same starting point as Gower's measure, but then use 
Euclidean distance instead of Gower's distance for k-means clustering, how 
much difference does it make? What is the drawback? 

Thank you,
Yan

-Original Message-
From: David Carlson [mailto:dcarl...@tamu.edu]
Sent: Thursday, August 01, 2013 12:08 PM
To: Li, Yan; r-help@r-project.org
Subject: RE: [R] algorithm for clustering categorical data

Read up on Gower's Distance measures (available in the ecodist
package) which can combine numeric and categorical data. You didn't give us any 
information about how you numerically transformed the categorical variables, 
but the usual approach is to create indicator variables that code 
presence/absence for each category within a categorical variable. Different 
variances between variables can be reduced by standardizing the variables.

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Li, Yan
Sent: Thursday, August 1, 2013 11:00 AM
To: r-help@r-project.org
Subject: [R] algorithm for clustering categorical data

Hi All,

Does anyone know of an algorithm for clustering categorical variables, or R 
packages that implement one? Which is the best?

If a data set has both numeric and categorical fields, what is the best clustering 
algorithm and R package to use?

I tried a numeric transformation of all categorical fields and clustering 
afterwards. But the transformed fields have values from 1...10, and my other 
fields are on a bigger scale:
1-... This will make the categorical fields have less effect on the distance 
calculation...

Thank you!
Yan


Re: [R] algorithm for clustering categorical data

2013-08-06 Thread Li, Yan

Thanks for the reply...

For some reason, I need to keep Euclidean distance in the process...

-Original Message-
From: Martin Maechler [mailto:maech...@stat.math.ethz.ch] 
Sent: Tuesday, August 06, 2013 12:04 PM
To: dcarl...@tamu.edu
Cc: Li, Yan; r-help@r-project.org
Subject: Re: [R] algorithm for clustering categorical data

> "DC" == David Carlson 
> on Tue, 6 Aug 2013 10:26:56 -0500 writes:

> What do you mean by representing the categorical fields by 1:k?
> a <- c("red", "green", "blue", "orange", "yellow")

> becomes

> a <- c(1, 2, 3, 4, 5)

> That guarantees your results are worthless

worthless indeed!

> unless your categories
> have an inherent order (e.g. tiny, small, medium, big, giant).
> Otherwise it should be four (k-1) indicator/dummy variables (e.g.):

> a.red <- c(1, 0, 0, 0, 0)
> a.green <- c(0, 1, 0, 0, 0)
> a.blue <- c(0, 0, 1, 0, 0)
> a.orange <- c(0, 0, 0, 1, 0)

> Then you can use Euclidean distance.

Yes, ... or use Gower's or other similarly sophisticated distances, as you 
(David) mentioned earlier in this thread.

Do also note that a generalized Gower's distance (+ weighting of
variables) is available from the ('recommended' hence always
installed) package 'cluster' :

  require("cluster")
  ?daisy
  ## notably  daisy(*,  metric="gower")

Note that daisy() is more sophisticated than most users know: its 'type = *' 
specification allows, notably for binary variables (such as your a.red, a.green, ... 
dummies above), asymmetric behavior, which may be quite important in 
"rare event" and similar cases.
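
To make the daisy() suggestion concrete, here is a minimal sketch on a small
invented data frame with one nominal, one ordered and one numeric column. The
data and the choice of average-linkage hclust() afterwards are assumptions for
illustration only.

```r
library(cluster)

# Hypothetical mixed-type data
df <- data.frame(
  colour = factor(c("red", "green", "blue", "orange", "yellow", "red")),
  size   = factor(c("tiny", "small", "medium", "big", "giant", "small"),
                  levels = c("tiny", "small", "medium", "big", "giant"),
                  ordered = TRUE),
  weight = c(1.2, 3.4, 2.2, 8.9, 15.0, 2.9)
)

# Gower dissimilarity handles factors, ordered factors and numerics together
d <- daisy(df, metric = "gower")

# Any method that accepts a dissimilarity object can follow
hc <- hclust(as.dist(d), method = "average")
groups <- cutree(hc, k = 2)
groups
```

Since k-means needs coordinates rather than a dissimilarity matrix, methods
such as hclust() or cluster::pam() are the natural companions to daisy(). See
?daisy for the 'type' argument mentioned above.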

Martin


> -
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77840-4352


> -Original Message-
> From: Li, Yan [mailto:yan...@ibi.com] 
> Sent: Tuesday, August 6, 2013 9:36 AM
> To: dcarl...@tamu.edu; r-help@r-project.org
> Subject: RE: [R] algorithm for clustering categorical data

> Hi David and other R helpers,

> If I rescale the numerical fields to [0,1] and represent the
> categorical fields as 1:k, which is the same starting point as
> Gower's measure, but then use Euclidean distance instead of Gower's
> distance for k-means clustering, how much difference does it make? What
> is the drawback? 

> Thank you,
> Yan

> -Original Message-
> From: David Carlson [mailto:dcarl...@tamu.edu] 
> Sent: Thursday, August 01, 2013 12:08 PM
> To: Li, Yan; r-help@r-project.org
> Subject: RE: [R] algorithm for clustering categorical data

> Read up on Gower's Distance measures (available in the ecodist
> package) which can combine numeric and categorical data. You didn't
> give us any information about how you numerically transformed the
> categorical variables, but the usual approach is to create indicator
> variables that code presence/absence for each category within a
> categorical variable. Different variances between variables can be
> reduced by standardizing the variables.

> -
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77840-4352

> -Original Message-
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Li, Yan
> Sent: Thursday, August 1, 2013 11:00 AM
> To: r-help@r-project.org
> Subject: [R] algorithm for clustering categorical data

> Hi All,

> Does anyone know of an algorithm for clustering categorical
> variables, or R packages that implement one? Which is the best?

> If a data set has both numeric and categorical fields, what is the best
> clustering algorithm and R package to use?

> I tried a numeric transformation of all categorical fields and
> clustering afterwards. But the transformed fields have values from
> 1...10, and my other fields are on a bigger scale:
> 1-... This will make the categorical fields have less effect on
> the distance calculation...

> Thank you!
> Yan


[R] odfWeave post processing error

2013-08-06 Thread Maverick topgun

Hello all.  When I attempt to run the following script, I receive a post 
processing error message. I would appreciate any help in interpreting and 
correcting the error.
Script:
library(odfWeave)
ctrl <- odfWeaveControl(zipCmd = c("\"C:\\Program Files\\7-Zip\\7z.exe\" a $$file$$",
                                   "\"C:\\Program Files\\7-Zip\\7z.exe\" x -tzip $$file$$"))
odfWeave('ex1.odt', 'out.odt', control = ctrl)
Output:
> odfWeave('ex1.odt', 'out.odt', control = ctrl)
  Creating  C:\Users\Frank\AppData\Local\Temp\RtmpCctAn2/odfWeave06140621908
  Copying  ex1.odt
  Setting wd to  C:\Users\Frank\AppData\Local\Temp\RtmpCctAn2\odfWeave06140621908
  Unzipping ODF file using "C:\Program Files\7-Zip\7z.exe" x -tzip "ex1.odt"

7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18

Processing archive: ex1.odt

Extracting  mimetype
Extracting  Configurations2\floater
Extracting  Configurations2\accelerator\current.xml
Extracting  Configurations2\images\Bitmaps
Extracting  Configurations2\progressbar
Extracting  Configurations2\menubar
Extracting  Configurations2\popupmenu
Extracting  Configurations2\statusbar
Extracting  Configurations2\toolbar
Extracting  Configurations2\toolpanel
Extracting  Thumbnails\thumbnail.png
Extracting  content.xml
Extracting  settings.xml
Extracting  styles.xml
Extracting  manifest.rdf
Extracting  meta.xml
Extracting  META-INF\manifest.xml

Everything is Ok

Folders: 8
Files: 9
Size:       33353
Compressed: 14190
  Removing  ex1.odt
  Creating a Pictures directory
  Pre-processing the contents
  Sweaving  content.Rnw
  Writing to file content_1.xml
  Processing code chunks ...
    1 : term hide
    2 : term verbatim(label=t1)
    3 : term xml(label=t2)
  'content_1.xml' has been Sweaved
  Removing content.xml
Here the processing stops.
Error:
Error: XML content does not seem to be XML: 'content_1.xml'
In addition: Warning message:
In file.remove("content.xml") :
  cannot remove file 'content.xml', reason 'No such file or directory'
SessionInfo:
  Post-processing the contents
sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
 [1] grid      grDevices datasets  splines   graphics  stats     tcltk
 [8] utils     methods   base

other attached packages:
 [1] odfWeave_0.8.2   XML_3.98-1.1     Kmisc_0.4.0-1    Rcpp_0.10.4
 [5] psych_1.3.2      ggplot2_0.9.3.1  MCMCpack_1.3-3   coda_0.16-1
 [9] xtable_1.7-1     plyr_1.8         polycor_0.7-8    sfsmisc_1.0-23
[13] prettyR_2.0-7    lme4_0.99-2      Matrix_1.0-12    effects_2.2-4
[17] colorspace_1.2-2 lattice_0.20-15  mvnormtest_0.1-9 gmodels_2.15.4
[21] gtools_3.0.0     doBy_4.5-8       multcomp_1.2-18  mvtnorm_0.9-9995
[25] rms_4.0-0        SparseM_1.03     e1071_1.6-1      class_7.3-8
[29] car_2.0-18       nnet_7.3-7       mitools_2.2      foreign_0.8-54
[33] MASS_7.3-28      svSocket_0.9-55  TinnR_1.0-5      R2HTML_2.2.1
[37] Hmisc_3.12-2     Formula_1.1-1    survival_2.37-4

loaded via a namespace (and not attached):
 [1] cluster_1.14.4     dichromat_2.0-0    digest_0.6.3       gdata_2.13.2
 [5] gtable_0.1.2       labeling_0.2       munsell_0.4.2      nlme_3.1-110
 [9] proto_0.3-10       RColorBrewer_1.0-5 reshape2_1.2.2     rpart_4.1-1
[13] scales_0.2.3       stats4_3.0.1       stringr_0.6.2      svMisc_0.9-69
[17] tools_3.0.1




Respectfully,
 
Frank Lawrence
  

Re: [R] Problem with t-test

2013-08-06 Thread arun

Hi Vivek,
I removed the rows with missing values and also duplicated rows.  Now, it looks 
like it is working.




x<-read.table("RP_matrix_FPKM_PGTvsPDGT.txt",header=T,sep="\t")
x1<- read.table("RP_plaise_FPKM_PGTvsPDGT.txt",header=T,sep="\t") 
str(x1)
#'data.frame':    19680 obs. of  6 variables:
# $ ID    : Factor w/ 19678 levels "XLOC_01",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ PGT.1 : num  112.47 13.76 62.13 4.16 0 ...
# $ PGT.0 : num  118.83 14.88 94.29 3.49 0 ...
# $ PGT.2 : num  179.324 22.677 117.368 6.36 0.385 ...
# $ PDGT.0: num  301.154 39.165 242.685 9.119 0.126 ...
# $ PDGT.1: num  144.5 30 161.2 3.5 0 ...
 str(x)
#'data.frame':    28599 obs. of  6 variables:
# $ gene  : Factor w/ 28599 levels "XLOC_01",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ PGT.1 : num  71.25 8.71 14.6 1.99 0 ...
# $ PGT.0 : num  68.36 8.16 9.75 2.4 0 ...
# $ PGT.2 : num  108.17 13.35 18.29 3.64 0 ...
# $ PDGT.0: num  195.01 24.76 40.59 5.61 0 ...
# $ PDGT.1: num  93.06 18.88 26.83 2.14 0 ...
 length(unique(x[,1]))
#[1] 28599
 length(unique(x1[,1]))
#[1] 19679
x2<- x1[!duplicated(x1[,1]),]  # logical index; -which() would drop every row if there were no duplicates
dim(x2)
#[1] 19679 6

x3<- na.omit(x2)
 dim(x3)
#[1] 19678 6



cl<-c(rep(0,3),rep(1,2))

origin<-c(rep(1,5))


library(RankProd)
RP.out <- 
RPadvance(x3[,-1],cl,origin,gene.names=as.character(x3[,1]),num.perm=200)
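
For readers who hit the same "duplicate 'row.names' are not allowed" error, the
clean-up above can be sketched on a tiny invented data frame. The column names
mimic the files in this thread, but the values are made up, and treating blank
identifiers as rows to drop is an assumption suggested by the warning about the
non-unique value ''.

```r
# Hypothetical stand-in for the expression table
df <- data.frame(
  ID    = c("XLOC_01", "XLOC_02", "XLOC_02", "", ""),
  PGT.1 = c(1.5, 2.0, 2.0, NA, 0.3),
  stringsAsFactors = FALSE
)

anyDuplicated(df$ID) > 0             # TRUE: row names could not be set as is

# Keep the first occurrence of each ID and drop blank IDs (assumption)
keep  <- !duplicated(df$ID) & df$ID != ""
clean <- na.omit(df[keep, ])

rownames(clean) <- clean$ID          # now succeeds
clean
```

Checking anyDuplicated() and looking for empty identifiers before calling a
function that sets row names from the ID column usually pinpoints the
offending rows quickly.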

A.K.


From: Vivek Das 
To: arun  
Sent: Tuesday, August 6, 2013 9:38 AM
Subject: Re: Problem with t-test



No, I have tried it again on other files and the error is not there; it works 
fine. It is a new file I have created. I am sending you the script and the files 
I am using. It is a no-fuss script that I created and have run many times 
with other files. I am sending you two different input files: with one it 
works, with the other it does not. With the "plaise" file it does not work, but with 
the other input file it does.

library(RankProd)

x<-read.table("RP_matrix_RF_PGTvsPDGT.txt",header=T,sep="\t")

cl<-c(rep(0,3),rep(1,2))

origin<-c(rep(1,5))

RP.out <- RPadvance(x[,-1],cl,origin,gene.names=x[,1],num.perm=200)

topGene(RP.out,cutoff = 0.1)
#plotRP(RP.out, cutoff = 0.1)

table=topGene(RP.out,cutoff=0.1,method="pfp")

t1<-table$Table1
t2<-table$Table2

ind1<-which(t1[,4]<0.1)

ind2<-which(t2[,4]<0.1)


up<-t1[ind1,]

down<-t2[ind2,]

degs<-rbind(up,down)




--

Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy

emails: vivek@ieo.eu
            vchris...@yahoo.co.in
            vd4mm...@gmail.com



On Tue, Aug 6, 2013 at 3:17 PM, arun  wrote:

HI Vivek,
>I never used RankProd before.  So, can't guarantee if I can sort the problem.  
>But, you can send me the file and the script.  I will try it later.
>As you mentioned that RankProd worked before, is it on the same file or a 
>different file.  If it is the latter, then try running it on that file and see 
>if the error repeats.
>
>From: Vivek Das 
>To: arun 
>Sent: Tuesday, August 6, 2013 9:09 AM
>
>Subject: Re: Problem with t-test
>
>
>
>Yes, I know this, but I am worried about the consistency of the data, since it 
>will remove a lot of observations and so the results will not be good. In fact, I 
>tested it and I am not getting the p-value I expected. Anyway, I am doing another 
>test using the RankProd package in R. I am encountering a problem here: I 
>have used this package many times but have never faced this. Do 
>you have any idea when we get the error below?
>
>Error in `row.names<-.data.frame`(`*tmp*`, value = value) : duplicate 
>'row.names' are not allowed In addition: Warning message: non-unique values 
>when setting 'row.names': ‘’ in rankprod. 
>
>
>I am not able to understand the duplicate 'row.names' message, as the rows are 
>gene locations with expression values, and the locations are 
>duplicated 2-3 times or more. I have used such data frames earlier as well to 
>compute the RankProd and they worked, but now I am getting an error. I can 
>share the script and the file with you if you need them, as the pipeline for 
>RankProd is very easy to execute.
>
>If you can give me some idea about the error it will be good.
>
>
>--
>
>Vivek Das
>PhD Student in Computational Biology
>Giuseppe Testa's Lab
>European School of Molecular Medicine
>IFOM-IEO Campus
>Via Adamello, 16
>Milan, Italy
>
>emails: vivek@ieo.eu
>            vchris...@yahoo.co.in
>            vd4mm...@gmail.com
>
>
>
>On Tue, Aug 6, 2013 at 3:01 PM, arun  wrote:
>
>Hi Vivek,
>>No problem.
>>?t.test
>>na.action: a function which indicates what should happen when the data
>>  contain ‘NA’s.  Defaults to ‘getOption("na.action")’.
>>
>>In my system,
>>
>>getOption("na.action")
>>#[1] "na.omit"
>>
>>
>>So, it removes the NA's by default and reduce the number o

Re: [R] R plot

2013-08-06 Thread Guanrao Chen

Try extracting the individual data frames first ...

Thanks,
Guanrao 

http://www.foundyo.com



 From: Mª Teresa Martinez Soriano 
To: "r-help@r-project.org"  
Sent: Tuesday, August 6, 2013 3:43 AM
Subject: [R] R plot


Hi to everyone, first of all, thanks for this excellent service.

I have a question about R:

I want to plot my data.frame, but I have used the function split on this 
data.frame and I don't know whether there is some function that could help me; I 
was using a for loop. The problem is that I get one plot with all the data 
together, and I want one plot for each data.frame I get after using the split 
function.

x<-split(cast1,cast1$SECTOR)
y<-split(cast2,cast2$SECTOR)


for (i in 1:length(a))
{
datos[[i]]<-y[[i]]
}



plot(datos[[2]][,6], type="l", main= "SECTOR" )
for(j in 1: length(datos))for(i in 6:13)(lines(datos[[j]][,i], type="l", col=i))

What can I do in order to get this plot separately?

I have tried with this:

apply(datos,function(x)lines(x[[]][,i], type='l',col=i))

but I get nothing; I have no idea how to use apply, mapply and similar functions with plot
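
One possible way to get a separate plot for each piece returned by split() is
sketched below. The data frame is a made-up stand-in for 'cast2' (a SECTOR
column plus a few numeric columns), and the column names v1 to v3 are
hypothetical; in the original data these would be columns 6 to 13.

```r
# Hypothetical stand-in for the poster's 'cast2'
cast2 <- data.frame(SECTOR = rep(c("A", "B", "C"), each = 5),
                    v1 = 1:15, v2 = (1:15)^1.2, v3 = sqrt(1:15))

pieces   <- split(cast2, cast2$SECTOR)   # one data.frame per sector
num_cols <- c("v1", "v2", "v3")          # columns to draw as lines

op <- par(mfrow = c(1, length(pieces)))  # one panel per sector
for (nm in names(pieces)) {
  # matplot() draws every column of the matrix as its own line
  matplot(as.matrix(pieces[[nm]][, num_cols]), type = "l", lty = 1,
          col = seq_along(num_cols), main = paste("SECTOR", nm),
          xlab = "index", ylab = "value")
}
par(op)
```

Calling plot() or matplot() once per piece starts a new plot each time, whereas
lines() only ever adds to the plot that is already open, which is why the
original loop put everything into a single figure.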


Thanks in advance                           

Re: [R] How to retrieve pairwise distances between clusters after cutting the tree?

2013-08-06 Thread David Carlson

Assuming you are defining "distance between clusters" as the
distance between the centroids and you have the original data, you
can use aggregate() on the original data with the output from
cutree() as the grouping variable to create a new data.frame of
cluster centers (means). Then just run that through dist().
Something like

set.seed(42)
x <- matrix(runif(250), 25, 10)
dist(aggregate(x, by=list(cutree(hclust(dist(x)), k=3)), mean))
#          1        2
# 2 1.297682         
# 3 2.150580 1.380707
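
Two hedged variations on the idea above. First, aggregate() returns the
grouping variable as its first column, so dropping that column before calling
dist() keeps the cluster label out of the centroid distances. Second, if
"distance between clusters" is instead taken to mean the average of the
original pairwise distances between members (an assumption; other definitions
such as minimum or maximum work the same way), it can be read off the original
distance matrix. The object names are illustrative.

```r
set.seed(42)
x  <- matrix(runif(250), 25, 10)
d  <- dist(x)
cl <- cutree(hclust(d), k = 3)

# Centroid distances, excluding the grouping column added by aggregate()
centers       <- aggregate(x, by = list(cluster = cl), FUN = mean)
centroid_dist <- dist(centers[, -1])

# Average pairwise distance between the members of each pair of clusters
dm <- as.matrix(d)
k  <- max(cl)
between <- matrix(0, k, k, dimnames = list(1:k, 1:k))
for (i in 1:k) for (j in 1:k) {
  if (i != j) between[i, j] <- mean(dm[cl == i, cl == j])
}
between
```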

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Naxerova, Kamila
Sent: Tuesday, August 6, 2013 1:00 PM
To: r-help@r-project.org
Subject: [R] How to retrieve pairwise distances between clusters
after cutting the tree?

Dear all,

what would be the best way of retrieving distances between
individual clusters after cutting my tree of interest? $height from
the hclust object will give me the distance between clusters at
each agglomeration step, but let's say I have a situation where I
have six observations A, B, C, D, E, F. The clustering proceeds 

1) {A,B}
2) {C,D}
3) {E,F}
4) {C,D,E,F}
5) {A,B,C,D,E,F}

but now I want to know the distance between {A,B} and {E,F} which is
not directly recorded in $height?

I could find the distance by locating cluster members in the
original distance matrix, but is there a more direct way that I
might not be aware of? Something along the lines of
calc.pairwise.dist(cutree(hclust(dist),k=3))?

Many thanks in advance.
Kamila

Re: [R] descriptive stats by cells in factorial design

2013-08-06 Thread Mike Miller

I received two additional suggestions, one off-list, both appended below. 
Both helped me to learn a bit more about how to get what I want.


First, the aggregate() function is in package:stats, it provides the 
numbers I needed, but I don't like the output format as much as I liked 
the format from doBy::summaryBy().  Here it is:



aggregate(Age ~ Generation + Zygosity + Sex + Cohort + ESstatus, data=x, 
function(x) c(mean=mean(x), sd=sd(x), quantile(x), N=length(x)))

   Generation Zygosity    Sex Cohort ESstatus   Age.mean    Age.sd Age.0%    Age.25% Age.50%    Age.75% Age.100%   Age.N
 1  Offspring       DZ Female     11       ES 17.7852830 0.3535863 16.930     17.600  17.775     17.965   18.920 106.000
 2     Parent       DZ Female     11       ES 44.6151240 5.1246314 32.170     41.340  44.680     48.280   57.950 121.000
 3  Offspring       MZ Female     11       ES 17.8762755 0.4506530 16.860 17.6775000  17.805     18.100   19.120 196.000
 4     Parent       MZ Female     11       ES 44.2347573 5.0214627 29.550 40.6925000  44.125     47.730   56.730 206.000
 5  Offspring       DZ   Male     11       ES 17.7614925 0.3467540 17.180     17.515  17.715     18.000   18.710 134.000
 6     Parent       DZ   Male     11       ES 44.6020635 4.5605484 34.310 41.4475000  44.890 47.4975000   58.750 126.000
 7  Offspring       MZ   Male     11       ES 17.7717436 0.3236917 16.840     17.580  17.790     17.970   19.020 195.000
 8     Parent       MZ   Male     11       ES 43.4078680 5.3507439 31.280     39.970  43.440     46.480   64.650 197.000
 9  Offspring       DZ Female     11    notES 18.1367901     0.968 16.760 17.8525000  18.190 18.4575000   19.500 162.000
10     Parent       DZ Female     11    notES 42.5434579 4.3670998 34.030     39.345  42.110     45.550   57.060 107.000
11  Offspring       MZ Female     11    notES 18.0573883 0.6103713 16.760     17.630  18.050     18.420   19.700 291.000
12     Parent       MZ Female     11    notES 42.3198837 5.3622671 30.310     38.605  41.835 46.0175000   56.580 172.000
13  Offspring       DZ   Male     11    notES     17.877 0.5187333 16.830     17.460  17.860     18.240   19.020 153.000
14     Parent       DZ   Male     11    notES 42.7112102 4.9600561 32.050     39.240  42.760     45.270   58.200 157.000
15  Offspring       MZ   Male     11    notES 17.8771831 0.6472397 16.560     17.330  17.855     18.210   20.010 284.000
16     Parent       MZ   Male     11    notES 41.5636254 4.6564818 32.100     38.025  41.390     44.645   65.290 331.000
17  Offspring       DZ Female     17    notES 17.4752880 0.4569588 16.560     17.070  17.590     17.870   18.290 191.000
18     Parent       DZ Female     17    notES 46.3055882 4.9177705 36.100 42.7275000  45.765     48.335   62.690  68.000
19  Offspring       MZ Female     17    notES 17.4106076 0.4956190 16.550     16.970  17.340     17.820   18.450 395.000
20     Parent       MZ Female     17    notES 46.3649032 5.1770435 34.880     42.420  45.950     49.495   63.180 155.000
21  Offspring       DZ   Male     17    notES 17.5041818 0.3915823 16.730     17.190  17.530     17.830   18.520 165.000
22     Parent       DZ   Male     17    notES 46.7745763 4.0226198 40.180     44.125  46.000     48.820   61.120  59.000
23  Offspring       MZ   Male     17    notES 17.4911446 0.3961757 16.650 17.1775000  17.500     17.810   18.350 332.000
24     Parent       MZ   Male     17    notES 46.6929771 5.2421896 34.450     43.150  45.890     49.005   63.800 131.000

That's great but there are two things I didn't like:  (1) There too many 
digits, especially on the integers in the last column.  I thought five 
digits to the right of the decimal was more than enough but here we have 
seven, even for integers.  (2) The ordering of levels within factors 
implied by the right side of the formula is not honored -- it looks like 
it used the order Cohort, ESstatus, Sex, Zygosity, Generation.  Unlike 
doBy::summaryBy(), it does not accept an order=T argument (that is the 
default in doBy::summaryBy()).
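
Both complaints can probably be handled after the fact rather than inside aggregate(). What follows is only a sketch on made-up data (the data frame, its values and the reduced set of factors are assumptions, not the real twin data): it flattens the matrix column that aggregate() returns, re-sorts the rows so that the first term of the formula varies slowest, and prints with fewer significant digits.

```r
# Synthetic stand-in for the poster's data; all values here are made up.
set.seed(1)
x <- expand.grid(Generation = c("Offspring", "Parent"),
                 Zygosity   = c("DZ", "MZ"),
                 Sex        = c("Female", "Male"),
                 rep        = 1:5)
x$Age <- ifelse(x$Generation == "Offspring", 18, 44) + rnorm(nrow(x))

res <- aggregate(Age ~ Generation + Zygosity + Sex, data = x,
                 FUN = function(v) c(mean = mean(v), sd = sd(v), N = length(v)))

# The statistics come back as one matrix column; flatten it so that each
# statistic is an ordinary column (Age.mean, Age.sd, Age.N).
res <- do.call(data.frame, res)

# Re-sort so that the first term of the formula varies slowest.
res <- res[order(res$Generation, res$Zygosity, res$Sex), ]
rownames(res) <- NULL

# Store the counts as integers and limit the significant digits when printing.
res$Age.N <- as.integer(res$Age.N)
print(res, digits = 4)
```

Because the counts are stored as integers in their own column, print.data.frame() shows them without decimals, while digits = 4 trims the means and standard deviations.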


One thing both suggestions taught me was to use names in function 
definitions so that I always get correct column headings on output.  This 
was in the documentation for doBy::summaryBy(), but I didn't understand it 
when I first read it.  Using that naming concept, I created this function:


descriptivefun <- function(x, ...){c(mean=mean(x, ...), sd=sd(x, ...), 
quantile(x, ...), N=sum(!is.na(x)), NAs=sum(is.na(x)))}

Re: [R] descriptive stats by cells in factorial design

2013-08-06 Thread David Winsemius


On Aug 6, 2013, at 4:02 PM, Mike Miller wrote:

> I received two additional suggestions, one off-list, both appended below. 
> Both helped me to learn a bit more about how to get what I want.
> 
> First, the aggregate() function is in package:stats, it provides the numbers 
> I needed, but I don't like the output format as much as I liked the format 
> from doBy::summaryBy().  Here it is:
> 
>> aggregate(Age ~ Generation + Zygosity + Sex + Cohort + ESstatus, data=x, 
>> function(x) c(mean=mean(x), sd=sd(x), quantile(x), N=length(x)))
>    Generation Zygosity    Sex Cohort ESstatus   Age.mean    Age.sd Age.0%    Age.25% Age.50% Age.75% Age.100%   Age.N
>  1  Offspring       DZ Female     11       ES 17.7852830 0.3535863 16.930     17.600  17.775  17.965   18.920 106.000
>  2     Parent       DZ Female     11       ES 44.6151240 5.1246314 32.170     41.340  44.680  48.280   57.950 121.000
> 
snipped
> 23  Offspring       MZ   Male     17    notES 17.4911446 0.3961757 16.650 17.1775000  17.500  17.810   18.350 332.000
> 24     Parent       MZ   Male     17    notES 46.6929771 5.2421896 34.450     43.150  45.890  49.005   63.800 131.000
> 
> That's great but there are two things I didn't like:  (1) There are too many 
> digits, especially on the integers in the last column.  I thought five digits 
> to the right of the decimal was more than enough but here we have seven, even 
> for integers.  (2) The ordering of levels within factors implied by the right 
> side of the formula is not honored -- it looks like it used the order Cohort, 
> ESstatus, Sex, Zygosity, Generation.  Unlike doBy::summaryBy(), it does not 
> accept an order=T argument (that is the default in doBy::summaryBy()).
> 
> One thing both suggestions taught me was to use names in function definitions 
> so that I always get correct column headings on output.  This was in the 
> documentation for doBy::summaryBy(), but I didn't understand it when I first 
> read it.  Using that naming concept, I created this function:
> 
> descriptivefun <- function(x, ...){c(mean=mean(x, ...), sd=sd(x, ...), 
> quantile(x, ...), N=sum(!is.na(x)), NAs=sum(is.na(x)))}
> 
> That will allow me to feed the na.rm=T argument to the mean, sd and quantile 
> functions.  By not naming the quantile function (e.g., not using 
> q=quantile(x, ...)) I allow the builtin column names to be used unaltered 
> (i.e., 0%, 25%, 50%, 75%, 100%).  I also did not use length() because it will 
> count NA values and I want to see the sample sizes used for mean, sd and 
> quantile.  To deal with that problem I created a function with output named 
> "N" to count those sample sizes and one with output named "NAs" to count the 
> number of NAs.  Then I do this:
> 
>> summaryBy(Age ~ Generation + Zygosity + Sex + Cohort + ESstatus, data=x, 
>> FUN=descriptivefun, na.rm=T)
>    Generation Zygosity    Sex Cohort ESstatus Age.mean    Age.sd Age.0% Age.25% Age.50% Age.75% Age.100% Age.N Age.NAs
>  1  Offspring       DZ Female     11       ES 17.78528 0.3535863  16.93 17.6000  17.775 17.9650    18.92   106       0
>  2  Offspring       DZ Female     11    notES 18.13679     0.968  16.76 17.8525  18.190 18.4575    19.50   162       0
> 
snipped
> 22     Parent       MZ   Male     11       ES 43.40787 5.3507439  31.28 39.9700  43.440 46.4800    64.65   197       0
> 23     Parent       MZ   Male     11    notES 41.56363 4.6564818  32.10 38.0250  41.390 44.6450    65.29   331       0
> 24     Parent       MZ   Male     17    notES 46.69298 5.2421896  34.45 43.1500  45.890 49.0050    63.80   131       0
> 
> I think that output looks very nice.  One thing that I don't understand is 
> why my function produces %.5f output for every value but the 
> doBy::summaryBy() function uses different formats in different columns.

Look at the code. You are attributing behavior to `summaryBy` that should be 
ascribed to `print.data.frame`, and to `format.data.frame`. Your function is 
returning a numeric vector and getting displayed by `print.default`.
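
The difference can be reproduced with a small made-up vector (the three values below are only illustrative, and `v` and `d` are names chosen for this sketch). A named numeric vector gets one shared format, whereas the same numbers placed in a data frame are formatted column by column:

```r
# A named numeric vector is printed by print.default(), which chooses one
# common format (7 significant digits by default) for the whole vector.
v <- c(mean = 28.49302, N = 4434, NAs = 0)
print(v)

# In a data frame each column is formatted on its own by format.data.frame(),
# so whole-number columns are shown without decimals.
d <- as.data.frame(as.list(v))
print(d)

# The same difference is visible without printing:
fv <- format(v)   # one shared format for all elements
fd <- format(d)   # one format per column
```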

-- 
David.

> Compare the above output with this output:
> 
>> descriptivefun(x$Age)
>       mean         sd         0%        25%        50%        75%       100%          N        NAs
>   28.49302   13.29077   16.55000   17.65000   18.23000   42.25500   65.29000 4434.00000    0.00000
> 
> It's not a big deal, but it would be cool if I could tell doBy::summaryBy() 
> how to format the numbers using something like format=c(rep("%.2f",7), 
> rep("%d",2)).
> 
> Mike
> 
> --
> Michael B. Miller, Ph.D.
> Minnesota Center for Twin and Family Research
> Department of Psychology
> University of Minnesota
> 
> 
> 
> On Mon, 5 Aug 2013, David Carlson wrote:
> 
>> This is a bit simpler. The function quantile() labels the output whereas 
>> fivenum() does not:
>> 
>> aggregate(Age ~ Generation + Zygosity + Sex + Cohort +
>> E

[R] Problem when running an SVAR-AB model in vars package

2013-08-06 Thread jpm miao

Is "B" a reserved word in the vars package?

I tried to estimate an SVAR AB-model with the SVAR function and compute the
IRF with the vars package. The problem is that when the B matrix is named "B",
an error message occurs. However, if the same matrix is named "Bm", everything
runs smoothly.  What's wrong? Is "B" a reserved word in the vars package?

> Amatexo
     [,1] [,2]
[1,]    1    0
[2,]   NA    1
> Bm
     [,1] [,2]
[1,]   NA    0
[2,]    0   NA
> B<-Bm
> B
     [,1] [,2]
[1,]   NA    0
[2,]    0   NA
> svar.Aexo<-SVAR(var_dl, estmethod ="direct", Amat=Amatexo,
Bmat=Bm,hessian=TRUE)
Warning message:
In SVAR(var_dl, estmethod = "direct", Amat = Amatexo, Bmat = Bm,  :
  The AB-model is just identified. No test possible.
> irf.svaraexo<-irf(svar.Aexo,  boot=TRUE, n.ahead=12)
There were 50 or more warnings (use warnings() to see the first 50)
> svar.Aexo<-SVAR(var_dl, estmethod ="direct", Amat=Amatexo,
Bmat=B,hessian=TRUE)
Warning message:
In SVAR(var_dl, estmethod = "direct", Amat = Amatexo, Bmat = B,  :
  The AB-model is just identified. No test possible.
> irf.svaraexo<-irf(svar.Aexo,  boot=TRUE, n.ahead=12)
Error in determinant.matrix(x, logarithm = TRUE, ...) :
  'x' must be a square matrix
In addition: Warning message:
In optim(start, logLc, ...) :
  one-dimensional optimization by Nelder-Mead is unreliable:
use "Brent" or optimize() directly
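
"B" is not a reserved word in R itself. One possible explanation, offered as an assumption that has not been checked against the vars source, is that the bootstrap step re-evaluates the stored SVAR call inside a function that has its own local object called B, so the name resolves to that object instead of the user's matrix. The toy sketch below shows only the general R mechanism; `fit_model()` and `refit()` are invented for illustration and are not vars functions.

```r
# Toy illustration only; fit_model() and refit() are invented for this sketch
# and are not functions from the vars package.
fit_model <- function(Bmat) {
  # Keep the unevaluated call, as many model-fitting functions do.
  list(call = match.call(), Bmat = Bmat)
}

refit <- function(fit) {
  B <- array(0, dim = c(2, 3))   # a local object that happens to be called B
  # Re-evaluating the stored call here looks names up in this function's
  # frame first, so a user-level `B` is shadowed by the local one.
  eval(fit$call)
}

Bm <- diag(2)
B  <- Bm

dim_bm <- dim(refit(fit_model(Bmat = Bm))$Bmat)  # resolves to the user's Bm
dim_b  <- dim(refit(fit_model(Bmat = B))$Bmat)   # resolves to refit()'s local B
print(dim_bm)
print(dim_b)
```

If something like this is what happens, giving the matrix any name that does not clash with an internal variable (such as "Bm") sidesteps the problem, which matches the behaviour reported above.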

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Demographic Analytics in R

2013-08-06 Thread Ray DiGiacomo, Jr.

Hello All,

The Orange County R User Group (OC-RUG) will host a free webinar with M.I.T
on August 29 to debut the new "acs" R package for demographic analytics.

Registration is at:

https://www3.gotomeeting.com/register/730429166

Best Regards,

The Orange County R User Group
