date:20080922

2008/9/22 José E. Lozano <[EMAIL PROTECTED]>:

> Recently I have been trying to open a huge database with no success.
>
> It's a 4GB csv plain text file with around 2000 rows and over 500,000
> columns/variables.

 I wouldn't call a 4GB csv text file a 'database'.

> Is there any way to work with "parts" (a set of columns) of this database,
> since it's impossible to manage it all at once?

 Yes, use a database. A real database.

> Is there any way to establish a link to the csv file and to state the
> columns you want to fetch every time you make an analysis?

 No, but you can establish a link to a database. You want a database.
A real relational database.

> I've been searching the net, but found little about this topic.

Try:
http://cran.r-project.org/doc/manuals/R-data.html#Relational-databases

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Symmetric matrix

2008-09-22 Thread Martin Maechler

> "DR" == Dimitris Rizopoulos <[EMAIL PROTECTED]>
> on Sun, 21 Sep 2008 19:58:44 +0200 writes:

DR> try the following
DR> a <- matrix(rnorm(36), 6)
DR> ind <- lower.tri(a)
DR> a[ind] <- t(a)[ind]
DR> a

Yes, indeed, it needs the t(.) trick.

Note that the 'Matrix' package has a function forceSymmetric(.) to
do this for you (faster, using C code):

 library(Matrix)
 A <- forceSymmetric(Matrix(rnorm(36), 6))

is all you'd need {if you can afford to trash half of the random
  numbers generated}
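To see why the t(.) trick is needed, here is a minimal base-R sketch. The seed and the object names 'naive' and 's' are illustrative assumptions, not part of the original posts.

```r
set.seed(1)                      # arbitrary seed so the run is repeatable
a <- matrix(rnorm(36), 6)

# Naive attempt: copy the upper triangle straight into the lower one.
# Both sides are traversed in column-major order, so values land in the
# wrong cells once the matrix is larger than 3 x 3.
naive <- a
naive[lower.tri(naive)] <- naive[upper.tri(naive)]

# t() trick: in t(s), the lower triangle holds the upper triangle of s
# in exactly the order that lower.tri() indexes it.
s <- a
ind <- lower.tri(s)
s[ind] <- t(s)[ind]

print(isSymmetric(naive))
print(isSymmetric(s))
```

The upper triangle of 's' is left untouched, so only the lower half of the random numbers is discarded.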

Martin Maechler, ETH Zurich

DR> I hope it helps.

DR> Best,
DR> Dimitris

DR> Megh Dal wrote:
>> I have the following matrix:
>> 
>> a = matrix(rnorm(36), 6)
>> 
>> Now I want to replace the lower-triangular elements with its
>> upper-triangular elements. That is, I want to make a symmetric matrix from a.
>> I have tried the lower.tri() and upper.tri() functions, but did not get the
>> desired result. Can anyone please tell me how to do that?


Re: [R] Manage huge database

2008-09-22 Thread Yihui Xie

Hi,

You can treat it as a database and use ODBC to fetch data from the CSV
file using SQL. See the package RODBC for details about database
connections. (I have dealt with similar problems before with RODBC)

Regards,
Yihui
--
Yihui Xie <[EMAIL PROTECTED]>
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China



On Mon, Sep 22, 2008 at 2:50 PM, José E. Lozano <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Recently I have been trying to open a huge database with no success.
>
> It's a 4GB csv plain text file with around 2000 rows and over 500,000
> columns/variables.
>
> I have tried The SAS System, but it reads only around 5000 columns, no
> more. R hangs when opening it.
>
> Is there any way to work with "parts" (a set of columns) of this database,
> since it's impossible to manage it all at once?
>
> Is there any way to establish a link to the csv file and to state the
> columns you want to fetch every time you make an analysis?
>
> I've been searching the net, but found little about this topic.
>
> Best regards,
>
> Jose Lozano


Re: [R] Manage huge database

Hello, Yihui

> You can treat it as a database and use ODBC to fetch data from the CSV
> file using SQL. See the package RODBC for details about database
> connections. (I have dealt with similar problems before with RODBC)

Thanks for your tip. I have used RODBC before to read data from MSAccess and
MSExcel files, but I never imagined it could work for non-database files
such as csv.

I will check the RODBC documentation.

Best Regards,
Jose Lozano

--
Jose E. Lozano Alonso
Observatorio de Salud Pública.
Direccion General de Salud Pública e I+D+I.
Junta de Castilla y León.
Direccion: Paseo de Zorrilla, nº1. Despacho 3103. CP 47071. Valladolid.


Re: [R] Manage huge database

> I wouldn't call a 4GB csv text file a 'database'.

Obviously, a csv is not a database in itself. What I meant (though it
seems I was not understood) is that I had a huge database, exported to a csv
file by the people who created it (and I don't have any idea of the original
format of the database).

> Yes, use a database. A real database.

I've used MSAccess and there is a limit of 255 columns, as far as I know, so
there is no way of importing it. Obviously, I won't buy an Oracle license to
read this file, so: what database system allows a table with 500,000
variables? MySQL? Do I have to split the file into smaller parts, import them
as tables, and relate them all using an index field?

> No, but you can establish a link to a database. You want a database.
> A real relational database.

> Try:
> http://cran.r-project.org/doc/manuals/R-data.html#Relational-databases

It didn't help, sorry. I know perfectly well what a relational database is
(and I humbly consider myself an advanced user of MSAccess+VBA, only
that I've never faced this problem with so many variables); you should not
suppose everyone's stupid, though...

Thanks for your help,
Best regards
Jose Lozano


Re: [R] how to keep up with R?

2008-09-22 Thread Robin Hankin


Adaikalavan Ramasamy wrote:
> I agree! The best way to learn (and remember for longer) is to teach
> someone else about it.
>
> And there is no reason not to repeat some of the analysis done in SAS
> with R. That way you can verify your outputs or compare the
> presentations. If you consistently find differences in the outputs,
> then trying to figure out the reason may lead you to better understand
> the methods (e.g. different optimization or estimation procedures).

My take on this:

I have repeatedly found that it is surprisingly easy to improve on
existing (non-R) implementations of statistical and non-statistical
computation when working in R.

Something about the structure of the language, something about the
package mechanism, something about R-help, something about R-core,
something about open-source, something about JSS or R-news, whatever
it is, there is SOMETHING ABOUT R which lends itself to straightforward
production of quality software.  And that something is missing from
other programming languages, IMO.



rksh




> Regards, Adai
>
> Barry Rowlingson wrote:
>> 2008/9/19 Wensui Liu <[EMAIL PROTECTED]>:
>>> Dear Listers,
>>>
>>> I've been a big fan of R since graduate school. After working in
>>> industry for years, I haven't had many opportunities to use R and am
>>> mainly using SAS. However, I am still forcing myself really hard to stay
>>> close to R by reading R-help and books and writing R code by myself for
>>> fun. But by and by, I have started to realize that I have a hard time
>>> keeping up with R, and I am afraid that I will totally forget how to
>>> program in R.
>>>
>>> I really like it and am very unwilling to give it up. Is there any
>>> idea how I might keep in touch with R without using it at work on a
>>> daily basis? I would really appreciate it.



--
Robin K. S. Hankin
Senior Research Associate
Cambridge Centre for Climate Change Mitigation Research (4CMR)
Faculty of Economics
The University of Cambridge
[EMAIL PROTECTED]
01223-764877


Re: [R] Why isn't R recognising integers as numbers?

2008-09-22 Thread Ted Harding

Hi Ted (from Ted),
Just to clarify Marc's comments about dataframes in more basic terms.

If you read in data with read.csv() the result returned by the function
is a dataframe. This is a specialised kind of list, which you can think
of as a list of "columns" all of the same length. You can think of each
"column" as a vector of elements, all of which must be of the same type
within the column, though the type can vary (e.g. numeric, factor,
character) between columns. When you display a dataframe, it looks like
a matrix, though in R terms it is not really a matrix; it is a list,
where each component of the list is a "column".

Of course a dataframe, like any list, might have only one component.
But it is still a list -- and the actual contents are only available
"one layer down", after you have extracted that component by some
means (e.g. by using the "$" extractor). Simple example:

  L <- c(1,2,3,4) ## vector
  L
# [1] 1 2 3 4
  L.df <- data.frame(L=L) ## Dataframe with 1 component named "L"
  L.df
#   L
# 1 1
# 2 2
# 3 3
# 4 4
  L.df$L  ## Extract the component named "L"
# [1] 1 2 3 4 ## Compare with the result of 'L' above

# Try a regression on L (this works):
  lm(L ~ 1)
# Call:
# lm(formula = L ~ 1)
# Coefficients:
# (Intercept)  
# 2.5  

# Try a regression on L.df (this doesn't work):
  lm(L.df ~ 1)
# Error in model.frame.default(formula = L.df ~ 1,
#   drop.unused.levels = TRUE) : 
#   invalid type (list) for variable 'L.df'

# But it does after you refer to the component L by name:
  lm(L.df$L ~ 1)
# Call:
# lm(formula = L.df$L ~ 1)
# Coefficients:
# (Intercept)  
# 2.5  

# or:
  lm(L ~ 1, data=L.df)
# Call:
# lm(formula = L ~ 1, data = L.df)
# Coefficients:
# (Intercept)  
# 2.5  

# But you can (for a dataframe, not a general list) use an "index"
method of extraction *as if* it were a matrix (even though it isn't):

  L.df[,1]
# [1] 1 2 3 4
  L.df[3,1]
# [1] 3

# But compare with:
  L.df[1]
#   L
# 1 1
# 2 2
# 3 3
# 4 4

which is essentially the same as L.df itself (e.g. lm(L.df[1] ~ 1)
will fail in exactly the same way that lm(L.df ~ 1) failed).
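As a compact summary of the extraction methods discussed here, the following sketch compares single brackets, double brackets and "$" on the same one-column dataframe. The object names 'one_col_df', 'by_index' and 'by_name' are illustrative only.

```r
L.df <- data.frame(L = c(1, 2, 3, 4))

one_col_df <- L.df[1]      # single bracket: still a dataframe (a list)
by_index   <- L.df[[1]]    # double bracket: the numeric vector itself
by_name    <- L.df$L       # "$" extractor: the same numeric vector

print(class(one_col_df))
print(is.numeric(by_index))
print(identical(by_index, by_name))
```

Functions that want a numeric vector, such as lm() on the left-hand side of a formula, need one of the last two forms.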

The dataframe structure exists in R because so much data is typically
in the row by column (case by variables) layout such as you get in
spreadsheets and associated CSV files, and it is very useful to be
able to get into this layout directly (and refer to the variables
by name, as above).

The full generality of a 'list' can also be useful for encapsulating
data of a less strictly structured kind, but that is another (longer)
story!

Hoping this helps.
Ted.

On 22-Sep-08 02:09:29, Ted Byers wrote:
> Thanks Marc,
> That was it. 
> 
> For the last 30 years, I'd write my own code, in FORTRAN, C++,
> or even Java, to do whatever statistical analysis I needed.
> When at the office, sometimes I could use SAS, but that hasn't
> been an option for me in years.
> 
> This is the first time I have had to load real data into R
> (instead of generating random data to use while playing with
> some of the stats functions, or manually typing dummy data).
> 
> I take it, then, that the result of loading data is a data
> frame, and not just a matrix or array. Using something like
> "refdata18[, 1]" feels rather alien, but I'm sure I'll quickly
> get used to it.  I'd seen it before in the R docs, but it didn't
> register that I had to use it to get the functions of most
> interest to me to recognise my data as a vector of numbers,
> given I'd provided only a vector of integers as input.
> 
> Thanks
> 
> Ted
> 
> 
> Marc Schwartz wrote:
>> 
>> on 09/21/2008 08:01 PM Ted Byers wrote:
>>> I have a number of files containing anywhere from a few dozen to a
>>> few thousand integers, one per record.
>>> 
>>> The statement "refdata18 =
>>> read.csv("K:\\MerchantData\\RiskModel\\Capture.Week.18.csv", header =
>>> TRUE,na.strings="")" works fine, and if I type refdata18, I get the
>>> integers displayed, one value per record (along with a record number).
>>> However, when I try "fitdistr(refdata18,"negative binomial")", or
>>> hist.scott(refdata18, prob = TRUE), I get an error:
>>> 
>>> Error in fitdistr(refdata18, "negative binomial") : 
>>>   'x' must be a non-empty numeric vector
>>> Or
>>> Error in hist.default(x, nclass.scott(x), prob = prob, xlab = xlab, ...) :
>>>   'x' must be numeric
>>> 
>>> How can it not recognise integers as numbers?
>>> 
>>> Thanks
>>> 
>>> Ted
>> 
>> 'refdata18' is a data frame and the two functions are expecting a
>> numeric vector.
>> 
>> If you use:
>> 
>>   fitdistr(refdata18[, 1], "negative binomial")
>> 
>> or
>> 
>>   hist(refdata18[, 1])
>> 
>> you should get a suitable result, presuming that the first column in
>> the data frame is a numeric vector.
>> 
>> Use:
>> 
>>   str(refdata18)
>> 
>> to get a sense for the structure of the data frame, including the
>> column names, which you could then use, instead of the above index based
>>

Re: [R] Manage huge database

2008/9/22 José E.  Lozano <[EMAIL PROTECTED]>:
>> I wouldn't call a 4GB csv text file a 'database'.

> It didn't help, sorry. I perfectly knew what a relational database is (and I
> humbly consider myself an advanced user on working with MSAccess+VBA, only
> that I've never face this problem with variables), you should not suppose
> everyone's stupid, though...

 Maybe you've not lurked on R-help for long enough :) Apologies!

A bit more googling tells me both MySQL and PostgreSQL have limits of
a few thousand on the number of columns in a table, not a few hundred
thousand. An insightful comment on one mailing list is:

"Of course, the real bottom line is that if you think you need more than
order-of-a-hundred columns, your database design probably needs revision
anyway ;-)"

 So, how much "design" is in this data? If none, and what you've
basically got is a 2000x500000 grid of numbers, then maybe a more raw
binary-type format will help - HDF or netCDF? Although I'm not sure
how much R support for reading slices of these formats exists, you may
be able to use an external utility to write slices out on demand.
Random access to parts of these files is pretty fast.

http://cran.r-project.org/web/packages/RNetCDF/index.html
http://cran.r-project.org/web/packages/hdf5/index.html

 Thinking back to your 4GB file with 1,000,000,000 entries, that's
only 3 bytes per entry (+1 for the comma). What is this data? There
may be more efficient ways to handle it.

 Hope *that* helps...

Barry


Re: [R] Manage huge database


> Maybe you've not lurked on R-help for long enough :) Apologies!

Probably.

> So, how much "design" is in this data? If none, and what you've
> basically got is a 2000x500000 grid of numbers, then maybe a more raw

Exactly, raw data, but a little more complex since all the 500,000 variables
are in text format, so the width is around 2,500,000.

> http://cran.r-project.org/web/packages/RNetCDF/index.html
> http://cran.r-project.org/web/packages/hdf5/index.html

Thanks, I will check. Right now I am reading the file line by line. It's
time consuming, but since I will do it only once, just to rearrange the data
into smaller tables to query, it's ok.

> Thinking back to your 4GB file with 1,000,000,000 entries, that's
> only 3 bytes per entry (+1 for the comma). What is this data? There
> may be more efficient ways to handle it.

It is genetic DNA data (individuals genotyped), hence the large number of
columns to analyze.

Best Regards,
Jose Lozano
--
Jose E. Lozano Alonso
Observatorio de Salud Pública.
Direccion General de Salud Pública e I+D+I.
Junta de Castilla y León.
Direccion: Paseo de Zorrilla, nº1. Despacho 3103. CP 47071. Valladolid.


[R] matrix balancing on margins

2008-09-22 Thread PALMIER Patrick - CETE NP/INFRA/TRF


Hello,

Is there any package in R for balancing a matrix?

I want to estimate a matrix given

   * an initial matrix (1 everywhere, for example)
   * row margins
   * column margins
   * a distance class vector (each cell of the matrix belongs to a
 distance class); I want the distribution across distance classes
 to be preserved

How can I do such a thing?
Is there an existing function, or should I write an iterative
script myself?


Thanks

Re: [R] Manage huge database

2008-09-22 Thread jim holtman

What are you going to do with the data once you have read it in?  Are
all the data items numeric?  If they are numeric, you would need at
least 8GB to hold one copy and probably a machine with 32GB if you
wanted to do any manipulation on the data.
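The 8GB figure can be checked with a little arithmetic. This is a rough sketch that assumes R stores every value as an 8-byte double and takes the round dimensions quoted in the thread (2000 rows, 500,000 columns, a 4GB file); the variable names are illustrative.

```r
n_rows <- 2000
n_cols <- 500000
n_entries <- n_rows * n_cols              # 1,000,000,000 values

bytes_per_double <- 8                     # size of one R numeric
in_memory_gb <- n_entries * bytes_per_double / 1e9

file_bytes <- 4e9                         # the "4GB" csv, in round numbers
bytes_per_entry_on_disk <- file_bytes / n_entries

print(in_memory_gb)
print(bytes_per_entry_on_disk)
```

So a single in-memory copy needs roughly twice the space of the text file, before any working copies are made.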

You can use a 'connection' and 'scan' to read the data in chunks and
then store it in a more accessible format.  A lot would depend on your
answer to my first question.
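A minimal sketch of the connection-plus-scan idea, under some assumptions: the small temporary file, the chunk size and the chosen columns below are made up for illustration, and the data are taken to be purely numeric with no header line.

```r
# Build a small stand-in file: 6 rows, 10 comma-separated integer columns.
path <- tempfile(fileext = ".csv")
full <- matrix(1:60, nrow = 6, ncol = 10)
write.table(full, path, sep = ",", row.names = FALSE, col.names = FALSE)

keep <- c(2, 5, 9)        # columns wanted for the analysis
chunk_rows <- 4           # number of lines to read per pass

con <- file(path, open = "r")   # an open connection remembers its position
pieces <- list()
repeat {
  vals <- scan(con, what = integer(), sep = ",",
               nlines = chunk_rows, quiet = TRUE)
  if (length(vals) == 0L) break                 # nothing left to read
  chunk <- matrix(vals, ncol = ncol(full), byrow = TRUE)
  pieces[[length(pieces) + 1L]] <- chunk[, keep, drop = FALSE]
}
close(con)

subset_cols <- do.call(rbind, pieces)
print(dim(subset_cols))
```

Only one chunk is held in full at any time; what accumulates is just the selected columns, which could equally be written out to smaller files instead of being kept in memory.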

On Mon, Sep 22, 2008 at 6:26 AM, José E. Lozano <[EMAIL PROTECTED]> wrote:
>
> > Maybe you've not lurked on R-help for long enough :) Apologies!
>
> Probably.
>
> > So, how much "design" is in this data? If none, and what you've
> > basically got is a 2000x500000 grid of numbers, then maybe a more raw
>
> Exactly, raw data, but a little more complex since all the 500,000 variables
> are in text format, so the width is around 2,500,000.
>
> > http://cran.r-project.org/web/packages/RNetCDF/index.html
> > http://cran.r-project.org/web/packages/hdf5/index.html
>
> Thanks, I will check. Right now I am reading the file line by line. It's
> time consuming, but since I will do it only once, just to rearrange the data
> into smaller tables to query, it's ok.
>
> > Thinking back to your 4GB file with 1,000,000,000 entries, that's
> > only 3 bytes per entry (+1 for the comma). What is this data? There
> > may be more efficient ways to handle it.
>
> It is genetic DNA data (individuals genotyped), hence the large number of
> columns to analyze.
>
> Best Regards,
> Jose Lozano
> --
> Jose E. Lozano Alonso
> Observatorio de Salud Pública.
> Direccion General de Salud Pública e I+D+I.
> Junta de Castilla y León.
> Direccion: Paseo de Zorrilla, nº1. Despacho 3103. CP 47071. Valladolid.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] Manage huge database

2008/9/22 José E.  Lozano <[EMAIL PROTECTED]>:

> Exactly, raw data, but a little more complex since all the 500,000 variables
> are in text format, so the width is around 2,500,000.

> Thanks, I will check. Right now I am reading the file line by line. It's
> time consuming, but since I will do it only once, just to rearrange the data
> into smaller tables to query, it's ok.

  A language like python, perl, or even awk might be able to help you
slice your data up.

> It is genetic DNA data (individuals genotyped), hence the large number of
> columns to analyze.

 So is each line just ACCGTATAT etc etc?

 If you have fixed width fields in a file, so that every line is the
same length, then you can use random access methods to get to a
particular value - just multiply the line length by the row number you
want and add the column number. In R you can do this with seek() on a
connection. This should be fast because it seeks by bytes, instead of
having to scan all the comma-separated stuff. The only problem comes
when your data doesn't quite conform, and you can end up reading junk.
When doing this, it's a good idea to test your dataset first to make
sure the lines and fields are right.

Example with dummy.dna:

aaaccctttgggaaa
gattacagattacaa
aaacggg
gtgtggg
aac

 each line has 15 bases, and on my OS there's one additional invisible
character to mark the line end. Windows uses 2, but your data might
not be Windows format... So anyway, my multiplier is 16. Hence to get
a slice of the file of four columns from column 7 for some rows:

> dna=file("dummy.dna")
> open(dna,open="rb")
> for(r in 2:4){seek(dna,7+(r-1)*16);print(readChar(dna,4))}
[1] "gatt"
[1] ""
[1] ""

 The speed of this should be independent of the size of your data file.
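Here is a self-contained sketch of the same seek() approach that writes its own test file first. The file contents, the helper name read_slice() and the 1-based row/column convention are assumptions made for illustration; the file is written in binary mode so that each line ends in exactly one byte on any OS.

```r
# Hypothetical fixed-width file: 4 rows of 15 bases, each ended by one "\n".
path <- tempfile(fileext = ".dna")
rows <- c("aaaccctttgggaaa",
          "gattacagattacaa",
          "ccccggggttttaaa",
          "tgcatgcatgcatgc")
out <- file(path, open = "wb")    # binary mode: "\n" stays a single byte
writeLines(rows, out, sep = "\n")
close(out)

line_len <- nchar(rows[1]) + 1L   # 15 bases + 1 line-end byte = 16

# Read 'width' characters starting at 1-based column 'col' of 1-based row 'row'.
# seek() offsets are 0-based, hence the two "- 1" terms.
read_slice <- function(con, row, col, width) {
  seek(con, where = (row - 1) * line_len + (col - 1), origin = "start")
  readChar(con, nchars = width, useBytes = TRUE)
}

dna <- file(path, open = "rb")
slices <- vapply(2:4, function(r) read_slice(dna, r, 8, 4), character(1))
close(dna)
print(slices)
```

Checking file.info(path)$size against the number of rows times line_len is a cheap way to confirm that every line really has the expected length before trusting the offsets.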

Barry


Re: [R] rgl: How to position a window during open3d call

2008-09-22 Thread Koen Stegen

Duncan Murdoch wrote:
> This is fixed now on R-forge; eventually it will make it into the next
> rgl release on CRAN.  You should be able to download a binary of the
> development version from R-forge sooner.  Make sure you get version
> 0.81.706 or newer.

The R-forge version 0.81.706 works as advertised, both on Linux and Windows.
Thanks Duncan!

Koen Stegen
Royal Meteorological Institute of Belgium


[R] auto.arima help.

2008-09-22 Thread rkevinburton

Hello,

I am calling the auto.arima function in the forecast package and it returns what 
seems to be valid Arima output. But when I feed this output to 'predict' I get:

Error in predict.Arima(catall.fit[[.index]], n.ahead = 12) : 
  'xreg' and 'newxreg' have different numbers of columns

Is there a way to tell what is being supplied to xreg from the Arima output? 

Any ideas?

Thank you.

Kevin


Re: [R] Help for R

2008-09-22 Thread Uwe Ligges


Please read the posting guide and tell us:

- Which version of R
- Which OS?
- Which version of the "matlab" package (I guess you are using that one?)
- If Windows and a binary version of the matlab package: Does the binary 
fit your version of R?


Uwe Ligges




Mac wrote:

Dear R users,

  I've just started learning R and I'm having a problem with it. I got the following message when I tried to run R:

  Error in loadNamespace(package, c(which.lib.loc, lib.loc), keep.source = keep.source) : 
in 'matlab' methods specified for export, but none defined: sum, size, padarray, flipud, fliplr

Error: package/namespace load failed for 'matlab'

  Then I tried "package/load in package/matlab"; however, the same message appeared.

  I would appreciate any help or suggestions. Thanks.

  Kai


   






[R] Joint maximum likelihood estimation for ordinal data

2008-09-22 Thread denn


Dear R users

From what I understand, the joint maximum likelihood procedure for Rasch
(available in the package MiscPsycho) in R can only be used on binary data. 
I was wondering if the code is currently being adapted for application to
ordinal data?  I'm trying to replicate results obtained from Winsteps in R. 

Best wishes
denn

[R] Likelihood between observed and predicted response

2008-09-22 Thread Christophe LOOTS


Thank you so much for your help.

The function "dbinom" seems to work very well.

However, I'm a bit lost with the "dnorm" function.

Apparently, I have to compute the mean "mu" and the standard deviation 
"sd" but what does it mean exactly? I only have a vector of predicted 
response and a vector of observed response that I would like to compare!


What are "mu" and "sigma".

Thanks again.
Christophe


> Hi,
>
> I've two fitted models, one binomial model with presence-absence data
> that predicts probability of presence and one gaussian model (normal or
> log-normal abundances).
>
> I would like to evaluate these models not on their capability of
> adjustment but on their capability of prediction by calculating the
> (log)likelihood between predicted and observed values for each type of
> model.
>
> I found the following formula for the Bernoulli model:
>
> -2 log lik = -2 sum (y*log phat + (1-y)*log(1-phat) ), where "phat" is
> the probability (between 0 and 1) and "y" is the observed value (0 or 1).
>
> 1) Can anybody tell me if this formula is statistically correct?

  This looks correct.

> 2) Can someone tell me what is the formula of the likelihood between
> observed and predicted values for a gaussian model ?
>

   -2 L = sum( (x_i - mu_i)^2)/sigma^2 + 2*n*log(sigma) + C

assuming independence and equal variances:
but don't trust my algebra, see ?dnorm and take the log of the
likelihood shown there for yourself.
You're reinventing the wheel a bit here:

-2*sum(dbinom(y,prob=phat,size=1,log=TRUE))

and

-2*sum(dnorm(x,mean=mu,sd=sigma,log=TRUE))

will do what you want.
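A small numerical sketch of both calls, with made-up data. One plausible reading, offered as an assumption rather than the only option, is that "mu" is the vector of predicted values and "sigma" is the residual standard deviation; the estimate of sigma used below (root mean squared residual) is likewise an assumption. The hand-written Gaussian expression is the log of the density shown in ?dnorm, summed over observations and multiplied by -2.

```r
# Bernoulli case: observed 0/1 values and predicted probabilities (made up).
y    <- c(1, 0, 1, 1, 0)
phat <- c(0.8, 0.3, 0.6, 0.9, 0.2)
dev_manual <- -2 * sum(y * log(phat) + (1 - y) * log(1 - phat))
dev_dbinom <- -2 * sum(dbinom(y, size = 1, prob = phat, log = TRUE))

# Gaussian case: observed values x and predicted values mu (made up).
x  <- c(2.1, 3.4, 1.9, 4.2, 3.0)
mu <- c(2.0, 3.0, 2.5, 4.0, 3.3)
n  <- length(x)
sigma <- sqrt(mean((x - mu)^2))   # assumed estimate of the residual SD
gauss_manual <- sum((x - mu)^2) / sigma^2 + 2 * n * log(sigma) + n * log(2 * pi)
gauss_dnorm  <- -2 * sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))

print(all.equal(dev_manual, dev_dbinom))
print(all.equal(gauss_manual, gauss_dnorm))
```

If the abundances were modelled on the log scale, x and mu would both need to be on that scale before calling dnorm.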

  Ben Bolker

--
Christophe LOOTS
PhD student - Hydrobiological modelling of fish habitats
Sea Fisheries Laboratory - IFREMER Boulogne sur Mer
150, Quai Gambetta. BP 699
62321 Boulogne sur Mer- FRANCE

Tél : +33(0)3 21 99 56 78
Fax : +33(0)3 21 99 56 01
E-mail : [EMAIL PROTECTED]
http://www.ifremer.fr/drvboulogne/labo/equipe.htm


Re: [R] Hmisc and Ubuntu (aptitude install)

2008-09-22 Thread Vincent Goulet


Matthew,

As per the CRAN Ubuntu README

http://cran.r-project.org/bin/linux/ubuntu/

install the Ubuntu r-base-dev package to compile R packages from  
sources.


Vincent

Le lun. 22 sept. à 00:08, Matthew Pettis a écrit :


Hi,

I'm trying to get the Hmisc module on my Ubuntu Hardy Heron install.
I tried getting Hmisc from within R by issuing the standard
'install.packages' command, but it said I needed 'gfortran' to
compile.  I thought I could circumvent this by using 'aptitude' to get
the package 'r-cran-hmisc', but when I got it, the package had
critical missing parts (got 404s).  So, I'll be trying to go back and
download 'gfortran', but can anybody tell me if this aptitude ubuntu
package should be kept up to date and is just currently overlooked?

Thanks,
Matt

--
It is from the wellspring of our despair and the places that we are
broken that we come to repair the world.
-- Murray Waas


Re: [R] Hmisc and Ubuntu (aptitude install)

2008-09-22 Thread Dirk Eddelbuettel

On Mon, Sep 22, 2008 at 08:48:12AM -0400, Vincent Goulet wrote:
> Matthew,
>
> As per the CRAN Ubuntu README
>
>   http://cran.r-project.org/bin/linux/ubuntu/
>
> install the Ubuntu r-base-dev package to compile R packages from  
> sources.

Well there should be a working r-cran-hmisc package.  You simply got a
'404' error indicating that your network access (using http) to the
external Ubuntu mirror was broken.   Fix that, or download the package
by hand.  It may be easier to just install the missing package.

That said, Vincent is of course entirely correct on the need for
r-base-dev.  

Dirk
  
>
> Vincent
>
> Le lun. 22 sept. à 00:08, Matthew Pettis a écrit :
>
>> Hi,
>>
>> I'm trying to get the Hmisc module on my Ubuntu Hardy Heron install.
>> I tried getting Hmisc from within R by issuing the standard
>> 'install.packages' command, but it said I needed 'gfortran' to
>> compile.  I thought I could circumvent this by using 'aptitude' to get
>> the package 'r-cran-hmisc', but when I got it, the package had
>> critical missing parts (got 404s).  So, I'll be trying to go back and
>> download 'gfortran', but can anybody tell me if this aptitude ubuntu
>> package should be kept up to date and is just currently overlooked?
>>
>> Thanks,
>> Matt
>>
>> -- 
>> It is from the wellspring of our despair and the places that we are
>> broken that we come to repair the world.
>> -- Murray Waas
>>

-- 
Three out of two people have difficulties with fractions.


Re: [R] Manage huge database

2008-09-22 Thread Martin Morgan

"José E. Lozano" <[EMAIL PROTECTED]> writes:

>> Maybe you've not lurked on R-help for long enough :) Apologies!
>
> Probably.
>
>> So, how much "design" is in this data? If none, and what you've
>> basically got is a 2000x50 grid of numbers, then maybe a more raw
>
> Exactly, raw data, but a little more complex since all the 50 variables
> are in text format, so the width is around 2,500,000.
>
>> http://cran.r-project.org/web/packages/RNetCDF/index.html
>> http://cran.r-project.org/web/packages/hdf5/index.html
>
> Thanks, I will check. Right now I am reading line by line the file. It's
> time consuming, but since I will do it only once, just to rearrange the data
> into smaller tables to query, it's ok.
>
>> Thinking back to your 4GB file with 1,000,000,000 entries, that's
>> only 3 bytes per entry (+1 for the comma). What is this data? There
>> may be more efficient ways to handle it.
>
> It is genetic DNA data (genotyped individuals), hence the large number of
> columns to analyze.

The Bioconductor package snpMatrix is designed for this type of
data. See

http://www.bioconductor.org/packages/2.2/bioc/html/snpMatrix.html

and if that looks promising

> source('http://bioconductor.org/biocLite.R')
> biocLite('snpMatrix')

Likely you'll quickly want a 64 bit (linux or Mac) machine.

Martin

> Best Regards,
> Jose Lozano
> --
> Jose E. Lozano Alonso
> Observatorio de Salud Pública.
> Direccion General de Salud Pública e I+D+I.
> Junta de Castilla y León.
> Direccion: Paseo de Zorrilla, nº1. Despacho 3103. CP 47071. Valladolid.
>

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


[R] use of system() under Linux

2008-09-22 Thread Rainer M Krug

Hi,

I want to use the system() command to execute a command and return
its output in an R variable, so I am using intern=TRUE.

On the other hand, I also want to evaluate the return value of the command
to determine whether the command was successful.

According to the help, these two objectives are mutually exclusive: it is
either one or the other. Is this true, or is there another way of
accomplishing this?

My preferred return value would be a list consisting of three entries:
the return code of the command,
stderr,
and the result.

Thanks

Rainer



-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Faculty of Science
Natural Sciences Building
Private Bag X1
University of Stellenbosch
Matieland 7602
South Africa


Re: [R] Likelihood between observed and predicted response

2008-09-22 Thread Ben Bolker

Christophe LOOTS <...@ifremer.fr> writes:

> 
> Thank you so much for your help.
> 
> The function "dbinom" seems to work very well.
> 
> However, I'm a bit lost with the "dnorm" function.
> 
> Apparently, I have to compute the mean "mu" and the standard deviation 
> "sd" but what does it mean exactly? I only have a vector of predicted 
> response and a vector of observed response that I would like to compare!
> 
> What are "mu" and "sigma"?
> 

  mu is the mean (which you might as well set to the
predicted value).  sd is the standard deviation; in order
to calculate the likelihood in this case, you'll need an
*independent* estimate (from somewhere) of the standard
deviation.  Without thinking about it too carefully I think
you could probably get this from sqrt(sum((predicted-observed)^2)/(n-1))
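
A minimal sketch of the calculation described above, assuming a Gaussian
error model. The vectors `observed` and `predicted` are made-up illustration
data, and `sigma_hat` uses the rough residual-based estimate from the formula
above rather than a truly independent one.

```r
# Hypothetical data: replace with your own observed and predicted responses
observed  <- c(2.1, 3.9, 6.2, 7.8, 10.1)
predicted <- c(2.0, 4.0, 6.0, 8.0, 10.0)
n <- length(observed)

# Residual-based estimate of the standard deviation
sigma_hat <- sqrt(sum((predicted - observed)^2) / (n - 1))

# Gaussian log-likelihood: each observation has mean = its predicted value
loglik <- sum(dnorm(observed, mean = predicted, sd = sigma_hat, log = TRUE))
loglik
```

Working on the log scale (log = TRUE, then sum) avoids the numerical
underflow that multiplying many small densities would cause.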

> Thanks again.
> Christophe
>


Re: [R] adding layers in ggplot2 (data and code included)

Hi Juliet,

On Sun, Sep 21, 2008 at 11:47 PM, Juliet Hannah <[EMAIL PROTECTED]> wrote:
> Here is some sample data:
>
> mydata <- read.table(textConnection("Est Group Tri
>   0 0 4.639644
>   1 0 4.579189
>   2 0 4.590714
>   0 1 4.443696
>   1 1 4.588243
>   2 1 4.650505
>   0 2 4.296608
>   1 2 4.826036
>   2 2 4.765386"),header=TRUE);
>  closeAllConnections();
>
> I can form two plots, scatter and  lines, as follows:
>
> p <- ggplot(mydata, aes(x=Est, y=Tri))
> p + geom_point(aes(colour=factor(Group),shape=factor(Group)))
>
> and
>
> p+ geom_smooth(aes(group=factor(Group),color=factor(Group)),method=lm,se=F).
>
> However, I am unable to have the plots together.
>
> I obtain the following error:
>
>> p + 
>> geom_point(aes(colour=factor(Group),shape=factor(Group)))+geom_smooth(aes(group=factor(Group),color=factor(Group)),method=lm,se=F)
> Error in `[.data.frame`(df, , var) : undefined columns selected

Are you using R 2.7.2?  Something in R changed between R 2.7.1 and R
2.7.2 that breaks certain ggplot plots (your code works fine for me
without modification).  It's on my to-do list to fix.

You can also simplify your code a little by relying on defaults set in
the ggplot() call:

ggplot(mydata, aes(Est, Tri, colour = factor(Group))) +
 geom_point(aes(shape = factor(Group))) +
 geom_smooth(method = lm, se = F)

(Andpleaseusespacesotherwiseitsveryhardtoreadyourcode)

Hadley

>
> Thanks,
>
> Juliet
>



-- 
http://had.co.nz/


[R] SmoothScatter plot range issue

2008-09-22 Thread Jason Pare

Hello,

I am attempting to use smoothScatter to plot a heatmap of locations of
events in an x-y axis. When I plot the heatmap without passing xlim and ylim
parameters, it fills the plot area but the perspective is a bit skewed. I
would like to standardize these plots to a uniform window size that does not
depend on the range of values in the dataframe. However, when I resize the
plot using xlim or ylim, there is a light blue background that surrounds the
immediate area of the data (corresponding to the range of the points listed
in the dataframe), surrounded by extra white space for the new xlim and ylim
values I have added. Some of the rings around the datapoints are also cut
off at the margins.

I would like to stop the plot from being cut off, and want this light blue
"range" to extend throughout the entire area of the resized plot. I have
attempted to add NAs, but it has no effect on expanding this light blue plot
area. Code is below.

 xyz is a dataframe containing two columns with corresponding x and y
values

library(geneplotter)
library(RColorBrewer)

layout(matrix(1:1, ncol=2, byrow=TRUE))

smoothScatter(xyz, nrpoints=0, xlim=c(-3,3),
ylim=c(0,5),colramp=colorRampPalette(c("#f8f8ff", "white",
"#736AFF", "cyan", "yellow", "#F87431", "#FF7F00", "red",
"#7E2217")))

###END

Thanks very much for any help,

Jason


[R] paste with list

2008-09-22 Thread Antje


Hello,

I guess the solution is rather simple but whatever I tried, I don't manage to 
get the result as I want to have it:


I have several vectors of equal length in a list and I'd like to combine all 
first elements to a single string, all second elements to a single string, ..., 
all n-th elements to a single string.


# Example code (how it should look like):
t1 <- c(1,2,3)
t2 <- c(3.4,5.5,1.1)
paste(t1,t2, sep="\t")

# and now how the data is available
tl <- list(t1,t2)
??? what do I have to do to get the same output ???

Can anybody help me?

Antje


[R] zoo: hourly values (local time) not unique

2008-09-22 Thread vwl-mailingliste

Hi!

I've got a time series as a zoo object which contains hourly values. My problem
is that these values occur in every "real" hour with regard to daylight saving
time. I.e. on the last Sunday in March I'll have 23 values, whereas the last Sunday
in October contains 25 values instead of 24.
Thus if I try to aggregate the data using for example tapply (e.g. to get a 
monthly mean), I get the error 

"some methods for "zoo" objects do not work if the index entries in 'order.by' 
are not unique"

Any idea how I can solve this without having to remove/add an hour each year 
manually? Or, as I'm quite new to R, how I could easily manipulate my data so 
that the "missing" hour is introduced and the "double" hour is cut from the 
data (and the index)?

I'd really appreciate your help! Thanks in advance,
Arne
--


Re: [R] paste with list

2008-09-22 Thread Dimitris Rizopoulos


try this:

t1 <- c(1, 2, 3)
t2 <- c(3.4, 5.5, 1.1)
tl <- list(t1, t2)

do.call("paste", c(tl, sep = "\t"))
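
For readers wondering why this works, here is a small sketch. do.call()
passes each list element as a separate argument, so the call below is built
as if every vector had been typed out by hand. The helper name `paste_list`
and the third vector are made up for illustration.

```r
# Hypothetical helper: paste together the i-th elements of all vectors in a list
paste_list <- function(lst, sep = "\t") {
  # c(lst, sep = sep) appends a named 'sep' argument to the argument list
  do.call(paste, c(lst, sep = sep))
}

tl <- list(c(1, 2, 3), c(3.4, 5.5, 1.1), c("a", "b", "c"))
res <- paste_list(tl)

# Same result as writing the call out by hand
identical(res, paste(tl[[1]], tl[[2]], tl[[3]], sep = "\t"))
```

The advantage over indexing with tl[[1]], tl[[2]], ... is that the number of
vectors in the list does not need to be known in advance.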


I hope it helps.

Best,
Dimitris


Antje wrote:

Hello,

I guess the solution is rather simple but whatever I tried, I don't 
manage to get the result as I want to have it:


I have several vectors of equal length in a list and I'd like to combine 
all first elements to a single string, all second elements to a single 
string, ..., all n-th elements to a single string.


# Example code (how it should look like):
t1 <- c(1,2,3)
t2 <- c(3.4,5.5,1.1)
paste(t1,t2, sep="\t")

# and now how the data is available
tl <- list(t1,t2)
??? what do I have to do to get the same output ???

Can anybody help me?

Antje




--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014


Re: [R] paste with list

2008-09-22 Thread Henrique Dallazuanna

Try this:

paste(tl[[1]], tl[[2]], sep="\t")

On Mon, Sep 22, 2008 at 11:08 AM, Antje <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I guess the solution is rather simple but whatever I tried, I don't manage
> to get the result as I want to have it:
>
> I have several vectors of equal length in a list and I'd like to combine all
> first elements to a single string, all second elements to a single string,
> ..., all n-th elements to a single string.
>
> # Example code (how it should look like):
> t1 <- c(1,2,3)
> t2 <- c(3.4,5.5,1.1)
> paste(t1,t2, sep="\t")
>
> # and now how the data is available
> tl <- list(t1,t2)
> ??? what do I have to do to get the same output ???
>
> Can anybody help me?
>
> Antje
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Re: [R] paste with list

2008-09-22 Thread Antje


Great! That's exactly what I was looking for.
(I see, I still have to learn a lot...)

Thank you!

Antje



Dimitris Rizopoulos wrote:

try this:

t1 <- c(1, 2, 3)
t2 <- c(3.4, 5.5, 1.1)
tl <- list(t1, t2)

do.call("paste", c(tl, sep = "\t"))


I hope it helps.

Best,
Dimitris


Antje wrote:

Hello,

I guess the solution is rather simple but whatever I tried, I don't 
manage to get the result as I want to have it:


I have several vectors of equal length in a list and I'd like to 
combine all first elements to a single string, all second elements to 
a single string, ..., all n-th elements to a single string.


# Example code (how it should look like):
t1 <- c(1,2,3)
t2 <- c(3.4,5.5,1.1)
paste(t1,t2, sep="\t")

# and now how the data is available
tl <- list(t1,t2)
??? what do I have to do to get the same output ???

Can anybody help me?

Antje







Re: [R] zoo: hourly values (local time) not unique

See question #1 in the zoo faq:

library(zoo)
vignette("zoo-faq")

Also in the upcoming zoo 1.6-0, not yet on CRAN but in the development version
at R-Forge found here:

http://r-forge.r-project.org/projects/zoo/

there are a set of make.unique functions and a make.unique= argument in
read.zoo which will provide additional capabilities for uniquifying series.
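
As a complement to the FAQ, here is a base-R sketch of one common way to
sidestep the problem: keep the index in UTC, where hourly time stamps never
repeat, and switch to local time only when forming grouping labels. This is
an assumption on my part, not necessarily what the zoo FAQ recommends, and
the time zone "Europe/Berlin" and the values in `x` are made up.

```r
# 25 hourly stamps covering the local day 2008-10-26 (end of DST in Berlin)
t_utc <- seq(as.POSIXct("2008-10-25 22:00", tz = "UTC"),
             as.POSIXct("2008-10-26 22:00", tz = "UTC"), by = "hour")
x <- seq_along(t_utc)  # hypothetical hourly values

# The UTC index is unique; local wall-clock labels are not
anyDuplicated(t_utc)
local_label <- format(t_utc, "%Y-%m-%d %H:%M", tz = "Europe/Berlin")
anyDuplicated(local_label) > 0

# Aggregate by local calendar month while keeping the unique UTC index
# (with zoo, the series could be built as zoo(x, t_utc))
monthly_mean <- tapply(x, format(t_utc, "%Y-%m", tz = "Europe/Berlin"), mean)
monthly_mean
```

With this layout no hour has to be added or dropped by hand: the 23-hour and
25-hour days simply contribute 23 and 25 observations to their month.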

On Mon, Sep 22, 2008 at 10:13 AM,  <[EMAIL PROTECTED]> wrote:
> Hi!
>
> I've got a time series as a zoo object which contains hourly values. My
> problem is that these values occur in every "real" hour with regard to
> daylight saving time. I.e. on the last Sunday in March I'll have 23 values,
> whereas the last Sunday in October contains 25 values instead of 24.
> Thus if I try to aggregate the data using for example tapply (e.g. to get a 
> monthly mean), I get the error
>
> "some methods for "zoo" objects do not work if the index entries in 
> 'order.by' are not unique"
>
> Any idea how I can solve this without having to remove/add an hour each year 
> manually? Or, as I'm quite new to R, how I could easily manipulate my data so 
> that the "missing" hour is introduced and the "double" hour is cut from the 
> data (and the index)?
>
> I'd really appreciate your help! Thanks in advance,
> Arne
> --
>


Re: [R] Time series (ts) questions.

Try this to append 100 to the end of the series, say:

tt <- ts(1:12, frequency=5) # sample data
ts(c(tt, 100), start = start(tt), frequency = frequency(tt))
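
The same idea can be wrapped in a small helper so the start and frequency do
not have to be retyped each time. This is only a sketch; the function name
`append_ts` is made up for illustration.

```r
# Hypothetical helper: append one or more values to the end of a ts object
append_ts <- function(x, value) {
  # c() drops the ts attributes, so they are restored from the original series
  ts(c(x, value), start = start(x), frequency = frequency(x))
}

tt  <- ts(1:12, frequency = 5)  # sample data
tt2 <- append_ts(tt, 100)

end(tt2)     # (cycle, position) of the last observation
tsp(tt2)[2]  # time value of the last observation
```

Here tsp(x)[2] gives the time of the last observation directly, which may
also be a shorter route to the "last time value" asked about below.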


On Mon, Sep 22, 2008 at 2:17 AM,  <[EMAIL PROTECTED]> wrote:
> I have been working with the base time series object (ts) and I had a couple 
> of questions that hopefully this group can help me with:
>
> 1) What is the best way to append an observation to an existing time-series? 
> Suppose I have a time series:
>
> t <- ts(1:12, frequency=5)
>
> This would generate two complete cycles and one remainder. Now I would like 
> to append an observation to this time series. I could use 'c' but then I 
> would need to rebuild the whole time series and I would need to know the 
> frequency etc. I would like some operation like '+' that would simply append 
> the value to the end of the time series (incrementing the 'last' time value so 
> things like cycle() still output the correct values) but alas
>
> t + 10
>
> is already taken as an equally useful operation by adding 10 to each element 
> in the time series (rather than in this case, appending ts(10,frequency) with 
> a time value of 13 to the time series).
>
> 2) What is the best way to get the last time value in a time series? I can do 
> something like:
>
> (start(t)[2] - 1) + (end(t)[1]-1) * frequency(t) + end(t)[2]
>
> But there has to be an easier way.
>
> Thank you.
>
> Kevin
>


Re: [R] Manage huge database

> What are you going to do with the data once you have read it in?  Are
> all the data items numeric?  If they are numeric, you would need at
> least 8GB to hold one copy and probably a machine with 32GB if you
> wanted to do any manipulation on the data.

Well, I will use only sets of variables to analyze; I can't manage the full
500,000 variables at a time, of course. So each time I make an analysis I
will extract the information I need, which is why I wanted an easy way to
extract parts of the file.

Best regards,
Jose Lozano


Re: [R] Manage huge database

> So is each line just ACCGTATAT etc etc?

Exactly, A_G, A_A, G_G and the like.

> If you have fixed width fields in a file, so that every line is the
> same length, then you can use random access methods to get to a
> particular value - just multiply the line length by the row number you

Nice hint! I didn't think of this. But I fear that if I have missing values
in the file I won't be able to read the right information...

> When doing this, it's a good idea to test your dataset first to make
> sure the lines and fields are right.

Yes, I am trying to figure out if all the lines have the exact same length
to use a random access method to read it.
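
The random-access idea quoted above can be sketched in R with seek() and
readChar(). Everything below is an illustration under stated assumptions:
the tiny demo file, the 3-character field width, the comma separator and the
helper name `read_cell` are all made up, and the approach only works if
missing values are padded to the same width as every other field.

```r
# Write a small fixed-width demo file (binary mode so line ends are exactly "\n")
path <- tempfile()
rows <- c("A_G,A_A,G_G", "C_C,A_T,G_T", "T_T,A_G,C_G")
con <- file(path, "wb")
writeLines(rows, con, sep = "\n")
close(con)

# Hypothetical helper: read one cell without reading the rest of the file
read_cell <- function(path, row, col, line_len, field_w, n_char) {
  con <- file(path, "rb")
  on.exit(close(con))
  # byte offset = full lines skipped + full fields skipped within the line
  seek(con, where = (row - 1) * line_len + (col - 1) * field_w)
  readChar(con, n_char)
}

line_len <- nchar(rows[1]) + 1  # characters per line plus the newline
cell <- read_cell(path, row = 2, col = 3,
                  line_len = line_len, field_w = 4, n_char = 3)
cell
```

For a 4GB file the offsets exceed the integer range, which is why they are
kept as doubles here rather than forced to integer.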

Thanks,
Jose Lozano


[R] Combine data frames using column names as "key"

2008-09-22 Thread jimineep


Hi guys,

Suppose I have 2 data frames, i.e.:
       values
one      0.32
two      0.25
three    0.11

and
       values
two      0.66
one      0.74
three    0.19

NB: the first column holds the row names in both cases.

How can I combine them on the row names column? I.e. to make something like


      values.1 values.2
one       0.32     0.74
two       0.25     0.66
three     0.11     0.19

I guess it's data.frame or cbind but I keep getting errors when I try to
combine them on row names...

Many many thanks,

Jim
-- 
View this message in context: 
http://www.nabble.com/Combine-data-frames-using-column-names-as-%22key%22-tp19609173p19609173.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Combine data frames using column names as "key"

2008-09-22 Thread Henrique Dallazuanna

Try:

data.frame(merge(df1, df2, by = "row.names"), row.names = 1)
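
A self-contained version of the one-liner above, using the data from the
question. The `suffixes` argument and the final reordering step are additions
for illustration: merge() sorts the result by key, so the rows are put back
into the order of df1 at the end.

```r
df1 <- data.frame(values = c(0.32, 0.25, 0.11),
                  row.names = c("one", "two", "three"))
df2 <- data.frame(values = c(0.66, 0.74, 0.19),
                  row.names = c("two", "one", "three"))

# Merge on row names; the clashing 'values' columns get the given suffixes
m <- merge(df1, df2, by = "row.names", suffixes = c(".1", ".2"))

# Turn the first column (the former row names) back into row names
combined <- data.frame(m, row.names = 1)

# Restore the row order of df1
combined <- combined[rownames(df1), ]
combined
```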

On Mon, Sep 22, 2008 at 12:34 PM, jimineep <[EMAIL PROTECTED]> wrote:
>
> Hi guys,
>
> Suppose I have 2 data frames, i.e.:
>        values
> one      0.32
> two      0.25
> three    0.11
>
> and
>        values
> two      0.66
> one      0.74
> three    0.19
>
> NB: the first column holds the row names in both cases.
>
> How can I combine them on the row names column? I.e. to make something like
>
>
>       values.1 values.2
> one       0.32     0.74
> two       0.25     0.66
> three     0.11     0.19
>
> I guess it's data.frame or cbind but I keep getting errors when I try to
> combine them on row names...
>
> Many many thanks,
>
> Jim
> --
> View this message in context: 
> http://www.nabble.com/Combine-data-frames-using-column-names-as-%22key%22-tp19609173p19609173.html
> Sent from the R help mailing list archive at Nabble.com.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Re: [R] Manage huge database

2008-09-22 Thread jim holtman

Why don't you make one pass through your data and encode your
characters as integers (it would appear that you only have 16
combinations).  You might also want to consider using the 'raw' object
since these only take up one byte of storage -- will reduce your
storage requirements by a factor of 4.  Then store each row in a 'filehash' object
so you can quickly retrieve a row at a time and then index directly to
the byte(s) that have the information that you want.
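
A rough sketch of the encoding step, assuming the genotypes are the 16
pairs of A, C, G and T written as "A_G" and so on. The function names and
the choice of code 0 for missing or unrecognised values are assumptions
made for illustration; the filehash storage step is not shown.

```r
# All 16 genotype labels: "A_A", "C_A", ..., "T_T"
alleles <- c("A", "C", "G", "T")
codes <- as.vector(outer(alleles, alleles, paste, sep = "_"))

# Encode as raw bytes 1..16; 0 marks a missing or unrecognised genotype
encode_genotypes <- function(x) as.raw(match(x, codes, nomatch = 0L))

# Decode back to labels; code 0 becomes NA
decode_genotypes <- function(r) c(NA_character_, codes)[as.integer(r) + 1L]

g <- c("A_G", "A_A", "G_G", NA)
r <- encode_genotypes(g)
decode_genotypes(r)
```

A raw vector uses one byte per element against four for an integer vector,
which is where the factor of 4 mentioned above comes from.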

On Mon, Sep 22, 2008 at 7:00 AM, José E. Lozano <[EMAIL PROTECTED]> wrote:
>> So is each line just ACCGTATAT etc etc?
>
> Exactly, A_G, A_A, G_G and the like.
>
>> If you have fixed width fields in a file, so that every line is the
>> same length, then you can use random access methods to get to a
>> particular value - just multiply the line length by the row number you
>
> Nice hint! I didn't think of this. But I fear that if I have missing values
> in the file I won't be able to read the right information...
>
>> When doing this, it's a good idea to test your dataset first to make
>> sure the lines and fields are right.
>
> Yes, I am trying to figure out if all the lines have the exact same length
> to use a random access method to read it.
>
> Thanks,
> Jose Lozano
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] Manage huge database

2008/9/22 jim holtman <[EMAIL PROTECTED]>:
> Why don't you make one pass through your data and encode your
> characters as integers (it would appear that you only have 16
> combinations).  You might also want to consider using the 'raw' object
> since these only take up one byte of storage -- will reduce your
> storage requirements by a factor of 4.  Then store each row in a 'filehash' object
> so you can quickly retrieve a row at a time and then index directly to
> the byte(s) that have the information that you want.

 My original response of specifying a relational database now seems
somewhat comical :)

Barry


[R] Relative novice: Working with fitdistr(MASS): 3 questions

2008-09-22 Thread Ted Byers

OK, I am now at the point where I can use fitdistr to obtain a fit of one of
the standard distributions to mydata.

It is quite remarkable how different the parameters are for different
samples, though from the same system. Clearly the system itself is not
stationary.

Anyway, question 1: I require a visual perspective of the fit I get. I can
use hist.scott to get a histogram (and just have to figure out how to get
finer granularity from it - my samples are taken weekly, but the histogram
bars cover two weeks of data and the most interesting changes happen in the
first three to four weeks - after that things slow down tremendously), but
how would I overlay a plot of the best distribution I get from fitdistr over
it?

Second question: I don't see anything in the documentation for fitdistr that
says anything about using the distribution obtained to integrate the
distribution over some range of values. I get weekly samples, and for each
sample I get a certain number of events each week for about three months. I
need to be able to use the distribution to estimate the number of such
events next week or the week after, and how long it will be that the
probability of such an event is so low that no more of them are likely to be
observed from that sample ever. What package or functions should I be
looking at here to get this done?

Third question: I see nothing in the docs about non-central distributions.
The distribution most likely to fit is cauchy, but we know that there is
skew that depends on the magnitude: large positive deviates are more common
than large negative deviates, but extremely large positive deviates are less
common than extremely large negative deviates. What we don't know is how
significant such skewness is for the overall distribution. How can I assess
this, or can I assess this, using fitdistr (or some other function I haven't
found yet)?

Thanks

Ted
--
View this message in context:
http://www.nabble.com/Relative-novice%3A-Working-with-fitdistr%28MASS%29%3A-3-questions-tp19610812p19610812.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Manage huge database

2008-09-22 Thread Ted Harding

On 22-Sep-08 11:00:30, José E. Lozano wrote:
>> So is each line just ACCGTATAT etc etc?
> 
> Exactly, A_G, A_A, G_G and the like.
> 
>> If you have fixed width fields in a file, so that every line is the
>> same length, then you can use random access methods to get to a
>> particular value - just multiply the line length by the row number you
> 
> Nice hint! I didn't think of this. But I fear that if I have missing
> values in the file I won't be able to read the right information...
> 
>> When doing this, it's a good idea to test your dataset first to make
>> sure the lines and fields are right.
> 
> Yes, I am trying to figure out if all the lines have the exact same
> length to use a random access method to read it.

If you were using Linux, I would suggest a command on the lines of

  cat filename | awk '{print(length($0))}'

which would give you the length of each line. But since you have
around 2000 lines, to simply check whether they all have the same
length (in bytes/characters) you can extend the above to

  cat filename | awk '{print(length($0))}' | sort -u

which will present you with all the different line-lengths. If they
are all the same length you will get one number.

I just tested this on a file with lines exceeding 500,000 characters
in length, and it worked perfectly well even for such long lines.
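
For anyone without awk to hand, roughly the same check can be sketched in R
itself. The function name `line_lengths`, the chunk size and the demo file
are made up for illustration; reading in chunks keeps only a few of the very
long lines in memory at a time.

```r
# Hypothetical helper: return the distinct line lengths (in bytes) of a file
line_lengths <- function(path, chunk = 100L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  lens <- integer(0)
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break
    lens <- union(lens, nchar(lines, type = "bytes"))
  }
  sort(lens)
}

# Small demo file standing in for the real data
f <- tempfile()
writeLines(c("A_G,A_A", "G_G,C_T", "A_A,T_T"), f)
line_lengths(f)  # a single value means every line has the same length
```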

Ted.

E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Sep-08   Time: 17:03:21
-- XFMail --


[R] how to set rownames / colnames for matrices in a list

2008-09-22 Thread Antje


Hello,

I have another stupid question. I hope you can give me a hint how to solve this:

I have a list and one element is again a list containing matrices, all of the 
same dimensions. Now, I'd like to set the dimnames for all matrices:


example code:

m1 <- matrix(1:25, nrow=5)
m2 <- matrix(26:50, nrow=5)
# ... there can be much more than two matrices

l <- list()
l[[1]] <- list(m1,m2)

r_names <- LETTERS[1:5]
c_names <- LETTERS[6:10]

? how can I apply these names to any number of matrices within this list-list ?

Ciao,
Antje


[R] Using wildcards in subsets

2008-09-22 Thread Daniel Münch

Hi there,

I am looking for a way to use wildcards in a subset, this is not
working:


subset(data, colname-1=="value"&colname2=="value*",
select=colx:coly)


is there a way to use wildcards here?

Thanks for your help,
Daniel


Re: [R] how to set rownames / colnames for matrices in a list

2008-09-22 Thread Alain Guillet


Hi,

If all your matrices have the same size, you should work with an array 
and not with a list. Then you can use dimnames to set the names of the 
rows, columns, and so on..
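
A short sketch of the array suggestion, using the matrices from the question.
The variable names `mats`, `arr` and `named` are made up; the lapply()
variant at the end is an alternative for cases where the list structure has
to be kept.

```r
m1 <- matrix(1:25, nrow = 5)
m2 <- matrix(26:50, nrow = 5)
mats <- list(m1, m2)  # any number of 5 x 5 matrices

r_names <- LETTERS[1:5]
c_names <- LETTERS[6:10]

# Stack the matrices into a 5 x 5 x k array and name rows and columns once
arr <- array(unlist(mats), dim = c(5, 5, length(mats)),
             dimnames = list(r_names, c_names, NULL))
arr["A", "F", 2]  # first row, first column of the second matrix

# Alternative: keep the list and set the dimnames on every matrix
named <- lapply(mats, function(m) {
  dimnames(m) <- list(r_names, c_names)
  m
})
```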


Alain

Antje wrote:

Hello,

I have another stupid question. I hope you can give me a hint how to 
solve this:


I have a list and one element is again a list containing matrices, all 
of the same dimensions. Now, I'd like to set the dimnames for all 
matrices:


example code:

m1 <- matrix(1:25, nrow=5)
m2 <- matrix(26:50, nrow=5)
# ... there can be much more than two matrices

l <- list()
l[[1]] <- list(m1,m2)

r_names <- LETTERS[1:5]
c_names <- LETTERS[6:10]

? how can I apply these names to any number of matrices within this 
list-list ?


Ciao,
Antje




--
Alain Guillet
Statistician and Computer Scientist

Institut de statistique - Université catholique de Louvain
Bureau d.126
Voie du Roman Pays, 20
B-1348 Louvain-la-Neuve
Belgium

tel: +32 10 47 30 50


Re: [R] Need help creating spatial correlation for MC simulation

2008-09-22 Thread jjh21


Thank you for the input.

Which command in the spatstat package am I looking for? The documentation is
unclear to me.


milton ruser wrote:
> 
> Dear J.J.Harden
> 
> I think that in spatstat you will find several ways of simulating spatial
> patterns (point or line) that may be what you are looking for. If not,
> please let me know and maybe we can work out some solution.
> 
> Best wishes,
> 
> miltinho astronauta
> brazil
> 
> 
> 
> On Wed, Sep 17, 2008 at 7:36 PM, jjh21 <[EMAIL PROTECTED]> wrote:
> 
>>
>> I want to create a dataset in R with spatial correlation (i.e.
>> clustering)
>> built in for a linear regression analysis. Any tips on how to do this?
>> Thanks.
>>
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Need-help-creating-spatial-correlation-for-MC-simulation-tp19542145p19610885.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] SmoothScatter plot range issue

2008-09-22 Thread Henrik Bengtsson

Hi,

Bioconductor.org is the home of the geneplotter package.  You get a
quicker response if you ask there.

/Henrik

On Mon, Sep 22, 2008 at 7:06 AM, Jason Pare <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I am attempting to use smoothScatter to plot a heatmap of locations of
> events in an x-y axis. When I plot the heatmap without passing xlim and ylim
> parameters, it fills the plot area but the perspective is a bit skewed. I
> would like to standardize these plots to a uniform window size that does not
> depend on the range of values in the dataframe. However, when I resize the
> plot using xlim or ylim, there is a light blue background that surrounds the
> immediate area of the data (corresponding to the range of the points listed
> in the dataframe), surrounded by extra white space for the new xlim and ylim
> values I have added. Some of the rings around the datapoints are also cut
> off at the margins.
>
> I would like to stop the plot from being cut off, and want this light blue
> "range" to extend throughout the entire area of the resized plot. I have
> attempted to add NAs, but it has no effect on expanding this light blue plot
> area. Code is below.
>
>  xyz is a dataframe containing two columns with corresponding x and y
> values
>
> library(geneplotter)
> library(RColorBrewer)
>
> layout(matrix(1:1, ncol=2, byrow=TRUE))
>
> smoothScatter(xyz, nrpoints=0, xlim=c(-3,3),
> ylim=c(0,5),colramp=colorRampPalette(c("#f8f8ff", "white",
> "#736AFF", "cyan", "yellow", "#F87431", "#FF7F00", "red",
> "#7E2217")))
>
> ###END
>
> Thanks very much for any help,
>
> Jason
>

[R] Statistical question re assessing fit of distribution functions.

2008-09-22 Thread Ted Byers


I am in a situation where I have to fit a distribution, such as Cauchy or
normal, to an empirical dataset.  Well and good, that is easy.

But I wanted to assess just how good the fit is, using ks.test.

I am concerned about the following note in the docs (about the example
provided):  "Note that the distribution theory is not valid here as we have
estimated the parameters of the normal distribution from the same sample"

This implies I should not use ks.test(x,"pnorm",mean =1.187, sd =0.917),
where the numbers shown are estimated from 'x'.  If this is so, how do I get
a correct test?  I know I can not use different samples because of just how
different the parameters are from one sample to the next, so using
parameters estimated from the sample from week one to define the
distribution function for ks.test will give a poor fit for the data from
week two.  And the sample size is small enough that I would not have
confidence in the parameters estimated from a portion of a sample to fit
against the remainder of the sample.

Thanks

Ted

-- 
View this message in context: 
http://www.nabble.com/Statistical-question-re-assessing-fit-of-distribution-functions.-tp19611539p19611539.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Manage huge database

Try this:

read.table(pipe("/Rtools/bin/gawk -f cut.awk bigdata.dat"))

where cut.awk contains the single line (assuming you
want fields 101 through 110 and none other):

{ for(i = 101; i <= 110; i++) printf("%s ", $i); printf "\n" }

or just use cut.  I tried the gawk command above on Windows
Vista with an artificial file of 500,000 columns and 2 rows and it seemed
instantaneous.

On Windows the above uses gawk from Rtools available at:
   http://www.murdoch-sutherland.com/Rtools/
or you can separately install gawk.  Rtools also has cut if you
prefer that.
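
For a file that really is comma-separated, the cut variant mentioned above can be sketched as follows. This is only a minimal illustration on a tiny stand-in file: it assumes a Unix-like shell with cut on the PATH (on Windows, the copy shipped with Rtools), and the file contents and the choice of fields 2-3 are made up for the example.

```r
# Build a tiny stand-in for the big file: 3 rows, 6 comma-separated columns.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2,3,4,5,6",
             "7,8,9,10,11,12",
             "13,14,15,16,17,18"), tmp)

# Let cut keep only fields 2-3, so R never has to parse the other columns.
cmd  <- paste("cut -d, -f2-3", shQuote(tmp))
part <- read.table(pipe(cmd), sep = ",")   # read.table opens and closes the pipe
print(part)

unlink(tmp)                                # remove the temporary file
```

For the real file one would replace the field range with the columns of interest (for example -f101-110) and point cut at the actual csv.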

On Mon, Sep 22, 2008 at 2:50 AM, José E. Lozano <[EMAIL PROTECTED]> wrote:
> Hello,
>
>
>
> Recently I have been trying to open a huge database with no success.
>
>
>
> It's a 4GB csv plain text file with around 2000 rows and over 500,000
> columns/variables.
>
>
>
> I have tried with the SAS System, but it reads only around 5000 columns, no
> more. R hangs up when opening.
>
>
>
> Is there any way to work with "parts" (a set of columns) of this database,
> since it's impossible to manage it all at once?
>
>
>
> Is there any way to establish a link to the csv file and to state the
> columns you want to fetch every time you make an analysis?
>
>
>
> I've been searching the net, but found little about this topic.
>
>
>
> Best regards,
>
> Jose Lozano
>
>

[R] changing the text offset for axis labels

2008-09-22 Thread Arthur Roberts


Hi, all,

I was wondering if there is a way to change the offset of axis labels  
from the axis.  In other words, I need the axis labels closer to the  
axis than the default.  Thanks for the help.


Best wishes,
Art Roberts
University of Washington
Seattle, WA


[R] as.day() Function (zoo question)

2008-09-22 Thread stephen sefick

I was going to look at the as.yearmon function in the zoo package
and write an as.day function to aggregate a time series of 96
observations per day into the mean for each day, but I don't know how
to look at the code so that I can convert it into something I can use.
On top of that, I believe it is probably an S3 method, and I
haven't quite gotten that far in my programming experience.

What I want is the mean for each day.  The real data set has NAs randomly
interspersed.

library(chron)
library(zoo)
t1 <- chron("1/1/2006", "00:00:00")
t2 <- chron("12/31/2006", "23:45:00")
deltat <- times("00:15:00")
tt <- seq(t1, t2, by = times("00:15:00"))
value <- rnorm(35040)
z <- zoo(value, tt)

thanks

-- 
Stephen Sefick
Research Scientist
Southeastern Natural Sciences Academy

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods. We are mammals, and have not exhausted the
annoying little problems of being mammals.

-K. Mullis


Re: [R] as.day() Function (zoo question)

chron values are represented as day + fraction of a day, so try this:

aggregate(z, floor, mean)
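
The day-flooring idea can be seen without zoo or chron in a small base R sketch. It assumes the times are plain numbers of the form day + fraction of a day (the way chron stores them); the three days of made-up values and the positions of the NAs are purely illustrative.

```r
# Times stored as whole days plus a fraction of a day:
# three days of 96 quarter-hourly observations (hypothetical values).
day   <- rep(1:3, each = 96)
frac  <- rep((0:95) / 96, times = 3)
tt    <- day + frac
value <- rep(c(10, 20, 30), each = 96)
value[c(5, 100, 250)] <- NA          # a few missing readings

# floor() drops the time-of-day part, leaving one group per day;
# na.rm = TRUE is handed on to mean() so the NAs are ignored.
daily <- tapply(value, floor(tt), mean, na.rm = TRUE)
print(daily)
```

With the zoo object from the question, the corresponding call should be aggregate(z, floor, mean, na.rm = TRUE), since extra arguments are passed on to the summary function.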


Re: [R] as.day() Function (zoo question)

2008-09-22 Thread stephen sefick

perfect thanks

On Mon, Sep 22, 2008 at 1:07 PM, Gabor Grothendieck
<[EMAIL PROTECTED]> wrote:
> chron values are represented as day + fraction of a day so:
> try this:
>
> aggregate(z, floor, mean)
>



-- 
Stephen Sefick
Research Scientist
Southeastern Natural Sciences Academy

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods. We are mammals, and have not exhausted the
annoying little problems of being mammals.

-K. Mullis


Re: [R] reading in results from system(). There must be an easier way...

2008-09-22 Thread Michael A. Gilchrist

Sorry, I misunderstood what I was doing and misspoke.  I don't think there's 
a bug.  I had called COMMAND w/in read.delim.


Thanks for all of your help and sorry for the misinformation.

Sincerely,

Mike

-
Department of Ecology & Evolutionary Biology
569 Dabney Hall
University of Tennessee
Knoxville, TN 37996-1610

phone:(865) 974-6453
fax:  (865) 974-6042

web: http://eeb.bio.utk.edu/gilchrist.asp
-


On Thu, 18 Sep 2008, Henrik Bengtsson wrote:


On Thu, Sep 18, 2008 at 1:39 PM, Michael A. Gilchrist <[EMAIL PROTECTED]> wrote:

Wow, that's elegant and simple.  It's also faster than my approach.

NB, you don't need to use close(), read.delim() closes the pipe when it's
done reading.


If read.delim() closes the connection in this case, it's a bug.  It
should only close the connection if it opens it.

/Henrik



Thank you all for your suggestions, they really helped me with this problem
and understand R just a bit better.

Sincerely,

Mike
-
Department of Ecology & Evolutionary Biology
569 Dabney Hall
University of Tennessee
Knoxville, TN 37996-1610

phone:(865) 974-6453
fax:  (865) 974-6042

web: http://eeb.bio.utk.edu/gilchrist.asp
-


On Fri, 12 Sep 2008, Prof Brian Ripley wrote:


Why not use

con <- pipe(COMMAND)
foo <- read.delim(con, colClasses="numeric")
close(con)

?  See the 'R Data Input/Output Manual'.
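
The pipe approach can be tried on a stand-in command as sketched below. The printf command is hypothetical and only mimics a program that writes tab-separated numbers to stdout; it assumes a Unix-like shell. Here the pipe is created inside the read.delim() call, which is the variant where read.delim() opens and closes the connection itself.

```r
# Stand-in for the external program: prints two rows of tab-separated numbers.
# (Hypothetical command; assumes a Unix-like shell with printf.)
COMMAND <- "printf '1\\t2\\t3\\n4\\t5\\t6\\n'"

# An unopened pipe passed straight to read.delim() is opened, parsed and
# closed by read.delim() itself. header = FALSE because this output has
# no header row; colClasses skips the type guessing.
foo <- read.delim(pipe(COMMAND), header = FALSE, colClasses = "numeric")
print(foo)
```

The result is already a numeric data frame, so the strsplit/unlist/matrix steps are not needed.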

On Fri, 12 Sep 2008, Michael A. Gilchrist wrote:


Hello,

I am currently using R to run an external program and then read the
results the external program sends to the stdout which are tsv data.

When R reads the results in, it converts them to a list of strings, which
I then have to manipulate with a whole slew of commands (figuring out
how to do this was a real challenge for a newbie like myself); see below.

Here's the code I'm using.  COMMAND runs the external program.

  rawInput = system(COMMAND, intern=TRUE);  ##read in tsv values
  rawInput = strsplit(rawInput, split="\t");  ##split elements w/in the list
                                              ##of character strings by "\t"
  rawInput = unlist(rawInput);  ##unlist, making it one long vector
  mode(rawInput) = "double";  ##convert from strings to double
  finalInput = data.frame(t(matrix(rawInput, nrow=6)));  ##convert

Because I will be doing this 100,000 of times as part of an optimization
problem, I am interested in learning a more efficient way of doing this
conversion.

Any suggestions would be appreciated.


Thanks in advance.

Mike


-
Department of Ecology & Evolutionary Biology
569 Dabney Hall
University of Tennessee
Knoxville, TN 37996-1610

phone:(865) 974-6453
fax:  (865) 974-6042

web: http://eeb.bio.utk.edu/gilchrist.asp



--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595




Re: [R] Statistical question re assessing fit of distribution functions.

2008-09-22 Thread Timur Shtatland

If one of the goals is the normality test, then there may be better
alternatives to the Kolmogorov-Smirnov test.
See an explanation on:
http://graphpad.com/FAQ/viewfaq.cfm?faq=959

The R implementation:
?shapiro.test
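
A minimal sketch of the call, on simulated data standing in for the real sample (the seed, sample size and parameters are assumptions for illustration only). Note that this only addresses the normal case, not the Cauchy fit mentioned in the question.

```r
set.seed(42)                               # reproducible illustration only
x <- rnorm(50, mean = 1.187, sd = 0.917)   # simulated stand-in for the data

# Shapiro-Wilk tests the composite hypothesis "x is normal with some mean
# and sd", so no estimated parameters have to be supplied.
res <- shapiro.test(x)
print(res$statistic)
print(res$p.value)
```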

A casual search also turned this up:
http://tolstoy.newcastle.edu.au/R/help/04/09/3201.html
http://tolstoy.newcastle.edu.au/R/help/04/08/3121.html
http://www.karlin.mff.cuni.cz/~pawlas/2008/MAI061/dagost.R

Best,

Timur
--
Timur Shtatland, Ph.D.
Senior Bioinformatics Scientist
Agencourt Bioscience Corporation - A Beckman Coulter Company
500 Cummings Center, Suite 2450
Beverly, MA 01915
www.agencourt.com


Re: [R] Hmisc and Ubuntu (aptitude install)

2008-09-22 Thread Matthew Pettis

Thank You All,

I think all of this may have been due to shared library conflict
headaches.  At one point, I inadvertently upgraded my Perl install to
5.10, and I think that messed up a lot of my libraries.  I have now
started with a clean Ubuntu install, and am going to see if I can work
my way back up to installing R and making that work.  I will recontact
the list if this problem persists through this reimaging of my server.

Thanks again,
Matt

On Mon, Sep 22, 2008 at 8:20 AM, Dirk Eddelbuettel <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 22, 2008 at 08:48:12AM -0400, Vincent Goulet wrote:
>> Matthew,
>>
>> As per the CRAN Ubuntu README
>>
>>   http://cran.r-project.org/bin/linux/ubuntu/
>>
>> install the Ubuntu r-base-dev package to compile R packages from
>> sources.
>
> Well there should be a working r-cran-hmisc package.  You simply got a
> '404' error indicating that your network access (using http) to the
> external Ubuntu mirror was broken.   Fix that, or download the package
> by hand.  It may be easier to just install the missing package.
>
> That said, Vincent is of course entirely correct on the need for
> r-base-dev.
>
> Dirk
>
>>
>> Vincent
>>
>> On Mon, 22 Sep at 00:08, Matthew Pettis wrote:
>>
>>> Hi,
>>>
>>> I'm trying to get the Hmisc module on my Ubuntu Hardy Heron install.
>>> I tried getting Hmisc from within R by issuing the standard
>>> 'install.packages' command, but it said I needed 'gfortran' to
>>> compile.  I thought I could circumvent this by using 'aptitude' to get
>>> the package 'r-cran-hmisc', but when I got it, the package had
>>> critical missing parts (got 404s).  So, I'll be trying to go back and
>>> download 'gfortran', but can anybody tell me if this aptitude ubuntu
>>> package should be kept up to date and is just currently overlooked?
>>>
>>> Thanks,
>>> Matt
>>>
>>> --
>>> It is from the wellspring of our despair and the places that we are
>>> broken that we come to repair the world.
>>> -- Murray Waas
>>>
>
> --
> Three out of two people have difficulties with fractions.
>



-- 
It is from the wellspring of our despair and the places that we are
broken that we come to repair the world.
-- Murray Waas


Re: [R] changing the text offset for axis labels

2008-09-22 Thread Greg Snow

Look at ?par and scroll down to the section on 'mgp'.  Or you can suppress the 
axis when you make the plot, then use the axis function to include it with more 
control (see ?axis).
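
A minimal sketch of the 'mgp' route; the particular values are chosen only for illustration.

```r
# mgp = c(axis title line, tick label line, axis line); the default is c(3, 1, 0).
op <- par(mgp = c(1.8, 0.6, 0))      # smaller values pull the text toward the axis
plot(1:10, xlab = "x", ylab = "y")   # titles and tick labels now sit closer in
used_mgp <- par("mgp")               # the setting that was in effect for the plot
par(op)                              # restore the previous settings
```

The first element moves the axis titles and the second moves the tick labels, so either can be adjusted on its own.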

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
801.408.8111



[R] How to execute external programs with R?

2008-09-22 Thread Arthur Roberts


Hi, all,

Could anyone give me advice on how to execute external programs with
R?  It would be greatly appreciated.


Art Roberts
University of Washington.


Re: [R] How to execute external programs with R?

2008-09-22 Thread Duncan Murdoch


On 9/22/2008 2:50 PM, Arthur Roberts wrote:

Hi, all,

Could anyone give me advice on how to execute external programs with
R?  It would be greatly appreciated.


The system() or shell() functions can do this; Windows also has 
shell.exec().
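
A minimal sketch of system(), assuming a Unix-like shell; the commands are placeholders for whatever program needs to be run.

```r
# Run a command and capture what it prints as a character vector.
# ("echo hello" is just a placeholder command.)
out <- system("echo hello", intern = TRUE)
print(out)

# Without intern = TRUE, system() returns the command's exit status instead.
status <- system("exit 0")
print(status)
```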


Duncan Murdoch


[R] Building binary package fails because of missing dependent package

2008-09-22 Thread Hans-Peter Suter

On an (Intel Leopard) Mac I am trying to build a package (mxFinance) which
depends on another package (mxGraphics). The dependency is declared 1) as a
'Depends:' in DESCRIPTION and 2) as an import in NAMESPACE.

- The build fails if the dependent package (mxGraphics) is not
installed in the R.framework

Do I need to have installed all packages that a package requires in order
to build it as a binary (source builds are OK)?

Cheers,
Hans-Peter


---
Macintosh:mxFinance chappi$ R CMD BUILD --binary mxFinance
* checking for file 'mxFinance/DESCRIPTION' ... OK
* preparing 'mxFinance':
* checking DESCRIPTION meta-information ... OK
* cleaning src
* removing junk files
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building binary distribution
* Installing *source* package 'mxFinance' ...
** libs
** arch - i386
gcc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk
-mmacosx-version-min=10.4 -std=gnu99
-I/Library/Frameworks/R.framework/Resources/include
-I/Library/Frameworks/R.framework/Resources/include/i386  -msse3
-fPIC  -g -O2 -march=nocona -c init.c -o init.o
gcc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk
-mmacosx-version-min=10.4 -std=gnu99 -dynamiclib
-Wl,-headerpad_max_install_names -mmacosx-version-min=10.4 -undefined
dynamic_lookup -single_module -multiply_defined suppress
-L/usr/local/lib -o mxFinance.so init.o
-F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework
-Wl,CoreFoundation
ld: warning, duplicate dylib
/Developer/SDKs/MacOSX10.4u.sdk/usr/local/lib/libgcc_s.1.dylib
** arch - ppc
gcc -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk
-mmacosx-version-min=10.4 -std=gnu99
-I/Library/Frameworks/R.framework/Resources/include
-I/Library/Frameworks/R.framework/Resources/include/ppc
-I/usr/local/include-fPIC  -g -O2 -c init.c -o init.o
gcc -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk
-mmacosx-version-min=10.4 -std=gnu99 -dynamiclib
-Wl,-headerpad_max_install_names -mmacosx-version-min=10.4 -undefined
dynamic_lookup -single_module -multiply_defined suppress
-L/usr/local/lib -o mxFinance.so init.o
-F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework
-Wl,CoreFoundation
ld: warning, duplicate dylib
/Developer/SDKs/MacOSX10.4u.sdk/usr/local/lib/libgcc_s.1.dylib
** R
** data
** preparing package for lazy loading
Loading required package: mxGraphics
Warning in library(pkg, character.only = TRUE, logical.return = TRUE,
lib.loc = lib.loc) :
  there is no package called 'mxGraphics'
Error: package 'mxGraphics' could not be loaded
Execution halted
ERROR: lazy loading failed for package 'mxFinance'
** Removing 
'/var/folders/xr/xr01D7JAEtGe4S5uaDQSgTI/-Tmp-/Rinst881133514/mxFinance'
 ERROR
* installation failed


[R] gbm error

2008-09-22 Thread Darin Brooks

Good afternoon
 
Has anyone tried using Dr. Elith's BRT script?  I cannot seem to run
gbm.step  from the installed gbm package.  Is it something external to gbm?
 
When I run the script itself
 
<- gbm.step(data=model.data, 

gbm.x = colx:coly,

gbm.y = colz,

family = "bernoulli",

tree.complexity = 5,

learning.rate = 0.01,

bag.fraction = 0.5)

 
... I keep encountering the same error:
 
ERROR:  
  unexpected ')' in "bag.fraction = 0.5)"
 
I've tried all sorts of variations (such as)
 
sep22BRT.lr01 <- gbm{data=sep22BRT, 
gbm.x = sep22BRT[,3:42], 
gbm.y = sep22BRT[,1], 
family = "bernoulli", 
tree.complexity = 5, 
learning.rate = 0.01, 
bag.fraction = 0.5}
 
and cannot find the problem. 
 
Is there a glaring error that I am overlooking? 
 
 
Darin Brooks
Geomatics/GIS/Remote Sensing Coordinator
Kim Forest Management Ltd. Cranbrook Office
Cranbrook, BC
 
 


[R] change the panel name in xyplot

2008-09-22 Thread Ronaldo Reis Junior

Hi,

I am trying to change the panel names in an xyplot, without success.

Look at this example from the xyplot manual:

xyplot(Murder ~ Population | state.region,data=states)

The panel titles are:
Northeast, South, North Central, West, which are the factor levels of state.region.

I need to change some of the names and, for example, put some of them in italic. I
cannot find out how to change this.

I looked for this in Deepayan Sarkar's lattice book, but I could not find the way.

Any help?

Thanks
Ronaldo
-- 
"You can't make a program without broken egos."
--
> Prof. Ronaldo Reis Júnior
|  .''`. UNIMONTES/DBG/Lab. Ecologia Comportamental e Computacional
| : :'  : Campus Universitário Prof. Darcy Ribeiro, Vila Mauricéia
| `. `'` CP: 126, CEP: 39401-089, Montes Claros - MG - Brasil
|   `- Fone: (38) 3229-8192 | [EMAIL PROTECTED] | [EMAIL PROTECTED]
| http://www.ppgcb.unimontes.br/lecc | ICQ#: 5692561 | LinuxUser#: 205366
--
Please DO NOT SEND Word or PowerPoint files
Prefer sending PDF, Text, OpenOffice (ODF), HTML, or RTF.


Re: [R] change the panel name in xyplot

2008-09-22 Thread Henrique Dallazuanna

Try this:

xyplot(Murder ~ Population | state.region,
   data = states,
   strip = strip.custom(factor.levels = c(expression(italic(A)),
"B",  "C",  "D")))




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


[R] the package of R which is related Hosmer-Lemeshow test?

2008-09-22 Thread leo_wa


Which R package provides the Hosmer-Lemeshow test?
-- 
View this message in context: 
http://www.nabble.com/the-package-of-R-which-is-related-Hosmer-Lemeshow-test--tp19613179p19613179.html
Sent from the R help mailing list archive at Nabble.com.


[R] findInterval(), binary search, log(N) complexity

2008-09-22 Thread Markus Loecher

Dear R users,
the help for findInterval(x,vec) suggests a logarithmic dependence on N
(=length(vec)), which would imply a binary search type algorithm.
However, when I "test" this hypothesis, in the following manner:

set.seed(-3645);
l <- vector();
N.seq <- c(5000, 50, 100, 1000, 5000);k <- 1
for (N in N.seq){
  tmp <- sort(round(stats::rt(N, df=2), 2));
  l[k] <- system.time(it3 <- findInterval(-1, tmp))[2];k <- k + 1;
}
plot(N.seq,l,type="b",xlab="length(vec)", ylab="CPU time");

the resulting plot suggests a linear relationship.
I must be missing something here?

Thanks !

Markus


Re: [R] How to find a shift between two curves or data sets

2008-09-22 Thread Sébastien Durand


Dear Hans,

Thanks for your reply.

I will read that book. 


Cheers!


Re: [R] graphing netCDF files

2008-09-22 Thread Paul Hiemstra


Hi Steve,

If you read your netCDF files into R using rgdal, you end up with sp-classes
which can be displayed using spplot. But you do not seem to use rgdal.


If you can make a data.frame with the x, y and z coordinates this can 
quite easily be transformed into an sp-class:


library(sp)
dat = data.frame(x = UTMx, y = UTMy, z = wat.data2001q1[,,i])
# "tell" spplot what the names of the columns with the x and y coordinates are
coordinates(dat) = ~x+y
gridded(dat) = TRUE   # make clear it is a grid
spplot(dat)
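
One caveat about the data.frame() call above: if UTMx and UTMy are the axis vectors of the grid (405 eastings and 287 northings in the original post) rather than one coordinate per cell, they probably need to be expanded to one row per cell first. A minimal base-R sketch, with made-up toy values (xs, ys, z are assumptions, not the poster's data):

```r
# Toy stand-ins (assumed values, not the poster's data):
# 3 eastings, 2 northings, and a 3 x 2 matrix of z values.
xs <- c(500000, 500400, 500800)
ys <- c(2800000, 2800400)
z  <- matrix(1:6, nrow = length(xs), ncol = length(ys))

# expand.grid() varies its first argument fastest, which matches the
# column-major order in which as.vector() unrolls z[x, y].
dat <- data.frame(expand.grid(x = xs, y = ys), z = as.vector(z))

# Each row is now one grid cell; check one cell against the matrix.
stopifnot(dat$z[dat$x == xs[2] & dat$y == ys[2]] == z[2, 2])
print(dat)
```

A data frame built this way should then be usable with coordinates() and gridded() as in the snippet above.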

For more details see the documentation for the sp-package, especially 
spplot. These kinds of questions are more suitable for the r-sig-geo 
mailing list and not the general r-help list.


hope this helps,

Paul

[EMAIL PROTECTED] wrote:

Hello

I'm working with a large hydrological data set stored in a netCDF format.
The file stores x and y coordinates in the UTM projected coordinate system,
yet when I use image to graphically display the z variable, the image is
distorted in the sense that it does not plot the map in the correct spatial
organization.

I'm wondering if I need to define the projection of the netCDF file with
rgdal or proj4 routines first before I send it to the graphics device.
  

Defining the projection is not needed

My code is as follows:

 library(ncdf)          # open.ncdf(), get.var.ncdf()
 library(RColorBrewer)  # brewer.pal()

 # open the ncdf file for reading
 q1_2001 <- open.ncdf("H:\\SKF_DESKTOP FILES\\My Documents\\EDEN\\EDEN\\Surfaces\\2000_q1.nc", readunlim=FALSE)
 # get the actual data
 wat.data2001q1 <- get.var.ncdf(q1_2001, verbose=FALSE)

 # GENERAL EXAMINATION OF HEADER DATA in the wat.data file
 day  <- get.var.ncdf(q1_2001, "time")   # length(day): 91 days in quarter
 UTMx <- get.var.ncdf(q1_2001, "x")      # columns (eastings), should return 405
 UTMy <- get.var.ncdf(q1_2001, "y")      # rows (northings), should return 287

 # plot first 91 days (3 months of the year)
 for(i in 1:91) {
   image(UTMx, UTMy, z = wat.data2001q1[,,i], col=brewer.pal(8, "YlGnBu"),
         axes=TRUE, pty="s", ylab="UTM Northing", xlab="UTM Easting",
         main = "First Quarter 2001")
 }

As I indicated above, the map is displayed on the graphics device. However,
the image is distorted: the x axis is stretched too wide and the y axis too
tall.  How can I set the graphics device to know the orientation and
scaling (if these are the correct terms) in order to display this map
correctly?

All insights will be greatly appreciated.

Thanks
Steve

Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

Office (305) 224 - 4282
[EMAIL PROTECTED]

  



--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +31302535773
Fax:+31302531145
http://intamap.geo.uu.nl/~paul


[R] Profiling on Multicore and Parallel Systems

2008-09-22 Thread Imanpreet

Hello All,

In general, when we use Rprof for performance evaluation on multicore
systems, the output reports time on the basis of "user" time, and the
total sampling time equals the user time reported by system.time. This
does not seem to be the right behavior when R is linked to BLAS/LAPACK
or other libraries optimized for parallel or multicore architectures:
there, user time can exceed elapsed time, and one would be more
interested in the "elapsed" time per routine, as returned by
gettimeofday(), than in the "user" time returned by getrusage().


Could anyone provide any pointers on how best to do R profiling on
parallel and multicore systems?
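
To make the user/elapsed distinction concrete, here is a rough sketch that times individual calls and reports both figures. It is not a replacement for Rprof's sampling; time_call() is a hypothetical helper introduced only for this illustration.

```r
# Time one call and report both CPU ("user") time and wall-clock
# ("elapsed") time. With a multithreaded BLAS, user time is summed over
# threads and can exceed elapsed time.
time_call <- function(f, ...) {
  st <- system.time(f(...))
  c(user = st[["user.self"]], elapsed = st[["elapsed"]])
}

m <- matrix(rnorm(200 * 200), 200)
timings <- rbind(
  crossprod = time_call(crossprod, m),
  solve     = time_call(solve, crossprod(m) + diag(200))
)
print(timings)
```

On a single-threaded BLAS the two columns should be close; a large gap with user above elapsed would suggest the library is using several cores.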

Regards,

-- 
Imanpreet Singh Arora


Re: [R] Manage huge database

2008-09-22 Thread Thomas Lumley


On Mon, 22 Sep 2008, Martin Morgan wrote:


"José E. Lozano" <[EMAIL PROTECTED]> writes:


Maybe you've not lurked on R-help for long enough :) Apologies!


Probably.


So, how much "design" is in this data? If none, and what you've
basically got is a 2000x50 grid of numbers, then maybe a more raw


Exactly, raw data, but a little more complex since all the 50 variables
are in text format, so the width is around 2,500,000.

>

Is genetic DNA data (individuals genotyped), hence the large amount of
columns to analyze.


The Bioconductor package snpMatrix is designed for this type of
data. See

http://www.bioconductor.org/packages/2.2/bioc/html/snpMatrix.html

and if that looks promising


source('http://bioconductor.org/biocLite.R')
biocLite('snpMatrix')


Likely you'll quickly want a 64 bit (linux or Mac) machine.



netCDF is another useful option -- we have been using the ncdf package for 
large genomic datasets.  We read the data in one person at a time and 
write to netCDF.  For analysis we can then read any subsets.  Since we 
have imputed SNP data  as well as measured this comes to about 2.5 million 
variables on 4000 people for one of our data sets.
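
For readers who want to stay with the plain CSV file, a different and much simpler technique than netCDF is to let read.csv skip unwanted columns through colClasses. The sketch below uses a small temporary toy file; the file, the column names and the wanted vector are assumptions for illustration, and on a file with hundreds of thousands of columns this may still be slow.

```r
# Write a small toy CSV (5 rows, 8 columns) to a temporary file.
tmp <- tempfile(fileext = ".csv")
toy <- as.data.frame(matrix(1:40, nrow = 5))
write.csv(toy, tmp, row.names = FALSE)

# Read only the first data row to learn the column names.
hdr <- names(read.csv(tmp, nrows = 1))

# "NULL" tells read.csv to skip a column; NA lets it guess the type.
wanted  <- c("V2", "V7")
classes <- ifelse(hdr %in% wanted, NA, "NULL")
part    <- read.csv(tmp, colClasses = classes)
print(part)
unlink(tmp)
```

Only the requested columns are kept in memory, so a different set of columns can be fetched for each analysis.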



-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle

Re: [R] Hmisc and Ubuntu (aptitude install)

2008-09-22 Thread Matthew Pettis

Hi All,

After rebuilding my Ubuntu image, I followed the instructions in this
thread, and everything worked out fine -- thank you again.

So, I'll just add: if you use R and Perl, and don't have to download
Perl 5.10, then don't do it, at least not yet.  Or, if you do, then you
will have a lot of shared object tweaking to do.

Matt

On Mon, Sep 22, 2008 at 1:22 PM, Matthew Pettis
<[EMAIL PROTECTED]> wrote:
> Thank You All,
>
> I think all of this may have been due to shared library conflict
> headaches.  At one point, I inadvertently upgraded my Perl install to
> 5.10, and I think that messed up a lot of my libraries.  I have now
> started with a clean Ubuntu install, and am going to see if I can work
> my way back up to installing R and making that work.  I will recontact
> the list if this problem persists through this reimaging of my server.
>
> Thanks again,
> Matt
>
> On Mon, Sep 22, 2008 at 8:20 AM, Dirk Eddelbuettel <[EMAIL PROTECTED]> wrote:
>> On Mon, Sep 22, 2008 at 08:48:12AM -0400, Vincent Goulet wrote:
>>> Matthew,
>>>
>>> As per the CRAN Ubuntu README
>>>
>>>   http://cran.r-project.org/bin/linux/ubuntu/
>>>
>>> install the Ubuntu r-base-dev package to compile R packages from
>>> sources.
>>
>> Well there should be a working r-cran-hmisc package.  You simply got a
>> '404' error indicating that your network access (using http) to the
>> external Ubuntu mirror was broken.   Fix that, or download the package
>> by hand.  It may be easier to just install the missing package.
>>
>> That said, Vincent is of course entirely correct on the need for
>> r-base-dev.
>>
>> Dirk
>>
>>>
>>> Vincent
>>>
>>> On Mon, 22 Sept. at 00:08, Matthew Pettis wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm trying to get the Hmisc module on my Ubuntu Hardy Heron install.
>>>> I tried getting Hmisc from within R by issuing the standard
>>>> 'install.packages' command, but it said I needed 'gfortran' to
>>>> compile.  I thought I could circumvent this by using 'aptitude' to get
>>>> the package 'r-cran-hmisc', but when I got it, the package had
>>>> critical missing parts (got 404s).  So, I'll be trying to go back and
>>>> download 'gfortran', but can anybody tell me if this aptitude ubuntu
>>>> package should be kept up to date and is just currently overlooked?
>>>>
>>>> Thanks,
>>>> Matt
>>>>
>>>> --
>>>> It is from the wellspring of our despair and the places that we are
>>>> broken that we come to repair the world.
>>>> -- Murray Waas
>>>
>>
>> --
>> Three out of two people have difficulties with fractions.
>>
>
>
>
> --
> It is from the wellspring of our despair and the places that we are
> broken that we come to repair the world.
> -- Murray Waas
>

-- 
It is from the wellspring of our despair and the places that we are
broken that we come to repair the world.
-- Murray Waas


Re: [R] findInterval(), binary search, log(N) complexity

2008-09-22 Thread Duncan Murdoch


On 9/22/2008 1:51 PM, Markus Loecher wrote:

> Dear R users,
> the help for findInterval(x,vec) suggests a logarithmic dependence on N
> (=length(vec)), which would imply a binary search type algorithm.
> However, when I "test" this hypothesis, in the following manner:


R is open source.  Why test things this way, when you can look at the 
source?  You don't even need to go to C code for this:


> findInterval
function (x, vec, rightmost.closed = FALSE, all.inside = FALSE)
{
    if (any(is.na(vec)))
        stop("'vec' contains NAs")
    if (is.unsorted(vec))
        stop("'vec' must be sorted non-decreasingly")
    if (has.na <- any(ix <- is.na(x)))
        x <- x[!ix]
    nx <- length(x)
    index <- integer(nx)
    .C("find_interv_vec", xt = as.double(vec), n = as.integer(length(vec)),
        x = as.double(x), nx = as.integer(nx), as.logical(rightmost.closed),
        as.logical(all.inside), index, DUP = FALSE, NAOK = TRUE,
        PACKAGE = "base")
    if (has.na) {
        ii <- as.integer(ix)
        ii[ix] <- NA
        ii[!ix] <- index
        ii
    }
    else index
}


Notice the "is.unsorted" test.  How could that be anything other than 
linear execution time in N? Similarly for any(ix <- is.na(x)).


If you know the answers to those tests (as you do in your simulation), 
you could presumably get O(log(n)) behaviour by writing a new function 
that skipped them.  But you could take a look at the source code (in 
https://svn.r-project.org/R/trunk/src/appl/interv.c) if you want to 
check, or if you notice any weird timings.
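
As a rough illustration of what skipping those checks could look like, here is a plain-R binary search for a single x. This is only a sketch: find_interval_unchecked() is a hypothetical name, it assumes vec is already sorted and free of NAs, and it is not the algorithm used in interv.c.

```r
# Hypothetical helper: returns the largest i with vec[i] <= x (0 if none),
# which is what findInterval(x, vec) returns for a single x with default
# arguments. No NA or sortedness checks are made, so each call costs
# O(log n) comparisons.
find_interval_unchecked <- function(x, vec) {
  lo <- 0L                 # invariant: every element up to lo is <= x
  hi <- length(vec) + 1L   # invariant: every element from hi on is > x
  while (hi - lo > 1L) {
    mid <- (lo + hi) %/% 2L
    if (vec[mid] <= x) lo <- mid else hi <- mid
  }
  lo
}

set.seed(1)
vec <- sort(round(stats::rt(5000, df = 2), 2))
stopifnot(find_interval_unchecked(-1, vec) == findInterval(-1, vec))
```

An interpreted loop like this carries its own overhead, so any speed gain over findInterval() would only be expected for long vectors.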


Duncan Murdoch




> set.seed(-3645);
> l <- vector();
> N.seq <- c(5000, 50, 100, 1000, 5000);k <- 1
> for (N in N.seq){
>   tmp <- sort(round(stats::rt(N, df=2), 2));
>   l[k] <- system.time(it3 <- findInterval(-1, tmp))[2];k <- k + 1;
> }
> plot(N.seq,l,type="b",xlab="length(vec)", ylab="CPU time");
>
> the resulting plot suggests a linear relationship.
> I must be missing sth. here ?
>
> Thanks !
>
> Markus



[R] lme problems

2008-09-22 Thread Tommaso Pizzari

Hi, 
I'm analysing a dataset in which the same 5 subjects (male.pair) were subjected 
to two treatments (treatment) and were measured for 12 successive days within 
each treatment (layingday). Overall 5*2*12=120 observations. 

I want to test the effect of treatment, time (layingday) and their interaction. 
I have done so through the ANOVA below:

> bmc3<-aov(Mean1~treatment*layingday+Error(male.pair/treatment/layingday))
> summary(bmc3)

Error: male.pair
          Df  Sum Sq Mean Sq F value Pr(>F)
Residuals  1 0.13850 0.13850

Error: male.pair:treatment
          Df  Sum Sq Mean Sq
treatment  1 0.60525 0.60525

Error: male.pair:treatment:layingday
          Df  Sum Sq Mean Sq
layingday  1 0.64037 0.64037

Error: Within
                     Df  Sum Sq Mean Sq F value    Pr(>F)
treatment             1 0.02015 0.02015  0.7340    0.3934
layingday             1 0.52937 0.52937 19.2878 2.545e-05 ***
treatment:layingday   1 0.02959 0.02959  1.0782    0.3013
Residuals           113 3.10135 0.02745
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I then wanted to compare this outcome with an lme, and used the model below. 
However, its outcome doesn't make much sense to me. 

> bmc4<- lme(Mean1 ~ treatment*layingday, random = ~1|male.pair)
> summary(bmc4)
Linear mixed-effects model fit by REML
 Data: NULL
        AIC       BIC   logLik
  -118.4522 -101.9306 65.22609

Random effects:
 Formula: ~1 | male.pair
        (Intercept)  Residual
StdDev:   0.1313573 0.1185902

Fixed effects: Mean1 ~ treatment * layingday
                         Value  Std.Error  DF   t-value p-value
(Intercept)          0.5311005 0.09369140 112  5.668615  0.
treatment            0.0495373 0.04616116 112  1.073138  0.2855
layingday           -0.0488055 0.00991701 112 -4.921389  0.
treatment:layingday  0.0138449 0.00627207 112  2.207388  0.0293
 Correlation:
                    (Intr) trtmnt lyngdy
treatment           -0.739
layingday           -0.688  0.838
treatment:layingday  0.653 -0.883 -0.949

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-2.44529424 -0.68505388  0.01663401  0.59009515  3.53354000

Number of Observations: 120
Number of Groups: 5

I struggle to understand the discrepancy in df between the anova and lme, and 
the fact that the interaction term is not significant in the anova but 
significant in lme. Any help would be greatly appreciated. 
Best
Tom

-- 
Dr. Tommaso Pizzari
Edward Grey Institute, Dept of Zoology, 
University of Oxford, Oxford OX1 3PS
Tel: (44) 1865 271279, Fax: (44) 1865 271168


Re: [R] Warranty on Accuracy, Precision, Legality, ... of R in Research

2008-09-22 Thread Marc Schwartz

on 09/22/2008 11:26 AM Bert Chan wrote:
> Warranty on Accuracy, Precision, Legality, ... of R in Research
> 
> (These questions may well have been raised.)
> 
> What is the implied warranty of using R for research & publications, 
> consulting, etc.?
> 
> Alternately, how does one obtain such a warranty?
> 
> Your answers will be much appreciated.
> 
> Perhaps you can point me to some websites which discussed this subject in the 
> past.
> 
> Thanks & regards -
> 
> Bert
> 
> (Bertram K. C. Chan, PhD)

As per the banner that appears whenever you start up R:

"R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details."

The suitability of R for any particular application is entirely up to
the user. Legally, there is nothing preventing you from using R for such
applications relative to the license under which R is made available.

You did not indicate the specific type of research you have in mind, but
if it might be in the domain of clinical trials, please review:

  http://www.r-project.org/doc/R-FDA.pdf

HTH,

Marc Schwartz


[R] Coefficients, OR and 95% CL

2008-09-22 Thread Luciano La Sala

Dear R-users,

After running a logistic regression, I need to calculate the OR by exponentiating
the coefficient, and then I need the 95% CL for the OR as well. For the
following example (taken from P. Dalgaard's book), what would be the most
straightforward method of getting what I need? Could anyone enlighten me, please?

Thank you!
Lucho 

> summary(glm(menarche~age,binomial))

Call:
glm(formula = menarche ~ age, family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-4.68654  -0.13049  -0.01067   0.09608   2.35254

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -17.9175     1.7074  -10.49   <2e-16 ***
age           1.3549     0.1296   10.45   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 974.31  on 703  degrees of freedom
Residual deviance: 223.95  on 702  degrees of freedom
  (635 observations deleted due to missingness)
AIC: 227.95

Number of Fisher Scoring iterations: 9







Re: [R] Coefficients, OR and 95% CL

2008-09-22 Thread Jorge Ivan Velez

Dear Luciano,
See ?logistic.display in the epicalc package. If glm1 is your model,
something like

logistic.display(glm1)

should do the job.
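
If you would rather not install an extra package, the same numbers can probably be had from base R by exponentiating the coefficients and their Wald confidence limits. A minimal sketch on a built-in data set (mtcars and the am ~ wt model are assumptions for illustration only, not the menarche data):

```r
# Toy model on a built-in data set (an assumption for illustration;
# the thread uses the menarche ~ age model instead).
fit <- glm(am ~ wt, family = binomial, data = mtcars)

# Wald 95% limits on the log-odds scale, then exponentiate to get
# odds ratios with their confidence limits.
or_table <- exp(cbind(OR = coef(fit), confint.default(fit, level = 0.95)))
print(or_table)
```

confint.default() gives Wald intervals; confint() on a glm fit gives profile-likelihood intervals instead, which may differ somewhat in small samples.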


HTH,


Jorge


On Mon, Sep 22, 2008 at 5:28 PM, Luciano La Sala
<[EMAIL PROTECTED]>wrote:

> Dear R-users,
>
> After running a logistic regression, I need to calculate OR by
> exponentiating the coefficient, and then I need the 95% CL for the OR as
> well. For the following example (taken from P. Dalaagard's book), what would
> be the most straightforward method of getting what I need? Could anyone
> enlight me please?
>
> Thank you!
> Lucho
>
> > summary(glm(menarche~age,binomial))
>
> Call:
> glm(formula = menarche ~ age, family = binomial)
>
> Deviance Residuals:
>      Min        1Q    Median        3Q       Max
> -4.68654  -0.13049  -0.01067   0.09608   2.35254
>
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept) -17.9175     1.7074  -10.49   <2e-16 ***
> age           1.3549     0.1296   10.45   <2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
>     Null deviance: 974.31  on 703  degrees of freedom
> Residual deviance: 223.95  on 702  degrees of freedom
>   (635 observations deleted due to missingness)
> AIC: 227.95
>
> Number of Fisher Scoring iterations: 9
>
>
>
>
>
>

[R] Deleting multiple variables

2008-09-22 Thread Michael Pearmain

Hi All,
I have searched the web for a simple solution but have been unable to find
one.  Can anyone recommend a neat way of deleting multiple variables?
I see that I need to use dataframe$VAR<-NULL to get rid of one variable.
In my situation I need to delete all variables between two positions.

I've used the 'which' function to find these positions and have assigned
them to myvars:
>myvars
[1]  2 17

but I can't figure out how I should apply this.

Should I loop through the values? (Pseudocode below.)

for (x in c(myvars[1]:myvars[2]))
(M_UC$x<-NULL)

Any help gratefully received.

Mike


Re: [R] Deleting multiple variables

2008-09-22 Thread Andrew Robinson

Mike,

how about

M_UC <- M_UC[,-(myvars[1]:myvars[2])]

?
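
A small worked example of this negative-index approach, on a toy data frame standing in for M_UC (the 20-column layout is an assumption for illustration):

```r
# Toy data frame standing in for M_UC (assumed: 20 columns, V1..V20).
M_UC <- as.data.frame(matrix(0, nrow = 3, ncol = 20))
myvars <- c(2, 17)

# A negative index drops columns myvars[1] through myvars[2].
# drop = FALSE keeps the result a data frame even if one column remains.
trimmed <- M_UC[, -(myvars[1]:myvars[2]), drop = FALSE]
print(names(trimmed))
```

The loop in the original post would not work as written, because M_UC$x refers to a column literally named "x" rather than to the column whose position is stored in x.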

Andrew

On Mon, Sep 22, 2008 at 11:04:34PM +0100, Michael Pearmain wrote:
> Hi All,
> i have searched the web for a simple solution but have been unable to find
> one.  Can anyone recommend a neat way of deleting multiple variable?
> I see, i need to use dataframe$VAR<-NULL to get rid of one variable,
> In my situation i need to delete all vars between two points.
> 
> I've used the 'which' function to find these out and have assigned to myvar
> >myvars
> [1]  2 17
> 
> but i can't figure out how i should apply this?
> 
> Should i loop through the values? (Psydo code below?)
> 
> for (x in c(myvars[1]:myvars[2]))
> (M_UC$x<-NULL))
> 
> Any help gratful
> 
> Mike
> 

-- 
Andrew Robinson  
Department of Mathematics and StatisticsTel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/


[R] Weights for polr

2008-09-22 Thread Gregory Wawro


Hello,

I'm estimating an ordered logit model on a probability weighted survey 
sample.  polr permits case weights with the "weights" option, but I cannot 
figure out from existing documentation what it actually does with these 
weights.  I'm concerned about this because I get somewhat different 
results using Stata's ologit command with the pweights option and very 
different results using proc logistic in SAS with its weight option.  So 
my basic question is whether or not it is appropriate to use the weight 
option for polr with my data.


Best,
Greg


.

Gregory Wawro   [EMAIL PROTECTED]
Associate Professor phone:  212-854-8540
Dept. of Political Science  fax:212-222-0598
741 International Affairs   http://www.columbia.edu/~gjw10/
Columbia University
New York, NY 10027

.


[R] Help for SUR model

2008-09-22 Thread Xianchun Liao

I am an R beginner and trying to run a SUR model in R framework.

 

subset(esasp500, Obs <=449 & Obs>=197, select = -Date) ->ev13sub
c(Obs>=397) & c(Obs<=399) ->d13
c(Obs>=400) & c(Obs<=449) ->f13
SP500*f13 ->SP500f13

BBC~SP500+d13+SP500f13  ->sur132
BOW~SP500+d13+SP500f13  ->sur133
CSK~SP500+d13+SP500f13  ->sur134
DTC~SP500+d13+SP500f13  ->sur135
GP~SP500+d13+SP500f13   ->sur136
HAN~SP500+d13+SP500f13  ->sur137
IP~SP500+d13+SP500f13   ->sur138
KMB~SP500+d13+SP500f13  ->sur139
LPX~SP500+d13+SP500f13  ->sur1310
MWV~SP500+d13+SP500f13  ->sur1311
PCH~SP500+d13+SP500f13  ->sur1312
PCL~SP500+d13+SP500f13  ->sur1313
PNR~SP500+d13+SP500f13  ->sur1314
POP~SP500+d13+SP500f13  ->sur1315
SON~SP500+d13+SP500f13  ->sur1316
TIN~SP500+d13+SP500f13  ->sur1317
W~SP500+d13+SP500f13    ->sur1318
WPP~SP500+d13+SP500f13  ->sur1319
WY~SP500+d13+SP500f13   ->sur1320

system13 <- list(sur132, sur133, sur134, sur135, sur136, sur137, sur138,
                 sur139, sur1310, sur1311, sur1312, sur1313, sur1314,
                 sur1315, sur1316, sur1317, sur1318, sur1319, sur1320)

labels13 <- list("sur132", "sur133", "sur134", "sur135", "sur136", "sur137",
                 "sur138", "sur139", "sur1310", "sur1311", "sur1312",
                 "sur1313", "sur1314", "sur1315", "sur1316", "sur1317",
                 "sur1318", "sur1319", "sur1320")

res13 <- systemfit("SUR", system13,labels13, data=ev13sub)
summary(res13)

 

But the result is: Error: could not find function "systemfit".

So, how should I write the R code to fit these equations and get the right
results?

Thanks,

Bill



[R] Prediction errors from forecast()?

2008-09-22 Thread Laura Pyle

Hello,

I am using forecast() in the forecast package to predict future values of an
ARIMA model fit to a time series.  I have read most of the documentation for
the forecast package, but I can't figure out how to obtain the forecast
variance for the predicted values.  I tried using the argument
"se.fit=TRUE," hoping this would work since forecast() calls predict().

Is there an easy way to do this?  Sample code is below.

ar <- Arima(as.matrix(Y), order= c(1,0,0),include.drift=TRUE)
f <- forecast(ar,h=9,se.fit=TRUE)
summary(f)
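
For comparison, one route that needs only base R's stats package is to fit with arima() and ask predict() for the standard errors, then square them. This is a sketch on a built-in series without a drift term (lh stands in for Y as an assumption), not a description of how the forecast package computes its intervals:

```r
# Toy series (built-in lh data), an assumption standing in for Y.
fit <- arima(lh, order = c(1, 0, 0))

# predict() on an arima fit returns point forecasts ($pred) and their
# standard errors ($se) for each horizon.
p <- predict(fit, n.ahead = 9)
forecast_var <- p$se^2   # forecast variance at horizons 1..9
print(cbind(pred = p$pred, se = p$se, var = forecast_var))
```

For a stationary AR(1) the standard errors should grow with the horizon and level off near the series standard deviation.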

Thanks,
Laura


Re: [R] Weights for polr

2008-09-22 Thread Thomas Lumley


On Mon, 22 Sep 2008, Gregory Wawro wrote:


> Hello,
>
> I'm estimating an ordered logit model on a probability weighted survey
> sample.

You could use svyolr() in the survey package.

> polr permits case weights with the "weights" option, but I cannot
> figure out from existing documentation what it actually does with these
> weights.

They are frequency weights.

> I'm concerned about this because I get somewhat different results
> using Stata's ologit command with the pweights option

You should get the same point estimates, but different standard errors.

> and very different
> results using proc logistic in SAS with its weight option.

Again, it should be the same point estimates but different standard
errors.

> So my basic
> question is whether or not it is appropriate to use the weight option for
> polr with my data.


No.
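
To see why frequency weights leave the point estimates alone but change the standard errors, here is a small sketch. It uses a binomial glm() on a built-in data set rather than polr(), purely as an analogy, and the weights w are made up for illustration:

```r
# Made-up integer weights, treated by glm() as replicate counts.
w <- rep(c(1L, 2L), length.out = nrow(mtcars))

fit1 <- glm(am ~ wt, family = binomial, data = mtcars, weights = w)
# Doubling every weight is like observing each case twice as often.
fit2 <- glm(am ~ wt, family = binomial, data = mtcars, weights = 2L * w)

se <- function(fit) sqrt(diag(vcov(fit)))
print(rbind(coef1 = coef(fit1), coef2 = coef(fit2)))  # same estimates
print(se(fit1) / se(fit2))                            # about sqrt(2)
```

Rescaling the weights changes the apparent sample size and hence the model-based standard errors, which is why sampling weights call for a design-based variance estimate instead.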

-thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle


Re: [R] Warranty on Accuracy, Precision, Legality, ... of R in Research

On Mon, Sep 22, 2008 at 4:07 PM, Marc Schwartz
<[EMAIL PROTECTED]> wrote:
> on 09/22/2008 11:26 AM Bert Chan wrote:
>> Warranty on Accuracy, Precision, Legality, ... of R in Research
>>
>> (These questions may well have been raised.)
>>
>> What is the implied warranty of using R for research & publications, 
>> consulting, etc.?
>>
>> Alternately, how does one obtain such a warranty?
>>
>> Your answers will be much appreciated.
>>
>> Perhaps you can point me to some websites which discussed this subject in 
>> the past.
>>
>> Thanks & regards -
>>
>> Bert
>>
>> (Bertram K. C. Chan, PhD)
>
> As per the banner that appears whenever you start up R:
>
> "R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details."

And surely this is the most that any software could provide?

SAS has:

"EXCEPT WHERE EXPRESSLY PROVIDED OTHERWISE IN AN AGREEMENT BETWEEN YOU
AND SAS, ALL INFORMATION, SOFTWARE, PRODUCTS AND SERVICES ARE PROVIDED
"AS IS" WITHOUT WARRANTY OF ANY KIND INCLUDING WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NON-INFRINGEMENT."

Hadley


-- 
http://had.co.nz/


[R] Prediction errors from forecast()?

2008-09-22 Thread Laura Pyle

Sorry, I am resending in plain text.

Hello,

I am using forecast() in the forecast package to predict future values
of an ARIMA model fit to a time series.  I have read most of the
documentation for the forecast package, but I can't figure out how to
obtain the forecast variance for the predicted values.  I tried using
the argument "se.fit=TRUE," hoping this would work since forecast()
calls predict().

Is there an easy way to do this?  Sample code is below.

ar <- Arima(as.matrix(Y), order= c(1,0,0),include.drift=TRUE)
f <- forecast(ar,h=9,se.fit=TRUE)
summary(f)

Thanks,
Laura


[R] sort a data matrix by all the values and keep the names

2008-09-22 Thread zhihuali


Dear all,

If I have a data frame  x<-data.frame(x1=c(1,7),x2=c(4,6),x3=c(8,2)):
   x1  x2  x3
    1   4   8
    7   6   2

I want to sort the whole data and get this:
x1  1
x3  2
x2  4
x2  6
x1  7
x3  8

If I do sort(x), R reports:
Error in order(list(x1 = c(1, 7), x2 = c(4, 6), x3 = c(8, 2)), decreasing = FALSE) :
  unimplemented type 'list' in 'orderVector1'

The only way I can sort all the data is by converting it to a matrix:
> sort(as.matrix(x))
[1] 1 2 4 6 7 8

But now I have lost all the names attributes.

Is it possible to sort a data frame and keep all the names?

Thanks!

Zhihua Li


[R] perl expression question

2008-09-22 Thread markleeds

Given the string below, does someone know a regular expression to
extract just the "BLC.NYSE"? I bought the O'Reilly book and read it
when I can, and I study the solutions on the list, but I'm still not
self-sufficient with these things. Thanks.



stock<-"/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"
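
One possible approach, offered as a sketch rather than as the only answer: strip everything up to the last slash with sub(), or skip regular expressions and use basename(). The variable names ticker_regex and ticker_base are made up for this example.

```r
stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"

# Greedy ".*/" matches everything up to and including the last "/".
ticker_regex <- sub("^.*/", "", stock)

# basename() returns the last path component without any regex.
ticker_base <- basename(stock)

print(c(ticker_regex, ticker_base))
```

The regular expression relies on the quantifier being greedy, so it always removes the longest possible prefix ending in a slash.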


Re: [R] sort a data matrix by all the values and keep the names

2008-09-22 Thread Moshe Olshansky

One possibility is:

> x <- data.frame(x1=c(1,7),x2=c(4,6),x3=c(8,2))
> names <- t(matrix(rep(names(x),times=nrow(x)),nrow=ncol(x)))
> m <- as.matrix(x)
> ind <- order(m)
> df <- data.frame(name=names[ind],value=m[ind])
> df
  name value
1   x1     1
2   x3     2
3   x2     4
4   x2     6
5   x1     7
6   x3     8
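
A more compact variant of the same idea, as a minimal sketch: label every
matrix cell with its column name via col(), then sort the resulting named
vector. sort() on a named vector carries the names along. The variable names
m, v and sorted are only illustrative and do not come from the thread.

```r
x <- data.frame(x1 = c(1, 7), x2 = c(4, 6), x3 = c(8, 2))

m <- as.matrix(x)

# col(m) gives the column index of every cell, so colnames(m)[col(m)]
# is the column name belonging to each value (in column-major order).
v <- setNames(as.vector(m), colnames(m)[col(m)])

# Sorting a named vector keeps each name attached to its value.
sorted <- sort(v)
print(sorted)
```

If a two-column data frame is preferred, data.frame(name = names(sorted),
value = unname(sorted)) converts the result.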




[R] R-2.7.2 infected?

2008-09-22 Thread Dave DeBarr

I tried downloading R-2.7.2 
(http://cran.cnr.berkeley.edu/bin/windows/base/R-2.7.2-win32.exe, both from 
Berkeley and cran) and both times I got a warning from Computer Associates 
eTrust Antivirus (version 7.1.710) that the Win32/Adclicker.JO trojan was 
detected:
The Win32/Adclicker.JO was detected in 
C:\USERS\USER\APPDATA\LOCAL\MICROSOFT\WINDOWS\TEMPORARY INTERNET 
FILES\LOW\CONTENT.IE5\61HAYRTG\R-2.7.2-WIN32[1].EXE.

Has anyone else seen this?

Thanks,
Dave



Re: [R] sort a data matrix by all the values and keep the names

On Mon, Sep 22, 2008 at 6:54 PM, zhihuali <[EMAIL PROTECTED]> wrote:
>
> Dear all,
>
> If I have a data frame  x<-data.frame(x1=c(1,7),x2=c(4,6),x3=c(8,2)):
>   x1  x2  x3
>   1 4  8
>   7 6  2
>
> I want to sort the whole data and get this:
> x1 1
> x3  2
> x2  4
> x2  6
> x1   7
> x3   8
>
>  If I do sort(X), R reports:
> Error in order(list(x1 = c(1, 7), x2 = c(4, 6), x3 = c(8, 2)), decreasing = 
> FALSE) :
>  unimplemented type 'list' in 'orderVector1'
>
> The only way I can sort all the data is by converting it to a matrix:
>> sort(as.matrix(x))
> [1] 1 2 4 6 7 8
>
> But now I lost all the names attributes.
>
> Is it possible to sort a data frame and keep all the names?

Here's one way:

dfm <- melt(x, id = c())
dfm[order(dfm$value), ]

Hadley

-- 
http://had.co.nz/


Re: [R] sort a data matrix by all the values and keep the names

2008-09-22 Thread zhihuali


This is exactly what I wanted!

Thank you so much!

Z



> Date: Mon, 22 Sep 2008 19:21:43 -0500
> From: [EMAIL PROTECTED]
> Subject: RE: [R] sort a data matrix by all the values and keep the names
> To: [EMAIL PROTECTED]
> 
> Hi: there might be a quicker way, but you can use stack() and order().
> stack() creates a data frame with two columns, values and ind, where ind
> holds the name of the column each value came from.
> 
> order(temp$values) returns the indices that put the values in increasing
> order, so indexing the rows by it sorts the data frame.
> 
> temp <- stack(x)
> print(temp)
> print(str(temp))
> 
> sortedx <- temp[order(temp$values),]
> print(sortedx)
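
For reference, the stack()/order() suggestion quoted above can be run end to
end on the example data. This is only a sketch; the names long and sorted are
illustrative, and resetting the row names is an optional cosmetic step that
was not part of the original suggestion.

```r
x <- data.frame(x1 = c(1, 7), x2 = c(4, 6), x3 = c(8, 2))

# stack() returns a long data frame: 'values' holds the numbers,
# 'ind' is a factor naming the source column of each value.
long <- stack(x)

# order() gives the row permutation that sorts by value.
sorted <- long[order(long$values), ]

# Optional: renumber the rows 1..n after reordering.
rownames(sorted) <- NULL
print(sorted)
```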

Re: [R] suppress legend in ggplot(data, aes(y=Y, x=X,fill=Z))?

On Sun, Sep 21, 2008 at 5:25 PM, Tom Bonen <[EMAIL PROTECTED]> wrote:
> hi,
>
> is there any way to suppress the legend in ggplot(data, aes(y=Y,
> x=X,fill=Z)) ? i'd like the values to be displayed in different colors
> as specified by fill= and this works just fine. but i do not want to
> have the legend on the right that is automatically created when fill
> is specified.

Hi Tom,

+ opts(legend.position = "none")

should do the trick.

Hadley

-- 
http://had.co.nz/
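
A note for readers on later ggplot2 releases: opts() was the interface at the
time of this thread, and to my knowledge it has since been replaced by
theme(). The sketch below assumes a reasonably recent ggplot2 is installed;
the data frame df and its columns X, Y, Z are made up to mirror the question.

```r
library(ggplot2)

# Hypothetical data standing in for the poster's 'data'.
df <- data.frame(X = c("a", "b", "c"),
                 Y = c(3, 5, 2),
                 Z = c("u", "v", "w"))

# fill = Z still colours the bars; the theme setting only hides the legend.
p <- ggplot(df, aes(x = X, y = Y, fill = Z)) +
  geom_bar(stat = "identity") +
  theme(legend.position = "none")

print(p)
```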


Re: [R] perl expression question

2008-09-22 Thread Moshe Olshansky

Hi Mark,

stock<-"/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"
> gsub(".*/([^/]+)$", "\\1",stock)
[1] "BLC.NYSE"
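
To make the pattern above easier to read, here is the same expression taken
apart piece by piece. This is a sketch for illustration; sub() is used instead
of gsub() because only one replacement is needed, and the name last_part is
an assumption of this example.

```r
stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"

# ".*/"      greedy match: everything up to and including the LAST slash
# "([^/]+)"  capture group: one or more characters that are not a slash
# "$"        the captured text must run to the end of the string
# "\\1"      replace the whole match with the captured group
last_part <- sub(".*/([^/]+)$", "\\1", stock)
print(last_part)
```

Because the pattern requires at least one non-slash character after the final
slash, a string with no slash, or one that ends in a slash, does not match and
is returned unchanged.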




[R] How to view or export values of 'names' in a lm

2008-09-22 Thread Jhunk Emale

Hello,
I have been using:

model <- lm(y~x+I(x^2))

I am mainly interested in the values of the residuals. If I use the 'names'
command I get the following:

 names(model)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"

I know I can view 'residuals' or 'resid', but how can I view all the
components listed by 'names' together or, perhaps even better, how can I
export them? If this is a case of "read the manual", could someone direct me
to where this is discussed?

Thank you kindly,
JE
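
One possible approach, offered as a sketch rather than as an answer from the
thread: an lm fit is a list, so str() shows all components at once, and the
per-observation pieces (fitted values, residuals) can be collected in a data
frame and written out with write.csv(). The simulated x and y, the object
per_obs and the temporary file are assumptions made only for this example.

```r
# Simulated data standing in for the poster's x and y.
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + 0.1 * x^2 + rnorm(20)

model <- lm(y ~ x + I(x^2))

# Overview of every component listed by names(model), one level deep.
str(model, max.level = 1)

# Collect the per-observation quantities side by side.
per_obs <- data.frame(x = x,
                      y = y,
                      fitted = fitted(model),
                      residual = residuals(model))

# Export to CSV; a temporary file is used here, substitute your own path.
out_file <- tempfile(fileext = ".csv")
write.csv(per_obs, out_file, row.names = FALSE)
```

Components that are not one-per-observation, such as model$coefficients, can
be exported separately in the same way.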


Re: [R] sort a data matrix by all the values and keep the names

2008-09-22 Thread Steven McKinney

Is something missing in the melt()?

> x<-data.frame(x1=c(1,7),x2=c(4,6),x3=c(8,2))
> require("reshape")
Loading required package: reshape
> dfm <- melt(x, id = c())
Error in if (!missing(id.var) && !(id.var %in% varnames)) { : 
  missing value where TRUE/FALSE needed
> dfm[order(dfm$value), ]
Error: object "dfm" not found
> x
  x1 x2 x3
1  1  4  8
2  7  6  2
> melt(x, id = c())
Error in if (!missing(id.var) && !(id.var %in% varnames)) { : 
  missing value where TRUE/FALSE needed
>


Steve McKinney



[R] plot implicit function

2008-09-22 Thread Ying-Ying Lee

Hi,

I would like to know how to plot an implicit function, for example
f(x,y)=0.  I'd like to plot the curve in the x-y plane.

Thanks,
Ying
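
One common way to do this in base R, sketched here under assumptions: evaluate
f on a grid with outer() and draw only the zero-level contour with contour().
The function f (a unit circle), the grid limits and the grid resolution are
chosen purely for illustration; f must be vectorised for outer() to work.

```r
# Example implicit function: x^2 + y^2 - 1 = 0 (unit circle).
f <- function(x, y) x^2 + y^2 - 1

# Grid covering the region where the curve is expected.
xs <- seq(-1.5, 1.5, length.out = 201)
ys <- seq(-1.5, 1.5, length.out = 201)

# z[i, j] = f(xs[i], ys[j])
z <- outer(xs, ys, f)

# Draw only the level set f(x, y) = 0.
contour(xs, ys, z, levels = 0, drawlabels = FALSE,
        asp = 1, xlab = "x", ylab = "y")

# The same curve as coordinates, in case the points themselves are needed.
curve_pts <- contourLines(xs, ys, z, levels = 0)
```

The curve is obtained by interpolating between grid points, so a finer grid
gives a smoother and more accurate line.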


Re: [R] perl expression question

2008-09-22 Thread Andrew Robinson

Hi Mark,

do you mean the regex to get the portion of the address after the
final slash?  Something like

gsub(".*/([^/]*$)", "\\1", stock, fixed=FALSE)

Cheers

Andrew

-- 
Andrew Robinson  
Department of Mathematics and StatisticsTel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/
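
A small sketch of how the "*" form of the pattern above differs from a "+"
form when a path ends in a slash. With "*", the capture group may be empty, so
the result is an empty string; with "+", the pattern does not match and the
input comes back unchanged. The helper names plus_version and star_version
and the sample paths are made up for this illustration.

```r
# "+" requires at least one non-slash character after the last slash.
plus_version <- function(s) sub(".*/([^/]+)$", "\\1", s)

# "*" also accepts an empty final component.
star_version <- function(s) sub(".*/([^/]*$)", "\\1", s)

paths <- c("fhdb/US/BLC.NYSE", "fhdb/US/")

print(plus_version(paths))
print(star_version(paths))
```

Which behaviour is preferable depends on whether a trailing slash should count
as "no file name" or as an input to pass through untouched.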


Re: [R] perl expression question

Try this:

> sub(".*/", "", stock)
[1] "BLC.NYSE"


Re: [R] perl expression question

By the way, although a regular expression solutions was asked for
if one expands that to any solution then R does have a function
specifically for this case:

> basename(stock)
[1] "BLC.NYSE"
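
As a brief sketch of the non-regex route: basename() has a companion,
dirname(), that returns the other half of the path, and both accept character
vectors. The second path in stocks is invented for the example.

```r
stock <- "/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"

# Last path component and everything before it.
file_part <- basename(stock)
dir_part  <- dirname(stock)
print(file_part)
print(dir_part)

# Both functions are vectorised over their argument.
stocks <- c(stock, "/some/other/dir/ABC.NASDAQ")  # second entry is hypothetical
print(basename(stocks))
```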


Re: [R] perl expression question

2008-09-22 Thread jim holtman

If this is a path name, then 'basename' will work for you:

> stock<-"/opt/limsrv/mark/research/equity/projects/testDL/stock_data/fhdb/US/BLC.NYSE"
> basename(stock)
[1] "BLC.NYSE"
>





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] sort a data matrix by all the values and keep the names