ble example (check it with package reprex) and try again here, or
> ask one of the maintainers of that package.
> --
> Sent from my phone. Please excuse my brevity.
>
> On October 2, 2017 8:56:46 AM PDT, Matthew Keller
> wrote:
> >Hi all,
> >
> >I used to use fwri
Hi all,
I used to use fwrite() function in data.table but I cannot get it to work
now. The function is not in the data.table package, even though a help page
exists for it. My session info is below. Any ideas on how to get fwrite()
to work would be much appreciated. Thanks!
> sessionInfo()
R vers
d <- cbind(rep(index[,2], ind_len), ind)
> #
> # new indices
> new.ind <- cbind(rep(index[,1], ind_len), ind)
> #
> # create the new matrix
> result <- matrix(NA_integer_, max(index[,1]), max(index[,4]))
> #
> # fill the new matrix
> result[new.ind] <- old.mat
Hi all,
Sorry for the title here but I find this difficult to describe succinctly.
Here's the problem.
I want to create a new matrix where each row is a composite of an old
matrix, but where the row & column indexes of the old matrix change for
different parts of the new matrix. For example, the
Hi all,
Simple question I should know: I'm unclear on the logic of why the sum of a
row of a data.frame returns a valid sum but the mean of a row of a
data.frame returns NA:
sum(rock[2,])
[1] 10901.05
mean(rock[2,],trim=0)
[1] NA
Warning message:
In mean.default(rock[2, ], trim = 0) :
argument
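A likely explanation, offered as a sketch rather than a definitive answer: rock[2,] is still a data.frame (a list of columns). sum() has a data.frame method, while mean() falls through to mean.default(), which expects a numeric or logical vector. Flattening the row first, or using rowMeans(), should give the expected value. The built-in rock dataset is used as in the post; the names row2, m1 and m2 are only for illustration.

```r
# rock is a built-in dataset; row 2 is still a one-row data.frame
row2 <- rock[2, ]

# Option 1: flatten the row to a plain numeric vector before averaging
m1 <- mean(unlist(row2))

# Option 2: rowMeans() accepts a data.frame directly
m2 <- unname(rowMeans(row2))

print(c(m1, m2))
```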
[1] 0
> > z[2]==0.15
> [1] TRUE
>
> Peter
>
> On Thu, Jul 3, 2014 at 11:28 AM, Matthew Keller
> wrote:
> > Hi all,
> >
> > A bit stumped here.
> >
> > z <- seq(.05,.85,by=.1)
> > z==.05 #good
> > [1] TRUE FALSE FALSE FALSE
Hi all,
A bit stumped here.
z <- seq(.05,.85,by=.1)
z==.05 #good
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
z==.15 #huh
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
More generally:
> sum(z==.25)
[1] 1
> sum(z==.35)
[1] 0
> sum(z==.45)
[1] 1
> sum(z==.55)
[1]
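This looks like the usual floating-point issue (R FAQ 7.31): values built by seq() need not be bit-identical to the literal typed at the prompt. A minimal sketch of a tolerance-based comparison follows; the helper name near and the tolerance 1e-8 are assumptions for illustration, not anything from the thread.

```r
z <- seq(.05, .85, by = .1)

# Hypothetical helper: TRUE where x is within tol of target
near <- function(x, target, tol = 1e-8) abs(x - target) < tol

# Counts the elements equal to 0.15 up to rounding error
sum(near(z, .15))

# all.equal() also compares with a tolerance; wrap it in isTRUE()
isTRUE(all.equal(z[2], .15))
```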
Hello all,
I have some genetic datasets (gzipped) that contain 6 columns and
upwards of 10s of billions of rows. The largest dataset is about 16 GB
on file, gzipped (!). I need to sort them according to columns 1, 2,
and 3. The setkey() function in the data.table package does this
quickly, but of
Hi all,
I'm trying to use the package read.table within a foreach loop. I'm
grabbing 500M rows of data at a time from two different files and then
doing an aggregate/tapply like function in read.table after that. I
had planned on doing a foreach loop 39 times at once for the 39 files
I have, but o
bset(ss,select=1:2))
newvec <- as.vector(as.matrix(subset(ss,select=3)))
ans[idd] <- ans[idd] + newvec
cat("OK\n")
}
ans
}
On Wed, Feb 22, 2012 at 3:20 PM, ilai wrote:
> On Tue, Feb 21, 2012 at 4:04 PM, Matthew Keller
> wrote:
>
>> X
Hi all,
SETUP:
I have pairwise data on 22 chromosomes. Data matrix X for a given
chromosome looks like this:
1 13 58 1.12
6 142 56 1.11
18 307 64 3.13
22 320 58 0.72
Where column 1 is person ID 1, column 2 is person ID 2, column 3 can
be ignored, and column 4 is how much chromosomal sharing thos
t how to accomplish this?
Thank you,
Matthew Keller
--
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do
Hi all,
I'm trying to use a parallel script on a 20 node cluster, 8 processors
per node. Each node has a name, e.g., "vm", "vm0001", etc. Within
a foreach loop, I would like R to tell me what node it is actually
running on. How can this be accomplished? Thanks!
require(doMPI)
cl <- startMPIcl
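One way this might be done, as a sketch using only base R: Sys.info() reports the host name of the machine the current R process is running on. The function name where_am_i is hypothetical, and the behaviour inside a doMPI worker is an assumption that has not been tested here.

```r
# Report where the current R process is running.
# Sys.info() and Sys.getpid() are base R; where_am_i is a hypothetical name.
where_am_i <- function() {
  list(node = Sys.info()[["nodename"]], pid = Sys.getpid())
}

info <- where_am_i()
cat("node:", info$node, "pid:", info$pid, "\n")
```

Evaluating Sys.info()[["nodename"]] inside the loop body and returning it with the result should then show which node handled each iteration.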
d
> 0.292 0.001 0.291
>> identical(temp3, temp)
> [1] TRUE
>> system.time(temp3 <- gsub("^(.*)\\..*", '\\1', y, perl=TRUE))
> user system elapsed
> 0.160 0.001 0.161
>
>
>
>
>
>
> On 5/29/11 7:40 PM, "jim holtman" w
hi all,
I'm full of questions today :). Thanks in advance for your help!
Here's the problem:
x <- c('18x.6','12x.9','302x.3')
I want to get a vector that is c('18x','12x','302x')
This is easily done using this code:
unlist(lapply(strsplit(x,".",fixed=TRUE),function(x) x[1]))
So far so good. T
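For comparison, a regular-expression version of the same task, as a minimal sketch: sub() removes the first literal dot and everything after it in a single vectorised call. The name stem is only for illustration.

```r
x <- c('18x.6', '12x.9', '302x.3')

# Drop the first literal dot and everything after it
stem <- sub("\\..*$", "", x)
print(stem)
```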
Hi all,
My code:
x <- scan(gzfile("file"),what='integer')
x is imported, but as mode "character" rather than "integer". I know I
can do as.integer() when importing, but am still trying to figure out
why the above occurs. When I do
summary(as.integer(x)), there are no NAs introduced by coercion,
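A probable explanation, as a sketch: scan() takes the type to read from the type of the object passed as what, and 'integer' is itself a character string, so the data come back as character. Passing integer() instead should give an integer vector. The example uses the text argument in place of the gzipped file, purely so that it is self-contained.

```r
# 'integer' is a character string, so scan() returns character data
a <- scan(text = "1 2 3", what = 'integer', quiet = TRUE)

# integer() is an empty integer vector, so scan() returns integers
b <- scan(text = "1 2 3", what = integer(), quiet = TRUE)

str(a)
str(b)
```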
Nice, Steve - I think this will work. I'll just call Sys.getpid() at
the top of each session and then look at the .Rout files to figure out
which is related to which...
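A minimal sketch of that idea: print the process ID as the first step of each script, so that it appears near the top of the corresponding .Rout file and can be matched against the PID column in top. The message wording is arbitrary.

```r
# Print the process ID at the top of the script so it lands in the .Rout file
pid <- Sys.getpid()
cat("This R session has PID", pid, "\n")
```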
On Sat, May 28, 2011 at 1:10 PM, Steve Lianoglou
wrote:
> Hi,
>
> On Sat, May 28, 2011 at 2:48 PM, Matthew Ke
Hi all,
Perhaps this is more of a unix question, but I'll give it a try here.
I am running 9 different R processes at the same time (called from a
shell script using R CMD BATCH). When I use the top program to
monitor how they are doing, it is impossible to tell which R process
is related to whic
Not to rehash an old statistical argument, but I think David's reply
here is too strong ("In the presence of interactions there is little
point in attempting to assign meaning to individual coefficients.").
As David notes, the "simple effect" of your coefficients (e.g., a) has
an interpretation: it
I sometimes have to work with vectors/matrices with > 2^31 - 1
elements. I have found the bigmemory package to be of great help. My
lab is also going to learn sqldf package for getting bits of big data
into/out of R. Learning both of those packages should help you work
with large datasets in R.
Th
Hi Marsh,
I taught an intro to R course and have posted all the materials up on
the web: http://psych-swiki.colorado.edu:8080/LearnR.
Most learning in R comes from doing, not reading, and that's how I
structured my course. All the lectures/HWs can be done individually,
and the keys are there to c
I've found that opening a connection, and scanning (in a loop)
line-by-line, is far faster than either read.table or read.fwf. E.g,
here's a file (temp2) that has 1500 rows and 550K columns:
showConnections(all=TRUE)
con <- file("temp2",open='r')
system.time({
for (i in 0:(num.samp-1)){
new.gen[
Hi Gerald,
A matrix and an array *are* vectors that can be indexed by 2+ indices.
Thus, matrices and arrays are also limited to 2^31-1 elements. You
might check out the bigmemory package, which can help with these
issues...
Matt
On Wed, Apr 28, 2010 at 11:01 AM, wrote:
>
> Hello,
>
> I am r
Rolf: "Well then, why don't you go away and design and build your own
statistics and data analysis language/package to replace R?"
What a nice reply! The fellow is just trying to understand R. That
response reminds me of citizens of my own country who cannot abide by
any criticism of the USA: "If
Hi all,
Quickly received an answer off the list. To do this is easy. Pull it
in using e.g., scan(). Then use strsplit:
z <- '10001011010010'
strsplit(z,'')
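Since strsplit() returns a list of character vectors, one further step is needed to get a plain vector of digits. A minimal sketch, with the name digits chosen only for illustration:

```r
z <- '10001011010010'

# strsplit() returns a list; take its first element and convert to integer
digits <- as.integer(strsplit(z, '')[[1]])
length(digits)
```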
On Sun, Apr 25, 2010 at 10:52 AM, Matthew Keller wrote:
> Hi all,
>
> Probably a rudimentary question. I
Hi all,
Probably a rudimentary question. I have a flat file that looks like
this (the real one has ~10e6 elements):
10110100101001011101011
and I want to pull that into R as a vector, but with each digit being
its own element. There are no separators between the digits. How can

I accomplish thi
within the next few years.
Best,
Matt
On Fri, Apr 9, 2010 at 6:36 PM, Duncan Murdoch wrote:
> On 09/04/2010 7:38 PM, Matthew Keller wrote:
>>
>> Hi all,
>>
>> My institute will hopefully be working on cutting-edge genetic
>> sequencing data by the Fall of 2010.
Hi all,
My institute will hopefully be working on cutting-edge genetic
sequencing data by the Fall of 2010. The datasets will be 10's of GB
large and growing. I'd like to use R to do primary analyses. This is
OK, because we can just throw $ at the problem and get lots of RAM
running on 64 bit R. H
Hi all,
I would like to run the following from within R:
awk '{$3=$4="";gsub(" ","");print}' myfile > outfile
However, this obviously won't work:
system("awk '{$3=$4="";gsub(" ","");print}' myfile > outfile")
and this won't either:
system("awk '{$3=$4='';gsub(' ','');print}' myfile > outfile
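One possible fix, as a sketch: keep the outer R string in double quotes and escape the double quotes that belong to awk with backslashes. The file names myfile and outfile are taken from the post; whether awk is installed is an assumption, so the system() call is left commented out.

```r
# Escape the double quotes that belong to awk so R keeps them in the string
cmd <- "awk '{$3=$4=\"\";gsub(\" \",\"\");print}' myfile > outfile"

# Inspect the command exactly as the shell would receive it
cat(cmd, "\n")

# system(cmd)  # uncomment to run; assumes awk is installed and myfile exists
```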
; command):
>
> killall -1 R or killall -2 R
>
> However, this will kill every running instance of R (if you have
> two or more running simultaneously), and you may not want that!
>
> Hoping this helps,
> Ted.
>
>
>
> On 15-Mar-10 20:20:29, Matthew Keller wrote:
le many years ago) creates leaks and
> unstable states.
>
> Cheers,
> Simon
>
>
>
>> Google searching suggests no solution, timeline, or anything, but the
>> problem has been annoying users for at least twelve years:
>> http://tolstoy.newcastle.edu.au/R/help/97
Hi Jay and Benilton,
Thank you both for your help. When I do not use the dimnames argument,
everything works fine:
x <- big.matrix(nrow=2e4,ncol=5e5,type='short',init=0) #18 Gb RAM used
rm(x) #18 Gb RAM used
gc() #no RAM used
However, when I use dimnames, I get this problem, reproducibly:
x <-
datasets to still use R.
Matt
On Fri, Feb 5, 2010 at 9:27 PM, Steve Lianoglou
wrote:
> Hi,
>
> On Fri, Feb 5, 2010 at 9:24 PM, Matthew Keller wrote:
>> Hi all,
>>
>> I'm on a Linux server with 48Gb RAM. I did the following:
>>
>> x <-
>> b
Hi all,
I'm on a Linux server with 48Gb RAM. I did the following:
x <-
big.matrix(nrow=2,ncol=50,type='short',init=0,dimnames=list(1:2,1:50))
#Gets around the 2^31 issue - yeah!
in Unix, when I hit the "top" command, I see R is taking up about 18Gb
RAM, even though the object x
Hello all,
I hate to add to the daily queries regarding R's handling of large
datasets ;), but...
I read in an online powerpoint about the ff package something about
the "length of an ff object" needing to be smaller than
.Machine$integer.max. Does anyone know if this means that the # of
elements
Hi all,
I'm logging into a Debian server and running R remotely using ESS. The
steps I use to do this are below (pasted from my webpage). However,
we're having a problem whenever we want to use the help function,
e.g.,
?hist
The remote buffer gives a warning:
"WARNING: terminal is not fully func
5127
> [9] 4.857627 6.00 7.534746
>
>
> On Wed, Jun 17, 2009 at 5:54 PM, Matthew Keller wrote:
>> Hi all,
>>
>> I have a vector, most of which is missing. The data is always
>> increasing, but may do so in jumps. I would like to interpolate the
>> NAs w
Hi all,
I have a vector, most of which is missing. The data is always
increasing, but may do so in jumps. I would like to interpolate the
NAs with 'best guesses', using something like filter(), which doesn't
work due to the NAs. Here is an example:
> x <- c(2,3,NA,NA,NA,3.2,3.5,NA,NA,6,NA)
> x
[
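One simple option, named plainly as a swapped-in technique and not necessarily what the replies used: linear interpolation over the index with approx(). With the default settings, a trailing NA beyond the last observed value stays NA. The names idx, ok and filled are only for illustration.

```r
x <- c(2, 3, NA, NA, NA, 3.2, 3.5, NA, NA, 6, NA)
idx <- seq_along(x)
ok <- !is.na(x)

# Linear interpolation over the index; points outside the observed range stay NA
filled <- approx(idx[ok], x[ok], xout = idx)$y
print(filled)
```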
ll provide some help to
other ESS Aquamacs Mac users who want to remotely get interactive
graphics. Best,
Matt
On Wed, May 20, 2009 at 5:07 PM, Matthew Keller wrote:
> Hi all,
>
> I figured out how to get this to work. Not saying this is the best way
> to do it, but it is working for me
perhaps
a setting?
Best,
Matt
On Wed, May 20, 2009 at 3:13 PM, Matthew Keller wrote:
> Hi all,
>
> My graduate student is logging onto my macpro and running R through
> ESS aquamacs (with Mx ssh and then Mx ess-remote). Everything is
> working fine until we get to graphing.
>
Hi all,
My graduate student is logging onto my macpro and running R through
ESS aquamacs (with Mx ssh and then Mx ess-remote). Everything is
working fine until we get to graphing.
We are trying to give him the ability to look at graphics
interactively. The ESS manual is not too helpful: "If you
Hi Johan,
Interesting question. I'm (trying) to write a lecture on this as we
speak. I'm no expert, but here are my two cents.
I think that your method works fine WHEN the sampling distribution
doesn't change its variance or shape depending on where it's centered.
Of course, for normally, t-, or
er.ca/jfox
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On
>> Behalf Of Matthew Keller
>> Sent: March-11-09 6:20 PM
>> To: r-help@r-project.org
>> Subject: [R] non-positive definite ma
Hi all,
For computational reasons, I need to estimate an 18x18 polychoric
correlation matrix two variables at a time (rather than trying to
estimate them all simultaneously using ML). The resulting polychoric
correlation matrix I am getting is non-positive definite, which is
problematic because I'
Hi all,
Put me in the camp that says more information is better than less
information - even if imperfect. Interpretation can be left to those
using the data.
Also, "popular" can mean many things. An alternative to number of
times a package is downloaded would be a ratings system, where R users
c
I also work with very large datasets, and currently am using an early
2008 MacPro 2x3GHz Quad-Core Intel Xeon with 32 GB RAM. This works
very well, although (I'm ashamed to say, since it's partly a
reflection of my programming skills) I still run out of RAM
occasionally! But this system works well
Hello all,
I am about to send off a manuscript and, although I am fairly
confident I have used the lme function correctly, I want to be 100%
sure. Could some kind soul out there put my mind at ease?
I am simply interested in whether a predictor (SPI) is related to
height. However, there are five
Hi Joseph,
For what it is worth (which might not be that much!), I have written
down step by step instructions on my website for getting 64 bit R
working under Leopard - it should be much different with Tiger:
http://www.matthewckeller.com/html/64_bit_r_on_mac.html. I think it'll
work but there ma
Nanmdi,
I think this is simply because a lot of time is taken transforming the
matrix from logical (default when you create it) to numeric (when you
add the number to [1,1]). If you do the same thing again to [1,2], it
is done instantaneously:
> a <- matrix(nrow=1,ncol=1)
> system.time(a[1,
Yes Chuck, you're right.
Thanks for the help. It was a data.frame not a matrix (I had called
as.matrix() in my script much earlier but that line of code didn't run
because I misnamed the object!). My bad. Thanks for the help. And I'm
VERY relieved R isn't that inefficient...
Matt
On Wed, Apr 16
Hello all,
I should probably know this by now... Anyway:
I have a large matrix (dim(data) is 3000 18000). In each element are
one of the following character strings "0/0", "1/1", "1/2", "2/2". I
wanted to replace "0/0" with NA and the other three with 0,1,2
respectively. To accomplish just the f
Ken,
not sure, but you might try
data.frame(whatever1=x[,1],whatever2=y)
this should maintain the classes of the vectors. I'm guessing that y
and x are of different classes. From ?cbind:
"For the default method, a matrix combining the ... argument. The
type of a matrix result determined from t
Hi Mohamed,
You want to return the matrix - you're returning an element of the
matrix. So in your formula, insert:
return(w)
instead of
return(w[i,j])
On Feb 8, 2008 8:42 AM, mohamed nur anisah <[EMAIL PROTECTED]> wrote:
> Dear lists,
>
> I'm in my process of learning of writing a function.
Hello all,
Thank you all very much for the many helpful suggestions. I think this
discussion has been extremely informative. Rather than try to list all
these examples in my talk, I sent out a link to everyone so that they
could read the discussion for themselves. If you would like to access
my po
Hi all,
I'm giving a talk in a few days to a group of psychology faculty and
grad students re the R statistical language. Most people in my dept.
use SAS or SPSS. It occurred to me that it would be nice to have a few
concrete examples of things that are fairly straightforward to do in R
but that a
Also just MHO, but I think it is a good idea, if for no other reason
than it creates an additional incentive for people to respond and to
do so in a thoughtful way. I think the rating systems on some other
boards, and those used at commercial sites such as Amazon, testify to
the fact that people of
Tom,
Check out ?merge. Does exactly what you need
Matt
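A minimal sketch of what merge() does with inputs of different length. The data frames a and b and the key column date are hypothetical, since the original example data is not shown here.

```r
# Hypothetical data: two series of different length sharing a date key
a <- data.frame(date = c("2007-01", "2007-02", "2007-03"), x = c(1.1, 1.2, 1.3))
b <- data.frame(date = c("2007-02", "2007-03"), y = c(10, 20))

# all = TRUE keeps dates found in either input and fills the gaps with NA
both <- merge(a, b, by = "date", all = TRUE)
print(both)
```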
On Nov 28, 2007 11:27 AM, tom soyer <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have two sets of data that I would like to put into a data frame. But
> since they have different length, I am not sure how to do this. Here is an
> example of my dat
Maybe there is a more elegant solution, but here is one possibility:
mat1[is.na(mat1)]<-mat2[is.na(mat1)]
mat2[is.na(mat2)]<-mat1[is.na(mat2)]
(mat1+mat2)/2
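A small worked example of the three lines above, using hypothetical 2x2 inputs: where only one matrix has a value, that value is used as is; where both are missing, the result stays NA.

```r
# Hypothetical 2x2 inputs with missing cells
mat1 <- matrix(c(1, NA, 3, NA), nrow = 2)
mat2 <- matrix(c(3, 2, NA, NA), nrow = 2)

mat1[is.na(mat1)] <- mat2[is.na(mat1)]  # fill mat1 gaps from mat2
mat2[is.na(mat2)] <- mat1[is.na(mat2)]  # fill mat2 gaps from the patched mat1
avg <- (mat1 + mat2) / 2                # cells missing in both stay NA
print(avg)
```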
On Nov 21, 2007 12:30 PM, Gregory Gentlemen <[EMAIL PROTECTED]> wrote:
> Hello fellow R users,
>
> I have a matrix computation that I imagi
Spencer,
There have been a lot of discussions on these boards re working with
large datasets in R, so looking through those will probably inform you
better than I'll be able to. So with that said...
I have been trying to work with very large datasets as well (genetic
datasets... maybe we're in th
")
>sqldf("select * from myfile where rowid % 2 = 0 and rowid >= 5",
> dbname = tempfile())
>
> See example 6 on the home page:
> http://sqldf.googlecode.com
>
>
> On Nov 8, 2007 4:19 AM, Matthew Keller <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
Hi all,
Is there a way to skip non-sequential lines using the "skip" argument
in the scan function?
E.g., I have a matrix with 100 rows and 1e7 columns. I open a
connection and want to read only lines 5, 7, 9, etc [i.e.,
seq(5,99,2)]
It might seem that the syntax to do this would be something li
Alexandre,
Try rereading FAQ 7.10, it explains why as.numeric() won't do it:
"In any case, do not call as.numeric() or their likes directly for the
task at hand as as.numeric() or unclass() give the internal codes"
I.e., the INTERNAL CODE of the factor is what as.numeric() is working
on rather t
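A minimal sketch of the difference, using a hypothetical factor f: as.numeric() on the factor returns the internal codes, while going through the levels (or through as.character()) recovers the original numbers.

```r
f <- factor(c("10", "20", "10"))

# Internal codes of the factor, not the original values
codes <- as.numeric(f)

# Two ways to recover the original numbers
vals1 <- as.numeric(as.character(f))
vals2 <- as.numeric(levels(f))[as.integer(f)]

print(codes)
print(vals1)
```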
Not having run your script, it looks to me like you have an extra
comma after the final element of legend.list...
On 11/5/07, Patrick Richardson <[EMAIL PROTECTED]> wrote:
> Hoping someone can offer me some assistance. I'm trying to execute a script
> and I keep getting this error message about "
Hi Oscar and Emelio,
You guys might want to do a search on the R mailing list - this issue
has been discussed and I think there's a pretty simple work-around.
Best,
Matt
On 11/4/07, Oscar Moreno <[EMAIL PROTECTED]> wrote:
> I am having the very SAME problem. I removed Version 2.5.1 before
> tr
ng to solve in more detail, it may
> help us give you a better solution.
>
> Btw, we use the filehash package with great success in accessing very large
> amounts of data.
>
> Best,
> Adrian Dragulescu
>
>
>
> On 11/1/07, Matthew Keller <[EMAIL PROTECTED]> wrote:
ssuming XCode is
> installed.
>
> B
>
> On Nov 1, 2007, at 7:34 PM, "Matthew Keller" <[EMAIL PROTECTED]>
> wrote:
>
> > Hi all,
> >
> > I've had one of my most miserable R weeks in memory. I'm trying to
> > deal with huge datasets (>
Hi all,
I've had one of my most miserable R weeks in memory. I'm trying to
deal with huge datasets (>1GB each) but am running up against those
pesky memory limits. The libraries filehash and g.data are not very
suitable for what I need. I haven't gotten into the sql thing yet.
Most recently I've b
Hi Santanu,
You can write it out to a PDF:
a.new = 1.2
value.b = seq(from = 0, to = 1, by = 0.001)
pdf('MyGraphs.pdf',width=11, height=8.5,pointsize=12, paper='special')
par(mfrow = c(3,2))
for (i in 1:length(d)){
plot(value.b, dbeta(shape1 = value.b, shape2 = d[i], a.new), main =
paste("B[",i,"]
I think what Ted is getting at is a new kind of wiki. Maybe in the
short term (unless someone out there is willing to jump in and put
this sort of thing together), it isn't practical, but as a concept, I
found it quite appealing. The problem with learning from books is that
you don't get to see the
I appreciate the input. Off-list, someone suggested that I set up a
class wiki, and have this be the first sieve. I could do some quality
control there first (perhaps sending the link to this list serve at
the end of the semester for others to check over), and then post the
final manuals on the R w
Hi all,
I will be teaching a graduate-level course on R at CU Boulder next
semester. I have a teaching idea that might also help improve the R
wiki page... I wanted to know what you all thought of it and wanted to
solicit some advice about doing it.
During the latter part of the course, students
constrain some of the observed correlations to be the same value. I
can do this sort of thing in the Mx SEM software, but wanted to do it
from within R. Any help would be very appreciated!
Matt
On 10/11/07, Matthew Keller <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I've searc
Hello,
I've searched for an answer to no avail. I am wondering if anyone
knows how to constrain certain correlations to be equal. I have family
data with 2 twins per family plus up to 2 siblings. I would like to
somehow constrain all the sibling correlations (twin-sib and sib-sib)
to be the same w
Hi Michael,
This type of thing is pretty simple to do in R (take a look at ?plot
and then ?par - very useful... necessary actually... for graphing).
One way to do it is to use the "axis" function:
my.labels <- c("a","b","c","d","e","f","g","h","i","j")
plot(1:10,rnorm(10),axes=F,ylab="whatev",xl
Hi all,
This question involves using a "for" loop to make a "decision" in a script.
I've written a rather intricate script, and near the start of it, I
want it either to do a loop (if a variable called "number.runs" > 1)
or not do a loop (if "number.runs" is 1). This is probably trivial but
I can
Hi Soren,
What I do in cases like this is just copy the function and place it
into my script at the top (or write it into its own source file and
call it from script). Best,
Matt
On 9/28/07, Søren Højsgaard <[EMAIL PROTECTED]> wrote:
> Dear List
>
> In a package I want to import the mApply funct
Is this easier?
x.index <- duplicated(x.sample)==FALSE
cbind(x.sample[x.index],y[x.index])
- Matt
On 9/28/07, Marc Schwartz <[EMAIL PROTECTED]> wrote:
> On Fri, 2007-09-28 at 17:48 -0400, Brian Perron wrote:
> > Hello all,
> >
> > An elementary question that I am sure can be easily cracked by a
hi Riddle,
Your subscript is out of bounds because this line:
for (j in seq(20,100,20)){
is incorrect - it is trying to index the 20th, 40th, 60th... column of
a matrix that has only 5 columns. Try for (j in 1:5) instead. I'm not
sure what the purpose of your script is so can't comment on what e
Hi Sumit,
Here are a couple of functions I've picked up along the way and
modified. The first lists all objects, their class, and their
dimensions (I grabbed this from the web and modified - sorry for not
acknowledging the person who first wrote it). The second is much the
same but gives their siz