Hello,
sqldf::sqldf can sometimes use less memory than merge(). You could try
library(sqldf)
sqldf('select l4.*, asign.gene, asign.chr_pos, asign.`p.val.Retina`
from l4
inner join asign
on X1 = asign.chr and X2 = asign.pos')
Or you can filter the rows that match first, then merge the results.
Something along the lines of
# read in only the columns needed with fread, it's fast
l4join <- data.table::fread(l4_file, select = c("X1", "X2"))
ajoin <- data.table::fread(asign_file, select = c("chr", "pos"))
# flag rows whose chromosome and position each occur on the other side
# (a coarse pre-filter; merge() below does the exact pair matching)
i1 <- (l4join$X1 %in% ajoin$chr) & (l4join$X2 %in% ajoin$pos)
i2 <- (ajoin$chr %in% l4join$X1) & (ajoin$pos %in% l4join$X2)
rm(l4join, ajoin) # no longer needed, so remove them
# now read the full files
l4 <- data.table::fread(l4_file)
asign <- data.table::fread(asign_file)
# extract the relevant rows and merge
res <- l4[i1, ]
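# fread returns a data.table, so with = FALSE is needed to select columns by a computed name vector
res2 <- asign[i2, setdiff(names(asign), names(l4)), with = FALSE]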
merge(res, res2, by.x = c("X1", "X2"), by.y = c("chr", "pos"))
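The pre-filter above tests chromosome and position separately, so it can keep rows where each value occurs somewhere on the other side but not together in one row. A minimal sketch of a stricter filter, assuming it is acceptable to build one pasted key per row; the toy tables and the names `l4_toy`, `asign_toy`, `key_l4` and `key_asign` are made up for illustration and are not part of the original data:

```r
# Toy stand-ins for the two large tables (values invented for illustration)
l4_toy <- data.frame(
  X1 = c("chr1", "chr1", "chr2"),
  X2 = c(100L, 200L, 100L),
  pval_nominal = c(0.1, 0.2, 0.3),
  stringsAsFactors = FALSE
)
asign_toy <- data.frame(
  chr = c("chr1", "chr2", "chr3"),
  pos = c(100L, 300L, 100L),
  gene = c("g1", "g2", "g3"),
  stringsAsFactors = FALSE
)

# One key per row, so chromosome and position are matched as a pair
key_l4 <- paste(l4_toy$X1, l4_toy$X2, sep = ":")
key_asign <- paste(asign_toy$chr, asign_toy$pos, sep = ":")

# Keep only the rows whose key also occurs on the other side
i1 <- key_l4 %in% key_asign
i2 <- key_asign %in% key_l4

res_toy <- merge(l4_toy[i1, ], asign_toy[i2, ],
                 by.x = c("X1", "X2"), by.y = c("chr", "pos"))
print(res_toy)
```

In this toy case the separate per-column tests would also keep the row chr2/100 of `l4_toy`, while the pasted key keeps only chr1/100. The price is two extra character vectors as long as the tables, which may matter at this size.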
Hope this helps,
Rui Barradas
At 00:08 on 24/10/19, Ana Marija wrote:
Hi Jim,
I think one of the issues is that the data frames are so big:
dim(l4)
[1] 166941635 8
dim(asign)
[1] 107371528 5
so my example would not reproduce the error
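With tables of this size it may be worth estimating how many rows the join would produce before running it, because repeated keys multiply: a key that occurs m times on one side and n times on the other contributes m * n rows. The sketch below uses made-up keys (`x_key`, `y_key` are hypothetical names), and it is only an assumption that a result larger than `.Machine$integer.max` rows is what triggers the error in this thread:

```r
# Hypothetical keys; in practice these would be built from X1/X2 and chr/pos
x_key <- c("chr1:100", "chr1:100", "chr1:200")
y_key <- c("chr1:100", "chr1:100", "chr1:100", "chr1:300")

# Count how often each key occurs on each side
x_n <- table(x_key)
y_n <- table(y_key)
shared <- intersect(names(x_n), names(y_n))

# Work in doubles so the total itself cannot overflow integer arithmetic
expected_rows <- sum(as.numeric(x_n[shared]) * as.numeric(y_n[shared]))
print(expected_rows)

# Largest value an R integer can hold, for comparison
print(.Machine$integer.max)
```

If `expected_rows` comes out far larger than either input, duplicated chromosome/position pairs are the likely reason, and removing duplicates or joining on more columns would shrink the result.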
On Wed, Oct 23, 2019 at 6:05 PM Jim Lemon <drjimle...@gmail.com> wrote:
Hi Ana,
When I run this example taken from your email:
l4<-read.table(text="X1 X2 X3 X4 X5 variant_id pval_nominal gene_id.LCL
chr1 13550 G A b38 1:13550:G:A 0.375614 ENSG00000227232
chr1 14671 G C b38 1:14671:G:C 0.474708 ENSG00000227232
chr1 14677 G A b38 1:14677:G:A 0.699887 ENSG00000227232
chr1 16841 G T b38 1:16841:G:T 0.127895 ENSG00000227232
chr1 16856 A G b38 1:16856:A:G 0.627822 ENSG00000227232
chr1 17005 A G b38 1:17005:A:G 0.802803 ENSG00000227232",
header=TRUE,stringsAsFactors=FALSE)
asign<-read.table(text="gene chr chr_pos pos p.val.Retina
ENSG00000227232 chr1 1:10177:A:AC 10177 0.381708
ENSG00000227232 chr1 rs145072688:10352:T:TA 10352 0.959523
ENSG00000227232 chr1 1:11008:C:G 11008 0.218132
ENSG00000227232 chr1 1:11012:C:G 11012 0.218132
ENSG00000227232 chr1 1:13110:G:A 13110 0.998262
ENSG00000227232 chr1 rs201725126:13116:T:G 13116 0.438572",
header=TRUE,stringsAsFactors=FALSE)
merge(l4, asign, by.x=c("X1", "X2"), by.y=c("chr", "pos"))
[1] X1 X2 X3 X4 X5
[6] variant_id pval_nominal gene_id.LCL gene chr_pos
[11] p.val.Retina
<0 rows> (or 0-length row.names)
It works okay, but there are no matches in the join. So I can't even
guess what the problem is.
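A quick way to confirm that the zero-row result comes from the keys rather than from merge() is to compare the join columns directly. This sketch copies the positions from the two six-row samples above; the vector names `l4_pos` and `asign_pos` are only for illustration:

```r
# Positions copied from the six-row samples of l4 (X2) and asign (pos)
l4_pos <- c(13550, 14671, 14677, 16841, 16856, 17005)
asign_pos <- c(10177, 10352, 11008, 11012, 13110, 13116)

# Positions present on both sides
common <- intersect(l4_pos, asign_pos)
print(length(common))

# Number of l4 rows that have at least one partner in asign
print(sum(l4_pos %in% asign_pos))
```

Both counts are zero for these samples, which is consistent with the empty merge.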
Jim
On Thu, Oct 24, 2019 at 9:33 AM Ana Marija <sokovic.anamar...@gmail.com> wrote:
Hello,
I have two data frames like this:
head(l4)
X1 X2 X3 X4 X5 variant_id pval_nominal gene_id.LCL
1 chr1 13550 G A b38 1:13550:G:A 0.375614 ENSG00000227232
2 chr1 14671 G C b38 1:14671:G:C 0.474708 ENSG00000227232
3 chr1 14677 G A b38 1:14677:G:A 0.699887 ENSG00000227232
4 chr1 16841 G T b38 1:16841:G:T 0.127895 ENSG00000227232
5 chr1 16856 A G b38 1:16856:A:G 0.627822 ENSG00000227232
6 chr1 17005 A G b38 1:17005:A:G 0.802803 ENSG00000227232
head(asign)
gene chr chr_pos pos p.val.Retina
1: ENSG00000227232 chr1 1:10177:A:AC 10177 0.381708
2: ENSG00000227232 chr1 rs145072688:10352:T:TA 10352 0.959523
3: ENSG00000227232 chr1 1:11008:C:G 11008 0.218132
4: ENSG00000227232 chr1 1:11012:C:G 11012 0.218132
5: ENSG00000227232 chr1 1:13110:G:A 13110 0.998262
6: ENSG00000227232 chr1 rs201725126:13116:T:G 13116 0.438572
m = merge(l4, asign, by.x=c("X1", "X2"), by.y=c("chr", "pos"))
Error in merge.data.frame(l4, asign, by.x = c("X1", "X2"), by.y = c("chr", :
negative length vectors are not allowed
sapply(l4,class)
X1 X2 X3 X4 X5 variant_id
"character" "character" "character" "character" "character" "character"
pval_nominal gene_id.LCL
"numeric" "character"
sapply(asign,class)
gene chr chr_pos pos p.val.Retina
"character" "character" "character" "character" "character"
Please advise as to why I am getting this error when merging?
Thanks
Ana
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.