Dear all,
I am new to R and have run into the following problem, which is slowing
me down a lot. I am out of ideas for working around it, so any help would be
highly appreciated.

I have two data frames, x and y. x is very big (70 million observations),
whereas y is smaller (300,000 observations).
All the observations of y are present in x, but y has one additional
variable that I would like to add to the data frame x.

For instance, imagine they have the following variable names:
colnames(x) <- c("V1", "V2", "V3", "V4")
colnames(y) <- c("V1", "V2", "V5")

-Since the observations of y are present in x, my strategy was to merge x
and y so that the dataframe x would get the values of the variable V5 for
the observations that are both in x and y.

-So, I did the following:
dat <- merge(x, y, all = TRUE)
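
To make this concrete, here is a small self-contained sketch of what I am doing. The toy data are invented for illustration only (my real x has 70 million rows), and I am assuming that V1 and V2 together identify an observation:

```r
# Toy stand-ins for my real data (all values are made up)
x <- data.frame(V1 = c(1, 1, 2, 2, 3),
                V2 = c("a", "b", "a", "b", "a"),
                V3 = c(10, 20, 30, 40, 50),
                V4 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                stringsAsFactors = FALSE)
y <- data.frame(V1 = c(1, 2),
                V2 = c("b", "a"),
                V5 = c(100, 200),
                stringsAsFactors = FALSE)

# merge() joins on the columns the two data frames share (V1 and V2 here).
# all = TRUE keeps the rows of x that have no match in y, with V5 set to NA.
dat <- merge(x, y, all = TRUE)
print(dat)
```

On this example, dat has the five rows of x plus a V5 column that is filled in only for the two rows that also appear in y.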

On a small example, it works fine. The only problem is that when I apply it
to my big data frame x, it takes forever (several days and still not done),
even though I have a very fast computer. So I don't know whether I should
stop now or keep waiting.

Does anyone have an idea for performing this operation more efficiently
(in terms of computation time)?
In addition, does anyone know how to incorporate some sort of counter into a
program to check how much work has been done at a given point in time?
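
To illustrate the kind of counter I mean, here is a rough sketch on toy data. Splitting x into chunks is only my guess at how one might get progress feedback; the chunk size and the data are made up, and I do not know whether this would be any faster than a single merge():

```r
# Toy data (invented); the real x would be far larger
x <- data.frame(V1 = 1:20,
                V2 = rep(c("a", "b"), 10),
                V3 = rnorm(20),
                V4 = rnorm(20),
                stringsAsFactors = FALSE)
y <- data.frame(V1 = c(2L, 5L, 14L),
                V2 = c("b", "a", "b"),
                V5 = c(100, 200, 300),
                stringsAsFactors = FALSE)

chunk_size <- 5                              # hypothetical value
starts <- seq(1, nrow(x), by = chunk_size)   # first row of each chunk
pieces <- vector("list", length(starts))

# txtProgressBar() comes from the utils package, which is loaded by default
pb <- txtProgressBar(min = 0, max = length(starts), style = 3)
for (i in seq_along(starts)) {
  rows <- starts[i]:min(starts[i] + chunk_size - 1, nrow(x))
  # all.x = TRUE rather than all = TRUE, so that rows of y that belong
  # to other chunks are not appended to this piece
  pieces[[i]] <- merge(x[rows, ], y, all.x = TRUE)
  setTxtProgressBar(pb, i)                   # advance the counter
}
close(pb)

dat <- do.call(rbind, pieces)
```

Something along these lines would at least tell me how many chunks are finished, but I would be glad to hear of a better way.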

Any comments are very welcome,
Thanks,

Best,
Aurelien


