On Feb 15, 2011, at 5:20 PM, Sam Steingold wrote:
I am trying to get stock metadata from Yahoo finance (or maybe there is
a better source?)
here is what I did so far:
yahoo.url <- "http://finance.yahoo.com/d/quotes.csv?f=j1jka2&s=";
stocks <- c("IBM","NOIZ","MSFT","LNN","C","BODY","F"); # just some samples
socket <- url(paste(yahoo.url,sep="",paste(stocks,collapse="+")),open="r");
data <- read.csv(socket, header = FALSE);
close(socket);
data is now:
       V1     V2     V3        V4
1  200.5B 116.00 166.25   4965150
2   19.1M   3.75   5.47      8521
3  226.6B  22.73  31.58  57127000
4  886.4M  30.80  74.54    226690
5  142.4B   3.21   5.15 541804992
6  276.4M  11.98  21.30    149656
7 55.823B   9.75  18.97  89369000
now I need to do this:
--> convert 55.823B to 55.823e9 and 19.1M to 19.1e6
parse.num <- function (s) { as.numeric(gsub("M$","e6",gsub("B$","e9",s))); }
This seems awfully inefficient (two regexp substitutions);
is there a better way?
I haven't come up with a better approach, at least for a
two-substitution task. I considered using strapply from pkg gsubfn
but decided it would be just as much code, if not more. But why are
you using lapply on a single vector? Why not:
data[1] <- parse.num( data[[1]] ) # as.numeric and gsub are vectorized
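For what it is worth, here is a minimal sketch of an alternative that avoids regular expressions by looking the trailing letter up in a table of multipliers. The name parse.suffix and the multiplier table are hypothetical, not something from this thread, and I have not timed it against the two-gsub version, so treat it as an option that extends easily to more suffixes rather than as a proven speed-up.

```r
# Hypothetical helper (an assumption, not from the original post):
# split off a trailing "M" or "B" and multiply by the matching power of ten.
parse.suffix <- function(s) {
  s <- as.character(s)                 # read.csv may have returned a factor
  multiplier <- c(M = 1e6, B = 1e9)    # extend here if other suffixes appear
  last <- substring(s, nchar(s))       # final character of each element
  has.suffix <- last %in% names(multiplier)
  # drop the suffix letter where there is one, keep the string otherwise
  digits <- ifelse(has.suffix, substr(s, 1, nchar(s) - 1), s)
  as.numeric(digits) * ifelse(has.suffix, multiplier[last], 1)
}

# Vectorized use on a whole column, as with parse.num above
cap <- parse.suffix(c("200.5B", "19.1M", "55.823B"))
```

Like parse.num, it takes the whole column at once, so no lapply is needed.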
--> iterate over stocks & data at the same time and put the results into
a hash table:
for (i in seq_along(stocks)) cache[[stocks[i]]] <- data[i,];
I do get the right results,
but I am wondering if I am doing it "the right R way".
E.g., the hash table value is a data frame.
A structure (record?) seems more appropriate.
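On the "record" question, two possibilities come to mind, sketched below on a small hand-typed subset of the data above (already converted to numbers). The first just uses the ticker symbols as row names, so the data frame itself acts as the lookup table. The second stores each row as a named list in an environment, which is about as close as base R gets to a hash table of records. The variable names are only illustrative, and which one is "the right R way" probably depends on what you do with the cache afterwards.

```r
# Small stand-in for the downloaded data (values copied from the table above)
stocks <- c("IBM", "MSFT", "F")
data <- data.frame(V1 = c(200.5e9, 226.6e9, 55.823e9),
                   V2 = c(116.00, 22.73, 9.75),
                   V3 = c(166.25, 31.58, 18.97),
                   V4 = c(4965150, 57127000, 89369000))

# Option 1: row names as keys, no loop and no separate cache
rownames(data) <- stocks
msft.v3 <- data["MSFT", "V3"]

# Option 2: an environment as hash table, each value a plain named list
cache <- new.env(hash = TRUE)
for (i in seq_along(stocks)) cache[[stocks[i]]] <- as.list(data[i, ])
ibm.v2 <- cache[["IBM"]]$V2
```

as.list() on a one-row data frame gives a list with one element per column, so fields are reached with $ just as in a record.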
David Winsemius, MD
West Hartford, CT