Ha! -- A bug! "Corrected" version inline below: Bert Gunter On Thu, Nov 14, 2019 at 8:10 PM Bert Gunter <bgunter.4...@gmail.com> wrote:
> Brute force approach, possibly inefficient: > > 1. You have a vector of file names. Sort them in the appropriate (time) > order. These names are also the component names of all the data frames in > your list that you read in, call it yourlist. > > 2. Create a vector of all the unique ticker names, perhaps by creating a > vector of all the names and then unique() -ing it. Call this vector snames > with n.names in it. It will probably have length several hundred at least I > assume. > > 3. Suppose the 6 columns of data of each data frame that you want are > named cnames = c("stocknames","Open", "High", "Low", "Close", "Volume"). > > 4. You could proceed as you suggested, but it would likely be more > efficient, since all data that you want are numeric, to create a 3D array > of NA's via: > > yourdat <- array(dim = c(n.dates, n.names, 5), dimnames = list(NULL, > snames, cnames[-1])) > > 5. Then just loop through your list of files and use indexing to fill in > the columns x category slices for each date. Stocks that are missing will > be NA automatically. e.g. (warning: UNTESTED): > > For date "d", let df be the data frame from date "d" in your list, i.e. > > df <- yourlist[["d"]][, cnames] > ## Note The order of the listed stocks in the "stocknames" column can be > different from frame to frame of your master list. > > Then fill in the flat for the dth date (i.e. dth row) in your array by: > > ## corrected line here: > yourdat[ ,df[ ,"stocknames"], cnames[-1] <- as.matrix(df[ ,-1]) ## need > to omit the column names so it converts to numeric matrix > ## need to get the names of the stocks in the "stocknames" column in the order they appear in df. > > This should fill in the values of the 2nd and 3rd dimensions of the array > for all the stocks on the dth date with the data for each stock in the data > frame matched to the appropriate column in the array. > > The entire loop will give all dates for all stocks and all categories with > NA's for missing days. (*IF IT WORKS!*) > You may need to modify this sightly if, for example, your stock names are > row names rather than a field in your data frame. I leave such adjustments > to you. > > Note again that this is fairly elementary with just arrays and indexing. > Basic tutorials should tell you about all of this. Also, when plotting, > you'll have to convert your dates to suitable date-time format. > > Cheers, > Bert > > > > > On Thu, Nov 14, 2019 at 4:55 PM Nhan La <lathanhn...@gmail.com> wrote: > >> Hi Bert, >> >> I've attempted to find the answer and actually been able to import the >> individual data sets into a list of data frames. >> >> But I'm not sure how to go ahead with the next step. I'm not necessarily >> asking for a final answer. Perhaps if you (I mean others as well) would >> like a constructive coaching, you would suggest a few key words to look at? >> >> Sorry for the HTML thing, this is my first post. I'll do better next >> times. >> >> Thanks, >> Nathan >> >> >> >> On Fri, Nov 15, 2019 at 11:34 AM Bert Gunter <bgunter.4...@gmail.com> >> wrote: >> >>> So you've made no attempt at all to do this for yourself?! >>> >>> That suggests to me that you need to spend time with some R tutorials. >>> >>> Also, please post in plain text on this plain text list. HTML can get >>> mangled, as it may have here. >>> >>> -- Bert >>> "The trouble with having an open mind is that people keep coming along >>> and sticking things into it." >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >>> >>> >>> On Thu, Nov 14, 2019 at 4:11 PM Nhan La <lathanhn...@gmail.com> wrote: >>> >>>> I have many separate data files in csv format for a lot of daily stock >>>> prices. Over a few years there are hundreds of those data files, whose >>>> names are the dates of data record. >>>> >>>> In each file there are variables of ticker (or stock trading code), >>>> date, >>>> open price, high price, low price, close price, and trading volume. For >>>> example, inside a data file named 20150128.txt it looks like this: >>>> >>>> FB,20150128,1.075,1.075,0.97,0.97,725221 >>>> AAPL,20150128,2.24,2.24,2.2,2.24,63682 >>>> AMZN,20150128,0.4,0.415,0.4,0.415,194900 >>>> NFLX,20150128,50.19,50.21,50.19,50.19,761845 >>>> GOOGL,20150128,1.62,1.645,1.59,1.63,684835 ...................and many >>>> more.................. >>>> >>>> In case it's relevant, the number of stocks in these files are not >>>> necessarily the same (so there will be missing data). I need to import >>>> and >>>> create 5 separate time series data frames from those files, one each for >>>> Open, High, Low, Close and Volume. In each data frame, rows are indexed >>>> by >>>> date, and columns by ticker. For example, the data frame Open may look >>>> like >>>> this: >>>> >>>> DATE,FB,AAPL,AMZN,NFLX,GOOGL,... 20150128,1.5,2.2,0.4,5.1,1.6,... >>>> 20150129,NA,2.3,0.5,5.2,1.7,... ... >>>> >>>> What will be an efficient way to do that? I've used the following codes >>>> to >>>> read the files into a list of data frames but don't know what to do next >>>> from here. >>>> >>>> files = list.files(pattern="*.txt") mydata = lapply(files, >>>> read.csv,head=FALSE) >>>> >>>> Thanks, >>>> >>>> Nathan >>>> >>>> Disclaimer: In case it's relevant, this question is also posted on >>>> stackoverflow. >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________________________ >>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>> [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.