Hi,

On Wed, Sep 7, 2011 at 5:25 AM, Divyam <divyamural...@gmail.com> wrote:
> Hi,
>
> I am new to R, and here is what I am doing with it now: I am using a
> machine learning technique (SVM) for predictive modeling. The data I am
> working with is bound to grow perpetually. Say I initially feed a data
> set of 5000 points to the SVM, and the algorithm derives a certain
> intelligence (i.e., output) from those 5000 points. Today I have an
> additional 10000 points. If I now remove the first 5000 points and feed
> in the new 10000, I want the algorithm to make use of the intelligence
> derived from the initial 5000 points while evaluating the new 10000, so
> that the end result is an aggregated measure over the total 15000. This
> is important to me from an efficiency point of view. If there are any
> other packages in R that do the same (i.e., enable statistical models to
> learn continuously from past experience while deleting the prior data
> from which the intelligence was derived), please post about them. That
> would be of immense help to me.
I'm not sure that I understand what you mean -- maybe because some of the
terminology you are using is a bit nonstandard. If you want the predictive
model you build to be "immediately effective" and to learn from new data
later, you can:

(1) Train an SVM on the data you have now (i.e., do it "offline") and use
it for future/new data. At some point in the future, retrain your SVM on
all of the data available to you (or some subset of it) -- again, offline.
You can check whether the new SVM outperforms the old one on your new data
to find your point of diminishing returns: the point where it stops making
sense to learn a new model once you already have x data points. (A rough
sketch is in the P.S. below.)

(2) Look into "online learning" methods -- search Google for online SVMs
and other online methods that might interest you (if you're not married to
the SVM). (The second sketch in the P.S. gives the flavor.)

For what it's worth, you mention "extremely large data," but I'm not sure
what you mean by that (10k data points certainly isn't it). If you *really*
mean big data and you want to explore online learning, take a look at
vowpal wabbit:

http://hunch.net/~vw/code.html
https://github.com/JohnLangford/vowpal_wabbit

That's not R, though. The recent 1.0 release of the shogun-toolbox includes
support for online learning too (via vw, I believe):

http://www.shogun-toolbox.org/

It has R interfaces of different flavors, but it might be a bit painful to
use through them (I'm working on making a better one in my spare time, but
there hasn't been much of that lately). If the features in shogun strike
your fancy, from what I understand the best-supported way to use it is
through its "python_modular" interface.

Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
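P.S. To make (1) concrete, here is an untested sketch using the e1071
package; "old_df", "new_df", and the "label" column are invented stand-ins
for your 5000 old rows and 10000 new rows:

library(e1071)

## Model trained offline on the data you have today.
fit_old <- svm(label ~ ., data = old_df)

## Later: new data arrives. Hold some of it out so the comparison
## below is made on rows that neither model was trained on.
idx       <- sample(nrow(new_df), floor(0.7 * nrow(new_df)))
train_new <- new_df[idx, ]
test_new  <- new_df[-idx, ]

## Retrain on everything you have (offline again).
fit_all <- svm(label ~ ., data = rbind(old_df, train_new))

## If the retrained model stops beating the old one here, you have
## hit your point of diminishing returns.
mean(predict(fit_old, test_new) == test_new$label)
mean(predict(fit_all, test_new) == test_new$label)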
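And just to give the flavor of (2) without committing to any package: an
online learner keeps only a small state (here, a weight vector) and updates
it one observation at a time, so the old rows never need to be kept around.
A toy logistic-regression-style update, where x is a numeric matrix and y
is 0/1 (both invented names):

## One stochastic gradient step; returns the updated weights.
online_update <- function(w, x_i, y_i, rate = 0.01) {
  p <- 1 / (1 + exp(-sum(w * x_i)))  # model's current prediction
  w + rate * (y_i - p) * x_i         # nudge toward the observed label
}

w <- rep(0, ncol(x))                 # state carried between batches
for (i in seq_len(nrow(x))) {
  w <- online_update(w, x[i, ], y[i])
}
## When the next 10000 rows arrive, just keep looping over them with
## the same w -- that's the "intelligence" carried forward.

A real online SVM does something analogous with a hinge loss; vw and shogun
implement the heavy-duty versions.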