I am trying something I haven't attempted before and the available documentation doesn't quite answer my questions (at least in a way I can understand). My usual course of action would be to extract my data from my DB, do whatever manipulation is necessary, either manually or using a C++ program, and then import the data into R. Now I need to try to do it all within R+RMySQL+MySQL.
I just managed to connect to MySQL and retrieve data using RMySQL as follows: > library(DBI) > library(RMySQL) > MySQL(max.con = 16, fetch.default.rec = 500, force.reload = F) <MySQLDriver:(3800)> > m <- dbDriver("MySQL") > con <- dbConnect(m, user="rejbyers", password = "jesakos", > host="localhost", dbname = "merchants2") > rs <- dbSendQuery(con, "select * from merchants") > df <- fetch(rs, n = 150) > df And of course, that last statement is followed by the entire contents of "merchants" Now, I have a script like the following: refdata18 = read.csv("K:\\MerchantData\\RiskModel\\ndays18.csv", na.strings="") x1 = refdata18[,1] library(MASS) ex1 = fitdistr(x1,"exponential") str(ex1) Now, the contents of ndaysXX.csv represent records where one of the date values is in week XX of the current year. We don't yet have data spanning multiple years, and will have to modify the SQL that gets the data accordingly. At present, my SQL statement groups records by WEEK of the year, and then I manually separate weeks in a CSV file outside the DB. Suppose I make a query like: SELECT ndays FROM xxxxxxx GROUP BY WEEK(tdate); There is no a priori of knowing just how many weeks of data there are. My reason for asking is I see information in the documentation about "dbApply(RMySQL)" which says: "Applies R functions to groups of remote DBMS rows without bringing an entire result set all at once. The result set is expected to be sorted by the grouping field." There is an example, but the example doesn't make much sense (the query used, for example, does not contain a GROUP BY clause). I can easily set up a table that could be used to manage the output I need (primarily the rate value estimated for each week, and the SD of the estimate), but at present I am at a loss as to how to proceed to set this up. Can some kind soul out there give me rather pedantic instructions on how to use RMySQL to apply, in my case fitdistr, independantly to each group of values returned by my simplistic SQL query above, and insert the rate and sd into another table? I know I can handle all this using a perl script to create a suite of temporary files, and process them one by one, but I have also been advised to try to use R instead of Perl for this kind of task. A slightly related question is this: "Assuming I can get this all working from within R, how would I make it a scheduled task on the one hand, or, on the other hand, run it on demand from an event on a web page (which at present is made using a combination of PHP, Apache's httpd server and MySQL, if that matters)?" Of course, if I can make such an R script (or even store it as a function) there should be no memory from one instance to another, because the same analysis would have to be done on different users' data. Thanks Ted -- View this message in context: http://www.nabble.com/Applying-an-R-script-to-data-within-MySQL---How-to--tp19888407p19888407.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.