On 07/21/2014 09:24 PM, John McKown wrote:
I'm designing an R based application for my boss. It's not much, but
it might save him some time. What it will be doing is reading data
from an MS-SQL database and creating a number of graphs. At present,
he must log into one server and run a vendor application to display the
data in a grid. He then cuts this data and pastes it into an Excel
spreadsheet, generates some graphs in Excel, and finally pastes those
graphs into a PowerPoint presentation, which is the end result for
distribution to others up the food chain.
What I would like to do is read the MS-SQL database using RODBC and
create the graphs using ggplot2 instead of using Excel. I may end up
being told to create an Excel file as well.
My real question is organizing the R programs to do this. Basically
what I was thinking of was a "master" program. It does the ODBC work
and fetches the data into one, or more, data.frames. I was then
thinking that it would be better to have separate source files for
each graph produced. I would use the source() function in the "master"
R program to load & execute each one in order. Is this a decent
organization? Or would it be better for each "create a graph" R file to
just define a unique function which the "master" program would then
invoke? This latter approach seems like a good way to keep the
workspace "clean", since all the variables in the function would "go
away" when the function ended.
I guess what I'm asking is how others organize their R applications. Oh,
I plan for this to be run by my boss by double clicking on the
"master" R source file, which I will associate with the Rscript
program in Windows. Yes, this is Windows based <sigh/>.
Appreciate your thoughts. Especially if I'm really off track.
John,
Your original plan is simple and straightforward, and is similar to
many of the processes that I have automated using R. I use both RJDBC
and RODBC to connect to databases. I have JDBC and ODBC.Pull scripts
that wrap those libraries: they accept the connection parameters and
an SQL string, return a data.frame, and, if needed, set the mode/class
of the columns according to the database metadata. You can create a
master job that calls a function to pull the data you need, calls
plotting functions, and generates graphs, tabular reports, and, as
suggested, Excel files. I use xlsx to build Excel reports, and hwriter
to generate HTML reports that I publish.
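The pull-wrapper idea described above might be sketched like this; the function name, arguments, and metadata handling are my assumptions, not the poster's actual code:

```r
# Hypothetical sketch of an ODBC pull wrapper: takes connection
# details plus an SQL string and returns a data.frame.
library(RODBC)

odbc.pull <- function(dsn, sql, uid = "", pwd = "") {
  ch <- odbcConnect(dsn, uid = uid, pwd = pwd)
  on.exit(odbcClose(ch))  # always close the channel, even on error
  dat <- sqlQuery(ch, sql, stringsAsFactors = FALSE)
  # Column classes could be adjusted here from database metadata,
  # e.g. converting character timestamps with as.POSIXct().
  dat
}

# Usage (DSN and table are placeholders):
# perf <- odbc.pull("myDSN", "SELECT * FROM perf_stats")
```

The on.exit() call is the key design point: the connection is released no matter how the function exits, so a failed query never leaks an open channel.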
I have a small function, setSources(), that loads when I start R.
It lets me load my custom code from different directories, much like
using library() or require().
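A setSources()-style loader could be as small as the sketch below; this is my guess at the idea, not the poster's implementation:

```r
# Hypothetical sketch of a setSources()-style loader: source every
# .R file found under a set of project directories, similar in spirit
# to library()/require() for code that isn't packaged.
setSources <- function(dirs) {
  for (d in dirs) {
    files <- list.files(d, pattern = "\\.R$", full.names = TRUE)
    for (f in files) source(f)
  }
  invisible(NULL)
}

# Usage: setSources(c("R/db", "R/plots"))
```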
Keep it simple until you are comfortable with your results.
1. A master job script (loaded at startup).
2. Methods for each sub-process: data pull, plots, etc.
3. An email or other notification that the process has completed (if
you want it to run unattended and have your boss log into a SharePoint
or other BI delivery portal).
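The three steps above could be tied together in a skeleton like this. The data-pull and plotting functions are assumed to be defined elsewhere, the addresses are placeholders, and sendmailR is just one of several notification options:

```r
# Hypothetical skeleton of the three-step master job.
library(sendmailR)

run_report <- function() {
  dat <- pull_perf_data()   # step 2a: data pull (defined elsewhere)
  make_plots(dat)           # step 2b: write the graphs to disk
  # step 3: notify that the run finished (addresses are placeholders)
  sendmail(from = "reports@example.com",
           to = "boss@example.com",
           subject = "Daily report ready",
           msg = "Graphs are published on the portal.")
}
```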
Good luck
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.