R Version : 2.14.1 x64
Running on Windows 7

Connecting to a database on a remote Microsoft SQL Server 2012

The short form of my problem is the following.

I have an unordered vectors of names, say:

names<-c("A", "B", "A", "C","C")

each of which have an id in a table in my db.  I need to convert the names to 
their corresponding ids.

I currently have the following code to do it.
###
names<-c("A", "B", "A", "C","C")
dbConn<-odbcDriverConnect(connection="connection string") #successfully connects

nameToID<-function(name, dbConn){
                #dbConn : active db connection formed via odbcDriverConnect
                #name     : a char string

                sqlQuery(dbConn, paste("select id from table where name='", 
name, "'", sep=""))
}
sapply(names, nameToID, dbConn=dbConn)
###

Barring better ways to do this, which could involve loading the table into R 
then working with the problem there (which is possible), I understand why the 
following doesn't work, but I cannot seem to find a solution.  Attempting to 
use parallelization via the package 'parallel' :

###
names<-c("A", "B", "A", "C","C")
dbConn<-odbcDriverConnect(connection="connection string") #successfully connects

nameToID<-function(name, dbConn){
                #dbConn : active db connection formed via odbcDriverConnect
                #name     : a char string

                sqlQuery(dbConn, paste("select id from table where name='", 
name, "'", sep=""))
}

mc<-detectCores()
cl<-makeCluster(mc)
clusterExport(cl, c("sqlQuery", "dbConn"))
parSapply(cl, names, nameToID, dbConn=dbConn)    #incorrect passing of 
nameToID's second argument
###

As in the comment, this is not the correct way to assign the second argument to 
nameToID.

I have also tried the following:

parSapply(cl, names, function(x) nameToID(x, dbConn))

in place of the previous parSapply call, but that also does not work, with the 
error being thrown saying "the first parameter is not an open RODBC 
connection", presumably referring to the first parameter of the sqlQuery()

The following code does work with parallization.

###
names<-c("A", "B", "A", "C","C")
dbConn<-odbcDriverConnect(connection="connection string") #successfully connects
nameToID<-function(name){
                #name     : a char string
                dbConn<-odbcDriverConnect(connection="string")
                result<-sqlQuery(dbConn, paste("select id from table where 
name='", name, "'", sep=""))
                odbcClose(dbConn)
                result
}

mc<-detectCores()
cl<-makeCluster(mc)
clusterExport(cl, c("sqlQuery", "odbcDriverConnect", "odbcClose", "dbConn", 
"nameToID"))      #throwing everything in
parSapply(cl, names, nameToID)
###

But the constant opening and closing a ton of the gains from parallelization, 
and seems just a bit silly.

So the overall question would be how to pass the second parameter (the open db 
connection) to the function within parSapply, in much the same way as it is 
done in the regular apply?  In general, how does one pass a second, third, nth 
parameter to a function within a parallel routine?

Thanks and if you need any more information let me know.

-DT



        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to