Hi again, has anyone in this group tried to access a SAS dataset through Spark SQL? Thank you.
Regards,
Ajay

On Friday, June 10, 2016, Ajay Chander <[email protected]> wrote:

> Hi Spark Users,
>
> I hope everyone here is doing great.
>
> I am trying to read data from SAS through Spark SQL and write it into HDFS.
> Initially, I started with a pure Java program; please find the program and
> logs in the attached file sas_pure_java.txt. My program ran successfully
> and it returned the data from SAS to Spark_SQL. Please note the
> highlighted part in the log.
>
> My SAS dataset has 4 rows.
>
> The program ran successfully, so my output is:
>
> [2016-06-10 10:35:21,584] INFO stmt(1.1)#executeQuery SELECT
> a.sr_no,a.start_dt,a.end_dt FROM sasLib.run_control a; created result set
> 1.1.1; time= 0.122 secs (com.sas.rio.MVAStatement:590)
>
> [2016-06-10 10:35:21,630] INFO rs(1.1.1)#next (first call to next); time=
> 0.045 secs (com.sas.rio.MVAResultSet:773)
>
> 1,'2016-01-01','2016-01-31'
>
> 2,'2016-02-01','2016-02-29'
>
> 3,'2016-03-01','2016-03-31'
>
> 4,'2016-04-01','2016-04-30'
>
>
> Please find the full logs attached to this email in file sas_pure_java.txt.
>
> _______________________
>
>
> Now I am trying to do the same via Spark SQL. Please find my program and
> logs attached to this email in file sas_spark_sql.txt.
>
> The connection to the SAS dataset is established successfully. But please
> note the highlighted log below.
>
> [2016-06-10 10:29:05,834] INFO conn(2)#prepareStatement sql=SELECT
> "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; prepared statement
> 2.1; time= 0.038 secs (com.sas.rio.MVAConnection:538)
>
> [2016-06-10 10:29:05,935] INFO ps(2.1)#executeQuery SELECT
> "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; created result set
> 2.1.1; time= 0.102 secs (com.sas.rio.MVAStatement:590)
>
> Please find the full logs attached to this email in file
> sas_spark_sql.txt.
>
> I am using the same driver in both the pure Java and Spark SQL programs. But
> the query generated by Spark SQL has quotes around the column names
> (highlighted above).
>
> So my resulting output for that query looks like this:
>
> +-----+--------+------+
> |  _c0|     _c1|   _c2|
> +-----+--------+------+
> |SR_NO|start_dt|end_dt|
> |SR_NO|start_dt|end_dt|
> |SR_NO|start_dt|end_dt|
> |SR_NO|start_dt|end_dt|
> +-----+--------+------+
>
> Since both programs use the same driver, com.sas.rio.MVADriver, the
> expected output should be the same as my pure Java program's output. But
> something else is happening behind the scenes.
>
> Any insights on this issue? Thanks for your time.
>
>
> Regards,
>
> Ajay
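
One possible explanation, offered as an assumption rather than something the logs confirm: Spark's JDBC source wraps each column name in double quotes by default, and the SAS side may be treating a double-quoted token as a string literal, which would return the column name itself once per row. The standalone sketch below is not Spark code; QuotingSketch and buildSelect are hypothetical names used only to reproduce the two query shapes seen in the logs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class QuotingSketch {

    // Hypothetical helper: builds a SELECT by applying a quoting rule to each
    // column name, roughly the way a generic JDBC reader assembles its query.
    static String buildSelect(List<String> columns, String table,
                              UnaryOperator<String> quote) {
        String cols = columns.stream().map(quote).collect(Collectors.joining(","));
        return "SELECT " + cols + " FROM " + table;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("SR_NO", "start_dt", "end_dt");
        String table = "sasLib.run_control";

        // Double-quoted identifiers: the shape of the query in sas_spark_sql.txt.
        String quoted = buildSelect(columns, table, c -> "\"" + c + "\"");

        // Bare identifiers: the shape of the query that worked in pure Java.
        String bare = buildSelect(columns, table, c -> c);

        System.out.println(quoted);
        System.out.println(bare);
    }
}
```

In Spark itself, the hook for changing the quoting is a custom JdbcDialect whose quoteIdentifier returns the name unchanged, registered through JdbcDialects.registerDialect. Whether that resolves the problem with com.sas.rio.MVADriver is untested here.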
