Thanks for testing. We should probably include a section for this in the SparkR programming guide given how popular CSV files are in R. Feel free to open a PR for that if you get a chance.
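A guide section along those lines could be distilled from the example further down in this thread, roughly as follows (a sketch only, assuming Spark 1.4 with the spark-csv package built for Scala 2.10; the file name is a placeholder):

```
# Launch SparkR with the spark-csv package pulled in from Spark Packages
./bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3

# In the SparkR shell: load a CSV file into a DataFrame,
# treating the first line as the column header
df <- read.df(sqlContext, "./flights.csv", "com.databricks.spark.csv", header = "true")
head(df)
```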
Shivaram

On Tue, Jun 2, 2015 at 2:20 PM, Eskilson,Aleksander <alek.eskil...@cerner.com> wrote:

> Seems to work great in the master build. It’s really good to have this
> functionality.
>
> Regards,
> Alek Eskilson
>
> From: Eskilson, Aleksander <alek.eskil...@cerner.com>
> Date: Tuesday, June 2, 2015 at 2:59 PM
> To: "shiva...@eecs.berkeley.edu" <shiva...@eecs.berkeley.edu>
> Cc: Burak Yavuz <brk...@gmail.com>, "dev@spark.apache.org" <dev@spark.apache.org>
> Subject: Re: CSV Support in SparkR
>
> Ah, alright, cool. I’ll rebuild and let you know.
>
> Thanks again,
> Alek
>
> From: Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
> Reply-To: "shiva...@eecs.berkeley.edu" <shiva...@eecs.berkeley.edu>
> Date: Tuesday, June 2, 2015 at 2:57 PM
> To: Aleksander Eskilson <alek.eskil...@cerner.com>
> Cc: "shiva...@eecs.berkeley.edu" <shiva...@eecs.berkeley.edu>, Burak Yavuz <brk...@gmail.com>, "dev@spark.apache.org" <dev@spark.apache.org>
> Subject: Re: CSV Support in SparkR
>
> There was a bug in the SparkContext creation that I fixed yesterday:
> https://github.com/apache/spark/commit/6b44278ef7cd2a278dfa67e8393ef30775c72726
>
> If you build from master it should be fixed. Also, I think we might have
> an rc4, which should include this.
>
> Thanks
> Shivaram
>
> On Tue, Jun 2, 2015 at 12:56 PM, Eskilson,Aleksander <alek.eskil...@cerner.com> wrote:
>
>> Hey, that’s pretty convenient. Unfortunately, although the package
>> seems to pull fine into the session, I’m getting class-not-found exceptions
>> with:
>>
>> Caused by: org.apache.spark.SparkException: Job aborted due to stage
>> failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task
>> 0.3 in stage 6.0: java.lang.ClassNotFoundException:
>> com.databricks.spark.csv.CsvRelation$anonfun$buildScan$1
>>
>> That smells like a path issue to me, and I made sure the ivy repo was
>> part of my PATH, but functions like showDF() still fail with that error.
>> Did I miss a setting, or should including the package in the sparkR
>> invocation load that in?
>>
>> I’ve run
>>
>> df <- read.df(sqlCtx, "./data.csv", "com.databricks.spark.csv",
>>               header="true", delimiter="|")
>> showDF(df, 10)
>>
>> (my data is pipe-delimited, and the default SQL context is sqlCtx)
>>
>> Thanks,
>> Alek
>>
>> From: Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
>> Reply-To: "shiva...@eecs.berkeley.edu" <shiva...@eecs.berkeley.edu>
>> Date: Tuesday, June 2, 2015 at 2:08 PM
>> To: Burak Yavuz <brk...@gmail.com>
>> Cc: Aleksander Eskilson <alek.eskil...@cerner.com>, "dev@spark.apache.org" <dev@spark.apache.org>, Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
>> Subject: Re: CSV Support in SparkR
>>
>> Hi Alek,
>>
>> As Burak said, you can already use spark-csv with SparkR in the 1.4
>> release. Right now I use it with something like this:
>>
>> # Launch SparkR
>> ./bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3
>>
>> df <- read.df(sqlContext, "./nycflights13.csv",
>>               "com.databricks.spark.csv", header="true")
>>
>> You can also pass other options to spark-csv as arguments to `read.df`.
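Those extra options are handed through `read.df` straight to the data source, so a pipe-delimited file like the one earlier in this thread could be read with something along these lines (a sketch; option names such as `delimiter` should be checked against the spark-csv documentation for the version in use):

```
# Arguments after the source name are passed to spark-csv unchanged;
# here: treat the first row as a header and split fields on "|"
df <- read.df(sqlContext, "./data.csv", "com.databricks.spark.csv",
              header = "true", delimiter = "|")
showDF(df, 10)
```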
>> Let us know if this works.
>>
>> Thanks
>> Shivaram
>>
>> On Tue, Jun 2, 2015 at 12:03 PM, Burak Yavuz <brk...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> cc'ing Shivaram here, because he worked on this yesterday.
>>>
>>> If I'm not mistaken, you can use the following workflow:
>>>
>>> ```./bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3```
>>>
>>> and then
>>>
>>> ```df <- read.df(sqlContext, "/data", "csv", header = "true")```
>>>
>>> Best,
>>> Burak
>>>
>>> On Tue, Jun 2, 2015 at 11:52 AM, Eskilson,Aleksander <alek.eskil...@cerner.com> wrote:
>>>
>>>> Are there any intentions to provide first-class support for CSV files
>>>> as one of the loadable file types in SparkR? Databricks’ spark-csv API [1]
>>>> has support for SQL, Python, and Java/Scala, and implements most of the
>>>> arguments of R’s read.table API [2], but currently there is no way to load
>>>> CSV data in SparkR (1.4.0) besides separating our headers from the data,
>>>> loading into an RDD, splitting by our delimiter, and then converting to a
>>>> SparkR DataFrame with a vector of the columns gathered from the header.
>>>>
>>>> Regards,
>>>> Alek Eskilson
>>>>
>>>> [1] -- https://github.com/databricks/spark-csv
>>>> [2] -- http://www.inside-r.org/r-doc/utils/read.table
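As for the workaround described in the original question: SparkR 1.4 keeps its RDD functions private, so the split-the-RDD approach has to reach into internal APIs. For a file small enough to parse on the driver, a rough public-API approximation of the same idea is sketched below (assumes a running SparkR shell with `sqlContext`; the path and delimiter are placeholders taken from the examples above):

```
# Parse the delimited file on the driver with base R's read.table,
# letting it handle the header row and the "|" separator ...
local_df <- read.table("./data.csv", header = TRUE, sep = "|",
                       stringsAsFactors = FALSE)

# ... then hand the local data.frame to SparkR to get a distributed DataFrame
df <- createDataFrame(sqlContext, local_df)
showDF(df, 10)
```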