There was a bug in the SparkContext creation that I fixed yesterday. https://github.com/apache/spark/commit/6b44278ef7cd2a278dfa67e8393ef30775c72726
If you build from master it should be fixed. Also, I think we may have an rc4, which should include this fix.

Thanks
Shivaram

On Tue, Jun 2, 2015 at 12:56 PM, Eskilson,Aleksander <alek.eskil...@cerner.com> wrote:

> Hey, that's pretty convenient. Unfortunately, although the package seems
> to pull fine into the session, I'm getting class-not-found exceptions with:
>
> Caused by: org.apache.spark.SparkException: Job aborted due to stage
> failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task
> 0.3 in stage 6.0: java.lang.ClassNotFoundException:
> com.databricks.spark.csv.CsvRelation$$anonfun$buildScan$1
>
> That smells like a path issue to me, and I made sure the ivy repo was
> part of my PATH, but functions like showDF() still fail with that error.
> Did I miss a setting, or should the package inclusion in the sparkR
> invocation load that in?
>
> I've run
>
> df <- read.df(sqlCtx, "./data.csv", "com.databricks.spark.csv",
>               header="true", delimiter="|")
> showDF(df, 10)
>
> (my data is pipe-delimited, and the default SQL context is sqlCtx)
>
> Thanks,
> Alek
>
> From: Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
> Reply-To: "shiva...@eecs.berkeley.edu" <shiva...@eecs.berkeley.edu>
> Date: Tuesday, June 2, 2015 at 2:08 PM
> To: Burak Yavuz <brk...@gmail.com>
> Cc: Aleksander Eskilson <alek.eskil...@cerner.com>, "dev@spark.apache.org" <dev@spark.apache.org>, Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
> Subject: Re: CSV Support in SparkR
>
> Hi Alek,
>
> As Burak said, you can already use spark-csv with SparkR in the 1.4
> release. Right now I use it with something like this:
>
> # Launch SparkR
> ./bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3
>
> df <- read.df(sqlContext, "./nycflights13.csv",
>               "com.databricks.spark.csv", header="true")
>
> You can also pass other spark-csv options as arguments to `read.df`.
> Let us know if this works.
>
> Thanks
> Shivaram
>
> On Tue, Jun 2, 2015 at 12:03 PM, Burak Yavuz <brk...@gmail.com> wrote:
>
>> Hi,
>>
>> cc'ing Shivaram here, because he worked on this yesterday.
>>
>> If I'm not mistaken, you can use the following workflow:
>>
>> ```./bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3```
>>
>> and then
>>
>> ```df <- read.df(sqlContext, "/data", "csv", header = "true")```
>>
>> Best,
>> Burak
>>
>> On Tue, Jun 2, 2015 at 11:52 AM, Eskilson,Aleksander <alek.eskil...@cerner.com> wrote:
>>
>>> Are there any intentions to provide first-class support for CSV files
>>> as one of the loadable file types in SparkR? Databricks' spark-csv API [1]
>>> has support for SQL, Python, and Java/Scala, and implements most of the
>>> arguments of R's read.table API [2], but currently there is no way to load
>>> CSV data in SparkR (1.4.0) besides separating our headers from the data,
>>> loading into an RDD, splitting by our delimiter, and then converting to a
>>> SparkR DataFrame with a vector of the columns gathered from the header.
>>>
>>> Regards,
>>> Alek Eskilson
>>>
>>> [1] -- https://github.com/databricks/spark-csv
>>> [2] -- http://www.inside-r.org/r-doc/utils/read.table
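
Putting the thread's advice together, a minimal SparkR session for a pipe-delimited file with a header row might look like the sketch below. This assumes Spark 1.4 (or a master build containing the fix above) launched with the spark-csv package; the file path `./data.csv` is a placeholder, and `sqlContext` is the SQL context the SparkR shell creates automatically.

```r
# Launch SparkR with the spark-csv package (Scala 2.10 build, version 1.0.3):
#   ./bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3

# Inside the SparkR shell, sqlContext already exists. Named arguments
# beyond the source name (header, delimiter, ...) are passed through
# to spark-csv as its reader options.
df <- read.df(sqlContext, "./data.csv", "com.databricks.spark.csv",
              header = "true", delimiter = "|")

# Inspect the first rows and the inferred schema.
showDF(df, 10)
printSchema(df)
```

Note that the package must be supplied at launch time via `--packages` so its jars reach the executors; a ClassNotFoundException for a spark-csv class at task time (as in the stack trace above) indicates the jars were not distributed, not a problem with the local ivy cache being on PATH.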