-----Original Message-----
From: ÐΞ€ρ@Ҝ (๏̯͡๏) [deepuj...@gmail.com]
Sent: Thursday, August 06, 2015 12:41 AM Eastern Standard Time
To: Philip Weaver
Cc: user
Subject: Re: How to read gzip data in Spark - Simple question

how do i persist the RDD to HDFS ?
On Wed, Aug 5, 2015 at 8:32 PM, Philip Weaver <philip.wea...@gmail.com> wrote:
This message means that java.util.Date is not supported by Spark DataFrame.
You'll need to use java.sql.Date, I believe. I encourage you to find the
answer to this on your own :).
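[Editor's note] A minimal sketch of that swap, assuming the yyyy-MM-dd date format used by the formatStringAsDate helper quoted later in this thread (this is an illustration, not a verified fix):

```scala
import java.text.SimpleDateFormat

// java.sql.Date is a supported Spark SQL column type; wrap the parsed
// java.util.Date in it before building the case class.
def formatStringAsDate(dateStr: String): java.sql.Date =
  new java.sql.Date(new SimpleDateFormat("yyyy-MM-dd").parse(dateStr).getTime)
```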
On Wed, Aug 5, 2015 at 9:43 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
> Code:
>
> val summary = rowStructText.map(s => s.split(",")).map {
>   s =>
>     Summary(formatStringAsDate(s(0)),
>       s(1).replaceAll("\"", "").toLong,
>       s(3).replaceAll("\"", "").toLong,
>       s(4).replaceAll("\"", "").toInt,
>       s(5).replaceAll("\"", ""),
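[Editor's note] On the persistence question at the top of the thread: one minimal sketch, assuming the summary RDD above and a hypothetical output path, is to write it back with saveAsTextFile; Parquet via a DataFrame is an alternative if the Summary fields use supported types:

```scala
// Write each Summary as one line of text under the given HDFS directory
// (the path is hypothetical):
summary.map(_.toString).saveAsTextFile("hdfs:///user/zeppelin/summary-out")

// Alternative sketch: convert to a DataFrame and save as Parquet
// (requires a SQLContext; Spark 1.4-era API):
// import sqlContext.implicits._
// summary.toDF().write.parquet("hdfs:///user/zeppelin/summary-parquet")
```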
On Wed, Aug 5, 2015 at 8:29 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
That seems to be working. However, I see a new exception.

Code:
def formatStringAsDate(dateStr: String) =
  new SimpleDateFormat("yyyy-MM-dd").parse(dateStr)
//(2015-07-27,12459,,31242,6,Daily,-999,2099-01-01,2099-01-02,1,0,0.1,0,1,-1,isGeo,,,204,694.0,1.9236856708701322E-4,0.0,-4.48,0.0,0.0,0.0,)
val
The parallelize method does not read the contents of a file. It simply
takes a collection and distributes it to the cluster. In this case, the
String is a collection of 67 characters.
Use sc.textFile instead of sc.parallelize, and it should work as you want.
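[Editor's note] A sketch of the difference (the path is hypothetical; sc is the SparkContext):

```scala
// parallelize distributes an in-memory collection. Passing a String
// distributes its characters, one element per Char:
val chars = sc.parallelize("/user/zeppelin/data/part-m-3.gz")  // RDD[Char]

// textFile reads the file itself from HDFS, decompressing .gz
// transparently, one element per line:
val lines = sc.textFile("/user/zeppelin/data/part-m-3.gz")     // RDD[String]
```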
On Wed, Aug 5, 2015 at 8:12 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
I have csv data that is embedded in gzip format on HDFS.
*With Pig*
a = load
'/user/zeppelin/aggregatedsummary/2015/08/03/regular/part-m-3.gz' using
PigStorage();
b = limit a 10;
(2015-07-27,12459,,31243,6,Daily,-999,2099-01-01,2099-01-02,4,0,0.1,0,1,203,4810370.0,1.4090459061723766,1.01
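[Editor's note] A Spark sketch of the same read as the Pig snippet above, using the path from the thread (take(10) mirrors the limit):

```scala
// textFile detects the .gz extension and decompresses while reading:
val a = sc.textFile("/user/zeppelin/aggregatedsummary/2015/08/03/regular/part-m-3.gz")
a.take(10).foreach(println)
```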