I am using the `spark.read.parquet` API to read parquet files directly. Spark
is partition-aware for partitioned directories.
Still, I would like to know whether there is a way to leverage
partition awareness via Hive through the `spark.sql` API.
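For concreteness, here is the kind of thing I have in mind (a minimal sketch,
assuming a spark-shell style SparkSession named `spark`; the path, table name,
and partition column are made up):

import org.apache.spark.sql.functions.col

// Direct parquet read: Spark discovers the dt=... subdirectories and can
// prune partitions using the filter on the partition column.
val direct = spark.read.parquet("s3://my-bucket/events")
  .filter(col("dt") === "2016-08-01")

// The same data registered as a partitioned Hive table: does the WHERE
// clause on the partition column get the same pruning through spark.sql?
val viaHive = spark.sql("SELECT * FROM events WHERE dt = '2016-08-01'")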
Any help is highly appreciated!
Thank you.
--
Hao Ren
>> Ints are serializable?
>>
>> Just thinking out loud
>>
>> Simon Scott
>> Research Developer @ viavisolutions.com
>>
>> *From:* Hao Ren [mailto:inv...@gmail.com]
>> *Sent:
Yes, it is.
You can define a UDF like that.
Basically, it's a UDF Int => Int whose closure contains a non-serializable
object. The latter should cause a Task not serializable exception.
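For example, something along these lines (a minimal sketch; the class and
value names are made up):

import org.apache.spark.sql.functions.udf

// A class that does not extend Serializable.
class NotSerializableHelper {
  def bump(i: Int): Int = i + 1
}

val helper = new NotSerializableHelper

// A UDF Int => Int whose closure captures the non-serializable `helper`;
// shipping that closure to the executors is what should trigger the
// "Task not serializable" exception.
val bumpUdf = udf((i: Int) => helper.bump(i))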
Hao
On Mon, Aug 8, 2016 at 5:08 AM, Muthu Jayakumar wrote:
> Hello Hao Ren,
"key" === 2).show() // It does not work as expected
(org.apache.spark.SparkException: Task not serializable)
}
run()
}
Also, I tried collect(), count(), first(), limit(). All of them worked
without non-serializable exceptions.
It seems that only filter() throws the exception?
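To make the comparison concrete, the calls looked roughly like this (a sketch
built on a toy non-serializable UDF like the one discussed earlier in this
thread; names are made up, and the comments reflect what I observed):

import org.apache.spark.sql.functions.{col, udf}

class NotSerializableHelper { def bump(i: Int): Int = i + 1 }
val helper = new NotSerializableHelper
val bumpUdf = udf((i: Int) => helper.bump(i))

val df = spark.range(0, 10).toDF("key")
  .withColumn("value", bumpUdf(col("key").cast("int")))

df.count()                          // worked
df.collect()                        // worked
df.first()                          // worked
df.limit(3).show()                  // worked
df.filter(col("key") === 2).show()  // Task not serializable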
--
Hao Ren
Data Engineer @ leboncoin
Paris, France
> which is implied as a context bound. Java does not have the
> equivalent, so here we change the Java class to a ClassTag and make it an
> implicit value; it will be used by createDirectStream.
>
>
> Thanks
> Saisai
>
>
> On Thu, Dec 17, 2015 at 9:49 PM, Hao Ren wrote:
>
cordClass)
  val cleanedHandler = jssc.sparkContext.clean(messageHandler.call _)
  createDirectStream[K, V, KD, VD, R](
    jssc.ssc,
    Map(kafkaParams.toSeq: _*),
    Map(fromOffsets.mapValues { _.longValue() }.toSeq: _*),
    cleanedHandler
  )
}
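For reference, here is a minimal, self-contained sketch of the implicit
ClassTag pattern described above (names are made up; scalaMethodNeedingTag
just stands in for createDirectStream's `R: ClassTag` context bound):

import scala.reflect.ClassTag

object ClassTagSketch {
  // Stand-in for a Scala method with a context bound, like createDirectStream's
  // requirement that a ClassTag[R] be implicitly available.
  def scalaMethodNeedingTag[R: ClassTag](): Unit =
    println(implicitly[ClassTag[R]].runtimeClass.getName)

  // A Java-friendly entry point receives a Class[R] instead; deriving an
  // implicit ClassTag[R] from it satisfies the context bound above.
  def javaFriendlyCall[R](recordClass: Class[R]): Unit = {
    implicit val recordTag: ClassTag[R] = ClassTag(recordClass)
    scalaMethodNeedingTag[R]()
  }

  def main(args: Array[String]): Unit =
    javaFriendlyCall(classOf[String]) // prints java.lang.String
}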
--
Hao Ren
Data Engineer @ leboncoin
Paris, France
rk/sql/catalyst/expressions/complexTypeExtractors.scala#L49
It seems that the pattern matching does not take UDT into consideration.
Is this an intended feature? If not, I would like to create a PR to fix it.
--
Hao Ren
Data Engineer @ leboncoin
Paris, France
> );
> perhaps you could push for this to happen by creating a Jira and pinging
> jkbradley and mengxr. Thanks!
>
> On Thu, Sep 17, 2015 at 8:07 AM, Hao Ren wrote:
>
>> Working on spark.ml.classification.LogisticRegression.scala (spark 1.5),
>>
>> It might be useful
to be able to summarize any data set we want.
If there is a way to summarize a test set, please let me know. I have browsed
LogisticRegression.scala, but failed to find one.
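In the meantime, a workaround sketch I am considering: score the test set with
transform() and compute the metric with an evaluator, rather than relying on
the built-in summary (assuming a DataFrame `data` with the usual label/features
columns; names are made up):

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

val model = new LogisticRegression().fit(train)

// model.summary only covers the training set, so score the test set
// explicitly and compute the metric with an evaluator instead.
val scoredTest = model.transform(test)
val testAuc = new BinaryClassificationEvaluator()
  .setRawPredictionCol("rawPrediction")
  .setMetricName("areaUnderROC")
  .evaluate(scoredTest)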
Thx.
--
Hao Ren
Data Engineer @ leboncoin
Paris, France
mmon use case.
Any help on this issue is highly appreciated.
If you need more info, check out the JIRA I created:
https://issues.apache.org/jira/browse/SPARK-8869
On Thu, Jul 16, 2015 at 11:39 AM, Hao Ren wrote:
> Given the following code which just reads from s3, then saves files to s3
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
--
Hao Ren
Data Engineer @ leboncoin
Paris, France