Have you had a look at this issue?
https://issues.apache.org/jira/browse/SPARK-12279
There is a comment by Y Bodnar on how they successfully got Kerberos and
HBase working.
2016-05-18 18:13 GMT+10:00 :
> Hi all,
>
> I have been puzzling over a Kerberos problem for a while now and wondered
> if
You would want to add a listener to your Spark Streaming context. Have a
look at the StatsReportListener [1].
[1]
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.scheduler.StatsReportListener
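As a rough sketch of what registering the listener could look like: the
app name, local master, 10-second batch interval and the helper name
buildContext are all assumptions made up for illustration, not taken from
the original question.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.scheduler.StatsReportListener

object ListenerSketch {
  // Hypothetical helper: builds a streaming context with the listener attached.
  def buildContext(): StreamingContext = {
    // Assumption: local master and 10-second batches, for illustration only.
    val conf = new SparkConf().setAppName("listener-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // StatsReportListener reports total-delay and processing-time statistics
    // over the most recent `numBatchInfos` completed batches.
    ssc.addStreamingListener(new StatsReportListener(numBatchInfos = 10))
    ssc
  }

  def main(args: Array[String]): Unit = {
    val ssc = buildContext()
    // Define input DStreams and output operations here, then start the job:
    // ssc.start()
    // ssc.awaitTermination()
    ssc.stop()
  }
}
```

If the built-in report is not what you need, extending StreamingListener
and overriding onBatchCompleted is the usual way to collect your own
numbers.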
2016-05-17 7:18 GMT+10:00 Samuel Zhou :
> Hi,
>
> Does anyone know h
If you want to share RDDs, it might be a good idea to check out
Tachyon / Alluxio.
For the Thrift server, I believe the datasets live in your Spark
cluster as RDDs, and you just communicate with the server via the Thrift
JDBC Distributed Query Engine connector.
2016-05-17 5:12 GMT+10:00 Micha
Assuming you are referring to running SparkSubmit.main programmatically;
otherwise, read this [1].
I can't find any scaladocs for org.apache.spark.deploy.* but Oozie's [2]
example of using SparkSubmit is pretty comprehensive.
[1] http://spark.apache.org/docs/latest/submitting-applications.html
[2]
ht
You could handle null values in a preprocessing step with the
DataFrame.na functions, for example DataFrame.na.fill().
For reference:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameNaFunctions
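A minimal sketch of such a preprocessing step, assuming Spark 2.x or
later (SparkSession) and a made-up DataFrame; the column names "score"
and "label", the default values and the helper name fillDefaults are
hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NaFillSketch {
  // Hypothetical helper: replace nulls with a per-column default.
  def fillDefaults(df: DataFrame): DataFrame =
    df.na.fill(Map("score" -> 0.0, "label" -> "unknown"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .appName("na-fill-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up data; None becomes null in the resulting DataFrame.
    val df = Seq[(Option[Double], Option[String])](
      (Some(1.0), Some("a")),
      (None, Some("b")),
      (Some(3.0), None)
    ).toDF("score", "label")

    fillDefaults(df).show()
    spark.stop()
  }
}
```

If dropping incomplete rows is acceptable, DataFrame.na.drop() is the
counterpart to fill().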
John
On 21 April 2016 at 03:41, Andres Perez wrote:
> so the mi