No way to set mesos cluster driver memory overhead?

2016-10-13 Thread drewrobb
When using Spark on Mesos and deploying a job in cluster mode through the dispatcher, there appears to be no memory overhead configuration for the launched driver processes ("--driver-memory" is the same as Xmx, which is the same as the memory quota). This makes it almost a guarantee that a long running d
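
To illustrate the gap being described, here is a minimal sketch of how a container quota is usually sized as heap plus an overhead allowance. It assumes the 10% / 384 MB-minimum rule that Spark documents for executor memory overhead; whether the same rule would apply to a Mesos cluster-mode driver is an assumption, and the function name is hypothetical.

```python
def container_memory_mb(heap_mb, overhead_fraction=0.10, min_overhead_mb=384):
    """Return a container quota (MB) that leaves room for off-heap usage.

    heap_mb is the JVM heap (the value passed as --driver-memory / Xmx).
    The default fraction and minimum mirror Spark's documented executor
    overhead rule; using them for the driver is an assumption.
    """
    overhead_mb = max(min_overhead_mb, int(heap_mb * overhead_fraction))
    return heap_mb + overhead_mb


# With no overhead the quota equals the heap, which is the situation
# the message describes: any off-heap allocation exceeds the quota.
print(container_memory_mb(1024))   # small heap: the 384 MB minimum applies
print(container_memory_mb(8192))   # large heap: the 10% fraction applies
```

A JVM also uses memory outside the heap (thread stacks, metaspace, direct buffers), so a quota equal to Xmx leaves no room for it.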

Re: No way to set mesos cluster driver memory overhead?

2016-10-13 Thread drewrobb
It seems like this is a real issue, so I've opened an issue: https://issues.apache.org/jira/browse/SPARK-17928

_SUCCESS file validation on read

2017-04-03 Thread drewrobb
When writing a dataframe, a _SUCCESS file is created to mark that the entire dataframe has been written. However, the existence of this _SUCCESS file does not seem to be validated by default on reads. In some cases this would allow a partially written dataframe to be read back. Is this behavior configurabl
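
As a sketch of the kind of check the question is about, a caller can test for the marker file before reading. This assumes the output directory is on a local filesystem reachable through `pathlib`; for HDFS or S3 the same test would have to go through the corresponding filesystem client. The helper name is hypothetical and this is not a built-in Spark option.

```python
from pathlib import Path


def has_success_marker(output_dir):
    """Return True if output_dir contains a _SUCCESS marker file.

    Local-filesystem check only (an assumption of this sketch).
    """
    return (Path(output_dir) / "_SUCCESS").is_file()


def checked_path(output_dir):
    """Return output_dir unchanged, or raise if the write looks incomplete."""
    if not has_success_marker(output_dir):
        raise FileNotFoundError(f"no _SUCCESS marker in {output_dir}")
    return output_dir
```

The returned path can then be handed to the reader, so a directory without the marker fails fast instead of yielding partial data.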

Join pushdown on two external tables from the same external source?

2017-06-13 Thread drewrobb
I'm trying to figure out how to join multiple tables from a single external source directly in Spark SQL. Say I do the following in Spark SQL: CREATE OR REPLACE TEMPORARY VIEW t1 USING jdbc OPTIONS ( dbtable 't1' ...) CREATE OR REPLACE TEMPORARY VIEW t2 USING jdbc OPTIONS ( dbtable 't2' ...) SELECT *
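
One workaround that may apply here: the JDBC source accepts anything valid in a FROM clause as `dbtable`, including a parenthesized subquery, so the join can be written inside the subquery and executed by the database. The sketch below only builds such a statement as a string. The helper name, the view name, the column names, the join condition, and the URL are all hypothetical and do not come from the original message.

```python
def pushed_down_join_view(view_name, join_sql, other_options, alias="joined"):
    """Build a CREATE VIEW statement whose dbtable is a join subquery.

    join_sql must not contain single quotes, since this sketch does no
    escaping. other_options holds the remaining JDBC options (url, etc.).
    """
    options = {"dbtable": f"({join_sql}) AS {alias}", **other_options}
    rendered = ", ".join(f"{key} '{value}'" for key, value in options.items())
    return (
        f"CREATE OR REPLACE TEMPORARY VIEW {view_name} "
        f"USING jdbc OPTIONS ({rendered})"
    )


# Hypothetical columns and connection URL, for illustration only.
statement = pushed_down_join_view(
    "t1_t2",
    "SELECT t1.id, t2.value FROM t1 JOIN t2 ON t1.id = t2.id",
    {"url": "jdbc:postgresql://example-host/exampledb"},
)
print(statement)
```

Selecting explicit columns avoids duplicate column names in the derived table, which some databases reject.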