Hive Context incompatible with Sentry enabled Cluster

2015-11-15 Thread Charmee Patel
Hi, we recently ran into this issue: https://issues.apache.org/jira/browse/SPARK-9042. My organization's application reads raw data from files, processes and cleanses it, and pushes the results to Hive tables. To keep reads efficient, we have partitioned our tables. In a Sentry-enabled cluster, ou

Re: How to force statistics calculation of Dataframe?

2015-11-04 Thread Charmee Patel
adcast function is in org.apache.spark.sql.functions

On Wed, Nov 4, 2015 at 10:19 AM, Charmee Patel wrote:
> Hi,
>
> If I have a hive table, analyze table compute statistics will ensure
> Spark SQL has statistics of that table. When I have a dataframe, is there

How to force statistics calculation of Dataframe?

2015-11-04 Thread Charmee Patel
Hi, if I have a Hive table, ANALYZE TABLE ... COMPUTE STATISTICS ensures that Spark SQL has statistics for that table. When I have a DataFrame, is there a way to force Spark to collect statistics? I have a large lookup file and I am trying to avoid a broadcast join by applying a filter beforehand. Thi

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-22 Thread Charmee Patel
A similar issue occurs when interacting with Hive secured by Sentry: https://issues.apache.org/jira/browse/SPARK-9042. Changing how the HiveContext instance is created might also resolve this issue.

On Thu, Oct 22, 2015 at 11:33 AM Steve Loughran wrote:
> On 22 Oct 2015, at 08:25, Chester C