Thanks Michael, opened this: https://issues.apache.org/jira/browse/SPARK-4520

On Thu, Nov 20, 2014 at 2:59 PM, Michael Armbrust wrote:
> Can you open a JIRA?

Can you open a JIRA?

On Thu, Nov 20, 2014 at 10:39 AM, Sadhan Sood wrote:
> I am running on master, pulled yesterday I believe but saw the same issue
> with 1.2.0

I am running on master, pulled yesterday I believe but saw the same issue
with 1.2.0.

On Thu, Nov 20, 2014 at 1:37 PM, Michael Armbrust wrote:
> Which version are you running on again?

Which version are you running on again?

On Thu, Nov 20, 2014 at 8:17 AM, Sadhan Sood wrote:
> Also attaching the parquet file if anyone wants to take a further look.

Also attaching the parquet file if anyone wants to take a further look.

On Thu, Nov 20, 2014 at 8:54 AM, Sadhan Sood wrote:
> So, I am seeing this issue with spark sql throwing an exception when
> trying to read selective columns from a thrift parquet file and also when
> caching them …

So, I am seeing this issue with spark sql throwing an exception when trying
to read selective columns from a thrift parquet file and also when caching
them.

On some further digging, I was able to narrow it down to at least one
particular column type: map<…> to be causing this issue.

To reproduce this …
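
A minimal sketch of the failing pattern, assuming Spark 1.2-era APIs; the
table and column names here (thrift_parquet_table, tags) are hypothetical
stand-ins, not the actual names from the report:

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc) // sc: the spark-shell SparkContext

// `thrift_parquet_table` is a Hive table backed by the thrift-written
// Parquet files; `tags` stands in for the map-typed column narrowed down
// above. Selecting the map column triggers the exception, while queries
// that skip it succeed.
sqlContext.sql("SELECT tags FROM thrift_parquet_table LIMIT 10").collect()
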
Hi Cheng,

I tried reading the parquet file (on which we were getting the exception)
through parquet-tools and it is able to dump the file, and I can read the
metadata, etc. I also loaded the file through a hive table and can run a
table scan query on it as well. Let me know if I can do more to help re…
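
The same footer check can also be done programmatically; a sketch assuming
the pre-Apache parquet-mr packages (parquet.hadoop.*) bundled with Spark at
the time, with a hypothetical file path:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import parquet.hadoop.ParquetFileReader

val conf = new Configuration()
val file = new Path("/tmp/bad-file.parquet") // hypothetical path

// Reads the file footer, much as `parquet-tools meta` does; if this
// parses, the schema and row-group metadata are structurally intact.
val footer = ParquetFileReader.readFooter(conf, file)
println(footer.getFileMetaData.getSchema)
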
(Forgot to cc user mail list)

On 11/16/14 4:59 PM, Cheng Lian wrote:

Hey Sadhan,

Thanks for the additional information, this is helpful. Seems that
some Parquet internal contract was broken, but I'm not sure whether
it's caused by Spark SQL or Parquet, or even maybe the Parquet file
itself w…

Hi Cheng,

Thanks for your response. Here is the stack trace from yarn logs: …

Hi Sadhan,

Could you please provide the stack trace of the
ArrayIndexOutOfBoundsException (if any)? The reason why the first
query succeeds is that Spark SQL doesn't bother reading all data from
the table to give COUNT(*). In the second case, however, the whole
table is asked to be cached…
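
The two cases Cheng contrasts, as a sketch (hypothetical table name,
Spark 1.2-era API):

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Succeeds: COUNT(*) is answered without materializing the column values.
sqlContext.sql("SELECT COUNT(*) FROM parquet_table").collect()

// Fails: caching builds in-memory columnar buffers for every column, so
// the whole table must be scanned and the exception surfaces.
sqlContext.cacheTable("parquet_table")
sqlContext.sql("SELECT COUNT(*) FROM parquet_table").collect() // materializes the cache
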
While testing SparkSQL on a bunch of parquet files (basically used to be a
partition for one of our hive tables), I encountered this error:

import org.apache.spark.sql.SchemaRDD
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
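
For context, a rough sketch of how such a partition might be read with the
Spark 1.2-era API; the partition path and table name are hypothetical:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Load the partition directory's Parquet files directly, then query it.
val partition: SchemaRDD = sqlContext.parquetFile("/warehouse/our_table/dt=2014-11-20")
partition.registerTempTable("partition_data")
sqlContext.sql("SELECT * FROM partition_data LIMIT 10").collect()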