Re: Hive Problem in Pig generated Parquet file schema in CREATE EXTERNAL TABLE (e.g. bag::col1)

2014-12-23 Thread Jianshi Huang
FYI, Latest hive 0.14/parquet will have column renaming support. Jianshi On Wed, Dec 10, 2014 at 3:37 AM, Michael Armbrust wrote: > You might also try out the recently added support for views. > > On Mon, Dec 8, 2014 at 9:31 PM, Jianshi Huang > wrote: > >> Ah... I see. T

Re: Hive Problem in Pig generated Parquet file schema in CREATE EXTERNAL TABLE (e.g. bag::col1)

2014-12-08 Thread Jianshi Huang
> > > > On Sat, Dec 6, 2014 at 8:28 PM, Jianshi Huang > wrote: > >> Ok, found another possible bug in Hive. >> >> My current solution is to use ALTER TABLE CHANGE to rename the column >> names. >> >> The problem is after renaming the colum
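The workaround quoted above relies on Hive's ALTER TABLE ... CHANGE syntax to strip the Pig-style `bag::` prefix from column names. A minimal sketch from a HiveContext-backed spark-shell (table, column, and type are hypothetical, not from the thread):

```scala
// Hypothetical rename of a Pig-generated Parquet column; backticks are
// required because "::" is not a legal character in a bare identifier.
sql("ALTER TABLE pmt CHANGE `bag::col1` col1 string")
```

Note that CHANGE only rewrites metastore metadata, not the Parquet footer schema, which is presumably why the thread still hits a problem after the rename.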

Re: Hive Problem in Pig generated Parquet file schema in CREATE EXTERNAL TABLE (e.g. bag::col1)

2014-12-06 Thread Jianshi Huang
scala> sql("select cre_ts from pmt limit 1").collect res16: Array[org.apache.spark.sql.Row] = Array([null]) I created a JIRA for it: https://issues.apache.org/jira/browse/SPARK-4781 Jianshi On Sun, Dec 7, 2014 at 1:06 AM, Jianshi Huang wrote: > Hmm... another issue I found

Re: drop table if exists throws exception

2014-12-05 Thread Jianshi Huang
exception in the logs, but that exception does not propagate to user code. >> >> On Thu, Dec 4, 2014 at 11:31 PM, Jianshi Huang >> wrote: >> >> > Hi, >> > >> > I got exception saying Hive: NoSuchObjectException(message: table >> > not found)

Re: Auto BroadcastJoin optimization failed in latest Spark

2014-12-04 Thread Jianshi Huang
With Liancheng's suggestion, I've tried setting spark.sql.hive.convertMetastoreParquet false but still analyze noscan return -1 in rawDataSize Jianshi On Fri, Dec 5, 2014 at 3:33 PM, Jianshi Huang wrote: > If I run ANALYZE without NOSCAN, then Hive can successfully
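The two settings under discussion can be sketched like this from a HiveContext in spark-shell (the table name is hypothetical; the described behavior is the thread's report, not verified here):

```scala
// Keep Hive's own Parquet SerDe instead of Spark SQL's native reader.
setConf("spark.sql.hive.convertMetastoreParquet", "false")
// ANALYZE ... NOSCAN gathers only file-level stats, so rawDataSize can
// stay -1; a full ANALYZE (without NOSCAN) computes row-level statistics.
sql("ANALYZE TABLE pmt COMPUTE STATISTICS")
```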

Re: Auto BroadcastJoin optimization failed in latest Spark

2014-12-04 Thread Jianshi Huang
30 PM, Jianshi Huang wrote: > Sorry for the late follow-up. > > I used Hao's DESC EXTENDED command and found some clue: > > new (broadcast broken Spark build): > parameters:{numFiles=0, EXTERNAL=TRUE, transient_lastDdlTime=1417763892, > COLUMN_STATS_ACCURATE

drop table if exists throws exception

2014-12-04 Thread Jianshi Huang
Hi, I got exception saying Hive: NoSuchObjectException(message: table not found) when running "DROP TABLE IF EXISTS " Looks like a new regression in Hive module. Anyone can confirm this? Thanks, -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github & Blog: http://huangjs.github.com/
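For reference, the statement in question; IF EXISTS is meant to make the drop a silent no-op when the table is absent (table name hypothetical):

```scala
// Expected: no error even if the table does not exist. The report above
// is that HiveContext instead surfaces NoSuchObjectException.
sql("DROP TABLE IF EXISTS pmt")
```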

Re: Auto BroadcastJoin optimization failed in latest Spark

2014-12-04 Thread Jianshi Huang
is will print the detail physical plan. > > > > Let me know if you still have problem. > > > > Hao > > > > *From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com] > *Sent:* Thursday, November 27, 2014 10:24 PM > *To:* Cheng, Hao > *Cc:* user > *Subject:* Re: Auto B

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
I created a ticket for this: https://issues.apache.org/jira/browse/SPARK-4757 Jianshi On Fri, Dec 5, 2014 at 1:31 PM, Jianshi Huang wrote: > Correction: > > According to Liancheng, this hotfix might be the root cause: > > > https://github.com/a

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
Correction: According to Liancheng, this hotfix might be the root cause: https://github.com/apache/spark/commit/38cb2c3a36a5c9ead4494cbc3dde008c2f0698ce Jianshi On Fri, Dec 5, 2014 at 12:45 PM, Jianshi Huang wrote: > Looks like the datanucleus*.jar shouldn't appear in the hdfs

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
Looks like the datanucleus*.jar shouldn't appear in the hdfs path in Yarn-client mode. Maybe this patch broke yarn-client. https://github.com/apache/spark/commit/a975dc32799bb8a14f9e1c76defaaa7cfbaf8b53 Jianshi On Fri, Dec 5, 2014 at 12:02 PM, Jianshi Huang wrote: > Act

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
Actually my HADOOP_CLASSPATH has already been set to include /etc/hadoop/conf/* export HADOOP_CLASSPATH=/etc/hbase/conf/hbase-site.xml:/usr/lib/hbase/lib/hbase-protocol.jar:$(hbase classpath) Jianshi On Fri, Dec 5, 2014 at 11:54 AM, Jianshi Huang wrote: > Looks like somehow Spark failed
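The export quoted above can be written a little more defensively so that an empty pre-existing HADOOP_CLASSPATH does not leave a dangling colon. A sketch using the thread's paths, which are cluster-specific; the `$(hbase classpath)` suffix from the original is dropped so the snippet runs without HBase installed:

```shell
# Prepend the HBase config and protocol jar; the paths are the ones
# quoted in the thread and will differ on other clusters.
export HADOOP_CLASSPATH="/etc/hbase/conf/hbase-site.xml:/usr/lib/hbase/lib/hbase-protocol.jar${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
echo "$HADOOP_CLASSPATH"
```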

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
CLASSPATH? Jianshi On Fri, Dec 5, 2014 at 11:37 AM, Jianshi Huang wrote: > I got the following error during Spark startup (Yarn-client mode): > > 14/12/04 19:33:58 INFO Client: Uploading resource > file:/x/home/jianshuang/spark/spark-latest/lib/datanucleus-api-jdo-3.2.6.jar > ->

Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
ter HEAD yesterday. Is this a bug? -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github & Blog: http://huangjs.github.com/

Re: How to do broadcast join in SparkSQL

2014-11-25 Thread Jianshi Huang
ls /usr/lib/hive/lib doesn’t show any of the parquet jars, but ls /usr/lib/impala/lib shows the jar we’re looking for as parquet-hive-1.0.jar Is it removed from latest Spark? Jianshi On Wed, Nov 26, 2014 at 2:13 PM, Jianshi Huang wrote: > Hi, > > Looks like the latest SparkSQL with Hive 0

Re: How to do broadcast join in SparkSQL

2014-11-25 Thread Jianshi Huang
) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) Using the same DDL and Analyze script above. Jianshi On Sat, Oct 11, 2014 at 2:18 PM, Jianshi Huang wrote: > It works fine, thanks for the help Michael. > > Liancheng also told m

Re: Build with Hive 0.13.1 doesn't have datanucleus and parquet dependencies.

2014-10-27 Thread Jianshi Huang
Ah I see. Thanks Hao! I'll wait for the fix. Jianshi On Mon, Oct 27, 2014 at 4:57 PM, Cheng, Hao wrote: > Hive-thriftserver module is not included while specifying the profile > hive-0.13.1. > > -Original Message- > From: Jianshi Huang [mailto:jianshi.hu...@gmail

Build with Hive 0.13.1 doesn't have datanucleus and parquet dependencies.

2014-10-27 Thread Jianshi Huang
missing anything? Jianshi -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github & Blog: http://huangjs.github.com/

Re: Multitenancy in Spark - within/across spark context

2014-10-23 Thread Jianshi Huang
occurrence when preemption is enabled. That being said, it's a > configurable option, so you can set "x" to a very large value and your > job should keep on chugging along. > > The options you'd want to take a look at are: spark.task.maxFailures > and spark.yarn.max.executor.failures > &
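The two failure-tolerance settings named above would be raised like this when building the SparkConf (the values are illustrative, not recommendations):

```scala
// Raise per-task and per-executor failure tolerance so that YARN
// preemption does not kill the job; property names as given in the
// thread (Spark 1.x era).
val conf = new org.apache.spark.SparkConf()
  .set("spark.task.maxFailures", "100")
  .set("spark.yarn.max.executor.failures", "100")
```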

Re: SPARK-3106 fixed?

2014-10-13 Thread Jianshi Huang
On Tue, Oct 14, 2014 at 4:36 AM, Jianshi Huang wrote: > Turned out it was caused by this issue: > https://issues.apache.org/jira/browse/SPARK-3923 > > Set spark.akka.heartbeat.interval to 100 solved it. > > Jianshi > > On Mon, Oct 13, 2014 at 4:24 PM, Jianshi Huang

Re: SPARK-3106 fixed?

2014-10-13 Thread Jianshi Huang
Turned out it was caused by this issue: https://issues.apache.org/jira/browse/SPARK-3923 Set spark.akka.heartbeat.interval to 100 solved it. Jianshi On Mon, Oct 13, 2014 at 4:24 PM, Jianshi Huang wrote: > Hmm... it failed again, just lasted a little bit longer. > > Jianshi > >
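The SPARK-3923 workaround described above, expressed as a SparkConf setting (the value 100 comes from the message; its exact units and semantics are version-dependent):

```scala
// Lengthen the Akka heartbeat interval to avoid the spurious
// disassociation errors tracked in SPARK-3923 / SPARK-3106.
val conf = new org.apache.spark.SparkConf()
  .set("spark.akka.heartbeat.interval", "100")
```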

Re: SPARK-3106 fixed?

2014-10-13 Thread Jianshi Huang
Hmm... it failed again, just lasted a little bit longer. Jianshi On Mon, Oct 13, 2014 at 4:15 PM, Jianshi Huang wrote: > https://issues.apache.org/jira/browse/SPARK-3106 > > I'm having the same errors described in SPARK-3106 (no other types of > errors confirmed), running a

SPARK-3106 fixed?

2014-10-13 Thread Jianshi Huang
dozen dim tables (using HiveContext) and then map it to my class object. It failed a couple of times and now I cached the intermediate table and currently it seems working fine... no idea why until I found SPARK-3106 Cheers, -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github & B

Re: How to do broadcast join in SparkSQL

2014-10-10 Thread Jianshi Huang
INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' > |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat' > |LOCATION '$file'""".stripMargin > sql(ddl) > setConf("spark.sql.hive.convertMetastoreParquet", "true"
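The DDL above is cut off at the front; here is a hypothetical reconstruction under the assumption that it creates an external table over Pig-written Parquet using the old parquet-hive bindings. Only the INPUTFORMAT/OUTPUTFORMAT/LOCATION lines and the setConf call appear in the quoted snippet; the schema, SerDe class, and path are guesses:

```scala
// Hypothetical reconstruction of the quoted DDL (spark-shell, HiveContext).
val file = "/path/to/parquet" // placeholder for the original $file
val ddl = s"""CREATE EXTERNAL TABLE pmt (col1 STRING)
             |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
             |STORED AS
             |  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
             |  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
             |LOCATION '$file'""".stripMargin
sql(ddl)
setConf("spark.sql.hive.convertMetastoreParquet", "true")
```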

Re: How to do broadcast join in SparkSQL

2014-10-08 Thread Jianshi Huang
at 2:18 PM, Jianshi Huang wrote: > Looks like https://issues.apache.org/jira/browse/SPARK-1800 is not merged > into master? > > I cannot find spark.sql.hints.broadcastTables in latest master, but it's > in the following patch. > > > https://github.com/apache/spark/commit/7

Re: How to do broadcast join in SparkSQL

2014-10-07 Thread Jianshi Huang
Sep 29, 2014 at 1:24 AM, Jianshi Huang wrote: > Yes, looks like it can only be controlled by the > parameter spark.sql.autoBroadcastJoinThreshold, which is a little bit weird > to me. > > How am I supposed to know the exact bytes of a table? Let me specify the > join algorit
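As the message says, broadcast joins in Spark SQL at that time were driven purely by a size threshold rather than a per-table hint. A sketch of how the knob is tuned (the 100 MB value is illustrative):

```scala
// Tables whose estimated size falls below this many bytes are
// broadcast to every executor instead of being shuffled for the join.
setConf("spark.sql.autoBroadcastJoinThreshold", (100 * 1024 * 1024).toString)
```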