Re: Spark SQL Query and join different data sources.

2014-09-02 Thread Yin Huai
Actually, with HiveContext, you can join Hive tables with registered temporary tables.
On Fri, Aug 22, 2014 at 9:07 PM, chutium wrote:
> oops, thanks Yan, you are right, i got
>
> scala> sqlContext.sql("select * from a join b").take(10)
> java.lang.RuntimeException: Table Not Found: b
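A minimal sketch of the approach described in this reply, assuming the Spark 1.1-era API (`HiveContext.sql`, `jsonRDD`, `registerTempTable`). The table names `a` and `b`, the columns `id`, `name` and `score`, and the sample JSON records are hypothetical and chosen only for illustration. The point is that both tables are resolved through the same HiveContext, so the temporary table is visible to the query alongside the Hive table.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object JoinHiveWithTempTable {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying this out on one machine.
    val conf = new SparkConf().setAppName("join-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // Hypothetical Hive table "a", stored in the Hive metastore.
    hiveContext.sql("CREATE TABLE IF NOT EXISTS a (id INT, name STRING)")

    // Hypothetical temporary table "b", built from in-memory JSON records.
    // It is registered on the SAME HiveContext, not on a separate SQLContext.
    val json = sc.parallelize(Seq(
      """{"id": 1, "score": 10}""",
      """{"id": 2, "score": 20}"""))
    hiveContext.jsonRDD(json).registerTempTable("b")

    // Join the Hive table with the temporary table in a single query.
    val joined = hiveContext.sql(
      "SELECT a.id, a.name, b.score FROM a JOIN b ON a.id = b.id")
    joined.collect().foreach(println)

    sc.stop()
  }
}
```

Registering `b` on a plain SQLContext instead would keep it in a different catalog, which is consistent with the "Table Not Found: b" error quoted above.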

RE: Spark SQL Query and join different data sources.

2014-08-22 Thread chutium
oops, thanks Yan, you are right, i got
scala> sqlContext.sql("select * from a join b").take(10)
java.lang.RuntimeException: Table Not Found: b
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.catalyst.analysis.SimpleCatalog$$anonfun$1.apply(Catalog.scala:90)

RE: Spark SQL Query and join different data sources.

2014-08-21 Thread alexliu68
Presto is so far good at joining different sources/databases. I tried a simple join query in Spark SQL, but it fails with the following errors:
val a = cql("select test.a from test JOIN test1 on test.a = test1.a")
a: org.apache.spark.sql.SchemaRDD = SchemaRDD[0] at RDD at SchemaRDD.scala:104
== Query

RE: Spark SQL Query and join different data sources.

2014-08-21 Thread Yan Zhou.sc
Presto might be better than HiveQL; while in terms of federation, Hive is actually very good at it.
-Original Message-
From: chutium [mailto:teng....@gmail.com]
Sent: Thursday, August 21, 2014 4:35 AM
To: d...@spark.incubator.apache.org
Subject: Re: Spark SQL Query and join different data so

Re: Spark SQL Query and join different data sources.

2014-08-21 Thread chutium
as far as i know, HQL queries try to find the schema info of all the tables in this query from hive metastore, so it is not possible to join tables from sqlContext using hiveContext.hql
but this should work:
hiveContext.hql("select ...").regAsTable("a")
sqlContext.jsonFile("xxx").regAsTable("b")