Hi Kishor,

Spark doesn't currently support subqueries in the WHERE clause. However, it
looks as though someone is working on this right now:
https://issues.apache.org/jira/browse/SPARK-4226

Hope this helps,
Rick Hillegas



Kishor Bachhav <kbach...@pivotal.io> wrote on 10/28/2015 05:52:50 AM:

> From: Kishor Bachhav <kbach...@pivotal.io>
> To: user@spark.apache.org
> Date: 10/28/2015 05:53 AM
> Subject: nested select is not working in spark sql
>
> Hi,

> I am trying to execute below query in spark sql but throws exception
>
> select n_name from NATION where n_regionkey = (select r_regionkey
> from REGION where r_name='ASIA')

> Exception:
> Exception in thread "main" java.lang.RuntimeException: [1.55]
> failure: ``)'' expected but identifier r_regionkey found
>
> select n_name from NATION where n_regionkey = (select r_regionkey
> from REGION where r_name='ASIA')
>                                                       ^
>     at scala.sys.package$.error(package.scala:27)
>     at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse
> (AbstractSparkSQLParser.scala:36)
>     at org.apache.spark.sql.SnappyParserDialect.parse
(snappyParsers.scala:65)
>     at org.apache.spark.sql.SQLContext$$anonfun$3.apply
(SQLContext.scala:169)
>     at org.apache.spark.sql.SQLContext$$anonfun$3.apply
(SQLContext.scala:169)
>     at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark
> $sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:115)
>     at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark
> $sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
>     at scala.util.parsing.combinator.Parsers$Success.map
(Parsers.scala:136)
>     at scala.util.parsing.combinator.Parsers$Success.map
(Parsers.scala:135)
>     at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map
> $1.apply(Parsers.scala:242)
>     at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map
> $1.apply(Parsers.scala:242)
>     at scala.util.parsing.combinator.Parsers$$anon$3.apply
(Parsers.scala:222)
>     at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append
> $1$$anonfun$apply$2.apply(Parsers.scala:254)
>     at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append
> $1$$anonfun$apply$2.apply(Parsers.scala:254)
>     at scala.util.parsing.combinator.Parsers$Failure.append
(Parsers.scala:202)
>     at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append
> $1.apply(Parsers.scala:254)
>     at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append
> $1.apply(Parsers.scala:254)
>     at scala.util.parsing.combinator.Parsers$$anon$3.apply
(Parsers.scala:222)
>     at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply
> $14.apply(Parsers.scala:891)
>     at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply
> $14.apply(Parsers.scala:891)
>     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>     at scala.util.parsing.combinator.Parsers$$anon$2.apply
(Parsers.scala:890)
>     at scala.util.parsing.combinator.PackratParsers$$anon$1.apply
> (PackratParsers.scala:110)
>     at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse
> (AbstractSparkSQLParser.scala:34)
>

> Same is working in mysql as well as memsql.

> Expected Result is
>
> memsql> select n_name from NATION where n_regionkey = (select
> r_regionkey from REGION where r_name='ASIA');
> +-----------+
> | n_name    |
> +-----------+
> | INDIA     |
> | INDONESIA |
> | JAPAN     |
> | CHINA     |
> | VIETNAM   |
> +-----------+
> 5 rows in set (0.71 sec)

> How can I make this work in spark sql?

> Actually above query is one simplified version of Minimum cost
> supplier query (Q2) of TPCH which has this nested select nature. I
> am working on these TPCH queries. If anybody has the modified set of
> TPCH queries for spark sql, kindly let me know. It will be very useful
for me.
>
> select
> s_acctbal,
> s_name,
> n_name,
> p_partkey,
> p_mfgr,
> s_address,
> s_phone,
> s_comment
> from
> part,
> supplier,
> partsupp,
> nation,
> region
> where
> p_partkey = ps_partkey
> and s_suppkey = ps_suppkey
> and p_size = [SIZE]
> and p_type like '%[TYPE]'
> and s_nationkey = n_nationkey
> and n_regionkey = r_regionkey
> and r_name = '[REGION]'
> and ps_supplycost = (
>   select
> min(ps_supplycost)
> from
> partsupp, supplier,
> nation, region
> where
> p_partkey = ps_partkey
> and s_suppkey = ps_suppkey
> and s_nationkey = n_nationkey
> and n_regionkey = r_regionkey
> and r_name = '[REGION]'
> )
> order by
> s_acctbal desc,
> n_name,
> s_name,
> p_partkey;
>

> Regards
>   Kishor

Reply via email to