Hi Kishor, Spark doesn't currently support subqueries in the WHERE clause. However, it looks as though someone is working on this right now: https://issues.apache.org/jira/browse/SPARK-4226
Hope this helps, Rick Hillegas Kishor Bachhav <kbach...@pivotal.io> wrote on 10/28/2015 05:52:50 AM: > From: Kishor Bachhav <kbach...@pivotal.io> > To: user@spark.apache.org > Date: 10/28/2015 05:53 AM > Subject: nested select is not working in spark sql > > Hi, > I am trying to execute below query in spark sql but throws exception > > select n_name from NATION where n_regionkey = (select r_regionkey > from REGION where r_name='ASIA') > Exception: > Exception in thread "main" java.lang.RuntimeException: [1.55] > failure: ``)'' expected but identifier r_regionkey found > > select n_name from NATION where n_regionkey = (select r_regionkey > from REGION where r_name='ASIA') > ^ > at scala.sys.package$.error(package.scala:27) > at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse > (AbstractSparkSQLParser.scala:36) > at org.apache.spark.sql.SnappyParserDialect.parse (snappyParsers.scala:65) > at org.apache.spark.sql.SQLContext$$anonfun$3.apply (SQLContext.scala:169) > at org.apache.spark.sql.SQLContext$$anonfun$3.apply (SQLContext.scala:169) > at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark > $sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:115) > at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark > $sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114) > at scala.util.parsing.combinator.Parsers$Success.map (Parsers.scala:136) > at scala.util.parsing.combinator.Parsers$Success.map (Parsers.scala:135) > at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map > $1.apply(Parsers.scala:242) > at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map > $1.apply(Parsers.scala:242) > at scala.util.parsing.combinator.Parsers$$anon$3.apply (Parsers.scala:222) > at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append > $1$$anonfun$apply$2.apply(Parsers.scala:254) > at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append > $1$$anonfun$apply$2.apply(Parsers.scala:254) > at scala.util.parsing.combinator.Parsers$Failure.append (Parsers.scala:202) > at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append > $1.apply(Parsers.scala:254) > at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append > $1.apply(Parsers.scala:254) > at scala.util.parsing.combinator.Parsers$$anon$3.apply (Parsers.scala:222) > at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply > $14.apply(Parsers.scala:891) > at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply > $14.apply(Parsers.scala:891) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) > at scala.util.parsing.combinator.Parsers$$anon$2.apply (Parsers.scala:890) > at scala.util.parsing.combinator.PackratParsers$$anon$1.apply > (PackratParsers.scala:110) > at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse > (AbstractSparkSQLParser.scala:34) > > Same is working in mysql as well as memsql. > Expected Result is > > memsql> select n_name from NATION where n_regionkey = (select > r_regionkey from REGION where r_name='ASIA'); > +-----------+ > | n_name | > +-----------+ > | INDIA | > | INDONESIA | > | JAPAN | > | CHINA | > | VIETNAM | > +-----------+ > 5 rows in set (0.71 sec) > How can I make this work in spark sql? > Actually above query is one simplified version of Minimum cost > supplier query (Q2) of TPCH which has this nested select nature. I > am working on these TPCH queries. If anybody has the modified set of > TPCH queries for spark sql, kindly let me know. It will be very useful for me. > > select > s_acctbal, > s_name, > n_name, > p_partkey, > p_mfgr, > s_address, > s_phone, > s_comment > from > part, > supplier, > partsupp, > nation, > region > where > p_partkey = ps_partkey > and s_suppkey = ps_suppkey > and p_size = [SIZE] > and p_type like '%[TYPE]' > and s_nationkey = n_nationkey > and n_regionkey = r_regionkey > and r_name = '[REGION]' > and ps_supplycost = ( > select > min(ps_supplycost) > from > partsupp, supplier, > nation, region > where > p_partkey = ps_partkey > and s_suppkey = ps_suppkey > and s_nationkey = n_nationkey > and n_regionkey = r_regionkey > and r_name = '[REGION]' > ) > order by > s_acctbal desc, > n_name, > s_name, > p_partkey; > > Regards > Kishor