Hi Michael,

Changing my col name to something other than "count" fixed the parse error.
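For the archive, here is roughly what the change looks like in my code (just a sketch; the new name docCount is arbitrary, anything that is not a reserved word should do):

def mapLines(line):
    ret = {}
    ret['docid'] = line[0]
    ret['term'] = line[1]
    ret['docCount'] = line[2]  # renamed from 'count', which the basic SQL parser treats as a reserved word
    return ret

frequency = lines.map(mapLines)
schemaFrequency = sqlContext.inferSchema(frequency)
schemaFrequency.registerAsTable("frequency")

# queries then reference docCount instead of count, e.g.
# select term, docid, docCount from frequency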
Many thanks,

Andy

From: Michael Armbrust <mich...@databricks.com>
Reply-To: <user@spark.apache.org>
Date: Sunday, July 13, 2014 at 1:18 PM
To: <user@spark.apache.org>
Cc: "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
Subject: Re: SparkSql newbie problems with nested selects

> Hi Andy,
>
> The SQL parser is pretty basic (we plan to improve this for the 1.2 release).
> In this case I think part of the problem is that one of your variables is
> "count", which is a reserved word. Unfortunately, we don't have the ability
> to escape identifiers at this point.
>
> However, I did manage to get your query to parse using the HiveQL parser,
> provided by HiveContext.
>
> hiveCtx.hql("""
>   select freq.docid, freqTranspose.docid, sum(freq.count * freqTranspose.count)
>   from
>     Frequency freq JOIN
>     (select term, docid, count from Frequency) freqTranspose
>   where freq.term = freqTranspose.term
>   group by freq.docid, freqTranspose.docid""")
>
> Michael
>
>
> On Sun, Jul 13, 2014 at 12:43 PM, Andy Davidson
> <a...@santacruzintegration.com> wrote:
>> Hi
>>
>> I am running into trouble with a nested query using Python. To try and debug
>> it, I first wrote the query I want using sqlite3:
>>
>> select freq.docid, freqTranspose.docid, sum(freq.count * freqTranspose.count)
>> from
>>   Frequency as freq,
>>   (select term, docid, count from Frequency) as freqTranspose
>> where freq.term = freqTranspose.term
>> group by freq.docid, freqTranspose.docid
>> ;
>>
>> SparkSQL has trouble parsing the "(select ...) as freqTranspose" line.
>>
>> Here is what my input data looks like:
>>
>> $ head -n 3 reuters.db.csv
>> docid,term,count
>> 10000_txt_earn,net,1
>> 10000_txt_earn,rogers,4
>>
>> The output from sqlite3 is:
>>
>> $ head -n 6 3hSimilarityMatrix.slow.sql.out
>> freq.docid      freqTranspose.docid  sum(freq.count * freqTranspose.count)
>> --------------  -------------------  -------------------------------------
>> 10000_txt_earn  10000_txt_earn       127
>> 10000_txt_earn  10054_txt_earn       33
>> 10000_txt_earn  10080_txt_crude      146
>> 10000_txt_earn  10088_txt_acq        11
>> $
>>
>> My code pretty much follows
>> http://spark.apache.org/docs/latest/sql-programming-guide.html
>>
>> dataFile = sc.textFile("reuters.db.csv")
>> lines = dataFile.map(lambda l: l.split(","))
>>
>> def mapLines(line):
>>     ret = {}
>>     ret['docid'] = line[0]
>>     ret['term'] = line[1]
>>     ret['count'] = line[2]
>>     return ret
>>
>> frequency = lines.map(mapLines)
>> schemaFrequency = sqlContext.inferSchema(frequency)
>> schemaFrequency.registerAsTable("frequency")
>>
>> Okay, here is where I run into trouble:
>>
>> sqlCmd = "select \
>>             freq.docid, \
>>             freqTranspose.docid \
>>           from \
>>             frequency as freq, \
>>             (select term, docid, count from frequency) \
>>          "
>> similarities = sqlContext.sql(sqlCmd)
>>
>> /Users/andy/workSpace/dataBricksIntroToApacheSpark/USBStick/spark/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
>>     298         raise Py4JJavaError(
>>     299             'An error occurred while calling {0}{1}{2}.\n'.
>> --> 300             format(target_id, '.', name), value)
>>     301     else:
>>     302         raise Py4JError(
>>
>> Py4JJavaError: An error occurred while calling o40.sql.
>> : java.lang.RuntimeException: [1.153] failure: ``('' expected but `from' found
>>
>> select freq.docid, freqTranspose.docid from frequency as freq, (select term, docid, count from frequency)
>>
>> Simple SQL seems to "parse", i.e. select freq.docid from frequency as freq
>>
>> Any suggestions would be greatly appreciated.
>>
>> Andy
>>
>> P.S. I should note, I think I am using version 1.0?
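For completeness, Michael's HiveContext suggestion translated to PySpark would look roughly like the following (an untested sketch on my end, assuming a Spark build with Hive support; with HiveQL there is no need to rename the count column):

from pyspark.sql import HiveContext

hiveCtx = HiveContext(sc)
# register the same RDD of dicts as a table in the Hive context
hiveCtx.inferSchema(frequency).registerAsTable("Frequency")

similarities = hiveCtx.hql("""
    select freq.docid, freqTranspose.docid, sum(freq.count * freqTranspose.count)
    from
      Frequency freq JOIN
      (select term, docid, count from Frequency) freqTranspose
    where freq.term = freqTranspose.term
    group by freq.docid, freqTranspose.docid""")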