Ted, Igor, oh my... thanks a lot to both of you! Igor was absolutely right, but I missed that I have to use sqlContext =(
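In case it helps anyone searching the archives later, here is roughly what the fixed version looks like. This is just a minimal sketch against the Spark 1.5.x Java API: the datasource class and the x1/x2 columns come from my code quoted below, and the file path is only a placeholder.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class SelectVsSql {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("select-vs-sql").setMaster("local[*]");
    JavaSparkContext sc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(sc);

    // placeholder path; the actual file is read by the custom datasource
    String filename = "/path/to/input";
    DataFrame df = sqlContext.read()
        .format("com.epam.parso.spark.ds.DefaultSource")
        .load(filename);
    df.registerTempTable("tmptable");

    // DataFrame API: select() takes a projection (column names or Column
    // expressions), not a SQL statement
    df.select("x1", "x2").where(df.col("x1").equalTo(3.0)).show(5);

    // SQL style: the full query string goes through sqlContext.sql(),
    // not df.select()
    sqlContext.sql("select * from tmptable where x1 = '3.0' limit 5").show();

    sc.stop();
  }
}

That was the point Igor and Ted made: select() only defines a projection, so a whole SQL statement has to go through sqlContext.sql() against the registered temp table.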
Everything's perfect. Thank you.
--
Be well!
Jean Morozov

On Fri, Dec 25, 2015 at 8:31 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> DataFrame uses different syntax from a SQL query.
> I searched unit tests but didn't find any in the form of df.select("select ...").
>
> Looks like you should use sqlContext as other people suggested.
>
> On Fri, Dec 25, 2015 at 8:29 AM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>
>> Thanks for the comments, although the issue is not in the limit() predicate.
>> It's something with Spark being unable to resolve the expression.
>>
>> I can do something like this, and it works as it's supposed to:
>> df.select(df.col("*")).where(df.col("x1").equalTo(3.0)).show(5);
>>
>> But I think the old-fashioned SQL style should also work. I have
>> df.registerTempTable("tmptable") and then
>>
>> df.select("select * from tmptable where x1 = '3.0'").show();
>>
>> org.apache.spark.sql.AnalysisException: cannot resolve 'select * from tmp where x1 = '1.0'' given input columns x1, x4, x5, x3, x2;
>>
>> at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:56)
>> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.sca
>>
>> From the first statement I conclude that my custom datasource is perfectly fine.
>> I just wonder how to fix or work around this.
>> --
>> Be well!
>> Jean Morozov
>>
>> On Fri, Dec 25, 2015 at 6:13 PM, Igor Berman <igor.ber...@gmail.com> wrote:
>>
>>> sqlContext.sql("select * from table limit 5").show() (not sure if limit 5 is supported)
>>>
>>> or use Dmitriy's solution. select() defines your projection, while you've specified an entire query.
>>>
>>> On 25 December 2015 at 15:42, Василец Дмитрий <pronix.serv...@gmail.com> wrote:
>>>
>>>> hello
>>>> you can try to use df.limit(5).show()
>>>> just a trick :)
>>>>
>>>> On Fri, Dec 25, 2015 at 2:34 PM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>>>>
>>>>> Hello, I'm basically stuck as I have no idea where to look.
>>>>>
>>>>> The following simple code, given that my Datasource is working, gives me an exception.
>>>>>
>>>>> DataFrame df = sqlc.load(filename, "com.epam.parso.spark.ds.DefaultSource");
>>>>> df.cache();
>>>>> df.printSchema();    <-- prints the schema perfectly fine!
>>>>>
>>>>> df.show();           <-- works perfectly fine (shows a table with 20 lines)!
>>>>> df.registerTempTable("table");
>>>>> df.select("select * from table limit 5").show();    <-- gives a weird exception
>>>>>
>>>>> The exception is:
>>>>>
>>>>> AnalysisException: cannot resolve 'select * from table limit 5' given input columns VER, CREATED, SOC, SOCC, HLTC, HLGTC, STATUS
>>>>>
>>>>> I can do a collect on a DataFrame, but cannot select any specific columns with either "select * from table" or "select VER, CREATED from table".
>>>>>
>>>>> I use Spark 1.5.2.
>>>>> The same code works perfectly through Zeppelin 0.5.5.
>>>>>
>>>>> Thanks.
>>>>> --
>>>>> Be well!
>>>>> Jean Morozov