Ted, Igor, oh my... thanks a lot to both of you! Igor was absolutely right, but I missed that I have to use sqlContext =(
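In case it helps anyone searching the archives later, here is roughly what the fixed version looks like. This is just a minimal sketch against the Spark 1.5.x Java API: the datasource class and the x1/x2 columns come from my code quoted below, and the file path is only a placeholder.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class SelectVsSql {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("select-vs-sql").setMaster("local[*]");
    JavaSparkContext sc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(sc);

    // placeholder path; the actual file is read by the custom datasource
    String filename = "/path/to/input";
    DataFrame df = sqlContext.read()
        .format("com.epam.parso.spark.ds.DefaultSource")
        .load(filename);
    df.registerTempTable("tmptable");

    // DataFrame API: select() takes a projection (column names or Column
    // expressions), not a SQL statement
    df.select("x1", "x2").where(df.col("x1").equalTo(3.0)).show(5);

    // SQL style: the full query string goes through sqlContext.sql(),
    // not df.select()
    sqlContext.sql("select * from tmptable where x1 = '3.0' limit 5").show();

    sc.stop();
  }
}

That was the point Igor and Ted made: select() only defines a projection, so a whole SQL statement has to go through sqlContext.sql() against the registered temp table.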
Everything's perfect. Thank you.
--
Be well!
Jean Morozov

On Fri, Dec 25, 2015 at 8:31 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> DataFrame uses different syntax from a SQL query.
> I searched unit tests but didn't find any in the form of df.select("select ...").
>
> Looks like you should use sqlContext as other people suggested.
>
> On Fri, Dec 25, 2015 at 8:29 AM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>
>> Thanks for the comments, although the issue is not in the limit() predicate.
>> It's something with Spark being unable to resolve the expression.
>>
>> I can do something like this, and it works as it's supposed to:
>> df.select(df.col("*")).where(df.col("x1").equalTo(3.0)).show(5);
>>
>> But I think the old-fashioned SQL style should also work. I have
>> df.registerTempTable("tmptable") and then
>>
>> df.select("select * from tmptable where x1 = '3.0'").show();
>>
>> org.apache.spark.sql.AnalysisException: cannot resolve 'select * from tmp where x1 = '1.0'' given input columns x1, x4, x5, x3, x2;
>>
>> at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:56)
>> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.sca
>>
>> From the first statement I conclude that my custom datasource is perfectly fine.
>> I just wonder how to fix or work around this.
>> --
>> Be well!
>> Jean Morozov
>>
>> On Fri, Dec 25, 2015 at 6:13 PM, Igor Berman <igor.ber...@gmail.com> wrote:
>>
>>> sqlContext.sql("select * from table limit 5").show() (not sure if limit 5 is supported)
>>>
>>> or use Dmitriy's solution. select() defines your projection, while you've specified an entire query.
>>>
>>> On 25 December 2015 at 15:42, Василец Дмитрий <pronix.serv...@gmail.com> wrote:
>>>
>>>> hello
>>>> you can try to use df.limit(5).show()
>>>> just a trick :)
>>>>
>>>> On Fri, Dec 25, 2015 at 2:34 PM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>>>>
>>>>> Hello, I'm basically stuck as I have no idea where to look.
>>>>>
>>>>> The following simple code, given that my Datasource is working, gives me an exception.
>>>>>
>>>>> DataFrame df = sqlc.load(filename, "com.epam.parso.spark.ds.DefaultSource");
>>>>> df.cache();
>>>>> df.printSchema();    <-- prints the schema perfectly fine!
>>>>>
>>>>> df.show();           <-- works perfectly fine (shows a table with 20 lines)!
>>>>> df.registerTempTable("table");
>>>>> df.select("select * from table limit 5").show();    <-- gives a weird exception
>>>>>
>>>>> The exception is:
>>>>>
>>>>> AnalysisException: cannot resolve 'select * from table limit 5' given input columns VER, CREATED, SOC, SOCC, HLTC, HLGTC, STATUS
>>>>>
>>>>> I can do a collect on a DataFrame, but cannot select any specific columns with either "select * from table" or "select VER, CREATED from table".
>>>>>
>>>>> I use Spark 1.5.2.
>>>>> The same code works perfectly through Zeppelin 0.5.5.
>>>>>
>>>>> Thanks.
>>>>> --
>>>>> Be well!
>>>>> Jean Morozov