Hi Ayan,

Thanks for the response. I am using a SQL query (not the DataFrame API). Could you please explain how I should import this SQL function into it? Simply importing this class in my driver code does not help here.
Many of the functions that I need are already in sql.functions, so I do not want to rewrite them.

Regards
Ashok

On Fri, Mar 4, 2016 at 3:52 PM, ayan guha <guha.a...@gmail.com> wrote:

> Most likely you are missing import of org.apache.spark.sql.functions.
>
> In any case, you can write your own function for floor and use it as UDF.
>
> On Fri, Mar 4, 2016 at 7:34 PM, ashokkumar rajendran <
> ashokkumar.rajend...@gmail.com> wrote:
>
>> Hi,
>>
>> I load json file that has timestamp (as long in milliseconds) and several
>> other attributes. I would like to group them by 5 minutes and store them as
>> separate file.
>>
>> I am facing couple of problems here..
>> 1. Using Floor function at select clause (to bucket by 5mins) gives me
>> error saying "java.util.NoSuchElementException: key not found: floor". How
>> do I use floor function in select clause? I see that floor method is
>> available in org.apache.spark.sql.functions clause but not sure why its not
>> working here.
>> 2. Can I use the same in Group by clause?
>> 3. How do I store them as separate file after grouping them?
>>
>> String logPath = "my-json.gz";
>> DataFrame logdf = sqlContext.read().json(logPath);
>> logdf.registerTempTable("logs");
>> DataFrame bucketLogs = sqlContext.sql("Select `user.timestamp` as
>> rawTimeStamp, `user.requestId` as requestId,
>> *floor(`user.timestamp`/72000*) as timeBucket FROM logs");
>> bucketLogs.toJSON().saveAsTextFile("target_file");
>>
>> Regards
>> Ashok
>>
>
> --
> Best Regards,
> Ayan Guha
>
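
For reference, here is a minimal plain-Java sketch of the bucketing arithmetic discussed in this thread, i.e. the "write your own function for floor" route. The names `TimeBucket`, `bucketOf` and `groupByBucket` are hypothetical and chosen only for illustration. The sketch assumes the timestamps are epoch milliseconds, as described in the original question. Note that five minutes is 5 * 60 * 1000 = 300000 ms; the divisor 72000 in the quoted query corresponds to 72 seconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Spark: plain-Java version of the
// "floor(timestamp / bucketWidth)" expression from the quoted query.
class TimeBucket {
    // Five minutes expressed in milliseconds: 5 * 60 * 1000 = 300000.
    static final long FIVE_MINUTES_MS = 5L * 60L * 1000L;

    // Returns the index of the five-minute bucket containing timestampMs.
    // Math.floorDiv rounds toward negative infinity, matching floor().
    static long bucketOf(long timestampMs) {
        return Math.floorDiv(timestampMs, FIVE_MINUTES_MS);
    }

    // Groups timestamps by bucket index, mirroring what a GROUP BY on the
    // bucket column would do. TreeMap keeps the buckets in ascending order.
    static Map<Long, List<Long>> groupByBucket(long[] timestampsMs) {
        Map<Long, List<Long>> groups = new TreeMap<>();
        for (long ts : timestampsMs) {
            groups.computeIfAbsent(bucketOf(ts), k -> new ArrayList<>()).add(ts);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        long[] sample = {0L, 299_999L, 300_000L, 1_457_087_640_000L};
        System.out.println(groupByBucket(sample));
    }
}
```

In Spark 1.x, a method like `bucketOf` could presumably be wrapped in a UDF, registered through `sqlContext.udf().register(...)`, and then called by name inside the SQL string. That wiring is an assumption and is not shown here, since it depends on the Spark version in use.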