Re: Facing issue with floor function in spark SQL query

ashokkumar rajendran Fri, 04 Mar 2016 02:35:55 -0800

Hi Ayan,

Thanks for the response. I am using a SQL query (not the DataFrame API).
Could you please explain how I should import this SQL function into it?
Simply importing this class in my driver code does not help here.


Many of the functions that I need are already in sql.functions, so I do
not want to rewrite them.

Regards
Ashok

On Fri, Mar 4, 2016 at 3:52 PM, ayan guha <guha.a...@gmail.com> wrote:

> Most likely you are missing the import of org.apache.spark.sql.functions.
>
> In any case, you can write your own function for floor and use it as a UDF.
>
> On Fri, Mar 4, 2016 at 7:34 PM, ashokkumar rajendran <
> ashokkumar.rajend...@gmail.com> wrote:
>
>> Hi,
>>
>> I load a JSON file that has a timestamp (as a long in milliseconds) and
>> several other attributes. I would like to group the records into 5-minute
>> buckets and store each bucket as a separate file.
>>
>> I am facing a couple of problems here:
>> 1. Using the floor function in the select clause (to bucket by 5 minutes)
>> gives me an error saying "java.util.NoSuchElementException: key not found:
>> floor". How do I use the floor function in the select clause? I see that a
>> floor method is available in org.apache.spark.sql.functions, but I am not
>> sure why it's not working here.
>> 2. Can I use the same in the group by clause?
>> 3. How do I store them as separate files after grouping them?
>>
>>         String logPath = "my-json.gz";
>>         DataFrame logdf = sqlContext.read().json(logPath);
>>         logdf.registerTempTable("logs");
>>         DataFrame bucketLogs = sqlContext.sql(
>>             "Select `user.timestamp` as rawTimeStamp, "
>>             + "`user.requestId` as requestId, "
>>             + "floor(`user.timestamp`/72000) as timeBucket FROM logs");
>>         bucketLogs.toJSON().saveAsTextFile("target_file");
>>
>> Regards
>> Ashok
>>
>
>
>
> --
> Best Regards,
> Ayan Guha
>
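
The "write your own function for floor" suggestion above could be sketched
roughly as follows. This is only an illustration in plain Java, not Spark
code: the class TimeBuckets and its methods bucketOf and groupByBucket are
hypothetical names, and the bucket width is assumed to be 300000 ms (5 * 60 *
1000), whereas the query in the thread divides by 72000.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper; the names here are illustrative and not part of Spark. */
final class TimeBuckets {
    /** Assumed bucket width: five minutes in milliseconds (5 * 60 * 1000). */
    static final long FIVE_MINUTES_MS = 5L * 60L * 1000L;

    private TimeBuckets() {}

    /** Index of the 5-minute bucket that contains an epoch-millisecond timestamp. */
    static long bucketOf(long timestampMs) {
        // floorDiv rounds toward negative infinity, so it behaves like
        // floor(x / n) even for negative inputs, where plain long division
        // would truncate toward zero instead.
        return Math.floorDiv(timestampMs, FIVE_MINUTES_MS);
    }

    /** Groups timestamps by bucket index, with buckets in ascending order. */
    static Map<Long, List<Long>> groupByBucket(List<Long> timestampsMs) {
        Map<Long, List<Long>> buckets = new TreeMap<>();
        for (long ts : timestampsMs) {
            buckets.computeIfAbsent(bucketOf(ts), k -> new ArrayList<>()).add(ts);
        }
        return buckets;
    }
}
```

In a Spark 1.x Java driver, a method like bucketOf could probably be wrapped
in a UDF1<Long, Long> and registered under a name of your choosing with
sqlContext.udf().register(...), passing DataTypes.LongType as the return
type. The SQL string could then call that name in both the select and the
group by clause. That wiring is not shown here and has not been tested
against the setup in this thread.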
