Many thanks, will look into this. I don't particularly want to reuse the
custom Hive UDAF I have; I would prefer writing a new one if that is
cleaner. I am just using the JVM.
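(For reference, a minimal sketch of what "writing a new one" could look like against the public UDAF API that appeared in later Spark releases, org.apache.spark.sql.expressions.UserDefinedAggregateFunction. It computes a simple mean rather than a percentile, just to show the shape of the API; the class and column names are illustrative.)

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Illustrative aggregate: keeps a running (sum, count) and returns the mean.
class SimpleMean extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  def bufferSchema: StructType = StructType(
    StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0.0   // running sum
    buffer(1) = 0L    // running count
  }

  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getDouble(0) + input.getDouble(0)
      buffer(1) = buffer.getLong(1) + 1L
    }
  }

  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }

  def evaluate(buffer: Row): Any =
    if (buffer.getLong(1) == 0L) null else buffer.getDouble(0) / buffer.getLong(1)
}

// Usage against a DataFrame with a numeric "value" column (hypothetical):
// df.groupBy("key").agg(new SimpleMean()(df("value")))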
On 5 June 2015 at 00:03, Holden Karau wrote:
My current example doesn't use a Hive UDAF, but you would do something
pretty similar (it calls a new user-defined UDAF, and there are wrappers to
make Spark SQL UDAFs from Hive UDAFs, but they are private). So this is
doable, but since it pokes at internals it will likely break between
versions of Spark.
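(As a hedged aside: in Spark releases where org.apache.spark.sql.functions.expr is public, 1.5 onwards, the same aggregation can be expressed without touching internals, assuming a HiveContext so the Hive percentile UDAF resolves and a numeric value column as in Olivier's attempt below:)

import org.apache.spark.sql.functions.expr

// expr() parses the string into a Column, so the Hive UDAF call can be
// handed straight to agg(); "key" and "value" are the column names used
// elsewhere in this thread.
df.groupBy("key").agg(expr("percentile(value, 0.5)")).show()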
Hi Holden, Olivier
>>So for column you need to pass in a Java function, I have some sample
code which does this but it does terrible things to access Spark internals.
I also need to call a Hive UDAF in a DataFrame agg function. Are there any
examples of what Column expects?
Deenar
On 2 June 2015, Holden Karau wrote:
So for column you need to pass in a Java function, I have some sample code
which does this but it does terrible things to access Spark internals.
On Tuesday, June 2, 2015, Olivier Girardot wrote:
Nice to hear from you Holden! I ended up trying exactly that (Column) -
but I may have done it wrong:
In [5]: g.agg(Column("percentile(value, 0.5)"))
Py4JError: An error occurred while calling o97.agg. Trace:
py4j.Py4JException: Method agg([class java.lang.String, class
scala.collection.immutable...
Not super easily, the GroupedData class uses a strToExpr function which has
a pretty limited set of functions, so we can't pass in the name of an
arbitrary Hive UDAF (unless I'm missing something). We can instead
construct a Column with the expression you want and then pass it in to
agg() that way.
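(A very rough sketch of that Column-building trick, using 1.4-era Catalyst internals; UnresolvedFunction and friends are private API, the exact constructors differ between releases, and df stands for any DataFrame with key/value columns:)

import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.analysis.{UnresolvedAttribute, UnresolvedFunction}
import org.apache.spark.sql.catalyst.expressions.Literal

// Wrap an unresolved call to the Hive "percentile" UDAF in a Column and
// pass it to agg(); this pokes at internals and may break across versions.
val percentileCol = new Column(
  UnresolvedFunction("percentile", Seq(UnresolvedAttribute("value"), Literal(0.5))))
df.groupBy("key").agg(percentileCol).show()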
I've finally come to the same conclusion, but isn't there any way to call
these Hive UDAFs from agg("percentile(key,0.5)")?
On Tue, 2 June 2015 at 15:37, Yana Kadiyska wrote:
Like this...sqlContext should be a HiveContext instance
case class KeyValue(key: Int, value: String)
val df = sc.parallelize(1 to 50).map(i => KeyValue(i, i.toString)).toDF
df.registerTempTable("table")
sqlContext.sql("select percentile(key,0.5) from table").show()
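(If the median is wanted per group rather than globally, the same UDAF works with a GROUP BY; the key % 5 bucket below is just an illustrative grouping over the table registered above:)

// Per-group median with the same Hive percentile UDAF.
sqlContext.sql(
  "select key % 5 as bucket, percentile(key, 0.5) as median from table group by key % 5"
).show()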
On Tue, Jun 2, 2015 at 8:07 AM, Olivier Girardot wrote:
Hi everyone,
Is there any way to compute a median on a column using Spark's DataFrame? I
know you can use stats on an RDD, but I'd rather stay within a DataFrame.
Hive seems to imply that using ntile one can compute percentiles, quartiles,
and therefore a median.
Does anyone have experience with this?
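(For the ntile route mentioned above, a sketch assuming a HiveContext with window-function support and the key/value table from the example earlier in the thread:)

// Assign each row to a quartile with the ntile window function; the median
// then sits roughly at the boundary between the 2nd and 3rd quartiles.
sqlContext.sql(
  "select key, ntile(4) over (order by key) as quartile from table"
).show()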