Re: How to hint Spark to use HashAggregate() for UDAF

2017-01-10 Thread Andy Dang
…aggregation states, so the aggregation buffer schema must be mutable types in UnsafeRow. If you can use TypedImperativeAggregate to implement your aggregation function, Spark SQL has ObjectHashAggregateExec, which supports hash-based aggregation using arbitrary JVM objects as aggregation buffers…
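TypedImperativeAggregate is an internal Catalyst API. As a hedged sketch of the same idea through the public typed-aggregation API instead (class and type choices here are hypothetical, not from the thread), a Spark 2.x Aggregator with a primitive buffer type keeps the aggregation state in a mutable, hash-friendly form:

```java
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.expressions.Aggregator;

// Hypothetical typed aggregator: the buffer is a plain Long, which Spark
// can store as a mutable primitive in its aggregation buffer.
public class SumAgg extends Aggregator<Long, Long, Long> {
  @Override public Long zero() { return 0L; }                      // initial buffer
  @Override public Long reduce(Long buf, Long value) { return buf + value; }
  @Override public Long merge(Long b1, Long b2) { return b1 + b2; } // combine partials
  @Override public Long finish(Long buf) { return buf; }            // final result
  @Override public Encoder<Long> bufferEncoder() { return Encoders.LONG(); }
  @Override public Encoder<Long> outputEncoder() { return Encoders.LONG(); }
}
```

It would be applied on a typed Dataset, e.g. `ds.groupByKey(keyFunc).agg(new SumAgg().toColumn())`.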

Re: How to hint Spark to use HashAggregate() for UDAF

2017-01-09 Thread Andy Dang
> So, I'm not sure about your query, though; it seems the types of aggregated data in your query are not supported for hash-based aggregates. // maropu. On Mon, Jan 9, 2017 at 10:52 PM, Andy Dang wrote: > Hi all, …

How to hint Spark to use HashAggregate() for UDAF

2017-01-09 Thread Andy Dang
Hi all, It appears to me that Dataset.groupBy().agg(udaf) requires a full sort, which is very inefficient for certain aggregations. The code is very simple: I have a UDAF, and what I want to do is dataset.groupBy(cols).agg(udaf).count(). The physical plan I got was: *HashAggregate(keys=[], functions…
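For context, the pattern in question is a Spark 2.x UserDefinedAggregateFunction. A minimal skeleton (the class and column names are hypothetical, not the poster's UDAF) whose buffer schema uses only mutable primitive types, which is a precondition for the hash-based aggregate path:

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.MutableAggregationBuffer;
import org.apache.spark.sql.expressions.UserDefinedAggregateFunction;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical UDAF: counts input rows. The buffer schema is a single
// LongType field, i.e. a mutable primitive in UnsafeRow.
public class CountUdaf extends UserDefinedAggregateFunction {
  @Override public StructType inputSchema() {
    return new StructType().add("value", DataTypes.LongType);
  }
  @Override public StructType bufferSchema() {
    return new StructType().add("count", DataTypes.LongType);
  }
  @Override public DataType dataType() { return DataTypes.LongType; }
  @Override public boolean deterministic() { return true; }
  @Override public void initialize(MutableAggregationBuffer buffer) {
    buffer.update(0, 0L);
  }
  @Override public void update(MutableAggregationBuffer buffer, Row input) {
    buffer.update(0, buffer.getLong(0) + 1);
  }
  @Override public void merge(MutableAggregationBuffer b1, Row b2) {
    b1.update(0, b1.getLong(0) + b2.getLong(0));
  }
  @Override public Object evaluate(Row buffer) { return buffer.getLong(0); }
}
```

It would be invoked as in the post: `dataset.groupBy(cols).agg(new CountUdaf().apply(col("value")))`.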

Re: Converting an InternalRow to a Row

2017-01-07 Thread Andy Dang
> Thus, the caller should copy the result before making another call if required. I think that is why you get a list of the same entries. So you may need to change it to: data.add(unboundedEncoder.toRow(input).copy()); Andy Dang wrote…
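The fix quoted above can be sketched as follows (the collector class and its types are hypothetical; only the `copy()` call is from the thread):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder;

// Hypothetical collector: ExpressionEncoder.toRow() reuses one internal
// row buffer across calls, so each result must be copied before storing.
class RowCollector<T> {
  private final ExpressionEncoder<T> encoder;
  private final List<InternalRow> data = new ArrayList<>();

  RowCollector(ExpressionEncoder<T> encoder) { this.encoder = encoder; }

  void add(T input) {
    // Without copy(), every list entry would alias the same reused row,
    // producing a list of identical entries as described in the thread.
    data.add(encoder.toRow(input).copy());
  }

  List<InternalRow> rows() { return data; }
}
```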

Re: Converting an InternalRow to a Row

2017-01-06 Thread Andy Dang
…Regards, Andy. On Fri, Jan 6, 2017 at 3:48 AM, Liang-Chi Hsieh wrote: > Can you show how you use the encoder in your UDAF? > Andy Dang wrote: > One more question about the behavior of ExpressionEncoder. I have a UDAF that ha…

Re: Converting an InternalRow to a Row

2017-01-05 Thread Andy Dang
…the expected behavior of Encoders? Regards, Andy. On Thu, Jan 5, 2017 at 10:55 AM, Andy Dang wrote: > Perfect. The API in Java is a bit clumsy, though. What I ended up doing in Java (the val is from lombok, if anyone's wondering): …

Re: Converting an InternalRow to a Row

2017-01-05 Thread Andy Dang
> resolveAndBind(); Andy Dang wrote: > Hi all, (cc-ing dev since I've hit some developer API corner) What's the best way to convert an InternalRow to a Row if I've got an InternalRow and the corresponding Schema? …

Converting an InternalRow to a Row

2017-01-04 Thread Andy Dang
Hi all, (cc-ing dev since I've hit some developer API corner) What's the best way to convert an InternalRow to a Row if I've got an InternalRow and the corresponding Schema? Code snippet: @Test public void foo() throws Exception { Row row = RowFactory.create(1); StructType…
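Pulling the thread's resolution together, a hedged sketch of the round trip via the internal ExpressionEncoder API in Spark 2.x (the `$default$` calls are how Scala default arguments surface in Java and are an assumption about this clumsy corner, matching the "API in Java is a bit clumsy" remark above):

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder;
import org.apache.spark.sql.catalyst.encoders.RowEncoder;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class RoundTrip {
  public static Row roundTrip() {
    StructType schema = new StructType().add("id", DataTypes.IntegerType);

    // An encoder must be resolved and bound before fromRow() can be used.
    ExpressionEncoder<Row> enc = RowEncoder.apply(schema);
    ExpressionEncoder<Row> bound = enc.resolveAndBind(
        enc.resolveAndBind$default$1(), enc.resolveAndBind$default$2());

    // toRow() reuses its buffer, so copy() before holding the result.
    InternalRow internal = bound.toRow(RowFactory.create(1)).copy();
    return bound.fromRow(internal); // back to an external Row
  }
}
```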

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Andy Dang
…ply. > If we download all the dependencies to a separate location and link them with the Spark job jar on the cluster, is that the best way to execute a Spark job? > Thanks. On Fri, Dec 23, 2016 at 8:34 PM, Andy Dang wrote: > I used to use an uber jar in Spark 1.x because of…

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Andy Dang
I used to use an uber jar in Spark 1.x because of classpath issues (we couldn't re-model our dependencies based on our code, and thus the cluster's runtime dependencies could be very different from running Spark directly in the IDE; we had to use the userClasspathFirst "hack" to work around this). With Spark 2, i…
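One way to run with externalized dependencies, as the thread discusses, is to ship a thin job jar and pass the separately downloaded dependency jars to spark-submit (all paths and the class name below are hypothetical):

```shell
# Hedged sketch: thin job jar plus a directory of dependency jars.
# spark-submit's --jars flag takes a comma-separated list.
DEPS=$(echo /opt/myjob/deps/*.jar | tr ' ' ',')

spark-submit \
  --class com.example.MyJob \
  --jars "$DEPS" \
  /opt/myjob/myjob-thin.jar
```

This keeps the job jar small and lets the cluster-side dependency set be updated without rebuilding an uber jar.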

Negative number of active tasks

2016-12-23 Thread Andy Dang
Hi all, Today I hit a weird bug in Spark 2.0.2 (vanilla Spark): the executor tab shows a negative number of active tasks. I have about 25 jobs, each with 20k tasks, so the numbers are not that crazy. What could possibly be the cause of this bug? This is the first time I've seen it, and the only specia…