> The default HashAggregateExec cannot use arbitrary objects as its aggregation states, so the
> aggregation buffer schema must be mutable types in unsafe row.
>
> If you can use TypedImperativeAggregate to implement your aggregation
> function, SparkSQL has ObjectHashAggregateExec which supports hash-based
> aggregate using arbitrary JVM objects as aggregation states.
>
> So I'm not sure about your query, but it seems the types of the aggregated
> data in your query are not supported for hash-based aggregates.
>
> // maropu
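For contrast, here is a minimal sketch (not from the thread; names are illustrative) of a UDAF whose buffer schema is a single LongType, i.e. a mutable fixed-width type, so it stays eligible for hash-based aggregation in Spark 2.x:

import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.MutableAggregationBuffer;
import org.apache.spark.sql.expressions.UserDefinedAggregateFunction;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Sums longs; the buffer is one LongType field, which fits in an UnsafeRow.
public class LongSum extends UserDefinedAggregateFunction {
    @Override public StructType inputSchema() { return new StructType().add("value", DataTypes.LongType); }
    @Override public StructType bufferSchema() { return new StructType().add("sum", DataTypes.LongType); }
    @Override public DataType dataType() { return DataTypes.LongType; }
    @Override public boolean deterministic() { return true; }
    @Override public void initialize(MutableAggregationBuffer buffer) { buffer.update(0, 0L); }
    @Override public void update(MutableAggregationBuffer buffer, Row input) {
        if (!input.isNullAt(0)) buffer.update(0, buffer.getLong(0) + input.getLong(0));
    }
    @Override public void merge(MutableAggregationBuffer b1, Row b2) { b1.update(0, b1.getLong(0) + b2.getLong(0)); }
    @Override public Object evaluate(Row buffer) { return buffer.getLong(0); }
}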
>
>
>
> On Mon, Jan 9, 2017 at 10:52 PM, Andy Dang wrote:
>
>> Hi all,
>>
>>
Hi all,
It appears to me that Dataset.groupBy().agg(udaf) requires a full sort,
which is very inefficient for certain aggregations.
The code is very simple:
- I have a UDAF
- What I want to do is: dataset.groupBy(cols).agg(udaf).count()
The physical plan I got was:
*HashAggregate(keys=[], functions
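A quick way to see which aggregate operator the planner chose is to call explain() on the aggregated Dataset. The sketch below is not from the thread; the session setup and column names are assumed, and LongSum refers to the example UDAF sketched earlier:

import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PlanCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().master("local[*]").appName("plan-check").getOrCreate();
        Dataset<Row> ds = spark.range(0, 1000).selectExpr("id % 10 AS key", "id AS value");
        LongSum udaf = new LongSum();
        Dataset<Row> agg = ds.groupBy(col("key")).agg(udaf.apply(col("value")));
        // HashAggregate in the plan means hash-based aggregation; SortAggregate
        // means the planner fell back to sort-based aggregation for the UDAF.
        agg.explain();
        spark.stop();
    }
}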
> Thus, the caller should copy the result before
> making another call if required.
>
> I think that is why you get a list of the same entries.
>
> So you may need to change it to:
>
> data.add(unboundedEncoder.toRow(input).copy());
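As a minimal illustration (not from the thread; the encoder, input values, and names are assumed), storing the result of toRow() without copy() would leave the list holding repeated references to the same reused row:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder;

public class CopyBeforeStoring {
    public static void main(String[] args) {
        // In Spark 2.x the built-in encoders are ExpressionEncoders under the hood.
        ExpressionEncoder<Integer> enc = (ExpressionEncoder<Integer>) Encoders.INT();
        List<InternalRow> data = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            // toRow() may return the same mutable row instance on every call,
            // so copy() before storing it; otherwise every list entry ends up
            // pointing at one row that holds only the last value written.
            data.add(enc.toRow(i).copy());
        }
        System.out.println(data);
    }
}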
>
>
>
> Andy Dang wrote
Regards,
Andy
On Fri, Jan 6, 2017 at 3:48 AM, Liang-Chi Hsieh wrote:
>
> Can you show how you use the encoder in your UDAF?
>
>
> Andy Dang wrote
> > One more question about the behavior of ExpressionEncoder.
> >
> > I have a UDAF that ha
Is this the expected behavior of Encoders?
---
Regards,
Andy
On Thu, Jan 5, 2017 at 10:55 AM, Andy Dang wrote:
> Perfect. The API in Java is a bit clumsy, though.
>
> What I ended up doing in Java (the val is from lombok, if anyone's
> wondering):
>
> resolveAndBind();
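Filling in the surrounding code, a hedged sketch of that pattern in plain Java (the schema and all names are assumed; the $default$ accessors are the methods Scala generates for resolveAndBind's default arguments, which Java cannot omit):

import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder;
import org.apache.spark.sql.catalyst.encoders.RowEncoder;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class InternalRowToRow {
    public static void main(String[] args) {
        StructType schema = new StructType().add("id", DataTypes.IntegerType);

        // RowEncoder gives an ExpressionEncoder<Row> for the schema.
        ExpressionEncoder<Row> enc = RowEncoder.apply(schema);

        // resolveAndBind() has Scala default arguments, so from Java we pass
        // the generated defaults explicitly -- the "clumsy" part.
        ExpressionEncoder<Row> bound = enc.resolveAndBind(
                enc.resolveAndBind$default$1(),
                enc.resolveAndBind$default$2());

        InternalRow internal = bound.toRow(RowFactory.create(1)); // Row -> InternalRow
        Row back = bound.fromRow(internal);                       // InternalRow -> Row
        System.out.println(back);
    }
}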
>
>
> Andy Dang wrote
> > Hi all,
> > (cc-ing dev since I've hit some developer API corner)
> >
> > What's the best way to convert an InternalRow to a Row if I've got an
> > InternalRow and the corresponding schema?
>
Hi all,
(cc-ing dev since I've hit some developer API corner)
What's the best way to convert an InternalRow to a Row if I've got an
InternalRow and the corresponding schema?
Code snippet:
@Test
public void foo() throws Exception {
Row row = RowFactory.create(1);
StructType
ply.
>
> If we download all the dependencies to a separate location and link them with
> the Spark job jar on the Spark cluster, is that the best way to execute a Spark job?
>
> Thanks.
>
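Not from the thread: one common alternative to an uber jar, assuming the dependency jars have already been staged at a path visible to the cluster, is to list them with --jars on spark-submit. The class name, paths and file names below are placeholders:

# Class, paths and file names are placeholders.
spark-submit \
  --class com.example.MyJob \
  --master yarn \
  --deploy-mode cluster \
  --jars /opt/spark-deps/dep-a.jar,/opt/spark-deps/dep-b.jar \
  my-job.jar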
> On Fri, Dec 23, 2016 at 8:34 PM, Andy Dang wrote:
>
>> I used to use uber jar in Spark 1.x because of
I used to use an uber jar in Spark 1.x because of classpath issues (we
couldn't re-model our dependencies based on our code, and thus the cluster's
runtime dependencies could be very different from running Spark directly in the
IDE). We had to use the userClasspathFirst "hack" to work around this.
With Spark 2, i
Hi all,
Today I hit a weird bug in Spark 2.0.2 (vanilla Spark) - the executor tab
shows a negative number of active tasks.
I have about 25 jobs, each with 20k tasks, so the numbers are not that crazy.
What could possibly be the cause of this bug? This is the first time I've seen
it and the only specia