[DISCUSS][SQL] Improve Performance of AggregationIterator

2020-07-28 Thread Chang Chen
Hi Spark Developers,

We are implementing a new TypedImperativeAggregate that would benefit from
batch-by-batch update and merge. At least in sort-based aggregation, we
can process inputs batch by batch.

Has anyone done a similar optimization?
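
To make the idea concrete, here is a minimal sketch in plain Python. It is
not Spark code: SumBuffer, update_batch, aggregate_sorted and batch_size are
hypothetical names used only for illustration. It assumes the input is
already sorted by grouping key, as in sort-based aggregation, and feeds each
group to the buffer in batches instead of one row at a time.

```python
from itertools import groupby
from operator import itemgetter


class SumBuffer:
    """Hypothetical aggregation buffer (illustration only, not a Spark class)."""

    def __init__(self):
        self.total = 0

    def update(self, value):
        # Row-at-a-time path: one call per input row.
        self.total += value

    def update_batch(self, values):
        # Batch path: one call per batch of rows.
        self.total += sum(values)


def aggregate_sorted(rows, batch_size=3):
    """Aggregate (key, value) rows that are already sorted by key."""
    result = {}
    for key, group in groupby(rows, key=itemgetter(0)):
        buf = SumBuffer()
        batch = []
        for _, value in group:
            batch.append(value)
            if len(batch) == batch_size:
                buf.update_batch(batch)
                batch = []
        if batch:
            # Flush the last, possibly smaller, batch of the group.
            buf.update_batch(batch)
        result[key] = buf.total
    return result


rows = [("a", 1), ("a", 2), ("a", 3), ("a", 4), ("b", 10), ("b", 20)]
totals = aggregate_sorted(rows)
```

Because the rows of a group arrive contiguously in sorted input, a batch
never mixes keys, which is why the sort-based path is the natural place to
try this.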


Re: Contributing to JIRA Maintenance

2020-07-28 Thread Sean Owen
Thanks for doing this - and I will say this is a great way for anyone
out there to contribute directly to the project. Issue trackers need
maintenance too. It's not that hard to spot basic problems with JIRAs
and request fixes, as a way to engage the reporter usefully.

I triage PRs but rarely look at JIRAs anymore, just because the volume
and noise level are higher. But it is important.

On Mon, Jul 27, 2020 at 10:12 PM Hyukjin Kwon wrote:
>
> Hi all,
>
> I would like to ask for some help with JIRA maintenance contributions in
> Apache Spark.
> I see fewer and fewer people active in JIRA maintenance.
>
> I have regularly checked all JIRAs and monitored them continuously for the
> last 4 years.
> Last week I didn't have time to take a look, and I was frustrated to find
> many JIRAs that clearly need action. Here are examples from the last week
> alone:
>
> Exact duplication:
> Resolve one and link it to the other as a duplicate.
> - https://issues.apache.org/jira/browse/SPARK-32370
> - https://issues.apache.org/jira/browse/SPARK-32369
>
> Different languages:
> Ask for an English translation, since English is what developers use to
> communicate.
> If the reporter is inactive, we can resolve it in the meantime.
> - https://issues.apache.org/jira/browse/SPARK-32355
>
> No JIRA description:
> Ask the reporter to fill in the JIRA description. Few people can tell what
> issue a JIRA describes just from reading the title, so nobody ends up able
> to work on it.
> - https://issues.apache.org/jira/browse/SPARK-32361
> - https://issues.apache.org/jira/browse/SPARK-32359
> - https://issues.apache.org/jira/browse/SPARK-32388
> - https://issues.apache.org/jira/browse/SPARK-32390
> - https://issues.apache.org/jira/browse/SPARK-32400
>
> Malformed image:
> If the attached image looks malformed to you, ask the reporter to fix it.
> - https://issues.apache.org/jira/browse/SPARK-32433
>
> Questions:
> Questions should usually go to the mailing list or Stack Overflow, per
> http://spark.apache.org/community.html
> - https://issues.apache.org/jira/browse/SPARK-32460
>
>
> There is clear guidance on JIRA maintenance under "Contributing to JIRA
> Maintenance" in http://spark.apache.org/contributing.html (thanks @Sean Owen
> for writing this).
> I hope to see more people involved and ask for some help with JIRA
> maintenance.
>
> FWIW, at least I, as a PMC member, monitor most of these JIRA maintenance
> contributions from the community and take them into account where
> appropriate.
>
>
> Thanks all in advance.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Contributing to JIRA Maintenance

2020-07-28 Thread Rohit Mishra
Hello All,

I have recently joined the Dev mailing list to help the community. Since I
am still trying to understand the code base before contributing, I think
looking into Jira maintenance will be a good way to help. I will start
looking into it. Do I need anyone's approval?

If I need help at the beginning, can I mail here, or is there a separate
mailing list for Jira maintenance?

One more small question: do we have a document that gives an overview of the
code structure for a newbie like me? I can create one if there isn't.

Thanks,
Rohit Mishra



Re: Contributing to JIRA Maintenance

2020-07-28 Thread Sean Owen
To help with JIRA, I don't think you need to know a lot about the code
structure. I think we're talking about more basic triage, like: Is it
a question that should go to the mailing list instead? Is there enough
detail to understand it at all? Is it tagged with a few appropriate
components, and does its affected version make sense? Finding duplicate
issues is hard but quite valuable if you can identify related issues
and mark them.

I can also tell you about using the JIRA Client to search for issues
that don't make much sense, like, open and targeting a released
version.

Actually I think anyone can modify issues in JIRA, so you don't need
special permission. You could consult with me or Hyukjin or dev@ after
making a few changes to check if they're on the right track.

iss...@spark.apache.org (IIRC) gets a copy of all the JIRA emails
about changes. I don't know if it's that useful to subscribe to.

Documenting the code structure - might be kind of hard in any detail,
but if you put together a doc that is useful and doesn't require a lot
of maintenance, that gives a good overview, we could consider adding
that to the developer docs.






Re: Contributing to JIRA Maintenance

2020-07-28 Thread Rohit Mishra
Thanks, Sean, for your detailed and valuable explanation. I will start
looking into it tomorrow and will reach out if needed.

Have a good day.

Regards,
Rohit Mishra


Re: Contributing to JIRA Maintenance

2020-07-28 Thread Hyukjin Kwon
Yeah, in my experience, contributing to JIRA maintenance does not require a
lot of code.

Just to share my own story:
Four years ago, when I was a contributor, I was looking for other ways to
contribute to Spark. I noticed Sean was making exceptional efforts in JIRA
maintenance - he monitored JIRAs basically 24/7. I started to make sustained
efforts and contributions there when he asked for some help on the dev
mailing list. I also did some code work, but my JIRA maintenance
contribution was also one of the important community activities.
This was appropriately considered and recognised by other PMC members.

On the commit bit: probably the ideal case is to have contributions balanced
across many aspects. But if somebody makes a lot of sustained efforts and
contributions in one aspect, that is also something we can take into
account. Yeah, I think Shane is a good example.

