[DISCUSS][SQL] Improve Performance of AggregationIterator
Hi Spark Developers,

We are implementing a new TypedImperativeAggregate that would benefit from updating or merging a whole batch at a time. At least in sort-based aggregation, the inputs can be processed batch by batch. Has anyone done the same optimization?
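To make the idea concrete, here is a minimal sketch of a row-at-a-time update next to a batch update in a sort-based aggregation. It is written in plain Python and does not use Spark: the class SumAggregate, its methods (create_buffer, update, update_batch, merge) and the helper sort_based_aggregate are hypothetical names chosen for illustration and are not the TypedImperativeAggregate API.

```python
from itertools import groupby
from operator import itemgetter


class SumAggregate:
    """Hypothetical aggregate with a mutable buffer; not Spark's actual API."""

    def create_buffer(self):
        # One-element list holding the running sum.
        return [0]

    def update(self, buffer, value):
        # Row-at-a-time path: called once per input row.
        buffer[0] += value
        return buffer

    def update_batch(self, buffer, values):
        # Batch path: called once per batch of rows, so the per-call
        # overhead is paid once instead of once per row.
        buffer[0] += sum(values)
        return buffer

    def merge(self, buffer, other):
        # Combine two partial buffers.
        buffer[0] += other[0]
        return buffer


def sort_based_aggregate(rows, agg):
    """Aggregate (key, value) rows; sorting makes each key's rows adjacent."""
    result = {}
    ordered = sorted(rows, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        # All rows of one key arrive together, so they can be handed
        # to the aggregate as a single batch.
        batch = [value for _, value in group]
        result[key] = agg.update_batch(agg.create_buffer(), batch)[0]
    return result


rows = [("a", 1), ("b", 10), ("a", 2), ("b", 20), ("a", 3)]
totals = sort_based_aggregate(rows, SumAggregate())
```

Because the sort places all rows of a key next to each other, each group is a natural batch. Whether this pays off in Spark would depend on how the aggregation iterator hands rows to the function, which this sketch does not model.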
Re: Contributing to JIRA Maintenance
Thanks for doing this - and I will say this is a great way for anyone
out there to contribute directly to the project. Issue trackers need
maintenance too. It's not that hard to spot basic problems with JIRAs
and request fixes, as a way to engage the reporter usefully.

I triage PRs but rarely look at JIRAs anymore, just because the volume
and noise level are higher. But it is important.

On Mon, Jul 27, 2020 at 10:12 PM Hyukjin Kwon wrote:
>
> Hi all,
>
> I would like to ask for some help with JIRA maintenance contributions in
> Apache Spark.
> I tend to see fewer and fewer people active in JIRA maintenance contributions.
>
> I have regularly checked all JIRAs and monitored them continuously for the
> last 4 years.
> For the last week, I didn't have time to take a look, and I felt frustrated
> that there are many JIRAs that clearly need action. Here are examples from
> the last week alone:
>
> Exact duplication:
>   Resolve one and link the other as a duplicate.
>   - https://issues.apache.org/jira/browse/SPARK-32370
>   - https://issues.apache.org/jira/browse/SPARK-32369
>
> Different languages:
>   Ask for an English translation, since English is what developers use to
>   communicate. If the reporter is inactive, we can resolve the issue until
>   then.
>   - https://issues.apache.org/jira/browse/SPARK-32355
>
> No JIRA description:
>   Ask the reporter to fill in the JIRA description. Few people can tell what
>   issue the JIRA describes from the title alone, so nobody ends up able to
>   work on it.
>   - https://issues.apache.org/jira/browse/SPARK-32361
>   - https://issues.apache.org/jira/browse/SPARK-32359
>   - https://issues.apache.org/jira/browse/SPARK-32388
>   - https://issues.apache.org/jira/browse/SPARK-32390
>   - https://issues.apache.org/jira/browse/SPARK-32400
>
> Malformed image:
>   If the attached image looks malformed to you, ask the reporter to fix it.
>   - https://issues.apache.org/jira/browse/SPARK-32433
>
> Questions:
>   Questions should usually go to the mailing list or Stack Overflow per
>   http://spark.apache.org/community.html
>   - https://issues.apache.org/jira/browse/SPARK-32460
>
>
> There is clear guidance about JIRA maintenance, "Contributing to JIRA
> Maintenance", in http://spark.apache.org/contributing.html (thanks @Sean Owen
> for writing this).
> I hope to see more people involved, and I am asking for some help with JIRA
> maintenance.
>
> FWIW, at least I, as a PMC member, monitor most of these JIRA maintenance
> contributions from the community and take them into account when and where
> appropriate.
>
>
> Thanks all in advance.

To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
Re: Contributing to JIRA Maintenance
Hello All,

I have recently joined the Dev mailing list to help the community. Since I am
still trying to understand the code base before contributing, I think looking
into Jira maintenance will be a good way to help. I will start looking into
it. Do I need anyone's approval?

In case I need help in the beginning, can I mail here, or is there a separate
mailing address for Jira maintenance?

Just a trivial question: do we have any document that gives an overview of the
code structure for a newbie like me? I can create one if there isn't any.

Thanks,
Rohit Mishra

On Tue, 28 Jul 2020 at 6:46 PM, Sean Owen wrote:
> Thanks for doing this - and I will say this is a great way for anyone
> out there to contribute directly to the project. Issue trackers need
> maintenance too. It's not that hard to spot basic problems with JIRAs
> and request fixes, as a way to engage the reporter usefully.
>
> I triage PRs but rarely look at JIRAs anymore, just because the volume
> and noise level are higher. But it is important.
Re: Contributing to JIRA Maintenance
To help with JIRA, I don't think you need to know a lot about the code
structure. I think we're talking about more basic triage, like: Is it
a question that should go to the mailing list instead? Is there enough
detail to understand it at all? Is it tagged with a few appropriate
components? Does its affected version make sense? Finding duplicate
issues is hard but quite valuable if you can identify related issues
and mark them.

I can also tell you about using the JIRA Client to search for issues
that don't make much sense, like issues that are open and targeting a
released version.

Actually I think anyone can modify issues in JIRA, so you don't need
special permission. You could consult with me or Hyukjin or dev@ after
making a few changes to check if they're on the right track.

iss...@spark.apache.org (IIRC) gets a copy of all the JIRA emails
about changes. I don't know if it's that useful to subscribe to.

Documenting the code structure might be kind of hard in any detail,
but if you put together a doc that is useful, gives a good overview,
and doesn't require a lot of maintenance, we could consider adding
it to the developer docs.

On Tue, Jul 28, 2020 at 12:16 PM Rohit Mishra wrote:
>
> Hello All,
>
> I have recently joined the Dev mailing list to help the community. Since I am
> still trying to understand the code base before contributing, I think looking
> into Jira maintenance will be a good way to help. I will start looking into
> it. Do I need anyone's approval?
>
> In case I need help in the beginning, can I mail here, or is there a separate
> mailing address for Jira maintenance?
>
> Just a trivial question: do we have any document that gives an overview of the
> code structure for a newbie like me? I can create one if there isn't any.
>
> Thanks,
> Rohit Mishra
Re: Contributing to JIRA Maintenance
Thanks, Sean, for your detailed and valuable explanation. I will start looking
into it tomorrow and will reach out if required.

Have a good day.

Regards,
Rohit Mishra

On Tue, 28 Jul 2020 at 11:20 PM, Sean Owen wrote:
> To help with JIRA, I don't think you need to know a lot about the code
> structure. I think we're talking about more basic triage, like: Is it
> a question that should go to the mailing list instead? Is there enough
> detail to understand it at all? Is it tagged with a few appropriate
> components? Does its affected version make sense? Finding duplicate
> issues is hard but quite valuable if you can identify related issues
> and mark them.
>
> I can also tell you about using the JIRA Client to search for issues
> that don't make much sense, like issues that are open and targeting a
> released version.
>
> Actually I think anyone can modify issues in JIRA, so you don't need
> special permission. You could consult with me or Hyukjin or dev@ after
> making a few changes to check if they're on the right track.
>
> iss...@spark.apache.org (IIRC) gets a copy of all the JIRA emails
> about changes. I don't know if it's that useful to subscribe to.
>
> Documenting the code structure might be kind of hard in any detail,
> but if you put together a doc that is useful, gives a good overview,
> and doesn't require a lot of maintenance, we could consider adding
> it to the developer docs.
Re: Contributing to JIRA Maintenance
Yeah, in my experience, contributing to JIRA maintenance does not require
much coding.

Just to share my own story: four years ago, when I was one of the
contributors, I was looking for other ways to contribute to Spark. I noticed
Sean was making exceptional efforts in JIRA maintenance - he monitored JIRAs
basically 24/7. I started to make sustained efforts and contributions there
when he asked for help on the dev mailing list.

I also did some code work, but my JIRA maintenance contributions were also an
important community activity. This was appropriately considered and
recognised by the other PMC members with the commit bit.

Probably the ideal case is to have contributions in balance across many
aspects. But if somebody makes a lot of sustained efforts and contributions
in one aspect, that can also be a case we take into account. Yeah, I think
Shane is a good example.

On Wed, Jul 29, 2020 at 2:57 AM, Rohit Mishra wrote:
> Thanks, Sean, for your detailed and valuable explanation. I will start
> looking into it tomorrow and will reach out if required.
>
> Have a good day.
>
> Regards,
> Rohit Mishra