Re: Exposing Spark parallelized directory listing & non-locality listing in core

2020-07-23 Thread Steve Loughran
On Wed, 22 Jul 2020 at 18:50, Holden Karau  wrote:

> Wonderful. To be clear the patch is more to start the discussion about how
> we want to do it and less what I think is the right way.
>
>
be happy to give a quick online tour of ongoing work on S3A enhancements
some time next week, get feedback


Re: Exposing Spark parallelized directory listing & non-locality listing in core

2020-07-23 Thread Holden Karau
Awesome that sounds great :)

On Thu, Jul 23, 2020 at 3:43 AM Steve Loughran  wrote:

>
>
> On Wed, 22 Jul 2020 at 18:50, Holden Karau  wrote:
>
>> Wonderful. To be clear the patch is more to start the discussion about
>> how we want to do it and less what I think is the right way.
>>
>>
> be happy to give a quick online tour of ongoing work on S3A enhancements
> some time next week, get feedback
>
>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [DISCUSS] Amend the commiter guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-23 Thread Imran Rashid
Sure, that sounds good to me.  +1

On Wed, Jul 22, 2020 at 1:50 PM Holden Karau  wrote:

>
>
> On Wed, Jul 22, 2020 at 7:39 AM Imran Rashid < iras...@apache.org > wrote:
>
>> Hi Holden,
>>
>> thanks for leading this discussion, I'm in favor in general.  I have one
>> specific question -- these two sections seem to contradict each other
>> slightly:
>>
>> > If there is a -1 from a non-committer, multiple committers or the PMC
>> should be consulted before moving forward.
>> >
>> >If the original person who cast the veto can not be reached in a
>> reasonable time frame given likely holidays, it is up to the PMC to decide
>> the next steps within the guidelines of the ASF. This must be decided by a
>> consensus vote under the ASF voting rules.
>>
>> I think the intent here is that if a *committer* gives a -1, then the PMC
>> has to have a consensus vote?  And if a non-committer gives a -1, then
>> multiple committers should be consulted?  How about combining those two
>> into something like
>>
>> "All -1s with justification merit discussion.  A -1 from a non-committer
>> can be overridden only with input from multiple committers.  A -1 from a
>> committer requires a consensus vote of the PMC under ASF voting rules".
>>
> I can work with that although it wasn’t quite what I was originally going
> for. I didn’t intend to have committer -1s be eligible for override. I
> believe committers have demonstrated sufficient merit; they are the same as
> PMC member -1s in our project.
>
> My aim was just if something weird happens (like say I had a pending -1
> before my motorcycle crash last year) we go to the PMC and take a binding
> vote on what to do, and most likely someone on the PMC will reach out to
> the ASF for understanding around the guidelines.
>
> What about:
>
> All -1s with justification merit discussion.  A -1 from a non-committer
> can be overridden only with input from multiple committers and suitable
> time for any committer to raise concerns.  A -1 from a committer who can
> not be reached requires a consensus vote of the PMC under ASF voting rules
> to determine the next steps within the ASF guidelines for vetoes.
>
>>
>>
>> thanks,
>> Imran
>>
>>
>> On Tue, Jul 21, 2020 at 3:41 PM Holden Karau 
>> wrote:
>>
>>> Hi Spark Developers,
>>>
>>> There has been a rather active discussion regarding the specific vetoes
>>> that occurred during Spark 3. From that I believe we are now mostly in
>>> agreement that it would be best to clarify our rules around code vetoes &
>>> merging in general. Personally I believe this change is important to help
>>> improve the appearance of a level playing field in the project.
>>>
>>> Once discussion settles I'll run this by a copy editor, my grammar isn't
>>> amazing, and bring forward for a vote.
>>>
>>> The current Spark committer guide is at
>>> https://spark.apache.org/committers.html. I am proposing we add a section
>>> on when it is OK to merge PRs directly above the section on how to merge
>>> PRs. The text I am proposing to amend our committer guidelines with is:
>>>
>>> PRs shall not be merged during active on topic discussion except for
>>> issues like critical security fixes of a public vulnerability. Under
>>> extenuating circumstances PRs may be merged during active off topic
>>> discussion and the discussion directed to a more appropriate venue. Time
>>> should be given prior to merging for those involved with the conversation
>>> to explain if they believe they are on topic.
>>>
>>> Lazy consensus requires giving time for discussion to settle, while
>>> understanding that people may not be working on Spark as their full time
>>> job and may take holidays. It is believed that by doing this we can limit
>>> how often people feel the need to exercise their veto.
>>>
>>> For the purposes of a -1 on code changes, a qualified voter includes all
>>> PMC members and committers in the project. For a -1 to be a valid veto it
>>> must include a technical reason. The reason can include things like the
>>> change may introduce a maintenance burden or is not the direction of Spark.
>>>
>>> If there is a -1 from a non-committer, multiple committers or the PMC
>>> should be consulted before moving forward.
>>>
>>>
>>> If the original person who cast the veto can not be reached in a
>>> reasonable time frame given likely holidays, it is up to the PMC to decide
>>> the next steps within the guidelines of the ASF. This must be decided by a
>>> consensus vote under the ASF voting rules.
>>>
>>> These policies serve to reiterate the core principle that code must not
>>> be merged with a pending veto or before a consensus has been reached (lazy
>>> or otherwise).
>>>
>>> It is the PMC’s hope that vetoes continue to be infrequent, and when
>>> they occur all parties take the time to build consensus prior to additional
>>> feature work.
>>>
>>>
>>> Being a committer means exercising your judgement, while working in a
>>> community with diverse views. There is nothing wrong in get

Re: [DISCUSS] Amend the commiter guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-23 Thread Mridul Muralidharan
Thanks Holden, this version looks good to me.
+1

Regards,
Mridul


On Thu, Jul 23, 2020 at 3:56 PM Imran Rashid  wrote:

> Sure, that sounds good to me.  +1
>
> On Wed, Jul 22, 2020 at 1:50 PM Holden Karau  wrote:
>
>>
>>
>> On Wed, Jul 22, 2020 at 7:39 AM Imran Rashid < iras...@apache.org >
>> wrote:
>>
>>> Hi Holden,
>>>
>>> thanks for leading this discussion, I'm in favor in general.  I have one
>>> specific question -- these two sections seem to contradict each other
>>> slightly:
>>>
>>> > If there is a -1 from a non-committer, multiple committers or the PMC
>>> should be consulted before moving forward.
>>> >
>>> >If the original person who cast the veto can not be reached in a
>>> reasonable time frame given likely holidays, it is up to the PMC to decide
>>> the next steps within the guidelines of the ASF. This must be decided by a
>>> consensus vote under the ASF voting rules.
>>>
>>> I think the intent here is that if a *committer* gives a -1, then the
>>> PMC has to have a consensus vote?  And if a non-committer gives a -1, then
>>> multiple committers should be consulted?  How about combining those two
>>> into something like
>>>
>>> "All -1s with justification merit discussion.  A -1 from a non-committer
>>> can be overridden only with input from multiple committers.  A -1 from a
>>> committer requires a consensus vote of the PMC under ASF voting rules".
>>>
>> I can work with that although it wasn’t quite what I was originally going
>> for. I didn’t intend to have committer -1s be eligible for override. I
>> believe committers have demonstrated sufficient merit; they are the same as
>> PMC member -1s in our project.
>>
>> My aim was just if something weird happens (like say I had a pending -1
>> before my motorcycle crash last year) we go to the PMC and take a binding
>> vote on what to do, and most likely someone on the PMC will reach out to
>> the ASF for understanding around the guidelines.
>>
>> What about:
>>
>> All -1s with justification merit discussion.  A -1 from a non-committer
>> can be overridden only with input from multiple committers and suitable
>> time for any committer to raise concerns.  A -1 from a committer who can
>> not be reached requires a consensus vote of the PMC under ASF voting rules
>> to determine the next steps within the ASF guidelines for vetoes.
>>
>>>
>>>
>>> thanks,
>>> Imran
>>>
>>>
>>> On Tue, Jul 21, 2020 at 3:41 PM Holden Karau 
>>> wrote:
>>>
>>>> Hi Spark Developers,
>>>>
>>>> There has been a rather active discussion regarding the specific vetoes
>>>> that occurred during Spark 3. From that I believe we are now mostly in
>>>> agreement that it would be best to clarify our rules around code vetoes &
>>>> merging in general. Personally I believe this change is important to help
>>>> improve the appearance of a level playing field in the project.
>>>>
>>>> Once discussion settles I'll run this by a copy editor, my grammar
>>>> isn't amazing, and bring forward for a vote.
>>>>
>>>> The current Spark committer guide is at
>>>> https://spark.apache.org/committers.html. I am proposing we add a section
>>>> on when it is OK to merge PRs directly above the section on how to merge
>>>> PRs. The text I am proposing to amend our committer guidelines with is:
>>>>
>>>> PRs shall not be merged during active on topic discussion except for
>>>> issues like critical security fixes of a public vulnerability. Under
>>>> extenuating circumstances PRs may be merged during active off topic
>>>> discussion and the discussion directed to a more appropriate venue. Time
>>>> should be given prior to merging for those involved with the conversation
>>>> to explain if they believe they are on topic.
>>>>
>>>> Lazy consensus requires giving time for discussion to settle, while
>>>> understanding that people may not be working on Spark as their full time
>>>> job and may take holidays. It is believed that by doing this we can limit
>>>> how often people feel the need to exercise their veto.
>>>>
>>>> For the purposes of a -1 on code changes, a qualified voter includes
>>>> all PMC members and committers in the project. For a -1 to be a valid veto
>>>> it must include a technical reason. The reason can include things like the
>>>> change may introduce a maintenance burden or is not the direction of Spark.
>>>>
>>>> If there is a -1 from a non-committer, multiple committers or the PMC
>>>> should be consulted before moving forward.
>>>>
>>>>
>>>> If the original person who cast the veto can not be reached in a
>>>> reasonable time frame given likely holidays, it is up to the PMC to decide
>>>> the next steps within the guidelines of the ASF. This must be decided by a
>>>> consensus vote under the ASF voting rules.
>>>>
>>>> These policies serve to reiterate the core principle that code must not
>>>> be merged with a pending veto or before a consensus has been reached (lazy
>>>> or otherwise).
>>>>
>>>> It is the PMC’s hope that vetoes continue to be infrequent, and when
>>>> they o

Re: [PSA] Apache Spark uses GitHub Actions to run the tests

2020-07-23 Thread Imran Rashid
Thanks for setting this up, Hyukjin.

How do you re-trigger tests in GitHub Actions?  E.g., there is a failure that
appears to be some random infra thing or a flaky test, or maybe the tests
were just run a while back so you want to get a fresh batch of tests.  I
think the old "Jenkins, retest this please" will still run tests via
Jenkins, and so does the button on spark-prs.appspot.com?

From a tiny bit of searching, it looks like we might need to add the
"workflow_dispatch" trigger to the GitHub Actions configuration, as indicated here?
https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/
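
For illustration, here is a minimal Python sketch of how a manual run could
be requested through GitHub's REST endpoint for workflow dispatch. It assumes
the workflow already declares the "workflow_dispatch" trigger and that the
caller holds a token with write access to the repo. The workflow file name
and token used in the usage comment are placeholders, not actual Spark
settings.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_dispatch_request(owner, repo, workflow_file, ref, token):
    """Build (but do not send) the POST request that asks GitHub to run a workflow.

    workflow_file is the workflow's file name (or numeric id); ref is the
    branch or tag the run should use.
    """
    url = (
        f"{API_ROOT}/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def dispatch(request):
    """Send the request; GitHub answers 204 No Content when the run is queued."""
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical usage (placeholder workflow file name and token):
# req = build_dispatch_request("apache", "spark", "example.yml", "master", "TOKEN")
# dispatch(req)
```

Building the request separately from sending it makes it easy to inspect the
URL and payload before anything touches the real repository.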

thanks,
Imran

On Tue, Jul 14, 2020 at 1:18 AM Hyukjin Kwon  wrote:

> Hi dev,
>
> A GitHub Actions build was introduced to run the regular Spark test cases in
> https://github.com/apache/spark/pull/29057 and
> https://github.com/apache/spark/pull/29086.
> It is virtually a duplicate of the default Jenkins PR builder at this
> moment.
>
> The only differences are:
> - GitHub Actions does not run the tests for Kinesis, see SPARK-32246
> - GitHub Actions does not support other profiles such as JDK 11 or Hive
> 1.2, see SPARK-32255
> - The Jenkins build does not run the Java documentation build, see SPARK-32233
> - The Jenkins build does not run the dependency test, see SPARK-32178
>
> Therefore, I believe PRs can be merged in most cases once either the
> Jenkins PR builder or the GitHub Actions build passes, provided we would
> expect the default Jenkins PR builder to succeed as well.
>
> Thanks.
>


Re: [PSA] Apache Spark uses GitHub Actions to run the tests

2020-07-23 Thread Hyukjin Kwon
Ah, it doesn’t have an integration with the PR dashboard for now.
Imran, have you set this up?
Once you have write access directly to the GitHub repo, I think you
will be able to see the “Re-run jobs” button:

[image: Screen Shot 2020-07-24 at 9.40.03 AM.png]

Ideally the PR author should be able to rerun it too, but that does not
seem to be possible, as with some other CIs.
For now, authors can simply push an empty commit or rebase to retrigger.
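
As a rough sketch of the empty-commit approach, the Python helper below
builds the two git commands involved and, unless run in dry-run mode,
executes them in the current checkout. The remote name "origin" and the
commit message are assumptions and should be adjusted to the fork and
branch the PR was opened from.

```python
import subprocess


def retrigger_commands(remote="origin", message="Retrigger CI"):
    """Return the git commands that create and push an empty commit."""
    return [
        # --allow-empty lets git record a commit with no file changes.
        ["git", "commit", "--allow-empty", "-m", message],
        # Push the current branch to the same-named branch on the remote.
        ["git", "push", remote, "HEAD"],
    ]


def retrigger(dry_run=True, **kwargs):
    """Print the commands (dry run) or execute them in the current repo."""
    commands = retrigger_commands(**kwargs)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


# Hypothetical usage from inside the PR branch checkout:
# retrigger(dry_run=False, remote="origin")
```

The dry-run default only prints the commands, so nothing is committed or
pushed until dry_run=False is passed explicitly.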



On Fri, Jul 24, 2020 at 7:10 AM Imran Rashid wrote:

> Thanks for setting this up, Hyukjin.
>
> How do you re-trigger tests in GitHub Actions?  E.g., there is a failure
> that appears to be some random infra thing or a flaky test, or maybe the
> tests were just run a while back so you want to get a fresh batch of
> tests.  I think the old "Jenkins, retest this please" will still run tests
> via Jenkins, and so does the button on spark-prs.appspot.com?
>
> From a tiny bit of searching, it looks like we might need to add the
> "workflow_dispatch" trigger to the GitHub Actions configuration, as indicated here?
> https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/
>
> thanks,
> Imran
>
> On Tue, Jul 14, 2020 at 1:18 AM Hyukjin Kwon  wrote:
>
>> Hi dev,
>>
>> A GitHub Actions build was introduced to run the regular Spark test cases
>> in https://github.com/apache/spark/pull/29057 and
>> https://github.com/apache/spark/pull/29086.
>> It is virtually a duplicate of the default Jenkins PR builder at this
>> moment.
>>
>> The only differences are:
>> - GitHub Actions does not run the tests for Kinesis, see SPARK-32246
>> - GitHub Actions does not support other profiles such as JDK 11 or Hive
>> 1.2, see SPARK-32255
>> - The Jenkins build does not run the Java documentation build, see SPARK-32233
>> - The Jenkins build does not run the dependency test, see SPARK-32178
>>
>> Therefore, I believe PRs can be merged in most cases once either the
>> Jenkins PR builder or the GitHub Actions build passes, provided we would
>> expect the default Jenkins PR builder to succeed as well.
>>
>> Thanks.
>>
>