Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-13 Thread shane knapp ☠
alright, the system load graphs show that we've had a generally decreasing
load since friday, and have burned through ~3k builds/day since the reboot
last week!  i don't see many timeouts, and the PRB builds have been
generally green for a couple of days.

again, i will keep an eye on things but i feel we're out of the woods right
now.  :)

shane

On Fri, Jul 10, 2020 at 3:43 PM Frank Yin  wrote:

> Great. Thanks.
>
> On Fri, Jul 10, 2020 at 3:39 PM shane knapp ☠  wrote:
>
>> no, 8 hours is plenty.  things will speed up soon once the backlog of
>> builds works through.  i limited the number of PRB builds to 4 per
>> worker, and things are looking better.  let's see how we look next week.
>>
>> On Fri, Jul 10, 2020 at 3:31 PM Frank Yin  wrote:
>>
>>> Can we also increase the build timeout?
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125617
>>> This one fails because it times out, not because of test failures.
>>>
>>> On Fri, Jul 10, 2020 at 2:16 PM Frank Yin  wrote:
>>>
 Yeah, that's what I figured -- those workers are under load. Thanks.

 On Fri, Jul 10, 2020 at 12:43 PM shane knapp ☠ 
 wrote:

> only 125561, 125562 and 125564 were impacted by -9.
>
> 125565 exited w/code 143 (128 + 15, i.e. SIGTERM), which means the process
> was terminated externally, but by what exactly is unknown.
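
A quick aside on the exit-status arithmetic above: a POSIX shell reports a
signal-terminated child as 128 + the signal number, so a status of 143 maps to
signal 15 (SIGTERM) and 137 maps to signal 9 (SIGKILL). A minimal Python sketch
of that decoding, assuming a POSIX system:

    import signal

    def describe_exit_status(status: int) -> str:
        """Interpret an exit status as reported by a POSIX shell ($?)."""
        if status > 128:
            signum = status - 128                 # e.g. 143 - 128 = 15
            name = signal.Signals(signum).name    # e.g. SIGTERM
            return "terminated by signal %d (%s)" % (signum, name)
        return "exited normally with code %d" % status

    print(describe_exit_status(143))  # terminated by signal 15 (SIGTERM)
    print(describe_exit_status(137))  # terminated by signal 9 (SIGKILL), the 'kill -9' case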
>
> 125563 looks like mima failed due to a bunch of errors.
>
> i just spot checked a bunch of recent failed PRB builds from today and
> they all seemed to be legit.
>
> another thing that might be happening is an overload of PRB builds on
> the workers due to the backlog...  the workers are under a LOT of load
> right now, and i can put some rate limiting in to see if that helps out.
>
> shane
>
> On Fri, Jul 10, 2020 at 11:31 AM Frank Yin 
> wrote:
>
>> Like from build number 125565 to 125561, all impacted by kill -9.
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125564/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125563/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125562/console
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125561/console
>>
>> On Fri, Jul 10, 2020 at 9:35 AM shane knapp ☠ 
>> wrote:
>>
>>> define "a lot" and provide some links to those builds, please.
>>> there are roughly 2000 builds per day, and i can't do more than keep a
>>> cursory eye on things.
>>>
>>> the infrastructure that the tests run on hasn't changed one bit on any of
>>> the workers, and 'kill -9' could be a timeout, flakiness caused by old
>>> build processes remaining on the workers after the master went down, or me
>>> trying to clean things up w/o a reboot.  or, perhaps, something wrong
>>> w/the infra.  :)
>>>
>>> On Fri, Jul 10, 2020 at 9:28 AM Frank Yin 
>>> wrote:
>>>
 Agree, but I’ve seen a lot of kills by signal 9 -- assuming that’s the
 infrastructure?

 On Fri, Jul 10, 2020 at 8:19 AM shane knapp ☠ 
 wrote:

> yeah, i can't do much for flaky tests...  just flaky
> infrastructure.
>
>
> On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon 
> wrote:
>
>> A couple of flaky tests can happen; that's usual. It seems to have gotten
>> better now, at least. I will keep monitoring the builds.
>>
>> On Fri, Jul 10, 2020 at 4:33 PM, ukby1234 wrote:
>>
>>> Looks like Jenkins still isn't stable. My PR has failed twice in a row:
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport
>>>
>>>
>>>
>>>
>>>
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>

>>>
>>> --
>>> Shane Knapp
>>> Computer Guy / Voice of Reason
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>>>
>>
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>

>>
>> --
>> Shane Knapp
>> Computer Guy / Voice of Reason
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-13 Thread Xiao Li
Thank you very much, Shane!

Xiao

On Mon, Jul 13, 2020 at 10:15 AM shane knapp ☠  wrote:

> alright, the system load graphs show that we've had a generally decreasing
> load since friday, and have burned through ~3k builds/day since the reboot
> last week!  i don't see many timeouts, and the PRB builds have been
> generally green for a couple of days.
>
> again, i will keep an eye on things but i feel we're out of the woods
> right now.  :)
>
> shane
>
> On Fri, Jul 10, 2020 at 3:43 PM Frank Yin  wrote:
>
>> Great. Thanks.
>>
>> On Fri, Jul 10, 2020 at 3:39 PM shane knapp ☠ 
>> wrote:
>>
>>> no, 8 hours is plenty.  things will speed up soon once the backlog of
>>> builds works through.  i limited the number of PRB builds to 4 per
>>> worker, and things are looking better.  let's see how we look next week.
>>>
>>> On Fri, Jul 10, 2020 at 3:31 PM Frank Yin  wrote:
>>>
 Can we also increase the build timeout?

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125617
 This one fails because it times out, not because of test failures.

 On Fri, Jul 10, 2020 at 2:16 PM Frank Yin  wrote:

> Yeah, that's what I figured -- those workers are under load. Thanks.
>
> On Fri, Jul 10, 2020 at 12:43 PM shane knapp ☠ 
> wrote:
>
>> only 125561, 125562 and 125564 were impacted by -9.
>>
>> 125565 exited w/code 143 (128 + 15, i.e. SIGTERM), which means the process
>> was terminated externally, but by what exactly is unknown.
>>
>> 125563 looks like mima failed due to a bunch of errors.
>>
>> i just spot checked a bunch of recent failed PRB builds from today
>> and they all seemed to be legit.
>>
>> another thing that might be happening is an overload of PRB builds on
>> the workers due to the backlog...  the workers are under a LOT of load
>> right now, and i can put some rate limiting in to see if that helps out.
>>
>> shane
>>
>> On Fri, Jul 10, 2020 at 11:31 AM Frank Yin 
>> wrote:
>>
>>> Like from build number 125565 to 125561, all impacted by kill -9.
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125564/console
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125563/console
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125562/console
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125561/console
>>>
>>> On Fri, Jul 10, 2020 at 9:35 AM shane knapp ☠ 
>>> wrote:
>>>
 define "a lot" and provide some links to those builds, please.
 there are roughly 2000 builds per day, and i can't do more than keep a
 cursory eye on things.

 the infrastructure that the tests run on hasn't changed one bit on any of
 the workers, and 'kill -9' could be a timeout, flakiness caused by old build
 processes remaining on the workers after the master went down, or me trying
 to clean things up w/o a reboot.  or, perhaps, something wrong w/the
 infra.  :)

 On Fri, Jul 10, 2020 at 9:28 AM Frank Yin 
 wrote:

> Agree, but I’ve seen a lot of kills by signal 9 -- assuming that’s the
> infrastructure?
>
> On Fri, Jul 10, 2020 at 8:19 AM shane knapp ☠ 
> wrote:
>
>> yeah, i can't do much for flaky tests...  just flaky
>> infrastructure.
>>
>>
>> On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon <
>> gurwls...@gmail.com> wrote:
>>
>>> A couple of flaky tests can happen; that's usual. It seems to have gotten
>>> better now, at least. I will keep monitoring the builds.
>>>
>>> On Fri, Jul 10, 2020 at 4:33 PM, ukby1234 wrote:
>>>
 Looks like Jenkins still isn't stable. My PR has failed twice in a row:

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport





>>
>> --
>> Shane Knapp
>> Computer Guy / Voice of Reason
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>

 --
 Shane Knapp
 Computer Guy / Voice of Reason
 UC Berkeley EECS Research / RISELab Staff Technical Lead
>>

Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-13 Thread Hyukjin Kwon
Thank you all. Python 2, 3.4 and 3.5 are dropped now in the master branch
at https://github.com/apache/spark/pull/28957

On Fri, Jul 3, 2020 at 10:01 AM, Hyukjin Kwon wrote:

> Thanks Dongjoon. That makes much more sense now!
>
> On Fri, Jul 3, 2020 at 12:11 AM, Dongjoon Hyun wrote:
>
>> Thank you, Hyukjin.
>>
>> According to the Python community, Python 3.5 also reaches EOL on 2020-09-13
>> (only two months left).
>>
>> - https://www.python.org/downloads/
>>
>> So, targeting live Python versions at Apache Spark 3.1.0 (December 2020)
>> looks reasonable to me.
>>
>> For old Python versions, we still have Apache Spark 2.4 LTS, and Apache
>> Spark 3.0.x will also work.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Wed, Jul 1, 2020 at 10:50 PM Yuanjian Li 
>> wrote:
>>
>>> +1, especially Python 2
>>>
>>> On Thu, Jul 2, 2020 at 10:20 AM, Holden Karau wrote:
>>>
 I’m ok with us dropping Python 2, 3.4, and 3.5 from Spark 3.1 forward. It
 will be exciting to get to use more recent Python features. The most recent
 Ubuntu LTS ships with 3.7, and while the previous LTS ships with 3.5, if
 folks really can’t upgrade there’s conda.

 Is there anyone with a large Python 3.5 fleet who can’t use conda?

 On Wed, Jul 1, 2020 at 7:15 PM Hyukjin Kwon 
 wrote:

> Yeah, sure. It will be dropped from Spark 3.1 onwards. I don't think we
> should make such changes in maintenance releases.
>
> On Thu, Jul 2, 2020 at 11:13 AM, Holden Karau wrote:
>
>> To be clear, the plan is to drop them from Spark 3.1 onwards, yes?
>>
>> On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon 
>> wrote:
>>
>>> Hi all,
>>>
>>> I would like to discuss dropping deprecated Python versions 2, 3.4
>>> and 3.5 at https://github.com/apache/spark/pull/28957. I assume
>>> people support it in general
>>> but I am writing this to make sure everybody is happy.
>>>
>>> Fokko made a very good investigation on it, see
>>> https://github.com/apache/spark/pull/28957#issuecomment-652022449.
>>> Assuming from the statistics, I think we're pretty safe to drop them.
>>> Also note that dropping Python 2 was actually declared at
>>> https://python3statement.org/
>>>
>>> Roughly speaking, there are several main advantages to dropping them:
>>>   1. It removes a bunch of hacks we added, around 700 lines in
>>> PySpark.
>>>   2. PyPy2 has a critical bug that causes a flaky test
>>> (https://issues.apache.org/jira/browse/SPARK-28358), given my testing
>>> and investigation.
>>>   3. Users can use Python type hints with Pandas UDFs without
>>> thinking about the Python version (see the sketch after this email).
>>>   4. Users can leverage the latest cloudpickle,
>>> https://github.com/apache/spark/pull/28950. With Python 3.8+ it can
>>> also leverage the C pickle implementation.
>>>   5. ...
>>>
>>> So it benefits both users and dev. WDYT guys?
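
To illustrate item 3 above: a minimal sketch of a pandas UDF whose input and
output types are declared with Python type hints, which only becomes practical
once Python 2 is out of the picture. This assumes pyspark >= 3.0 and pandas
are installed; the function and column names are made up for the example.

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.master("local[2]").appName("typed-pandas-udf").getOrCreate()

    # The Series -> Series type hints describe the UDF's behaviour; no separate
    # functionType argument is needed.
    @pandas_udf("long")
    def plus_one(s: pd.Series) -> pd.Series:
        return s + 1

    df = spark.range(5)
    df.select(plus_one(df.id).alias("id_plus_one")).show()
    spark.stop()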
>>>
>>>
>>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
> --
 Twitter: https://twitter.com/holdenkarau
 Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9  
 YouTube Live Streams: https://www.youtube.com/user/holdenkarau

>>>


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-13 Thread Holden Karau
Awesome, thank you for driving this forward :)

On Mon, Jul 13, 2020 at 7:25 PM Hyukjin Kwon  wrote:

> Thank you all. Python 2, 3.4 and 3.5 are dropped now in the master branch
> at https://github.com/apache/spark/pull/28957
>
> On Fri, Jul 3, 2020 at 10:01 AM, Hyukjin Kwon wrote:
>
>> Thanks Dongjoon. That makes much more sense now!
>>
>> On Fri, Jul 3, 2020 at 12:11 AM, Dongjoon Hyun wrote:
>>
>>> Thank you, Hyukjin.
>>>
>>> According to the Python community, Python 3.5 also reaches EOL on 2020-09-13
>>> (only two months left).
>>>
>>> - https://www.python.org/downloads/
>>>
>>> So, targeting live Python versions at Apache Spark 3.1.0 (December 2020)
>>> looks reasonable to me.
>>>
>>> For old Python versions, we still have Apache Spark 2.4 LTS, and Apache
>>> Spark 3.0.x will also work.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>> On Wed, Jul 1, 2020 at 10:50 PM Yuanjian Li 
>>> wrote:
>>>
 +1, especially Python 2

 On Thu, Jul 2, 2020 at 10:20 AM, Holden Karau wrote:

> I’m ok with us dropping Python 2, 3.4, and 3.5 from Spark 3.1 forward.
> It will be exciting to get to use more recent Python features. The most
> recent Ubuntu LTS ships with 3.7, and while the previous LTS ships with
> 3.5, if folks really can’t upgrade there’s conda.
>
> Is there anyone with a large Python 3.5 fleet who can’t use conda?
>
> On Wed, Jul 1, 2020 at 7:15 PM Hyukjin Kwon 
> wrote:
>
>> Yeah, sure. It will be dropped from Spark 3.1 onwards. I don't think we
>> should make such changes in maintenance releases.
>>
>> On Thu, Jul 2, 2020 at 11:13 AM, Holden Karau wrote:
>>
>>> To be clear, the plan is to drop them from Spark 3.1 onwards, yes?
>>>
>>> On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon 
>>> wrote:
>>>
 Hi all,

 I would like to discuss dropping deprecated Python versions 2, 3.4
 and 3.5 at https://github.com/apache/spark/pull/28957. I assume
 people support it in general
 but I am writing this to make sure everybody is happy.

 Fokko made a very good investigation on it, see
 https://github.com/apache/spark/pull/28957#issuecomment-652022449.
 Assuming from the statistics, I think we're pretty safe to drop
 them.
 Also note that dropping Python 2 was actually declared at
 https://python3statement.org/

 Roughly speaking, there are several main advantages to dropping them:
   1. It removes a bunch of hacks we added, around 700 lines in
 PySpark.
   2. PyPy2 has a critical bug that causes a flaky test
 (https://issues.apache.org/jira/browse/SPARK-28358), given my testing
 and investigation.
   3. Users can use Python type hints with Pandas UDFs without
 thinking about the Python version.
   4. Users can leverage the latest cloudpickle,
 https://github.com/apache/spark/pull/28950. With Python 3.8+ it
 can also leverage the C pickle implementation.
   5. ...

 So it benefits both users and dev. WDYT guys?


 --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


[PSA] Python 2, 3.4 and 3.5 are now dropped

2020-07-13 Thread Hyukjin Kwon
I am sending another email to make sure dev people know. Python 2, 3.4 and
3.5 are now dropped at https://github.com/apache/spark/pull/28957.
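
For downstream packagers, a change like this usually surfaces as a minimum
Python constraint in the package metadata. The sketch below is hypothetical
(the project name and version are placeholders, not Spark's actual build
files); it only illustrates how a python_requires bound rejects the dropped
interpreters at install time.

    # Hypothetical packaging-level guard for the new minimum Python version.
    from setuptools import setup

    setup(
        name="example-pyspark-app",   # placeholder project, not Apache Spark itself
        version="0.1.0",
        # pip will refuse to install this on Python 2.x, 3.4, or 3.5.
        python_requires=">=3.6",
    )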


Re: [PSA] Python 2, 3.4 and 3.5 are now dropped

2020-07-13 Thread Hyukjin Kwon
cc user mailing list too.

On Tue, Jul 14, 2020 at 11:27 AM, Hyukjin Kwon wrote:

> I am sending another email to make sure dev people know. Python 2, 3.4 and
> 3.5 are now dropped at https://github.com/apache/spark/pull/28957.
>
>
>


[PSA] Apache Spark uses GitHub Actions to run the tests

2020-07-13 Thread Hyukjin Kwon
Hi dev,

A GitHub Actions build was introduced to run the regular Spark test cases at
https://github.com/apache/spark/pull/29057 and
https://github.com/apache/spark/pull/29086.
At this moment, it virtually duplicates the default Jenkins PR builder.

The only differences are:
- GitHub Actions does not run the tests for Kinesis, see SPARK-32246
- GitHub Actions does not support other profiles such as JDK 11 or Hive
1.2, see SPARK-32255
- The Jenkins build does not run the Java documentation build, see SPARK-32233
- The Jenkins build does not run the dependency test, see SPARK-32178

Therefore, I believe PRs can be merged in most cases once either the Jenkins
PR builder or the GitHub Actions build passes, as long as we would expect the
default Jenkins PR builder to produce successful test results anyway.

Thanks.