date:20180708

Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Hyukjin Kwon

The reason is that it is not 100% clear whether the root cause of the Sphinx bug is Python 2 and whether the workaround is to use Python 3. Xiangrui opened a bug against Sphinx: https://github.com/sphinx-doc/sphinx/issues/5142 Here is my observation: - Sphinx seems to have a bug in that it does not respect 'auto

Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Saisai Shao

Thanks @Hyukjin Kwon. Yes, I'm using Python 2 to build the docs; it looks like Python 2 with Sphinx has issues. What is still pending for this PR (https://github.com/apache/spark/pull/21659)? I'm planning to cut RC2 once it is merged. Do you have an ETA for this PR? On Mon, Jul 9, 2018 at 9:06 AM, Hyukjin Kwon wrote:

Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Hyukjin Kwon

It seems Python 2's Sphinx was used - https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/_site/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression and the SPARK-24530 issue exists in the RC. It's kind of tricky to manually verify whether Python 3 is used, given my few tries in my l
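Since Sphinx executes conf.py as ordinary Python in whichever interpreter runs the build, one way to avoid checking the interpreter by hand might be a fail-fast guard like the sketch below. The helper name require_python3 is hypothetical and is not part of Spark's doc build; this only illustrates the idea under that assumption.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Return True on Python 3 or newer; raise RuntimeError otherwise.

    `version_info` defaults to the running interpreter and can be
    overridden with a plain tuple such as (2, 7, 15) for testing.
    """
    major, minor = version_info[0], version_info[1]
    if major < 3:
        raise RuntimeError(
            "docs must be built with Python 3, found %d.%d" % (major, minor)
        )
    return True


# Hypothetical placement: near the top of a Sphinx conf.py, so that a
# Python 2 build stops immediately instead of producing broken pages.
require_python3()
```

With such a guard in place, a build started under Python 2 would stop with an explicit error rather than silently generating the affected signatures.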

Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Saisai Shao

Hi Sean, SPARK-24530 is not included in this RC1 release. Actually I'm not so familiar with this issue, so I'm still using Python 2 to generate the docs. The JIRA mentions that Python 3 with Sphinx could work around this issue. @Hyukjin Kwon, would you please help to clarify? Thanks, Saisai Xiao Li, on 20

Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Xiao Li

Three business days might be too short. Shall we keep the vote open until the end of this Friday (July 13th)? Cheers, Xiao 2018-07-08 10:15 GMT-07:00 Sean Owen: > Just checking that the doc issue in https://issues.apache.org/jira/browse/SPARK-24530 is worked around in this release? > > This was po

Re: [SPARK][SQL] Distributed createDataframe from many pandas DFs using Arrow

2018-07-08 Thread Reynold Xin

Yes I would just reuse the same function. On Sun, Jul 8, 2018 at 5:01 AM Li Jin wrote: > Hi Linar, > > This seems useful. But perhaps reusing the same function name is better? > > > http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.SparkSession.createDataFrame > > Curren

Re: [DESIGN] Barrier Execution Mode

2018-07-08 Thread Reynold Xin

Xingbo, please reference the SPIP and JIRA ticket next time: [SPARK-24374] SPIP: Support Barrier Scheduling in Apache Spark On Sun, Jul 8, 2018 at 9:45 AM Xingbo Jiang wrote: > Hi All, > > I would like to invite you to review the design document for Barrier > Execution Mode: > > https://docs.g

Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Sean Owen

Just checking that the doc issue in https://issues.apache.org/jira/browse/SPARK-24530 is worked around in this release? This was pointed out as an example of a broken doc: https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression Here it is in 2.3

[DESIGN] Barrier Execution Mode

2018-07-08 Thread Xingbo Jiang

Hi All, I would like to invite you to review the design document for Barrier Execution Mode: https://docs.google.com/document/d/1GvcYR6ZFto3dOnjfLjZMtTezX0W5VYN9w1l4-tQXaZk/edit# TL;DR: We announced Project Hydrogen at the recent Spark+AI Summit; a major part of the project involves significant c

Re: [SPARK][SQL] Distributed createDataframe from many pandas DFs using Arrow

2018-07-08 Thread Li Jin

Hi Linar, This seems useful. But perhaps reusing the same function name is better? http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.SparkSession.createDataFrame Currently createDataFrame takes an RDD of any kind of SQL data representation (e.g. row, tuple, int, boolean,

[SPARK][SQL] Distributed createDataframe from many pandas DFs using Arrow

2018-07-08 Thread Linar Savion

We've created a snippet that creates a Spark DF from an RDD of many pandas DFs in a distributed manner that does not require the driver to collect the entire dataset. Early tests show a performance improvement of 6x-10x over using pandasDF->Rows->sparkDF. I've seen that there are some open pull req
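For context, the row-based baseline that the message compares against (pandasDF->Rows->sparkDF) can be sketched roughly as below. This is not the Arrow-based snippet described in the message, which is not shown in this archive. The helper name rows_from_frames is hypothetical, and it assumes each RDD element is a pandas DataFrame; only DataFrame.itertuples is used.

```python
def rows_from_frames(frames):
    """Yield one plain tuple per row for every DataFrame-like object in `frames`.

    Only `itertuples(index=False, name=None)` is called on each element,
    which pandas DataFrames provide. Rows are produced lazily, one frame
    at a time, so a whole partition is never copied into a single list.
    """
    for frame in frames:
        for row in frame.itertuples(index=False, name=None):
            yield row


# Hypothetical cluster usage (requires pyspark and pandas; `pdf_rdd`, `spark`
# and `schema` are assumed names, not taken from the thread):
#   row_rdd = pdf_rdd.mapPartitions(rows_from_frames)
#   spark_df = spark.createDataFrame(row_rdd, schema)
```

Because the conversion runs inside mapPartitions, the driver never collects the data, but every value is still turned into a Python object row by row, which is the overhead an Arrow-based path would aim to avoid.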

[VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Saisai Shao

Please vote on releasing the following candidate as Apache Spark version 2.3.2. The vote is open until July 11th PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.3.2 [ ] -1 Do not release this package because ... To l
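The passing rule stated in the vote call can be written as a small check. This is a sketch that assumes "majority" means more binding (PMC) +1 votes than binding -1 votes, combined with the stated minimum of three +1 votes; the function name vote_passes is hypothetical.

```python
def vote_passes(binding_plus_ones, binding_minus_ones, minimum_plus_ones=3):
    """Return True if a release vote passes under the assumed rule.

    Assumption: only binding (PMC) votes count, the +1 votes must
    outnumber the -1 votes, and at least `minimum_plus_ones` +1 votes
    must have been cast.
    """
    has_minimum = binding_plus_ones >= minimum_plus_ones
    has_majority = binding_plus_ones > binding_minus_ones
    return has_minimum and has_majority


if __name__ == "__main__":
    # Illustrative tallies only, not the actual RC1 result.
    print(vote_passes(3, 0))
    print(vote_passes(2, 0))
```

Non-binding votes from the wider community are ignored by this check, although they are still welcome on the thread.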
