Re: [pyspark][SPARK-25079]: preparing to enter the brave new world of python3.5!

2018-08-09 Thread shane knapp
also, i looked pretty closely @ the python3.5 release notes, and nothing caught my eye as being a showstopper. On Thu, Aug 9, 2018 at 10:41 AM, shane knapp wrote: > please see: https://issues.apache.org/jira/browse/SPARK-25079 > this is holding back the arrow 0.10.0 upgrade. > i'm fairly certain that things won't break w/this bump.

[pyspark][SPARK-25079]: preparing to enter the brave new world of python3.5!

2018-08-09 Thread shane knapp
please see https://issues.apache.org/jira/browse/SPARK-25079. this is holding back the arrow 0.10.0 upgrade. i'm fairly certain that things won't break w/this bump, and it shouldn't impact the 2.4 cut and code freeze, but i'd like to have some pyspark folks chime in. thanks in advance, shane

Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-09 Thread makatun
Here are the images missing in the previous mail. My apologies.

Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-09 Thread makatun
Following the discussion and recommendations at the link you provided, we ran the tests with constraint propagation disabled, using the following option: spark.conf.set(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key, false). The resulting measurements are shown in the attached plot.
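
For readers following along, here is a minimal, self-contained sketch of the kind of toggle described above. Only the SQLConf call is taken from the mail; everything else (the local SparkSession, the 500-column DataFrame built with withColumn in a loop, and the simple count() timing) is an illustrative assumption, not the benchmark code behind the reported measurements. Since SQLConf lives in an internal package, the plain string key may be preferable outside spark-shell.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.lit
    import org.apache.spark.sql.internal.SQLConf

    object WideDfConstraintPropagationDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("wide-df-constraint-propagation-demo")
          .master("local[*]")
          .getOrCreate()

        // Disable constraint propagation, the setting discussed in the thread.
        // The string key form "spark.sql.constraintPropagation.enabled" targets the same setting.
        spark.conf.set(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key, false)

        // Build an illustrative wide DataFrame; 500 columns is an arbitrary choice.
        var df = spark.range(1000L).toDF("id")
        for (i <- 1 to 500) {
          df = df.withColumn(s"c$i", lit(i))
        }

        // Time a simple action so the planning cost on the wide plan is visible.
        val start = System.nanoTime()
        df.count()
        val elapsedMs = (System.nanoTime() - start) / 1e6
        println(s"count() over ${df.columns.length} columns took $elapsedMs ms")

        spark.stop()
      }
    }

Comparing the timing with the setting left at its default against the timing with it set to false gives a rough, per-machine view of how much of the wide-plan cost comes from constraint propagation.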