The slowness in PySpark may be related to the search path entries that PySpark adds.
Could you show us your sys.path?
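For example, one way to capture this on both the driver and an executor (a minimal sketch; the one-element RDD is just a probe to run code inside a worker):
```
import pprint
import sys

import pyspark

sc = pyspark.SparkContext()

# Search path as seen by the driver process.
pprint.pprint(sys.path)

# Search path as seen inside a worker task.
pprint.pprint(sc.parallelize([0], 1).map(lambda _: sys.path).first())
```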
On Thu, Sep 3, 2015 at 1:38 PM, Priedhorsky, Reid wrote:
On Sep 3, 2015, at 12:39 PM, Davies Liu <dav...@databricks.com> wrote:
I think this is not a problem with PySpark; you will also see this if you
profile this script:
```
list(map(map_, range(sc.defaultParallelism)))
```
81777/80874    0.086    0.000    0.360    0.000 <frozen importlib._bootstrap>:2264(_handle_fromlist)
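For reference, one way to obtain such a profile without Spark (a minimal sketch: the body of map_ is an assumed stand-in for the real work, and 8 substitutes for sc.defaultParallelism):
```
import cProfile

def map_(i):
    import pandas as pd  # re-imported on every call, exercising importlib
    return i

cProfile.run("list(map(map_, range(8)))", sort="cumtime")
```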
On Thu, Sep 3, 2015 at 11:16 AM, Priedhorsky, Reid wrote:
On Sep 2, 2015, at 11:31 PM, Davies Liu <dav...@databricks.com> wrote:
Could you put together a short script to reproduce this?
Good point. Here you go. This is Python 3.4.3 on Ubuntu 15.04.
```
import pandas as pd  # must be in default path for interpreter
import pyspark

LEN = 260
ITER_CT = 1
```
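The script is truncated above. A minimal sketch of how it might continue (the body of map_ and the driver setup are assumptions on my part, not Reid's original code; map_ itself is the function referenced in Davies' snippet earlier in the thread):
```
def map_(i):
    # Assumed workload: a little Pandas work per inner-loop iteration.
    df = pd.DataFrame({"x": range(LEN)})
    for _ in range(ITER_CT):
        df["x"].sum()
    return i

sc = pyspark.SparkContext()
print(sc.parallelize(range(sc.defaultParallelism)).map(map_).collect())
```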
Could you put together a short script to reproduce this?
On Wed, Sep 2, 2015 at 2:10 PM, Priedhorsky, Reid wrote:
Hello,
I have a PySpark computation that relies on Pandas and NumPy. Currently, my
inner loop iterates 2,000 times. I’m seeing the following show up in my
profiling:
74804/29102    0.204    0.000    2.173    0.000 <frozen importlib._bootstrap>:2234(_find_and_load)
74804/29102    0.145    0.000    1.867    0.000 <frozen importlib._bootstrap>:2207(_find_and_load_unlocked)
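As an aside, one way to gather this kind of per-worker profile in PySpark itself (a sketch assuming an RDD job; the spark.python.profile setting and sc.show_profiles() are available in Spark 1.2 and later):
```
import pyspark

conf = pyspark.SparkConf().set("spark.python.profile", "true")
sc = pyspark.SparkContext(conf=conf)

sc.parallelize(range(1000)).map(lambda x: x * 2).count()
sc.show_profiles()  # prints cProfile-style stats for the Python workers
```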