Hi,

I think the solution is quite simple. Just download Anaconda. (If you pay for the licensed version, you will eventually feel like you are in heaven when you move to CI/CD and live in a world where you have a data product actually running in production.)
Then launch pyspark with the Anaconda interpreter, setting the variables on the same command line:

PYSPARK_PYTHON=<<path to your anaconda installation>>/anaconda2/bin/python2.7 PATH=$PATH:<<path to your anaconda installation>>/anaconda/bin <<path to your pyspark>>/pyspark

:)

If you are running on EMR, the solution is a bit trickier. Let me know if you want any further help.

Regards,
Gourav Sengupta

On Thu, Jun 2, 2016 at 7:59 PM, Eike von Seggern <eike.segg...@sevenval.com> wrote:
> Hi,
>
> are you using Spark on one machine or many?
>
> If on many, are you sure numpy is correctly installed on all machines?
>
> To check that the environment is set up correctly, you can try something
> like
>
> import os
> pythonpaths = sc.range(10).map(lambda i: os.environ.get("PYTHONPATH")).collect()
> print(pythonpaths)
>
> HTH
>
> Eike
>
> 2016-06-02 15:32 GMT+02:00 Bhupendra Mishra <bhupendra.mis...@gmail.com>:
>
>> That did not resolve it. :(
>>
>> On Thu, Jun 2, 2016 at 3:01 PM, Sergio Fernández <wik...@apache.org> wrote:
>>
>>> On Thu, Jun 2, 2016 at 9:59 AM, Bhupendra Mishra <bhupendra.mis...@gmail.com> wrote:
>>>>
>>>> I have already exported the environment variable in spark-env.sh as
>>>> follows, but the error is still there: ImportError: No module named numpy
>>>>
>>>> export PYSPARK_PYTHON=/usr/bin/python
>>>>
>>>
>>> According to the documentation at
>>> http://spark.apache.org/docs/latest/configuration.html#environment-variables
>>> the PYSPARK_PYTHON environment variable is for pointing to the Python
>>> interpreter binary.
>>>
>>> If you check the programming guide
>>> https://spark.apache.org/docs/0.9.0/python-programming-guide.html#installing-and-configuring-pyspark
>>> it says you need to add your custom path to PYTHONPATH (the script
>>> automatically adds bin/pyspark there).
>>>
>>> So typically on Linux you would need to add the following (assuming you
>>> installed numpy there):
>>>
>>> export PYTHONPATH=$PYTHONPATH:/usr/lib/python2.7/dist-packages
>>>
>>> Hope that helps.
>>>
>>>> On Thu, Jun 2, 2016 at 12:04 AM, Julio Antonio Soto de Vicente <ju...@esbet.es> wrote:
>>>>
>>>>> Try adding this to spark-env.sh (renaming the file if it still has
>>>>> .template at the end):
>>>>>
>>>>> PYSPARK_PYTHON=/path/to/your/bin/python
>>>>>
>>>>> where bin/python is your actual Python environment with numpy
>>>>> installed.
>>>>>
>>>>> On 1 Jun 2016, at 20:16, Bhupendra Mishra <bhupendra.mis...@gmail.com> wrote:
>>>>>
>>>>> I have numpy installed, but where should I set up PYTHONPATH?
>>>>>
>>>>> On Wed, Jun 1, 2016 at 11:39 PM, Sergio Fernández <wik...@apache.org> wrote:
>>>>>
>>>>>> sudo pip install numpy
>>>>>>
>>>>>> On Wed, Jun 1, 2016 at 5:56 PM, Bhupendra Mishra <bhupendra.mis...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks.
>>>>>>> How can this be resolved?
>>>>>>>
>>>>>>> On Wed, Jun 1, 2016 at 9:02 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>
>>>>>>>> Generally this means numpy isn't installed on the system or your
>>>>>>>> PYTHONPATH has somehow been pointed somewhere odd.
>>>>>>>>
>>>>>>>> On Wed, Jun 1, 2016 at 8:31 AM, Bhupendra Mishra <bhupendra.mis...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Can anyone please help me with the following error?
>>>>>>>>>
>>>>>>>>> File "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
>>>>>>>>> line 25, in <module>
>>>>>>>>>
>>>>>>>>> ImportError: No module named numpy
>>>>>>>>>
>>>>>>>>> Thanks in advance!
>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Cell : 425-233-8271 >>>>>>>> Twitter: https://twitter.com/holdenkarau >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Sergio Fernández >>>>>> Partner Technology Manager >>>>>> Redlink GmbH >>>>>> m: +43 6602747925 >>>>>> e: sergio.fernan...@redlink.co >>>>>> w: http://redlink.co >>>>>> >>>>> >>>>> >>>> >>> >>> >>> -- >>> Sergio Fernández >>> Partner Technology Manager >>> Redlink GmbH >>> m: +43 6602747925 >>> e: sergio.fernan...@redlink.co >>> w: http://redlink.co >>> >> >> > > > -- > ------------------------------------------------ > *Jan Eike von Seggern* > Data Scientist > ------------------------------------------------ > *Sevenval Technologies GmbH * > > FRONT-END-EXPERTS SINCE 1999 > > Köpenicker Straße 154 | 10997 Berlin > > office +49 30 707 190 - 229 > mail eike.segg...@sevenval.com > > www.sevenval.com > > Sitz: Köln, HRB 79823 > Geschäftsführung: Jan Webering (CEO), Thorsten May, Sascha Langfus, > Joern-Carlos Kuntze > > *Wir erhöhen den Return On Investment bei Ihren Mobile und Web-Projekten. > Sprechen Sie uns an:*http://roi.sevenval.com/ > > ----------------------------------------------------------------------------------------------------------------------------------------------- > FOLLOW US on > > [image: Sevenval blog] > <http://sevenval.us11.list-manage1.com/track/click?u=5f2d34577b3182d6f029ebe63&id=ff955ef848&e=b789cc1a5f> > > [image: sevenval on twitter] > <http://sevenval.us11.list-manage.com/track/click?u=5f2d34577b3182d6f029ebe63&id=998e8f655c&e=b789cc1a5f> > [image: sevenval on linkedin] > <http://sevenval.us11.list-manage.com/track/click?u=5f2d34577b3182d6f029ebe63&id=7ae7d93d42&e=b789cc1a5f>[image: > sevenval on pinterest] > <http://sevenval.us11.list-manage2.com/track/click?u=5f2d34577b3182d6f029ebe63&id=f8c66fb950&e=b789cc1a5f> >