Devs? Is this an issue for you that deserves a ticket?
On Sun, Mar 2, 2014 at 4:32 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> So this issue appears to be related to the other Python 2.7-related issue
> I reported in this thread
> <http://apache-spark-user-list.1001560.n3.nabble.com/java-net-SocketException-on-reduceByKey-in-pyspark-td2184.html>.
>
> Shall I open a bug in JIRA about this and include the wikistat repro?
>
> Nick
>
>
> On Sun, Mar 2, 2014 at 1:50 AM, nicholas.chammas <nicholas.cham...@gmail.com> wrote:
>
>> Unexpected behavior. Here's the repro:
>>
>> 1. Launch an EC2 cluster with spark-ec2. 1 slave; default instance type.
>> 2. Upgrade the cluster to Python 2.7 using the instructions here
>>    <https://spark-project.atlassian.net/browse/SPARK-922?focusedCommentId=15711&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15711>.
>> 3. pip2.7 install numpy
>> 4. Run this script in the pyspark shell:
>>
>>        wikistat = sc.textFile('s3n://ACCESSKEY:SECRET@bigdatademo/sample/wiki/pagecounts-20100212-050000.gz')
>>        wikistat = wikistat.map(lambda x: x.split(' ')).cache()
>>        wikistat.map(lambda x: (x[1], int(x[3]))).map(lambda x: (x[1], x[0])).sortByKey(False).take(5)
>>
>> 5. You will see a long error output that includes a complaint about
>>    NumPy not being installed.
>> 6. Now remove the sortByKey() from that last line and rerun it:
>>
>>        wikistat.map(lambda x: (x[1], int(x[3]))).map(lambda x: (x[1], x[0])).take(5)
>>
>>    You should see your results without issue, so it's the sortByKey()
>>    that's choking.
>> 7. Quit the pyspark shell and pip uninstall numpy.
>> 8. Rerun the three lines from step 4. Enjoy your sorted results
>>    error-free.
>>
>> Can anyone else reproduce this issue? Is it a bug? I don't see it if I
>> leave the cluster on the default Python 2.6.8.
>>
>> Installing numpy on the slave via pssh and pip2.7 (so that it's identical
>> to the master) does not fix the issue. Dunno if installing Python packages
>> everywhere is even necessary though.
>>
>> Nick
>>
>> ------------------------------
>> View this message in context: Python 2.7 + numpy break sortByKey()
>> <http://apache-spark-user-list.1001560.n3.nabble.com/Python-2-7-numpy-break-sortByKey-tp2214.html>
>> Sent from the Apache Spark User List mailing list archive
>> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
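For anyone triaging this without the S3 sample or a cluster, here is a minimal plain-Python sketch of what the step-4 pipeline should return, so the output can be checked once sortByKey() works again. It uses neither Spark nor NumPy and does not reproduce the failure itself. The sample lines and the function name `top_k_like_step4` are hypothetical; the only assumptions about the data are the ones the original lambdas already make: fields are separated by single spaces, field 1 is the page title, and field 3 parses as an integer.

```python
# Hypothetical sample lines standing in for the S3 pagecounts file.
# Assumed layout: single-space-separated fields, title in field 1,
# an integer in field 3 (the same assumptions the original lambdas make).
SAMPLE_LINES = [
    "en Main_Page 100 5000",
    "en Apache_Spark 7 900",
    "de Berlin 40 3000",
    "fr Paris 55 4200",
    "en Python 20 1500",
    "en NumPy 3 100",
]


def top_k_like_step4(lines, k=5):
    """Mirror the step-4 RDD chain on a plain Python list (hypothetical helper)."""
    # wikistat.map(lambda x: x.split(' '))
    rows = [line.split(' ') for line in lines]
    # .map(lambda x: (x[1], int(x[3])))  -> (title, integer field)
    pairs = [(x[1], int(x[3])) for x in rows]
    # .map(lambda x: (x[1], x[0]))  -> swap so the integer becomes the key
    swapped = [(p[1], p[0]) for p in pairs]
    # .sortByKey(False): sort by key only, descending
    swapped.sort(key=lambda kv: kv[0], reverse=True)
    # .take(5): first k elements of the sorted result
    return swapped[:k]


if __name__ == "__main__":
    for count, title in top_k_like_step4(SAMPLE_LINES):
        print(count, title)
```

Ties on the integer key stay in input order here because Python's sort is stable; Spark does not promise that, so rows with equal keys may come back from the cluster in a different order.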