Well, other than making the code consistent, what's the high-level goal in doing this, and why does it matter so much how many workers we have in different scenarios (PySpark versus different components of Spark)? I'm OK with not making the change and working on something else, to be honest, but spending hours troubleshooting issues in a local dev environment that doesn't resemble Jenkins closely enough is not a productive use of time. I would love to get input on the next logical steps.
________________________________
From: Reynold Xin <r...@databricks.com>
Sent: Monday, December 5, 2016 6:44 PM
To: Saikat Kanjilal
Cc: dev@spark.apache.org
Subject: Re: Spark-9487, Need some insight

Honestly it is pretty difficult. Given the difficulty, would it still make sense to do that change? (the one that sets the same number of workers/parallelism across different languages in testing)

On Mon, Dec 5, 2016 at 3:33 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:

Hello again dev community,

Ping on this. Apologies for reviving this thread, but I never heard from anyone. Based on this link: https://wiki.jenkins-ci.org/display/JENKINS/Installing+Jenkins I can try to install Jenkins locally, but is that really needed?

Thanks in advance.

________________________________
From: Saikat Kanjilal <sxk1...@hotmail.com>
Sent: Tuesday, November 29, 2016 8:14 PM
To: dev@spark.apache.org
Subject: Spark-9487, Need some insight

Hello Spark dev community,

I took on the following JIRA item (https://github.com/apache/spark/pull/15848) and am looking for some general pointers. I am running into issues where things work during local development on my MacBook Pro but fail on Jenkins for a multitude of reasons and errors. Here's an example: if you look at this build output report: https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69297/ you will see the DataFrameStatSuite. Locally I am running these individual tests with this command:

    ./build/mvn test -P... -DwildcardSuites=none -Dtest=org.apache.spark.sql.DataFrameStatSuite

It seems that I need to emulate a Jenkins-like environment locally, which seems like an untenable hurdle. Granted, my changes involve changing the total number of workers in the SparkContext; if so, should I be testing my changes in an environment that more closely resembles Jenkins?

I really want to work on and complete this PR, but I keep getting hamstrung by a dev environment that is not equivalent to our CI environment. I'm guessing/hoping I'm not the first one to run into this, so some insights or pointers to get past this would be very appreciated. I would love to keep contributing and am hoping this is a hurdle that can be overcome with some tweaks to my dev environment.

Thanks in advance.
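
For concreteness, here is a minimal sketch of what "the same number of workers across different languages in testing" could look like on the Python side: every suite asks one helper for its master URL instead of hard-coding its own `local[N]`. This is not the approach taken in the PR. The environment variable `SPARK_TEST_MASTER`, the default `local[4]`, and both function names are assumptions made up for illustration. Only the `local`, `local[N]` and `local[*]` master URL forms come from Spark itself.

```python
import os
import re

# Hypothetical variable name for this sketch; Spark itself does not read it.
ENV_VAR = "SPARK_TEST_MASTER"
# Assumed default; use whatever thread count the CI suites are meant to share.
DEFAULT_MASTER = "local[4]"

# Matches the local master URL forms "local", "local[N]" and "local[*]".
_LOCAL_MASTER = re.compile(r"^local(?:\[(\d+|\*)\])?$")


def shared_test_master(env=None):
    """Return the master URL that all test suites should share."""
    env = os.environ if env is None else env
    master = env.get(ENV_VAR, DEFAULT_MASTER)
    if _LOCAL_MASTER.match(master) is None:
        raise ValueError(f"not a local master URL: {master!r}")
    return master


def worker_threads(master):
    """Worker threads implied by a local master URL.

    Returns None for "local[*]", where the count depends on the machine.
    """
    match = _LOCAL_MASTER.match(master)
    if match is None:
        raise ValueError(f"not a local master URL: {master!r}")
    count = match.group(1)
    if count is None:
        return 1  # plain "local" runs with a single worker thread
    if count == "*":
        return None  # one thread per logical core, so it varies by machine
    return int(count)
```

A test fixture could then pass `shared_test_master()` to `SparkConf().setMaster(...)`, and exporting the same value locally and on Jenkins would remove at least the thread count as a difference between the two environments. Note that `local[*]` would reintroduce a machine-dependent count, which is why the sketch reports it as `None`.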