Re: pull request template
It's a good idea. I would add in there the spec for the PR title. I always get the order between Jira and component wrong.

Moreover, CONTRIBUTING.md is also lacking them. Any reason not to add it there? I can open PRs for both, but maybe you want to keep that info on the wiki instead.

iulian

On Thu, Feb 18, 2016 at 4:18 AM, Reynold Xin wrote:
> Github introduced a new feature today that allows projects to define
> templates for pull requests. I pushed a very simple template to the
> repository:
>
> https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE
>
> Over time I think we can see how this works and perhaps add a small
> checklist to the pull request template so contributors are reminded every
> time they submit a pull request of the important things to do in a pull
> request (e.g. having proper tests).
>
> ## What changes were proposed in this pull request?
>
> (Please fill in changes proposed in this fix)
>
> ## How was this patch tested?
>
> (Please explain how this patch was tested. E.g. unit tests, integration
> tests, manual tests)
>
> (If this patch involves UI changes, please attach a screenshot; otherwise,
> remove this)

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com
Re: pull request template
All that seems fine. All of this is covered in the contributing wiki, which is linked from CONTRIBUTING.md (and should be from the template), but people don't seem to bother reading it. I don't mind duplicating some key points, and even adding a more explicit exhortation to read the whole wiki before opening a PR. We spend way too much time asking people to fix things they should have taken 60 seconds to do correctly in the first place.

On Fri, Feb 19, 2016 at 10:33 AM, Iulian Dragoș wrote:
> It's a good idea. I would add in there the spec for the PR title. I always
> get the order between Jira and component wrong.
>
> Moreover, CONTRIBUTING.md is also lacking them. Any reason not to add it
> there? I can open PRs for both, but maybe you want to keep that info on the
> wiki instead.
>
> iulian
>
> On Thu, Feb 18, 2016 at 4:18 AM, Reynold Xin wrote:
>> Github introduced a new feature today that allows projects to define
>> templates for pull requests. I pushed a very simple template to the
>> repository:
>>
>> https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE
>>
>> Over time I think we can see how this works and perhaps add a small
>> checklist to the pull request template so contributors are reminded every
>> time they submit a pull request of the important things to do in a pull
>> request (e.g. having proper tests).
>>
>> ## What changes were proposed in this pull request?
>>
>> (Please fill in changes proposed in this fix)
>>
>> ## How was this patch tested?
>>
>> (Please explain how this patch was tested. E.g. unit tests, integration
>> tests, manual tests)
>>
>> (If this patch involves UI changes, please attach a screenshot; otherwise,
>> remove this)
>
> --
> Iulian Dragos
>
> --
> Reactive Apps on the JVM
> www.typesafe.com

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
Re: Write access to wiki
Any chance I could also get write access to the wiki? I'd like to update some of the PySpark documentation in the wiki.

On Tue, Jan 12, 2016 at 10:14 AM, shane knapp wrote:
> > Ok, sounds good. I think it would be great if you could add installing the
> > 'docker-engine' package and starting the 'docker' service in there too. I
> > was planning to update the playbook if there were one in the apache/spark
> > repo, but I didn't see one, hence my question.
>
> we currently have docker 1.5 running on the worker, and after the
> Great Upgrade To CentOS7, we'll be running a much more modern version
> of docker.
>
> shane

--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
Re: How to run PySpark tests?
Or wait, I don't have access to the wiki - if anyone can give me wiki access I'll update the instructions.

On Thu, Feb 18, 2016 at 8:45 PM, Holden Karau wrote:
> Great - I'll update the wiki.
>
> On Thu, Feb 18, 2016 at 8:34 PM, Jason White wrote:
>> Compiling with `build/mvn -Pyarn -Phadoop-2.4 -Phive -Dhadoop.version=2.4.0
>> -DskipTests clean package` followed by `python/run-tests` seemed to do the
>> trick! Thanks!

--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
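For anyone landing on this thread later, the steps Jason reported can be wrapped into a small shell sketch. This is only a sketch under the thread's assumptions: the Maven profiles and Hadoop version are the ones quoted above and may not apply to a current checkout, and a guard skips everything when run outside a Spark source tree.

```shell
# Sketch of the build-then-test sequence from the thread; the flags are the
# ones quoted above, not verified against a current Spark checkout.
if [ -x build/mvn ] && [ -x python/run-tests ]; then
  # Build Spark once, skipping JVM tests; PySpark tests run separately.
  build/mvn -Pyarn -Phadoop-2.4 -Phive -Dhadoop.version=2.4.0 \
    -DskipTests clean package
  # Run the PySpark test suite against the freshly built assembly.
  python/run-tests
else
  echo "Not run from a Spark checkout; skipping."
fi
```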
Re: pull request template
We can add that too - just need to figure out a good way so people don't leave a lot of unnecessary "guideline" messages in the template.

The contributing guide is great, but unfortunately it is not as noticeable and is often ignored. It's good to have the full-fledged contributing guide, and then a very lightweight version of it in the form of a template, to force contributors to think about all the important aspects outlined in the contributing guide.

On Fri, Feb 19, 2016 at 2:36 AM, Sean Owen wrote:
> All that seems fine. All of this is covered in the contributing wiki,
> which is linked from CONTRIBUTING.md (and should be from the
> template), but people don't seem to bother reading it. I don't mind
> duplicating some key points, and even a more explicit exhortation to
> read the whole wiki before opening a PR. We spend way too much time
> asking people to fix things they should have taken 60 seconds to do
> correctly in the first place.
>
> On Fri, Feb 19, 2016 at 10:33 AM, Iulian Dragoș wrote:
> > It's a good idea. I would add in there the spec for the PR title. I always
> > get the order between Jira and component wrong.
> >
> > Moreover, CONTRIBUTING.md is also lacking them. Any reason not to add it
> > there? I can open PRs for both, but maybe you want to keep that info on the
> > wiki instead.
> >
> > iulian
> >
> > On Thu, Feb 18, 2016 at 4:18 AM, Reynold Xin wrote:
> >> Github introduced a new feature today that allows projects to define
> >> templates for pull requests. I pushed a very simple template to the
> >> repository:
> >>
> >> https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE
> >>
> >> Over time I think we can see how this works and perhaps add a small
> >> checklist to the pull request template so contributors are reminded every
> >> time they submit a pull request of the important things to do in a pull
> >> request (e.g. having proper tests).
> >>
> >> ## What changes were proposed in this pull request?
> >>
> >> (Please fill in changes proposed in this fix)
> >>
> >> ## How was this patch tested?
> >>
> >> (Please explain how this patch was tested. E.g. unit tests, integration
> >> tests, manual tests)
> >>
> >> (If this patch involves UI changes, please attach a screenshot; otherwise,
> >> remove this)
> >
> > --
> > Iulian Dragos
> >
> > --
> > Reactive Apps on the JVM
> > www.typesafe.com
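On the "people leave the guideline text in" problem: one common approach (a hypothetical sketch here, not the actual Spark template) is to put the instructions inside HTML comments, which GitHub does not render in the submitted description, so stale boilerplate stays invisible even when contributors forget to delete it:

```markdown
## What changes were proposed in this pull request?

<!-- Describe the proposed changes here. Text inside an HTML comment like
     this one is not rendered in the submitted PR description. -->

## How was this patch tested?

<!-- E.g. unit tests, integration tests, manual tests. If this patch
     involves UI changes, please attach a screenshot. -->
```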
Re: DataFrame API and Ordering
I am not sure. The Spark SQL, DataFrames and Datasets Guide already has a section about NaN semantics. This could be a good place to add at least some basic description. For the rest, InterpretedOrdering could be a good choice.

On 02/19/2016 12:35 AM, Reynold Xin wrote:
> You are correct and we should document that.
>
> Any suggestions on where we should document this? In DoubleType and
> FloatType?
>
> On Tuesday, February 16, 2016, Maciej Szymkiewicz
> <mailto:mszymkiew...@gmail.com> wrote:
>> I am not sure if I've missed something obvious, but as far as I can tell
>> the DataFrame API doesn't provide clearly defined ordering rules, aside
>> from NaN handling. Methods like DataFrame.sort or sql.functions like min /
>> max provide only a general description. The discrepancy between
>> functions.max (min) and GroupedData.max, where the latter supports only
>> numeric types, makes the current situation even more confusing. With a
>> growing number of orderable types I believe the documentation should
>> clearly define the ordering rules, including:
>>
>> - NULL behavior
>> - collation
>> - behavior on complex types (structs, arrays)
>>
>> While this information can be extracted from the source, it is not easily
>> accessible, and without an explicit specification it is not clear whether
>> the current behavior is contractual. It can also be confusing if a user
>> expects an ordering that depends on the current locale (R).
>>
>> Best,
>> Maciej
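To make the NaN point concrete, here is a small illustrative sketch in plain Python (not Spark code) of why a documented total order matters: a comparison-based sort has no defined NaN placement, while a rule like "NaN is greater than any other value" (the behavior described in the NaN semantics section of the guide) can be made explicit with a sort key.

```python
import math

values = [3.0, float("nan"), 1.0, 2.0]

# sorted() relies on pairwise "<" comparisons; every comparison against NaN
# is False, so NaN's final position depends on the input order rather than
# on any total order.
undefined_order = sorted(values)

# An explicit total order with NaN last, mirroring the "NaN is greater
# than any other value" rule:
def nan_greatest(x: float):
    # (False, v) sorts before (True, NaN), so all real numbers come first
    return (math.isnan(x), x)

total_order = sorted(values, key=nan_greatest)
print(total_order)  # [1.0, 2.0, 3.0, nan]
```

The same idea is what a documented contract would pin down: NULLs, NaNs, and complex types each need a stated position in the total order, not one left to implementation details.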