+1, sounds good
On Wed, May 18, 2022 at 9:16 PM Dongjoon Hyun
wrote:
> +1
>
> Thank you for the suggestion, Hyukjin.
>
> Dongjoon.
>
> On Wed, May 18, 2022 at 11:08 AM Bjørn Jørgensen
> wrote:
>
>> +1
>> But can we have the PR title and PR label the same, PS
>>
>> On Wed, May 18, 2022 at 18:57,
Congratulations everyone!
On Sun, Mar 28, 2021 at 11:00 PM ML Books wrote:
> Congrats all
>
> On Sat, Mar 27, 2021, 1:58 AM Matei Zaharia
> wrote:
>
>> Hi all,
>>
>> The Spark PMC recently voted to add several new committers. Please join
>> me in welcoming them to their new role! Our new commit
+1 (non-binding)
On Fri, Mar 26, 2021 at 9:49 AM Maciej wrote:
> +1 (nonbinding)
>
> On 3/26/21 3:52 PM, Hyukjin Kwon wrote:
>
> Hi all,
>
> I’d like to start a vote for SPIP: Support pandas API layer on PySpark.
>
> The proposal is to embrace Koalas in PySpark to have the pandas API layer
> on
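For context, a minimal sketch of what the proposed layer looks like in use,
assuming Spark 3.2+, where the proposal shipped as pyspark.pandas:

    import pyspark.pandas as ps  # the pandas API on Spark, formerly Koalas

    # A pandas-style DataFrame whose operations are executed by Spark
    psdf = ps.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
    print(psdf.mean())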
+1 the proposal sounds good to me. Having a familiar API built in will
really help new users who might only have Pandas experience get into using
Spark. It sounds like maintenance costs should be manageable once the
hurdle of setting up tests is cleared. Just out of curiosity, does Koalas
pretty m
Congratulations and welcome!
On Tue, Jul 14, 2020 at 12:36 PM Xingbo Jiang wrote:
> Welcome, Huaxin, Jungtaek, and Dilip!
>
> Congratulations!
>
> On Tue, Jul 14, 2020 at 10:37 AM Matei Zaharia
> wrote:
>
>> Hi all,
>>
>> The Spark PMC recently voted to add several new committers. Please join
>
+1 (non-binding)
On Mon, Jun 8, 2020, 1:49 PM Tom Graves
wrote:
> +1
>
> Tom
>
> On Saturday, June 6, 2020, 03:09:09 PM CDT, Reynold Xin <
> r...@databricks.com> wrote:
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.0.0.
>
> The vote is open until [DUE DAY] an
Thanks for taking this on Hyukjin! I'm looking forward to the PRs and happy
to help out where I can.
Bryan
On Wed, Dec 4, 2019 at 9:13 PM Hyukjin Kwon wrote:
> Hi all,
>
> I would like to finish redesigning Pandas UDF ones in Spark 3.0.
> If you guys don't have a minor concern in general about
Sorry to hear this Holden! Hope you get well soon and take it easy!!
On Tue, Dec 3, 2019 at 6:21 PM Hyukjin Kwon wrote:
> Yeah, please take care of your heath first!
>
> On Tue, Dec 3, 2019 at 1:32 PM, Wenchen Fan wrote:
>
>> Sorry to hear that. Hope you get better soon!
>>
>> On Tue, Dec 3, 2019 at 1
Update: #26133 <https://github.com/apache/spark/pull/26133> has been merged
and builds should be passing now, thanks all!
On Thu, Nov 14, 2019 at 4:12 PM Bryan Cutler wrote:
> We are in the process of upgrading pyarrow in the testing environment,
> which might cause pyspark test fa
We are in the process of upgrading pyarrow in the testing environment,
which might cause pyspark test failures until
https://github.com/apache/spark/pull/26133 is merged. Apologies for the
lack of notice beforehand, but I jumped the gun a little and forgot this
would affect other builds too. The PR
, 2019 at 6:08 PM Hyukjin Kwon wrote:
>> >
>> > +1
>> >
>> >> On Wed, Nov 6, 2019 at 11:38 PM, Wenchen Fan wrote:
>> >>
>> >> Sounds reasonable to me. We should make the behavior consistent within
>> >> Spark.
>> >>
>> >>
Currently, when a PySpark Row is created with keyword arguments, the fields
are sorted alphabetically. This has created a lot of confusion with users
because it is not obvious (although it is stated in the pydocs) that they
will be sorted alphabetically. Then later when applying a schema and the
fi
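For illustration, a minimal sketch of the surprise described above, in the
PySpark versions under discussion (field names are arbitrary):

    from pyspark.sql import Row

    # Fields are passed in the order b, a ...
    row = Row(b=1, a=2)
    # ... but stored sorted alphabetically, so positional access surprises:
    print(row)     # Row(a=2, b=1)
    print(row[0])  # 2 -- the value of 'a', not 'b'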
+1 for deprecating
On Wed, Oct 30, 2019 at 2:46 PM Shane Knapp wrote:
> sure. that shouldn't be too hard, but we've historically given very
> little support to it.
>
> On Wed, Oct 30, 2019 at 2:31 PM Maciej Szymkiewicz
> wrote:
>
>> Could we upgrade to PyPy3.6 v7.2.0?
>> On 10/30/19 9:45 PM, S
Congratulations, all well deserved!
On Thu, Sep 12, 2019, 3:32 AM Jacek Laskowski wrote:
> Hi,
>
> What a great news! Congrats to all awarded and the community for voting
> them in!
>
> p.s. I think it should go to the user mailing list too.
>
> Pozdrawiam,
> Jacek Laskowski
>
> https://abo
+1 (non-binding), looks good!
On Wed, Sep 11, 2019 at 10:05 AM Ryan Blue
wrote:
> +1
>
> This is going to be really useful. Thanks for working on it!
>
> On Wed, Sep 11, 2019 at 9:38 AM Felix Cheung
> wrote:
>
>> +1
>>
>> --
>> *From:* Thomas graves
>> *Sent:* Wedne
The k8s template is pretty good. Under the behavior change section, it
would be good to add instructions to also describe previous and new
behavior as Hyukjin proposed.
On Tue, Jul 23, 2019 at 10:07 PM Reynold Xin wrote:
> I like the spirit, but not sure about the exact proposal. Take a look at
-
> *From:* Holden Karau
> *Sent:* Friday, June 14, 2019 11:06:15 AM
> *To:* Felix Cheung
> *Cc:* Bryan Cutler; Dongjoon Hyun; Hyukjin Kwon; dev; shane knapp
> *Subject:* Re: [DISCUSS] Increasing minimum supported version of Pandas
>
> Are there other Python dependencies we sh
10 AM shane knapp wrote:
> ah, ok... should we downgrade the testing env on jenkins then? any
> specific version?
>
> shane, who is loath (and i mean LOATH) to touch python envs ;)
>
> On Fri, Jun 14, 2019 at 10:08 AM Bryan Cutler wrote:
>
>> I should have stated
easy
>>> chance we’ll have to bump version numbers easily I’d suggest 0.24.2
>>>
>>>
>>> On Fri, Jun 14, 2019 at 4:38 AM Hyukjin Kwon
>>> wrote:
>>>
>>>> I am +1 to go for 0.23.2 - it brings some overhead to test PyArrow and
>>>
Hi All,
We would like to discuss increasing the minimum supported version of Pandas
in Spark, which is currently 0.19.2.
Pandas 0.19.2 was released nearly 3 years ago and there are some
workarounds in PySpark that could be removed if such an old version is not
required. This will help to keep cod
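As a sketch of the kind of runtime guard such a minimum makes possible (the
version string is only the one discussed in this thread, not a decision):

    from distutils.version import LooseVersion
    import pandas as pd

    minimum_pandas_version = "0.23.2"
    if LooseVersion(pd.__version__) < LooseVersion(minimum_pandas_version):
        raise ImportError("Pandas >= %s required; found %s"
                          % (minimum_pandas_version, pd.__version__))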
+1 and the draft sounds good
On Thu, May 30, 2019, 11:32 AM Xiangrui Meng wrote:
> Here is the draft announcement:
>
> ===
> Plan for dropping Python 2 support
>
> As many of you already knew, Python core development team and many
> utilized Python packages like Pandas and NumPy will drop Python
+1 (non-binding)
On Tue, May 7, 2019 at 12:04 PM Bobby Evans wrote:
> I am +1
>
> On Tue, May 7, 2019 at 1:37 PM Thomas graves wrote:
>
>> Hi everyone,
>>
>> I'd like to call for another vote on SPARK-27396 - SPIP: Public APIs
>> for extended Columnar Processing Support. The proposal is to ext
ce with
> > something that expects the data in arrow format will already have to know
> > what version of the format it was programmed against, and in the worst
> > case, if the layout does change, we can support the new layout if needed.
> > >
> > > On Sun, Apr 21,
PI. Why not
>>>>>> make
>>>>>> this available since most of the work would be done?
>>>>>>
>>>>>> On Mon, Apr 15, 2019 at 7:50 AM Li Jin wrote:
>>>>>>
>>>>>>> Thank you Chris,
columnar format are mostly in Python.
> > > 3. Simple operations, though they benefit from vectorization, might not
> > > be worth the data exchange overhead.
> > >
> > > So would an improved Pandas UDF API be good enough? For example,
> > > SPARK-26412 (UDF that takes an i
+1 (non-binding)
On Thu, Apr 18, 2019 at 11:41 AM Jason Lowe wrote:
> +1 (non-binding). Looking forward to seeing better support for processing
> columnar data.
>
> Jason
>
> On Tue, Apr 16, 2019 at 10:38 AM Tom Graves
> wrote:
>
>> Hi everyone,
>>
>> I'd like to call for a vote on SPARK-27396
Great work, thanks Shane!
On Thu, Apr 18, 2019 at 2:46 PM shane knapp wrote:
> alrighty folks, the future is here and we'll be moving to python 3.6
> monday!
>
> all three PRs are green!
> master PR: https://github.com/apache/spark/pull/24266
> 2.4 PR: https://github.com/apache/spark/pull/2437
Chris, an SPIP sounds good to me. I agree with Li that it wouldn't be too
difficult to extend the current functionality to transfer multiple
DataFrames. For the SPIP, I would keep it more high-level and I don't
think it's necessary to include details of the Python worker, we can hash
that out af
for python 3.5 tho. It’s
>>> just saying the next release.
>>>
>>> In any case I think in the next release it will be great to get more
>>> Python 3.x release test coverage.
>>>
>>>
>>>
>>> --
>>
:58 PM Felix Cheung
wrote:
> 3.4 is end of life but 3.5 is not. From your link
>
> we expect to release Python 3.5.8 around September 2019.
>
>
>
> --
> *From:* shane knapp
> *Sent:* Thursday, March 28, 2019 7:54 PM
> *To:* Hyukjin Kwon
>
ch 25, 2019 6:48 PM
>> *To:* Hyukjin Kwon
>> *Cc:* dev; Bryan Cutler; Takuya UESHIN; shane knapp
>> *Subject:* Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]
>>
>> I don't know a lot about Arrow here, but seems reasonable. Is this for
>> Spark 3
Hi Peng,
I just added support for scalar Pandas UDF to return a StructType as a
Pandas DataFrame in https://issues.apache.org/jira/browse/SPARK-23836. Is
that the functionality you are looking for?
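For anyone finding this thread later, a minimal sketch of what SPARK-23836
enables, assuming Spark 3.0+ and an active SparkSession named spark (the
names here are hypothetical):

    import pandas as pd
    from pyspark.sql.functions import pandas_udf

    @pandas_udf("first string, last string")
    def split_name(s: pd.Series) -> pd.DataFrame:
        # Return a pandas DataFrame whose columns match the struct fields
        parts = s.str.split(" ", n=1, expand=True)
        return parts.rename(columns={0: "first", 1: "last"})

    df = spark.createDataFrame([("John Doe",)], ["name"])
    df.select(split_name("name").alias("name_parts")).show()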
Bryan
On Thu, Mar 7, 2019 at 1:13 PM peng yu wrote:
> right now, i'm using the columns-at-a-time
Congrats Jose!
On Tue, Jan 29, 2019, 10:48 AM Shixiong Zhu wrote:
> Hi all,
>
> The Apache Spark PMC recently added Jose Torres as a committer on the
> project. Jose has been a major contributor to Structured Streaming. Please
> join me in welcoming him!
>
> Best Regards,
>
> Shixiong Zhu
>
>
Great work Hyukjin! I'm not too familiar with R, but I'll take a look at
the PR.
Bryan
On Fri, Nov 9, 2018 at 9:19 AM Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:
> Thanks Hyukjin! Very cool results
>
> Shivaram
> On Fri, Nov 9, 2018 at 10:58 AM Felix Cheung
> wrote:
> >
> > Very
Congratulations everyone! Very well deserved!!
On Wed, Oct 3, 2018, 1:59 AM Reynold Xin wrote:
> Hi all,
>
> The Apache Spark PMC has recently voted to add several new committers to
> the project, for their contributions:
>
> - Shane Knapp (contributor to infra)
> - Dongjoon Hyun (contributor to
Hi Imran,
I agree it would be good to split up the tests, but there might be a couple
things to discuss first. Right now we have a single "tests.py" for each
subpackage. I think it makes sense to roughly have a test file for most
modules, e.g. "test_rdd.py", but it might not always be clear cut and
Thanks for looking into this Shane! If we are choosing a single python
3.x, I think 3.6 would be good. It might still be nice to test against
other versions too, so we can catch any issues. Is it possible to have more
exhaustive testing as part of a nightly or scheduled build? As a point of
refere
I agree that we should hold off on the Arrow upgrade if it requires major
changes to our testing. I did have another thought that maybe we could just
add another job to test against Python 3.5 and pyarrow 0.10.0 and keep all
current testing the same? I'm not sure how doable that is right now and
do
Hi All,
I'd like to request a few days extension to the code freeze to complete the
upgrade to Apache Arrow 0.10.0, SPARK-23874. This upgrade includes several
key improvements and bug fixes. The RC vote just passed this morning and
code changes are complete in https://github.com/apache/spark/pull
+1
On Mon, Jun 4, 2018 at 10:18 AM, Joseph Bradley
wrote:
> +1
>
> On Mon, Jun 4, 2018 at 10:16 AM, Mark Hamstra
> wrote:
>
>> +1
>>
>> On Fri, Jun 1, 2018 at 3:29 PM Marcelo Vanzin
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.3.1.
>>>
>>> Giv
Hi Andrew,
Please just go ahead and make the pull request. It's easier to review and
give feedback, thanks!
Bryan
On Thu, May 31, 2018 at 9:44 AM, Long, Andrew wrote:
> Hello Friends,
>
>
>
> I’m a new committer and I’ve submitted my first patch and I had some
> questions about documentation
Thanks for starting this discussion, I'd also like to see some improvements
in this area and glad to hear that the Pandas UDFs / Arrow functionality
might be useful. I'm wondering if from your initial investigations you
found anything lacking from the Arrow format or possible improvements that
wou
Congratulations Zhenhua!
On Mon, Apr 2, 2018 at 12:01 PM, ron8hu wrote:
> Congratulations, Zhenhua! Well deserved!!
>
> Ron
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> Would you happen to know what setting I need? I'm looking here
> <http://ant.apache.org/ivy/history/latest-milestone/settings.html>, but
> it's a bit overwhelming. I'm basically looking for a way to set the overall
> Ivy log level to WARN or higher.
>
> Nic
Hi Nick,
Not sure about changing the default to warnings only because I think some
might find the resolution output useful, but you can specify your own ivy
settings file with "spark.jars.ivySettings" to point to your
ivysettings.xml file. Would that work for you to configure it there?
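For example, with a hypothetical settings file path:

$ spark-submit --conf spark.jars.ivySettings=/path/to/ivysettings.xml app.py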
Bryan
On
Thanks everyone, this is very exciting! I'm looking forward to working
with you all and helping out more in the future. Also, congrats to the
other committers as well!!
+1
Tests passed and additionally ran Arrow related tests and did some perf
checks with python 2.7.14
On Fri, Feb 23, 2018 at 6:18 PM, Holden Karau wrote:
> Note: given the state of Jenkins I'd love to see Bryan Cutler or someone
> with Arrow experience sign off on this release.
>
Hi Arun,
The general process is to just leave a comment in the JIRA that you are
working on it so others know. Once your pull request is merged, the JIRA
will be assigned to you. You can read
http://spark.apache.org/contributing.html for details.
On Fri, Feb 23, 2018 at 9:08 PM, Arun Manivannan
rld bugs, any particular reason why?
>>>> Would you be comfortable with doing it in 2.3.1?
>>>>
>>>>
>>>>
>>>> > Also lets try to keep track in our commit messages which version of
>>>> cloudpickle we end up upgrad
>
> I am technically involved in cloudpickle dev although less active.
> They changed default pickle protocol (https://github.com/cloudpipe/
> cloudpickle/pull/127). So, if we target 0.5.x+, we should double check
> the potential compatibility issue, or fix the protocol, which I beli
Hi All,
I've seen a couple issues lately related to cloudpickle, notably
https://issues.apache.org/jira/browse/SPARK-22674, and would like to get
some feedback on updating the version in PySpark which should fix these
issues and allow us to remove some workarounds. Spark is currently using a
fork
+1 (non-binding) for the goals and non-goals of this SPIP. I think it's
fine to work out the minor details of the API during review.
Bryan
On Wed, Sep 6, 2017 at 5:17 AM, Takuya UESHIN
wrote:
> Hi all,
>
> Thank you for voting and suggestions.
>
> As Wenchen mentioned and also we're discussing
This generally works for me to just run tests within a class or even a
single test. Not as flexible as pytest -k, which would be nice..
$ SPARK_TESTING=1 bin/pyspark pyspark.sql.tests ArrowTests
On Tue, Aug 15, 2017 at 5:49 AM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:
> Pytest does
Great work Hyukjin and Sameer!
On Mon, Aug 7, 2017 at 10:22 AM, Mridul Muralidharan
wrote:
> Congratulations Hyukjin, Sameer !
>
> Regards,
> Mridul
>
> On Mon, Aug 7, 2017 at 8:53 AM, Matei Zaharia
> wrote:
> > Hi everyone,
> >
> > The Spark PMC recently voted to add Hyukjin Kwon and Sameer Ag
run it manually, but
>> yeah I'm not sure where it normally runs or why it hasn't. Shane not sure
>> if you're the person to ask?
>>
>>
>> On Wed, Aug 2, 2017 at 7:47 PM Bryan Cutler wrote:
>>
>>> Hi Devs,
>>>
>>> I'
Hi Devs,
I've noticed a couple PRs recently have not been automatically linked to
the related JIRAs. This was one of mine (I linked it manually)
https://issues.apache.org/jira/browse/SPARK-21583, but I've seen it happen
elsewhere. I think this is the script that does it, but it hasn't been
chang
Congratulations Holden and Burak, well deserved!!!
On Tue, Jan 24, 2017 at 10:13 AM, Reynold Xin wrote:
> Hi all,
>
> Burak and Holden have recently been elected as Apache Spark committers.
>
> Burak has been very active in a large number of areas in Spark, including
> linear algebra, stats/math
I'll check it out, thanks for sharing Alexander!
On Dec 13, 2016 4:58 PM, "Ulanov, Alexander"
wrote:
> Dear Spark developers and users,
>
>
> HPE has open sourced the implementation of the belief propagation (BP)
> algorithm for Apache Spark, a popular message passing algorithm for
> performing
Hi Tarun,
I think there just hasn't been a strong need for it when you can accomplish
the same with just rdd.flatMap(identity). I see a JIRA was just opened for
this https://issues.apache.org/jira/browse/SPARK-18855
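For example, a minimal sketch, assuming an active SparkContext named sc:

    rdd = sc.parallelize([[1, 2], [3], [4, 5]])
    # flatMap with an identity function flattens one level of nesting
    print(rdd.flatMap(lambda x: x).collect())  # [1, 2, 3, 4, 5]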
On Mon, Dec 5, 2016 at 2:55 PM, Tarun Kumar wrote:
> Hi,
>
> Although a flatMa
Congrats Xiao!
On Tue, Oct 4, 2016 at 11:14 AM, Holden Karau wrote:
> Congratulations :D :) Yay!
>
> On Tue, Oct 4, 2016 at 11:14 AM, Suresh Thalamati <
> suresh.thalam...@gmail.com> wrote:
>
>> Congratulations, Xiao!
>>
>>
>>
>> > On Oct 3, 2016, at 10:46 PM, Reynold Xin wrote:
>> >
>> > Hi al
e extending the accumulator (but it certainly could cause
> confusion).
>
> Reynold can provide a more definitive answer in this case.
>
> On Tue, Aug 2, 2016 at 1:46 PM, Bryan Cutler wrote:
>
>> It seems like the += operator is missing from the new accumulator API,
>> alt
It seems like the += operator is missing from the new accumulator API,
although the docs still make reference to it. Anyone know if it was
intentionally not put in? I'm happy to do a PR for it or update the docs
to just use the add() method, just want to check if there was some reason
first.
Bryan
Congratulations Yanbo!
On Jun 5, 2016 4:03 AM, "Kousuke Saruta" wrote:
> Congratulations Yanbo!
>
>
> - Kousuke
>
> On 2016/06/04 11:48, Matei Zaharia wrote:
>
>> Hi all,
>>
>> The PMC recently voted to add Yanbo Liang as a committer. Yanbo has been
>> a super active contributor in many areas of
+1, adding some organization would make it easier for people to find a
specific example
On Mon, Apr 18, 2016 at 11:52 PM, Yanbo Liang wrote:
> This sounds good to me, and it will make ML examples more neatly.
>
> 2016-04-14 5:28 GMT-07:00 Nick Pentreath :
>
>> Hey Spark devs
>>
>> I noticed that
+1 on Marcelo's comments. It would be nice not to pollute commit messages
with the instructions because some people might forget to remove them.
Nobody has suggested removing the template.
On Tue, Mar 15, 2016 at 3:59 PM, Joseph Bradley
wrote:
> +1 for keeping the template
>
> I figure any temp
>
> *From:* Bryan Cutler [mailto:cutl...@gmail.com]
> *Sent:* Thursday, January 14, 2016 2:19 PM
> *To:* Rachana Srivastava
> *Cc:* u...@spark.apache.org; dev@spark.apache.org
> *Subject:* Re: Random Forest FeatureImportance throwing
> NullPointerException
>
>
>
> Hi Rac
Hi Rachana,
I got the same exception. It is because computing the feature importance
depends on impurity stats, which is not calculated with the old
RandomForestModel in MLlib. Feel free to create a JIRA for this if you
think it is necessary, otherwise I believe this problem will be eventually
s
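For reference, a minimal sketch of where the importances are available,
assuming the newer spark.ml API and a DataFrame train_df with the default
label/features columns:

    from pyspark.ml.classification import RandomForestClassifier

    rf = RandomForestClassifier(numTrees=10)
    model = rf.fit(train_df)
    # featureImportances depends on impurity stats, which spark.ml tracks
    print(model.featureImportances)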
s the input to re-encode terms?
>
> On Thu, Jan 14, 2016 at 6:53 AM, Bryan Cutler wrote:
> > I was now able to reproduce the exception using the master branch and
> > local mode. It looks like the problem is the vectors of term counts in
> > the corpus are no
> > for (word <- Range(0, ldaModel.vocabSize)) { print(" " +
> > topics(word, topic)); }
> >
> > println()
> >
> > }
> >
> >
> > // Save and load model.
> >
> > ldaModel.save(sc, args(1))
> >
> >
Hi Li,
I tried out your code and sample data in both local mode and Spark
Standalone and it ran correctly with output that looks good. Sorry, I
don't have a YARN cluster setup right now, so maybe the error you are
seeing is specific to that. Btw, I am running the latest Spark code from
the maste
Hi Jacek,
I also recently noticed those messages, and some others, and am wondering
if there is an issue. I am also seeing the following when I have event
logging enabled. The first application is submitted and executes fine, but
all subsequent attempts produce an error log, but the master fails
Hi Renyi,
This is the intended behavior of the streaming HdfsWordCount example. It
makes use of a 'textFileStream' which will monitor a hdfs directory for any
newly created files and push them into a dstream. It is meant to be run
indefinitely, unless interrupted by ctrl-c, for example.
-bryan
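For reference, a minimal sketch of the pattern HdfsWordCount uses (the
directory path is hypothetical):

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="HdfsWordCount")
    ssc = StreamingContext(sc, 10)  # 10-second batches

    # Monitors the directory and streams lines from newly created files
    lines = ssc.textFileStream("hdfs:///path/to/watched/dir")
    counts = (lines.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()  # runs until interrupted, e.g. Ctrl-C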