[VOTE] Deprecate SparkR

2024-08-21 Thread Shivaram Venkataraman
Hi all, Based on the previous discussion thread [1], I hereby call a vote to deprecate the SparkR module in Apache Spark with the upcoming Spark 4 release and remove it in the next major release, Spark 5. [ ] +1: Accept the proposal [ ] +0 [ ] -1: I don’t think this is a good idea because .. This

Re: [DISCUSS] Deprecating SparkR

2024-08-21 Thread Shivaram Venkataraman
> I first wondered about the future of SparkR after noticing <https://lists.apache.org/thread/jd1hyq6c9v1qg0ym5qlct8lgcxk9yd6z> how low the visit stats were for the R API docs as compared to Python and Scala. (I can’t seem to find those vis

Re: [External Mail] Re: [DISCUSS] Deprecating SparkR

2024-08-21 Thread Shivaram Venkataraman
ek Lim > Date: 2024-08-16 09:06:52 > Subject: [External Mail] Re: [DISCUSS] Deprecating SparkR > To: Wenchen Fan > Cc: L. C. Hsieh; Dongjoon Hyun; Holden Karau; Xiao Li; Hyukjin Kwon <gurwls...@apache.org>; Nicholas Chammas; Shivaram Venkataraman;

[DISCUSS] Deprecating SparkR

2024-08-12 Thread Shivaram Venkataraman
Hi, About ten years ago, I created the original SparkR package as part of my research at UC Berkeley [SPARK-5654]. After my PhD I started as a professor at UW-Madison, and my contributions to SparkR have been in the background given my availability.

Re: [VOTE] Update the committer guidelines to clarify when to commit changes.

2020-07-31 Thread Shivaram Venkataraman
+1 Thanks Shivaram On Thu, Jul 30, 2020 at 11:56 PM Wenchen Fan wrote: > > +1, thanks for driving it, Holden! > > On Fri, Jul 31, 2020 at 10:24 AM Holden Karau wrote: >> >> +1 from myself :) >> >> On Thu, Jul 30, 2020 at 2:53 PM Jungtaek Lim >> wrote: >>> >>> +1 (non-binding, I guess) >>> >>>

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-07-14 Thread Shivaram Venkataraman
Hi all, Just wanted to check if there are any blockers that we are still waiting for to start the new release process. Thanks Shivaram On Sun, Jul 5, 2020, 06:51 wuyi wrote: > Ok, after having another look, I think it only affects local cluster deploy > mode, which is for testing only. > > > wu

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-07-01 Thread Shivaram Venkataraman
> https://issues.apache.org/jira/browse/SPARK-32136 > Thanks, > Jason. > From: Jungtaek Lim > Date: Wednesday, 1 July 2020 at 10:20 am > To: Shivaram Venkataraman > Cc: Prashant Shar

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-30 Thread Shivaram Venkataraman
"Jungtaek Lim" <kabhwan.opensou...@gmail.com>; "Jules Damji"; "Holden Karau"; "Reynold Xin"; "Shivaram Venkataraman"; "Yuanjian Li" <xyliyuanj...@gmail.com>; "Spark dev list";

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-23 Thread Shivaram Venkataraman
+1 Thanks Yuanjian -- I think it'll be great to have a 3.0.1 release soon. Shivaram On Tue, Jun 23, 2020 at 3:43 AM Takeshi Yamamuro wrote: > > Thanks for the heads-up, Yuanjian! > > > I also noticed branch-3.0 already has 39 commits after Spark 3.0.0. > wow, the updates are so quick. Anyway, +1

Re: SparkR latest API docs missing?

2019-05-08 Thread Shivaram Venkataraman
May 8, 2019 at 11:27 AM Shivaram Venkataraman wrote: > > Actually I found this while I was uploading the latest release to CRAN > -- these docs should be generated as a part of the release process > though and shouldn't be related to CRAN. > > On Wed, May 8, 2019 at 11:24 AM

Re: SparkR latest API docs missing?

2019-05-08 Thread Shivaram Venkataraman
due to the > additional CRAN processes. > > On Wed, May 8, 2019 at 11:23 AM Shivaram Venkataraman > wrote: > > > > I just noticed that the SparkR API docs are missing at > > https://spark.apache.org/docs/latest/api/R/index.html --- It looks > > like they were

SparkR latest API docs missing?

2019-05-08 Thread Shivaram Venkataraman
I just noticed that the SparkR API docs are missing at https://spark.apache.org/docs/latest/api/R/index.html --- It looks like they were missing from the 2.4.3 release? Thanks Shivaram

Fwd: CRAN submission SparkR 2.3.3

2019-02-24 Thread Shivaram Venkataraman
have not been too many changes since 2.3.3, how much effort would it be to cut a 2.3.4 with just this change? Thanks Shivaram -- Forwarded message - From: Uwe Ligges Date: Sun, Feb 17, 2019 at 12:28 PM Subject: Re: CRAN submission SparkR 2.3.3 To: Shivaram Venkataraman , CRAN

Re: Vectorized R gapply[Collect]() implementation

2019-02-09 Thread Shivaram Venkataraman
Those speedups look awesome! Great work Hyukjin! Thanks Shivaram On Sat, Feb 9, 2019 at 7:41 AM Hyukjin Kwon wrote: > > Guys, as continuation of Arrow optimization for R DataFrame to Spark > DataFrame, > > I am trying to make a vectorized gapply[Collect] implementation as an > experiment like

Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-09 Thread Shivaram Venkataraman
Thanks Hyukjin! Very cool results Shivaram On Fri, Nov 9, 2018 at 10:58 AM Felix Cheung wrote: > > Very cool! > > > > From: Hyukjin Kwon > Sent: Thursday, November 8, 2018 10:29 AM > To: dev > Subject: Arrow optimization in conversion from R DataFrame to Spark Da

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-07 Thread Shivaram Venkataraman
> From: Sean Owen > Sent: Tuesday, November 6, 2018 10:51 AM > To: Shivaram Venkataraman > Cc: Felix Cheung; Wenchen Fan; Matei Zaharia; dev > Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0 > > I think the second option, to skip the tests, is best right now, if

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-06 Thread Shivaram Venkataraman
the release of 2.4.0 > > > > From: Wenchen Fan > Sent: Tuesday, November 6, 2018 8:51 AM > To: Felix Cheung > Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman > Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0 > > Do yo

Re: Removing non-deprecated R methods that were deprecated in Python, Scala?

2018-11-06 Thread Shivaram Venkataraman
Yep. That sounds good to me. On Tue, Nov 6, 2018 at 11:06 AM Sean Owen wrote: > > Sounds good, remove in 3.1? I can update accordingly. > > On Tue, Nov 6, 2018, 10:46 AM Reynold Xin > >> Maybe deprecate and remove in next version? It is bad to just remove a >> method without deprecation notice. >

Re: [R] discuss: removing lint-r checks for old branches

2018-08-10 Thread Shivaram Venkataraman
Sounds good to me as well. Thanks Shane. Shivaram On Fri, Aug 10, 2018 at 1:40 PM Reynold Xin wrote: > > SGTM > > On Fri, Aug 10, 2018 at 1:39 PM shane knapp wrote: >> >> https://issues.apache.org/jira/browse/SPARK-25089 >> >> basically since these branches are old, and there will be a greater t

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.2.2

2018-07-09 Thread Shivaram Venkataraman
> Tom > On Monday, July 9, 2018, 4:50:18 PM CDT, Shivaram Venkataraman > wrote: > Yes. I think Felix checked in a fix to ignore tests run on Java > versions that are not Java 8 (I think the fix was in > https://github.com/apache/spark/pull/21666 which is in 2.3.2)

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.2.2

2018-07-09 Thread Shivaram Venkataraman
Java 9. Spark doesn't > support that. Is there any way to tell CRAN this should not be tested? > > On Mon, Jul 9, 2018, 4:17 PM Shivaram Venkataraman > wrote: >> >> The upcoming 2.2.2 release was submitted to CRAN. I think there are >> some known issues o

Fwd: [CRAN-pretest-archived] CRAN submission SparkR 2.2.2

2018-07-09 Thread Shivaram Venkataraman
o-check service Flavor: r-devel-linux-x86_64-debian-gcc, r-devel-windows-ix86+x86_64 Check: CRAN incoming feasibility, Result: WARNING Maintainer: 'Shivaram Venkataraman ' New submission Package was archived on CRAN Insufficient package version (submitted: 2.2.2, existing: 2.3.0)

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.3.1

2018-06-12 Thread Shivaram Venkataraman
looks like Oracle JDK? > From: Shivaram Venkataraman > Sent: Tuesday, June 12, 2018 3:17:52 PM > To: dev > Cc: Felix Cheung > Subject: Fwd: [CRAN-pretest-archived] CRAN submission SparkR 2.3.1 > Corresponding to the Spark 2.3.1 release, I submitted the SparkR build

Fwd: [CRAN-pretest-archived] CRAN submission SparkR 2.3.1

2018-06-12 Thread Shivaram Venkataraman
debian-gcc, r-devel-windows-ix86+x86_64 Check: CRAN incoming feasibility, Result: NOTE Maintainer: 'Shivaram Venkataraman ' New submission Package was archived on CRAN Possibly mis-spelled words in DESCRIPTION: Frontend (4:10, 5:28) CRAN repository db overrides: X-CR

Re: [VOTE] SPIP ML Pipelines in R

2018-05-31 Thread Shivaram Venkataraman
Hossein -- Can you clarify what the resolution was on the repository / release issue discussed in the SPIP? Shivaram On Thu, May 31, 2018 at 9:06 AM, Felix Cheung wrote: > +1 > With my concerns in the SPIP discussion. > > From: Hossein > Sent: Wednesday, May 30, 2018

Re: SparkR was removed from CRAN on 2018-05-01

2018-05-29 Thread Shivaram Venkataraman
es/1851 > > On Tue, May 29, 2018 at 1:52 PM, Shivaram Venkataraman > wrote: >> >> Yes. That is correct >> >> Shivaram >> >> On Tue, May 29, 2018 at 11:48 AM, Hossein wrote: >> > I guess this relates to our conversation on the SPIP. When this

Re: SparkR was removed from CRAN on 2018-05-01

2018-05-29 Thread Shivaram Venkataraman
Yes. That is correct Shivaram On Tue, May 29, 2018 at 11:48 AM, Hossein wrote: > I guess this relates to our conversation on the SPIP. When this happens, do > we wait for a new minor release to submit it to CRAN again? > > --Hossein > > On Fri, May 25, 2018 at 5:11 PM, Felix Cheung > wrote: >>

Re: Time for 2.3.1?

2018-05-13 Thread Shivaram Venkataraman
+1 We had a SparkR fix for CRAN SystemRequirements that will also be good to get out. Shivaram On Fri, May 11, 2018 at 12:34 PM, Henry Robinson wrote: > https://github.com/apache/spark/pull/21302 > > On 11 May 2018 at 11:47, Henry Robinson wrote: > >> I was planning to do so shortly. >> >> Hen

Re: Integrating ML/DL frameworks with Spark

2018-05-08 Thread Shivaram Venkataraman
> - Fault tolerance and execution model: Spark assumes fine-grained task recovery, i.e. if something fails, only that task is rerun. This doesn’t match the execution model of distributed ML/DL frameworks that are typically MPI-based, and rerunning a single task would lea

Re: [Spark][Scheduler] Spark DAGScheduler scheduling performance hindered on JobSubmitted Event

2018-03-06 Thread Shivaram Venkataraman
The problem with doing work in the callsite thread is that there are a number of data structures that are updated during job submission, and these data structures are guarded by the event loop ensuring only one thread accesses them. I don't think there is a very easy fix for this given the structure
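The serialization pattern described here — a single event-loop thread that owns the scheduler's data structures while other threads only enqueue events — can be sketched in plain Python (the class, event names, and state layout are illustrative, not Spark's actual DAGScheduler code):

```python
import queue
import threading

class DAGEventLoop:
    """Single consumer thread owns the scheduler state; producer threads
    only post events. Because every mutation happens on the loop thread,
    the state needs no locks."""

    def __init__(self):
        self.events = queue.Queue()
        self.active_jobs = {}  # only ever touched by the loop thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def post(self, event):
        # Safe to call from any thread: the queue is the only shared object.
        self.events.put(event)

    def _run(self):
        while True:
            event = self.events.get()
            if event is None:  # sentinel: shut down
                return
            kind, job_id = event
            if kind == "submit":
                self.active_jobs[job_id] = "running"
            elif kind == "done":
                self.active_jobs.pop(job_id, None)

    def stop(self):
        # Sentinel is enqueued after all prior events, so they drain first.
        self.events.put(None)
        self._thread.join()
```

Doing job submission work directly in the calling thread would require adding locking around every structure the loop currently owns, which is why the fix is not straightforward.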

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-19 Thread Shivaram Venkataraman
nstance, it brings open the web directory > instead. > > 2) The second is the dist location we are voting on has a .iml file, which > is normally not included in release or release RC and it is unsigned and > without hash (therefore seems like it should not be in the release) > > Thanks

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-19 Thread Shivaram Venkataraman
FWIW The search result link works for me Shivaram On Mon, Feb 19, 2018 at 6:21 PM, Felix Cheung wrote: > These are two separate things: > > Does the search result links work for you? > > The second is the dist location we are voting on has a .iml file. > > _ > From:

Re: [RESULT][VOTE] Spark 2.2.1 (RC2)

2017-12-13 Thread Shivaram Venkataraman
is vote passes. Thanks everyone for testing this release. +1: Sean Owen (binding), Herman van Hövell tot Westerflier (binding), Wenchen Fan (binding), Shivaram Venkataraman (binding), Felix Cheung, Henry Robinson, Hyukjin Kwon, Dongjoon Hyun, Kazuaki Ishizaki, Holden Karau, Weichen Xu. 0: None. -1: None.

Re: [VOTE] Spark 2.2.1 (RC2)

2017-11-29 Thread Shivaram Venkataraman
+1 SHA, MD5 and signatures look fine. Built and ran Maven tests on my Macbook. Thanks Shivaram On Wed, Nov 29, 2017 at 10:43 AM, Holden Karau wrote: > +1 (non-binding) > > PySpark install into a virtualenv works, PKG-INFO looks correctly > populated (mostly checking for the pypandoc conversion

SparkR is now available on CRAN

2017-10-12 Thread Shivaram Venkataraman
Hi all I'm happy to announce that the most recent release of Spark, 2.1.2 is now available for download as an R package from CRAN at https://cran.r-project.org/web/packages/SparkR/ . This makes it easy to get started with SparkR for new R users and the package includes code to download the corresp

Re: What is d3kbcqa49mib13.cloudfront.net ?

2017-09-13 Thread Shivaram Venkataraman
rporating something that is not completely > trusted or approved into the process of building something that we are then > going to approve as trusted is different from the prior use of cloudfront. > > On Wed, Sep 13, 2017 at 10:26 AM, Shivaram Venkataraman > wrote: >> >>

Re: What is d3kbcqa49mib13.cloudfront.net ?

2017-09-13 Thread Shivaram Venkataraman
The bucket comes from CloudFront, a CDN that's part of AWS. There was a bunch of discussion about this back in 2013: https://lists.apache.org/thread.html/9a72ff7ce913dd85a6b112b1b2de536dcda74b28b050f70646aba0ac@1380147885@%3Cdev.spark.apache.org%3E Shivaram On Wed, Sep 13, 2017 at 9:30 AM, Sean Owe

Submitting SparkR to CRAN

2017-05-09 Thread Shivaram Venkataraman
Closely related to the PyPi upload thread (https://s.apache.org/WLtM), I just wanted to give a heads up that we are working on submitting SparkR from Spark 2.1.1 as a package to CRAN. The package submission is under review with CRAN right now and I will post updates to this thread. The main ticket

Re: Build completed: spark 866-master

2017-03-04 Thread Shivaram Venkataraman
> spark/settings > I'd like to note that I disabled the notification in the appveyor.yml but it seems the configurations are merged in the Web UI, according to the documentation (https://www.appveyor.com/docs/notifications/#global-email-notifica

Fwd: Build completed: spark 866-master

2017-03-04 Thread Shivaram Venkataraman
I'm not sure why the AppVeyor updates are coming to the dev list. Hyukjin -- Do you know if we made any recent changes that might have caused this ? Thanks Shivaram -- Forwarded message -- From: AppVeyor Date: Sat, Mar 4, 2017 at 2:46 PM Subject: Build completed: spark 866-maste

Re: Can anyone edit JIRAs SPARK-19191 to SPARK-19202?

2017-01-13 Thread Shivaram Venkataraman
FWIW there is an option to Delete the issue (in More -> Delete). Shivaram On Fri, Jan 13, 2017 at 8:11 AM, Shivaram Venkataraman wrote: > I can't see the resolve button either - Maybe we can forward this to > Apache Infra and see if they can close these issues ? > > Shivara

Re: Can anyone edit JIRAs SPARK-19191 to SPARK-19202?

2017-01-13 Thread Shivaram Venkataraman
I can't see the resolve button either - Maybe we can forward this to Apache Infra and see if they can close these issues ? Shivaram On Fri, Jan 13, 2017 at 6:35 AM, Sean Owen wrote: > Yes, I'm asking about a specific range: 19191 - 19202. These seem to be the > ones created during the downtime.

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-15 Thread Shivaram Venkataraman
In addition to usual binary artifacts, this is the first release where we have installable packages for Python [1] and R [2] that are part of the release. I'm including instructions to test the R package below. Holden / other Python developers can chime in if there are special instructions to test

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-08 Thread Shivaram Venkataraman
+0 I am not sure how much of a problem this is, but the pip packaging seems to have changed the size of the hadoop-2.7 artifact. As you can see in http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-bin/, the Hadoop 2.7 build is 359M, almost double the size of the other Hadoop versions

Re: [ANNOUNCE] Apache Spark 2.0.2

2016-11-14 Thread Shivaram Venkataraman
FWIW 2.0.1 is also used in the 'Link With Spark' and 'Spark Source Code Management' sections in that page. Shivaram On Mon, Nov 14, 2016 at 11:11 PM, Reynold Xin wrote: > It's on there on the page (both the release notes and the download version > dropdown). > > The one line text is outdated. I'

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-14 Thread Shivaram Venkataraman
The release is available on http://www.apache.org/dist/spark/ and it's on Maven Central: http://repo1.maven.org/maven2/org/apache/spark/spark-core_2.11/2.0.2/ I guess Reynold hasn't yet put together the release notes / updates to the website. Thanks Shivaram On Mon, Nov 14, 2016 at 12:49 PM, Nicho

Re: statistics collection and propagation for cost-based optimizer

2016-11-14 Thread Shivaram Venkataraman
Do we have any query workloads for which we can benchmark these proposals in terms of performance? Thanks Shivaram On Sun, Nov 13, 2016 at 5:53 PM, Reynold Xin wrote: > One additional note: in terms of size, the size of a count-min sketch with > eps = 0.1% and confidence 0.87, uncompressed, is
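For a rough sense of the numbers being discussed, the textbook Cormode–Muthukrishnan sizing formulas for a count-min sketch can be computed directly. This is a sketch under the assumption of the classic formulas (width = ⌈e/ε⌉, depth = ⌈ln(1/δ)⌉ with δ = 1 − confidence); Spark's own CountMinSketch implementation may use different constants:

```python
import math

def cms_dimensions(eps, confidence):
    """Classic count-min sketch sizing: the sketch is a depth x width
    array of counters, where width controls the over-count error (eps)
    and depth controls the failure probability (1 - confidence)."""
    delta = 1.0 - confidence
    width = math.ceil(math.e / eps)          # columns per row
    depth = math.ceil(math.log(1.0 / delta)) # independent hash rows
    return width, depth

# eps = 0.1% and confidence 0.87, as in the message above
w, d = cms_dimensions(eps=0.001, confidence=0.87)
size_bytes = w * d * 8  # assuming 8-byte (long) counters
```

With these formulas the sketch comes out to a few thousand counters per row and a handful of rows, i.e. on the order of tens of kilobytes uncompressed.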

Re: StructuredStreaming status

2016-10-19 Thread Shivaram Venkataraman
At the AMPLab we've been working on a research project that looks at just the scheduling latencies and on techniques to get lower scheduling latency. It moves away from the micro-batch model, but reuses the fault tolerance etc. in Spark. However we haven't yet figured out all the parts in integratin

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-18 Thread Shivaram Venkataraman
+1 - Given that our website is now on github (https://github.com/apache/spark-website), I think we can move most of our wiki into the main website. That way we'll only have two sources of documentation to maintain: A release specific one in the main repo and the website which is more long lived. T

Re: StructuredStreaming Custom Sinks (motivated by Structured Streaming Machine Learning)

2016-10-13 Thread Shivaram Venkataraman
isolating specific changes that are required etc. It'd also be great to hear other approaches / next steps to concretize some of these goals. Thanks Shivaram On Thu, Oct 13, 2016 at 8:39 AM, Fred Reiss wrote: > On Tue, Oct 11, 2016 at 11:02 AM, Shivaram Venkataraman > wrote: >> >>

Re: StructuredStreaming Custom Sinks (motivated by Structured Streaming Machine Learning)

2016-10-11 Thread Shivaram Venkataraman
Thanks Fred - that is very helpful. > Delivering low latency, high throughput, and stability simultaneously: Right > now, our own tests indicate you can get at most two of these characteristics > out of Spark Streaming at the same time. I know of two parties that have > abandoned Spark Streaming b

Re: [ANNOUNCE] Announcing Spark 2.0.1

2016-10-05 Thread Shivaram Venkataraman
Yeah I see the apache maven repos have the 2.0.1 artifacts at https://repository.apache.org/content/repositories/releases/org/apache/spark/spark-core_2.11/ -- Not sure why they haven't synced to maven central yet Shivaram On Wed, Oct 5, 2016 at 8:37 PM, Luciano Resende wrote: > It usually don't

Re: [discuss] Spark 2.x release cadence

2016-09-27 Thread Shivaram Venkataraman
+1 I think having a 4 month window instead of a 3 month window sounds good. However I think figuring out a timeline for maintenance releases would also be good. This is a common concern that comes up in many user threads and it'll be better to have some structure around this. It doesn't need to be

Re: StructuredStreaming Custom Sinks (motivated by Structured Streaming Machine Learning)

2016-09-26 Thread Shivaram Venkataraman
Disclaimer - I am not very closely involved with Structured Streaming design / development, so this is just my two cents from looking at the discussion in the linked JIRAs and PRs. It seems to me there are a couple of issues being conflated here: (a) is the question of how to specify or add more f

Re: R docs no longer building for branch-2.0

2016-09-22 Thread Shivaram Venkataraman
I looked into this and found the problem. Will send a PR now to fix this. If you are curious about what is happening here: When we build the docs separately we don't have the JAR files from the Spark build in the same tree. We added a new set of docs recently in SparkR called an R vignette that ru

Re: Change the settings in AppVeyor to prevent triggering the tests in other PRs in other branches

2016-09-09 Thread Shivaram Venkataraman
>> versions, I think it'd be fine. >> >> One concern is, I am not sure if SparkR tests can pass on branch-1.6 (I >> checked it passes on branch-2.0 before). >> >> I can try to check if it passes and identify the related causes if it >> does not pass.

Re: Change the settings in AppVeyor to prevent triggering the tests in other PRs in other branches

2016-09-09 Thread Shivaram Venkataraman
e of the account :). > On 10 Sep 2016 12:41 a.m., "Shivaram Venkataraman" > wrote: >> Thanks for debugging - I'll reply on >> https://issues.apache.org/jira/browse/INFRA-12590 and ask for this >> change. >> FYI I don

Re: Change the settings in AppVeyor to prevent triggering the tests in other PRs in other branches

2016-09-09 Thread Shivaram Venkataraman
Thanks for debugging - I'll reply on https://issues.apache.org/jira/browse/INFRA-12590 and ask for this change. FYI I don't think any of the committers have access to the AppVeyor account, which is at https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark . To request changes that need to be don

Re: Discuss SparkR executors/workers support virtualenv

2016-09-07 Thread Shivaram Venkataraman
I think this makes sense -- making it easier to use additional R packages would be a good feature. I am not sure we need Packrat for this use case though. Lets continue discussion on the JIRA at https://issues.apache.org/jira/browse/SPARK-17428 Thanks Shivaram On Tue, Sep 6, 2016 at 11:36 PM, Yan

Re: sparkR array type not supported

2016-09-02 Thread Shivaram Venkataraman
I think it needs a type for the elements in the array. For example f <- structField("x", "array") Thanks Shivaram On Fri, Sep 2, 2016 at 8:26 AM, Paul R wrote: > Hi there, > I've noticed the following command in sparkR > field = structField("x", "array") > Throws this error > Erro

Re: KMeans calls takeSample() twice?

2016-08-30 Thread Shivaram Venkataraman
I think takeSample itself runs multiple jobs if the number of samples collected in the first pass is not enough. The comment and code path at https://github.com/apache/spark/blob/412b0e8969215411b97efd3d0984dc6cac5d31e0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L508 should explain when th
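The multi-pass behaviour described above can be sketched in plain Python. All names, the oversampling factor, and the retry policy here are illustrative, not Spark's actual implementation; the point is only the shape of the loop — oversample, and if a pass comes back short, rerun with a larger fraction:

```python
import random

def take_sample(data, num, seed=42):
    """Sample `num` items without replacement. Each while-loop iteration
    stands in for one Spark job: elements are kept independently with
    probability `fraction`, so a pass can return fewer than `num` items
    and force another pass."""
    rng = random.Random(seed)
    n = len(data)
    if num >= n:
        return list(data), 0
    fraction = min(1.0, 1.2 * num / n)  # 20% oversampling is an arbitrary choice
    passes = 0
    while True:
        passes += 1
        sampled = [x for x in data if rng.random() < fraction]
        if len(sampled) >= num:
            rng.shuffle(sampled)
            return sampled[:num], passes
        fraction = min(1.0, fraction * 2)  # too few collected: grow and rerun

sample, passes = take_sample(list(range(1000)), 50)
```

Most of the time the oversampled first pass suffices, but when it undershoots, the caller pays for an extra job — which is why takeSample can show up as multiple jobs in the UI.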

Re: Spark R - Loading Third Party R Library in YARN Executors

2016-08-17 Thread Shivaram Venkataraman
I think you can also pass in a zip file using the --files option (http://spark.apache.org/docs/latest/running-on-yarn.html has some examples). The files should then be present in the current working directory of the driver R process. Thanks Shivaram On Wed, Aug 17, 2016 at 4:16 AM, Felix Cheung

Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-20 Thread Shivaram Venkataraman
+1 SHA and MD5 sums match for all binaries. Docs look fine this time around. Built and ran `dev/run-tests` with Java 7 on a linux machine. No blocker bugs on JIRA and the only critical bug with target as 2.0.0 is SPARK-16633, which doesn't look like a release blocker. I also checked issues which

Re: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-15 Thread Shivaram Venkataraman
Hashes, sigs match. I built and ran tests with Hadoop 2.3 ("-Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl -Phive-thriftserver"). I couldn't get the following tests to pass but I think it might be something specific to my setup as Jenkins on branch-2.0 seems quite stable. [error] Failed tests: [error] o

Re: [VOTE] Release Apache Spark 2.0.0 (RC4)

2016-07-14 Thread Shivaram Venkataraman
I think the docs build was broken because of https://issues.apache.org/jira/browse/SPARK-16553 - A fix has been merged and we are testing it now Shivaram On Thu, Jul 14, 2016 at 1:56 PM, Matthias Niehoff wrote: > Some of the programming guides in the docs only give me blank page (Spark > program

Re: Call to new JObject sometimes returns an empty R environment

2016-07-05 Thread Shivaram Venkataraman
-sparkr-dev@googlegroups +dev@spark.apache.org [Please send SparkR development questions to the Spark user / dev mailing lists. Replies inline] > From: > Date: Tue, Jul 5, 2016 at 3:30 AM > Subject: Call to new JObject sometimes returns an empty R environment > To: SparkR Developers > > > > H

Re: spark-ec2 scripts with spark-2.0.0-preview

2016-06-14 Thread Shivaram Venkataraman
Can you open an issue on https://github.com/amplab/spark-ec2 ? I think we should be able to escape the version string and pass the 2.0.0-preview through the scripts Shivaram On Tue, Jun 14, 2016 at 12:07 PM, Sunil Kumar wrote: > Hi, > > The spark-ec2 scripts are missing from spark-2.0.0-preview

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-07 Thread Shivaram Venkataraman
As far as I know the process is just to copy docs/_site from the build to the appropriate location in the SVN repo (i.e. site/docs/2.0.0-preview). Thanks Shivaram On Tue, Jun 7, 2016 at 8:14 AM, Sean Owen wrote: > As a stop-gap, I can edit that page to have a small section about > preview releas

Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

2016-05-12 Thread Shivaram Venkataraman
On Thu, May 12, 2016 at 2:29 PM, Reynold Xin wrote: > We currently have three levels of interface annotation: > > - unannotated: stable public API > - DeveloperApi: A lower-level, unstable API intended for developers. > - Experimental: An experimental user-facing API. > > > After using this annota

Re: SparkR unit test failures on local master

2016-04-28 Thread Shivaram Venkataraman
I just ran the tests using a recently synced master branch and the tests seemed to work fine. My guess is some of the Java classes changed and you need to rebuild Spark ? Thanks Shivaram On Thu, Apr 28, 2016 at 1:19 PM, Gayathri Murali wrote: > Hi All, > > I am running the sparkR unit test(./R/r

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Shivaram Venkataraman
Overall this sounds good to me. One question I have is that in addition to the ML algorithms we have a number of linear algebra (various distributed matrices) and statistical methods in the spark.mllib package. Is the plan to port or move these to the spark.ml namespace in the 2.x series ? Thanks

Re: Are we running SparkR tests in Jenkins?

2016-01-15 Thread Shivaram Venkataraman
Ah I see. I wasn't aware of that PR. We should do a find and replace in all the documentation and rest of the repository as well. Shivaram On Fri, Jan 15, 2016 at 3:20 PM, Reynold Xin wrote: > +Shivaram > > Ah damn - we should fix it. > > This was broken by https://github.com/apache/spark/pull/1

Re: Are we running SparkR tests in Jenkins?

2016-01-15 Thread Shivaram Venkataraman
Yes - we should be running R tests AFAIK. That error message is a deprecation warning about the script `bin/sparkR` which needs to be changed in https://github.com/apache/spark/blob/7cd7f2202547224593517b392f56e49e4c94cabc/R/run-tests.sh#L26 to bin/spark-submit. Thanks Shivaram On Fri, Jan 15, 2

Re: [SparkR] Any reason why saveDF's mode is append by default ?

2015-12-14 Thread Shivaram Venkataraman
I think its just a bug -- I think we originally followed the Python API (in the original PR [1]) but the Python API seems to have been changed to match Scala / Java in https://issues.apache.org/jira/browse/SPARK-6366 Feel free to open a JIRA / PR for this. Thanks Shivaram [1] https://github.com/

Re: Specifying Scala types when calling methods from SparkR

2015-12-09 Thread Shivaram Venkataraman
The SparkR callJMethod can only invoke methods as they show up in the Java byte code. So in this case you'll need to check the SparkContext byte code (with javap or something like that) to see how that method looks. My guess is the type is passed in as a class tag argument, so you'll need to do som

Re: How to add 1.5.2 support to ec2/spark_ec2.py ?

2015-12-01 Thread Shivaram Venkataraman
> "1.5.1" > On Tue, Dec 1, 2015 at 12:22 AM, Shivaram Venkataraman > wrote: >> Yeah we just need to add 1.5.2 as in >> https://github.com/apache/spark/commit/97956669053646f00131073358e53b05d0c3d5d0#diff-ada66bbeb2f1327b508232ef6c3805a5 >> to th

Re: How to add 1.5.2 support to ec2/spark_ec2.py ?

2015-12-01 Thread Shivaram Venkataraman
Yeah we just need to add 1.5.2 as in https://github.com/apache/spark/commit/97956669053646f00131073358e53b05d0c3d5d0#diff-ada66bbeb2f1327b508232ef6c3805a5 to the master branch as well Thanks Shivaram On Mon, Nov 30, 2015 at 11:38 PM, Alexander Pivovarov wrote: > just want to follow up > > On N

Re: A proposal for Spark 2.0

2015-11-10 Thread Shivaram Venkataraman
+1 On a related note I think making it lightweight will ensure that we stay on the current release schedule and don't unnecessarily delay 2.0 to wait for new features / big architectural changes. In terms of fixes to 1.x, I think our current policy of back-porting fixes to older releases would st

Re: Recommended change to core-site.xml template

2015-11-05 Thread Shivaram Venkataraman
Thanks for investigating this. The right place to add these is the core-site.xml template we have at https://github.com/amplab/spark-ec2/blob/branch-1.5/templates/root/spark/conf/core-site.xml and/or https://github.com/amplab/spark-ec2/blob/branch-1.5/templates/root/ephemeral-hdfs/conf/core-site.x

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Shivaram Venkataraman
> Thanks for sharing that tip. Looks like you can also use as_json (vs. asjson). > Nick > On Sun, Nov 1, 2015 at 5:32 PM Shivaram Venkataraman > wrote: >> On Sun, Nov 1, 2015 at 2:16 PM, Nicholas Chammas >> wrote: >> > OK, I’ll focus

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Shivaram Venkataraman
which has a 'preferred' field set to the closest mirror. Shivaram > Nick > > > On Sun, Nov 1, 2015 at 12:25 PM Shivaram Venkataraman > wrote: >> >> I think that getting them from the ASF mirrors is a better strategy in >> general as it'll remove the o

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Shivaram Venkataraman
I think that getting them from the ASF mirrors is a better strategy in general as it'll remove the overhead of keeping the S3 bucket up to date. It works in the spark-ec2 case because we only support a limited number of Hadoop versions from the tool. FWIW I don't have write access to the bucket and

Re: SparkR package path

2015-09-24 Thread Shivaram Venkataraman
sh to talk to it from RStudio. While >> that is a bigger task, for now, first step could be not requiring them to >> download Spark source and run a script that is named install-dev.sh. I filed >> SPARK-10776 to track this. >> >> >> >> >> --Hossein >

Re: SparkR package path

2015-09-22 Thread Shivaram Venkataraman
As Rui says it would be good to understand the use case we want to support (supporting CRAN installs could be one for example). I don't think it should be very hard to do as the RBackend itself doesn't use the R source files. The RRDD does use it and the value comes from https://github.com/apache/s

Re: SparkR streaming source code

2015-09-16 Thread Shivaram Venkataraman
I think Hao posted a link to the source code in the description of https://issues.apache.org/jira/browse/SPARK-6803 On Wed, Sep 16, 2015 at 10:06 AM, Reynold Xin wrote: > You should reach out to the speakers directly. > > > On Wed, Sep 16, 2015 at 9:52 AM, Renyi Xiong wrote: >> >> SparkR streami

Re: SparkR driver side JNI

2015-09-11 Thread Shivaram Venkataraman
k-submit and setting them with SparkConf to R diver's in-process > JVM through JNI? > > On Thu, Sep 10, 2015 at 9:29 PM, Shivaram Venkataraman > wrote: >> >> Yeah in addition to the downside of having 2 JVMs the command line >> arguments and SparkConf etc. will b

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-28 Thread Shivaram Venkataraman
I've seen similar tar file warnings and in my case it was because I was using the default tar on a Macbook. Using gnu-tar from brew made the warnings go away. Thanks Shivaram On Fri, Aug 28, 2015 at 2:37 PM, Luciano Resende wrote: > The binary archives seems to be having some issues, which seems
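The warnings above come from format differences between macOS's default bsdtar and GNU tar. The same idea — pinning the archive format explicitly rather than relying on the platform default — can be sketched with Python's tarfile module (the archive contents and paths here are illustrative):

```python
import io
import tarfile

# Build a small .tar.gz entirely in memory, pinning the GNU tar format
# explicitly so the output does not depend on the platform's default tar.
payload = io.BytesIO(b"hello spark")
info = tarfile.TarInfo(name="spark-1.5.0/README.md")
info.size = payload.getbuffer().nbytes

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.GNU_FORMAT) as tar:
    tar.addfile(info, payload)

# Read it back to confirm the entry round-trips cleanly.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    print(tar.getnames())  # -> ['spark-1.5.0/README.md']
```

Switching to gnu-tar from Homebrew, as the message suggests, achieves the same effect at the command line: both ends of the pipeline agree on the archive format, so the extraction warnings disappear.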

Re: [VOTE] Release Apache Spark 1.5.0 (RC1)

2015-08-20 Thread Shivaram Venkataraman
FYI The staging repository published as version 1.5.0 is at https://repository.apache.org/content/repositories/orgapachespark-1136 while the staging repository published as version 1.5.0-rc1 is at https://repository.apache.org/content/repositories/orgapachespark-1137 Thanks Shivaram On Thu, Aug

Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-08-17 Thread Shivaram Venkataraman
his, let me know if you need help > > 2015-08-16 23:38 GMT+02:00 Shivaram Venkataraman > : >> >> I just investigated this and this is happening because of a Maven >> version requirement not being met. I'll look at modifying the build >> scripts t

Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-08-16 Thread Shivaram Venkataraman
I just investigated this and this is happening because of a Maven version requirement not being met. I'll look at modifying the build scripts to use Maven 3.3.3 (with build/mvn --force ?) Shivaram On Sun, Aug 16, 2015 at 10:16 AM, Olivier Girardot wrote: > Hi Patrick, > is there any way for the
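The failure described above is a minimum-version gate: the build requires Maven 3.3.3, and an older system Maven trips it, which is why the fix is to run `build/mvn --force` so Spark's own downloaded Maven is used. A minimal sketch of that version check (the version numbers below are illustrative):

```python
# Compare dotted version strings numerically, the way a build script's
# minimum-Maven-version gate would. Versions here are illustrative.
def parse_version(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))

def meets_requirement(installed: str, required: str = "3.3.3") -> bool:
    """True if the installed Maven satisfies the minimum required version."""
    return parse_version(installed) >= parse_version(required)

print(meets_requirement("3.2.5"))  # -> False: use `build/mvn --force` instead
print(meets_requirement("3.3.9"))  # -> True
```

Note that the comparison must be numeric per component, not lexicographic: as strings, "3.3.10" would sort before "3.3.9" and wrongly fail the gate.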

Re: SparkR DataFrame fail to return data of Decimal type

2015-08-14 Thread Shivaram Venkataraman
Thanks for the catch. Could you send a PR with this diff ? On Fri, Aug 14, 2015 at 10:30 AM, Shkurenko, Alex wrote: > Got an issue similar to https://issues.apache.org/jira/browse/SPARK-8897, > but with the Decimal datatype coming from a Postgres DB: > > //Set up SparkR > >>Sys.setenv(SPARK_HOME=

Re: SparkR driver side JNI

2015-08-06 Thread Shivaram Venkataraman
The in-process JNI only works out when the R process comes up first and we launch a JVM inside it. In many deploy modes like YARN (or actually in anything using spark-submit) the JVM comes up first and we launch R after that. Using an inter-process solution helps us cover both use cases Thanks Shi

Re: Why SparkR didn't reuse PythonRDD

2015-08-06 Thread Shivaram Venkataraman
PythonRDD.scala has a number of PySpark specific conventions (for example worker reuse, exceptions etc.) and PySpark specific protocols (e.g. for communicating accumulators, broadcasts between the JVM and Python etc.). While it might be possible to refactor the two classes to share some more code I

Re: Should spark-ec2 get its own repo?

2015-08-03 Thread Shivaram Venkataraman
I sent a note to the Mesos developers and created https://github.com/apache/spark/pull/7899 to change the repository pointer. There are 3-4 open PRs right now in the mesos/spark-ec2 repository and I'll work on migrating them to amplab/spark-ec2 later today. My thoughts on moving the python script

Moving spark-ec2 to amplab github organization

2015-08-03 Thread Shivaram Venkataraman
Hi Mesos developers The Apache Spark project has been using https://github.com/mesos/spark-ec2 as a supporting repository for some of our EC2 scripts. This is a remnant from the days when the Spark project itself was hosted at github.com/mesos/spark. Based on discussions in the Spark Devel

Re: Should spark-ec2 get its own repo?

2015-07-31 Thread Shivaram Venkataraman
ike something that would be > good to do before 1.5.0, if it's going to happen soon. > > On Wed, Jul 22, 2015 at 6:59 AM, Shivaram Venkataraman > wrote: > > Yeah I'll send a note to the mesos dev list just to make sure they are > > informed. > > > > Shivaram &g

Re: Should spark-ec2 get its own repo?

2015-07-21 Thread Shivaram Venkataraman
f I am not wrong, since the code was hosted within mesos project > > repo, I assume (atleast part of it) is owned by mesos project and so > > its PMC ? > > > > - Mridul > > > > On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman > > wrote: > >>

Re: Should spark-ec2 get its own repo?

2015-07-21 Thread Shivaram Venkataraman
, Mridul Muralidharan wrote: > If I am not wrong, since the code was hosted within mesos project > repo, I assume (atleast part of it) is owned by mesos project and so > its PMC ? > > - Mridul > > On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman > wrote: > > There

Re: Should spark-ec2 get its own repo?

2015-07-21 Thread Shivaram Venkataraman
; prevent future issues with apache. > > Regards, > Mridul > > On Mon, Jul 20, 2015 at 12:01 PM, Shivaram Venkataraman > wrote: > > I've created https://github.com/amplab/spark-ec2 and added an initial > set of > > committers. Note that this is not a fork of the

Re: Should spark-ec2 get its own repo?

2015-07-20 Thread Shivaram Venkataraman
, Reynold Xin wrote: > Is amplab the right owner, given its ending next year? Maybe we should > create spark-ec2, or spark-project instead? > > > On Mon, Jul 20, 2015 at 12:01 PM, Shivaram Venkataraman < > shiva...@eecs.berkeley.edu> wrote: > >> I've created h
