thods in.
>
> How can I specify a custom version? Modify the version numbers in all the
> pom.xml files?
>
>
>
> On Dec 5, 2016, at 9:12 PM, Jakob Odersky wrote:
>
> m rdds in an "org.apache.spark" package as well
>
>
-
It looks like you're having issues with including your custom Spark
version (with the extensions) in your test project. To use your local
Spark version:
1) make sure it has a custom version (let's call it 2.1.0-CUSTOM)
2) publish it to your local machine with `sbt publishLocal`
3) include the modif
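For step 3, a rough build.sbt sketch for the test project (the module list is illustrative; publishLocal places the artifacts in the local ivy repository, which sbt resolves by default):
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0-CUSTOM",
  "org.apache.spark" %% "spark-sql"  % "2.1.0-CUSTOM"
)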
Hi everyone,
is there any ongoing discussion/documentation on the redesign of sinks?
I think it could be a good thing to abstract away the underlying
streaming model; however, that isn't directly related to Holden's first
point. The way I understand it, the idea is to slightly change the
DataStreamWriter AP
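For context, a rough sketch of the current user-facing side (assuming `df` is a streaming DataFrame; the format name is illustrative):
val query = df.writeStream
  .format("console")      // sink selected by name
  .outputMode("append")
  .start()
query.awaitTermination()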
d and binds to the output fds from that process, so daemonizing is
> causing us minor hardship and seems like an easy thing to make optional.
> We'd be happy to make the PR as well.
>
> --Mike
>
> On Thu, Sep 29, 2016 at 5:25 PM, Jakob Odersky wrote:
>>
>> I'm c
Hi Kabeer,
which version of Spark are you using? I can't reproduce the error in
the latest Spark master.
regards,
--Jakob
I'm curious, what kind of container solutions require foreground
processes? Most init systems work fine with "starter" processes that
run other processes. IIRC systemd and start-stop-daemon have an option
called "fork" that expects the main process to run another one in
the background and only
I agree with Sean's answer; you can check out the relevant serializer
here:
https://github.com/twitter/chill/blob/develop/chill-scala/src/main/scala/com/twitter/chill/Traversable.scala
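As an aside, a minimal sketch of enabling Kryo in an application (MyRecord is a made-up user class; Scala collections such as Map are handled by chill's registered serializers automatically):
import org.apache.spark.SparkConf
case class MyRecord(id: Long, tags: Map[String, Int])
val conf = new SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[MyRecord]))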
On Wed, Sep 28, 2016 at 3:11 AM, Sean Owen wrote:
> My guess is that Kryo specially handles Maps generically or
Hash codes should try to avoid collisions of objects that are not
equal. Integer overflow is not an issue by itself.
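To illustrate with a made-up class: the multiply-and-add in a typical hashCode simply wraps around on Int overflow, and the contract only requires that equal objects produce equal hash codes:
class Point(val x: Int, val y: Int) {
  override def equals(other: Any): Boolean = other match {
    case p: Point => p.x == x && p.y == y
    case _        => false
  }
  // may overflow and wrap around; the result stays deterministic, which is all that matters
  override def hashCode: Int = 31 * x + y
}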
On Wed, Sep 21, 2016 at 10:49 PM, WangJianfei
wrote:
> Thank you very much, sir! But what I want to know is whether the hashcode
> overflow will cause trouble. Thank you!
>
>
>
>
b.hashCode when
> a.equals(b), the bidirectional case is usually harder to satisfy due to
> the possibility of collisions.
>
> Good info:
> http://www.programcreek.com/2011/07/java-equals-and-hashcode-contract/
> _____
> From: Jakob Odersky
> Sent: Wedne
Hi,
It is used jointly with a custom implementation of the `equals`
method. In Scala, you can override the `equals` method to change the
behaviour of `==` comparison. One example of this would be to compare
classes based on their parameter values (i.e. what case classes do).
Partitioners aren't case
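As a rough sketch of what this enables (the partitioner below is made up): overriding equals on a custom Partitioner lets Spark recognize that two RDDs are partitioned the same way and skip a shuffle:
import org.apache.spark.Partitioner
class ModPartitioner(val partitions: Int) extends Partitioner {
  def numPartitions: Int = partitions
  def getPartition(key: Any): Int =
    ((key.hashCode % partitions) + partitions) % partitions
  // two ModPartitioners with the same parameter compare equal under ==
  override def equals(other: Any): Boolean = other match {
    case p: ModPartitioner => p.partitions == partitions
    case _                 => false
  }
  override def hashCode: Int = partitions
}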
Hi Xiang,
this error also appears in client mode (maybe the situation that you
were referring to, and that worked, was local mode?); however, the error
is expected and is not a bug.
This line in your snippet:
object Main extends A[String] { //...
is, after desugaring, equivalent to:
object M
There are some flaky tests that occasionally fail; my first
recommendation would be to re-run the test suite. Another thing to
check is whether there are any applications listening on Spark's default
ports.
Btw, what is your environment like? In case it is Windows, I don't
think tests are regularly run
+1 to Sean's answer, importing varargs.
In this case the _root_ is also unnecessary (it would be required in
case you were using it in a nested package called "scala" itself).
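For reference, a small sketch of the import and annotation (the class and method names are made up):
import scala.annotation.varargs
class Util {
  // also generates a Java-friendly varargs overload
  @varargs def join(parts: String*): String = parts.mkString(",")
}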
On Thu, Sep 8, 2016 at 9:27 AM, Sean Owen wrote:
> I think the @_root_ version is redundant because
> @scala.annotation.va
Hi Dayne,
you can look at this page for some starter issues:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20labels%20%3D%20Starter%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened).
Also check out this guide on how to contribute to Spark
https://cwiki.apa
Luciano,
afaik the spark-package-tool also makes it easy to upload packages to
the spark-packages website. You are of course free to include any maven
coordinate in the --packages parameter.
--jakob
On Fri, Jul 15, 2016 at 1:42 PM, Ismaël Mejía wrote:
> Thanks for the info Burak, I will check the rep
>
> However, even when generating the file under the default resourceDirectory
> (core/src/resources), the file isn't picked up in the jar after doing a clean.
> So this seems to be a different issue.
>
>
>
>
>
> On Thu, May 19, 2016 at 4:17 PM, Jakob Odersky wrote:
>
To echo my comment on the PR: I think the "sbt way" to add extra,
generated resources to the classpath is by adding a new task to the
`resourceGenerators` setting. Also, the task should output any files
into the directory specified by the `resourceManaged` setting. See
http://www.scala-sbt.org/0.13
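As an illustrative sketch (the file name and contents are made up), a generator task in sbt 0.13 would look roughly like:
resourceGenerators in Compile += Def.task {
  val file = (resourceManaged in Compile).value / "build-info.properties"
  IO.write(file, s"version=${version.value}")
  Seq(file)
}.taskValue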
I just found out how the hash is calculated:
gpg --print-md sha512 .tgz
you can use that to check if the resulting output matches the contents
of .tgz.sha
On Mon, Apr 4, 2016 at 3:19 PM, Jakob Odersky wrote:
> The published hash is a SHA512.
>
> You can verify the integrity of the pa
o do with us moving release-build.sh?
>>>
>>>
>>> On Mon, Mar 21, 2016 at 1:43 PM Nicholas Chammas
>>> wrote:
>>>>
>>>> Is someone going to retry fixing these packages? It's still a problem.
>>>>
>>>> Also, i
I mean from the perspective of someone developing Spark, it makes
things more complicated. It's just my point of view; people who
actually support Spark deployments may have a different opinion ;)
On Thu, Mar 24, 2016 at 2:41 PM, Jakob Odersky wrote:
> You can, but since it's
You can, but since it's going to be a maintainability issue I would
argue it is in fact a problem.
On Thu, Mar 24, 2016 at 2:34 PM, Marcelo Vanzin wrote:
> Hi Jakob,
>
> On Thu, Mar 24, 2016 at 2:29 PM, Jakob Odersky wrote:
>> Reynold's 3rd point is particularly strong
Reynold's 3rd point is particularly strong in my opinion. Supporting
Scala 2.12 will require Java 8 anyway, and introducing such a change
is probably best done in a major release.
Consider what would happen if Spark 2.0 doesn't require Java 8 and
hence doesn't support Scala 2.12. Will it be stuck on an
I just experienced the issue, however retrying the download a second
time worked. Could it be that there is some load balancer/cache in
front of the archive and some nodes still serve the corrupt packages?
On Fri, Mar 18, 2016 at 8:00 AM, Nicholas Chammas
wrote:
> I'm seeing the same. :(
>
> On F
ried the Spark 1.6.1 / Hadoop 2.6 download and got a corrupt ZIP
> file.
>
> Jakob, are you sure the ZIP unpacks correctly for you? Is it the same Spark
> 1.6.1/Hadoop 2.6 package you had a success with?
>
> On Fri, Mar 18, 2016 at 6:11 PM Jakob Odersky wrote:
>>
>>
I would recommend (non-binding) option 1.
Apart from the API breakage I can see only advantages, and that sole
disadvantage is minimal for a few reasons:
1. the DataFrame API has been "Experimental" since its implementation,
so no stability was ever implied
2. considering that the change is for a
Awesome!
+1 on Steve Loughran's question: how does this affect support for
2.10? Do future contributions need to work with Scala 2.10?
cheers
On Mon, Feb 1, 2016 at 7:02 AM, Ted Yu wrote:
> The following jobs have been established for build against Scala 2.10:
>
> https://amplab.cs.berkeley.edu/
I'm not an authoritative source but I think it is indeed the plan to
move the default build to 2.11.
See this discussion for more detail
http://apache-spark-developers-list.1001551.n3.nabble.com/A-proposal-for-Spark-2-0-td15122.html
On Fri, Jan 29, 2016 at 11:43 AM, Deenar Toraskar
wrote:
> A re
Nitpick: the up-to-date version of said wiki page is
https://spark.apache.org/docs/1.6.0/job-scheduling.html (not sure how
much it changed though)
On Wed, Jan 27, 2016 at 7:50 PM, Chayapan Khannabha wrote:
> I would start at this wiki page
> https://spark.apache.org/docs/1.2.0/job-scheduling.html
A while ago, I remember reading that multiple active Spark contexts
per JVM was a possible future enhancement.
I was wondering if this is still on the roadmap, what the major
obstacles are and if I can be of any help in adding this feature?
regards,
--Jakob
---
make-distribution and the second code snippet both create a distribution
from a clean state. They therefore require that every source file be
compiled and that takes time (you can maybe tweak some settings or use a
newer compiler to gain some speed).
I'm inferring from your question that for your
Hi,
Datasets are being built upon the experimental DataFrame API; does this
mean DataFrames won't be experimental in the near future?
thanks,
--Jakob
Hey Jeff,
Do you mean reading from multiple text files? In that case, as a
workaround, you can use the RDD#union() (or ++) method to concatenate
multiple RDDs. For example:
val lines1 = sc.textFile("file1")
val lines2 = sc.textFile("file2")
val rdd = lines1 union lines2
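If there are many files, a variant using SparkContext.union (the paths are illustrative):
val files = Seq("file1", "file2", "file3")
val combined = sc.union(files.map(path => sc.textFile(path)))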
regards,
--Jakob
On 11 N
>
> Can you clarify which sbt jar (by path) ?
>
> I tried 'git log' on the following files but didn't see commit history:
>
> ./build/sbt-launch-0.13.7.jar
> ./build/zinc-0.3.5.3/lib/sbt-interface.jar
> ./sbt/sbt-launch-0.13.2.jar
> ./sbt/sbt-launch-0.13.5.jar
>
[Reposting to the list again, I really should double-check that
reply-to-all button]
In the meantime, as a light Friday-afternoon patch I was thinking about
splitting the ~600-LOC single sbt build file into something more manageable,
like the Akka build (without changing any dependencies or setting
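A rough sketch of the direction I have in mind (names and values are made up), factoring shared settings into project/Common.scala:
// project/Common.scala
import sbt._
import Keys._
object Common {
  // settings shared by every sub-project
  val settings: Seq[Setting[_]] = Seq(
    organization := "org.apache.spark",
    scalaVersion := "2.10.4",
    javacOptions ++= Seq("-source", "1.7", "-target", "1.7")
  )
}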
ri, Nov 6, 2015 at 1:48 AM, Koert Kuipers wrote:
>> > People who do upstream builds of spark (think bigtop and hadoop
>> distros) are
>> > used to legacy systems like maven, so maven is the default build. I
>> don't
>> > think it will change.
>&g
Hi everyone,
in the process of learning Spark, I wanted to get an overview of the
interaction between all of its sub-projects. I therefore decided to have a
look at the build setup and its dependency management.
Since I am a lot more comfortable using sbt than maven, I decided to try to
port the mav
[repost to mailing list]
I don't know much about packages, but have you heard about the
sbt-spark-package plugin?
Looking at the code, specifically
https://github.com/databricks/sbt-spark-package/blob/master/src/main/scala/sbtsparkpackage/SparkPackagePlugin.scala,
might give you insight on the det
Hi everyone,
I've been having trouble building Spark with SBT recently. Scala 2.11
doesn't work, and in all cases I get large numbers of warnings and even
errors in tests.
I was therefore wondering what the official status of Spark with sbt is. Is
it very new and still buggy, or unmaintained and "f
The path of the source file defining the event API is
`core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala`
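For a concrete picture of how the API is consumed, a minimal listener sketch (the log messages are just illustrative):
import org.apache.spark.scheduler._
class LoggingListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    println(s"job ${jobStart.jobId} started")
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
    println(s"job ${jobEnd.jobId} ended")
}
// registered on a running context with sc.addSparkListener(new LoggingListener)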
On 13 October 2015 at 16:29, Jakob Odersky wrote:
> Hi,
> I came across the spark listener API while checking out possible UI
> extensions recently. I noticed
Hi,
I came across the spark listener API while checking out possible UI
extensions recently. I noticed that all events inherit from a sealed trait
`SparkListenerEvent` and that a SparkListener has a corresponding
`onEventXXX(event)` method for every possible event.
Considering that events inherit
Hi everyone,
I am just getting started working on Spark and was thinking of a first way
to contribute whilst still trying to wrap my head around the codebase.
Exploring the web UI, I noticed it is a classic request-response website,
requiring manual refresh to get the latest data.
I think it would