Great - well, we do hope to hear from you, since the user list is for interesting success stories and anecdotes, as well as blog posts etc. :)
On Thu, Jun 5, 2014 at 9:40 AM, Jeremy Lee <unorthodox.engine...@gmail.com> wrote:

> Oh. Yes of course. *facepalm*
>
> I'm sure I typed that at first, but at some point my fingers decided to
> grammar-check me. Stupid fingers. I wonder what "sbt assemble" does (apart
> from error)? It certainly takes a while to do it.
>
> Thanks for the maven offer, but I'm not scheduled to learn that until
> after Scala, streaming, graphx, mllib, HDFS, sbt, Python, and yarn. I'll
> probably need to know it for yarn, but I'm really hoping to put it off
> until then. (Fortunately I already knew about linux, AWS, eclipse, git,
> java, distributed programming and ssh keyfiles, or I would have been in
> real trouble.)
>
> Ha! OK, that worked for the Kafka project... it fails on the other old
> 0.9 Twitter project, but who cares... now for mine...
>
> HAHA! YES!! Oh, thank you! I have the equivalent of "hello world" that
> uses one external library! Now the compiler and I can have a _proper_
> conversation.
>
> Hopefully you won't be hearing from me for a while.
>
>
> On Thu, Jun 5, 2014 at 3:06 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>
>> The "magic incantation" is "sbt assembly" (not "assemble").
>>
>> Actually I find maven with its assembly plugins to be very easy (mvn
>> package). I can send a pom.xml for a skeleton project if you need it.
>> —
>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>
>>
>> On Thu, Jun 5, 2014 at 6:59 AM, Jeremy Lee <unorthodox.engine...@gmail.com> wrote:
>>
>>> Hmm... that's not working so well for me. First, I needed to add a
>>> "project/plugin.sbt" file with the contents:
>>>
>>> addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")
>>>
>>> before "sbt/sbt assemble" worked at all. And I'm not sure about that
>>> version number: "0.9.1" isn't working much better, and "0.11.4" is the
>>> latest one recommended by the sbt-assembly project site. Where did you
>>> get your version from?
>>>
>>> Second, even when I do get it to build a .jar, spark-submit is still
>>> telling me the external.twitter library is missing.
>>>
>>> I tried using your github project as-is, but it also complained about
>>> the missing plugin. I'm trying it with various versions now to see if I
>>> can get that working, even though I don't know anything about kafka.
>>> Hmm, and no. Here's what I get:
>>>
>>> [info] Set current project to Simple Project (in build
>>> file:/home/ubuntu/spark-1.0.0/SparkKafka/)
>>> [error] Not a valid command: assemble
>>> [error] Not a valid project ID: assemble
>>> [error] Expected ':' (if selecting a configuration)
>>> [error] Not a valid key: assemble (similar: assembly, assemblyJarName,
>>> assemblyDirectory)
>>> [error] assemble
>>> [error]
>>>
>>> I also found this project, which seemed to be exactly what I was after:
>>> https://github.com/prabeesh/SparkTwitterAnalysis
>>>
>>> ...but it was for Spark 0.9, and though I updated all the version
>>> references to "1.0.0", that one doesn't work either. I can't even get
>>> it to build.
>>>
>>> *sigh*
>>>
>>> Is it going to be easier to just copy the external/ source code into my
>>> own project? Because I will... especially if creating "uberjars" takes
>>> this long every... single... time...
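(For anyone who lands on this thread from the archives: pulling together
Nick's "sbt assembly" correction, the plugin line above, and Jeremy's
simple.sbt from the original post at the bottom of the thread, a minimal
working pair of build files looks roughly like the sketch below. The
import/assemblySettings wiring comes from the sbt-assembly 0.11.x README
rather than from anything quoted here, and the versions are the ones
discussed in the thread, not current recommendations.)

project/plugins.sbt
------------------------------------------
// Adds the "assembly" task to sbt (plugin version as used above).
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")
------------------------------------------

simple.sbt
------------------------------------------
import AssemblyKeys._ // sbt-assembly 0.11.x: bring the plugin's keys into scope

assemblySettings // wire the assembly task into this build

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"

// The external connector spark-submit complained about; it has to be
// bundled into the uber jar because the cluster doesn't ship it.
libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"

libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
------------------------------------------

(Then "sbt assembly", not "assemble", drops a single fat jar under
target/scala-2.10/. One refinement not discussed in the thread: tagging
the spark-core and spark-streaming lines with % "provided" keeps Spark
itself out of the assembly, since the cluster already provides it.)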
>>> On Thu, Jun 5, 2014 at 8:52 AM, Jeremy Lee <unorthodox.engine...@gmail.com> wrote:
>>>
>>>> Thanks Patrick!
>>>>
>>>> Uberjars. Cool. I'd actually heard of them. And thanks for the link to
>>>> the example! I shall work through that today.
>>>>
>>>> I'm still learning sbt and its many options... the last new framework
>>>> I learned was node.js, and I think I've been rather spoiled by "npm".
>>>>
>>>> At least it's not maven. Please, oh please don't make me learn maven
>>>> too. (The only people who seem to like it have Software Stockholm
>>>> Syndrome: "I know maven kidnapped me and beat me up, but if you spend
>>>> long enough with it, you eventually start to sympathize and see its
>>>> point of view.")
>>>>
>>>>
>>>> On Thu, Jun 5, 2014 at 3:39 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>>
>>>>> Hey Jeremy,
>>>>>
>>>>> The issue is that you are using one of the external libraries, and
>>>>> these aren't actually packaged with Spark on the cluster, so you need
>>>>> to create an uber jar that includes them.
>>>>>
>>>>> You can look at the example here (I recently did this for a kafka
>>>>> project, and the idea is the same):
>>>>>
>>>>> https://github.com/pwendell/kafka-spark-example
>>>>>
>>>>> You'll want to make an uber jar that includes these packages (run sbt
>>>>> assembly) and then submit that jar to spark-submit. Also, I'd try
>>>>> running it locally first (if you aren't already) just to make the
>>>>> debugging simpler.
>>>>>
>>>>> - Patrick
>>>>>
>>>>>
>>>>> On Wed, Jun 4, 2014 at 6:16 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>> > Ah sorry, this may be the thing I learned for the day. The issue is
>>>>> > that classes from that particular artifact are missing, though.
>>>>> > Worth interrogating the resulting .jar file with "jar tf" to see if
>>>>> > they made it in?
>>>>> >
>>>>> > On Wed, Jun 4, 2014 at 2:12 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>>>>> >> @Sean, the %% syntax in SBT should automatically add the Scala
>>>>> >> major version qualifier (_2.10, _2.11 etc.) for you, so that does
>>>>> >> appear to be correct syntax for the build.
>>>>> >>
>>>>> >> I seemed to run into this issue with some missing Jackson deps,
>>>>> >> and solved it by including the jar explicitly on the driver class
>>>>> >> path:
>>>>> >>
>>>>> >> bin/spark-submit --driver-class-path
>>>>> >> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar --class
>>>>> >> "SimpleApp" SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>>>> >>
>>>>> >> Seems redundant to me, since I thought the JAR passed as the
>>>>> >> argument is copied to the driver and made available. But this
>>>>> >> solved it for me, so perhaps give it a try?
>>>>> >>
>>>>> >>
>>>>> >> On Wed, Jun 4, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>> >>>
>>>>> >>> Those aren't the names of the artifacts:
>>>>> >>>
>>>>> >>> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22
>>>>> >>>
>>>>> >>> The name is "spark-streaming-twitter_2.10".
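(A concrete note on Sean's point, for anyone searching the archives: sbt's
%% operator appends the Scala binary-version suffix to the artifact name
at resolution time, so, assuming scalaVersion 2.10.x, the two declarations
below fetch exactly the same Maven artifact. The single-% form with the
"_2.10" suffix written out is the name to search for on search.maven.org.)

------------------------------------------
// %% appends the Scala binary version ("_2.10") automatically...
libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"

// ...making it equivalent to spelling the suffix out with plain %:
libraryDependencies += "org.apache.spark" % "spark-streaming-twitter_2.10" % "1.0.0"
------------------------------------------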
>>>>> >>>
>>>>> >>> On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee <unorthodox.engine...@gmail.com> wrote:
>>>>> >>> > Man, this has been hard going. Six days, and I finally got a
>>>>> >>> > "Hello World" app working that I wrote myself.
>>>>> >>> >
>>>>> >>> > Now I'm trying to make a minimal streaming app based on the
>>>>> >>> > twitter examples (running standalone right now while learning),
>>>>> >>> > and when I run it like this:
>>>>> >>> >
>>>>> >>> > bin/spark-submit --class "SimpleApp"
>>>>> >>> > SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>>>> >>> >
>>>>> >>> > I get this error:
>>>>> >>> >
>>>>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>> >>> > org/apache/spark/streaming/twitter/TwitterUtils$
>>>>> >>> >
>>>>> >>> > I'm guessing that's because I haven't put in a dependency on
>>>>> >>> > "external/twitter" in the .sbt, but _how_? I can't find any docs
>>>>> >>> > on it. Here's my build file so far:
>>>>> >>> >
>>>>> >>> > simple.sbt
>>>>> >>> > ------------------------------------------
>>>>> >>> > name := "Simple Project"
>>>>> >>> >
>>>>> >>> > version := "1.0"
>>>>> >>> >
>>>>> >>> > scalaVersion := "2.10.4"
>>>>> >>> >
>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
>>>>> >>> >
>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"
>>>>> >>> >
>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"
>>>>> >>> >
>>>>> >>> > libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
>>>>> >>> >
>>>>> >>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>>>>> >>> > ------------------------------------------
>>>>> >>> >
>>>>> >>> > I've tried a few obvious things like adding:
>>>>> >>> >
>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-external" % "1.0.0"
>>>>> >>> >
>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-external-twitter" % "1.0.0"
>>>>> >>> >
>>>>> >>> > because, well, that would match the naming scheme implied so far,
>>>>> >>> > but it errors.
>>>>> >>> >
>>>>> >>> > Also, I just realized I don't completely understand whether:
>>>>> >>> > (a) the "spark-submit" command _sends_ the .jar to all the
>>>>> >>> > workers, or
>>>>> >>> > (b) the "spark-submit" command sends a _job_ to the workers,
>>>>> >>> > which are supposed to already have the jar file installed (or in
>>>>> >>> > hdfs), or
>>>>> >>> > (c) the Context is supposed to list the jars to be distributed.
>>>>> >>> > (Is that deprecated?)
>>>>> >>> >
>>>>> >>> > One part of the documentation says:
>>>>> >>> >
>>>>> >>> > "Once you have an assembled jar you can call the bin/spark-submit
>>>>> >>> > script as shown here while passing your jar."
>>>>> >>> >
>>>>> >>> > but another says:
>>>>> >>> >
>>>>> >>> > "application-jar: Path to a bundled jar including your
>>>>> >>> > application and all dependencies. The URL must be globally
>>>>> >>> > visible inside of your cluster, for instance, an hdfs:// path or
>>>>> >>> > a file:// path that is present on all nodes."
>>>>> >>> >
>>>>> >>> > I suppose both could be correct if you take a certain point of view.
>>>>> >>> >
>>>>> >>> > --
>>>>> >>> > Jeremy Lee BCompSci(Hons)
>>>>> >>> > The Unorthodox Engineers
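(The thread never shows SimpleApp itself, so the skeleton below is an
assumption, patterned on the twitter examples bundled with Spark 1.0; the
only name taken from the thread is the TwitterUtils class in the error
above. With None passed for the auth parameter, twitter4j reads its
credentials from the twitter4j.oauth.* system properties.)

SimpleApp.scala
------------------------------------------
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object SimpleApp {
  def main(args: Array[String]) {
    // The master is supplied by spark-submit's --master flag.
    val conf = new SparkConf().setAppName("Simple Project")
    val ssc = new StreamingContext(conf, Seconds(2))

    // None => default OAuth via the twitter4j.oauth.* system properties.
    val stream = TwitterUtils.createStream(ssc, None)

    // Print the text of incoming tweets as a smoke test.
    stream.map(_.getText).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
------------------------------------------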
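(And to tie Sean's and Patrick's suggestions together once the build
works: "jar tf" confirms the Twitter classes actually made it into the
assembly, and the first submission is easiest to debug against a local
master. The jar filename here is illustrative, since sbt-assembly derives
the real one from the project name and version; look in target/scala-2.10/
for yours.)

------------------------------------------
# Did the Twitter classes make it into the uber jar? (filename illustrative)
jar tf target/scala-2.10/simple-project-assembly-1.0.jar | grep TwitterUtils

# Submit the assembly itself (not the plain package jar), locally first:
bin/spark-submit --class "SimpleApp" --master "local[2]" \
  target/scala-2.10/simple-project-assembly-1.0.jar
------------------------------------------

(On the (a)/(b)/(c) question at the bottom of the thread: for a simple
standalone submit like this one, the behaviour is closest to (a), with
spark-submit shipping the application jar for you; the "globally visible"
wording in the second doc quote covers deployments where it cannot.)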