Hi Jeremy, if you are using *addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")* in "project/plugin.sbt",
You also need to edit "project/project/build.scala" with the same sbt-assembly version (0.11.4), like:

    import sbt._

    object Plugins extends Build {
      lazy val root = Project("root", file(".")) dependsOn(
        uri("git://github.com/sbt/sbt-assembly.git#0.11.4")
      )
    }

Then try *sbt assembly*. Let me know whether it works or not.

Regards,
prabeesh

On Thu, Jun 5, 2014 at 1:16 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

> Great - well we do hope we hear from you, since the user list is for
> interesting success stories and anecdotes, as well as blog posts etc. too :)
>
>
> On Thu, Jun 5, 2014 at 9:40 AM, Jeremy Lee <unorthodox.engine...@gmail.com> wrote:
>
>> Oh. Yes, of course. *facepalm*
>>
>> I'm sure I typed that at first, but at some point my fingers decided to
>> grammar-check me. Stupid fingers. I wonder what "sbt assemble" does? (Apart
>> from error.) It certainly takes a while to do it.
>>
>> Thanks for the Maven offer, but I'm not scheduled to learn that until
>> after Scala, streaming, GraphX, MLlib, HDFS, sbt, Python, and YARN. I'll
>> probably need to know it for YARN, but I'm really hoping to put it off
>> until then. (Fortunately I already knew about Linux, AWS, Eclipse, git,
>> Java, distributed programming, and ssh keyfiles, or I would have been in
>> real trouble.)
>>
>> Ha! OK, that worked for the Kafka project... it fails on the other old 0.9
>> Twitter project, but who cares... now for mine...
>>
>> HAHA! YES!! Oh, thank you! I have the equivalent of "hello world" that
>> uses one external library! Now the compiler and I can have a _proper_
>> conversation.
>>
>> Hopefully you won't be hearing from me for a while.
>>
>>
>> On Thu, Jun 5, 2014 at 3:06 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>>
>>> The "magic incantation" is "sbt assembly" (not "assemble").
>>>
>>> Actually I find Maven with its assembly plugins to be very easy (mvn
>>> package).
>>> I can send a pom.xml for a skeleton project if you need it.
>>> —
>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>
>>>
>>> On Thu, Jun 5, 2014 at 6:59 AM, Jeremy Lee <unorthodox.engine...@gmail.com> wrote:
>>>
>>>> Hmm... That's not working so well for me. First, I needed to add a
>>>> "project/plugin.sbt" file with the contents:
>>>>
>>>> addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")
>>>>
>>>> before 'sbt/sbt assemble' worked at all. And I'm not sure about that
>>>> version number: "0.9.1" isn't working much better, and "0.11.4" is the
>>>> latest one recommended by the sbt project site. Where did you get your
>>>> version from?
>>>>
>>>> Second, even when I do get it to build a .jar, spark-submit is still
>>>> telling me the external.twitter library is missing.
>>>>
>>>> I tried using your GitHub project as-is, but it also complained about
>>>> the missing plugin. I'm trying it with various versions now to see if I
>>>> can get that working, even though I don't know anything about Kafka. Hmm,
>>>> and no. Here's what I get:
>>>>
>>>> [info] Set current project to Simple Project (in build
>>>> file:/home/ubuntu/spark-1.0.0/SparkKafka/)
>>>> [error] Not a valid command: assemble
>>>> [error] Not a valid project ID: assemble
>>>> [error] Expected ':' (if selecting a configuration)
>>>> [error] Not a valid key: assemble (similar: assembly, assemblyJarName,
>>>> assemblyDirectory)
>>>> [error] assemble
>>>> [error]
>>>>
>>>> I also found this project, which seemed to be exactly what I was after:
>>>> https://github.com/prabeesh/SparkTwitterAnalysis
>>>>
>>>> ...but it was for Spark 0.9, and though I updated all the version
>>>> references to "1.0.0", that one doesn't work either. I can't even get it
>>>> to build.
>>>>
>>>> *sigh*
>>>>
>>>> Is it going to be easier to just copy the external/ source code into my
>>>> own project? Because I will... especially if creating "uberjars" takes
>>>> this long every... single... time...
>>>>
>>>>
>>>> On Thu, Jun 5, 2014 at 8:52 AM, Jeremy Lee <unorthodox.engine...@gmail.com> wrote:
>>>>
>>>>> Thanks, Patrick!
>>>>>
>>>>> Uberjars. Cool. I'd actually heard of them. And thanks for the link to
>>>>> the example! I shall work through that today.
>>>>>
>>>>> I'm still learning sbt and its many options... the last new framework
>>>>> I learned was node.js, and I think I've been rather spoiled by "npm".
>>>>>
>>>>> At least it's not Maven. Please, oh please, don't make me learn Maven
>>>>> too. (The only people who seem to like it have Software Stockholm
>>>>> Syndrome: "I know Maven kidnapped me and beat me up, but if you spend
>>>>> long enough with it, you eventually start to sympathize and see its
>>>>> point of view.")
>>>>>
>>>>>
>>>>> On Thu, Jun 5, 2014 at 3:39 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>>>
>>>>>> Hey Jeremy,
>>>>>>
>>>>>> The issue is that you are using one of the external libraries, and
>>>>>> these aren't actually packaged with Spark on the cluster, so you need
>>>>>> to create an uber jar that includes them.
>>>>>>
>>>>>> You can look at the example here (I recently did this for a Kafka
>>>>>> project, and the idea is the same):
>>>>>>
>>>>>> https://github.com/pwendell/kafka-spark-example
>>>>>>
>>>>>> You'll want to make an uber jar that includes these packages (run sbt
>>>>>> assembly) and then submit that jar to spark-submit. Also, I'd try
>>>>>> running it locally first (if you aren't already) just to make the
>>>>>> debugging simpler.
>>>>>>
>>>>>> - Patrick
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 4, 2014 at 6:16 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>>> > Ah, sorry, this may be the thing I learned for the day. The issue is
>>>>>> > that classes from that particular artifact are missing, though. Worth
>>>>>> > interrogating the resulting .jar file with "jar tf" to see if it
>>>>>> > made it in?
>>>>>> >
>>>>>> > On Wed, Jun 4, 2014 at 2:12 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>>>>>> >> @Sean, the %% syntax in sbt should automatically add the Scala
>>>>>> >> major version qualifier (_2.10, _2.11 etc.) for you, so that does
>>>>>> >> appear to be correct syntax for the build.
>>>>>> >>
>>>>>> >> I seemed to run into this issue with some missing Jackson deps,
>>>>>> >> and solved it by including the jar explicitly on the driver class
>>>>>> >> path:
>>>>>> >>
>>>>>> >> bin/spark-submit --driver-class-path
>>>>>> >> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar --class "SimpleApp"
>>>>>> >> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>>>>> >>
>>>>>> >> Seems redundant to me, since I thought that the jar passed as an
>>>>>> >> argument is copied to the driver and made available. But this
>>>>>> >> solved it for me, so perhaps give it a try?
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On Wed, Jun 4, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>>> >>>
>>>>>> >>> Those aren't the names of the artifacts:
>>>>>> >>>
>>>>>> >>> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22
>>>>>> >>>
>>>>>> >>> The name is "spark-streaming-twitter_2.10".
>>>>>> >>>
>>>>>> >>> On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee
>>>>>> >>> <unorthodox.engine...@gmail.com> wrote:
>>>>>> >>> > Man, this has been hard going. Six days, and I finally got a
>>>>>> >>> > "Hello World" app working that I wrote myself.
>>>>>> >>> >
>>>>>> >>> > Now I'm trying to make a minimal streaming app based on the
>>>>>> >>> > Twitter examples (running standalone right now while learning),
>>>>>> >>> > and when running it like this:
>>>>>> >>> >
>>>>>> >>> > bin/spark-submit --class "SimpleApp"
>>>>>> >>> > SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>>>>> >>> >
>>>>>> >>> > I'm getting this error:
>>>>>> >>> >
>>>>>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>>> >>> > org/apache/spark/streaming/twitter/TwitterUtils$
>>>>>> >>> >
>>>>>> >>> > Which I'm guessing is because I haven't put in a dependency on
>>>>>> >>> > "external/twitter" in the .sbt, but _how_? I can't find any docs
>>>>>> >>> > on it. Here's my build file so far:
>>>>>> >>> >
>>>>>> >>> > simple.sbt
>>>>>> >>> > ------------------------------------------
>>>>>> >>> > name := "Simple Project"
>>>>>> >>> >
>>>>>> >>> > version := "1.0"
>>>>>> >>> >
>>>>>> >>> > scalaVersion := "2.10.4"
>>>>>> >>> >
>>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
>>>>>> >>> >
>>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"
>>>>>> >>> >
>>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"
>>>>>> >>> >
>>>>>> >>> > libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
>>>>>> >>> >
>>>>>> >>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>>>>>> >>> > ------------------------------------------
>>>>>> >>> >
>>>>>> >>> > I've tried a few obvious things like adding:
>>>>>> >>> >
>>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-external" % "1.0.0"
>>>>>> >>> >
>>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-external-twitter" % "1.0.0"
>>>>>> >>> >
>>>>>> >>> > because, well, that would match the naming scheme implied so
far, but it errors.
>>>>>> >>> >
>>>>>> >>> >
>>>>>> >>> > Also, I just realized I don't completely understand whether:
>>>>>> >>> > (a) the "spark-submit" command _sends_ the .jar to all the
>>>>>> >>> > workers, or
>>>>>> >>> > (b) the "spark-submit" command sends a _job_ to the workers,
>>>>>> >>> > which are supposed to already have the jar file installed (or in
>>>>>> >>> > HDFS), or
>>>>>> >>> > (c) the Context is supposed to list the jars to be distributed.
>>>>>> >>> > (Is that deprecated?)
>>>>>> >>> >
>>>>>> >>> > One part of the documentation says:
>>>>>> >>> >
>>>>>> >>> > "Once you have an assembled jar you can call the bin/spark-submit
>>>>>> >>> > script as shown here while passing your jar."
>>>>>> >>> >
>>>>>> >>> > but another says:
>>>>>> >>> >
>>>>>> >>> > "application-jar: Path to a bundled jar including your
>>>>>> >>> > application and all dependencies. The URL must be globally
>>>>>> >>> > visible inside of your cluster, for instance, an hdfs:// path or
>>>>>> >>> > a file:// path that is present on all nodes."
>>>>>> >>> >
>>>>>> >>> > I suppose both could be correct if you take a certain point of
>>>>>> >>> > view.
>>>>>> >>> >
>>>>>> >>> > --
>>>>>> >>> > Jeremy Lee BCompSci(Hons)
>>>>>> >>> > The Unorthodox Engineers
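[For later readers: the thread above boils down to two build files. The sketch below consolidates them; it is untested and assumes sbt-assembly 0.11.4, Spark 1.0.0, and Scala 2.10 as discussed in the thread. The "provided" scoping and the file split are my additions, not something any poster wrote verbatim.]

```scala
// project/plugins.sbt -- adds the sbt-assembly plugin
// (0.11.4, the version used in this thread; later releases changed the setup)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")

// build.sbt -- minimal Spark Streaming + Twitter build, based on
// Jeremy's simple.sbt above
import AssemblyKeys._  // sbt-assembly 0.11.x keys

assemblySettings

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

// "provided" keeps spark-core and spark-streaming out of the uber jar,
// since the cluster already ships them; spark-streaming-twitter is NOT
// part of the Spark assembly on the cluster, so it must be bundled.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"

libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
```

With these in place, *sbt assembly* (not "assemble") should produce an uber jar under target/scala-2.10/, which is what you pass to bin/spark-submit as the application jar. If assembly fails on duplicate files pulled in by the Spark dependencies, a mergeStrategy override may be needed; see the sbt-assembly documentation for your plugin version.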