Hmm... that's not working so well for me. First, I needed to add a
"project/plugin.sbt" file with the contents:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")

before 'sbt/sbt assemble' would run at all. And I'm not sure about that
version number: "0.9.1" isn't working much better, and "0.11.4" is the
latest one recommended on the sbt-assembly project site. Where did you get
your version from?
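
For what it's worth, the sbt-assembly README (the 0.11.x line, at least)
also says to put these lines at the top of the build file itself, in case
anyone else is following along:

import AssemblyKeys._

assemblySettings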

Second, even when I do get it to build a .jar, spark-submit is still
telling me the external.twitter library is missing.
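
(I still need to double-check the jar contents the way Sean suggested,
something like:

jar tf target/scala-2.10/simple-project_2.10-1.0.jar | grep -i twitter

...to see whether the twitter classes actually made it in. That jar name is
from my earlier package build; the assembly jar will presumably be named
differently.)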

I tried using your GitHub project as-is, but it also complained about the
missing plugin. I'm trying it with various plugin versions now to see if I
can get that working, even though I don't know anything about Kafka. Hmm,
and no luck. Here's what I get:

[info] Set current project to Simple Project (in build file:/home/ubuntu/spark-1.0.0/SparkKafka/)
[error] Not a valid command: assemble
[error] Not a valid project ID: assemble
[error] Expected ':' (if selecting a configuration)
[error] Not a valid key: assemble (similar: assembly, assemblyJarName, assemblyDirectory)
[error] assemble
[error]

I also found this project which seemed to be exactly what I was after:
https://github.com/prabeesh/SparkTwitterAnalysis

...but it was for Spark 0.9, and though I updated all the version
references to "1.0.0", that one doesn't work either. I can't even get it to
build.
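
(By "updated all the version references" I mean edits along these lines to
its build file — paraphrasing from memory, so the actual lines in that repo
may differ:

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"

...with the old 0.9-era version strings swapped for "1.0.0" throughout.)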

*sigh*

Is it going to be easier to just copy the external/ source code into my own
project? Because I will... especially if creating uber jars takes this long
every... single... time...
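
(Concretely, and this is a plan rather than something I've tried: copy
external/twitter/src/main/scala/org/apache/spark/streaming/twitter/ out of
the Spark source tree into my own src/main/scala/, and keep the
twitter4j-stream dependency in simple.sbt, since as far as I can tell that
code only depends on spark-streaming and twitter4j.)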



On Thu, Jun 5, 2014 at 8:52 AM, Jeremy Lee <unorthodox.engine...@gmail.com>
wrote:

> Thanks Patrick!
>
> Uberjars. Cool. I'd actually heard of them. And thanks for the link to the
> example! I shall work through that today.
>
> I'm still learning sbt and its many options... the last new framework I
> learned was node.js, and I think I've been rather spoiled by "npm".
>
> At least it's not Maven. Please, oh please don't make me learn Maven too.
> (The only people who seem to like it have Software Stockholm Syndrome: "I
> know Maven kidnapped me and beat me up, but if you spend long enough with
> it, you eventually start to sympathize and see its point of view.")
>
>
> On Thu, Jun 5, 2014 at 3:39 AM, Patrick Wendell <pwend...@gmail.com>
> wrote:
>
>> Hey Jeremy,
>>
>> The issue is that you are using one of the external libraries and
>> these aren't actually packaged with Spark on the cluster, so you need
>> to create an uber jar that includes them.
>>
>> You can look at the example here (I recently did this for a Kafka
>> project and the idea is the same):
>>
>> https://github.com/pwendell/kafka-spark-example
>>
>> You'll want to make an uber jar that includes these packages (run sbt
>> assembly) and then submit that jar to spark-submit. Also, I'd try
>> running it locally first (if you aren't already) just to make the
>> debugging simpler.
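>>
>> For example, something like this (the assembly jar name will depend on
>> your project settings):
>>
>> sbt assembly
>> bin/spark-submit --master local[2] --class "SimpleApp" \
>>   target/scala-2.10/simple-project-assembly-1.0.jar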
>>
>> - Patrick
>>
>>
>> On Wed, Jun 4, 2014 at 6:16 AM, Sean Owen <so...@cloudera.com> wrote:
>> > Ah sorry, this may be the thing I learned for the day. The issue is
>> > that classes from that particular artifact are missing though. Worth
>> > interrogating the resulting .jar file with "jar tf" to see if it made
>> > it in?
>> >
>> > On Wed, Jun 4, 2014 at 2:12 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>> >> @Sean, the %% syntax in SBT should automatically add the Scala major
>> >> version qualifier (_2.10, _2.11 etc) for you, so that does appear to be
>> >> correct syntax for the build.
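>> >>
>> >> (e.g. "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0" resolves
>> >> to the spark-streaming-twitter_2.10 artifact when scalaVersion is 2.10.x)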
>> >>
>> >> I seemed to run into this issue with some missing Jackson deps, and
>> >> solved it by including the jar explicitly on the driver class path:
>> >>
>> >> bin/spark-submit --driver-class-path SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar --class "SimpleApp" SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>> >>
>> >> Seems redundant to me, since I thought the JAR passed as an argument
>> >> is copied to the driver and made available. But this solved it for me,
>> >> so perhaps give it a try?
>> >>
>> >>
>> >>
>> >> On Wed, Jun 4, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>>
>> >>> Those aren't the names of the artifacts:
>> >>>
>> >>>
>> >>> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22
>> >>>
>> >>> The name is "spark-streaming-twitter_2.10"
>> >>>
>> >>> On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee
>> >>> <unorthodox.engine...@gmail.com> wrote:
>> >>> > Man, this has been hard going. Six days, and I finally got a
>> >>> > "Hello World" app working that I wrote myself.
>> >>> >
>> >>> > Now I'm trying to make a minimal streaming app based on the twitter
>> >>> > examples (running standalone right now while learning), and when
>> >>> > running it like this:
>> >>> >
>> >>> > bin/spark-submit --class "SimpleApp" SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>> >>> >
>> >>> > I'm getting this error:
>> >>> >
>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>> >>> > org/apache/spark/streaming/twitter/TwitterUtils$
>> >>> >
>> >>> > Which I'm guessing is because I haven't put in a dependency on
>> >>> > "external/twitter" in the .sbt, but _how_? I can't find any docs on
>> >>> > it. Here's my build file so far:
>> >>> >
>> >>> > simple.sbt
>> >>> > ------------------------------------------
>> >>> > name := "Simple Project"
>> >>> >
>> >>> > version := "1.0"
>> >>> >
>> >>> > scalaVersion := "2.10.4"
>> >>> >
>> >>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
>> >>> >
>> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" %
>> "1.0.0"
>> >>> >
>> >>> > libraryDependencies += "org.apache.spark" %%
>> "spark-streaming-twitter" %
>> >>> > "1.0.0"
>> >>> >
>> >>> > libraryDependencies += "org.twitter4j" % "twitter4j-stream" %
>> "3.0.3"
>> >>> >
>> >>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/";
>> >>> > ------------------------------------------
>> >>> >
>> >>> > I've tried a few obvious things like adding:
>> >>> >
>> >>> > libraryDependencies += "org.apache.spark" %% "spark-external" %
>> "1.0.0"
>> >>> >
>> >>> > libraryDependencies += "org.apache.spark" %%
>> "spark-external-twitter" %
>> >>> > "1.0.0"
>> >>> >
>> >>> > because, well, that would match the naming scheme implied so far,
>> but it
>> >>> > errors.
>> >>> >
>> >>> >
>> >>> > Also, I just realized I don't completely understand if:
>> >>> > (a) the "spark-submit" command _sends_ the .jar to all the workers,
>> or
>> >>> > (b) the "spark-submit" commands sends a _job_ to the workers, which
>> are
>> >>> > supposed to already have the jar file installed (or in hdfs), or
>> >>> > (c) the Context is supposed to list the jars to be distributed. (is
>> that
>> >>> > deprecated?)
>> >>> >
>> >>> > One part of the documentation says:
>> >>> >
>> >>> >  "Once you have an assembled jar you can call the bin/spark-submit
>> >>> > script as
>> >>> > shown here while passing your jar."
>> >>> >
>> >>> > but another says:
>> >>> >
>> >>> > "application-jar: Path to a bundled jar including your application
>> and
>> >>> > all
>> >>> > dependencies. The URL must be globally visible inside of your
>> cluster,
>> >>> > for
>> >>> > instance, an hdfs:// path or a file:// path that is present on all
>> >>> > nodes."
>> >>> >
>> >>> > I suppose both could be correct if you take a certain point of view.
>> >>> >
>> >>> > --
>> >>> > Jeremy Lee  BCompSci(Hons)
>> >>> >   The Unorthodox Engineers
>> >>
>> >>
>>
>
>
>
> --
> Jeremy Lee  BCompSci(Hons)
>   The Unorthodox Engineers
>



-- 
Jeremy Lee  BCompSci(Hons)
  The Unorthodox Engineers
