Oh. Yes of course. *facepalm*

I'm sure I typed "assembly" at first, but at some point my fingers decided to
grammar-check me. Stupid fingers. I wonder what "sbt assemble" does? (apart
from erroring out) It certainly takes a while to do it.

Thanks for the Maven offer, but I'm not scheduled to learn that until after
Scala, Spark Streaming, GraphX, MLlib, HDFS, sbt, Python, and YARN. I'll
probably need to know it for YARN, but I'm really hoping to put it off until
then. (Fortunately I already knew about Linux, AWS, Eclipse, Git, Java,
distributed programming, and SSH keyfiles, or I would have been in real
trouble.)

Ha! OK, that worked for the Kafka project... it fails on the other old 0.9
Twitter project, but who cares... now for mine...

HAHA! YES!! Oh thank you! I have the equivalent of "hello world" that uses
one external library! Now the compiler and I can have a _proper_
conversation.
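
For the archives, this is roughly the recipe that got me there (the paths
and the final jar name are from my setup, so treat it as a sketch and check
target/scala-2.10/ for the actual file):

project/plugin.sbt:
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")

then:
    sbt/sbt assembly
    bin/spark-submit --class "SimpleApp" --master local[2] \
        target/scala-2.10/simple-project-assembly-1.0.jar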

Hopefully you won't be hearing from me for a while.



On Thu, Jun 5, 2014 at 3:06 PM, Nick Pentreath <nick.pentre...@gmail.com>
wrote:

> The "magic incantation" is "sbt assembly" (not "assemble").
>
> Actually I find Maven with its assembly plugin to be very easy (mvn
> package). I can send a pom.xml for a skeleton project if you need one.
>
>
> On Thu, Jun 5, 2014 at 6:59 AM, Jeremy Lee
> <unorthodox.engine...@gmail.com> wrote:
>
>> Hmm.. That's not working so well for me. First, I needed to add a
>> "project/plugin.sbt" file with the contents:
>>
>> addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")
>>
>> Before 'sbt/sbt assemble' worked at all. And I'm not sure about that
>> version number, but "0.9.1" isn't working much better, and "0.11.4" is
>> the latest one recommended by the sbt project site. Where did you get
>> your version from?
>>
>> Second, even when I do get it to build a .jar, spark-submit is still
>> telling me the external.twitter library is missing.
>>
>> I tried using your GitHub project as-is, but it also complained about the
>> missing plugin... I'm trying it with various versions now to see if I can
>> get that working, even though I don't know anything about Kafka. Hmm, and
>> no. Here's what I get:
>>
>>  [info] Set current project to Simple Project (in build
>> file:/home/ubuntu/spark-1.0.0/SparkKafka/)
>> [error] Not a valid command: assemble
>> [error] Not a valid project ID: assemble
>> [error] Expected ':' (if selecting a configuration)
>> [error] Not a valid key: assemble (similar: assembly, assemblyJarName,
>> assemblyDirectory)
>> [error] assemble
>> [error]
>>
>> I also found this project which seemed to be exactly what I was after:
>>  https://github.com/prabeesh/SparkTwitterAnalysis
>>
>> ...but it was for Spark 0.9, and though I updated all the version
>> references to "1.0.0", that one doesn't work either. I can't even get it to
>> build.
>>
>> *sigh*
>>
>> Is it going to be easier to just copy the external/ source code into my
>> own project? Because I will... especially if creating "Uberjars" takes this
>> long every... single... time...
>>
>>
>>
>> On Thu, Jun 5, 2014 at 8:52 AM, Jeremy Lee
>> <unorthodox.engine...@gmail.com> wrote:
>>
>>> Thanks Patrick!
>>>
>>> Uberjars. Cool. I'd actually heard of them. And thanks for the link to
>>> the example! I shall work through that today.
>>>
>>> I'm still learning sbt and its many options... the last new framework I
>>> learned was node.js, and I think I've been rather spoiled by "npm".
>>>
>>> At least it's not Maven. Please, oh please don't make me learn Maven
>>> too. (The only people who seem to like it have Software Stockholm
>>> Syndrome: "I know Maven kidnapped me and beat me up, but if you spend
>>> long enough with it, you eventually start to sympathize and see its
>>> point of view".)
>>>
>>>
>>> On Thu, Jun 5, 2014 at 3:39 AM, Patrick Wendell <pwend...@gmail.com>
>>> wrote:
>>>
>>>> Hey Jeremy,
>>>>
>>>> The issue is that you are using one of the external libraries and
>>>> these aren't actually packaged with Spark on the cluster, so you need
>>>> to create an uber jar that includes them.
>>>>
>>>> You can look at the example here (I recently did this for a kafka
>>>> project and the idea is the same):
>>>>
>>>> https://github.com/pwendell/kafka-spark-example
>>>>
>>>> You'll want to make an uber jar that includes these packages (run sbt
>>>> assembly) and then submit that jar to spark-submit. Also, I'd try
>>>> running it locally first (if you aren't already) just to make the
>>>> debugging simpler.
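>>>>
>>>> Something like this, for instance (untested, and the assembly jar name
>>>> depends on your project settings, so check your target/ directory):
>>>>
>>>>     sbt assembly
>>>>     bin/spark-submit --master local[2] --class "SimpleApp" \
>>>>       target/scala-2.10/simple-project-assembly-1.0.jar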
>>>>
>>>> - Patrick
>>>>
>>>>
>>>> On Wed, Jun 4, 2014 at 6:16 AM, Sean Owen <so...@cloudera.com> wrote:
>>>> > Ah sorry, this may be the thing I learned for the day. The issue is
>>>> > that classes from that particular artifact are missing though. Worth
>>>> > interrogating the resulting .jar file with "jar tf" to see if it made
>>>> > it in?
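>>>> >
>>>> > e.g. something like (jar path guessed from your earlier command):
>>>> >
>>>> >   jar tf SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar | grep -i twitter
>>>> >
>>>> > If no org/apache/spark/streaming/twitter classes show up in the
>>>> > listing, the dependency never made it into the jar.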
>>>> >
>>>> > On Wed, Jun 4, 2014 at 2:12 PM, Nick Pentreath
>>>> > <nick.pentre...@gmail.com> wrote:
>>>> >> @Sean, the %% syntax in SBT should automatically add the Scala major
>>>> >> version qualifier (_2.10, _2.11 etc) for you, so that does appear to
>>>> >> be correct syntax for the build.
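>>>> >>
>>>> >> In other words, with scalaVersion set to 2.10.x,
>>>> >>
>>>> >>   "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"
>>>> >>
>>>> >> resolves to the same artifact as
>>>> >>
>>>> >>   "org.apache.spark" % "spark-streaming-twitter_2.10" % "1.0.0"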
>>>> >>
>>>> >> I seemed to run into this issue with some missing Jackson deps, and
>>>> >> solved it by including the jar explicitly on the driver class path:
>>>> >>
>>>> >> bin/spark-submit --driver-class-path
>>>> >> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar --class
>>>> >> "SimpleApp" SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>>> >>
>>>> >> Seems redundant to me, since I thought the JAR passed as the argument
>>>> >> is copied to the driver and made available. But this solved it for me,
>>>> >> so perhaps give it a try?
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Wed, Jun 4, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:
>>>> >>>
>>>> >>> Those aren't the names of the artifacts:
>>>> >>>
>>>> >>> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22
>>>> >>>
>>>> >>> The name is "spark-streaming-twitter_2.10"
>>>> >>>
>>>> >>> On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee
>>>> >>> <unorthodox.engine...@gmail.com> wrote:
>>>> >>> > Man, this has been hard going. Six days, and I finally got a
>>>> >>> > "Hello World" app working that I wrote myself.
>>>> >>> >
>>>> >>> > Now I'm trying to make a minimal streaming app based on the
>>>> >>> > twitter examples (running standalone right now while learning),
>>>> >>> > and when running it like this:
>>>> >>> >
>>>> >>> > bin/spark-submit --class "SimpleApp"
>>>> >>> > SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>>> >>> >
>>>> >>> > I'm getting this error:
>>>> >>> >
>>>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>>>> >>> > org/apache/spark/streaming/twitter/TwitterUtils$
>>>> >>> >
>>>> >>> > Which I'm guessing is because I haven't put in a dependency on
>>>> >>> > "external/twitter" in the .sbt, but _how_? I can't find any docs
>>>> >>> > on it. Here's my build file so far:
>>>> >>> >
>>>> >>> > simple.sbt
>>>> >>> > ------------------------------------------
>>>> >>> > name := "Simple Project"
>>>> >>> >
>>>> >>> > version := "1.0"
>>>> >>> >
>>>> >>> > scalaVersion := "2.10.4"
>>>> >>> >
>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
>>>> >>> >
>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"
>>>> >>> >
>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"
>>>> >>> >
>>>> >>> > libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
>>>> >>> >
>>>> >>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>>>> >>> > ------------------------------------------
>>>> >>> >
>>>> >>> > I've tried a few obvious things like adding:
>>>> >>> >
>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-external" % "1.0.0"
>>>> >>> >
>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-external-twitter" % "1.0.0"
>>>> >>> >
>>>> >>> > because, well, that would match the naming scheme implied so far,
>>>> >>> > but it errors.
>>>> >>> >
>>>> >>> >
>>>> >>> > Also, I just realized I don't completely understand if:
>>>> >>> > (a) the "spark-submit" command _sends_ the .jar to all the
>>>> >>> > workers, or
>>>> >>> > (b) the "spark-submit" command sends a _job_ to the workers, which
>>>> >>> > are supposed to already have the jar file installed (or in HDFS), or
>>>> >>> > (c) the Context is supposed to list the jars to be distributed.
>>>> >>> > (Is that deprecated?)
>>>> >>> >
>>>> >>> > One part of the documentation says:
>>>> >>> >
>>>> >>> > "Once you have an assembled jar you can call the bin/spark-submit
>>>> >>> > script as shown here while passing your jar."
>>>> >>> >
>>>> >>> > but another says:
>>>> >>> >
>>>> >>> > "application-jar: Path to a bundled jar including your application
>>>> >>> > and all dependencies. The URL must be globally visible inside of
>>>> >>> > your cluster, for instance, an hdfs:// path or a file:// path that
>>>> >>> > is present on all nodes."
>>>> >>> >
>>>> >>> > I suppose both could be correct if you take a certain point of view.
>>>> >>> >
>>>> >>> > --
>>>> >>> > Jeremy Lee  BCompSci(Hons)
>>>> >>> >   The Unorthodox Engineers
>>>> >>
>>>> >>
>>>>
>>>
>>>
>>>
>>> --
>>> Jeremy Lee  BCompSci(Hons)
>>>   The Unorthodox Engineers
>>>
>>
>>
>>
>> --
>> Jeremy Lee  BCompSci(Hons)
>>   The Unorthodox Engineers
>>
>
>


-- 
Jeremy Lee  BCompSci(Hons)
  The Unorthodox Engineers
