Hi,

 

I noticed that it is hard to find a thorough introduction to debugging
Spark 1.1 apps in IntelliJ with mvn/sbt, and the setup is not
straightforward for beginners. So I spent several days figuring it out. I
hope this will be helpful for beginners like me, and that more experienced
users can help me improve it. (A version with figures can be found at:
http://kylinx.com/spark/Debug-Spark-in-IntelliJ.htm)

 

(1) Install the Scala plugin

 

(2) Download and unzip the spark-1.1.0 source, then open it in IntelliJ:

a) mvn: File -> Open. 

    Select the Spark source folder (e.g., /root/spark-1.1.0). The first import
may take a long time, because many dependencies have to be downloaded and compiled.

b) sbt: File -> Import Project. 

    Select "Import project from external model", choose "SBT", and click
Next. Enter the Spark source path (e.g., /root/spark-1.1.0) in the "SBT
project" field, and check "Use auto-import".
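For reference, the download-and-unzip part of step (2) can also be scripted. This is a minimal sketch, assuming the spark-1.1.0 source tarball is still served from the Apache archive URL used below and that it unpacks into a spark-1.1.0 folder; the helper names `run` and `fetch_spark_source` are hypothetical. With DRY_RUN=1 it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper for step (2): download and unpack the Spark 1.1.0 source.
# Assumption: the source tarball is still served from the Apache archive URL below.
SPARK_VERSION=1.1.0
SPARK_TGZ_URL="https://archive.apache.org/dist/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION.tgz"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: parent directory (e.g. /root). The tarball is expected (assumption) to
# unpack into $1/spark-1.1.0, the folder that is then opened in IntelliJ.
fetch_spark_source() {
  if [ -z "${1:-}" ]; then
    echo "usage: fetch_spark_source DEST_DIR" >&2
    return 2
  fi
  run mkdir -p "$1" || return 1
  run wget -O "$1/spark-$SPARK_VERSION.tgz" "$SPARK_TGZ_URL" || return 1
  run tar -xzf "$1/spark-$SPARK_VERSION.tgz" -C "$1"
}
```

For example, `fetch_spark_source /root` should leave the source tree in /root/spark-1.1.0, matching the paths used in the rest of this guide.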

 

(3) First, compile Spark and run the Spark examples from the console to make
sure everything works (use the command for your build tool):

# mvn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package

# ./sbt/sbt assembly -Phadoop-2.2 -Dhadoop.version=2.2.0
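Step (3) can be wrapped in a small script that builds with either tool and then runs one bundled example as a smoke test. This is a minimal sketch: the build commands are the two shown above, the helper names `run` and `build_and_check` are hypothetical, and it assumes the `bin/run-example` launcher shipped with Spark 1.1, which accepts a short example class name such as `SparkPi`. With DRY_RUN=1 it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper for step (3): build Spark 1.1.0 with mvn or sbt, then run
# one bundled example as a smoke test. Run it from the Spark source root.
HADOOP_ARGS="-Phadoop-2.2 -Dhadoop.version=2.2.0"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: build tool, "mvn" or "sbt".
# $HADOOP_ARGS is left unquoted on purpose so it splits into two arguments.
build_and_check() {
  case "${1:-}" in
    mvn) run mvn $HADOOP_ARGS -DskipTests clean package || return 1 ;;
    sbt) run ./sbt/sbt assembly $HADOOP_ARGS || return 1 ;;
    *) echo "usage: build_and_check mvn|sbt" >&2; return 2 ;;
  esac
  # Assumption: bin/run-example expands SparkPi to org.apache.spark.examples.SparkPi.
  run ./bin/run-example SparkPi 10
}
```

For example, run `build_and_check mvn` from /root/spark-1.1.0. If the build is fine, SparkPi should finish by printing a line that starts with "Pi is roughly".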

 

(4) Add the compiled Spark assembly library (spark-assembly-1.1.0-hadoop2.2.0)
to "Libraries" (File -> Project Structure... -> Libraries -> green +). Then
choose the modules that use it (right-click the library and click "Add to
Modules"). It seems that only spark-examples needs it.
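To find the jar to add in step (4), the sketch below searches the build output. It assumes the assembly is written to assembly/target/scala-2.10/ under the Spark source root with a name of the form spark-assembly-<version>-hadoop<version>.jar; the helper name `find_spark_assembly` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for step (4): print the path of the Spark assembly jar.
# Assumption: the build writes it to assembly/target/scala-2.10/ under the
# Spark source root, named spark-assembly-<version>-hadoop<version>.jar.
find_spark_assembly() {
  spark_src="${1:-.}"
  found=""
  for jar in "$spark_src"/assembly/target/scala-2.10/spark-assembly-*-hadoop*.jar; do
    # If nothing matches, the shell leaves the pattern unexpanded; skip it.
    [ -f "$jar" ] || continue
    if [ -n "$found" ]; then
      echo "more than one assembly jar found; try a clean build" >&2
      return 1
    fi
    found="$jar"
  done
  if [ -z "$found" ]; then
    echo "no assembly jar found; build Spark first (step 3)" >&2
    return 1
  fi
  echo "$found"
}
```

For example, `find_spark_assembly /root/spark-1.1.0` prints the jar path, which can then be selected in the Libraries dialog.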

 

(5) On the "Dependencies" tab of each module that uses this library (File ->
Project Structure... -> Modules), make sure the library's "Scope" is
"Compile".

(6) For sbt, it seems that the scope of all the other Hadoop dependencies
(SBT: org.apache.hadoop.hadoop-*) has to be set to "Test" (perhaps because
of a poor Internet connection?), and this has to be redone every time
IntelliJ is opened (perhaps due to a bug?).

 

(7) Configure the debug environment, using LogQuery as an example: open
Run -> Edit Configurations... and add an "Application" configuration with:

Main class: org.apache.spark.examples.LogQuery

VM options: -Dspark.master=local

Working directory: /root/spark-1.1.0

Use classpath of module: spark-examples_2.10

Before launch (if using mvn): External tool: mvn

    Program: /root/Programs/apache-maven-3.2.1/bin/mvn

    Parameters: -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests package

    Working directory: /root/spark-1.1.0

Before launch (if using sbt): External tool: sbt

    Program: /root/spark-1.1.0/sbt/sbt

    Parameters: -Phadoop-2.2 -Dhadoop.version=2.2.0 assembly 

    Working directory: /root/spark-1.1.0
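Before debugging, it can help to check that the same example runs from the console with a local master. This is a minimal sketch, assuming that Spark 1.1's `bin/run-example` passes the MASTER environment variable on to spark-submit as `--master`, which has the same effect as the `-Dspark.master=local` VM option above. The helper name `run_logquery_locally` and the SPARK_SRC variable are hypothetical; SPARK_SRC defaults to the working directory used above.

```shell
#!/bin/sh
# Hypothetical helper: run LogQuery from the console with the same master as the
# IntelliJ configuration above (VM option -Dspark.master=local).
# Assumption: bin/run-example passes $MASTER on to spark-submit as --master.
SPARK_SRC="${SPARK_SRC:-/root/spark-1.1.0}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_logquery_locally() {
  # Use the Spark source root as the working directory, as in the configuration;
  # the subshell keeps the cd from affecting the caller's shell.
  (
    cd "$SPARK_SRC" || exit 1
    run env MASTER=local ./bin/run-example LogQuery
  )
}
```

If this runs but debugging in IntelliJ does not, the problem is more likely in the IDE configuration than in the build.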

 

(8) Set breakpoints where needed, then click Run -> Debug 'LogQuery' to start debugging.

 

 

Cheers,

Yiming
