Hi,
I noticed it is hard to find a thorough introduction to debugging Spark 1.1 applications in IntelliJ with mvn/sbt, and it is not straightforward for beginners. I spent several days figuring it out, and I hope it will be helpful for beginners like me and that more experienced users can help me improve it. (The version with figures can be found at: http://kylinx.com/spark/Debug-Spark-in-IntelliJ.htm)

(1) Install the Scala plugin.

(2) Download, unzip and open spark-1.1.0 in IntelliJ.
a) mvn: File -> Open. Select the Spark source folder (e.g., /root/spark-1.1.0). It may take a long time to download and compile everything.
b) sbt: File -> Import Project. Select "Import project from external model", choose SBT project, and click Next. Enter the Spark source path (e.g., /root/spark-1.1.0) as the "SBT project", and select "Use auto-import".

(3) First compile and run the Spark examples from the console to make sure everything works:
# mvn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
# ./sbt/sbt assembly -Phadoop-2.2 -Dhadoop.version=2.2.0

(4) Add the compiled spark-hadoop assembly (spark-assembly-1.1.0-hadoop2.2.0) to "Libraries" (File -> Project Structure -> Libraries -> green +), and choose the modules that use it (right-click the library and click "Add to Modules"). It seems only spark-examples needs it.

(5) In the "Dependencies" page of the modules using this library, make sure the "Scope" of the library is "Compile" (File -> Project Structure -> Modules).

(6) For sbt, it seems we have to set the scope of all the other Hadoop dependencies (SBT: org.apache.hadoop.hadoop-*) to "Test" (perhaps due to a poor Internet connection?), and this has to be redone every time IntelliJ is opened (possibly a bug?).

(7) Configure the debug environment (using LogQuery as an example): Run -> Edit Configurations.
Main class: org.apache.spark.examples.LogQuery
VM options: -Dspark.master=local (see the sketch in the P.S. below for why this is enough)
Working directory: /root/spark-1.1.0
Use classpath of module: spark-examples_2.10
Before launch, for mvn builds, add an External tool:
  Program: /root/Programs/apache-maven-3.2.1/bin/mvn
  Parameters: -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests package
  Working directory: /root/spark-1.1.0
Before launch, for sbt builds, add an External tool:
  Program: /root/spark-1.1.0/sbt/sbt
  Parameters: -Phadoop-2.2 -Dhadoop.version=2.2.0 assembly
  Working directory: /root/spark-1.1.0

(8) Click Run -> Debug 'LogQuery' to start debugging.

Cheers,
Yiming
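
P.S. If you want to debug your own code rather than the bundled examples, the same kind of run configuration works for any main class on a module's classpath. Below is a minimal sketch of such an app (the object name DebugDemo and the little job it runs are just placeholders I made up, not part of Spark). The reason the VM option works is that SparkConf loads JVM system properties that start with "spark.", so -Dspark.master=local sets the master without touching the code.

  import org.apache.spark.{SparkConf, SparkContext}

  // Hypothetical minimal app for stepping through in the debugger.
  // SparkConf picks up JVM system properties prefixed with "spark.",
  // so -Dspark.master=local in the run configuration sets the master.
  object DebugDemo {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("Debug Demo")
      val sc = new SparkContext(conf)

      // A tiny job to set breakpoints on: count values modulo 10.
      val counts = sc.parallelize(1 to 100).map(_ % 10).countByValue()
      counts.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k -> $v") }

      sc.stop()
    }
  }

Set a breakpoint inside main (or inside the example's own code) and start it with Run -> Debug, just as in step (8).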