Apache Spark with Maven template

Here is a template for starting a Spark project with Maven. The code examples are in Java and Scala and cover Spark Core, Spark SQL and Spark Streaming.

It is a good starting point for any new Spark project.

The project template can be found in this GitHub repository.

Project structure

The project consists of a parent pom and a child module.

See the picture below:

[Screenshot: project structure]

pom.xml explained

All dependencies except the Spark libraries are defined in the <dependencyManagement> section of the parent pom.

The Spark libraries are defined in two profiles: spark-prod and spark-dev.

‘spark-prod’ declares all Spark libraries with scope provided. The fat jar built with this profile therefore does not bundle the Spark libraries, so it can be deployed to a cluster with spark-submit, which supplies Spark at runtime.

‘spark-dev’ declares all Spark libraries with scope compile, so they are on the classpath when you run or debug the application in local mode from IntelliJ or Eclipse.
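To confirm that a jar built with the ‘spark-prod’ profile really leaves Spark out, you can look for Spark classes among its entries. The following is a minimal sketch and is not part of the template. The class name SparkJarCheck is hypothetical, and the check simply looks for any entry under org/apache/spark/. Pass the path of the jar your build produced as the only argument.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/**
 * Hypothetical helper (not part of the template): reports whether a jar
 * bundles any classes from the org.apache.spark package.
 */
public class SparkJarCheck {

    /** Returns true if any entry in the jar lives under org/apache/spark/. */
    public static boolean bundlesSpark(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            // JarFile.stream() lists every entry (classes, resources, directories)
            return jar.stream()
                      .anyMatch(entry -> entry.getName().startsWith("org/apache/spark/"));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SparkJarCheck <path-to-jar>");
            System.exit(2);
        }
        if (bundlesSpark(args[0])) {
            System.out.println("Spark classes are bundled in " + args[0]);
        } else {
            System.out.println("No Spark classes in " + args[0] + " (expected with spark-prod)");
        }
    }
}
```

A jar built with ‘spark-dev’ may report Spark classes, depending on how the build packages compile-scope dependencies. A jar built with ‘spark-prod’ should not, which is what lets spark-submit provide the cluster's own Spark version.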

To debug the project, activate the ‘spark-dev’ profile, either with -Pspark-dev on the Maven command line or in the IDE's Maven profile settings. For example, in IntelliJ you set it like this:

[Screenshot: selecting the spark-dev profile in IntelliJ]

Source templates

There is one Java source template, for Spark Core: HelloWorldJava.java.

There are three Scala source templates, for Spark Core, Spark SQL and Spark Streaming respectively:

  • HelloWorldScala.scala
  • HelloWorldSqlScala.scala
  • HelloWorldStreaming.scala

There is one test class, Hello_Test.scala, which uses the ScalaTest testing framework.

Troubleshooting

If you get the following error while trying to run any of the classes:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
 at com.tikalk.HelloWorldScala$.main(HelloWorldScala.scala:13)
 at com.tikalk.HelloWorldScala.main(HelloWorldScala.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

It means the ‘spark-dev’ profile is not set, so the Spark libraries are not on the runtime classpath.

Activate the ‘spark-dev’ profile as explained in the section “pom.xml explained”.
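If you are not sure which profile your run configuration is using, a small diagnostic can try to load org.apache.spark.SparkConf by name. This is a hedged sketch and is not part of the template; SparkClasspathCheck is a hypothetical name. Run it the same way you run HelloWorldScala. If the class cannot be found, the Spark libraries are missing from the runtime classpath, which is the same condition that produces the NoClassDefFoundError above.

```java
/**
 * Hypothetical diagnostic (not part of the template): checks whether the
 * Spark classes are visible on the current runtime classpath.
 */
public class SparkClasspathCheck {

    /** Returns true if the named class can be loaded by this class's loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate and load the class, do not run static initializers
            Class.forName(className, false, SparkClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath("org.apache.spark.SparkConf")) {
            System.out.println("Spark is on the classpath.");
        } else {
            System.out.println("org.apache.spark.SparkConf not found: "
                + "activate the spark-dev profile when running from the IDE, "
                + "or launch the jar with spark-submit.");
        }
    }
}
```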

Author: Ran Silberman

I am a tour guide in Israel with a passion for the Bible. For many years I have worked in the software industry as a software consultant. I blog at http://ransilberman.blog
