This guide assumes you have a working installation of Spark 2.x available and that you have access to the system where it is installed.

Installation Summary

  1. Grab the latest release from here: https://github.com/SparkTC/spark-bench/releases/latest.
  2. Unpack the tarball using tar -xvzf.
  3. cd into the newly created folder.
  4. Set your environment variables
    • Option 1: modify SPARK_HOME and SPARK_MASTER_HOST in bin/spark-bench-env.sh to reflect your environment.
    • Option 2: Recommended! Modify the config files in the examples and set spark-home and spark-args = { master } to reflect your environment. (See below for more info!)
  5. Start using spark-bench!
    ./bin/spark-bench.sh /path/to/your/config/file.conf
    

Setting Environment Variables

There are two ways to set the Spark home and master variables necessary to run the examples.

Option 1: Setting Bash Environment Variables

Inside the bin folder is a file called spark-bench-env.sh. In this file are two environment variables that you are required to set. The first is SPARK_HOME, which is simply the full path to the top level of your Spark installation on your laptop or cluster. The second is SPARK_MASTER_HOST, which is the same value you would enter as --master in a spark-submit script for this environment. This might be local[2] on your laptop, yarn on a YARN cluster, or an IP address and port if you're running in standalone mode. You get the idea!

You can set those environment variables in your bash profile or by uncommenting the lines in spark-bench-env.sh and filling them out in place.
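For instance, the bash-profile route might look like the following. The paths and master URL here are placeholder values, not defaults; substitute whatever is correct for your environment.

```shell
# Example values only -- replace with the correct values for your environment.

# Full path to the top level of your Spark installation:
export SPARK_HOME="/usr/local/spark"

# Same value you would pass as --master to spark-submit,
# e.g. local[2] on a laptop, yarn on a YARN cluster,
# or spark://<host>:7077 in standalone mode:
export SPARK_MASTER_HOST="local[2]"
```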

Option 2: Setting spark-home and master in the Config File

Alternatively, and this is the recommended approach, you can set the Spark home and master values directly in your config file. For example, minimal-example.conf looks like this:

spark-bench = {
  spark-submit-config = [{
    workload-suites = [
      {
        descr = "One run of SparkPi and that's it!"
        benchmark-output = "console"
        workloads = [
          {
            name = "sparkpi"
            slices = 10
          }
        ]
      }
    ]
  }]
}

Add the spark-home key and a master key inside spark-args:

spark-bench = {
  spark-home = "/path/to/your/spark/install/" 
  spark-submit-config = [{
    spark-args = {
      master = "local[*]" // or whatever the correct master is for your environment
    }
    workload-suites = [
      {
        descr = "One run of SparkPi and that's it!"
        benchmark-output = "console"
        workloads = [
          {
            name = "sparkpi"
            slices = 10
          }
        ]
      }
    ]
  }]
}
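Other keys in spark-args work the same way: they are passed through to spark-submit as command-line flags, just as master becomes --master. The fragment below is an illustrative sketch, not a required configuration; the executor-memory and num-executors values are assumptions chosen only for the example.

```
spark-args = {
  master = "yarn"          // becomes --master yarn
  executor-memory = "4G"   // becomes --executor-memory 4G (illustrative value)
  num-executors = 4        // becomes --num-executors 4 (illustrative value)
}
```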

Running the Examples

From the root of the spark-bench distribution folder, simply run:

./bin/spark-bench.sh ./examples/minimal-example.conf

The example scripts and associated configuration files are a great starting point for learning spark-bench by example. You can also read more about spark-bench at our documentation site.