Contributing to Spark-Bench
Thank you for your interest in contributing to Spark-Bench!
Fork The Project
To contribute code to Spark-Bench, you will first need to fork the project. GitHub provides excellent documentation on forking: https://help.github.com/articles/fork-a-repo/
Getting Your Dev Environment Set Up
Follow the instructions for cloning and compiling Spark-Bench in the compilation guide.
Finding a Starter Issue
Many issues in GitHub are marked help wanted. These are usually of easy to medium difficulty.
If any interest you, feel free to pick one up directly or get in touch with the main Spark-Bench devs to ask questions about it :)
Set Up Your Feature Branch
From the master branch in your fork, create a new branch with a short but descriptive name for what your changes will be. For example, if you’re updating the documentation site, you may want to name your branch doc-updates. How you structure branches in your fork is totally up to you!
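As a minimal sketch of that step, the commands below create and switch to a branch named doc-updates. The temporary directory and the empty initial commit are stand-ins made up for illustration so the example can run anywhere; in your own clone, only the git checkout -b line is needed.

```shell
# Hypothetical scratch repository standing in for a clone of your fork.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example Contributor"
git config user.email "contributor@example.com"
git commit -q --allow-empty -m "Initial commit"

# Create the feature branch from master and switch to it in one step.
git checkout -q -b doc-updates

# Confirm which branch is now checked out.
current_branch=$(git rev-parse --abbrev-ref HEAD)
echo "$current_branch"
```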
Making Your Pull Request
You’ve made your code changes, you’ve tested them locally, you’re almost ready to contribute your changes back! Woo!!
First, you’ll need to touch up your branch and commit messages using git.
A good pull request has just 1 commit with a well-formatted commit message. It is up to date with the current master, and can be easily rebased onto the main master branch with no merge commits.
There are a variety of ways to squash all the commits in your branch down to one. We’ll describe just one way here, but there are many others.
Syncing Master
Let’s assume that you have a fork under your own username on GitHub, and that you’ve set up the main SparkTC/spark-bench repo as a remote named upstream, like so:
$ git remote -v
origin git@github.com:ecurtin/spark-bench.git (fetch)
origin git@github.com:ecurtin/spark-bench.git (push)
upstream git@github.com:SparkTC/spark-bench.git (fetch)
upstream git@github.com:SparkTC/spark-bench.git (push)
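If your clone does not have the upstream remote yet, it can be added with git remote add. The sketch below runs in a scratch repository (an assumption made so the example is self-contained); in practice you would run only the git remote add line inside your existing clone.

```shell
# Hypothetical scratch repository; use your real clone in practice.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Register the main repository under the name "upstream".
# This only records the URL; nothing is fetched until you pull or fetch.
git remote add upstream git@github.com:SparkTC/spark-bench.git

# Print the URL now recorded for upstream.
upstream_url=$(git remote get-url upstream)
echo "$upstream_url"
```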
A best practice is to keep your master branch in your repo free of any changes so that you can easily sync it with the upstream repo, and to do all of your development work in a separate branch.
Let’s say you have two branches: master, which is free of any changes, and doc-updates, which has some new changes to contribute.
First, make sure your local copy of master is up to date.
git checkout master
git pull upstream master
git push origin master
This pulls any new changes from SparkTC/spark-bench into your local repo, then pushes them up to your fork, which in this case is ecurtin/spark-bench.
Rebasing Your Changes On Top Of Master
Now let’s go back to the branch with your changes:
git checkout doc-updates
If you’re inexperienced with squashing and rebasing, it might be good to use a copy of your branch in case something goes wrong.
git checkout -b rebasing-is-fun
We’re going to rebase the changes in this branch on top of the changes in master:
git rebase master
Squashing Your Commits
Now that all your changes are on top of the latest updates from master, squash all the commits you’ve made while working on your feature into just one commit. Again, there are many ways to do this; this is just one.
Let’s say you have three commits that you want to get down to just one.
git rebase -i HEAD~3
This will open up a file in Vim or whatever default editor you have set for git with a list of your commits in chronological order.
Beside each commit will be the word pick. Using your text editor, keep the first commit as pick but change the other two to s or squash.
Save and close this file and git will run the process. Then it will give you a chance to rewrite your commit message.
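One of the other ways to squash, if you would rather avoid the interactive editor, is git reset --soft followed by a fresh commit. This is a different technique from the interactive rebase described above, and it assumes your branch has already been rebased on top of master. The scratch repository, the notes.txt file, and the commit messages below are made up for illustration.

```shell
# Hypothetical scratch repository with a master branch and a feature branch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Contributor"
git config user.email "contributor@example.com"
git commit -q --allow-empty -m "Initial commit"

# Make three work-in-progress commits on the feature branch.
git checkout -q -b doc-updates
for n in 1 2 3; do
  echo "change $n" >> notes.txt
  git add notes.txt
  git commit -q -m "WIP $n"
done

# Move the branch pointer back to master while keeping every change staged,
# then record all of it as a single commit.
git reset --soft master
git commit -q -m "Update contributor docs"

# Count the commits on this branch that are not on master.
commit_count=$(git rev-list --count master..HEAD)
echo "$commit_count"
```

The content of the three original commits is preserved; only the history is collapsed.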
Formatting Your Commit Message
Your commit message should have a subject of 50 characters or fewer, followed by a blank line and a body that is line-wrapped at 72 characters.
For instructions, examples, and more information about formatting a great commit message, see this blog post
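As a rough illustration of that shape, the sketch below writes a commit message with a short subject, a blank line, and a wrapped body, then measures the line lengths. The message text and the scratch repository are invented for the example and are not taken from the Spark-Bench history.

```shell
# Hypothetical scratch repository, used only to hold the example commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Contributor"
git config user.email "contributor@example.com"

# Subject line, blank line, then a body wrapped at 72 characters.
git commit -q --allow-empty -F - <<'EOF'
Add squash walkthrough to contributing guide

Explain how to sync master, rebase a feature branch on top of it,
and squash the branch down to a single commit before opening a
pull request.
EOF

# Measure the subject and the longest body line of the latest commit.
subject_length=$(git log -1 --format=%s | awk '{ print length($0) }')
longest_body_line=$(git log -1 --format=%b |
  awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }')
echo "subject: $subject_length, longest body line: $longest_body_line"
```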
Sync Your New Changes
Now that your changes are squashed on your branch, push the branch up to your fork on GitHub.
git push origin doc-updates
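If you had already pushed the branch to your fork before rebasing or squashing, a plain push will be rejected because the rewritten history is no longer a fast-forward of what the fork has. In that case a force push is needed, and --force-with-lease is a more cautious form of it: it refuses to overwrite the remote branch if someone else has updated it since you last fetched. The sketch below uses a local bare repository as a stand-in for your fork; the paths and commit messages are assumptions for illustration.

```shell
# Hypothetical setup: a local bare repository plays the role of your fork.
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Contributor"
git config user.email "contributor@example.com"
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "Initial commit"

# Push a feature branch with two work-in-progress commits.
git checkout -q -b doc-updates
git commit -q --allow-empty -m "WIP 1"
git commit -q --allow-empty -m "WIP 2"
git push -q origin doc-updates

# Rewrite history locally by squashing the two commits into one.
git reset -q --soft master
git commit -q --allow-empty -m "Update docs"

# Overwrite the branch on the fork, but only if it is still where we last saw it.
git push -q --force-with-lease origin doc-updates

# Compare the local branch tip with the one stored in the stand-in fork.
local_tip=$(git rev-parse doc-updates)
remote_tip=$(git --git-dir="$work/origin.git" rev-parse doc-updates)
echo "$local_tip"
echo "$remote_tip"
```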
Create Your PR on GitHub
GitHub has lots of excellent documentation for this if you’re unfamiliar with the process.
Don’t Be Shy!
If you’re a first-time open source contributor, we are SO glad that you’re here! Please feel free to reach out to the main Spark-Bench devs for questions and for help with getting your contribution into Spark-Bench 🙂