I'm just starting out with machine learning in Spark, and this guide was a great introduction. Really nice walkthrough!
Just wanted to add something I came across: in Spark 2.0, you can create the entry point in a more concise way with a SparkSession:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext  # the underlying SparkContext is still accessible if you need it
SparkSession essentially condenses SparkConf, SparkContext, and SQLContext into one unified entry point.
Thanks again for the article!