In the previous post we saw how to quickly get IPython up and running with PySpark.
Now we will set up Zeppelin, which can run both Spark-Shell (Scala) and PySpark (Python) jobs from its notebooks.
We will build, run and configure Zeppelin to run the same Spark jobs in Scala and Python, using the Zeppelin SQL interpreter and Matplotlib to visualize SparkSQL query results.
To conclude this post, we will compare the speed of Scala and Python, and compare Zeppelin with IPython.