In this part, we will run a simple Word Count application on the cluster using Hadoop and Spark on various platforms and cluster sizes.
We will run and benchmark the same program on 5 datasets of different sizes on :
- A single MinnowBoard MAX, using a multi-threaded simple java application
- A real home computer (my laptop), using the same simple java application
- MapReduce, using a cluster of 2 to 4 slaves
- Spark, using a cluster of 2 to 4 slaves
Using these results we will hopefully be able to answer to the original questions of this section : is a home cluster with such small computers worth it ? How many nodes does it take to be faster than a single node, or faster than a real computer ?