Month: October 2015

In this part, we will run a simple Word Count application using Hadoop and Spark, on various platforms and cluster sizes.

We will run and benchmark the same program on five datasets of different sizes, using:

  • A single MinnowBoard MAX, using a multi-threaded simple java application
  • A real home computer (my laptop), using the same simple java application
  • MapReduce, using a cluster of 2 to 4 slaves
  • Spark, using a cluster of 2 to 4 slaves
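The single-node baseline is only described as a "multi-threaded simple java application", and its code is not shown here. To give a rough idea of what such a program can look like, here is a minimal sketch, not the author's code. It assumes the input is already loaded as a list of lines in memory, and that a word is a case-insensitive run of `\w` characters. The class name `ParallelWordCount` and its methods are hypothetical. Each thread counts its own slice of lines into a private map, then the partial maps are merged. This is the same map-then-reduce shape that MapReduce and Spark spread across the slaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical single-node baseline: a sketch, not the author's benchmark code.
public class ParallelWordCount {

    // Splits the lines into one contiguous slice per thread, counts each slice
    // in parallel, then merges the partial counts into a single map.
    public static Map<String, Long> count(List<String> lines, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (lines.size() + threads - 1) / threads; // ceiling division
            List<Future<Map<String, Long>>> partials = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunk) {
                List<String> slice = lines.subList(start, Math.min(start + chunk, lines.size()));
                partials.add(pool.submit(() -> countSlice(slice))); // "map" step
            }
            Map<String, Long> total = new HashMap<>();
            for (Future<Map<String, Long>> partial : partials) {
                // "reduce" step: add each thread's counts into the total
                partial.get().forEach((word, n) -> total.merge(word, n, Long::sum));
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // Counts words in one slice; a word is assumed to be a run of \w characters, case-insensitive.
    private static Map<String, Long> countSlice(List<String> slice) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : slice) {
            for (String token : line.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList("the quick brown fox", "the lazy dog", "The end");
        Map<String, Long> counts = count(lines, Runtime.getRuntime().availableProcessors());
        System.out.println(counts.get("the")); // prints 3
    }
}
```

Giving each thread its own `HashMap` avoids any locking while counting. The price is a final single-threaded merge, which stays cheap as long as the vocabulary is much smaller than the input.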

Using these results, we will hopefully be able to answer the original questions of this section: is a home cluster of such small computers worth it? How many nodes does it take to be faster than a single node, or faster than a real computer?

Mini-Cluster

In this part, we will see how to install and configure Hadoop (2.7.1) and Spark (1.5.1) to run with one master and four slaves.

The configurations in this part are adapted to MinnowBoard SBCs. I have tried to explain the chosen values as much as possible; they depend on the resources of this specific cluster. If you have any questions, or if you doubt my configuration, feel free to comment. 🙂

We start by creating a user that we will use for all Hadoop-related tasks. Then we will see how to install and configure the master and the slaves. Finally, we will run a simple MapReduce job to check that everything works and to start getting familiar with the Hadoop ecosystem.

In this part, I will explain how I installed and configured the Ubuntu Server OS (along with the necessary tools and libraries) and the network settings in order to prepare each node for Hadoop and Spark.

I wrote this part as a memo for myself and also to help beginners who are not comfortable with Ubuntu and networking. It may also be useful for people having trouble specifically with the MinnowBoard MAX.

In this first part, I will explain how and why I selected the various hardware components (computer, storage, networking, etc.) to build my home cluster.

The MinnowBoard MAX Single Board Computer

First, we will compare the specs of the MinnowBoard with those of the Raspberry Pi 2. Then we will look at the different storage media available on Single Board Computers, with a few explanations and benchmarks that I made. Then I will show which network and rack setup I chose, and finally we will add up all the component prices to see the total cost of my mini-cluster!

This first series of posts presents the home mini-cluster that I set up in September 2015.

Having only a Windows laptop, I decided to buy a cheap SBC (Single Board Computer), sometimes loosely called an SoC (System on a Chip) after its main chip, to have my own Linux server at home.

I bought a MinnowBoard MAX (choice explained in Part I) and was very satisfied with it. Its specs are enough for any home use; however, it is not very powerful for computations and was slow to process large amounts of data compared to my laptop. So I thought: why not buy a few more and build a cluster!

Home-made Cluster

But can I actually build a powerful cluster with only SBCs? How fast can it be? How many nodes does it take to be faster than a single SBC? Can it be faster than my Core i7 laptop?