YARN Archives - Nico's Blog

Published October 31, 2015 by Nicolas

Mini-Cluster Part IV : Word Count Benchmark

In this part, we will run a simple Word Count application on the cluster using Hadoop and Spark on various platforms and cluster sizes.

We will run and benchmark the same program on five datasets of different sizes, on each of the following setups:

A single MinnowBoard MAX, using a simple multi-threaded Java application
A real home computer (my laptop), using the same simple Java application
MapReduce, using a cluster of 2 to 4 slaves
Spark, using a cluster of 2 to 4 slaves
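
To give an idea of what a "simple multi-threaded Java application" for Word Count could look like, here is a minimal sketch. It is not the program used in the benchmark: the class name `WordCountSketch`, the `count` helper, the tokenization rule (lowercase, split on non-word characters) and the way lines are split between threads are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the benchmarked application.
final class WordCountSketch {

    // Counts word occurrences in `lines` using a fixed pool of `threads` workers.
    static Map<String, Long> count(List<String> lines, int threads) throws Exception {
        Map<String, Long> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            // Give each worker a contiguous slice of roughly equal size.
            int chunk = Math.max(1, (lines.size() + threads - 1) / threads);
            for (int start = 0; start < lines.size(); start += chunk) {
                List<String> slice =
                        lines.subList(start, Math.min(lines.size(), start + chunk));
                pending.add(pool.submit(() -> {
                    for (String line : slice) {
                        // Assumed tokenization: lowercase, split on non-word characters.
                        for (String word : line.toLowerCase().split("\\W+")) {
                            if (!word.isEmpty()) {
                                // merge is atomic on ConcurrentHashMap.
                                counts.merge(word, 1L, Long::sum);
                            }
                        }
                    }
                }));
            }
            // Wait for every slice; get() rethrows any worker failure.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList("the quick brown fox", "the lazy dog", "The end");
        Map<String, Long> counts = count(lines, 2);
        System.out.println("the -> " + counts.get("the"));
    }
}
```

A shared `ConcurrentHashMap` keeps the sketch short; a real benchmark program might prefer one local map per thread merged at the end, to reduce contention on small boards.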

Using these results, we will hopefully be able to answer the original questions of this section: is a home cluster made of such small computers worth it? How many nodes does it take to be faster than a single node, or faster than a real computer?

Mini-Cluster

Benchmark MapReduce MinnowBoard Spark YARN

Published October 24, 2015 by Nicolas

Mini-Cluster Part III : Hadoop & Spark Installation

In this part, we will see how to install and configure Hadoop (2.7.1) and Spark (1.5.1) to have one master and four slaves.

The configurations in this part are adapted to MinnowBoard SBCs. I tried to explain the chosen values as much as possible, since they depend on the resources of this specific cluster. If you have any questions, or if you doubt my configuration, feel free to comment. 🙂

We start by creating a user that we will use for all Hadoop-related tasks. Then we will see how to install and configure the master and the slaves. Finally, we will run a simple MapReduce job to check that everything works and to start getting familiar with the Hadoop ecosystem.

Mini-Cluster

Hadoop HDFS MapReduce Spark YARN

Tag: YARN
