In this first part, I will explain how and why I selected the hardware (computer, storage, networking, etc.) to build my home cluster.
First we will compare the specs of the MinnowBoard with those of the Raspberry Pi 2. Then we will look at the storage media available on Single Board Computers, with a few explanations and benchmarks that I made. Then I will show what network and rack setup I chose, and finally we will add up the prices of all these components to see the total cost of my mini-cluster !
Nowadays SBCs (Single Board Computers) such as the famous Raspberry-Pi are very popular as home servers because they are very cheap, take up very little space, and have very low energy consumption. For these reasons I decided to go for an SBC, but for a more expensive model called the MinnowBoard MAX.
Here is a comparison of both boards, showing the specs which are important to me for a distributed computing cluster using tools such as Hadoop :
| | MinnowBoard MAX | Raspberry-Pi 2 Model B |
|---|---|---|
| Processor | 1.33 GHz Dual-core 64-bit Intel Atom E3825 | 900 MHz Quad-core ARM Cortex-A7 |
| RAM | 2 GB | 1 GB |
| Ethernet Port | 1 Gbit/s | 100 Mbit/s |
| USB Ports | 1 x USB 2.0 + 1 x USB 3.0 | 4 x USB 2.0 |
The Raspberry-Pi is a lot cheaper, but I still decided to go for the MinnowBoard, because :
- In a distributed setting the most critical and common bottlenecks are network speed and hard disk IO. The MinnowBoard has the following specs which the Raspberry-Pi doesn’t :
  - 1 Gbit Ethernet port
  - 2 GB of RAM : useful for distributed computing tools. Some frameworks won’t even run with only 1 GB.
  - USB 3.0 port : convenient for an external hard drive, as USB 2.0 HDDs are typically limited to 40 MB/s.
  - SATA port (good in theory, but I didn’t use it, read further).
- It has an x86_64 Intel processor, which can run pretty much any software, whereas some libraries will not run on ARM (unless you try to compile them yourself).
Below are the different available storage media on MinnowBoard.
The MinnowBoard and the Raspberry-Pi both have Micro-SD card readers. This is the cheapest way of storing data, since a 32GB Class-10 card (best speed class on the market right now) costs about 10$. However I don’t recommend using this to install an Operating System because :
- Micro SD is Flash memory, so it can only support a limited number of writes before dying. This is bad for an Operating System, which writes to its disk constantly.
- Even a Class-10 is a lot slower than an external HDD.
Thumbdrives are also Flash memory, so they are not very suitable for installing an OS either. However, mine did show better performance than a Micro SD (see the benchmark below).
They are a bit more expensive than Micro SD cards for small capacities (≤ 32 GB) but get a lot cheaper once you reach bigger sizes (≥ 128 GB), if you keep your standards up to USB 3.0 and Class-10 respectively.
The MinnowBoard MAX has a USB 3.0 port, which allows much higher throughput to an external HDD than USB 2.0. For any given size above 500 GB, external HDDs are a lot cheaper than thumbdrives and micro SD cards, and have a longer life expectancy.
Now let’s put these 3 types of storage to the test with some of the latest models currently on the market.
Micro SD vs Thumbdrive vs External HDD
Here is a little test of how long it takes to copy a 3.5 GB file onto each of these devices.
The results were quite predictable, but I just love to benchmark stuff :
| Storage type | Model | Time to copy 3.5 GB |
|---|---|---|
| Micro SD | Kingston 32GB Class-10 SDHC | ~10m |
| Thumbdrive | Kingston 32GB Data Traveler Micro 3.1 | 6m 45s |
| External HDD | Samsung P3 500GB | 1m 30s |
Note : The micro SD time is approximate because the copy was quite unstable and sometimes froze for a few seconds …
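To compare these results with the MB/s figures used in the rest of this post, you can divide the file size by the elapsed time. Here is a minimal sketch of that arithmetic, assuming 1 GB = 1000 MB and taking the approximate 10 minutes of the Micro SD at face value. The function name is just for illustration.

```python
def throughput_mb_per_s(size_gb, minutes, seconds=0):
    """Average throughput in MB/s for copying size_gb gigabytes.

    Assumes decimal units (1 GB = 1000 MB).
    """
    elapsed_s = minutes * 60 + seconds
    return size_gb * 1000 / elapsed_s


# Copy times from the table above, for the 3.5 GB test file.
results = {
    "Micro SD": throughput_mb_per_s(3.5, 10),        # ~10m, approximate
    "Thumbdrive": throughput_mb_per_s(3.5, 6, 45),
    "External HDD": throughput_mb_per_s(3.5, 1, 30),
}

for device, rate in results.items():
    print(f"{device}: {rate:.1f} MB/s")
```

This gives roughly 6 MB/s for the Micro SD, 9 MB/s for the thumbdrive and 39 MB/s for the external HDD.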
Another storage medium that can be used here is a NAS (Network Attached Storage), connected to the rack switch.
A NAS is a storage server which can use RAID to speed up IO and/or to provide redundancy. Each board could have its OS and data partitions stored on the NAS and wouldn’t need its own DAS (Direct-Attached Storage).
NASs are quite affordable, however I ruled out this option because :
- The network link to the NAS would probably be a bottleneck, since all nodes would constantly be reading/writing at the same time.
- This is not Hadoop’s philosophy, which prefers using JBOD (Just a Bunch of Disks), since it already takes care of replication at the software level.
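To put a rough number on the first point, here is a back-of-the-envelope sketch. It assumes the NAS is plugged into the switch with a single 1 Gbit/s link, that all 5 nodes are transferring at the same time and share that link evenly, and it ignores protocol overhead. These are my simplifying assumptions, not measurements.

```python
def per_node_share_mb_per_s(link_gbit_per_s, nodes):
    """Bandwidth each node gets if `nodes` share one link evenly (no overhead)."""
    link_mb_per_s = link_gbit_per_s * 1000 / 8  # Gbit/s -> MB/s
    return link_mb_per_s / nodes


# Hypothetical setup: one Gigabit link to the NAS, shared by the 5 boards.
nas_share = per_node_share_mb_per_s(1, 5)
print(f"Each node gets about {nas_share:.0f} MB/s from the NAS")
```

Under these assumptions each board would get about 25 MB/s, well below the sequential read speeds of roughly 90 MB/s measured on the external HDDs later in this post.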
There is also a SATA port on the MinnowBoard, which makes it possible to connect an SSD. However I have not tried it, for the following reasons :
- The SATA port is not completely standard. You can’t just buy an SSD and plug it in. According to what I’ve read on the net, some special customization has to be made and it sounds a bit risky.
- SSDs are expensive (around 150$ for a 500GB Samsung SSD, versus 40$ for a 500GB HDD).
- If it turned out not to work, what would I do with an SSD drive? I don’t even own a desktop computer.
- I’ll admit I was a bit stingy and cowardly for this option 😉
My final choice : External HDD
So I chose to use External HDDs in my cluster, and needed to get 5 disks for my 5 MinnowBoards.
I bought a few different brands of HDD to limit my chances of ending up unlucky with 5 units of the slowest model on the market.
Most external HDDs have very little price difference between 500GB and 1TB (both around 50-60$). For example, I bought the 1TB version of the WD MyPassport Ultra because the 500GB was nearly the same price. Want to know what I mean when I say “nearly the same price” ? A difference of 1$, as seen on Google Shopping !
One exception is Samsung, which offers 500GB P3 and M3 models for around 38$.
Here is the full benchmark of the different external HDDs I bought, tested on a MinnowBoard :
| Model | Capacity | Read | Write | Read+Write |
|---|---|---|---|---|
| Samsung P3 | 500GB | 95.1 MB/s | 91.8 MB/s | 36.5 MB/s |
| Samsung M3 | 500GB | 97.8 MB/s | 84.8 MB/s | 35.4 MB/s |
| Toshiba Canvio-3 | 500GB | 88.5 MB/s | 91.4 MB/s | 26.6 MB/s |
| WD MyPassport Ultra | 1TB | 97.5 MB/s | 63.1 MB/s | 25.9 MB/s |
For the “Read” test on Ubuntu I did an average of 6 executions of the following command :
hduser@ubuntu1:~$ sudo hdparm -t /dev/sda4
/dev/sda4:
 Timing buffered disk reads: 292 MB in 3.00 seconds = 97.23 MB/sec
For the “Write” test on Ubuntu I did an average of 6 executions of the following command :
hduser@ubuntu1:~$ dd if=/dev/zero of=/data/output conv=fdatasync bs=512k count=1k; rm -f /data/output
1024+0 records in
1024+0 records out
536870912 bytes (537 MB) copied, 5.89009 s, 91.1 MB/s
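Averaging several runs by hand gets tedious, so here is a small sketch of how the MB/s figure could be pulled out of the result lines and averaged. It only relies on the two output formats shown above ("= 97.23 MB/sec" for hdparm and ", 91.1 MB/s" for dd). The function names are hypothetical, and this is not the script I actually used.

```python
import re
from statistics import mean

# Matches the throughput at the end of a result line, in either the
# hdparm form ("... = 97.23 MB/sec") or the dd form ("..., 91.1 MB/s").
_RATE = re.compile(r"([\d.]+)\s*MB/s(?:ec)?\s*$")


def parse_mb_per_s(line):
    """Return the MB/s figure found at the end of an hdparm or dd result line."""
    match = _RATE.search(line.strip())
    if match is None:
        raise ValueError(f"no MB/s figure found in: {line!r}")
    return float(match.group(1))


def average_mb_per_s(lines):
    """Average the MB/s figures of several result lines (one per run)."""
    return mean(parse_mb_per_s(line) for line in lines)


# The two result lines shown above.
hdparm_line = " Timing buffered disk reads: 292 MB in 3.00 seconds = 97.23 MB/sec"
dd_line = "536870912 bytes (537 MB) copied, 5.89009 s, 91.1 MB/s"
print(parse_mb_per_s(hdparm_line))  # 97.23
print(parse_mb_per_s(dd_line))      # 91.1
```

To get the average of a test, pass the result lines of all its runs to `average_mb_per_s`.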
For the “Read+Write” test, I just copied a big file from the disk to itself and timed it, then took the average of 3 runs.
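If you want to reproduce this kind of test in a repeatable way, a sketch along the following lines could do it. It copies a file, forces the data to disk before stopping the clock (in the same spirit as `conv=fdatasync` in the dd command above), and reports MB/s. The function name and the paths in the usage comment are hypothetical, and the result can be flattered by the page cache if the source file was read recently.

```python
import os
import shutil
import time


def timed_copy_mb_per_s(src, dst):
    """Copy src to dst and return the average throughput in MB/s (1 MB = 10**6 bytes)."""
    size_bytes = os.path.getsize(src)
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, length=1024 * 1024)  # copy in 1 MiB chunks
        fout.flush()
        os.fsync(fout.fileno())  # make sure the data really reached the disk
    elapsed_s = time.perf_counter() - start
    return size_bytes / elapsed_s / 1e6


# Usage with hypothetical paths, both on the same external HDD:
# rate = timed_copy_mb_per_s("/data/bigfile", "/data/bigfile.copy")
# print(f"{rate:.1f} MB/s")
```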
- Well it turned out that the most expensive HDD (MyPassport Ultra) was the slowest for writing
- And that the cheapest (Samsung P3) was the fastest overall !
- I tried the “Read+Write” test on my laptop’s internal 1TB Toshiba HDD and the average speed was 26.9 MB/s. So I guess the external hard drives are as good as an internal one.
- I also tried the “Read+Write” test on my laptop’s internal 64GB Samsung SSD, and the average speed was 100.0 MB/s.
- The reason why “Read+Write” is a lot slower than “Read” and “Write” is that HDDs have a very slow seek time, so reading and writing several files in parallel slows them down a lot. That is also why the SSD of my laptop is very fast : seek time is incredibly small on SSDs (about 150 times faster).
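Here is a rough way to see how much of the slowdown could come from seeking. Even with zero seek cost, a disk that copies to itself has to spend part of its time reading and part of it writing, so its copy rate cannot exceed 1 / (1/read + 1/write). This is a simplistic model of my own (no overlap between reads and writes, no caching), not a measurement. The sketch below compares that ideal rate with the measured “Read+Write” figures from the table above.

```python
def ideal_copy_rate(read_mb_per_s, write_mb_per_s):
    """Copy rate if the disk alternates between pure reading and pure writing,
    with no time lost moving the head between the two files."""
    return 1 / (1 / read_mb_per_s + 1 / write_mb_per_s)


# (read, write, measured read+write) in MB/s, from the benchmark table above.
disks = {
    "Samsung P3": (95.1, 91.8, 36.5),
    "Samsung M3": (97.8, 84.8, 35.4),
    "Toshiba Canvio-3": (88.5, 91.4, 26.6),
    "WD MyPassport Ultra": (97.5, 63.1, 25.9),
}

for name, (read, write, measured) in disks.items():
    ideal = ideal_copy_rate(read, write)
    print(f"{name}: ideal {ideal:.1f} MB/s, measured {measured} MB/s")
```

For every disk the measured rate comes out below this ideal rate, and under this model the gap is the time lost to seeking between the source and destination files.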
I bought a D-Link Gigabit 8-Port switch. It only costs around 20$, and it would be a bit stupid to go for the FastEthernet (100 Mbit/s) version and ruin the whole cluster’s performance just to save 5 bucks ..
There are also 10-Gigabit switches on the market, but they are way too expensive : around 800$ for an 8-port model and 1400$ for a 16-port model.
I used Category 6 cables to get as close as possible to the theoretical 1 Gigabit throughput. Category 5e cables are also supposed to be good enough over short distances, but for less than 2$ apiece I went for Cat. 6.
The max speed between computers using this setup should be 1 Gbit/s = 125 MB/s. I have sent big files by FTP from my laptop’s SSD to the MinnowBoards’ external HDDs at rates of around 90 MB/s, which is close to the external HDDs’ max write speed. So the transfer is probably bottlenecked by the write speed of the HDD, and I consider this network setup to be sufficient.
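The reasoning above boils down to converting the link speed to MB/s and taking the slowest stage of the transfer. Here is a minimal sketch of it, assuming no protocol overhead and using the 91.1 MB/s dd result shown earlier as the HDD write speed. The function and stage names are just for illustration.

```python
def link_mb_per_s(mbit_per_s):
    """Convert a link speed from Mbit/s to MB/s (8 bits per byte, no overhead)."""
    return mbit_per_s / 8


def bottleneck(**stages_mb_per_s):
    """Return the (name, speed) of the slowest stage of a transfer."""
    name = min(stages_mb_per_s, key=stages_mb_per_s.get)
    return name, stages_mb_per_s[name]


gigabit = link_mb_per_s(1000)       # 125.0 MB/s
fast_ethernet = link_mb_per_s(100)  # 12.5 MB/s

# With the Gigabit switch, the external HDD is the limiting factor ...
print(bottleneck(network=gigabit, hdd_write=91.1))
# ... whereas with FastEthernet the network would be.
print(bottleneck(network=fast_ethernet, hdd_write=91.1))
```

With the cheaper FastEthernet switch, transfers would be capped at about 12.5 MB/s, far below what the disks can sustain.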
Rack and Physical layout
As you can see in the picture below, I decided to crudely attach my boards inside a shoebox. The shoebox has been cut open on most sides and mounted on a laptop cooling stand, which has integrated ventilation (notice the gray perforated plate below the boards). The air comes in from below and is blown up through the MinnowBoards. Ventilation is important : I noticed some boards unexpectedly shutting down when I was using the first version of my shoebox, which had no ventilation and no side or bottom openings.
As you can see, the “Top-of-Rack” switch is the white box on the top right corner. Each board has an external HDD connected to the USB 3.0 port on the front side.
I have decided to use Ubuntu Server 14.04 because … it’s Ubuntu ! And I don’t need a GUI. This also saves a bit of memory and CPU (not much, but it might make a small difference on such tiny computers ..)
Well, technical stuff and benchmarks are nice, but how about MONEY ?? It’s what matters in this world isn’t it ? Right.
I didn’t keep all the receipts, and I’m lazy so prices are averaged sometimes, but here is an approximate cost breakdown for such a cluster.
| Item | Quantity | Unit price | Total |
|---|---|---|---|
| Ethernet Cable (Cat. 6) | 6 | 2$ | 12$ |
| USB 2.0 Hub | 1 | 10$ | 10$ |
| Micro-HDMI to HDMI cable | 1 | 5$ | 5$ |
Notice that I had to buy the power adapters (5V, 2.5A, DC) myself, and they were not very easy to find. I don’t understand why they are not included with the MinnowBoard …
The USB Hub and HDMI cable are necessary if you don’t have them (explained in the next part).
For the sake of simplicity, this total is what someone living in the US, or in a country where MinnowBoards are sold locally, would pay. I live in Malaysia and ordered the boards from the US, so international shipping cost me a lot on top of that.
So the total value of the cluster is around 1000 US$ ! That’s quite expensive. But can the price be justified ? Does it work well ? Read the next pages to find out !
First, let’s set up the cluster.