December 2015 - Nico's Blog

Published December 28, 2015 by Nicolas

Hadoop Basics III: Secondary Sort in MapReduce

In the previous part, we only needed to sort our results on a single field. In this post, we will learn how to sort data on multiple fields using Secondary Sort.

As in the last post, we will first pose a query, this time one that requires sorting the dataset on multiple fields. We will then study how the MapReduce shuffle phase works before implementing a Secondary Sort to obtain the results for that query.
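To make the idea behind Secondary Sort concrete, here is a minimal plain-Java simulation of the shuffle. It deliberately uses no Hadoop classes, and the names `CompositeKey`, `partition`, `run` and the sample records are hypothetical. The map output key combines a natural key (`group`) with the field we also want sorted (`value`). Records are partitioned on the natural key only, sorted on the full composite key, and then grouped on the natural key only, so each reduce call receives its values already in order. In a real Hadoop job these roles would be played by a `WritableComparable` key, a custom `Partitioner` and a grouping comparator (registered with `Job.setGroupingComparatorClass`); this sketch only mimics their behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Plain-Java simulation of a MapReduce secondary sort (hypothetical names, no Hadoop dependency). */
public class SecondarySortSketch {

    // Composite key: natural key (group) plus the secondary sort field (value).
    record CompositeKey(String group, int value) implements Comparable<CompositeKey> {
        // Sort comparator: natural key first, then the secondary field.
        @Override
        public int compareTo(CompositeKey o) {
            int cmp = group.compareTo(o.group);
            return cmp != 0 ? cmp : Integer.compare(value, o.value);
        }
    }

    // Partitioner: hashes only the natural key, so all records of a group reach the same reducer.
    static int partition(CompositeKey key, int numReducers) {
        return (key.group().hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Simulates the shuffle and returns one "group=[values]" string per reduce() call.
    static List<String> run(List<CompositeKey> mapOutput, int numReducers) {
        List<List<CompositeKey>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (CompositeKey k : mapOutput) {
            partitions.get(partition(k, numReducers)).add(k);
        }

        List<String> reduceCalls = new ArrayList<>();
        for (List<CompositeKey> part : partitions) {
            // Each reducer's input is sorted on the full composite key.
            Collections.sort(part);
            int i = 0;
            while (i < part.size()) {
                String group = part.get(i).group();
                List<Integer> values = new ArrayList<>();
                // Grouping comparator: consecutive keys with the same natural key
                // form a single reduce() call, with values already in order.
                while (i < part.size() && part.get(i).group().equals(group)) {
                    values.add(part.get(i).value());
                    i++;
                }
                reduceCalls.add(group + "=" + values);
            }
        }
        return reduceCalls;
    }

    public static void main(String[] args) {
        List<CompositeKey> input = List.of(
                new CompositeKey("b", 3), new CompositeKey("a", 2),
                new CompositeKey("b", 1), new CompositeKey("a", 5),
                new CompositeKey("a", 1));
        System.out.println(run(input, 1)); // prints [a=[1, 2, 5], b=[1, 3]]
    }
}
```

Partitioning on the natural key alone is what keeps a group together when there are several reducers; sorting on the composite key is what orders the values inside each group without buffering them in the reducer.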

Hadoop Basics

MapReduce Sorting

Published December 24, 2015 by Nicolas

Hadoop Basics II: Filter, Aggregate and Sort with MapReduce

Now that we have a Sequence File containing our newly “structured” data, let’s see how we can get the results of a basic query using MapReduce.

We will illustrate how filtering, aggregation and simple sorting can be achieved in MapReduce. For beginners, these are fundamental operations that help in understanding the MapReduce framework. Advanced readers can skim this post to get familiar with the dataset and prepare for the next posts, which cover more advanced sorting and joining techniques.
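As a rough illustration of how these three operations map onto the phases of a job, the sketch below simulates them in plain Java, without Hadoop. The `Sale` record, its fields, the year threshold and the sample data are all hypothetical and are not the dataset used in the series. The map step filters records and emits key/value pairs, the shuffle groups values by key (keys arrive sorted, mimicked here with a `TreeMap`), the reduce step sums each group, and a final step sorts by the aggregated total.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Plain-Java simulation of filter, aggregate and sort phases (hypothetical data, no Hadoop). */
public class FilterAggregateSortSketch {

    // Hypothetical input record.
    record Sale(String category, int year, long amount) {}

    // Map phase: drop records before minYear (filter), emit (category, amount) for the rest.
    static List<Map.Entry<String, Long>> map(List<Sale> input, int minYear) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (Sale s : input) {
            if (s.year() >= minYear) {
                out.add(Map.entry(s.category(), s.amount()));
            }
        }
        return out;
    }

    // Shuffle phase: group values by key; the TreeMap keeps keys sorted, as the shuffle does.
    static SortedMap<String, List<Long>> shuffle(List<Map.Entry<String, Long>> mapped) {
        SortedMap<String, List<Long>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> e : mapped) {
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the values of each key (aggregate).
    static Map<String, Long> reduce(SortedMap<String, List<Long>> groups) {
        Map<String, Long> totals = new LinkedHashMap<>();
        groups.forEach((k, vs) -> totals.put(k, vs.stream().mapToLong(Long::longValue).sum()));
        return totals;
    }

    // Sort step: order by total, descending. The shuffle only sorts by key, so sorting
    // on an aggregated value usually needs an extra step (e.g. a second job keyed on it).
    static List<String> sortByTotalDesc(Map<String, Long> totals) {
        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Sale> input = List.of(
                new Sale("books", 2014, 10), new Sale("games", 2015, 7),
                new Sale("books", 2015, 4), new Sale("music", 2015, 12),
                new Sale("games", 2015, 3));
        // prints [music=12, games=10, books=4]
        System.out.println(sortByTotalDesc(reduce(shuffle(map(input, 2015)))));
    }
}
```

The 2014 `books` record is removed by the filter in the map phase, which is why `books` totals 4 rather than 14.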

Hadoop Basics

MapReduce Sorting

Published December 18, 2015 by Nicolas

Hadoop Basics I: Working with Sequence Files

In this new series of posts, we will explore basic techniques for querying structured data. Querying means filtering, projecting, aggregating, sorting and joining data. We will look at different ways of querying with several Hadoop frameworks (MapReduce, Hive, Spark, etc.).

This first part briefly introduces the dataset that will be used throughout this series, and then presents the Sequence File data format. We will see how to write and read Sequence Files with a few code snippets, and benchmark the different compression types to choose the best one.

We will store our dataset as a Sequence File for later use with various frameworks in the following posts.

Hadoop Basics

Benchmark Data Format Hadoop HDFS
