Apache Spark Working with Key Value Pairs in RDD

Why Key/Value Pairs

Key/Value pairs are a common data type that many operations in Spark require. Key/Value RDDs are most often used to perform aggregations; there is an example in this article: Apache Spark Resilient Distributed Dataset (RDD) Programming Action Operations. In addition, working with Key/Value pairs in RDDs provides new operations that make things a lot easier.

Spark gives Key/Value pair RDDs a special name, pair RDDs, and these RDDs get special operations. These operations enable you to act on each key in parallel or to regroup data. Examples of these special operations include the reduceByKey() method and the join() method.
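To make the per-key behaviour concrete, here is a minimal sketch in plain Python that models what reduceByKey() and join() compute on a small list of (key, value) tuples. It is not Spark code and does not run in parallel: the helper names reduce_by_key and join, and the sample sales and prices data, are assumptions made up for illustration. In PySpark the corresponding calls on pair RDDs would be rdd.reduceByKey(lambda a, b: a + b) and rdd.join(other).

```python
from collections import defaultdict
from functools import reduce


def reduce_by_key(pairs, func):
    """Combine all values that share a key using func (models reduceByKey)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


def join(left, right):
    """Inner join of two pair collections on their keys (models join)."""
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    # Emit one (key, (left_value, right_value)) pair per matching combination.
    return [
        (key, (left_value, right_value))
        for key, left_value in left
        for right_value in right_by_key.get(key, [])
    ]


# Hypothetical sample data: units sold per fruit, and a price per fruit.
sales = [("apple", 3), ("pear", 2), ("apple", 4)]
prices = [("apple", 0.5), ("pear", 0.75)]

# Sum the units sold for each key, then attach the price for the same key.
totals = reduce_by_key(sales, lambda a, b: a + b)
joined = join(sorted(totals.items()), prices)

print(totals)
print(joined)
```

The sketch shows why the data must be shaped as pairs: both operations decide which values belong together purely by looking at the key, so a key that appears on only one side of the join produces no output.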

Check out the articles below to learn about different aspects of Key/Value Pair RDDs:
