This lesson is still being designed and assembled (Pre-Alpha version)

Introduction to Spark

Overview

Time: min
Objectives
  • Spark introduction

  • RDD


Spark is a general execution engine designed to improve on, or replace, MapReduce. Spark's operators are a superset of MapReduce's: they include map and reduce along with many others.

Limitations of MapReduce.


RDD — Resilient Distributed Dataset

The features of RDDs (decomposing the name):

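  • Resilient: a lost partition can be recomputed from the recorded lineage of operations that produced it.

  • Distributed: the data is split into partitions spread across the nodes of a cluster.

  • Dataset: a collection of records that is operated on as a whole.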
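To make the "resilient" and "distributed" parts of the name concrete, here is a toy sketch in plain Python. It does not use Spark: `make_partitions`, `double`, and the dictionary of partitions are hypothetical names invented for illustration, and a real cluster is replaced by an in-memory list. The idea it illustrates is that a derived partition can be rebuilt from its parent partition plus the function that produced it.

```python
def make_partitions(data, n):
    """Split a list into at most n slices (a stand-in for 'distributed')."""
    size = -(-len(data) // n)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def double(x):
    return 2 * x


# Source data split into three partitions: [1..4], [5..8], [9, 10]
source_parts = make_partitions(list(range(1, 11)), 3)

# Derived dataset: one output partition per input partition.
# The "lineage" of each derived partition is (source partition, double).
derived = {i: [double(x) for x in part] for i, part in enumerate(source_parts)}

# Simulate losing one partition, e.g. because a worker failed.
del derived[1]

# "Resilient": rebuild only the lost partition from its lineage.
derived[1] = [double(x) for x in source_parts[1]]

# Reassemble the full dataset in partition order.
recovered = [x for i in sorted(derived) for x in derived[i]]
```

Only the lost partition is recomputed; the other partitions are left untouched.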

Lazy execution

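Transformations on an RDD (such as map or filter) are lazy: Spark only records them. The computation runs when an action (such as collect or count) asks for a result.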
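The following is a minimal sketch of lazy execution in plain Python, not Spark itself. `LazyDataset` is a hypothetical toy class written for this lesson; its method names mirror the RDD operations `map`, `filter`, `collect`, and `count`, but nothing here calls the Spark API. The `calls` list exists only to show when the mapped function actually runs.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on demand."""

    def __init__(self, source, steps=()):
        self.source = source  # a re-iterable collection, e.g. a list
        self.steps = steps    # tuple of (kind, function) pairs

    # Transformations: return a new dataset and compute nothing.
    def map(self, fn):
        return LazyDataset(self.source, self.steps + (("map", fn),))

    def filter(self, fn):
        return LazyDataset(self.source, self.steps + (("filter", fn),))

    def _run(self):
        # Chain the recorded steps over the source as lazy iterators.
        items = iter(self.source)
        for kind, fn in self.steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: trigger the recorded computation.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []  # records every input that square() is actually applied to


def square(x):
    calls.append(x)
    return x * x


numbers = LazyDataset([1, 2, 3, 4, 5])
evens_squared = numbers.map(square).filter(lambda y: y % 2 == 0)

calls_before_action = len(calls)  # nothing has been computed yet
result = evens_squared.collect()  # the action runs the whole chain
calls_after_action = len(calls)   # square() has now run on every element
```

Building `evens_squared` only records the two steps; `square` is not called until `collect()` runs. Because results are not cached in this toy, calling another action such as `count()` would run the chain again.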

Key Points