Deploying a Hadoop/Spark Cluster

Overview

In this tutorial, you will deploy a Hadoop/Spark cluster on your Nectar Cloud instances. A cluster lets you spread large-scale data processing and analysis across several machines, so jobs can finish sooner than on a single instance.

You will use ElastiCluster to deploy Hadoop/Spark. ElastiCluster is an open-source tool for creating and managing compute clusters on cloud infrastructure. The project was originally created by the Grid Computing Competence Center at the University of Zurich. The Nectar cloud team maintains a fork of ElastiCluster that has been tailored to work with the Nectar Research Cloud.

What you’ll learn

  • How to install Python development tools, a virtual environment and ElastiCluster on your cloud instance
  • How to configure ElastiCluster
  • How to use ElastiCluster to deploy Hadoop/Spark
  • How to list information about the cluster, grow the cluster and terminate the cluster

What you’ll need

  • A running Nectar instance
  • Terminal software