The Sahara project (formerly Savanna), an integrated project in Juno under the OpenStack Data Processing program, lets users provision and manage Hadoop clusters on OpenStack. It saw a great deal of development and change during the Icehouse development cycle. The project focuses on two primary use cases: on-demand cluster provisioning and on-demand execution of Hadoop tasks (Elastic Data Processing, or EDP).
This presentation takes an in-depth look at Sahara’s EDP facilities. Since the project’s initial release, this key feature has been hardened and expanded to support streaming MapReduce and Java workflows, operation over private Neutron networks, and execution on transient clusters. We’ll start with EDP’s general concepts and a definition of terms, then cover its current status in Sahara, supported Data Sources and Job Types, data locality, and the roadmap for the Juno release cycle.
Lastly, we’ll show a live demo of EDP to bring all of these concepts together. The demo will cover job and data source definition, job execution and collection of job results.