There are three capabilities common across the major cloud providers that I want to focus on, and I want to show how they work together and build on each other to help you maximize agility and data insight in the cloud. They are: cloud storage, running workloads on demand, and elastic resource management. In addition, we’ll talk about how you can pull it all together with Hortonworks Cloudbreak on a path toward big data insights in the cloud.
Let’s start with cloud storage. Cloud storage is key: it lays the foundation for taking full advantage of the other capabilities we’ll discuss. Simply put, cloud storage is elastic and HDFS is not. This matters because capacity planning for a shared data environment is tough to get right: you commonly end up either with costly ad-hoc provisioning of unplanned resources, or with low utilization of resources that were expensively provisioned up front. Cloud storage’s pay-as-you-go model lets you manage cost effectively as your storage needs grow, while the cloud storage provider provisions resources under the hood, transparently to you.
The bigger benefit of cloud storage, though, is that it lets us separate storage from compute, and as a result we can launch use-case-specific workloads on demand against a shared data environment. For example, this separation allows different Apache Spark applications, such as a data engineering ETL job and an ad-hoc data science model-training cluster, to each run on their own cluster, avoiding the concurrency issues that plague multi-user, fixed-size Apache Hadoop clusters. This separation, and the flexibility to run disparate workloads on demand, not only lowers cost but also improves the user experience.
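To make the idea concrete, here is a toy sketch in plain Python (not a real cluster API; all class and method names are illustrative). Durable shared storage outlives any one compute cluster, so short-lived, workload-specific clusters can each read the same data independently:

```python
# Toy sketch: storage separated from compute. The shared store stands in
# for durable cloud storage (e.g. an S3 bucket); each "cluster" is an
# on-demand compute environment that reads from it and is then torn down.

class SharedObjectStore:
    """Stands in for durable cloud object storage."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class EphemeralCluster:
    """Stands in for an on-demand compute cluster: created per workload,
    reads from shared storage, runs its job, then goes away."""
    def __init__(self, name, store):
        self.name = name
        self.store = store

    def run(self, key, job):
        return job(self.store.get(key))


store = SharedObjectStore()
store.put("events/2017/11/part-0", [3, 1, 4, 1, 5])

# An ETL cluster and a data science cluster work off the same data
# without competing for the same fixed pool of compute.
etl = EphemeralCluster("etl", store)
ds = EphemeralCluster("data-science", store)
total = etl.run("events/2017/11/part-0", sum)
distinct = ds.run("events/2017/11/part-0", lambda xs: len(set(xs)))
```

With a fixed-size Hadoop cluster, both jobs would contend for one pool of resources; here, each cluster is sized and scheduled for its own workload while the data stays put.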
Now that you have separated storage from compute and can run disparate workloads on demand, you can truly take advantage of the elasticity the cloud provides, at a level of granularity that makes business and technical sense. For example, if a cluster is experiencing YARN memory saturation, or you need to increase data read throughput, you can simply scale up the existing cluster, or launch a new, larger cluster for a shorter period of time to handle the increased workload and meet business demand.
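As a minimal sketch of what a scale-up trigger for YARN memory saturation might look like: the YARN ResourceManager exposes cluster metrics over REST at `/ws/v1/cluster/metrics` (fields such as `allocatedMB` and `totalMB` appear under `clusterMetrics`). The function names and the 85% threshold below are illustrative assumptions, not part of any product API:

```python
# Sketch of a scale-up decision based on YARN memory saturation,
# using the shape of the ResourceManager's cluster metrics payload.

def memory_utilization(cluster_metrics):
    """Fraction of YARN cluster memory currently allocated."""
    total = cluster_metrics["totalMB"]
    return cluster_metrics["allocatedMB"] / total if total else 0.0

def should_scale_up(cluster_metrics, threshold=0.85):
    """True when allocated memory crosses the saturation threshold."""
    return memory_utilization(cluster_metrics) >= threshold

# Example payload, shaped like the "clusterMetrics" object the RM returns:
metrics = {"allocatedMB": 90112, "totalMB": 98304, "availableMB": 8192}
saturated = should_scale_up(metrics)
```

In practice the scaling action itself (adding worker nodes, or launching a larger cluster) would be carried out through your cloud tooling; the point is that the separation of storage and compute makes that action cheap and reversible.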
How do we tie all this together and operationalize our big data environment in the cloud? That’s where Cloudbreak comes in. Cloudbreak simplifies the deployment of big data workloads with cloud storage on cloud providers such as Amazon Web Services, Microsoft Azure, Google Cloud Platform, and OpenStack. It is easy to get started: you can use the wizard interface to deploy your first Hadoop cluster in six easy steps using one of our prebuilt Apache Ambari blueprints for data science, EDW, or ETL style workloads. When you are ready to take things to the next level, Cloudbreak also offers a range of enterprise features.
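For readers unfamiliar with Ambari blueprints: a blueprint is a JSON document describing a cluster layout, declaring which components land in which host groups and at what cardinality. The fragment below is a deliberately minimal illustration of the format, not one of the prebuilt blueprints Cloudbreak ships:

```json
{
  "Blueprints": {
    "stack_name": "HDP",
    "stack_version": "2.6"
  },
  "host_groups": [
    {
      "name": "master",
      "cardinality": "1",
      "components": [
        { "name": "NAMENODE" },
        { "name": "RESOURCEMANAGER" }
      ]
    },
    {
      "name": "worker",
      "cardinality": "3",
      "components": [
        { "name": "DATANODE" },
        { "name": "NODEMANAGER" }
      ]
    }
  ]
}
```

Because the cluster shape lives in a declarative document, the same workload definition can be stamped out on demand, which is exactly what makes the on-demand and elastic patterns above repeatable.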
To learn more about Cloudbreak and see a live demonstration, please join us for an upcoming webinar on December 5th. Details can be found here.