Sandbox Architecture

Introduction

This tutorial explains the current Hortonworks Sandbox architecture. Starting with HDP 2.6.5, a new Sandbox structure makes it possible to instantiate two single-node clusters (i.e. HDP and HDF) within a single Sandbox, combining the best features of the Data-At-Rest and Data-In-Motion methodologies in a single environment. The graphical representation below shows where the Sandbox exists in relation to the outside world. The instance depicted is the Connected Data Architecture (CDA); if you are not yet familiar with the concept of CDA, do not worry, we will review it in a later section.

cda-architecture

At a high level, the Sandbox is a Linux (CentOS 7) virtual machine that uses Docker to host different Sandbox distributions, namely HDP or HDF. To orchestrate communication between the outside world and the Sandbox, an NGINX reverse proxy server is containerized and configured to open only the required ports to the outside, which lets us interact with each container individually.

Prerequisites

Outline

Docker Architecture

cda-architecture

In the Docker architecture above, Docker registries, such as Docker Hub, are services used for storing Docker images. The Docker host is the computer Docker runs on. Diving deeper into the host, you can see the Docker daemon, which creates and manages Docker objects such as images, containers, networks and volumes. The user interacts with the Docker daemon through the Docker client, a command line interface (CLI). The Docker daemon is a long-running program, also known as a server, and the CLI uses Docker's REST API to talk to it. As you can observe, the Docker Engine is a client-server application comprised of the Docker CLI client, the REST API and the Docker daemon.
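The client-server split described above can be observed on any Docker host. The following is a minimal sketch, assuming the daemon listens on the default Unix socket /var/run/docker.sock and that a curl build with Unix socket support (7.40 or later) is installed. The helper name docker_api is hypothetical and is not part of the Sandbox.

```shell
# Assumption: the Docker daemon listens on the default Unix socket.
DOCKER_SOCK="${DOCKER_SOCK:-/var/run/docker.sock}"

# Hypothetical helper: send a GET request to the Docker Engine REST API.
# $1 is an API path such as /version or /containers/json
docker_api() {
  curl --silent --unix-socket "$DOCKER_SOCK" "http://localhost$1"
}
```

For example, docker_api /version returns the server details that the CLI prints with docker version, and docker_api /containers/json is the API counterpart of docker ps. Both routes end at the same daemon; the CLI is simply a client of the REST API.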

Sandbox Proxy

In this new architecture NGINX is used as a reverse proxy server. Traditionally, a (forward) proxy server is an intermediary that forwards requests from multiple clients to the internet. In contrast, a reverse proxy server resides behind a firewall and directs incoming requests to specific back-end servers; in our case these servers are the HDP and HDF containers.

Why a Reverse Proxy Server is Needed

One of the biggest obstacles to overcome with this architecture is keeping ports consistent and reducing conflicts between containers as much as possible; for example, we wanted to keep Ambari UI on port 8080 across every Sandbox. The best solution is to keep the default ports as they are but distinguish the back-end server by domain name. This is why in this build we must change the host's name from:

sandbox.hortonworks.com:<PORT>

to

sandbox-hdp.hortonworks.com:<PORT>

and

sandbox-hdf.hortonworks.com:<PORT>

This allows us to maintain consistency across different Sandboxes and avoid conflicts, so when CDA is deployed we may reach Ambari UI at:

sandbox-hdp.hortonworks.com:8080

and

sandbox-hdf.hortonworks.com:8080

In this example Ambari UI is reachable for different Sandboxes at the same time by specifying the domain name of the Sandbox we are trying to reach:

both-running
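The naming scheme above can be captured in a small helper. This is only a sketch: the function name sandbox_url is hypothetical, the http scheme is an assumption, and both host names are assumed to resolve to the machine running the proxy (for example through a hosts file entry).

```shell
# Hypothetical helper: build the address of a service on a given Sandbox.
# $1 = Sandbox flavour (hdp or hdf)
# $2 = service port (defaults to 8080, the Ambari UI port used above)
sandbox_url() {
  case "$1" in
    hdp|hdf)
      printf 'http://sandbox-%s.hortonworks.com:%s\n' "$1" "${2:-8080}"
      ;;
    *)
      echo "unknown sandbox flavour: $1" >&2
      return 1
      ;;
  esac
}
```

With both Sandboxes running, curl -I "$(sandbox_url hdp)" and curl -I "$(sandbox_url hdf)" address the same port, and the proxy uses the host name to decide which container receives each request.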

Cool stuff, right? Now let's take a look at where our containers are in relation to our virtual environment.

View Running Containers

If you would like to see the running Sandbox container and proxy, you must log on to the host. You may choose to follow along; however, this is not necessary.

Native Docker Sandbox

The Sandbox may also run on Docker installed natively on the host operating system; that is, rather than running a VM to host the containers, you interact with the Docker daemon directly. In this setup your native operating system is the host for the Sandbox containers.

If you are using VMWare or VirtualBox, you may log on to either the Sandbox or the host. Here is a complete list of the open TCP ports for SSH services:

Destination        TCP Port for SSH
HDP                2201
HDF                2202
VM – VirtualBox    2200
VM – VMWare        22

If you are running the VirtualBox VM:

# SSH on to VirtualBox Virtual Machine
ssh root@sandbox-hdp.hortonworks.com -p 2200

Or if you are using VMWare:

# SSH on to VMWare Virtual Machine
ssh root@sandbox-hdp.hortonworks.com -p 22

Note: The default password is hadoop.
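The port table above can also be expressed as a lookup, which is handy when you switch between destinations often. This is a sketch only; the function name sandbox_ssh_port and the destination keywords are hypothetical, while the port numbers come from the table.

```shell
# Hypothetical helper: print the SSH port for a destination from the table above.
# $1 = hdp | hdf | virtualbox | vmware
sandbox_ssh_port() {
  case "$1" in
    hdp)        echo 2201 ;;
    hdf)        echo 2202 ;;
    virtualbox) echo 2200 ;;
    vmware)     echo 22 ;;
    *)
      echo "unknown destination: $1" >&2
      return 1
      ;;
  esac
}
```

For example, ssh root@sandbox-hdp.hortonworks.com -p "$(sandbox_ssh_port virtualbox)" is equivalent to the VirtualBox command shown above.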

Now that you are in the Virtual Machine hosting the containers we can see what Docker images are ready for deployment:

docker images

docker-images

Furthermore, we can see what containers are currently running by using the following command:

docker ps

If you started out with HDP you will see two containers running. The first is the NGINX proxy container, followed by a list of open ports and where they are being forwarded. Since HDP was used as the base in this example, it is also listed as a running container.

docker-ps

Here is some context on the information displayed:

CONTAINER ID    ID given to an instantiated image by Docker.
IMAGE           The executable package from which your container has been instantiated.
COMMAND         Command used to instantiate your container; typically this is the path of an initialization script.
CREATED         How long ago the container was created.
STATUS          A container may be: UP, UP-PAUSED, RESTARTING, DEAD, CREATED or EXITED.
PORTS           Open ports. Note that the proxy container also tells us where ports are being forwarded to.
NAMES           The container name, e.g. "sandbox-hdp" and "sandbox-proxy".
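When the full docker ps output is too wide to read comfortably, the --format option can restrict it to the columns of interest. The sketch below assumes Docker is installed on the host; the wrapper name sandbox_ps is hypothetical.

```shell
# Hypothetical wrapper: show only the NAMES, STATUS and PORTS columns
# of the running containers, rendered as a table with a header row.
sandbox_ps() {
  docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
}
```

Running sandbox_ps on the host prints one row per running container, which makes it easier to read where the proxy forwards each port.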

When CDA has been deployed both HDP and HDF are displayed as running containers:

cda-dockerps

HDP vs HDF

hdp-stand-alone

HDF (Data-In-Motion)

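Data-In-Motion is the idea that data is ingested from all sorts of devices into a flow or stream. While the data moves through this flow, components that NiFi calls "processors" modify, transform, aggregate and route it. Data-In-Motion covers much of the preprocessing stage in building a Big Data Application. For instance, data preprocessing is where Data Engineers format raw data into a better schema so that Data Scientists can focus on analyzing and visualizing the data.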

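HDP (Data-At-Rest)

Data-At-Rest is the idea that data is not moving and is stored in a database or robust data store residing on distributed data storage such as the Hadoop Distributed File System (HDFS). Instead of sending the data to the queries, the queries are sent to the data to find meaningful insights. Data processing and analysis in building a Big Data Application take place on the data at this stage.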

What is CDA?

Hortonworks Connected Data Architecture (CDA) is composed of both the Hortonworks DataFlow (HDF) and Hortonworks Data Platform (HDP) sandboxes and allows you to play with both data-in-motion and data-at-rest frameworks simultaneously.

HDF_secure_data_collection

Why CDA?

As data comes in from the edge, it is collected, curated and analyzed in real time, on premises or in the cloud, using the HDF framework. Once the data has been captured, you can convert your Data-In-Motion into Data-At-Rest with the HDP framework to gain further insights.

How CDA is made possible in the sandbox

In order for HDF to send data into HDP, both sandboxes need to be set up to communicate with each other. If you would like to know more about the deployment of CDA check out the Sandbox Deployment and Install Guide under the Advanced Topic section. When CDA is enabled a script internal to the Sandbox takes into account what base you started with and calls on the Docker daemon to instantiate the image of the complementing Sandbox flavour (e.g. HDP installs HDF, and HDF installs HDP).
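The decision described above, picking the complementing flavour from the base you started with, can be sketched as a simple lookup. This is an illustration only and not the Sandbox's actual deployment script; the function name complementing_sandbox is hypothetical.

```shell
# Hypothetical sketch: given the base Sandbox flavour,
# print the flavour that enabling CDA would add.
# $1 = hdp | hdf
complementing_sandbox() {
  case "$1" in
    hdp) echo hdf ;;
    hdf) echo hdp ;;
    *)
      echo "unknown sandbox flavour: $1" >&2
      return 1
      ;;
  esac
}
```

The real script then asks the Docker daemon to instantiate the image for that flavour; this sketch leaves that step out because the image names are not given here.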

In the image below we used HDP as our base and launched the initialization script for CDA. As you can see all the needed components for HDF are being loaded into a new container:

pulling-hdf

A custom Docker network was created between the running containers through Docker Engine. This is one of the many advantages of using containers: inside the Docker Engine, containers can communicate directly with each other through a Docker network, here named bridge, which makes it possible for the clusters to communicate.

cda-network
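To check which containers are attached to a Docker network, you can ask the daemon to inspect it. The sketch below assumes Docker is installed on the host and defaults to the network name bridge used in the text above; the wrapper name sandbox_network_members is hypothetical, and your network may be named differently (docker network ls lists the available names).

```shell
# Hypothetical wrapper: print the name and IPv4 address of every
# container attached to a Docker network.
# $1 = network name (defaults to "bridge", the name used in the text)
sandbox_network_members() {
  docker network inspect \
    --format '{{range .Containers}}{{println .Name .IPv4Address}}{{end}}' \
    "${1:-bridge}"
}
```

If both Sandbox containers appear in the output, they share a network and can reach each other directly.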

Summary

Congratulations, you have learned a great deal about the structure of our Sandbox and how the HDP and HDF single-node clusters are implemented. You have also learned what CDA is and how it can be used to capture insights from both Data-At-Rest and Data-In-Motion. Finally, you have seen how Docker's internal network enables communication between containers and how NGINX handles communication with the outside world. Now that you know the internal workings of CDA on the Sandbox, put your understanding into practice with the CDA-ready tutorials.

Further Reading
