Building a Server Log Analysis Application


Introduction

Security breaches are a common problem for businesses; the question is not
whether one will happen, but when. One of the first lines of defense for
detecting potential vulnerabilities in the network is to investigate the logs
from your server. You have been brought on to apply your skills in data
engineering and data analysis: you will acquire server log data, preprocess it,
and store it in HDFS, a reliable distributed storage system, using the dataflow
management framework Apache NiFi. You will then clean and refine the data with
Apache Spark to gain specific insights into the activity on your server, such
as which hosts hit the server most frequently and which country or city
generates the most network traffic. Finally, you will visualize these events in
the data science notebook Apache Zeppelin, so that you can tell a story about
the activity on the server and flag anything your team should be cautious
about.
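
To make "preprocessing server log data" concrete, here is a minimal plain-Python sketch of turning one raw log line into named fields. It is not the tutorial's NiFi or Spark code. It assumes the logs follow the Common Log Format (`host ident user [timestamp] "request" status bytes`); the sample line, the `LOG_PATTERN` name, and the `parse_log_line` helper are hypothetical and exist only for illustration.

```python
import re
from typing import Optional

# Assumption: each line follows the Common Log Format:
#   host ident user [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the fields of one log line, or None if the line is malformed."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" in the size field means no bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Made-up sample line, not taken from any real dataset.
sample = ('host.example.com - - [01/Jul/1995:00:00:01 -0400] '
          '"GET /images/logo.gif HTTP/1.0" 200 1839')
record = parse_log_line(sample)
print(record["host"], record["status"], record["size"])
```

Returning `None` for lines that do not match keeps malformed entries easy to filter out in a later cleaning step, rather than failing on the first bad line.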

Big Data Technologies used to develop the Application:

Goals and Objectives

  • Learn about server log data and log data analysis: how it works, its
    various use cases, and best practices
  • Learn to build a NiFi dataflow to acquire server log data
  • Learn to use Spark to clean the data, filtering it down to the messages
    that tell users about the activities happening on their servers
  • Learn to visualize your findings with Zeppelin after cleaning the data

Prerequisites

Outline

The tutorial series consists of the following tutorial modules:

1. Application Development Concepts: Covers what server log data is, what log data analysis is and how it works, its various use cases, and some best practices for server log analysis.

2. Setting up the Development Environment: You will configure the software services and install any dependencies they need in order to develop the application.

3. Acquiring NASA Server Log Data: You will learn to build a NiFi dataflow that acquires two months' worth of NASA log data, preprocesses the data, and stores it in HDFS.

4. Cleaning the Raw NASA Log Data: You will learn to create a Zeppelin notebook and use Zeppelin's Spark interpreter to clean the NASA log data and gather insights about the activity on the server.

5. Visualizing NASA Log Data: You will create a second Zeppelin notebook to visualize the key points you found when cleaning the data with Spark. Your visualizations of the NASA log data will show:

  • Most Frequent Hosts: the count per IP address of hosts hitting the server
  • Response Codes: the count per response code returned by the server
  • Type of Extensions: the count per file format transferred between the server and the devices interacting with it
  • Network Traffic per Location: the locations the server hits are coming from
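
The first three counts listed in module 5 can be sketched in plain Python as follows. This is only an illustration of the aggregation logic, not the Spark code the tutorial builds in Zeppelin. The `summarize` function and the sample records are hypothetical, and the records are assumed to be already parsed into dictionaries with `host`, `request`, and `status` keys. Network traffic per location is left out because it needs a geolocation lookup that this overview does not describe.

```python
import os
from collections import Counter
from urllib.parse import urlparse


def summarize(records):
    """Count hits per host, per response code, and per file extension."""
    hosts = Counter(r["host"] for r in records)
    statuses = Counter(r["status"] for r in records)
    extensions = Counter()
    for r in records:
        # A request looks like "GET /path/file.ext HTTP/1.0".
        parts = r["request"].split()
        if len(parts) >= 2:
            path = urlparse(parts[1]).path
            ext = os.path.splitext(path)[1].lower()
            extensions[ext or "(none)"] += 1
    return hosts, statuses, extensions


# Made-up records for illustration only.
records = [
    {"host": "a.example.com", "request": "GET /images/logo.gif HTTP/1.0", "status": 200},
    {"host": "a.example.com", "request": "GET /index.html HTTP/1.0", "status": 200},
    {"host": "b.example.com", "request": "GET /missing.gif HTTP/1.0", "status": 404},
]

hosts, statuses, extensions = summarize(records)
print(hosts.most_common(1))
print(dict(statuses))
print(dict(extensions))
```

In Spark the same counts would typically be expressed as group-by-and-count operations over a DataFrame, which lets them scale to the full two months of log data.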
