# Airline Departure Delay Data Analysis


# MapReduce input/output data types for the departure delay analysis

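| Class   | Direction | Key                   | Value                          |
|---------|-----------|-----------------------|--------------------------------|
| Mapper  | Input     | Offset                | Airline flight statistics data |
| Mapper  | Output    | Flight year, month    | Departure delay count          |
| Reducer | Input     | Flight year, month    | Departure delay count          |
| Reducer | Output    | Flight year, month    | Total departure delay count    |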
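
The data flow in the table can be sketched in plain Java without any Hadoop dependency. This is not the downloadable source from this post, only an illustration under stated assumptions: the column positions (`YEAR = 0`, `MONTH = 1`, `DEP_DELAY = 15`), the "delay greater than 0" rule, the handling of `NA` values, and all names (`DepartureDelaySketch`, `mapKey`, `countDelays`, `record`) are hypothetical, and the sample rows are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the map and reduce steps; not the actual job source.
class DepartureDelaySketch {
    // Assumed column positions in each CSV record (not stated in the post).
    static final int YEAR = 0;
    static final int MONTH = 1;
    static final int DEP_DELAY = 15;

    // Map step: return "year,month" for a delayed departure, or null when the
    // line is a header, has no usable delay value, or departed on time.
    static String mapKey(String line) {
        String[] cols = line.split(",");
        if (cols.length <= DEP_DELAY || cols[YEAR].equals("Year")) {
            return null;
        }
        try {
            int delay = Integer.parseInt(cols[DEP_DELAY].trim());
            return delay > 0 ? cols[YEAR] + "," + cols[MONTH] : null;
        } catch (NumberFormatException e) {
            return null; // non-numeric value such as "NA"
        }
    }

    // Shuffle + reduce step: group by key and add 1 per delayed flight.
    static Map<String, Long> countDelays(List<String> lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            String key = mapKey(line);
            if (key != null) {
                totals.merge(key, 1L, Long::sum);
            }
        }
        return totals;
    }

    // Build a made-up record with only the three fields used above filled in.
    static String record(String year, String month, String depDelay) {
        String[] cols = new String[DEP_DELAY + 1];
        Arrays.fill(cols, "0");
        cols[YEAR] = year;
        cols[MONTH] = month;
        cols[DEP_DELAY] = depDelay;
        return String.join(",", cols);
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            record("1987", "10", "5"),
            record("1987", "10", "-3"),
            record("1987", "10", "12"),
            record("1987", "11", "NA"),
            record("1987", "11", "1"));
        countDelays(sample).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

In the real job the grouping is done by the framework's shuffle phase, so the Reducer only has to sum the values it receives for each "year,month" key.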

# Checking the CLASS_PATH setting
[hadoop@master ~]$ vi ~/.bash_profile
HADOOP_CORE=/home/hadoop/hadoop-1.1.2/hadoop-core-1.1.2.jar
export HADOOP_CORE 
 
export CLASS_PATH=.:$HADOOP_CORE

[hadoop@master ~]$ echo $CLASS_PATH
.:/home/hadoop/hadoop-1.1.2/hadoop-core-1.1.2.jar
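
If `javac` later fails to find the Hadoop classes, it can help to confirm that every entry in `CLASS_PATH` actually exists. A minimal sketch, assuming the variable has been exported as shown above; the class and method names (`ClassPathCheck`, `missingEntries`) are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Reports classpath entries that do not exist on the local file system.
class ClassPathCheck {
    static List<String> missingEntries(String classPath) {
        List<String> missing = new ArrayList<>();
        // File.pathSeparator is ":" on Linux, matching the setting above.
        for (String entry : classPath.split(File.pathSeparator)) {
            if (!entry.isEmpty() && !new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String classPath = System.getenv("CLASS_PATH");
        if (classPath == null) {
            System.out.println("CLASS_PATH is not set");
            return;
        }
        System.out.println("Missing entries: " + missingEntries(classPath));
    }
}
```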

# Checking the MapReduce & Shuffle settings
http://develop.sunshiny.co.kr/897

# Preparing the analysis data
http://develop.sunshiny.co.kr/889

# Downloading the Java files for the departure delay analysis (MapReduce & DepartureDelayCount)


# Compiling the Java sources

[hadoop@master ~]$ javac -cp $CLASS_PATH -d supp /home/hadoop/supp/*.java
[hadoop@master ~]$ ls -R supp/
supp/:
DelayCountReducer.java  DepartureDelayCount.java  DepartureDelayCountMapper.java  hadoop

supp/hadoop:
chapter05

supp/hadoop/chapter05:
DelayCountReducer.class  DepartureDelayCount.class  DepartureDelayCountMapper.class

# Creating the jar file
[hadoop@master ~]$ jar -cvf DepartureDelayCount.jar -C supp/ .
added manifest
adding: DepartureDelayCount.java(in = 1589) (out= 611)(deflated 61%)
adding: DepartureDelayCountMapper.java(in = 1138) (out= 578)(deflated 49%)
adding: DelayCountReducer.java(in = 588) (out= 295)(deflated 49%)
adding: hadoop/(in = 0) (out= 0)(stored 0%)
adding: hadoop/chapter05/(in = 0) (out= 0)(stored 0%)
adding: hadoop/chapter05/DepartureDelayCount.class(in = 1769) (out= 889)(deflated 49%)
adding: hadoop/chapter05/DelayCountReducer.class(in = 1715) (out= 727)(deflated 57%)
adding: hadoop/chapter05/DepartureDelayCountMapper.class(in = 2281) (out= 996)(deflated 56%)
[hadoop@master ~]$ ls
DepartureDelayCount.jar

# Listing the data to analyze
[hadoop@master bin]$ ./hadoop fs -ls data
Found 22 items
-rw-r--r--   1 hadoop supergroup  127162942 2013-05-15 10:38 /user/hadoop/data/1987.csv
-rw-r--r--   1 hadoop supergroup  501039472 2013-05-15 10:39 /user/hadoop/data/1988.csv
-rw-r--r--   1 hadoop supergroup  486518821 2013-05-15 10:39 /user/hadoop/data/1989.csv
-rw-r--r--   1 hadoop supergroup  509194687 2013-05-15 10:40 /user/hadoop/data/1990.csv
-rw-r--r--   1 hadoop supergroup  491210093 2013-05-15 10:41 /user/hadoop/data/1991.csv
-rw-r--r--   1 hadoop supergroup  492313731 2013-05-15 10:42 /user/hadoop/data/1992.csv
-rw-r--r--   1 hadoop supergroup  490753652 2013-05-15 10:42 /user/hadoop/data/1993.csv
-rw-r--r--   1 hadoop supergroup  501558665 2013-05-15 10:43 /user/hadoop/data/1994.csv
-rw-r--r--   1 hadoop supergroup  530751568 2013-05-15 10:44 /user/hadoop/data/1995.csv
-rw-r--r--   1 hadoop supergroup  533922363 2013-05-15 10:44 /user/hadoop/data/1996.csv
-rw-r--r--   1 hadoop supergroup  540347861 2013-05-15 10:45 /user/hadoop/data/1997.csv
-rw-r--r--   1 hadoop supergroup  538432875 2013-05-15 10:46 /user/hadoop/data/1998.csv
-rw-r--r--   1 hadoop supergroup  552926022 2013-05-15 10:47 /user/hadoop/data/1999.csv
-rw-r--r--   1 hadoop supergroup  570151613 2013-05-15 10:48 /user/hadoop/data/2000.csv
-rw-r--r--   1 hadoop supergroup  600411462 2013-05-15 10:48 /user/hadoop/data/2001.csv
-rw-r--r--   1 hadoop supergroup  530507013 2013-05-15 10:49 /user/hadoop/data/2002.csv
-rw-r--r--   1 hadoop supergroup  626745242 2013-05-15 10:50 /user/hadoop/data/2003.csv
-rw-r--r--   1 hadoop supergroup  669879113 2013-05-15 10:51 /user/hadoop/data/2004.csv
-rw-r--r--   1 hadoop supergroup  671027265 2013-05-15 10:52 /user/hadoop/data/2005.csv
-rw-r--r--   1 hadoop supergroup  672068096 2013-05-15 10:53 /user/hadoop/data/2006.csv
-rw-r--r--   1 hadoop supergroup  702878193 2013-05-15 10:54 /user/hadoop/data/2007.csv
-rw-r--r--   1 hadoop supergroup  689413344 2013-05-15 10:55 /user/hadoop/data/2008.csv

# Running the MapReduce job
[hadoop@master bin]$ ./hadoop jar [jar file] [package path.class to run] [input data directory] [output directory]

[hadoop@master bin]$ ./hadoop jar /home/hadoop/DepartureDelayCount.jar hadoop.chapter05.DepartureDelayCount data dep_delay_count
13/05/15 11:03:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/05/15 11:03:32 INFO input.FileInputFormat: Total input paths to process : 22
13/05/15 11:03:32 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/05/15 11:03:32 WARN snappy.LoadSnappy: Snappy native library not loaded
13/05/15 11:03:32 INFO mapred.JobClient: Running job: job_201305151032_0001
13/05/15 11:03:33 INFO mapred.JobClient:  map 0% reduce 0%
13/05/15 11:03:40 INFO mapred.JobClient:  map 1% reduce 0%
13/05/15 11:03:44 INFO mapred.JobClient:  map 2% reduce 0%
13/05/15 11:03:47 INFO mapred.JobClient:  map 3% reduce 0%
13/05/15 11:03:51 INFO mapred.JobClient:  map 4% reduce 0%
13/05/15 11:03:54 INFO mapred.JobClient:  map 5% reduce 0%
13/05/15 11:03:58 INFO mapred.JobClient:  map 6% reduce 0%
13/05/15 11:04:02 INFO mapred.JobClient:  map 7% reduce 2%
13/05/15 11:04:06 INFO mapred.JobClient:  map 8% reduce 2%
13/05/15 11:04:09 INFO mapred.JobClient:  map 9% reduce 2%
13/05/15 11:04:13 INFO mapred.JobClient:  map 10% reduce 2%
...
13/05/15 11:09:00 INFO mapred.JobClient:  map 100% reduce 31%
13/05/15 11:09:01 INFO mapred.JobClient:  map 100% reduce 33%
13/05/15 11:09:13 INFO mapred.JobClient:  map 100% reduce 71%
13/05/15 11:09:16 INFO mapred.JobClient:  map 100% reduce 80%
13/05/15 11:09:19 INFO mapred.JobClient:  map 100% reduce 88%
13/05/15 11:09:22 INFO mapred.JobClient:  map 100% reduce 98%
13/05/15 11:09:23 INFO mapred.JobClient:  map 100% reduce 100%
13/05/15 11:09:23 INFO mapred.JobClient: Job complete: job_201305151032_0001
13/05/15 11:09:23 INFO mapred.JobClient: Counters: 29
13/05/15 11:09:23 INFO mapred.JobClient:   Job Counters
13/05/15 11:09:23 INFO mapred.JobClient:     Launched reduce tasks=1
13/05/15 11:09:23 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=620044
13/05/15 11:09:23 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/05/15 11:09:23 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/05/15 11:09:23 INFO mapred.JobClient:     Launched map tasks=187
13/05/15 11:09:23 INFO mapred.JobClient:     Data-local map tasks=187
13/05/15 11:09:23 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=328227
13/05/15 11:09:23 INFO mapred.JobClient:   File Output Format Counters
13/05/15 11:09:23 INFO mapred.JobClient:     Bytes Written=3635
13/05/15 11:09:23 INFO mapred.JobClient:   FileSystemCounters
13/05/15 11:09:23 INFO mapred.JobClient:     FILE_BYTES_READ=663185664
13/05/15 11:09:23 INFO mapred.JobClient:     HDFS_BYTES_READ=12029911999
13/05/15 11:09:23 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1335865656
13/05/15 11:09:23 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3635
13/05/15 11:09:23 INFO mapred.JobClient:   File Input Format Counters
13/05/15 11:09:23 INFO mapred.JobClient:     Bytes Read=12029889933
13/05/15 11:09:23 INFO mapred.JobClient:   Map-Reduce Framework
13/05/15 11:09:23 INFO mapred.JobClient:     Map output materialized bytes=663186768
13/05/15 11:09:23 INFO mapred.JobClient:     Map input records=123534991
13/05/15 11:09:23 INFO mapred.JobClient:     Reduce shuffle bytes=663186768
13/05/15 11:09:23 INFO mapred.JobClient:     Spilled Records=100036658
13/05/15 11:09:23 INFO mapred.JobClient:     Map output bytes=563148988
13/05/15 11:09:23 INFO mapred.JobClient:     Total committed heap usage (bytes)=84114866176
13/05/15 11:09:23 INFO mapred.JobClient:     CPU time spent (ms)=637390
13/05/15 11:09:23 INFO mapred.JobClient:     Combine input records=0
13/05/15 11:09:23 INFO mapred.JobClient:     SPLIT_RAW_BYTES=22066
13/05/15 11:09:23 INFO mapred.JobClient:     Reduce input records=50018329
13/05/15 11:09:23 INFO mapred.JobClient:     Reduce input groups=255
13/05/15 11:09:23 INFO mapred.JobClient:     Combine output records=0
13/05/15 11:09:23 INFO mapred.JobClient:     Physical memory (bytes) snapshot=79727542272
13/05/15 11:09:23 INFO mapred.JobClient:     Reduce output records=255
13/05/15 11:09:23 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=139442335744
13/05/15 11:09:23 INFO mapred.JobClient:     Map output records=50018329
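
The `Launched map tasks=187` counter can be reproduced from the 22 file sizes listed earlier. The sketch below assumes a 64 MB HDFS block size (the post does not state it) and mirrors the way `FileInputFormat` cuts a file into splits, where a final remainder of up to 1.1 times the split size is kept as one split. The class and method names are hypothetical. Under these assumptions the total comes to 187, which matches the counter.

```java
// Estimates the number of input splits (one map task per split).
class SplitCountSketch {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // assumed 64 MB
    static final double SPLIT_SLOP = 1.1;             // tolerance for the last split

    // File sizes in bytes, copied from the "hadoop fs -ls data" listing.
    static final long[] FILE_SIZES = {
        127162942L, 501039472L, 486518821L, 509194687L, 491210093L, 492313731L,
        490753652L, 501558665L, 530751568L, 533922363L, 540347861L, 538432875L,
        552926022L, 570151613L, 600411462L, 530507013L, 626745242L, 669879113L,
        671027265L, 672068096L, 702878193L, 689413344L
    };

    // Count the splits for one file: cut full-size splits while more than
    // SPLIT_SLOP split sizes remain, then keep the rest as a final split.
    static int splitsFor(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            remaining -= splitSize;
            splits++;
        }
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    static int totalSplits() {
        int total = 0;
        for (long size : FILE_SIZES) {
            total += splitsFor(size, BLOCK_SIZE);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Estimated map tasks: " + totalSplits());
    }
}
```

The slop matters for files such as 1997.csv, which is only slightly larger than eight blocks: the small tail is folded into the eighth split instead of creating a ninth.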

# Listing the output directory
[hadoop@master bin]$ ./hadoop fs -ls dep_delay_count
Found 3 items
-rw-r--r--   1 hadoop supergroup          0 2013-05-15 11:09 /user/hadoop/dep_delay_count/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 2013-05-15 11:03 /user/hadoop/dep_delay_count/_logs
-rw-r--r--   1 hadoop supergroup       3635 2013-05-15 11:09 /user/hadoop/dep_delay_count/part-r-00000


# Viewing the output file
[hadoop@master bin]$ ./hadoop fs -cat dep_delay_count/part-r-00000 | head -10
1987,10 175568
1987,11 177218
1987,12 218858
1988,1  198610
1988,10 162211
1988,11 175123
1988,12 189137
1988,2  177939
1988,3  187141
1988,4  159216

[hadoop@master bin]$ ./hadoop fs -cat dep_delay_count/part-r-00000 | tail -10
2008,11 157278
2008,12 263949
2008,2  252765
2008,3  271969
2008,4  220864
2008,5  220614
2008,6  271014
2008,7  253632
2008,8  231349
2008,9  147061
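
The rows are not in calendar order: "1988,10" appears before "1988,2" because the keys are compared as strings, not as numbers. The sketch below assumes the keys are plain "year,month" strings and that the data covers October 1987 through December 2008, as the head and tail output suggest; the class and method names are hypothetical. It rebuilds the expected key set, which has 3 + 21 × 12 = 255 entries, the same as the `Reduce input groups=255` counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Rebuilds the expected "year,month" keys in string sort order.
class KeyOrderSketch {
    static TreeSet<String> expectedKeys() {
        TreeSet<String> keys = new TreeSet<>(); // sorts strings lexicographically
        for (int month = 10; month <= 12; month++) {
            keys.add("1987," + month);          // 1987 has only October to December
        }
        for (int year = 1988; year <= 2008; year++) {
            for (int month = 1; month <= 12; month++) {
                keys.add(year + "," + month);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>(expectedKeys());
        System.out.println("Number of keys: " + keys.size());
        System.out.println("First 10: " + keys.subList(0, 10));
    }
}
```

If calendar order is wanted, one option is to zero-pad the month in the key (for example "1988,02") so that string order and numeric order agree.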


Reference: 시작하세요! 하둡 프로그래밍 ("Get Started! Hadoop Programming")

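※ The content above was compiled from various sources and from my own subjective summary.
   If you find incorrect information or anything that needs improvement, please let me know by comment or email; it would be a great help.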
05 15, 2013 15:16

