# HA-Mode Backup Recovery and Recovery Using the Secondary NameNode


NameNode, secondary NameNode, and NFS configuration details: http://develop.sunshiny.co.kr/894

# Recovery Procedure Overview

1) Run on the NameNode (Master.NameNode): restore from the backup files and keep using the same NameNode server
   - If the NameNode writes its metadata to a [backup directory or NFS directory] in the HA-style layout, check that configuration first.
   - When dfs.name.dir contains the HA-style configuration and you attempt a secondary NameNode recovery (-importCheckpoint), the NameNode reports that backup data already exists in /data/backup.
   - To restore only the NameNode's metadata files, you can copy just the checkpoint directory defined by fs.checkpoint.dir on the secondary NameNode and proceed as in HA mode; a sketch follows the config example below.

2) Run on the secondary NameNode (Secondary.NameNode): promote the secondary NameNode server to NameNode
   - Use this recovery path when no HA-style backup is configured.
   - If the HA-style configuration is present and you run a secondary NameNode recovery (-importCheckpoint), the following message is printed:
   Message: Cannot import image from a checkpoint.  NameNode already contains an image in /data/backup
   - Remove the [backup directory or NFS directory] entry from the dfs.name.dir property in hdfs-site.xml before proceeding:

    <property>
        <name>dfs.name.dir</name>
        <value>/data/name,/data/backup</value>
    </property>
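
With the backup entry removed, the property would read, for example:

    <property>
        <name>dfs.name.dir</name>
        <value>/data/name</value>
    </property>

The checkpoint-copy approach mentioned in 1) might look like the following. This is a minimal sketch, assuming fs.checkpoint.dir on the secondary NameNode is /data/checkpoint (the path that appears in the -importCheckpoint log later in this post) and that passwordless SSH is set up between the hosts:

# Run as the hadoop user on the NameNode; dfs.name.dir is /data/name.
./stop-all.sh
mv /data/name /data/name_ORG       # keep the damaged metadata aside
mkdir /data/name
scp -r hadoop@secondary.namenode:/data/checkpoint/* /data/name/
./start-all.sh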

1) Recovery from the [backup directory or NFS directory] layout

[hadoop@master bin]$ ./stop-all.sh
stopping jobtracker
secondary.namenode: stopping tasktracker
datanode01: stopping tasktracker
stopping namenode
secondary.namenode: stopping datanode
datanode01: stopping datanode
secondary.namenode: stopping secondarynamenode
[hadoop@master bin]$

// To simulate the failure, delete or rename the directory defined in dfs.name.dir
[hadoop@master bin]$ mv /data/name /data/name_ORG

// Start the NameNode without the /data/name directory
[hadoop@master bin]$ ./hadoop namenode
13/05/14 11:15:38 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master.namenode/192.168.1.17
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.1.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782; compiled by 'hortonfo' on Thu Jan 31 02:03:24 UTC 2013
************************************************************/
13/05/14 11:15:38 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
13/05/14 11:15:38 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
13/05/14 11:15:38 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
13/05/14 11:15:38 INFO impl.MetricsSystemImpl: NameNode metrics system started
13/05/14 11:15:38 INFO impl.MetricsSourceAdapter: MBean for source ugi registered.
13/05/14 11:15:38 INFO impl.MetricsSourceAdapter: MBean for source jvm registered.
13/05/14 11:15:38 INFO impl.MetricsSourceAdapter: MBean for source NameNode registered.
13/05/14 11:15:38 INFO util.GSet: VM type       = 64-bit
13/05/14 11:15:38 INFO util.GSet: 2% max memory = 17.77875 MB
13/05/14 11:15:38 INFO util.GSet: capacity      = 2^21 = 2097152 entries
13/05/14 11:15:38 INFO util.GSet: recommended=2097152, actual=2097152
13/05/14 11:15:38 INFO namenode.FSNamesystem: fsOwner=hadoop
13/05/14 11:15:38 INFO namenode.FSNamesystem: supergroup=supergroup
13/05/14 11:15:38 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/05/14 11:15:38 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/05/14 11:15:38 WARN namenode.FSNamesystem: The dfs.support.append option is in your configuration, however append is not supported. This configuration option is no longer required to enable sync.
13/05/14 11:15:38 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/05/14 11:15:39 INFO namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean
13/05/14 11:15:39 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/05/14 11:15:39 INFO common.Storage: Storage directory /data/name does not exist.
13/05/14 11:15:39 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data/name is in an inconsistent state: storage directory does not exist or is not accessible.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:411)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:379)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:284)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:536)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1419)
13/05/14 11:15:39 ERROR namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data/name is in an inconsistent state: storage directory does not exist or is not accessible.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:411)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:379)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:284)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:536)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1419)

13/05/14 11:15:39 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master.namenode/192.168.1.17
************************************************************/

// For recovery, recreate the empty directory defined in dfs.name.dir
[hadoop@master bin]$ mkdir /data/name

# Recovery using the [backup directory or NFS directory]
If the backup lives on an NFS mount, copy it the same way as the /data/backup example below.

// Copy the backup files
[hadoop@master data]$ cp -r /data/backup/* /data/name
[hadoop@master data]$ ls -alR /data/name
/data/name:
total 20
drwxrwxr-x. 5 hadoop hadoop 4096 2013-05-14 11:29 .
drwxr-xr-x. 8 hadoop hadoop 4096 2013-05-14 11:29 ..
drwxrwxr-x. 2 hadoop hadoop 4096 2013-05-14 11:29 current
drwxrwxr-x. 2 hadoop hadoop 4096 2013-05-14 11:29 image
drwxrwxr-x. 2 hadoop hadoop 4096 2013-05-14 11:29 previous.checkpoint

/data/name/current:
total 44
drwxrwxr-x. 2 hadoop hadoop  4096 2013-05-14 11:29 .
drwxrwxr-x. 5 hadoop hadoop  4096 2013-05-14 11:29 ..
-rw-rw-r--. 1 hadoop hadoop   100 2013-05-14 11:29 VERSION
-rw-rw-r--. 1 hadoop hadoop     4 2013-05-14 11:29 edits
-rw-rw-r--. 1 hadoop hadoop 22245 2013-05-14 11:29 fsimage
-rw-rw-r--. 1 hadoop hadoop     8 2013-05-14 11:29 fstime

/data/name/image:
total 12
drwxrwxr-x. 2 hadoop hadoop 4096 2013-05-14 11:29 .
drwxrwxr-x. 5 hadoop hadoop 4096 2013-05-14 11:29 ..
-rw-rw-r--. 1 hadoop hadoop  157 2013-05-14 11:29 fsimage

/data/name/previous.checkpoint:
total 44
drwxrwxr-x. 2 hadoop hadoop  4096 2013-05-14 11:29 .
drwxrwxr-x. 5 hadoop hadoop  4096 2013-05-14 11:29 ..
-rw-rw-r--. 1 hadoop hadoop   100 2013-05-14 11:29 VERSION
-rw-rw-r--. 1 hadoop hadoop     4 2013-05-14 11:29 edits
-rw-rw-r--. 1 hadoop hadoop 21597 2013-05-14 11:29 fsimage
-rw-rw-r--. 1 hadoop hadoop     8 2013-05-14 11:29 fstime
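
Before starting the NameNode, it may be worth confirming that the restored image matches the backup; a quick sketch:

# Checksums of the image and edit log should be identical in both copies.
md5sum /data/backup/current/fsimage /data/name/current/fsimage
md5sum /data/backup/current/edits   /data/name/current/edits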


// Start the NameNode
// If HBase or similar services are still running, HBase connection-related messages will appear in the log.
[hadoop@master bin]$ ./hadoop namenode
13/05/14 11:42:30 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master.namenode/192.168.1.17
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.1.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782; compiled by 'hortonfo' on Thu Jan 31 02:03:24 UTC 2013
************************************************************/
13/05/14 11:42:31 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
13/05/14 11:42:31 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
13/05/14 11:42:31 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
13/05/14 11:42:31 INFO impl.MetricsSystemImpl: NameNode metrics system started
13/05/14 11:42:31 INFO impl.MetricsSourceAdapter: MBean for source ugi registered.
13/05/14 11:42:31 INFO impl.MetricsSourceAdapter: MBean for source jvm registered.
13/05/14 11:42:31 INFO impl.MetricsSourceAdapter: MBean for source NameNode registered.
13/05/14 11:42:31 INFO util.GSet: VM type       = 64-bit
13/05/14 11:42:31 INFO util.GSet: 2% max memory = 17.77875 MB
13/05/14 11:42:31 INFO util.GSet: capacity      = 2^21 = 2097152 entries
13/05/14 11:42:31 INFO util.GSet: recommended=2097152, actual=2097152
13/05/14 11:42:31 INFO namenode.FSNamesystem: fsOwner=hadoop
13/05/14 11:42:31 INFO namenode.FSNamesystem: supergroup=supergroup
13/05/14 11:42:31 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/05/14 11:42:31 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/05/14 11:42:31 WARN namenode.FSNamesystem: The dfs.support.append option is in your configuration, however append is not supported. This configuration option is no longer required to enable sync.
13/05/14 11:42:31 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/05/14 11:42:31 INFO namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean
13/05/14 11:42:31 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/05/14 11:42:31 INFO common.Storage: Number of files = 133
13/05/14 11:42:31 INFO common.Storage: Number of files under construction = 4
13/05/14 11:42:31 INFO common.Storage: Image file of size 22245 loaded in 0 seconds.
13/05/14 11:42:31 INFO namenode.FSEditLog: EOF of /data/name/current/edits, reached end of edit log Number of transactions found: 0.  Bytes read: 4
13/05/14 11:42:31 INFO common.Storage: Edits file /data/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
13/05/14 11:42:31 INFO common.Storage: Image file of size 22245 saved in 0 seconds.
13/05/14 11:42:31 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/data/name/current/edits
13/05/14 11:42:31 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/data/name/current/edits
13/05/14 11:42:31 INFO common.Storage: Image file of size 22245 saved in 0 seconds.
13/05/14 11:42:31 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/data/backup/current/edits
13/05/14 11:42:31 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/data/backup/current/edits
13/05/14 11:42:31 INFO namenode.NameCache: initialized with 0 entries 0 lookups
13/05/14 11:42:31 INFO namenode.FSNamesystem: Finished loading FSImage in 387 msecs
13/05/14 11:42:31 INFO namenode.FSNamesystem: dfs.safemode.threshold.pct          = 0.9990000128746033
13/05/14 11:42:31 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
13/05/14 11:42:31 INFO namenode.FSNamesystem: dfs.safemode.extension              = 30000
13/05/14 11:42:31 INFO namenode.FSNamesystem: Number of blocks excluded by safe block count: 2 total blocks: 268 and thus the safe blocks: 266
13/05/14 11:42:31 INFO hdfs.StateChange: STATE* Safe mode ON.
The reported blocks is only 0 but the threshold is 0.9990 and the total blocks 266. Safe mode will be turned off automatically.
13/05/14 11:42:31 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
13/05/14 11:42:31 INFO impl.MetricsSourceAdapter: MBean for source FSNamesystemMetrics registered.
13/05/14 11:42:31 INFO ipc.Server: Starting SocketReader
13/05/14 11:42:31 INFO impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort9000 registered.
13/05/14 11:42:31 INFO impl.MetricsSourceAdapter: MBean for source RpcActivityForPort9000 registered.
13/05/14 11:42:31 INFO namenode.NameNode: Namenode up at: master.namenode/192.168.1.17:9000
13/05/14 11:42:31 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
13/05/14 11:42:31 INFO http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
13/05/14 11:42:31 INFO http.HttpServer: dfs.webhdfs.enabled = true
13/05/14 11:42:31 INFO http.HttpServer: Added filter 'SPNEGO' (class=org.apache.hadoop.hdfs.web.AuthFilter)
13/05/14 11:42:31 INFO http.HttpServer: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
13/05/14 11:42:31 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
13/05/14 11:42:31 INFO http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
13/05/14 11:42:31 INFO http.HttpServer: Jetty bound to port 50070
13/05/14 11:42:31 INFO mortbay.log: jetty-6.1.26
13/05/14 11:42:31 WARN server.AuthenticationFilter: 'signature.secret' configuration not set, using a random value as secret
13/05/14 11:42:31 INFO mortbay.log: Started SelectChannelConnector@master.namenode:50070
13/05/14 11:42:31 INFO namenode.NameNode: Web-server up at: master.namenode:50070
13/05/14 11:42:31 INFO ipc.Server: IPC Server Responder: starting
13/05/14 11:42:31 INFO ipc.Server: IPC Server listener on 9000: starting
13/05/14 11:42:31 INFO ipc.Server: IPC Server handler 0 on 9000: starting
13/05/14 11:42:31 INFO ipc.Server: IPC Server handler 1 on 9000: starting
13/05/14 11:42:31 INFO ipc.Server: IPC Server handler 2 on 9000: starting
13/05/14 11:42:31 INFO ipc.Server: IPC Server handler 3 on 9000: starting
13/05/14 11:42:31 INFO ipc.Server: IPC Server handler 4 on 9000: starting
13/05/14 11:42:31 INFO ipc.Server: IPC Server handler 5 on 9000: starting
13/05/14 11:42:31 INFO ipc.Server: IPC Server handler 6 on 9000: starting
13/05/14 11:42:31 INFO ipc.Server: IPC Server handler 7 on 9000: starting
13/05/14 11:42:31 INFO ipc.Server: IPC Server handler 8 on 9000: starting
13/05/14 11:42:31 INFO ipc.Server: IPC Server handler 9 on 9000: starting
^C13/05/14 11:42:32 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master.namenode/192.168.1.17
************************************************************/

// Start everything (NameNode and MapReduce) and confirm a clean startup
[hadoop@master bin]$ ./start-all.sh
starting namenode, logging to /home/hadoop/hadoop-1.1.2/logs/hadoop-hadoop-namenode-master.namenode.out
secondary.namenode: starting datanode, logging to /home/hadoop/hadoop-1.1.2/logs/hadoop-hadoop-datanode-secondary.namenode.out
datanode01: starting datanode, logging to /home/hadoop/hadoop-1.1.2/logs/hadoop-hadoop-datanode-datanode01.out
secondary.namenode: starting secondarynamenode, logging to /home/hadoop/hadoop-1.1.2/logs/hadoop-hadoop-secondarynamenode-secondary.namenode.out
starting jobtracker, logging to /home/hadoop/hadoop-1.1.2/logs/hadoop-hadoop-jobtracker-master.namenode.out
secondary.namenode: starting tasktracker, logging to /home/hadoop/hadoop-1.1.2/logs/hadoop-hadoop-tasktracker-secondary.namenode.out
datanode01: starting tasktracker, logging to /home/hadoop/hadoop-1.1.2/logs/hadoop-hadoop-tasktracker-datanode01.out
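
Put together, the whole backup-directory recovery from this walkthrough might be scripted as below. A minimal sketch, assuming the paths used in this post and the Hadoop bin directory as the working directory:

#!/bin/bash
# Sketch: restore dfs.name.dir (/data/name) from the HA-style backup (/data/backup).
set -e
./stop-all.sh                      # stop all daemons
mv /data/name /data/name_ORG       # keep the damaged metadata for inspection
mkdir /data/name
cp -r /data/backup/* /data/name/   # restore from the backup (or NFS) directory
./start-all.sh                     # bring the cluster back up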


2) Recovery using the secondary NameNode (run on the secondary NameNode server)

(1) In /etc/hosts, change the IP assigned to master.namenode to the IP of the secondary NameNode server that will take over (a one-liner for this follows the example).
[root@secondary ~]# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

# Previously: 192.168.1.17    master.namenode
192.168.1.19    master.namenode
192.168.1.19    secondary.namenode
192.168.1.45    datanode01
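
The same edit as a one-liner; a sketch assuming GNU sed and the exact lines shown above (run as root):

sed -i 's/^192\.168\.1\.17\([[:space:]]\{1,\}master\.namenode\)/192.168.1.19\1/' /etc/hosts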

(2) For recovery, create an empty directory at the path defined in dfs.name.dir
[hadoop@secondary bin]$ mkdir /data/name

(3) Start the NameNode with the -importCheckpoint option
[hadoop@secondary bin]$ ./hadoop namenode -importCheckpoint
13/05/14 14:34:24 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = secondary.namenode/192.168.1.19
STARTUP_MSG:   args = [-importCheckpoint]
STARTUP_MSG:   version = 1.1.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1440782; compiled by 'hortonfo' on Thu Jan 31 02:03:24 UTC 2013
************************************************************/
13/05/14 14:34:24 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
13/05/14 14:34:24 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
13/05/14 14:34:24 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
13/05/14 14:34:24 INFO impl.MetricsSystemImpl: NameNode metrics system started
13/05/14 14:34:24 INFO impl.MetricsSourceAdapter: MBean for source ugi registered.
13/05/14 14:34:24 INFO impl.MetricsSourceAdapter: MBean for source jvm registered.
13/05/14 14:34:24 INFO impl.MetricsSourceAdapter: MBean for source NameNode registered.
13/05/14 14:34:24 INFO util.GSet: VM type       = 32-bit
13/05/14 14:34:24 INFO util.GSet: 2% max memory = 17.77875 MB
13/05/14 14:34:24 INFO util.GSet: capacity      = 2^22 = 4194304 entries
13/05/14 14:34:24 INFO util.GSet: recommended=4194304, actual=4194304
13/05/14 14:34:24 INFO namenode.FSNamesystem: fsOwner=hadoop
13/05/14 14:34:24 INFO namenode.FSNamesystem: supergroup=supergroup
13/05/14 14:34:24 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/05/14 14:34:24 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/05/14 14:34:24 WARN namenode.FSNamesystem: The dfs.support.append option is in your configuration, however append is not supported. This configuration option is no longer required to enable sync.
13/05/14 14:34:24 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/05/14 14:34:24 INFO namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean
13/05/14 14:34:24 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/05/14 14:34:24 INFO common.Storage: Storage directory /data/name is not formatted.
13/05/14 14:34:24 INFO common.Storage: Formatting ...
13/05/14 14:34:24 INFO common.Storage: Number of files = 133
13/05/14 14:34:24 INFO common.Storage: Number of files under construction = 4
13/05/14 14:34:24 INFO common.Storage: Image file of size 22245 loaded in 0 seconds.
13/05/14 14:34:24 INFO namenode.FSEditLog: EOF of /data/checkpoint/current/edits, reached end of edit log Number of transactions found: 0.  Bytes read: 4
13/05/14 14:34:24 INFO common.Storage: Edits file /data/checkpoint/current/edits of size 4 edits # 0 loaded in 0 seconds.
13/05/14 14:34:24 INFO common.Storage: Image file of size 22245 saved in 0 seconds.
13/05/14 14:34:24 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/data/name/current/edits
13/05/14 14:34:24 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/data/name/current/edits
13/05/14 14:34:24 INFO namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
13/05/14 14:34:24 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/data/name/current/edits
13/05/14 14:34:24 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/data/name/current/edits
13/05/14 14:34:24 INFO common.Storage: Image file of size 22245 saved in 0 seconds.
13/05/14 14:34:24 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/data/name/current/edits
13/05/14 14:34:24 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/data/name/current/edits
13/05/14 14:34:24 INFO namenode.NameCache: initialized with 0 entries 0 lookups
13/05/14 14:34:24 INFO namenode.FSNamesystem: Finished loading FSImage in 362 msecs
13/05/14 14:34:24 INFO namenode.FSNamesystem: dfs.safemode.threshold.pct          = 0.9990000128746033
13/05/14 14:34:24 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
13/05/14 14:34:24 INFO namenode.FSNamesystem: dfs.safemode.extension              = 30000
13/05/14 14:34:24 INFO namenode.FSNamesystem: Number of blocks excluded by safe block count: 2 total blocks: 268 and thus the safe blocks: 266
13/05/14 14:34:24 INFO hdfs.StateChange: STATE* Safe mode ON.
The reported blocks is only 0 but the threshold is 0.9990 and the total blocks 266. Safe mode will be turned off automatically.
13/05/14 14:34:24 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
13/05/14 14:34:24 INFO impl.MetricsSourceAdapter: MBean for source FSNamesystemMetrics registered.
13/05/14 14:34:24 INFO ipc.Server: Starting SocketReader
13/05/14 14:34:24 INFO impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort9000 registered.
13/05/14 14:34:24 INFO impl.MetricsSourceAdapter: MBean for source RpcActivityForPort9000 registered.
13/05/14 14:34:24 INFO namenode.NameNode: Namenode up at: master.namenode/192.168.1.19:9000
13/05/14 14:34:25 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
13/05/14 14:34:25 INFO http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
13/05/14 14:34:25 INFO http.HttpServer: dfs.webhdfs.enabled = true
13/05/14 14:34:25 INFO http.HttpServer: Added filter 'SPNEGO' (class=org.apache.hadoop.hdfs.web.AuthFilter)
13/05/14 14:34:25 INFO http.HttpServer: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
13/05/14 14:34:25 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
13/05/14 14:34:25 INFO http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
13/05/14 14:34:25 INFO http.HttpServer: Jetty bound to port 50070
13/05/14 14:34:25 INFO mortbay.log: jetty-6.1.26
13/05/14 14:34:25 WARN server.AuthenticationFilter: 'signature.secret' configuration not set, using a random value as secret
13/05/14 14:34:25 INFO mortbay.log: Started SelectChannelConnector@master.namenode:50070
13/05/14 14:34:25 INFO namenode.NameNode: Web-server up at: master.namenode:50070
13/05/14 14:34:25 INFO ipc.Server: IPC Server Responder: starting
13/05/14 14:34:25 INFO ipc.Server: IPC Server listener on 9000: starting
13/05/14 14:34:25 INFO ipc.Server: IPC Server handler 0 on 9000: starting
13/05/14 14:34:25 INFO ipc.Server: IPC Server handler 1 on 9000: starting
13/05/14 14:34:25 INFO ipc.Server: IPC Server handler 2 on 9000: starting
13/05/14 14:34:25 INFO ipc.Server: IPC Server handler 3 on 9000: starting
13/05/14 14:34:25 INFO ipc.Server: IPC Server handler 5 on 9000: starting
13/05/14 14:34:25 INFO ipc.Server: IPC Server handler 4 on 9000: starting
13/05/14 14:34:25 INFO ipc.Server: IPC Server handler 6 on 9000: starting
13/05/14 14:34:25 INFO ipc.Server: IPC Server handler 7 on 9000: starting
13/05/14 14:34:25 INFO ipc.Server: IPC Server handler 9 on 9000: starting
13/05/14 14:34:25 INFO ipc.Server: IPC Server handler 8 on 9000: starting

^C13/05/14 14:34:45 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at secondary.namenode/192.168.1.19
************************************************************/

(4) Before starting the full cluster, SSH authentication keys must be distributed to the DataNodes; see the sketch below.
http://develop.sunshiny.co.kr/868
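
A minimal sketch of the key distribution, assuming the hadoop user and the hostnames used in this post:

# Run as the hadoop user on the new NameNode (secondary.namenode).
ssh-keygen -t rsa                       # accept the defaults; empty passphrase
ssh-copy-id hadoop@datanode01           # repeat for every DataNode
ssh-copy-id hadoop@secondary.namenode   # the start scripts also ssh to the local host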

// Start the full cluster and confirm a clean startup
[hadoop@secondary bin]$ ./start-all.sh
starting namenode, logging to /home/hadoop/hadoop-1.1.2/logs/hadoop-hadoop-namenode-master.namenode.out
secondary.namenode: starting datanode, logging to /home/hadoop/hadoop-1.1.2/logs/hadoop-hadoop-datanode-secondary.namenode.out
datanode01: starting datanode, logging to /home/hadoop/hadoop-1.1.2/logs/hadoop-hadoop-datanode-datanode01.out
secondary.namenode: starting secondarynamenode, logging to /home/hadoop/hadoop-1.1.2/logs/hadoop-hadoop-secondarynamenode-secondary.namenode.out
starting jobtracker, logging to /home/hadoop/hadoop-1.1.2/logs/hadoop-hadoop-jobtracker-master.namenode.out
secondary.namenode: starting tasktracker, logging to /home/hadoop/hadoop-1.1.2/logs/hadoop-hadoop-tasktracker-secondary.namenode.out
datanode01: starting tasktracker, logging to /home/hadoop/hadoop-1.1.2/logs/hadoop-hadoop-tasktracker-datanode01.out
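
Once the daemons are up, the recovery can be sanity-checked with the standard Hadoop 1.x tools; a short sketch:

./hadoop dfsadmin -report           # every DataNode should report in
./hadoop dfsadmin -safemode get     # safe mode should turn off once blocks are reported
./hadoop fsck /                     # the file system should come back healthy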


# NameNode Recovery Procedure

- The HA-mode recovery above covers restoring the backup files and keeping the same NameNode server. The steps below cover replacing the NameNode server.

1) Disconnect the old NameNode server from the network.
2) Assign the old NameNode's IP to the backup server (or the secondary NameNode server).
   If the DataNodes reach the NameNode by hostname, change the hostname as well.
3) Install the NameNode software on the backup server. Skip this step when reusing the secondary NameNode server.
4) Set dfs.name.dir to the NFS-mounted directory.
5) Start the NameNode daemon. Never format the NameNode at this point.
6) Verify that the NameNode started normally. (A sketch of steps 4-6 follows.)
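
Steps 4)-6) as a sketch; the NFS export nfs.server:/export/name is hypothetical, and hdfs-site.xml is assumed to already list the mounted directory in dfs.name.dir:

# On the replacement NameNode server (mount as root).
mount -t nfs nfs.server:/export/name /data/backup   # hypothetical NFS export
# Never run "hadoop namenode -format" during this recovery.
./hadoop-daemon.sh start namenode
./hadoop dfsadmin -report                           # confirm the NameNode is up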


# DataNode Recovery Procedure

1) Stop the affected DataNode.
2) Unmount the damaged disk and mount a new one.
3) On the new disk, create the directory configured in dfs.data.dir.
4) Restart the DataNode.
5) Run the balancer on the NameNode server.
6) Confirm that the DataNode shows up in the HDFS web UI. (A sketch follows.)
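
The same steps as a sketch; the device /dev/sdb1, the mount point /data, and the dfs.data.dir path /data/dfs are all hypothetical:

# On the affected DataNode.
./hadoop-daemon.sh stop datanode
umount /data                        # unmount the damaged disk
mount /dev/sdb1 /data               # mount the replacement disk
mkdir -p /data/dfs                  # recreate the dfs.data.dir path
chown hadoop:hadoop /data/dfs
./hadoop-daemon.sh start datanode
# On the NameNode server: spread blocks back across the DataNodes.
./start-balancer.sh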



Reference: 시작하세요! 하둡 프로그래밍 (Beginning Hadoop Programming)

※ The notes above are my own summary, based on various sources.
   If you find anything incorrect or missing, please let me know by comment or email.
May 14, 2013 15:10

