# Downloading the Oracle JDBC Driver

> Download the JDBC driver that matches your Oracle version
 - 10g (ojdbc14.jar), 11g (ojdbc6.jar)
   
http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html

> Copy the JDBC driver into the $SQOOP_HOME/lib directory
[hadoop@master lib]$ pwd
/home/hadoop/sqoop/lib
[hadoop@master lib]$ ls
ojdbc14.jar          ojdbc6.jar
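
For reference, a minimal sketch of putting the downloaded driver in place (assuming the jar was downloaded to the hadoop user's home directory and $SQOOP_HOME points at /home/hadoop/sqoop as above):

cp ~/ojdbc6.jar $SQOOP_HOME/lib/
ls $SQOOP_HOME/lib/ojdbc*.jar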

# JDBC Test

> List database users (schemas)
[hadoop@master sqoop]$ sqoop list-databases --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g --username scott -P
Enter password:
13/07/29 14:19:44 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/29 14:19:44 INFO manager.OracleManager: Time zone has been set to GMT
SCOTT
.
.
.
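
The same connection string can also be used to list the tables visible to the scott account, for example:

sqoop list-tables \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott -P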

> Check a database table (the WHERE 1=0 clause returns only the column headers, no rows)
sqoop eval \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --query "SELECT t.* FROM DEPT t WHERE 1=0"
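
sqoop eval can run arbitrary statements as well, so a quick row count before importing looks like this (same DEPT table as above):

sqoop eval \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --query "SELECT COUNT(*) FROM DEPT"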


# Importing RDBMS (Oracle) Data into HDFS

> Import only specific columns of a table
sqoop import -D oracle.sessionTimeZone=America/Los_Angeles \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --table EMP \
    --target-dir /user/hive/warehouse/fsc_new/EMP \
    --columns SEQ,ENAME,DEPTNO \
    --split-by SEQ

 - --table: source DBMS table name
 - --target-dir: directory on HDFS where the data will be stored
 - --columns: comma-separated, with no spaces between commas and column names
 - --split-by: primary key (split) column
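
After the job finishes, the result can be checked with ordinary HDFS commands; a sketch (the part-m-* file names are what Sqoop map tasks normally produce in the target directory):

hadoop fs -ls /user/hive/warehouse/fsc_new/EMP
hadoop fs -cat /user/hive/warehouse/fsc_new/EMP/part-m-00000 | head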

> Import all columns of a table (free-form query)
sqoop import --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --query "SELECT t.* FROM EMP t WHERE \$CONDITIONS" \
    --split-by t.SEQ \
    --target-dir /user/hive/warehouse/target_dir
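
When no suitable split column exists (or the result set is small), --split-by can be dropped and a single mapper used instead; a variant of the same free-form query import:

sqoop import --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --query "SELECT t.* FROM EMP t WHERE \$CONDITIONS" \
    --target-dir /user/hive/warehouse/target_dir \
    -m 1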


# Importing RDBMS (Oracle) Data into Hive

sqoop import \
    --connect jdbc:oracle:thin:@//192.168.1.19:1521/ora11g \
    --username SCOTT \
    --password tiger \
    --table U102 \
    --hive-import \
    -m 1

 - --username: account name in uppercase
 - --table: source DBMS table name
 - --hive-import: import into Hive
 - -m: number of parallel map tasks
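
By default the Hive table is created with the same name as the source table; --hive-table and --hive-overwrite (see Table 7 below) control the target name and replace existing data. A sketch with a hypothetical target table name:

sqoop import \
    --connect jdbc:oracle:thin:@//192.168.1.19:1521/ora11g \
    --username SCOTT \
    --password tiger \
    --table U102 \
    --hive-import \
    --hive-table u102_imported \
    --hive-overwrite \
    -m 1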

# Importing All Tables of an RDBMS (Oracle) Account into Hive

sqoop import-all-tables \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username SCOTT \
    --password tiger \
    --hive-import \
    --warehouse-dir /user/hive/warehouse \
    --fields-terminated-by ',' \
    --lines-terminated-by '\n' \
    --direct \
    -m 1

 - --fields-terminated-by: column delimiter
 - --lines-terminated-by: row delimiter
 - -m 1: import the tables sequentially (useful when a primary key cannot be specified for every table)
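
Once the job finishes, the imported tables can be checked from the Hive CLI, for example (Hive lowercases table names, so EMP from the scott schema shows up as emp, assuming it was among the imported tables):

hive -e "SHOW TABLES;"
hive -e "SELECT * FROM emp LIMIT 10;"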

# Notes on Sqoop Runtime Errors


# Sqoop Command Reference

[hadoop@master bin]$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
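
For the reverse direction (HDFS → Oracle), the export command listed above takes a similar set of options. A minimal sketch, assuming a target table EMP_EXPORT already exists in the scott schema with columns matching the files under the export directory:

sqoop export \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --table EMP_EXPORT \
    --export-dir /user/hive/warehouse/fsc_new/EMP \
    --input-fields-terminated-by ',' \
    -m 1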

# Sqoop Option Reference

Table 1. Common arguments

| Argument | Description |
| --- | --- |
| --connect <jdbc-uri> | Specify JDBC connect string |
| --connection-manager <class-name> | Specify connection manager class to use |
| --driver <class-name> | Manually specify JDBC driver class to use |
| --hadoop-home <dir> | Override $HADOOP_HOME |
| --help | Print usage instructions |
| -P | Read password from console |
| --password <password> | Set authentication password |
| --username <username> | Set authentication username |
| --verbose | Print more information while working |
| --connection-param-file <filename> | Optional properties file that provides connection parameters |

Table 2. Import control arguments:

| Argument | Description |
| --- | --- |
| --append | Append data to an existing dataset in HDFS |
| --as-avrodatafile | Imports data to Avro Data Files |
| --as-sequencefile | Imports data to SequenceFiles |
| --as-textfile | Imports data as plain text (default) |
| --boundary-query <statement> | Boundary query to use for creating splits |
| --columns <col,col,col…> | Columns to import from table |
| --direct | Use direct import fast path |
| --direct-split-size <n> | Split the input stream every n bytes when importing in direct mode |
| --inline-lob-limit <n> | Set the maximum size for an inline LOB |
| -m,--num-mappers <n> | Use n map tasks to import in parallel |
| -e,--query <statement> | Import the results of statement |
| --split-by <column-name> | Column of the table used to split work units |
| --table <table-name> | Table to read |
| --target-dir <dir> | HDFS destination dir |
| --warehouse-dir <dir> | HDFS parent for table destination |
| --where <where clause> | WHERE clause to use during import |
| -z,--compress | Enable compression |
| --compression-codec <c> | Use Hadoop codec (default gzip) |
| --null-string <null-string> | The string to be written for a null value for string columns |
| --null-non-string <null-string> | The string to be written for a null value for non-string columns |

Table 3. Parameters for overriding mapping

| Argument | Description |
| --- | --- |
| --map-column-java <mapping> | Override mapping from SQL to Java type for configured columns |
| --map-column-hive <mapping> | Override mapping from SQL to Hive type for configured columns |
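
A sketch of how these overrides are written; the column/type choice here is only an illustration using the EMP table from the earlier examples:

sqoop import \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --table EMP \
    --hive-import \
    --map-column-hive SEQ=INT \
    -m 1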

Table 4. Incremental import arguments:

| Argument | Description |
| --- | --- |
| --check-column (col) | Specifies the column to be examined when determining which rows to import |
| --incremental (mode) | Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified |
| --last-value (value) | Specifies the maximum value of the check column from the previous import |
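
A sketch of how these arguments combine for an append-mode incremental import of the EMP table used earlier (the --last-value shown is only an illustrative starting point; Sqoop reports the value to use for the next run at the end of the job):

sqoop import \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --table EMP \
    --target-dir /user/hive/warehouse/fsc_new/EMP \
    --incremental append \
    --check-column SEQ \
    --last-value 1000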

Table 5. Output line formatting arguments:

| Argument | Description |
| --- | --- |
| --enclosed-by <char> | Sets a required field enclosing character |
| --escaped-by <char> | Sets the escape character |
| --fields-terminated-by <char> | Sets the field separator character |
| --lines-terminated-by <char> | Sets the end-of-line character |
| --mysql-delimiters | Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' |
| --optionally-enclosed-by <char> | Sets a field enclosing character |

Table 6. Input parsing arguments:

| Argument | Description |
| --- | --- |
| --input-enclosed-by <char> | Sets a required field encloser |
| --input-escaped-by <char> | Sets the input escape character |
| --input-fields-terminated-by <char> | Sets the input field separator |
| --input-lines-terminated-by <char> | Sets the input end-of-line character |
| --input-optionally-enclosed-by <char> | Sets a field enclosing character |

Table 7. Hive arguments:

| Argument | Description |
| --- | --- |
| --hive-home <dir> | Override $HIVE_HOME |
| --hive-import | Import tables into Hive (uses Hive’s default delimiters if none are set) |
| --hive-overwrite | Overwrite existing data in the Hive table |
| --create-hive-table | If set, the job fails if the target Hive table already exists. By default this property is false |
| --hive-table <table-name> | Sets the table name to use when importing to Hive |
| --hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive |
| --hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive |
| --hive-partition-key | Name of the Hive field the partitions are sharded on |
| --hive-partition-value <v> | String value that serves as the partition key for the data imported into Hive in this job |
| --map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns |

Table 8. HBase arguments:

| Argument | Description |
| --- | --- |
| --column-family <family> | Sets the target column family for the import |
| --hbase-create-table | If specified, create missing HBase tables |
| --hbase-row-key <col> | Specifies which input column to use as the row key |
| --hbase-table <table-name> | Specifies an HBase table to use as the target instead of HDFS |
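
A sketch of importing the same EMP table into HBase instead of HDFS using these arguments (the HBase table and column-family names are illustrative):

sqoop import \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --table EMP \
    --hbase-table EMP \
    --column-family cf \
    --hbase-row-key SEQ \
    --hbase-create-table \
    -m 1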

Table 9. Code generation arguments:

| Argument | Description |
| --- | --- |
| --bindir <dir> | Output directory for compiled objects |
| --class-name <name> | Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class |
| --jar-file <file> | Disable code generation; use specified jar |
| --outdir <dir> | Output directory for generated code |
| --package-name <name> | Put auto-generated classes in this package |
| --map-column-java <m> | Override default mapping from SQL type to Java type for configured columns |



# References


# Sqoop - User Guide (v1.4.3)

 http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html

# Sqoop - (Mysql, Oracle) to (HDFS, Hive, HBase) Import

http://ankitasblogger.blogspot.kr/2012/01/sqoop-export-and-import-commands.html

# Sqoop - Ingest data from database into Hadoop with Sqoop
http://pyfunc.blogspot.kr/2012/06/ingest-data-from-database-into-hdfs-for.html

# Sqoop - Using Jobs (Incremental Data Import)
http://pyfunc.blogspot.kr/2012/06/ingest-data-from-database-into-hdfs-for_07.html#more


※ The content above is a summary I put together from various references and my own notes.
   If you find any incorrect information or anything that needs supplementing, a comment or an email would be much appreciated.

