# Downloading the Oracle JDBC Driver

> Download the JDBC driver that matches your Oracle version
 - 10g (ojdbc14.jar), 11g (ojdbc6.jar)

http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html

> Copy the JDBC driver into the $SQOOP_HOME/lib directory
[hadoop@master lib]$ pwd
/home/hadoop/sqoop/lib
[hadoop@master lib]$ ls
ojdbc14.jar          ojdbc6.jar
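
Assuming the jar was downloaded into the current working directory, staging it is a single copy (paths as shown above):
[hadoop@master ~]$ cp ojdbc6.jar /home/hadoop/sqoop/lib/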

# JDBC Test

> List database users (Oracle schemas)
[hadoop@master sqoop]$ sqoop list-databases --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g --username scott -P
Enter password:
13/07/29 14:19:44 INFO manager.SqlManager: Using default fetchSize of 1000
13/07/29 14:19:44 INFO manager.OracleManager: Time zone has been set to GMT
SCOTT
.
.
.

> Query a database table (WHERE 1=0 returns no rows, only the column headers, so this simply verifies connectivity and table structure)
sqoop eval \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --query "SELECT t.* FROM DEPT t WHERE 1=0"


# Importing RDBMS (Oracle) Data into HDFS

> Import only specific columns of a table (the trailing -- notes are explanatory annotations, not part of the command; a copy-ready form follows below)
sqoop import -D oracle.sessionTimeZone=America/Los_Angeles \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --table EMP \ -- DBMS table name
    --target-dir /user/hive/warehouse/fsc_new/EMP \ -- Destination directory on HDFS
    --columns SEQ,ENAME,DEPTNO \  -- Comma-separated, with no spaces around the commas or column names
    --split-by SEQ    -- Primary key column used to split the work
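
For reference, the same command with the annotations stripped so it can be pasted as-is (host, credentials, and columns are the ones used in this example):
sqoop import -D oracle.sessionTimeZone=America/Los_Angeles \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --table EMP \
    --target-dir /user/hive/warehouse/fsc_new/EMP \
    --columns SEQ,ENAME,DEPTNO \
    --split-by SEQ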

> Import all columns of a table using a free-form query ($CONDITIONS is required; Sqoop replaces it with each mapper's split predicate)
sqoop import --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --query "SELECT t.* FROM EMP t WHERE \$CONDITIONS" \
    --split-by t.SEQ \
    --target-dir /user/hive/warehouse/target_dir
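
Once the job finishes, the result can be checked directly on HDFS; a quick verification sketch (the part-m-00000 file name assumes the default text output of the first mapper):
hadoop fs -ls /user/hive/warehouse/target_dir
hadoop fs -cat /user/hive/warehouse/target_dir/part-m-00000 | head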


# Importing RDBMS (Oracle) Data into Hive

> Basic Hive import (the trailing -- notes are explanatory annotations; a copy-ready form follows below)
sqoop import \
    --connect jdbc:oracle:thin:@//192.168.1.19:1521/ora11g \
    --username SCOTT \   -- Account name in uppercase
    --password tiger \
    --table U102 \  -- DBMS table name
    --hive-import \  -- Enables the Hive import
    -m 1  -- Number of parallel map tasks
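
The same command with the annotations removed, followed by a simple row-count check from the Hive CLI (Hive stores the table name in lowercase):
sqoop import \
    --connect jdbc:oracle:thin:@//192.168.1.19:1521/ora11g \
    --username SCOTT \
    --password tiger \
    --table U102 \
    --hive-import \
    -m 1

hive -e "SELECT COUNT(*) FROM u102;"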

# Importing All Tables of an RDBMS (Oracle) Account into Hive

> Import every table in the account (the trailing -- notes are explanatory annotations; a copy-ready form follows below)
sqoop import-all-tables \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username SCOTT \
    --password tiger \
    --hive-import \
    --warehouse-dir /user/hive/warehouse \
    --fields-terminated-by ',' \ -- Field (column) delimiter
    --lines-terminated-by '\n' \ -- Line (row) delimiter
    --direct \
    -m 1 -- Import sequentially with a single mapper (useful when a primary key cannot be specified for every table)
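
The annotation-free form of the same command:
sqoop import-all-tables \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username SCOTT \
    --password tiger \
    --hive-import \
    --warehouse-dir /user/hive/warehouse \
    --fields-terminated-by ',' \
    --lines-terminated-by '\n' \
    --direct \
    -m 1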

# Notes on Sqoop Runtime Errors


# Sqoop Command Reference

[hadoop@master bin]$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

# Sqoop Option Reference

Table 1. Common arguments

| Argument | Description |
| --- | --- |
| --connect <jdbc-uri> | Specify JDBC connect string |
| --connection-manager <class-name> | Specify connection manager class to use |
| --driver <class-name> | Manually specify JDBC driver class to use |
| --hadoop-home <dir> | Override $HADOOP_HOME |
| --help | Print usage instructions |
| -P | Read password from console |
| --password <password> | Set authentication password |
| --username <username> | Set authentication username |
| --verbose | Print more information while working |
| --connection-param-file <filename> | Optional properties file that provides connection parameters |

Table 2. Import control arguments:

| Argument | Description |
| --- | --- |
| --append | Append data to an existing dataset in HDFS |
| --as-avrodatafile | Imports data to Avro Data Files |
| --as-sequencefile | Imports data to SequenceFiles |
| --as-textfile | Imports data as plain text (default) |
| --boundary-query <statement> | Boundary query to use for creating splits |
| --columns <col,col,col…> | Columns to import from table |
| --direct | Use direct import fast path |
| --direct-split-size <n> | Split the input stream every n bytes when importing in direct mode |
| --inline-lob-limit <n> | Set the maximum size for an inline LOB |
| -m,--num-mappers <n> | Use n map tasks to import in parallel |
| -e,--query <statement> | Import the results of statement |
| --split-by <column-name> | Column of the table used to split work units |
| --table <table-name> | Table to read |
| --target-dir <dir> | HDFS destination dir |
| --warehouse-dir <dir> | HDFS parent for table destination |
| --where <where clause> | WHERE clause to use during import |
| -z,--compress | Enable compression |
| --compression-codec <c> | Use Hadoop codec (default gzip) |
| --null-string <null-string> | The string to be written for a null value for string columns |
| --null-non-string <null-string> | The string to be written for a null value for non-string columns |
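
As an illustration of combining a few of these options, a hypothetical filtered and compressed import of the EMP table used earlier (the WHERE clause and target directory are made up for this sketch):
sqoop import \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --table EMP \
    --where "DEPTNO = 10" \
    --split-by SEQ \
    --target-dir /user/hive/warehouse/emp_dept10 \
    -z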

Table 3. Parameters for overriding mapping

| Argument | Description |
| --- | --- |
| --map-column-java <mapping> | Override mapping from SQL to Java type for configured columns |
| --map-column-hive <mapping> | Override mapping from SQL to Hive type for configured columns |

Table 4. Incremental import arguments:

| Argument | Description |
| --- | --- |
| --check-column (col) | Specifies the column to be examined when determining which rows to import |
| --incremental (mode) | Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified |
| --last-value (value) | Specifies the maximum value of the check column from the previous import |
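
A sketch of how these three options combine for an append-mode incremental import, reusing the EMP example from above (--last-value 0 is only for the initial run; later runs would pass the highest SEQ already imported):
sqoop import \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --table EMP \
    --target-dir /user/hive/warehouse/fsc_new/EMP \
    --check-column SEQ \
    --incremental append \
    --last-value 0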

Table 5. Output line formatting arguments:

| Argument | Description |
| --- | --- |
| --enclosed-by <char> | Sets a required field enclosing character |
| --escaped-by <char> | Sets the escape character |
| --fields-terminated-by <char> | Sets the field separator character |
| --lines-terminated-by <char> | Sets the end-of-line character |
| --mysql-delimiters | Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' |
| --optionally-enclosed-by <char> | Sets a field enclosing character |

Table 6. Input parsing arguments:

| Argument | Description |
| --- | --- |
| --input-enclosed-by <char> | Sets a required field encloser |
| --input-escaped-by <char> | Sets the input escape character |
| --input-fields-terminated-by <char> | Sets the input field separator |
| --input-lines-terminated-by <char> | Sets the input end-of-line character |
| --input-optionally-enclosed-by <char> | Sets a field enclosing character |

Table 7. Hive arguments:

| Argument | Description |
| --- | --- |
| --hive-home <dir> | Override $HIVE_HOME |
| --hive-import | Import tables into Hive (uses Hive's default delimiters if none are set) |
| --hive-overwrite | Overwrite existing data in the Hive table |
| --create-hive-table | If set, the job fails if the target Hive table already exists. By default this property is false |
| --hive-table <table-name> | Sets the table name to use when importing into Hive |
| --hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive |
| --hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive |
| --hive-partition-key | Name of the Hive field on which the imported data is partitioned |
| --hive-partition-value <v> | String value that serves as the partition key for the data imported into Hive in this job |
| --map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns |
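
As an example of these options, the earlier Hive import could name and overwrite its target table explicitly (the Hive table name emp_hive is hypothetical):
sqoop import \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --table EMP \
    --hive-import \
    --hive-table emp_hive \
    --hive-overwrite \
    --hive-drop-import-delims \
    -m 1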

Table 8. HBase arguments:

| Argument | Description |
| --- | --- |
| --column-family <family> | Sets the target column family for the import |
| --hbase-create-table | If specified, create missing HBase tables |
| --hbase-row-key <col> | Specifies which input column to use as the row key |
| --hbase-table <table-name> | Specifies an HBase table to use as the target instead of HDFS |
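
A minimal sketch of importing into HBase instead of HDFS, assuming a target HBase table emp_hbase with column family cf (both names are hypothetical):
sqoop import \
    --connect jdbc:oracle:thin:@//192.168.1.190:1521/ora10g \
    --username scott \
    --password tiger \
    --table EMP \
    --hbase-table emp_hbase \
    --column-family cf \
    --hbase-row-key SEQ \
    --hbase-create-table \
    -m 1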

Table 9. Code generation arguments:

| Argument | Description |
| --- | --- |
| --bindir <dir> | Output directory for compiled objects |
| --class-name <name> | Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class |
| --jar-file <file> | Disable code generation; use specified jar |
| --outdir <dir> | Output directory for generated code |
| --package-name <name> | Put auto-generated classes in this package |
| --map-column-java <m> | Override default mapping from SQL type to Java type for configured columns |

Table 10. Common arguments

| Argument | Description |
| --- | --- |
| --connect <jdbc-uri> | Specify JDBC connect string |
| --connection-manager <class-name> | Specify connection manager class to use |
| --driver <class-name> | Manually specify JDBC driver class to use |
| --hadoop-home <dir> | Override $HADOOP_HOME |
| --help | Print usage instructions |
| -P | Read password from console |
| --password <password> | Set authentication password |
| --username <username> | Set authentication username |
| --verbose | Print more information while working |
| --connection-param-file <filename> | Optional properties file that provides connection parameters |

Table 11. Import control arguments:

| Argument | Description |
| --- | --- |
| --as-avrodatafile | Imports data to Avro Data Files |
| --as-sequencefile | Imports data to SequenceFiles |
| --as-textfile | Imports data as plain text (default) |
| --direct | Use direct import fast path |
| --direct-split-size <n> | Split the input stream every n bytes when importing in direct mode |
| --inline-lob-limit <n> | Set the maximum size for an inline LOB |
| -m,--num-mappers <n> | Use n map tasks to import in parallel |
| --warehouse-dir <dir> | HDFS parent for table destination |
| -z,--compress | Enable compression |
| --compression-codec <c> | Use Hadoop codec (default gzip) |

Table 12. Output line formatting arguments:

| Argument | Description |
| --- | --- |
| --enclosed-by <char> | Sets a required field enclosing character |
| --escaped-by <char> | Sets the escape character |
| --fields-terminated-by <char> | Sets the field separator character |
| --lines-terminated-by <char> | Sets the end-of-line character |
| --mysql-delimiters | Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' |
| --optionally-enclosed-by <char> | Sets a field enclosing character |

Table 13. Input parsing arguments:

| Argument | Description |
| --- | --- |
| --input-enclosed-by <char> | Sets a required field encloser |
| --input-escaped-by <char> | Sets the input escape character |
| --input-fields-terminated-by <char> | Sets the input field separator |
| --input-lines-terminated-by <char> | Sets the input end-of-line character |
| --input-optionally-enclosed-by <char> | Sets a field enclosing character |

Table 14. Hive arguments:

| Argument | Description |
| --- | --- |
| --hive-home <dir> | Override $HIVE_HOME |
| --hive-import | Import tables into Hive (uses Hive's default delimiters if none are set) |
| --hive-overwrite | Overwrite existing data in the Hive table |
| --create-hive-table | If set, the job fails if the target Hive table already exists. By default this property is false |
| --hive-table <table-name> | Sets the table name to use when importing into Hive |
| --hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive |
| --hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive |
| --hive-partition-key | Name of the Hive field on which the imported data is partitioned |
| --hive-partition-value <v> | String value that serves as the partition key for the data imported into Hive in this job |
| --map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns |

Table 15. Code generation arguments:

| Argument | Description |
| --- | --- |
| --bindir <dir> | Output directory for compiled objects |
| --jar-file <file> | Disable code generation; use specified jar |
| --outdir <dir> | Output directory for generated code |
| --package-name <name> | Put auto-generated classes in this package |


# References


# Sqoop - User Guide (v1.4.3)

 http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html

# Sqoop - (MySQL, Oracle) to (HDFS, Hive, HBase) Import

http://ankitasblogger.blogspot.kr/2012/01/sqoop-export-and-import-commands.html

# Sqoop - Ingest data from database into Hadoop with Sqoop

http://pyfunc.blogspot.kr/2012/06/ingest-data-from-database-into-hdfs-for.html

# Sqoop - Using Jobs (Incremental Data Import)
http://pyfunc.blogspot.kr/2012/06/ingest-data-from-database-into-hdfs-for_07.html#more


※ The content above was compiled from several sources together with my own notes.
   If you find incorrect information or anything that needs improvement, please let me know by comment or email.
July 29, 2013 16:03

