Re: How to use DBInputFormat?

Amandeep Khurana Tue, 03 Feb 2009 19:25:36 -0800

Thanks Kevin

I couldnt get it work. Here's the error I get:


bin/hadoop jar ~/dbload.jar LoadTable1
09/02/03 19:21:17 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
09/02/03 19:21:20 INFO mapred.JobClient: Running job: job_local_0001
09/02/03 19:21:21 INFO mapred.JobClient:  map 0% reduce 0%
09/02/03 19:21:22 INFO mapred.MapTask: numReduceTasks: 0
09/02/03 19:21:24 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: ORA-00933: SQL command not properly ended

        at
org.apache.hadoop.mapred.lib.db.DBInputFormat.getRecordReader(DBInputFormat.java:289)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:321)
        at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
        at LoadTable1.run(LoadTable1.java:130)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at LoadTable1.main(LoadTable1.java:107)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

Exception closing file
/user/amkhuran/contract_table/_temporary/_attempt_local_0001_m_000000_0/part-00000
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
        at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:65)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3084)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3053)
        at
org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:942)
        at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:210)
        at
org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:243)
        at
org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1413)
        at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:236)
        at
org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:221)


Here's my code:

public class LoadTable1 extends Configured implements Tool  {

      // data destination on hdfs
      private static final String CONTRACT_OUTPUT_PATH = "contract_table";

      // The JDBC connection URL and driver implementation class

private static final String CONNECT_URL = "jdbc:oracle:thin:@dbhost
:1521:PSEDEV";
      private static final String DB_USER = "user";
      private static final String DB_PWD = "pass";
      private static final String DATABASE_DRIVER_CLASS =
"oracle.jdbc.driver.OracleDriver";

      private static final String CONTRACT_INPUT_TABLE = "OSE_EPR_CONTRACT";

      private static final String [] CONTRACT_INPUT_TABLE_FIELDS = {
        "PORTFOLIO_NUMBER", "CONTRACT_NUMBER"};

      private static final String ORDER_CONTRACT_BY_COL = "CONTRACT_NUMBER";


    static class ose_epr_contract implements Writable, DBWritable {


        String CONTRACT_NUMBER;


        public void readFields(DataInput in) throws IOException {

            this.CONTRACT_NUMBER = Text.readString(in);

        }

        public void write(DataOutput out) throws IOException {

            Text.writeString(out, this.CONTRACT_NUMBER);


        }

        public void readFields(ResultSet in_set) throws SQLException {

            this.CONTRACT_NUMBER = in_set.getString(1);

        }

        @Override
        public void write(PreparedStatement prep_st) throws SQLException {
            // TODO Auto-generated method stub

        }

    }

    public static class LoadMapper extends MapReduceBase
                                implements Mapper<LongWritable,
ose_epr_contract, Text, NullWritable> {
        private static final char FIELD_SEPARATOR = 1;

        public void map(LongWritable arg0, ose_epr_contract arg1,
                OutputCollector<Text, NullWritable> arg2, Reporter arg3)
                throws IOException {

            StringBuilder sb = new StringBuilder();
;
            sb.append(arg1.CONTRACT_NUMBER);


            arg2.collect(new Text (sb.toString()), NullWritable.get());

        }

    }


    public static void main(String[] args) throws Exception {
        Class.forName("oracle.jdbc.driver.OracleDriver");
        int exit = ToolRunner.run(new LoadTable1(), args);

    }

    public int run(String[] arg0) throws Exception {
        JobConf conf = new JobConf(getConf(), LoadTable1.class);

        conf.setInputFormat(DBInputFormat.class);
        DBConfiguration.configureDB(conf, DATABASE_DRIVER_CLASS,
CONNECT_URL, DB_USER, DB_PWD);

        DBInputFormat.setInput(conf, ose_epr_contract.class,
                "select CONTRACT_NUMBER from OSE_EPR_CONTRACT",
                "select COUNT(CONTRACT_NUMBER) from OSE_EPR_CONTRACT");
        FileOutputFormat.setOutputPath(conf, new
Path(CONTRACT_OUTPUT_PATH));

        conf.setMapperClass(LoadMapper.class);
        conf.setNumReduceTasks(0);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(NullWritable.class);

        JobClient.runJob(conf);

        return 0;
    }
}

-Amandeep

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Tue, Feb 3, 2009 at 6:51 PM, Kevin Peterson <[email protected]> wrote:

> On Tue, Feb 3, 2009 at 5:49 PM, Amandeep Khurana <[email protected]> wrote:
>
> > In the setInput(...) function in DBInputFormat, there are two sets of
> > arguments that one can use.
> >
> > 1. public static void *setInput*(JobConf
> >
> > a) In this, do we necessarily have to give all the fieldNames (which are
> > the
> > column names right?) that the table has, or do we need to specify only
> the
> > ones that we want to extract?
>
>
> You may specify only those columns that you are interested in.
>
> b) Do we have to have a orderBy or not necessarily? Does this relate to the
> > primary key in the table in any ways?
>
>
> Conditions and order by are not necessary.
>
> a) Is there any restriction on the kind of queries that this function
> > can take in the inputQuery string?
>
>
> I don't think so, but I don't use this method -- I just use the fieldNames
> and tableName method.
>
>
> > I am facing issues in getting this to work with an Oracle database and
> > have no idea of how to debug it (an email sent earlier).
> > Can anyone give me some inputs on this please?
>
>
> Create a new table that has one column, put about five entries into that
> table, then try to get a map job working that outputs the values to a text
> file. If that doesn't work, post your code and errors.
>

Re: How to use DBInputFormat?

Reply via email to