Hi,
I could not find any documentation or help on how to use
CqlBulkOutputFormat for bulk loading data into Cassandra. Could anyone
please share some guidelines on how to write an MR job that bulk loads
data into Cassandra using CqlBulkOutputFormat?

I tried something like the code shown below, which failed with the
exception given at the end:

    Configuration conf = getConf();
    Job job = new Job(conf, this.getClass().toString());
    FileInputFormat.setInputPaths(job, inputPath);
    FileOutputFormat.setOutputPath(job, outputPath);
    job.setJobName("Test");
    job.setJarByClass(Myloader.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapOutputKeyClass(Object.class);
    job.setMapOutputValueClass(List.class);
    job.setNumReduceTasks(0);
    job.setMapperClass(Map.class);
    job.setOutputFormatClass(CqlBulkOutputFormat.class);

    ConfigHelper.setOutputKeyspace(job.getConfiguration(), KEYSPACE);
    ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, TABLE);
    ConfigHelper.setOutputRpcPort(job.getConfiguration(), "9160");
    ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "localhost");
    ConfigHelper.setOutputPartitioner(job.getConfiguration(), "Murmur3Partitioner");

    CqlBulkOutputFormat.setTableSchema(job.getConfiguration(), TABLE, SCHEMA);
    CqlBulkOutputFormat.setTableInsertStatement(job.getConfiguration(), TABLE, INSERT_STMT);



    public static class Map extends Mapper<LongWritable, Text, Object, List<ByteBuffer>> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            ....
            context.write(someObj, list);
        }
    }
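For context on what `list` holds in the mapper above: as far as I can tell, the List<ByteBuffer> handed to CqlBulkOutputFormat carries the already-serialized values for the bind markers of the insert statement, in order. Below is a minimal sketch of building such a list from one input line. The two-column layout (an int id followed by a text name), the comma-separated input, and the names RowEncoder, encodeInt, encodeText and encodeRow are all hypothetical and only for illustration; the real table and insert statement are not shown in this post.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns "id,name" into serialized column values.
public class RowEncoder {

    // A CQL int is 4 bytes, big-endian (ByteBuffer's default byte order).
    static ByteBuffer encodeInt(int value) {
        ByteBuffer buffer = ByteBuffer.allocate(4);
        buffer.putInt(value);
        buffer.flip(); // make the written bytes readable from position 0
        return buffer;
    }

    // A CQL text value is its UTF-8 bytes.
    static ByteBuffer encodeText(String value) {
        return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
    }

    // Assumes an insert statement with two bind markers: (id, name).
    static List<ByteBuffer> encodeRow(String line) {
        String[] fields = line.split(",", 2);
        List<ByteBuffer> row = new ArrayList<>();
        row.add(encodeInt(Integer.parseInt(fields[0].trim())));
        row.add(encodeText(fields[1]));
        return row;
    }

    public static void main(String[] args) {
        List<ByteBuffer> row = encodeRow("42,alice");
        System.out.println("columns: " + row.size());
    }
}
```

In a mapper, the result of encodeRow(value.toString()) would take the place of `list` in context.write(someObj, list). The order of the buffers has to match the order of the bind markers in the insert statement.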


I even tried setting

    // conf.set("cassandra.config", "file:///opt/cluster/apache-cassandra-2.2.8/conf/cassandra.yaml");

but it did not work. Here is the exception:

Error: org.apache.cassandra.exceptions.ConfigurationException:
Expecting URI in variable: [cassandra.config].  Please prefix the file
with file:/// for local files or file://<server>/ for remote files.
Aborting. If you are executing this from an external tool, it needs to
set Config.setClientMode(true) to avoid loading configuration.
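
A note on the message above: as far as I understand it, Cassandra looks up `cassandra.config` as a JVM system property and not in the Hadoop Configuration, which may be why conf.set(...) had no effect. The sketch below uses only the JDK to check whether the system property is set in the current JVM and whether its value is an absolute URL. The class name CassandraConfigCheck and the method isUsableConfigUrl are hypothetical, and the check is a stand-in for illustration, not Cassandra's own validation code.

```java
import java.net.URI;

// Hypothetical diagnostic: is cassandra.config set as a system property,
// and does it look like file:///... or file://<server>/...?
public class CassandraConfigCheck {

    static final String PROPERTY = "cassandra.config";

    // True if the value parses as an absolute URI that converts to a URL.
    static boolean isUsableConfigUrl(String value) {
        if (value == null) {
            return false;
        }
        try {
            new URI(value).toURL(); // rejects relative values such as "cassandra.yaml"
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Value as seen by this JVM; null unless passed with -Dcassandra.config=...
        System.out.println("before: " + System.getProperty(PROPERTY));

        // Setting it programmatically only affects the current JVM.
        System.setProperty(PROPERTY,
                "file:///opt/cluster/apache-cassandra-2.2.8/conf/cassandra.yaml");
        System.out.println("usable: " + isUsableConfigUrl(System.getProperty(PROPERTY)));
    }
}
```

Since map tasks run in their own JVMs, the property would presumably have to be visible there too (for example through the task JVM options), or configuration loading avoided altogether with Config.setClientMode(true) as the message suggests. I have not verified either route.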


Any help on how to fix the above issue would be highly appreciated.
Thank you,
Afzal
