Dear Flink users,

We're trying to switch from StringWriter to SequenceFileWriter to turn on
compression. StringWriter writes the value only, and we want to keep it that way.

AFAIK, you can use NullWritable as the key in Hadoop writers to skip the key,
so that only the values are written.
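
For reference, this is my understanding of that trick with the plain Hadoop
2.x API (a minimal standalone sketch, not our actual job; the class name and
output path below are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.SnappyCodec;

    public class NullKeySketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path path = new Path("/tmp/null-key-example.seq");  // made-up path
            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(path),
                    SequenceFile.Writer.keyClass(NullWritable.class),
                    SequenceFile.Writer.valueClass(Text.class),
                    SequenceFile.Writer.compression(
                            SequenceFile.CompressionType.BLOCK, new SnappyCodec()))) {
                // NullWritable serializes to zero bytes, so each record
                // should carry only the value payload
                writer.append(NullWritable.get(), new Text("{\"ts\":1564168038}"));
            }
        }
    }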

So I tried NullWritable, as in the following code:

    BucketingSink<Tuple2<NullWritable, Text>> hdfsSink =
            new BucketingSink<>("/data/cjv");

    hdfsSink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd/HH", ZoneOffset.UTC));
    hdfsSink.setWriter(new SequenceFileWriter<NullWritable, Text>(
            "org.apache.hadoop.io.compress.SnappyCodec",
            SequenceFile.CompressionType.BLOCK));
    hdfsSink.setBatchSize(1024 * 1024 * 250);           // roll at 250 MB
    hdfsSink.setBatchRolloverInterval(20 * 60 * 1000);  // or after 20 minutes


    joinedResults.map(new MapFunction<Tuple2<String, String>, Tuple2<NullWritable, Text>>() {
        @Override
        public Tuple2<NullWritable, Text> map(Tuple2<String, String> value) throws Exception {
            // drop the original key (value.f0) and pair the value with a NullWritable key
            return Tuple2.of(NullWritable.get(), new Text(value.f1));
        }
    }).addSink(hdfsSink).name("hdfs_sink").uid("hdfs_sink");


But the output file has the key as the string "(null)", e.g.:

    (null)      {"ts":1564168038,"os":"android",...}
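
To inspect what is actually stored, I think one could read a part file back
directly, something like the sketch below (the part-file path is made up, and
printing reader.getKeyClassName() should show which key class was written):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class InspectKeysSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // made-up part-file path under the bucketing sink's base dir
            Path path = new Path("/data/cjv/2019-07-26/12/part-0-0");
            try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                    SequenceFile.Reader.file(path))) {
                System.out.println("key class: " + reader.getKeyClassName());
                NullWritable key = NullWritable.get();
                Text value = new Text();
                while (reader.next(key, value)) {
                    // print only the value, without the key column
                    System.out.println(value);
                }
            }
        }
    }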


So my question is: how do I skip the key completely and write only the
value with SequenceFileWriter?

Your help would be much appreciated.


-- 
All the best

Liu Bo
