Hi,

I'm trying to use the insert, update, and delete methods on OrcRecordUpdater
to programmatically mutate an ORC-based Hive table (1.0.0). I've got
inserts working correctly, but I'm hitting a problem with deletes and
updates: I get an NPE, which I've traced back to what seems to be a missing
recIdField(?).

java.lang.NullPointerException
  at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:103)
  at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addEvent(OrcRecordUpdater.java:296)
  at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.delete(OrcRecordUpdater.java:330)
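For context, here's roughly how I'm driving the updater (a simplified sketch;
conf, rowInspector, txnId, row, and partitionPath are placeholders for my
actual values, and rowInspector is the ObjectInspector for my row struct,
which currently has no ROW__ID-style field):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
import org.apache.hadoop.hive.ql.io.RecordUpdater;
import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat;

// Sketch only: how I obtain and use the updater.
AcidOutputFormat.Options options = new AcidOutputFormat.Options(conf)
    .inspector(rowInspector)        // inspector for my row struct
    .bucket(0)
    .minimumTransactionId(txnId)
    .maximumTransactionId(txnId);

RecordUpdater updater = new OrcOutputFormat()
    .getRecordUpdater(partitionPath, options);

updater.insert(txnId, row);   // works fine
updater.delete(txnId, row);   // NPE in addEvent -> getStructFieldData
updater.close(false);
```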


I've tried specifying a location for the field using
AcidOutputFormat.Options.recordIdColumn(0), but this fails due to an
ObjectInspector mismatch. I'm not sure whether I should be creating this
field as part of my table definition or not. Currently I'm constructing the
table with some code based on that in the storm-hive project:

      Table tbl = new Table();
      tbl.setDbName(databaseName);
      tbl.setTableName(tableName);
      tbl.setTableType(TableType.MANAGED_TABLE.toString());
      StorageDescriptor sd = new StorageDescriptor();
      sd.setCols(getTableColumns(colNames, colTypes));
      sd.setNumBuckets(1);
      sd.setLocation(dbLocation + Path.SEPARATOR + tableName);
      if (partNames != null && partNames.length != 0) {
        tbl.setPartitionKeys(getPartitionKeys(partNames));
      }

      tbl.setSd(sd);

      sd.setBucketCols(new ArrayList<String>(2));
      sd.setSerdeInfo(new SerDeInfo());
      sd.getSerdeInfo().setName(tbl.getTableName());
      sd.getSerdeInfo().setParameters(new HashMap<String, String>());

      sd.getSerdeInfo().getParameters().put(serdeConstants.SERIALIZATION_FORMAT, "1");
      // Not sure if this does anything?
      sd.getSerdeInfo().getParameters().put("transactional", Boolean.TRUE.toString());

      sd.getSerdeInfo().setSerializationLib(OrcSerde.class.getName());
      sd.setInputFormat(OrcInputFormat.class.getName());
      sd.setOutputFormat(OrcOutputFormat.class.getName());

      Map<String, String> tableParams = new HashMap<String, String>();
      // Not sure if this does anything?
      tableParams.put("transactional", Boolean.TRUE.toString());
      tbl.setParameters(tableParams);
      client.createTable(tbl);
      try {
        if (partVals != null && partVals.size() > 0) {
          addPartition(client, tbl, partVals);
        }
      } catch (AlreadyExistsException e) {
        // Partition already exists; safe to ignore.
      }
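As for the recordIdColumn(0) attempt mentioned above, this is roughly what I
tried (again just a sketch; rowInspector is the inspector for my row struct,
whose field 0 is an ordinary column rather than a ROW__ID-style struct, which
I'm guessing is why the inspectors don't match):

```java
// Sketch of the failing attempt: telling the updater that field 0 of
// my row struct is the record identifier.
AcidOutputFormat.Options options = new AcidOutputFormat.Options(conf)
    .inspector(rowInspector)
    .recordIdColumn(0);   // fails with an ObjectInspector mismatch
```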

I don't really know enough about Hive and ORCFile internals to work out
where I'm going wrong so any help would be appreciated.

Thanks - Elliot.
