Hi, I'm trying to use the insert, update, and delete methods on OrcRecordUpdater to programmatically mutate an ORC-based Hive table (Hive 1.0.0). I've got inserts working correctly, but I'm hitting a problem with deletes and updates: I get an NPE that I've traced back to what appears to be a missing recIdField.
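Here's a simplified sketch of how I'm driving the updater (MutableRecord, the partition path, and the transaction/bucket values are illustrative stand-ins; in my real code the ObjectInspector is obtained from the table's SerDe, which is why a LazySimpleStructObjectInspector shows up in the trace):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
import org.apache.hadoop.hive.ql.io.RecordUpdater;
import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.ObjectInspectorOptions;

public class UpdaterSketch {

  // Illustrative row type - my real records are richer and their inspector
  // comes from the table's SerDe rather than from reflection.
  public static class MutableRecord {
    public int id;
    public String msg;

    public MutableRecord(int id, String msg) {
      this.id = id;
      this.msg = msg;
    }
  }

  public void mutate(Configuration conf, Path partitionPath, long transactionId)
      throws Exception {
    ObjectInspector inspector = ObjectInspectorFactory.getReflectionObjectInspector(
        MutableRecord.class, ObjectInspectorOptions.JAVA);

    AcidOutputFormat.Options options = new AcidOutputFormat.Options(conf)
        .inspector(inspector)
        .bucket(0)
        .minimumTransactionId(transactionId)
        .maximumTransactionId(transactionId);

    RecordUpdater updater = new OrcOutputFormat().getRecordUpdater(partitionPath, options);

    updater.insert(transactionId, new MutableRecord(1, "hello"));  // works fine
    updater.delete(transactionId, new MutableRecord(1, "hello"));  // throws the NPE below
    updater.close(false);
  }
}

The delete call is the one that fails with: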
java.lang.NullPointerException
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:103)
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addEvent(OrcRecordUpdater.java:296)
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.delete(OrcRecordUpdater.java:330)

I've tried specifying a location for the field using AcidOutputFormat.Options.recordIdColumn(0), but this fails due to an ObjectInspector mismatch. I'm not sure whether I should be creating this field as part of my table definition or not.

Currently I'm constructing the table with code based on that found in the storm-hive project:

Table tbl = new Table();
tbl.setDbName(databaseName);
tbl.setTableName(tableName);
tbl.setTableType(TableType.MANAGED_TABLE.toString());

StorageDescriptor sd = new StorageDescriptor();
sd.setCols(getTableColumns(colNames, colTypes));
sd.setNumBuckets(1);
sd.setLocation(dbLocation + Path.SEPARATOR + tableName);
if (partNames != null && partNames.length != 0) {
  tbl.setPartitionKeys(getPartitionKeys(partNames));
}
tbl.setSd(sd);

sd.setBucketCols(new ArrayList<String>(2));
sd.setSerdeInfo(new SerDeInfo());
sd.getSerdeInfo().setName(tbl.getTableName());
sd.getSerdeInfo().setParameters(new HashMap<String, String>());
sd.getSerdeInfo().getParameters().put(serdeConstants.SERIALIZATION_FORMAT, "1");
// Not sure if this does anything?
sd.getSerdeInfo().getParameters().put("transactional", Boolean.TRUE.toString());
sd.getSerdeInfo().setSerializationLib(OrcSerde.class.getName());
sd.setInputFormat(OrcInputFormat.class.getName());
sd.setOutputFormat(OrcOutputFormat.class.getName());

Map<String, String> tableParams = new HashMap<String, String>();
// Not sure if this does anything?
tableParams.put("transactional", Boolean.TRUE.toString());
tbl.setParameters(tableParams);

client.createTable(tbl);

try {
  if (partVals != null && partVals.size() > 0) {
    addPartition(client, tbl, partVals);
  }
} catch (AlreadyExistsException e) {
}

I don't really know enough about Hive and ORC file internals to work out where I'm going wrong, so any help would be appreciated.

Thanks - Elliot.
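P.S. For completeness, the recordIdColumn attempt looked roughly like this. Only the differences from the sketch above are shown, and the RecordIdentifier field on the record is my guess at what the updater expects; this is the combination that produces the ObjectInspector mismatch rather than the NPE:

import org.apache.hadoop.hive.ql.io.RecordIdentifier;

// The record now carries the ROW__ID as its first field, populated from
// wherever the row was originally read.
public static class MutableRecord {
  public RecordIdentifier recId;
  public int id;
  public String msg;

  public MutableRecord(RecordIdentifier recId, int id, String msg) {
    this.recId = recId;
    this.id = id;
    this.msg = msg;
  }
}

// ...and the Options gain a recordIdColumn pointing at that field:
AcidOutputFormat.Options options = new AcidOutputFormat.Options(conf)
    .inspector(inspector)   // reflection inspector over MutableRecord, as before
    .recordIdColumn(0)      // index of the RecordIdentifier field above
    .bucket(0)
    .minimumTransactionId(transactionId)
    .maximumTransactionId(transactionId);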