Background: I have devices generating Avro files and throwing them at S3. The S3 consumers want to push some more data into those files, so they have a Lambda that does a copy/transform to add it. For some reason they wrote their initial code against the 1.0 release of Avro, and added fields to the records with a default of null. That now trips the stricter type checking in 1.9.1, which manifested as avro-tools 1.9.1 barfing on the files. They were told: 1) stop using the ten-year-old library and use the current release, and 2) the way to default a typed value to null is a union of null and the type, with null listed first and a default of null.
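In other words, something like this sketch (record and field names made up), which should come out as a field of the form {"name": "maybeValue", "type": ["null", "string"], "default": null}:

    import org.apache.avro.Schema
    import org.apache.avro.SchemaBuilder

    // null goes first in the union so that a null default is legal
    Schema example = SchemaBuilder.record("Example").namespace("com.example")
        .fields()
        .name("maybeValue").type().unionOf().nullType().and().stringType().endUnion().nullDefault()
        .endRecord()
    println(example.toString(true))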
Their first fix attempt read all the records of the existing file and wrote them out again under a SchemaBuilder-built schema with the null/type unions. That changed the shape of the data and added a bunch of null defaults that were not valid. My suggested code (in Groovy, using the Java 1.9.1 library) to copy the records as-is, building a new schema to use as the resolving/reader schema so you get generic Avro data you can manipulate, was:

    import org.apache.avro.Schema
    import org.apache.avro.Schema.Field
    import org.apache.avro.Schema.Type
    import org.apache.avro.file.DataFileReader
    import org.apache.avro.generic.GenericDatumReader
    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.io.DatumReader

    for (fileName in args) {
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>()
        DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(new File(fileName), datumReader)
        Schema schema = dataFileReader.getSchema()

        // copy the existing fields, clearing position so they can be added to a new record schema
        List<Field> theFields = new ArrayList<Field>()
        for (f in schema.getFields()) {
            f.position = -1
            theFields.add(f)
        }

        // the new field, with a non-null default
        Field fieldWithDefault = new Field("withDefault", Schema.create(Type.STRING), "", "Spoon")
        fieldWithDefault.position = -1
        theFields.add(fieldWithDefault)

        Schema newSchema = Schema.createRecord(schema.getName(), "", schema.getNamespace(), false, theFields)
        System.out.println(newSchema)
    }

Setting f.position = -1 to get a Field that can be added to another schema felt wrong, but it seems to work. Is there a better idiom for "I want to add a field to a record and populate it with data" that I missed?

Dan