Hi -

I have a data set that is mostly a 2D table, but one column
(called Attributes) contains a List of Structs in each cell. Each
Struct has three fields: Attribute Tag, Attribute Type and Attribute
Value.

The definition of the Attributes Field is:

/**
 * Attribute Tag - Two character tag.
 */
public static final Field ATTRIBUTE_TAG_FIELD =
        new Field("AttributeTag",
                FieldType.notNullable(new ArrowType.FixedSizeBinary(2)), null);


/**
 * Attribute Type - One character type.
 */
// todo this could be dictionary encoded but would require building a
// dictionary, which requires access to the allocator
public static final Field ATTRIBUTE_TYPE_FIELD =
        new Field(
                "AttributeType",
                new FieldType(false,
                new ArrowType.FixedSizeBinary(1), null),
                null
        );

/**
 * String representation of the Attribute value.
 */
public static final Field ATTRIBUTE_VALUE_FIELD =
        new Field("AttributeValue",
                FieldType.notNullable(new ArrowType.Utf8()), null);

/**
 * The field is a nullable List of Structs, each with an attribute tag,
 * type and value.
 */
public static final Field ATTRIBUTES_FIELD =
        new Field("Attributes", FieldType.nullable(new ArrowType.List()), List.of(
                new Field("Attribute", FieldType.nullable(new ArrowType.Struct()), List.of(
                        ATTRIBUTE_TAG_FIELD, ATTRIBUTE_TYPE_FIELD, ATTRIBUTE_VALUE_FIELD))
        ));



I have this code that attempts to populate the Attributes from some
source data. Although it produces no errors when run, it doesn't
result in any values in the attributes vector.

final ListVector attributes =
        (ListVector) ATTRIBUTES_FIELD.createVector(allocator);

// this is the source of the attributes that I will populate into the
// attributes vector
final List<SAMRecord.SAMTagAndValue> recordAttributes =
        samRecord.getAttributes();

if (recordAttributes != null && !recordAttributes.isEmpty()) {
    final UnionListWriter listWriter = attributes.getWriter();
    listWriter.allocate();

    IntStream.range(0, recordAttributes.size()).forEachOrdered(attributeIndex -> {
        listWriter.setPosition(attributeIndex);
        listWriter.startList();

        // put the values of the attribute in the arrow struct
        final SAMRecord.SAMTagAndValue samTagAndValue =
                recordAttributes.get(attributeIndex);

        // I think the problem is here. In a debugger this seems to
        // create a new writer not related to my Vector??
        final BaseWriter.StructWriter structWriter =
                listWriter.struct("Attribute");
        structWriter.start();

        final byte[] tagBytes =
                samTagAndValue.tag.getBytes(StandardCharsets.UTF_8);
        // todo find out the type from the value
        final byte[] typeBytes = "S".getBytes(StandardCharsets.UTF_8);
        final byte[] valueBytes =
                samTagAndValue.value.toString().getBytes(StandardCharsets.UTF_8);

        ArrowBuf tempBuf = allocator.buffer(tagBytes.length);
        tempBuf.setBytes(0, tagBytes);
        structWriter.varChar("AttributeTag")
                .writeVarChar(0, tagBytes.length, tempBuf);
        tempBuf.close();

        tempBuf = allocator.buffer(typeBytes.length);
        structWriter.varChar("AttributeType")
                .writeVarChar(0, typeBytes.length, tempBuf);
        tempBuf.close();

        tempBuf = allocator.buffer(valueBytes.length);
        structWriter.varChar("AttributeValue")
                .writeVarChar(0, valueBytes.length, tempBuf);
        tempBuf.close();

        structWriter.end();
    });

    listWriter.setValueCount(recordAttributes.size());
    listWriter.end();
}

What am I doing wrong?
