This is an automated email from the ASF dual-hosted git repository.
chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fory.git
The following commit(s) were added to refs/heads/main by this push:
new 1a4632a17 feat(format): support custom codecs keyed on Optional (#3800)
1a4632a17 is described below
commit 1a4632a17eaa59903a17cbf93429fa50343c4010
Author: Steven Schlansker <[email protected]>
AuthorDate: Mon Jun 29 21:02:55 2026 -0700
feat(format): support custom codecs keyed on Optional (#3800)
A row-format field typed Optional<X> is normally unwrapped to a nullable
X with emptiness mapped to the column null bit, so a codec keyed on X
only sees the present value and cannot control how absence is encoded.
Allow a codec keyed on Optional itself to own the whole field: it
receives both present and empty values and encodes optionality in-band
(e.g. a sentinel) into a single non-nullable column, matching external
wire formats an element codec cannot reproduce.
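
A minimal sketch of the in-band idea, not Fory's actual API: the nested `Codec` interface is a hypothetical stand-in for the encode/decode half of `CustomCodec<T, E>`, and `Long.MIN_VALUE` is an assumed sentinel that the external wire format reserves for absence.

```java
import java.util.Optional;

public class OptionalSentinelSketch {
  /** Hypothetical stand-in for the encode/decode half of CustomCodec<T, E>. */
  public interface Codec<T, E> {
    E encode(T value);

    T decode(E value);
  }

  /** Assumed sentinel: the wire format reserves this value to mean "absent". */
  public static final long EMPTY = Long.MIN_VALUE;

  /** Keyed on Optional<Long>, so it receives both present and empty values. */
  public static final Codec<Optional<Long>, Long> OPTIONAL_LONG =
      new Codec<Optional<Long>, Long>() {
        @Override
        public Long encode(Optional<Long> value) {
          // Absence is written in-band; the column itself never needs a null bit.
          return value.orElse(EMPTY);
        }

        @Override
        public Optional<Long> decode(Long value) {
          // Always return a non-null Optional, mapping the sentinel back to empty.
          return value == EMPTY ? Optional.empty() : Optional.of(value);
        }
      };
}
```

With this scheme a present value equal to the sentinel is indistinguishable from empty, so the sentinel has to lie outside the field's valid range.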
TypeInference.inferField and RowEncoderBuilder.addDecoderMethods now
resolve a codec on the raw field type before the Optional unwrap,
matching the serialize/deserialize paths. The reorder is
behavior-preserving: without an Optional-keyed codec the lookup is null
and the existing unwrap runs unchanged, so element codecs inside an
Optional keep working.
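
The lookup order can be sketched as follows. This is an illustration under assumptions, not the real `CustomTypeHandler`: the registry class, its `register` and `resolve` methods, and the use of codec names as strings are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class CodecLookupOrderSketch {
  // Hypothetical registry: maps a raw type to the name of the codec registered for it.
  private final Map<Class<?>, String> codecs = new HashMap<>();

  public void register(Class<?> type, String codecName) {
    codecs.put(type, codecName);
  }

  /** Returns the codec chosen for a field, or null when no codec applies. */
  public String resolve(Class<?> rawFieldType, Class<?> elementType) {
    // Step 1: a codec keyed on the raw field type (possibly Optional itself) wins.
    String codec = codecs.get(rawFieldType);
    // Step 2: only when none exists is the Optional unwrapped and the element type tried.
    if (codec == null && rawFieldType == Optional.class) {
      codec = codecs.get(elementType);
    }
    return codec;
  }
}
```

When no Optional-keyed codec is registered, step 1 returns null and the lookup behaves exactly as the unwrap-first ordering did.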
Because an Optional-keyed codec declares a non-nullable column,
serializeFor normalizes a null field reference to Optional.empty()
rather than setNullAt (which the compact row writer rejects). The decode
side returns a non-null Optional; this contract is documented on
CustomCodec.decode and the Encoders javadoc.
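
In plain Java, the normalization that the generated code performs before calling the codec is roughly equivalent to the helper below. The class and method names are hypothetical; the real change builds this as a codegen expression inside serializeFor.

```java
import java.util.Optional;

public class NullToEmptySketch {
  /** Maps a null Optional reference to Optional.empty(); other values pass through unchanged. */
  public static <T> Optional<T> normalize(Optional<T> field) {
    return field == null ? Optional.empty() : field;
  }
}
```

After this step the codec cannot tell a null reference from an empty Optional, which is why both round-trip as empty.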
Collection and map element codecs were keyed on BinaryArray.class, so a
bean-scoped codec applied to a bean's fields but not its collection or
map elements, diverging schema from data. Resolve elements against the
enclosing type the schema uses: a bean field encoder keys on its own
bean, a standalone array/map encoder keys on Object to match inferField.
The new codecEnclosingType keeps codegen and schema in agreement.
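
A rough model of the scoping rule, assuming a two-level registry with an Object-scoped fallback as described in the code comments of this commit. The class below is hypothetical and only mirrors the shape of `findCodec(enclosing, target)`; codec names are plain strings for brevity.

```java
import java.util.HashMap;
import java.util.Map;

public class ScopedCodecRegistrySketch {
  // enclosing type -> (target type -> codec name); all names here are hypothetical.
  private final Map<Class<?>, Map<Class<?>, String>> codecs = new HashMap<>();

  public void register(Class<?> enclosing, Class<?> target, String codecName) {
    codecs.computeIfAbsent(enclosing, k -> new HashMap<>()).put(target, codecName);
  }

  /** Looks up the codec for the enclosing type first, then falls back to Object scope. */
  public String findCodec(Class<?> enclosing, Class<?> target) {
    String scoped = lookup(enclosing, target);
    return scoped != null ? scoped : lookup(Object.class, target);
  }

  private String lookup(Class<?> enclosing, Class<?> target) {
    Map<Class<?>, String> byTarget = codecs.get(enclosing);
    return byTarget == null ? null : byTarget.get(target);
  }
}
```

Passing the bean as `enclosing` for both the schema and the element codegen yields the same answer on both sides; passing `Object.class` for a top-level collection keeps a bean-scoped codec from binding there.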
Document the CustomCodec.getForyField width contract: a fixed-width
declaration must equal the exact byte width encode() produces, since the
compact layout sizes the slot from it. This cannot be statically
validated, so the compact writers assert it on every fixed-slot write
via a checkFixedWidth hook (a no-op for the uniform-slot default
layout). assert keeps the cost at zero in production and catches a
misconfigured codec instead of corrupting the row.
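
The shape of the check can be sketched in isolation. This is a simplified illustration, not the writer classes from the diff: the class name and the `widthMatches` helper are hypothetical, and a negative slot width is assumed to mark a variable-width column, as the diff's comments describe.

```java
public class FixedWidthCheckSketch {
  // Declared slot width per column; a negative value marks a variable-width column.
  private final int[] fixedWidths;

  public FixedWidthCheckSketch(int... fixedWidths) {
    this.fixedWidths = fixedWidths.clone();
  }

  /** True when the write is consistent with the declared slot width. */
  public static boolean widthMatches(int slot, int written) {
    return slot < 0 || slot == written;
  }

  /** Fails only when assertions are enabled (-ea); otherwise the JIT can drop the check. */
  public void checkFixedWidth(int ordinal, int written) {
    int slot = fixedWidths[ordinal];
    assert widthMatches(slot, written)
        : "field " + ordinal + " has a " + slot + "-byte slot but the codec wrote "
            + written + " bytes";
  }
}
```

Using `assert` rather than an unconditional `if` is what lets the check disappear in production runs, at the price of only catching a mismatch in test or `-ea` runs.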
Tests cover present/empty/null-reference round-trips across the eager,
compact, and lazy paths; Optional-keyed codecs in List, Set, nested
List, and Map key/value; scope isolation between bean fields and
top-level collections; and the width-mismatch assert on the row and
array paths.
## Why?
Exploring a bug while working on a future feature-request PR uncovered an
existing path that is not well supported, so the feature and bugfix
aspects are split out into this separate PR.
## What does this PR do?
Corrects inconsistent handling of Optional types in the row format; the
inconsistency only mattered when custom codecs were involved.
Includes tests demonstrating that an actual bug was fixed.
## Related issues
N/A
## AI Contribution Checklist
- [x] Substantial AI assistance was used in this PR
- [x] AI Usage Disclosure
- substantial_ai_assistance: yes
- scope: all
- affected_files_or_subsystems: row-format
- ai_review: included below
- human_verification: :heavy_check_mark:
- performance_verification: :heavy_check_mark:
- provenance_license_confirmation: :heavy_check_mark:
```
Both reviewers independently reached approve with no blockers or majors,
and that holds under scrutiny. The change
correctly adds Optional-keyed custom codecs and fixes a real bug along
the way: switching the codec-enclosing key from
the internal BinaryArray.class sentinel to the enclosing bean, so
bean-scoped codecs now reach a bean's collection and
map elements instead of falling through to Object. The "raw-type codec
wins before Optional unwrap" invariant is applied
consistently across all four resolution sites with symmetric
encode/decode, every @Internal signature change is
non-breaking, and the test coverage is thorough — present/empty/null-ref
cases, interface decode, collection-element
scope, mixed-nullability asserted against BinaryRow.isNullAt wire state
rather than the decoded value, and three
width-mismatch traps gated for assertions-off runs. The only open
question either review raised — whether the new
checkFixedWidth hook on the primitive write paths costs throughput — is
now settled by measurement: it is statistically
zero-cost in production. The branch is ready to merge.
Signed off by: Claude Opus 4.8 (model ID claude-opus-4-8, 1M-token
context), Anthropic's Claude Code CLI.
```
## Does this PR introduce any user-facing change?
Minor change to row-format custom codec handling around Optional types.
The behavior before was broken, so this should be a strict improvement.
- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?
## Benchmark
```
A focused JMH harness in the existing benchmarks/java module, built
against this branch's fory-format, run on JDK 26
with assertions off (the production and default config). I isolated the
hook by pairing the real hooked
CompactBinaryRowWriter.write(int, long) against an identical hook-free
store on the same buffer and offset, with both
the eager and compact array writers loaded so the checkFixedWidth call
site is bimorphic — the exact dispatch the review
worried about. 3 forks × 6 iterations, 18 measurements per benchmark:
- hook-free store: 10.954 ± 0.112 ns/op (per 256-row batch)
- real hooked write: 11.312 ± 0.468 ns/op
The delta is ~0.36 ns across a 256-row batch and the confidence intervals
overlap, so with assertions off the hook is
statistically indistinguishable from zero cost. A negative control with
-ea on showed a small but real gap (11.690 vs
11.087), which confirms the benchmark genuinely exercises the hook rather
than measuring a no-op — the cost simply
vanishes once $assertionsDisabled folds the guard and dead-code
elimination drops the now-unused array load, exactly as
the bytecode predicted.
```
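
The arithmetic behind the "intervals overlap" claim can be reproduced with a few lines. This sketch treats each ± figure as a symmetric half-width around the mean, which is an assumption about how the numbers were reported; the class and method names are hypothetical.

```java
public class IntervalOverlapSketch {
  /** True when [meanA - errA, meanA + errA] and [meanB - errB, meanB + errB] intersect. */
  public static boolean overlaps(double meanA, double errA, double meanB, double errB) {
    return meanA - errA <= meanB + errB && meanB - errB <= meanA + errA;
  }

  public static void main(String[] args) {
    // Figures quoted in the benchmark section above (ns/op per 256-row batch).
    double hookFree = 10.954, hookFreeErr = 0.112;
    double hooked = 11.312, hookedErr = 0.468;
    System.out.printf("delta = %.3f ns/op%n", hooked - hookFree);
    System.out.println("overlap = " + overlaps(hookFree, hookFreeErr, hooked, hookedErr));
  }
}
```

The hook-free interval spans roughly 10.842 to 11.066 and the hooked interval roughly 10.844 to 11.780, so the two share the range from about 10.844 to 11.066.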
Co-authored-by: Claude (on behalf of Steven Schlansker)
<[email protected]>
---
.../fory/format/encoder/ArrayDataForEach.java | 10 +-
.../fory/format/encoder/ArrayEncoderBuilder.java | 6 +-
.../format/encoder/BaseBinaryEncoderBuilder.java | 46 ++-
.../apache/fory/format/encoder/CustomCodec.java | 19 +-
.../org/apache/fory/format/encoder/Encoders.java | 18 +
.../apache/fory/format/encoder/LazyArrayData.java | 10 +-
.../fory/format/encoder/MapEncoderBuilder.java | 5 +-
.../fory/format/encoder/RowEncoderBuilder.java | 9 +-
.../row/binary/writer/BinaryArrayWriter.java | 5 +
.../format/row/binary/writer/BinaryWriter.java | 12 +
.../binary/writer/CompactBinaryArrayWriter.java | 44 +++
.../row/binary/writer/CompactBinaryRowWriter.java | 26 +-
.../org/apache/fory/format/type/TypeInference.java | 27 +-
.../fory/format/encoder/CompactCodecTest.java | 126 +++++++
.../fory/format/encoder/CustomCodecTest.java | 405 +++++++++++++++++++++
15 files changed, 743 insertions(+), 25 deletions(-)
diff --git a/java/fory-format/src/main/java/org/apache/fory/format/encoder/ArrayDataForEach.java b/java/fory-format/src/main/java/org/apache/fory/format/encoder/ArrayDataForEach.java
index 13e1e3d52..26f4adeb6 100644
--- a/java/fory-format/src/main/java/org/apache/fory/format/encoder/ArrayDataForEach.java
+++ b/java/fory-format/src/main/java/org/apache/fory/format/encoder/ArrayDataForEach.java
@@ -61,9 +61,10 @@ public class ArrayDataForEach extends AbstractExpression {
*/
public ArrayDataForEach(
Expression inputArrayData,
+ Class<?> enclosingType,
TypeRef<?> elemType,
SerializableBiFunction<Expression, Expression, Expression> notNullAction) {
- this(inputArrayData, elemType, notNullAction, null);
+ this(inputArrayData, enclosingType, elemType, notNullAction, null);
}
/**
@@ -72,6 +73,7 @@ public class ArrayDataForEach extends AbstractExpression {
*/
public ArrayDataForEach(
Expression inputArrayData,
+ Class<?> enclosingType,
TypeRef<?> elemType,
SerializableBiFunction<Expression, Expression, Expression> notNullAction,
SerializableFunction<Expression, Expression> nullAction) {
@@ -79,8 +81,12 @@ public class ArrayDataForEach extends AbstractExpression {
Preconditions.checkArgument(getRawType(inputArrayData.type()) == BinaryArray.class);
this.inputArrayData = inputArrayData;
CustomTypeHandler customTypeHandler = CustomTypeEncoderRegistry.customTypeHandler();
+ // Resolve the element codec scoped to the same enclosing type the schema used: the bean for a
+ // bean's collection field, Object for a top-level collection, falling back to an Object-scoped
+ // registration. A bean-scoped codec applies to that bean's collection elements, not only its
+ // direct fields, but does not bind to a top-level collection the schema never applied it to.
CustomCodec<?, ?> customEncoder =
- customTypeHandler.findCodec(BinaryArray.class, elemType.getRawType());
+ customTypeHandler.findCodec(enclosingType, elemType.getRawType());
TypeRef<?> accessType;
if (customEncoder == null) {
accessType = elemType;
diff --git a/java/fory-format/src/main/java/org/apache/fory/format/encoder/ArrayEncoderBuilder.java b/java/fory-format/src/main/java/org/apache/fory/format/encoder/ArrayEncoderBuilder.java
index c24611cd8..a5ec4715b 100644
--- a/java/fory-format/src/main/java/org/apache/fory/format/encoder/ArrayEncoderBuilder.java
+++ b/java/fory-format/src/main/java/org/apache/fory/format/encoder/ArrayEncoderBuilder.java
@@ -54,7 +54,10 @@ public class ArrayEncoderBuilder extends BaseBinaryEncoderBuilder {
}
public ArrayEncoderBuilder(TypeRef<?> clsType, TypeRef<?> beanType) {
- super(new CodegenContext(), beanType);
+ // A top-level collection has no enclosing bean, so scope element-codec resolution to Object to
+ // match TypeInference's empty-path enclosing type; beanType still names the element type for
+ // class naming and the empty-array template below.
+ super(new CodegenContext(), beanType, Object.class);
arrayToken = clsType;
ctx.reserveName(ROOT_ARRAY_WRITER_NAME);
ctx.reserveName(ROOT_ARRAY_NAME);
@@ -180,6 +183,7 @@ public class ArrayEncoderBuilder extends BaseBinaryEncoderBuilder {
ArrayDataForEach addElemsOp =
new ArrayDataForEach(
arrayData,
+ codecEnclosingType,
elemType,
(i, value) ->
new Expression.Invoke(
diff --git a/java/fory-format/src/main/java/org/apache/fory/format/encoder/BaseBinaryEncoderBuilder.java b/java/fory-format/src/main/java/org/apache/fory/format/encoder/BaseBinaryEncoderBuilder.java
index a46d8585f..530bf5a93 100644
--- a/java/fory-format/src/main/java/org/apache/fory/format/encoder/BaseBinaryEncoderBuilder.java
+++ b/java/fory-format/src/main/java/org/apache/fory/format/encoder/BaseBinaryEncoderBuilder.java
@@ -107,12 +107,26 @@ public abstract class BaseBinaryEncoderBuilder extends CodecBuilder {
protected final TypeResolutionContext typeCtx;
protected final Reference foryRef = new Reference(FORY_NAME, FORY_TYPE, false);
+ // Enclosing type for custom-codec resolution (findCodec(enclosing, target)), matching how
+ // TypeInference keys the inferred schema. A row encoder uses the bean being generated; a
+ // standalone array/map encoder has no enclosing bean and scopes to Object, matching
+ // TypeInference's empty-path enclosing type for a top-level collection. Keeping these in
+ // agreement prevents a narrowly registered codec from binding to elements the schema never
+ // applied it to.
+ protected final Class<?> codecEnclosingType;
+
public BaseBinaryEncoderBuilder(CodegenContext context, Class<?> beanClass) {
this(context, TypeRef.of(beanClass));
}
public BaseBinaryEncoderBuilder(CodegenContext context, TypeRef<?> beanType) {
+ this(context, beanType, beanType.getRawType());
+ }
+
+ public BaseBinaryEncoderBuilder(
+ CodegenContext context, TypeRef<?> beanType, Class<?> codecEnclosingType) {
super(context, beanType);
+ this.codecEnclosingType = codecEnclosingType;
ctx.reserveName(REFERENCES_NAME);
ctx.reserveName(FORY_NAME);
@@ -172,16 +186,31 @@ public abstract class BaseBinaryEncoderBuilder extends CodecBuilder {
Expression arrowField,
Set<TypeRef<?>> visitedCustomTypes) {
Class<?> rawType = getRawType(typeRef);
- TypeRef<?> rewrittenType = customTypeHandler.replacementTypeFor(beanType.getRawType(), rawType);
+ TypeRef<?> rewrittenType = customTypeHandler.replacementTypeFor(codecEnclosingType, rawType);
if (rewrittenType != null
&& !visitedCustomTypes.contains(typeRef)
&& !typeRef.equals(rewrittenType)) {
- Expression newInputObject = customEncode(inputObject, rewrittenType);
visitedCustomTypes.add(typeRef);
+ if (rawType == Optional.class) {
+ // Codec keyed on Optional owns absence (canonical ordering, see TypeInference.inferField).
+ // Its column is non-nullable, so a null reference must reach the codec as Optional.empty()
+ // rather than a column null bit. Other codecs keep the column-null mapping below.
+ Expression empty = new Expression.StaticInvoke(Optional.class, "empty", "", typeRef, false);
+ Expression normalized =
+ new If(new Expression.IsNull(inputObject), empty, inputObject, false, typeRef);
+ return serializeFor(
+ ordinal,
+ customEncode(normalized, rewrittenType),
+ writer,
+ rewrittenType,
+ fieldIfKnown,
+ arrowField,
+ visitedCustomTypes);
+ }
Expression doSerialize =
serializeFor(
ordinal,
- newInputObject,
+ customEncode(inputObject, rewrittenType),
writer,
rewrittenType,
fieldIfKnown,
@@ -616,11 +645,13 @@ public abstract class BaseBinaryEncoderBuilder extends CodecBuilder {
TypeResolutionContext ctx,
Set<TypeRef<?>> visitedCustomTypes) {
Class<?> rawType = getRawType(typeRef);
- TypeRef<?> rewrittenType = customTypeHandler.replacementTypeFor(beanType.getRawType(), rawType);
+ TypeRef<?> rewrittenType = customTypeHandler.replacementTypeFor(codecEnclosingType, rawType);
if (rewrittenType != null
&& !visitedCustomTypes.contains(typeRef)
&& !typeRef.equals(rewrittenType)) {
visitedCustomTypes.add(typeRef);
+ // Canonical ordering (see TypeInference.inferField): an Optional-keyed codec reconstructs the
+ // Optional in customDecode; the unwrap below is only reached when no codec owns the field.
final Expression deserializedValue =
deserializeFor(value, rewrittenType, ctx, visitedCustomTypes);
return customDecode(typeRef, deserializedValue);
@@ -727,6 +758,7 @@ public abstract class BaseBinaryEncoderBuilder extends CodecBuilder {
if (typeRef.getRawType() == List.class) {
return new LazyArrayData(
arrayData,
+ codecEnclosingType,
elemType,
(i, value) -> deserializeFor(value, elemType, typeCtx, new HashSet<>()),
ExpressionUtils.nullValue(elemType));
@@ -736,6 +768,7 @@ public abstract class BaseBinaryEncoderBuilder extends CodecBuilder {
ArrayDataForEach addElemsOp =
new ArrayDataForEach(
arrayData,
+ codecEnclosingType,
elemType,
(i, value) ->
new Invoke(
@@ -831,6 +864,7 @@ public abstract class BaseBinaryEncoderBuilder extends CodecBuilder {
if (numDimensions == 2) {
return new ArrayDataForEach(
arrayData,
+ codecEnclosingType,
elemType,
(i, value) -> {
Expression[] newIndexes = Arrays.copyOf(indexes, indexes.length + 1);
@@ -842,6 +876,7 @@ public abstract class BaseBinaryEncoderBuilder extends CodecBuilder {
} else {
return new ArrayDataForEach(
arrayData,
+ codecEnclosingType,
elemType,
(i, value) -> {
Expression[] newIndexes = Arrays.copyOf(indexes, indexes.length + 1);
@@ -905,6 +940,7 @@ public abstract class BaseBinaryEncoderBuilder extends CodecBuilder {
ArrayDataForEach op =
new ArrayDataForEach(
arrayData,
+ codecEnclosingType,
elemType,
(i, value) -> {
Expression elemValue = deserializeFor(value, elemType, typeCtx, new HashSet<>());
@@ -945,7 +981,7 @@ public abstract class BaseBinaryEncoderBuilder extends CodecBuilder {
false,
false,
false,
- Literal.ofClass(beanType.getRawType()),
+ Literal.ofClass(codecEnclosingType),
Literal.ofClass(ft.getRawType()));
ctx.addField(true, true, ctx.type(CustomCodec.class), name, init);
return new Reference(name, TypeRef.of(CustomCodec.class));
diff --git a/java/fory-format/src/main/java/org/apache/fory/format/encoder/CustomCodec.java b/java/fory-format/src/main/java/org/apache/fory/format/encoder/CustomCodec.java
index 9e1f74ee5..81c038418 100644
--- a/java/fory-format/src/main/java/org/apache/fory/format/encoder/CustomCodec.java
+++ b/java/fory-format/src/main/java/org/apache/fory/format/encoder/CustomCodec.java
@@ -35,7 +35,17 @@ import org.apache.fory.reflect.TypeRef;
public interface CustomCodec<T, E> {
/**
- * Returns the Fory Field for the given field name.
+ * Returns the Fory Field for the given field name, or null to infer it from {@link
+ * #encodedType()}.
+ *
+ * <p>The returned field overrides only the column's logical type and nullability; its physical
+ * row representation must match what the codec reads and writes through {@link #encodedType()}.
+ * In particular, a fixed-width column (for example {@code int64} or {@code fixedWidthBinary(16)})
+ * must equal the exact byte width {@link #encode} produces, because the compact layout sizes the
+ * column's slot from this declaration. Declaring a width the codec does not honor corrupts the
+ * compact row (the eager row format word-pads fixed slots and can mask it). For example {@link
+ * MemoryBufferCodec} declares a variable {@code binary} column, while a UUID codec writing a
+ * 16-byte buffer may declare {@code fixedWidthBinary(16)}.
*
* @param fieldName the name of the field
* @return the Fory field definition, or null to use default inference
@@ -46,6 +56,13 @@ public interface CustomCodec<T, E> {
E encode(T value);
+ /**
+ * Reconstructs the value from its encoded representation. A codec keyed on {@link
+ * java.util.Optional} owns the field's absence and must return a non-null {@code Optional} (use
+ * {@link java.util.Optional#empty()} for absence); the decoded value is assigned straight to the
+ * Optional field, so returning {@code null} would surface later as a {@code
+ * NullPointerException}.
+ */
T decode(E value);
/** Specialized codec base for encoding and decoding to/from {@link MemoryBuffer}. */
diff --git a/java/fory-format/src/main/java/org/apache/fory/format/encoder/Encoders.java b/java/fory-format/src/main/java/org/apache/fory/format/encoder/Encoders.java
index 4a8c45021..e8ab49cc1 100644
--- a/java/fory-format/src/main/java/org/apache/fory/format/encoder/Encoders.java
+++ b/java/fory-format/src/main/java/org/apache/fory/format/encoder/Encoders.java
@@ -110,6 +110,24 @@ public class Encoders {
/**
* Register a custom codec handling a given type, when it is enclosed in the given beanType.
*
+ * <p>A codec may be keyed on {@link java.util.Optional} itself, not only on an element type. The
+ * row format normally unwraps an {@code Optional<X>} field to a nullable {@code X}, mapping
+ * emptiness to the column's null bit; a codec keyed on {@code X} sees only the present value. A
+ * codec keyed on {@code Optional} instead receives the present and empty cases, so it can encode
+ * the present-vs-empty distinction in-band (for example a sentinel value) into a single
+ * non-nullable column to match an external wire format. Key on the element type unless you need
+ * to control how emptiness itself is encoded.
+ *
+ * <p>For an {@code Optional}-keyed codec a {@code null} field reference is passed to the codec as
+ * {@code Optional.empty()}, so the codec cannot distinguish a null reference from an empty
+ * Optional; both round-trip as empty. The codec's {@code decode} must return a non-null {@code
+ * Optional} for the same reason.
+ *
+ * <p>A {@code beanType}-scoped codec applies to the bean's direct fields and to the elements of
+ * its collection and map fields. It does not apply to the elements of a top-level collection or
+ * map encoder, which has no enclosing bean; register against {@code Object.class} (the
+ * two-argument overload) to handle a type everywhere, including top-level collection elements.
+ *
+ *
* @param beanType the enclosing type to limit this custom codec to
* @param type the type of field to handle
* @param codec the codec to use
diff --git a/java/fory-format/src/main/java/org/apache/fory/format/encoder/LazyArrayData.java b/java/fory-format/src/main/java/org/apache/fory/format/encoder/LazyArrayData.java
index c75f5b222..46d6ac95d 100644
--- a/java/fory-format/src/main/java/org/apache/fory/format/encoder/LazyArrayData.java
+++ b/java/fory-format/src/main/java/org/apache/fory/format/encoder/LazyArrayData.java
@@ -61,9 +61,10 @@ public class LazyArrayData extends AbstractExpression {
*/
public LazyArrayData(
Expression inputArrayData,
+ Class<?> enclosingType,
TypeRef<?> elemType,
SerializableBiFunction<Expression, Expression, Expression> notNullAction) {
- this(inputArrayData, elemType, notNullAction, null);
+ this(inputArrayData, enclosingType, elemType, notNullAction, null);
}
/**
@@ -72,6 +73,7 @@ public class LazyArrayData extends AbstractExpression {
*/
public LazyArrayData(
Expression inputArrayData,
+ Class<?> enclosingType,
TypeRef<?> elemType,
SerializableBiFunction<Expression, Expression, Expression> notNullAction,
Expression nullValue) {
@@ -79,8 +81,12 @@ public class LazyArrayData extends AbstractExpression {
Preconditions.checkArgument(getRawType(inputArrayData.type()) == BinaryArray.class);
this.inputArrayData = inputArrayData;
CustomTypeHandler customTypeHandler = CustomTypeEncoderRegistry.customTypeHandler();
+ // Resolve the element codec scoped to the same enclosing type the schema used: the bean for a
+ // bean's collection field, Object for a top-level collection, falling back to an Object-scoped
+ // registration. A bean-scoped codec applies to that bean's collection elements, not only its
+ // direct fields, but does not bind to a top-level collection the schema never applied it to.
CustomCodec<?, ?> customEncoder =
- customTypeHandler.findCodec(BinaryArray.class, elemType.getRawType());
+ customTypeHandler.findCodec(enclosingType, elemType.getRawType());
TypeRef<?> accessType;
if (customEncoder == null) {
accessType = elemType;
diff --git a/java/fory-format/src/main/java/org/apache/fory/format/encoder/MapEncoderBuilder.java b/java/fory-format/src/main/java/org/apache/fory/format/encoder/MapEncoderBuilder.java
index fa8494418..18abad605 100644
--- a/java/fory-format/src/main/java/org/apache/fory/format/encoder/MapEncoderBuilder.java
+++ b/java/fory-format/src/main/java/org/apache/fory/format/encoder/MapEncoderBuilder.java
@@ -58,7 +58,10 @@ public class MapEncoderBuilder extends BaseBinaryEncoderBuilder {
}
public MapEncoderBuilder(TypeRef<?> clsType, TypeRef<?> beanType) {
- super(new CodegenContext(), beanType);
+ // A top-level map has no enclosing bean, so scope key/value-codec resolution to Object to match
+ // TypeInference's empty-path enclosing type; beanType still names the key/value bean for class
+ // naming and schema generation.
+ super(new CodegenContext(), beanType, Object.class);
mapToken = clsType;
ctx.reserveName(ROOT_KEY_WRITER_NAME);
ctx.reserveName(ROOT_VALUE_WRITER_NAME);
diff --git a/java/fory-format/src/main/java/org/apache/fory/format/encoder/RowEncoderBuilder.java b/java/fory-format/src/main/java/org/apache/fory/format/encoder/RowEncoderBuilder.java
index ea7dc25ec..fe1fab919 100644
--- a/java/fory-format/src/main/java/org/apache/fory/format/encoder/RowEncoderBuilder.java
+++ b/java/fory-format/src/main/java/org/apache/fory/format/encoder/RowEncoderBuilder.java
@@ -306,12 +306,15 @@ class RowEncoderBuilder extends BaseBinaryEncoderBuilder {
Descriptor d = getDescriptorByFieldName(schema.field(i).name());
TypeRef<?> fieldType = d.getTypeRef();
Class<?> rawFieldType = fieldType.getRawType();
+ // Resolve a codec on the raw field type before any Optional unwrap; keep in lockstep with the
+ // canonical ordering in TypeInference.inferField.
TypeRef<?> columnAccessType = fieldType;
- if (rawFieldType == Optional.class) {
+ TypeRef<?> replacementType = customTypeHandler.replacementTypeFor(beanClass, rawFieldType);
+ if (replacementType == null && rawFieldType == Optional.class) {
columnAccessType = TypeUtils.getTypeArguments(fieldType).get(0);
+ replacementType =
+ customTypeHandler.replacementTypeFor(beanClass, columnAccessType.getRawType());
}
- TypeRef<?> replacementType =
- customTypeHandler.replacementTypeFor(beanClass, columnAccessType.getRawType());
if (replacementType != null) {
columnAccessType = replacementType;
}
diff --git a/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/BinaryArrayWriter.java b/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/BinaryArrayWriter.java
index 7aff4a387..2640bb5aa 100644
--- a/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/BinaryArrayWriter.java
+++ b/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/BinaryArrayWriter.java
@@ -148,30 +148,35 @@ public class BinaryArrayWriter extends BinaryWriter {
@Override
public void write(int ordinal, byte value) {
+ checkFixedWidth(ordinal, 1);
setNotNullAt(ordinal);
buffer.putByte(getOffset(ordinal), value);
}
@Override
public void write(int ordinal, boolean value) {
+ checkFixedWidth(ordinal, 1);
setNotNullAt(ordinal);
buffer.putBoolean(getOffset(ordinal), value);
}
@Override
public void write(int ordinal, short value) {
+ checkFixedWidth(ordinal, 2);
setNotNullAt(ordinal);
buffer.putInt16(getOffset(ordinal), value);
}
@Override
public void write(int ordinal, int value) {
+ checkFixedWidth(ordinal, 4);
setNotNullAt(ordinal);
buffer.putInt32(getOffset(ordinal), value);
}
@Override
public void write(int ordinal, float value) {
+ checkFixedWidth(ordinal, 4);
setNotNullAt(ordinal);
buffer.putFloat32(getOffset(ordinal), value);
}
diff --git a/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/BinaryWriter.java b/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/BinaryWriter.java
index 796e95fa5..c580867a3 100644
--- a/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/BinaryWriter.java
+++ b/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/BinaryWriter.java
@@ -148,10 +148,12 @@ public abstract class BinaryWriter {
public abstract void write(int ordinal, BigDecimal input);
public final void write(int ordinal, long value) {
+ checkFixedWidth(ordinal, 8);
buffer.putInt64(getOffset(ordinal), value);
}
public final void write(int ordinal, double value) {
+ checkFixedWidth(ordinal, 8);
buffer.putFloat64(getOffset(ordinal), value);
}
@@ -180,6 +182,16 @@ public abstract class BinaryWriter {
writeAlignedBytes(ordinal, array.getBuffer(), array.getBaseOffset(), array.getSizeInBytes());
}
+ /**
+ * Hook for layouts that size a fixed-width column's slot from its declared schema width: verify
+ * the codec wrote exactly that many bytes. A mismatch means a custom codec's {@code getForyField}
+ * declares a width its {@code encodedType} does not honor, which would overrun the slot or leave
+ * a schema-trusting reader to read stale bytes. The default layouts pad every slot to a word and
+ * have no per-field width to check, so this does nothing; the compact row and array writers
+ * override it.
+ */
+ protected void checkFixedWidth(int ordinal, int writtenBytes) {}
+
/** This operation will increase writerIndex by aligned 8-byte. */
public void writeUnaligned(int ordinal, byte[] input, int offset, int numBytes) {
final int roundedSize = roundNumberOfBytesToNearestWord(numBytes);
diff --git a/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/CompactBinaryArrayWriter.java b/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/CompactBinaryArrayWriter.java
index 3d93881ab..4729689ee 100644
--- a/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/CompactBinaryArrayWriter.java
+++ b/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/CompactBinaryArrayWriter.java
@@ -70,6 +70,21 @@ public class CompactBinaryArrayWriter extends BinaryArrayWriter {
}
}
+ // The compact array sizes every element slot from the element's declared schema width, so a
+ // fixed-slot write must match that width exactly; see BinaryWriter#checkFixedWidth.
+ // Variable-width elements (fixedWidth < 0) store an offset/length pointer and are not checked.
+ @Override
+ protected void checkFixedWidth(final int ordinal, final int written) {
+ assert fixedWidth < 0 || fixedWidth == written
+ : "element "
+ + ((DataTypes.ListType) field.type()).valueField()
+ + " has a "
+ + fixedWidth
+ + "-byte slot but the codec wrote "
+ + written
+ + " bytes; getForyField width must match encodedType";
+ }
+
@Override
protected int writeNumElements() {
buffer.putInt32(startIndex, numElements);
@@ -81,6 +96,35 @@ public class CompactBinaryArrayWriter extends BinaryArrayWriter {
return CompactBinaryArray.calculateHeaderInBytes(fixedWidth, numElements, elementNullable);
}
+ // Binary-valued element codecs write through these paths rather than write(long), so the
+ // fixed-slot width check must live here too; see CompactBinaryRowWriter for the row analogue.
+ @Override
+ public void writeUnaligned(
+ final int ordinal, final byte[] input, final int offset, final int numBytes) {
+ if (fixedWidth > 0) {
+ checkFixedWidth(ordinal, numBytes);
+ }
+ super.writeUnaligned(ordinal, input, offset, numBytes);
+ }
+
+ @Override
+ public void writeUnaligned(
+ final int ordinal, final MemoryBuffer input, final int offset, final int numBytes) {
+ if (fixedWidth > 0) {
+ checkFixedWidth(ordinal, numBytes);
+ }
+ super.writeUnaligned(ordinal, input, offset, numBytes);
+ }
+
+ @Override
+ public void writeAlignedBytes(
+ final int ordinal, final MemoryBuffer input, final int baseOffset, final int numBytes) {
+ if (fixedWidth > 0) {
+ checkFixedWidth(ordinal, numBytes);
+ }
+ super.writeAlignedBytes(ordinal, input, baseOffset, numBytes);
+ }
+
@Override
protected void primitiveArrayAdvance(final int size) {
buffer._increaseWriterIndexUnsafe(size);
diff --git a/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/CompactBinaryRowWriter.java b/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/CompactBinaryRowWriter.java
index ed65775a7..c91586675 100644
--- a/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/CompactBinaryRowWriter.java
+++ b/java/fory-format/src/main/java/org/apache/fory/format/row/binary/writer/CompactBinaryRowWriter.java
@@ -245,32 +245,54 @@ public class CompactBinaryRowWriter extends BaseBinaryRowWriter {
return super.isNullAt(ordinal);
}
+ // The compact layout sizes each fixed slot from its declared schema width, so a fixed-slot write
+ // must match that width exactly. This cannot be checked statically -- a MemoryBuffer's length is
+ // not in the Java type system -- so assert it here. Variable-length columns (fixedWidths == -1)
+ // store an offset/length pointer and are not checked.
+ @Override
+ protected void checkFixedWidth(final int ordinal, final int written) {
+ final int slot = layout.fixedWidths[ordinal];
+ assert slot < 0 || slot == written
+ : "field "
+ + getSchema().field(ordinal)
+ + " has a "
+ + slot
+ + "-byte slot but the codec wrote "
+ + written
+ + " bytes; getForyField width must match encodedType";
+ }
+
@Override
public void write(final int ordinal, final byte value) {
+ checkFixedWidth(ordinal, 1);
final int offset = getOffset(ordinal);
buffer.putByte(offset, value);
}
@Override
public void write(final int ordinal, final boolean value) {
+ checkFixedWidth(ordinal, 1);
final int offset = getOffset(ordinal);
buffer.putBoolean(offset, value);
}
@Override
public void write(final int ordinal, final short value) {
+ checkFixedWidth(ordinal, 2);
final int offset = getOffset(ordinal);
buffer.putInt16(offset, value);
}
@Override
public void write(final int ordinal, final int value) {
+ checkFixedWidth(ordinal, 4);
final int offset = getOffset(ordinal);
buffer.putInt32(offset, value);
}
@Override
public void write(final int ordinal, final float value) {
+ checkFixedWidth(ordinal, 4);
final int offset = getOffset(ordinal);
buffer.putFloat32(offset, value);
}
@@ -280,6 +302,7 @@ public class CompactBinaryRowWriter extends BaseBinaryRowWriter {
final int ordinal, final byte[] input, final int offset, final int numBytes) {
final int inlineWidth = layout.fixedWidths[ordinal];
if (inlineWidth > 0) {
+ checkFixedWidth(ordinal, numBytes);
buffer.put(getOffset(ordinal), input, offset, numBytes);
} else {
super.writeUnaligned(ordinal, input, offset, numBytes);
@@ -291,7 +314,7 @@ public class CompactBinaryRowWriter extends BaseBinaryRowWriter {
final int ordinal, final MemoryBuffer input, final int offset, final int numBytes) {
final int inlineWidth = layout.fixedWidths[ordinal];
if (inlineWidth > 0) {
- assert inlineWidth == numBytes;
+ checkFixedWidth(ordinal, numBytes);
buffer.copyFrom(getOffset(ordinal), input, offset, numBytes);
} else {
super.writeUnaligned(ordinal, input, offset, numBytes);
@@ -303,6 +326,7 @@ public class CompactBinaryRowWriter extends BaseBinaryRowWriter {
final int ordinal, final MemoryBuffer input, final int baseOffset, final int numBytes) {
final int inlineWidth = layout.fixedWidths[ordinal];
if (inlineWidth > 0) {
+ checkFixedWidth(ordinal, numBytes);
buffer.copyFrom(getOffset(ordinal), input, baseOffset, numBytes);
} else {
super.writeAlignedBytes(ordinal, input, baseOffset, numBytes);
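The width check added above can be illustrated outside Fory with a minimal, self-contained sketch. `FixedWidthSlotSketch` and its methods are hypothetical names, not part of the Fory API, and the sketch throws `IllegalStateException` where the commit uses a Java `assert`, so that it behaves the same with or without `-ea`.

```java
/** Hypothetical stand-in for a compact row layout; this is not Fory's CompactBinaryRowWriter. */
final class FixedWidthSlotSketch {
  /** Declared byte width per field; a negative value marks a variable-length column. */
  private final int[] fixedWidths;

  public FixedWidthSlotSketch(final int... fixedWidths) {
    this.fixedWidths = fixedWidths.clone();
  }

  /** True when the slot is variable-length or the write fills the slot exactly. */
  public boolean widthMatches(final int ordinal, final int written) {
    final int slot = fixedWidths[ordinal];
    return slot < 0 || slot == written;
  }

  /** Rejects a fixed-slot write whose size differs from the declared width. */
  public void checkFixedWidth(final int ordinal, final int written) {
    if (!widthMatches(ordinal, written)) {
      throw new IllegalStateException(
          "field " + ordinal + " has a " + fixedWidths[ordinal]
              + "-byte slot but " + written + " bytes were written");
    }
  }
}
```

With a layout of `(4, -1, 16)`, an 8-byte write to field 0 or field 2 is rejected, while any write size to the variable-length field 1 passes, mirroring the `slot < 0 || slot == written` condition in the diff.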
diff --git a/java/fory-format/src/main/java/org/apache/fory/format/type/TypeInference.java b/java/fory-format/src/main/java/org/apache/fory/format/type/TypeInference.java
index 4617f04fa..1f3843f43 100644
--- a/java/fory-format/src/main/java/org/apache/fory/format/type/TypeInference.java
+++ b/java/fory-format/src/main/java/org/apache/fory/format/type/TypeInference.java
@@ -148,15 +148,16 @@ public class TypeInference {
Class<?> enclosingType = ctx.getEnclosingType().getRawType();
CustomCodec<?, ?> customEncoder =
((CustomTypeHandler) ctx.getCustomTypeRegistry()).findCodec(enclosingType, rawType);
- if (rawType == Optional.class) {
- TypeRef<?> elemType = TypeUtils.getTypeArguments(typeRef).get(0);
- Field result = inferField(name, elemType, ctx);
- if (result.nullable()) {
- return result;
- }
- // Make it nullable
- return result.withNullable(true);
- } else if (customEncoder != null) {
+ // A codec keyed on the field's raw type wins before structural unwrapping. For Optional this
+ // means a codec registered on Optional itself owns the whole field, including absence, and
+ // encodes it in-band into a non-nullable column; only when no such codec exists does Optional
+ // unwrap to a nullable element. A codec keyed on the element type X (not Optional) is found
+ // below, after the unwrap, so element codecs keep working unchanged inside an Optional.
+ //
+ // This is the canonical ordering. RowEncoderBuilder, serializeFor, and deserializeFor mirror
+ // it; keep all four in lockstep, since they each resolve the codec on a different
+ // representation (inferred type, field walk, encode codegen, decode codegen).
+ if (customEncoder != null) {
Field replacementField = customEncoder.getForyField(name);
if (replacementField != null) {
return replacementField;
@@ -165,6 +166,14 @@ public class TypeInference {
if (replacementType != null && !typeRef.equals(replacementType)) {
return inferField(name, replacementType, ctx);
}
+ } else if (rawType == Optional.class) {
+ TypeRef<?> elemType = TypeUtils.getTypeArguments(typeRef).get(0);
+ Field result = inferField(name, elemType, ctx);
+ if (result.nullable()) {
+ return result;
+ }
+ // Make it nullable
+ return result.withNullable(true);
}
if (rawType == boolean.class) {
return DataTypes.notNullField(name, DataTypes.bool());
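The reordered lookup in this hunk can be sketched in isolation. The class below is a hypothetical illustration, not Fory code: it keys codecs on the raw type only (ignoring the enclosing-type scope the real registry uses) and returns a descriptive string instead of a `Field`, purely to make the resolution order visible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of "codec on the raw type first, Optional unwrap second". */
final class CodecResolutionSketch {
  private final Map<Class<?>, String> codecs = new HashMap<>();

  public void register(final Class<?> rawType, final String codecName) {
    codecs.put(rawType, codecName);
  }

  /**
   * Describes how a field would be resolved. elementType is only consulted when rawType is
   * Optional and no codec is registered on Optional itself.
   */
  public String resolve(final Class<?> rawType, final Class<?> elementType) {
    final String codec = codecs.get(rawType);
    if (codec != null) {
      // A codec keyed on the raw type owns the whole field, including absence.
      return "codec:" + codec;
    }
    if (rawType == Optional.class) {
      // No Optional-keyed codec: unwrap to the element and mark the column nullable.
      return "nullable(" + resolve(elementType, null) + ")";
    }
    return "plain:" + rawType.getSimpleName();
  }
}
```

Registering a codec on the element type alone still yields a nullable column wrapping that codec, while registering one on `Optional` short-circuits the unwrap, which is the behavior-preserving property the commit message describes.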
diff --git a/java/fory-format/src/test/java/org/apache/fory/format/encoder/CompactCodecTest.java b/java/fory-format/src/test/java/org/apache/fory/format/encoder/CompactCodecTest.java
index 60c3f0719..763c91518 100644
--- a/java/fory-format/src/test/java/org/apache/fory/format/encoder/CompactCodecTest.java
+++ b/java/fory-format/src/test/java/org/apache/fory/format/encoder/CompactCodecTest.java
@@ -48,6 +48,7 @@ import org.apache.fory.memory.MemoryBuffer;
import org.apache.fory.memory.MemoryUtils;
import org.apache.fory.reflect.TypeRef;
import org.testng.Assert;
+import org.testng.SkipException;
import org.testng.annotations.Test;
public class CompactCodecTest {
@@ -55,6 +56,131 @@ public class CompactCodecTest {
static {
Encoders.registerCustomCodec(UUID.class, new CompactUUIDCodec());
Encoders.registerCustomCodec(NotNullByte.class, new NotNullByteCodec());
+ Encoders.registerCustomCodec(WidthMismatchType.class, Instant.class, new WidthMismatchCodec());
+ Encoders.registerCustomCodec(
+ WidthMismatchCollection.class, Instant.class, new WidthMismatchCodec());
+ Encoders.registerCustomCodec(
+ WidthMismatchBinaryCollection.class, UUID.class, new WidthMismatchBinaryCodec());
+ }
+
+ @Data
+ public static class WidthMismatchType {
+ public Instant t;
+
+ public WidthMismatchType() {}
+ }
+
+ @Data
+ public static class WidthMismatchCollection {
+ public List<Instant> items;
+
+ public WidthMismatchCollection() {}
+ }
+
+ @Data
+ public static class WidthMismatchBinaryCollection {
+ public List<UUID> items;
+
+ public WidthMismatchBinaryCollection() {}
+ }
+
+ // A misconfigured fixed-width-binary element codec: getForyField declares a 16-byte slot but
+ // encode() writes an 8-byte MemoryBuffer, so it routes through the binary write path
+ // (writeUnaligned(MemoryBuffer)) rather than write(long). The compact array writer's
+ // checkFixedWidth must catch this on the binary path too, not only on the primitive path.
+ static class WidthMismatchBinaryCodec implements CustomCodec.MemoryBufferCodec<UUID> {
+ @Override
+ public Field getForyField(final String fieldName) {
+ return DataTypes.field(fieldName, DataTypes.fixedWidthBinary(16));
+ }
+
+ @Override
+ public MemoryBuffer encode(final UUID value) {
+ final MemoryBuffer result = MemoryBuffer.newHeapBuffer(8);
+ result.putInt64(0, value.getMostSignificantBits());
+ return result;
+ }
+
+ @Override
+ public UUID decode(final MemoryBuffer value) {
+ return new UUID(value.readInt64(), 0);
+ }
+ }
+
+ // A misconfigured codec: getForyField declares int32 (a 4-byte compact slot) but encodes a Long,
+ // so encode() writes 8 bytes into a 4-byte slot. The compact writer's checkFixedWidth must catch
+ // this rather than silently overrunning the slot.
+ static class WidthMismatchCodec implements CustomCodec<Instant, Long> {
+ @Override
+ public Field getForyField(final String fieldName) {
+ return DataTypes.field(fieldName, DataTypes.int32());
+ }
+
+ @Override
+ public Long encode(final Instant value) {
+ return value.toEpochMilli();
+ }
+
+ @Override
+ public Instant decode(final Long value) {
+ return Instant.ofEpochMilli(value);
+ }
+
+ @Override
+ public TypeRef<Long> encodedType() {
+ return TypeRef.of(Long.class);
+ }
+ }
+
+ // The width traps are asserts (zero-cost in production), so they only fire with -ea. Surefire
+ // enables assertions by default; skip rather than spuriously fail if run without them.
+ private static void requireAssertions() {
+ boolean assertionsEnabled = false;
+ assert assertionsEnabled = true;
+ if (!assertionsEnabled) {
+ throw new SkipException("requires -ea: width trap is assertion-based");
+ }
+ }
+
+ // The width mismatch corrupts the compact layout (the slot is sized from the declared width);
+ // checkFixedWidth asserts the codec wrote exactly the declared width.
+ @Test(expectedExceptions = AssertionError.class)
+ public void widthMismatchOnCompactWrite() {
+ requireAssertions();
+ final WidthMismatchType bean = new WidthMismatchType();
+ bean.t = Instant.ofEpochMilli(1_700_000_000_000L);
+ Encoders.buildBeanCodec(WidthMismatchType.class).compactEncoding().build().get().toRow(bean);
+ }
+
+ // The same guard must cover compact collection elements, where each slot is sized from the
+ // element's declared width. A misconfigured element codec writing the wrong width must trip the
+ // assert on the array write path, not just the row write path.
+ @Test(expectedExceptions = AssertionError.class)
+ public void widthMismatchOnCompactArrayElement() {
+ requireAssertions();
+ final WidthMismatchCollection bean = new WidthMismatchCollection();
+ bean.items = Arrays.asList(Instant.ofEpochMilli(1_700_000_000_000L));
+ Encoders.buildBeanCodec(WidthMismatchCollection.class)
+ .compactEncoding()
+ .build()
+ .get()
+ .toRow(bean);
+ }
+
+ // Same guard on the binary write path: a fixed-width-binary element codec that encodes the wrong
+ // number of bytes routes through writeUnaligned(MemoryBuffer), not write(long). The compact array
+ // writer must check the width there too, otherwise an 8-byte write silently underfills a 16-byte
+ // element slot.
+ @Test(expectedExceptions = AssertionError.class)
+ public void widthMismatchOnCompactArrayBinaryElement() {
+ requireAssertions();
+ final WidthMismatchBinaryCollection bean = new WidthMismatchBinaryCollection();
+ bean.items = Arrays.asList(new UUID(1, 2));
+ Encoders.buildBeanCodec(WidthMismatchBinaryCollection.class)
+ .compactEncoding()
+ .build()
+ .get()
+ .toRow(bean);
}
@Data
diff --git a/java/fory-format/src/test/java/org/apache/fory/format/encoder/CustomCodecTest.java b/java/fory-format/src/test/java/org/apache/fory/format/encoder/CustomCodecTest.java
index 7ea0d0d36..247dd9018 100644
--- a/java/fory-format/src/test/java/org/apache/fory/format/encoder/CustomCodecTest.java
+++ b/java/fory-format/src/test/java/org/apache/fory/format/encoder/CustomCodecTest.java
@@ -20,10 +20,18 @@
package org.apache.fory.format.encoder;
import java.nio.charset.StandardCharsets;
+import java.time.Instant;
import java.time.ZoneId;
import java.util.Arrays;
+import java.util.Collections;
import java.util.Comparator;
+import java.util.HashSet;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.UUID;
@@ -32,6 +40,7 @@ import org.apache.fory.format.row.binary.BinaryArray;
import org.apache.fory.format.row.binary.BinaryRow;
import org.apache.fory.format.type.DataTypes;
import org.apache.fory.format.type.Field;
+import org.apache.fory.format.type.TypeInference;
import org.apache.fory.memory.MemoryBuffer;
import org.apache.fory.memory.MemoryUtils;
import org.apache.fory.reflect.TypeRef;
@@ -42,11 +51,25 @@ public class CustomCodecTest {
static {
Encoders.registerCustomCodec(CustomType.class, ZoneId.class, new ZoneIdEncoder());
+ Encoders.registerCustomCodec(OptionalCodecType.class, ZoneId.class, new ZoneIdEncoder());
+ Encoders.registerCustomCodec(InstantHolder.class, Optional.class, new InstantOptionalCodec());
+ Encoders.registerCustomCodec(
+ InstantHolderIface.class, Optional.class, new InstantOptionalCodec());
+ Encoders.registerCustomCodec(
+ InstantCollections.class, Optional.class, new InstantOptionalCodec());
+ Encoders.registerCustomCodec(
+ InstantNoteHolder.class, Optional.class, new InstantOptionalCodec());
Encoders.registerCustomCodec(CustomByteBuf.class, new CustomByteBufEncoder());
Encoders.registerCustomCodec(CustomByteBuf2.class, new CustomByteBuf2Encoder());
Encoders.registerCustomCodec(CustomByteBuf3.class, new CustomByteBuf3Encoder());
Encoders.registerCustomCodec(UUID.class, new UuidEncoder());
Encoders.registerCustomCodec(InterceptedType.class, new InterceptedTypeEncoder());
+ // Scoped only to its enclosing types, never to Object, so a top-level List<ScopedElement> must
+ // not pick it up while a ScopedElement bean field does.
+ Encoders.registerCustomCodec(
+ ScopedElement.class, ScopedElement.class, new ScopedElementEncoder());
+ Encoders.registerCustomCodec(
+ ScopedElementHolder.class, ScopedElement.class, new ScopedElementEncoder());
Encoders.registerCustomCollectionFactory(
SortedSet.class, UUID.class, new SortedSetOfUuidDecoder());
Encoders.registerCustomCollectionFactory(
@@ -107,6 +130,306 @@ public class CustomCodecTest {
public TwoSortedSetsType() {}
}
+ /** A custom-codec'd element type wrapped in Optional, the untested intersection. */
+ @Data
+ public static class OptionalCodecType {
+ public Optional<ZoneId> zone;
+
+ public OptionalCodecType() {}
+ }
+
+ /** An Optional field whose codec is keyed on Optional itself, owning the absence encoding. */
+ @Data
+ public static class InstantHolder {
+ public Optional<Instant> at;
+
+ public InstantHolder() {}
+ }
+
+ /** Interface form of {@link InstantHolder}, decoded through the lazy interface-impl path. */
+ public interface InstantHolderIface {
+ Optional<Instant> getAt();
+ }
+
+ /**
+ * An Optional-keyed non-nullable column ({@code at}) beside a genuinely nullable column ({@code
+ * note}). Because not every field is non-nullable, the compact layout's {@code
+ * allFieldsNotNullable} shortcut is off and {@code setNullAt} no longer throws on any null: a
+ * null {@code at} reference must be caught by serialize-side normalization to Optional.empty(),
+ * not by the writer's all-not-nullable guard.
+ */
+ @Data
+ public static class InstantNoteHolder {
+ public Optional<Instant> at;
+ public String note;
+
+ public InstantNoteHolder() {}
+ }
+
+ /**
+ * A bean-scoped Optional codec applied to collection elements: a {@code List} (lazy decode), a
+ * {@code Set} (eager {@code ArrayDataForEach} decode), a nested list, and a {@code Map} whose
+ * keys and values are both codec-owned (key/value arrays decode through the same path).
+ */
+ @Data
+ public static class InstantCollections {
+ public List<Optional<Instant>> list;
+ public Set<Optional<Instant>> set;
+ public List<List<Optional<Instant>>> nested;
+ public Map<Optional<Instant>, Optional<Instant>> map;
+
+ public InstantCollections() {}
+ }
+
+ // Element-codec-inside-Optional (codec keyed on ZoneId, not Optional). This already worked
+ // before Optional-keyed support; these two guard that the inferField reorder keeps it working.
+ // The Optional-keyed behavior itself is covered by the optionalKeyedCodec* tests below.
+ @Test
+ public void optionalCustomCodecPresent() {
+ final OptionalCodecType bean = new OptionalCodecType();
+ bean.zone = Optional.of(ZoneId.of("America/Los_Angeles"));
+ final RowEncoder<OptionalCodecType> encoder = Encoders.bean(OptionalCodecType.class);
+ final BinaryRow row = encoder.toRow(bean);
+ final MemoryBuffer buffer = MemoryUtils.wrap(row.toBytes());
+ row.pointTo(buffer, 0, buffer.size());
+ final OptionalCodecType out = encoder.fromRow(row);
+ Assert.assertEquals(out.zone, Optional.of(ZoneId.of("America/Los_Angeles")));
+ }
+
+ @Test
+ public void optionalCustomCodecEmpty() {
+ final OptionalCodecType bean = new OptionalCodecType();
+ bean.zone = Optional.empty();
+ final RowEncoder<OptionalCodecType> encoder = Encoders.bean(OptionalCodecType.class);
+ final BinaryRow row = encoder.toRow(bean);
+ final MemoryBuffer buffer = MemoryUtils.wrap(row.toBytes());
+ row.pointTo(buffer, 0, buffer.size());
+ final OptionalCodecType out = encoder.fromRow(row);
+ Assert.assertEquals(out.zone, Optional.empty());
+ }
+
+ // A codec keyed on Optional itself owns the whole field, including absence, and encodes
+ // optionality in-band into a single non-nullable column. Here Optional<Instant> maps to one
+ // int64: empty -> Long.MIN_VALUE, present -> epoch millis. An element codec on Instant could
+ // not express this, because it never sees the empty case nor controls the absence encoding.
+ @Test
+ public void optionalKeyedCodecPresent() {
+ final InstantHolder bean = new InstantHolder();
+ bean.at = Optional.of(Instant.ofEpochMilli(1_700_000_000_000L));
+ final RowEncoder<InstantHolder> encoder = Encoders.bean(InstantHolder.class);
+ final InstantHolder out = roundTrip(encoder, bean);
+ Assert.assertEquals(out.at, Optional.of(Instant.ofEpochMilli(1_700_000_000_000L)));
+ }
+
+ @Test
+ public void optionalKeyedCodecEmpty() {
+ final InstantHolder bean = new InstantHolder();
+ bean.at = Optional.empty();
+ final RowEncoder<InstantHolder> encoder = Encoders.bean(InstantHolder.class);
+ final InstantHolder out = roundTrip(encoder, bean);
+ Assert.assertEquals(out.at, Optional.empty());
+ }
+
+ /**
+ * A null Optional reference (not Optional.empty()) is normalized to empty before the codec runs,
+ * so it round-trips as empty: the codec cannot distinguish the two.
+ */
+ @Test
+ public void optionalKeyedCodecNullReference() {
+ final InstantHolder bean = new InstantHolder();
+ bean.at = null;
+ final RowEncoder<InstantHolder> encoder = Encoders.bean(InstantHolder.class);
+ final InstantHolder out = roundTrip(encoder, bean);
+ Assert.assertEquals(out.at, Optional.empty());
+ }
+
+ /** The in-band optionality column is non-nullable: the codec, not the format, owns absence. */
+ @Test
+ public void optionalKeyedCodecColumnNullability() {
+ final Field at = TypeInference.inferSchema(InstantHolder.class).getFieldByName("at");
+ Assert.assertNotNull(at);
+ Assert.assertFalse(
+ at.nullable(), "Optional-keyed in-band codec must produce a non-nullable column");
+ }
+
+ // The Optional-keyed codec owns a non-nullable column, so a null reference cannot be encoded as a
+ // column null bit. The compact writer enforces this by rejecting setNullAt on a non-nullable
+ // column; these cases also exercise the lazy interface-impl decode path alongside the default
+ // eager-row class path.
+
+ @Test
+ public void optionalKeyedCodecCompactPresent() {
+ final InstantHolder bean = new InstantHolder();
+ bean.at = Optional.of(Instant.ofEpochMilli(1_700_000_000_000L));
+ final RowEncoder<InstantHolder> encoder =
+ Encoders.buildBeanCodec(InstantHolder.class).compactEncoding().build().get();
+ final InstantHolder out = roundTrip(encoder, bean);
+ Assert.assertEquals(out.at, Optional.of(Instant.ofEpochMilli(1_700_000_000_000L)));
+ }
+
+ @Test
+ public void optionalKeyedCodecCompactEmpty() {
+ final InstantHolder bean = new InstantHolder();
+ bean.at = Optional.empty();
+ final RowEncoder<InstantHolder> encoder =
+ Encoders.buildBeanCodec(InstantHolder.class).compactEncoding().build().get();
+ final InstantHolder out = roundTrip(encoder, bean);
+ Assert.assertEquals(out.at, Optional.empty());
+ }
+
+ @Test
+ public void optionalKeyedCodecCompactNullReference() {
+ final InstantHolder bean = new InstantHolder();
+ bean.at = null;
+ final RowEncoder<InstantHolder> encoder =
+ Encoders.buildBeanCodec(InstantHolder.class).compactEncoding().build().get();
+ final InstantHolder out = roundTrip(encoder, bean);
+ Assert.assertEquals(out.at, Optional.empty());
+ }
+
+ /**
+ * A null Optional-keyed reference in a bean that also has a nullable column. The nullable {@code
+ * note} field turns off the compact writer's all-not-nullable guard, so {@code setNullAt} no
+ * longer throws on the non-nullable {@code at} column. The assertion is on the encoded row's null
+ * bits, not the decoded value: routing absence through the column null bit also decodes back to
+ * Optional.empty(), so only the wire state distinguishes correct in-band normalization (the
+ * {@code at} column is not null and carries the codec's empty sentinel) from the regression
+ * ({@code at} set to the column null bit). {@code note} is genuinely null, confirming the guard
+ * is off.
+ */
+ @Test
+ public void optionalKeyedCodecCompactMixedNullability() {
+ final InstantNoteHolder bean = new InstantNoteHolder();
+ bean.at = null;
+ bean.note = null;
+ final RowEncoder<InstantNoteHolder> encoder =
+ Encoders.buildBeanCodec(InstantNoteHolder.class).compactEncoding().build().get();
+ final BinaryRow row = encoder.toRow(bean);
+ final int atOrdinal = encoder.schema().getFieldIndex("at");
+ final int noteOrdinal = encoder.schema().getFieldIndex("note");
+ Assert.assertFalse(
+ row.isNullAt(atOrdinal),
+ "Optional-keyed absence must be encoded in-band, not via the column null bit");
+ Assert.assertTrue(row.isNullAt(noteOrdinal), "nullable note must use the column null bit");
+ final InstantNoteHolder out = roundTrip(encoder, bean);
+ Assert.assertEquals(out.at, Optional.empty());
+ Assert.assertNull(out.note);
+ }
+
+ @Test
+ public void optionalKeyedCodecInterfacePresent() {
+ final InstantHolder bean = new InstantHolder();
+ bean.at = Optional.of(Instant.ofEpochMilli(1_700_000_000_000L));
+ final InstantHolderIface out = encodeAsIface(bean);
+ Assert.assertEquals(out.getAt(), Optional.of(Instant.ofEpochMilli(1_700_000_000_000L)));
+ }
+
+ @Test
+ public void optionalKeyedCodecInterfaceEmpty() {
+ final InstantHolder bean = new InstantHolder();
+ bean.at = Optional.empty();
+ final InstantHolderIface out = encodeAsIface(bean);
+ Assert.assertEquals(out.getAt(), Optional.empty());
+ }
+
+ @Test
+ public void optionalKeyedCodecInterfaceNullReference() {
+ final InstantHolder bean = new InstantHolder();
+ bean.at = null;
+ final InstantHolderIface out = encodeAsIface(bean);
+ Assert.assertEquals(out.getAt(), Optional.empty());
+ }
+
+ // A bean-scoped codec applies to the bean's collection elements, not only its direct fields.
+ // The List path decodes lazily (LazyArrayData); the Set path decodes eagerly (ArrayDataForEach).
+
+ @Test
+ public void optionalKeyedCodecInList() {
+ final InstantCollections bean = new InstantCollections();
+ bean.list =
+ Arrays.asList(
+ Optional.of(Instant.ofEpochMilli(7)), Optional.empty(), Optional.of(Instant.EPOCH));
+ final RowEncoder<InstantCollections> encoder = Encoders.bean(InstantCollections.class);
+ final InstantCollections out = roundTrip(encoder, bean);
+ Assert.assertEquals(out.list, bean.list);
+ }
+
+ @Test
+ public void optionalKeyedCodecInSet() {
+ final InstantCollections bean = new InstantCollections();
+ bean.set = new HashSet<>(Arrays.asList(Optional.of(Instant.ofEpochMilli(9)), Optional.empty()));
+ final RowEncoder<InstantCollections> encoder = Encoders.bean(InstantCollections.class);
+ final InstantCollections out = roundTrip(encoder, bean);
+ Assert.assertEquals(out.set, bean.set);
+ }
+
+ @Test
+ public void optionalKeyedCodecInNestedList() {
+ final InstantCollections bean = new InstantCollections();
+ bean.nested =
+ Arrays.asList(
+ Arrays.asList(Optional.of(Instant.ofEpochMilli(1)), Optional.empty()),
+ Collections.singletonList(Optional.empty()));
+ final RowEncoder<InstantCollections> encoder = Encoders.bean(InstantCollections.class);
+ final InstantCollections out = roundTrip(encoder, bean);
+ Assert.assertEquals(out.nested, bean.nested);
+ }
+
+ @Test
+ public void optionalKeyedCodecInMap() {
+ final InstantCollections bean = new InstantCollections();
+ // A map key cannot be empty (keys are non-nullable), but a value can.
+ bean.map = new LinkedHashMap<>();
+ bean.map.put(Optional.of(Instant.ofEpochMilli(3)), Optional.of(Instant.ofEpochMilli(4)));
+ bean.map.put(Optional.of(Instant.ofEpochMilli(5)), Optional.empty());
+ final RowEncoder<InstantCollections> encoder = Encoders.bean(InstantCollections.class);
+ final InstantCollections out = roundTrip(encoder, bean);
+ Assert.assertEquals(out.map, bean.map);
+ }
+
+ // Codec-resolution scope for collection elements. A codec scoped to an enclosing bean reaches
+ // that bean's collection/field elements, but a top-level collection encoder has no enclosing
+ // bean,
+ // so it resolves elements against Object (matching the inferred schema) and a narrowly-scoped
+ // codec must not bind. Otherwise codegen would apply a codec the schema never did, corrupting the
+ // row.
+
+ @Test
+ public void scopedCodecAppliesToBeanField() {
+ final ScopedElementHolder bean = new ScopedElementHolder();
+ bean.e = new ScopedElement(1);
+ final RowEncoder<ScopedElementHolder> encoder = Encoders.bean(ScopedElementHolder.class);
+ final ScopedElementHolder out = roundTrip(encoder, bean);
+ // encode adds 100, decode is identity: the codec ran.
+ Assert.assertEquals(out.e.v, 101);
+ }
+
+ @Test
+ public void scopedCodecSkipsTopLevelCollection() {
+ final ArrayEncoder<List<ScopedElement>> encoder =
+ Encoders.arrayEncoder(new TypeRef<List<ScopedElement>>() {});
+ final List<ScopedElement> in = Arrays.asList(new ScopedElement(1), new ScopedElement(2));
+ final BinaryArray array = encoder.toArray(in);
+ final List<ScopedElement> out = encoder.fromArray(array);
+ // The narrowly-scoped codec is not enclosed by any bean here, so elements round-trip unchanged.
+ Assert.assertEquals(out, in);
+ }
+
+ /** Encode a concrete {@link InstantHolder} and decode it through the interface impl path. */
+ private static InstantHolderIface encodeAsIface(InstantHolder bean) {
+ final BinaryRow row = Encoders.bean(InstantHolder.class).toRow(bean);
+ final MemoryBuffer buffer = MemoryUtils.wrap(row.toBytes());
+ row.pointTo(buffer, 0, buffer.size());
+ return Encoders.bean(InstantHolderIface.class).fromRow(row);
+ }
+
+ private static <T> T roundTrip(RowEncoder<T> encoder, T bean) {
+ final BinaryRow row = encoder.toRow(bean);
+ final MemoryBuffer buffer = MemoryUtils.wrap(row.toBytes());
+ row.pointTo(buffer, 0, buffer.size());
+ return encoder.fromRow(row);
+ }
+
@Test
public void testCustomTypes() {
final CustomType bean = new CustomType();
@@ -187,6 +510,36 @@ public class CustomCodecTest {
}
}
+ /**
+ * Encodes {@code Optional<Instant>} in-band: empty becomes {@code Long.MIN_VALUE}, present
+ * becomes epoch millis, in a single non-nullable int64 column. Keyed on Optional so it owns
+ * absence; an element codec on Instant could not, as it never sees the empty case.
+ */
+ @SuppressWarnings("rawtypes") // keyed on Optional.class, whose Class<Optional> is raw
+ static class InstantOptionalCodec implements CustomCodec<Optional, Long> {
+ private static final long ABSENT = Long.MIN_VALUE;
+
+ @Override
+ public Field getForyField(final String fieldName) {
+ return DataTypes.notNullField(fieldName, DataTypes.int64());
+ }
+
+ @Override
+ public Long encode(final Optional value) {
+ return value.isPresent() ? ((Instant) value.get()).toEpochMilli() : ABSENT;
+ }
+
+ @Override
+ public Optional<Instant> decode(final Long value) {
+ return value == ABSENT ? Optional.empty() : Optional.of(Instant.ofEpochMilli(value));
+ }
+
+ @Override
+ public TypeRef<Long> encodedType() {
+ return TypeRef.of(Long.class);
+ }
+ }
+
static class CustomByteBufEncoder implements CustomCodec.MemoryBufferCodec<CustomByteBuf> {
@Override
public MemoryBuffer encode(final CustomByteBuf value) {
@@ -271,6 +624,58 @@ public class CustomCodecTest {
}
}
+ /**
+ * A bean-scoped intercepting codec used to prove codec-resolution scope. Registered only on
+ * enclosing type {@link ScopedElement} itself, never on {@code Object}, so it must reach a {@code
+ * ScopedElement} field of that bean but must NOT reach the elements of a top-level {@code
+ * List<ScopedElement>} encoder, which has no enclosing bean.
+ */
+ @Data
+ public static class ScopedElement {
+ public int v;
+
+ public ScopedElement() {}
+
+ public ScopedElement(final int v) {
+ this.v = v;
+ }
+ }
+
+ /** A bean carrying a {@link ScopedElement} field, so the codec's enclosing type is itself. */
+ @Data
+ public static class ScopedElementHolder {
+ public ScopedElement e;
+
+ public ScopedElementHolder() {}
+ }
+
+ // Encodes to a terminal Integer (not back to the bean, which would make schema inference
+ // recurse).
+ // encode adds 100 and decode does not subtract, so a round trip nets +100 IFF the codec was
+ // applied; when it is not applied ScopedElement round-trips structurally with v unchanged. This
+ // makes codec-resolution scope observable from behavior.
+ static class ScopedElementEncoder implements CustomCodec<ScopedElement, Integer> {
+ @Override
+ public Field getForyField(final String fieldName) {
+ return DataTypes.field(fieldName, DataTypes.int32());
+ }
+
+ @Override
+ public TypeRef<Integer> encodedType() {
+ return TypeRef.of(Integer.class);
+ }
+
+ @Override
+ public Integer encode(final ScopedElement value) {
+ return value.v + 100;
+ }
+
+ @Override
+ public ScopedElement decode(final Integer value) {
+ return new ScopedElement(value);
+ }
+ }
+
public interface InterceptedType {
int f1();
}
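The in-band sentinel encoding used by `InstantOptionalCodec` can be tried without Fory on the classpath. The sketch below is a standalone approximation: `SentinelOptionalSketch` is a hypothetical name, and it folds the null-to-empty normalization into `encode`, whereas the commit message says that normalization happens in `serializeFor` before the codec runs.

```java
import java.time.Instant;
import java.util.Optional;

/** Hypothetical standalone version of the sentinel idea; it does not implement CustomCodec. */
final class SentinelOptionalSketch {
  /** Reserved value standing for "absent" in the single non-nullable long column. */
  public static final long ABSENT = Long.MIN_VALUE;

  private SentinelOptionalSketch() {}

  /** Maps present values to epoch millis; null and empty both map to the sentinel. */
  public static long encode(final Optional<Instant> value) {
    final Optional<Instant> normalized = value == null ? Optional.empty() : value;
    return normalized.map(Instant::toEpochMilli).orElse(ABSENT);
  }

  /** Always returns a non-null Optional, matching the decode contract described in the commit. */
  public static Optional<Instant> decode(final long encoded) {
    return encoded == ABSENT ? Optional.empty() : Optional.of(Instant.ofEpochMilli(encoded));
  }
}
```

One consequence of any sentinel scheme is that the reserved value is no longer available as data: an `Instant` whose epoch-millisecond value is exactly `Long.MIN_VALUE` would decode as empty, so the sentinel should be a value the application never produces.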