fresh-borzoni commented on code in PR #365:
URL: https://github.com/apache/fluss-rust/pull/365#discussion_r2840898915
##########
crates/fluss/src/row/mod.rs:
##########
@@ -63,58 +65,58 @@ pub trait InternalRow: Send + Sync {
fn is_null_at(&self, pos: usize) -> bool;
/// Returns the boolean value at the given position
- fn get_boolean(&self, pos: usize) -> bool;
+ fn get_boolean(&self, pos: usize) -> Result<bool>;
Review Comment:
I don't actually see the benefit for this, so we save on `?`, but at the
same time here it's not the same as in rust-postgeres, as in there you can
always trust the row - `Row` is a single concrete type with always-trusted
server data.
in fluss-rust, users input constructs GenericRow- so it's susceptible to
panics, so it makes sense to return Result for this and not have mirror surface.
##########
crates/fluss/src/row/column.rs:
##########
@@ -187,102 +190,130 @@ impl InternalRow for ColumnarRow {
self.record_batch.column(pos).is_null(self.row_id)
}
- fn get_boolean(&self, pos: usize) -> bool {
- self.record_batch
+ fn get_boolean(&self, pos: usize) -> Result<bool> {
+ Ok(self
+ .record_batch
.column(pos)
.as_boolean()
Review Comment:
`as_boolean_opt()` with `.ok_or_else` following the pattern as other getters?
##########
crates/fluss/src/row/mod.rs:
##########
@@ -127,98 +129,149 @@ pub struct GenericRow<'a> {
pub values: Vec<Datum<'a>>,
}
+impl<'a> GenericRow<'a> {
+ fn get_value(&self, pos: usize) -> Result<&Datum<'a>> {
+ self.values.get(pos).ok_or_else(|| IllegalArgument {
+ message: format!(
+ "position {pos} out of bounds (row has {} fields)",
+ self.values.len()
+ ),
+ })
+ }
+
+ fn try_convert<T: TryFrom<&'a Datum<'a>>>(
+ &'a self,
+ pos: usize,
+ expected_type: &str,
+ ) -> Result<T> {
+ let datum = self.get_value(pos)?;
+ T::try_from(datum).map_err(|_| IllegalArgument {
+ message: format!(
+ "type mismatch at position {pos}: expected {expected_type},
got {datum:?}"
+ ),
+ })
+ }
+}
+
impl<'a> InternalRow for GenericRow<'a> {
fn get_field_count(&self) -> usize {
self.values.len()
}
fn is_null_at(&self, pos: usize) -> bool {
- self.values
- .get(pos)
- .expect("position out of bounds")
- .is_null()
+ self.values.get(pos).is_none_or(|v| v.is_null())
Review Comment:
+1 as the result of this change we would have inconsistent behavious across
different implementations of InternalRow:
- GenericRow::is_null_at(OOB) -> true
- ColumnarRow::is_null_at(OOB) -> panics (Arrow's column(pos) panics)
- CompactedRow::is_null_at(OOB) -> panics (slice indexing fields()[pos])
and this issue, we basically produce Null for OOB:
```rust
impl FieldGetter {
pub fn get_field<'a>(&self, row: &'a dyn InternalRow) -> Datum<'a> {
match self {
FieldGetter::Nullable(getter) => {
if row.is_null_at(getter.pos()) {
Datum::Null
} else {
getter.get_field(row)
}
}
FieldGetter::NonNullable(getter) => getter.get_field(row),
}
}
```
While `check_field_count` guards the `append()/upsert()` paths, is_null_at
is a public trait method. Users calling it directly get a lie instead of an
error.
I would make get_ to return `Result`, but is_null_at leave as strict - it's
programming error, so it's fine to panic
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]