alamb commented on code in PR #19389:
URL: https://github.com/apache/datafusion/pull/19389#discussion_r2640017738


##########
datafusion/functions-nested/src/planner.rs:
##########
@@ -148,7 +148,22 @@ impl ExprPlanner for FieldAccessPlanner {
 
         match field_access {
             // expr["field"] => get_field(expr, "field")
+            // If expr is already a get_field call, merge to: get_field(base, 
field1, field2, ...)
             GetFieldAccess::NamedStructField { name } => {
+                // Check if expr is already a get_field call - if so, merge 
field names
+                if let Expr::ScalarFunction(ScalarFunction { func, args }) = 
expr {
+                    if func.name() == "get_field" {

Review Comment:
   since this is in the nested expr, I think you could potentially us `as_any` 
/ `downcast` matching here instead of name matching
   
   I don't think it will cause a problem in practice though, as if this code is 
running, it likely means people are using this libraray anyways so `get_field` 
will match the same



##########
datafusion/functions/src/core/getfield.rs:
##########
@@ -58,21 +60,21 @@ use std::sync::Arc;
 | data                  |
 | apache                |
 +-----------------------+
-> select get_field((select struct(idx, v) from t), 'c1');
-+-----------------------+
-| struct(t.idx,t.v)[c1] |
-+-----------------------+
-| fusion                |
-| arrow                 |
-+-----------------------+
+> -- Nested field access

Review Comment:
   I think the old example was showing selecting a field other than fir first 
-- perhaps we can either leave it (and add the new one), 
   
   Also for the new example, I think it would help also to show what 
`struct(struct(1 as inner_val) as outer)` actually is 🤔 
   
   I think DataFusion now supports some nicer syntax, so we could also rewrite 
the examples using that, somethign like
   
   ```sql
   > create table t as values ({c0: 'data', c1: 'fusion'}), ({c0: 'apache', c1: 
'arrow'});;
   0 row(s) fetched.
   Elapsed 0.006 seconds.
   
   > select * from t;
   +-------------------------+
   | column1                 |
   +-------------------------+
   | {c0: data, c1: fusion}  |
   | {c0: apache, c1: arrow} |
   +-------------------------+
   2 row(s) fetched.
   Elapsed 0.002 seconds.
   
   > select column1['c0'] from t;
   +---------------+
   | t.column1[c0] |
   +---------------+
   | data          |
   | apache        |
   +---------------+
   2 row(s) fetched.
   Elapsed 0.006 seconds.
   ```



##########
datafusion/functions-nested/src/planner.rs:
##########
@@ -148,7 +148,22 @@ impl ExprPlanner for FieldAccessPlanner {
 
         match field_access {
             // expr["field"] => get_field(expr, "field")
+            // If expr is already a get_field call, merge to: get_field(base, 
field1, field2, ...)
             GetFieldAccess::NamedStructField { name } => {
+                // Check if expr is already a get_field call - if so, merge 
field names
+                if let Expr::ScalarFunction(ScalarFunction { func, args }) = 
expr {
+                    if func.name() == "get_field" {
+                        // Merge: get_field(base, f1, f2) + f3 => 
get_field(base, f1, f2, f3)
+                        let mut new_args = args;

Review Comment:
   I think you could avoid this copy/clone by reusing `args` 🤔 
   
   



##########
datafusion/functions-nested/src/planner.rs:
##########
@@ -148,7 +148,22 @@ impl ExprPlanner for FieldAccessPlanner {
 
         match field_access {
             // expr["field"] => get_field(expr, "field")
+            // If expr is already a get_field call, merge to: get_field(base, 
field1, field2, ...)

Review Comment:
   I think initially, this is fine to simplify in the planning, but it doesn't 
handle:
   1. When the fields are programatically created (like get_field(get_field(x, 
'foo'), 'bar')
   2. When constant folding could allow things to work (for example, 
`get_field(x, 'foo'||'bar')`
   
   
   If you change this folding into the `simplify` method on `GetField`  I think 
you could also handle the other two cases



##########
datafusion/functions/src/core/getfield.rs:
##########
@@ -86,10 +88,139 @@ impl Default for GetFieldFunc {
     }
 }
 
+/// Extract a single field from a struct or map array
+fn extract_single_field(base: ColumnarValue, name: ScalarValue) -> 
Result<ColumnarValue> {
+    let arrays = ColumnarValue::values_to_arrays(&[base])?;
+    let array = Arc::clone(&arrays[0]);
+
+    fn process_map_array(

Review Comment:
   Unless there is a good reason,  suggest making these their own free 
functions rather than inner functions - it will make them easier to find as 
well potentially indent level



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to