[ https://issues.apache.org/jira/browse/ARROW-17915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ben Kietzman resolved ARROW-17915.
----------------------------------
    Fix Version/s: 10.0.0
       Resolution: Fixed

Issue resolved by pull request 14295
[https://github.com/apache/arrow/pull/14295]

> [C++] Error when using Substrait ProjectRel
> -------------------------------------------
>
>                 Key: ARROW-17915
>                 URL: https://issues.apache.org/jira/browse/ARROW-17915
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Dewey Dunnington
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 10.0.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> After ARROW-16989 and ARROW-15584, there is new behaviour with ProjectRel. I
> implemented a solution that worked with DuckDB's consumer in
> https://github.com/voltrondata/substrait-r/pull/181, but when I try with
> Arrow's compiler I get an error:
> {code:R}
> library(arrow, warn.conflicts = FALSE)
> #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
> plan_as_json <- '{
>   "extensionUris": [
>     {
>       "extensionUriAnchor": 1,
>       "uri": "https://github.com/apache/arrow/blob/master/format/substrait/extension_types.yaml"
>     }
>   ],
>   "relations": [
>     {
>       "rel": {
>         "project": {
>           "common": {"emit": {"outputMapping": [2, 3]}},
>           "input": {
>             "read": {
>               "baseSchema": {
>                 "names": ["int", "dbl"],
>                 "struct": {"types": [{"i32": {}}, {"fp64": {}}]}
>               },
>               "localFiles": {
>                 "items": [
>                   {
>                     "uriFile": "file://THIS_IS_THE_TEMP_FILE",
>                     "parquet": {}
>                   }
>                 ]
>               }
>             }
>           },
>           "expressions": [
>             {"selection": {"directReference": {"structField": {"field": 1}}}},
>             {"selection": {"directReference": {"structField": {"field": 0}}}}
>           ]
>         }
>       }
>     }
>   ]
> }'
> temp_parquet <- tempfile()
> write_parquet(data.frame(int = integer(), dbl = double()), temp_parquet)
> plan_as_json <- gsub("THIS_IS_THE_TEMP_FILE", temp_parquet, plan_as_json)
> arrow:::do_exec_plan_substrait(plan_as_json)
> #> Error: Invalid: Invalid column index to add field.
> #> /Users/dewey/Desktop/rscratch/arrow/cpp/src/arrow/engine/substrait/relation_internal.cc:338 project_schema->AddField( num_columns + static_cast<int>(project.expressions().size()) - 1, std::move(project_field))
> #> /Users/dewey/Desktop/rscratch/arrow/cpp/src/arrow/engine/substrait/serde.cc:156 FromProto(plan_rel.has_root() ? plan_rel.root().input() : plan_rel.rel(), ext_set, conversion_options)
> {code}
> It's admittedly a goofy thing to do: to compute a new column that is an
> identical copy of an existing column and then discard the original. I can and
> should simplify the Substrait that I'm generating, but maybe this is also
> valid Substrait that should be accepted?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)