ZihuanLing commented on PR #1302: URL: https://github.com/apache/datafusion-ballista/pull/1302#issuecomment-3239033243
@milenkovicm hi, I've applied this patch to my code, and tried to run tpcds query 44 and 54, I meet this error : Error: ArrowError(ExternalError(Execution("Job mAxRhlg failed: Job failed due to stage 18 failed: Task failed due to runtime execution error: DataFusionError(Internal(\"Invalid NestedLoopJoinExec, the output partition count of the left child must be 1,consider using CoalescePartitionsExec or the EnforceDistribution rule\"))\n")), None) here's some logs maybe helpful: (just part of logs) ``` =========UnResolvedStage[stage_id=18.0, children=2]========= Inputs{16: StageOutput { partition_locations: {}, complete: false }, 17: StageOutput { partition_locations: {}, complete: false }} ShuffleWriterExec: partitions:None NestedLoopJoinExec: join_type=Inner, filter=CAST(d_month_seq@0 AS Int64) >= date_dim.d_month_seq + Int64(1)@1, projection=[c_customer_sk@0, ss_ext_sales_price@1, d_month_seq@2] CoalescePartitionsExec UnresolvedShuffleExec: partitions=Hash([UnKnownColumn { name: "d_date_sk@3" }], 4) AggregateExec: mode=FinalPartitioned, gby=[date_dim.d_month_seq + Int64(1)@0 as date_dim.d_month_seq + Int64(1)], aggr=[] CoalesceBatchesExec: target_batch_size=4096 UnresolvedShuffleExec: partitions=Hash([Column { name: "date_dim.d_month_seq + Int64(1)", index: 0 }], 4) ... =========UnResolvedStage[stage_id=18.0, children=2]========= Inputs{17: StageOutput { partition_locations: {}, complete: false }, 16: StageOutput { partition_locations: {}, complete: false }} ShuffleWriterExec: partitions:None NestedLoopJoinExec: join_type=Inner, filter=CAST(d_month_seq@0 AS Int64) >= date_dim.d_month_seq + Int64(1)@1, projection=[c_customer_sk@0, ss_ext_sales_price@1, d_month_seq@2] CoalescePartitionsExec UnresolvedShuffleExec: partitions=Hash([UnKnownColumn { name: "d_date_sk@3" }], 4) AggregateExec: mode=FinalPartitioned, gby=[date_dim.d_month_seq + Int64(1)@0 as date_dim.d_month_seq + Int64(1)], aggr=[] CoalesceBatchesExec: target_batch_size=4096 UnresolvedShuffleExec: partitions=Hash([Column { name: "date_dim.d_month_seq + Int64(1)", index: 0 }], 4) ... =========ResolvedStage[stage_id=18.0, partitions=1]========= ShuffleWriterExec: partitions:Some(Hash([Column { name: "p_promo_sk", index: 0 }], 4)) DataSourceExec: file_groups={1 group: [[opt/lzh/10G/promotion.dat]]}, projection=[p_promo_sk], file_type=csv, has_header=false ... =========UnResolvedStage[stage_id=18.0, children=1]========= Inputs{17: StageOutput { partition_locations: {}, complete: false }} ShuffleWriterExec: partitions:None SortExec: expr=[return_ratio@1 ASC NULLS LAST], preserve_partitioning=[true] ProjectionExec: expr=[ss_item_sk@0 as item, CAST(sum(coalesce(sr.sr_return_quantity,Int64(0)))@1 AS Decimal128(15, 4)) / CAST(sum(coalesce(sts.ss_quantity,Int64(0)))@2 AS Decimal128(15, 4)) as return_ratio, CAST(sum(coalesce(sr.sr_return_amt,Int64(0)))@3 AS Decimal128(15, 4)) / CAST(sum(coalesce(sts.ss_net_paid,Int64(0)))@4 AS Decimal128(15, 4)) as currency_ratio] AggregateExec: mode=FinalPartitioned, gby=[ss_item_sk@0 as ss_item_sk], aggr=[sum(coalesce(sr.sr_return_quantity,Int64(0))), sum(coalesce(sts.ss_quantity,Int64(0))), sum(coalesce(sr.sr_return_amt,Int64(0))), sum(coalesce(sts.ss_net_paid,Int64(0)))] CoalesceBatchesExec: target_batch_size=4096 UnresolvedShuffleExec: partitions=Hash([Column { name: "ss_item_sk", index: 0 }], 4) ... ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org