p-powell commented on issue #5351: URL: https://github.com/apache/hudi/issues/5351#issuecomment-1105389992
@codope I built from master and it takes 492 secs. That still seems slow. We have an internal file (2.6m rows, ~300 cols) that takes 16 min to load into a new table (one partition). If we dump the same df to parquet (gzip) using pandas, it takes 2 min 4 secs. Should the `df_id.write.format("parquet").mode(Overwrite).save(parquetBasePath)` times be similar to the pandas parquet write times?
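To keep the comparison apples-to-apples, both writes could be timed the same way with a wall-clock wrapper. The following is only a minimal sketch: `time_call` is a hypothetical helper (not part of pandas, Spark, or Hudi), and the workload inside it is a stand-in so the snippet runs on its own.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def time_call(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn once and return (result, elapsed wall-clock seconds).

    Hypothetical helper for illustration only.
    """
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload so the sketch is self-contained; replace the lambda
# with the real write call when measuring.
_, seconds = time_call(lambda: sum(range(1_000_000)))
print(f"took {seconds:.3f} s")
```

On the pandas side the lambda would presumably wrap something like `df.to_parquet(path, compression="gzip")`, and on the Spark side the `df_id.write...save(parquetBasePath)` call, so that both numbers cover only the write itself and not reading the source file or starting the session.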