p-powell commented on issue #5351:
URL: https://github.com/apache/hudi/issues/5351#issuecomment-1105389992

   @codope I built from master and it takes 492 secs. That still seems slow.
   
   We have an internal file (2.6M rows, ~300 columns) that takes 16 min to load 
into a new table (one partition). If we dump the same df to Parquet (gzip) using 
pandas, it takes 2 min 4 secs.
   
   Should the write time for 
`df_id.write.format("parquet").mode(Overwrite).save(parquetBasePath)` be similar 
to the pandas Parquet write time?
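
For a like-for-like comparison, both writes could be timed with the same wall-clock helper. Below is a minimal sketch, not a benchmark result. The `timed` helper, the `results` dict, the `pdf` name, and the output path are hypothetical names introduced for illustration. The commented-out calls assume a pandas DataFrame and the PySpark form of the writer, with gzip set explicitly on the Spark side so both use the same codec.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent in the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


results = {}

# Stand-in workload so the sketch runs on its own; swap in the real writes.
with timed("placeholder", results):
    sum(range(100_000))

# Hypothetical usage for the two writes being compared (not executed here):
# with timed("pandas", results):
#     pdf.to_parquet("/tmp/out.parquet", compression="gzip")
# with timed("spark", results):
#     (df_id.write.format("parquet")
#          .option("compression", "gzip")
#          .mode("overwrite")
#          .save(parquetBasePath))

for label, seconds in results.items():
    print(f"{label}: {seconds:.3f} s")
```

Timing both paths with the same codec and the same input keeps the comparison limited to the writers themselves.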
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
