Hi, I was reading the following tutorial from the Databricks guide on writing streaming output to S3: https://docs.cloud.databricks.com/docs/latest/databricks_guide/07%20Spark%20Streaming/08%20Write%20Output%20To%20S3.html
It states that sometimes I need to compact the many small files (e.g. from Spark Streaming) into one big compacted file. I understand why: better read performance, avoiding the "many small files" problem, etc. My questions are:

1. What happens when I have a big Parquet file partitioned by some field and I want to append new small files to it? Does Spark overwrite the whole dataset, or can it append the new data at the end? (A sketch of the write I have in mind is at the end of this post.)

2. While the append is happening, how can I ensure that readers of the big Parquet files are not blocked and won't get any errors? In other words, are the files still "available" while new data is being appended to them?

I will highly appreciate any pointers. Thanks in advance, Igor
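
To make question 1 concrete, here is roughly the write pattern I have in mind. This is just a sketch, not my actual job: the bucket paths, the `event_date` partition column, and the batch directory name are made-up placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-append-question").getOrCreate()

# Existing compacted dataset, partitioned by a date column (placeholder path/column).
base_path = "s3a://my-bucket/events_compacted/"

# One small batch produced by the streaming job (placeholder path).
new_batch = spark.read.parquet("s3a://my-bucket/streaming_output/batch-0042/")

# Question 1: does this rewrite the whole dataset, or only add new part files
# under the matching partition directories?
new_batch.write.mode("append").partitionBy("event_date").parquet(base_path)

# Question 2: while this write is running, can other jobs still read base_path
# without being blocked or hitting errors/partial results?
```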