Appending a static dataframe to a stream-created Parquet file fails

eugen . wintersberger Thu, 02 Sep 2021 06:04:05 -0700

Hi all,
  I recently stumbled upon a rather strange problem with streaming
sources in one of my tests. I am writing a Parquet file from a
streaming source and subsequently try to append the same data, but this
time from a static dataframe. Surprisingly, the number of rows in the
Parquet file remains the same after the append operation.
Here is the relevant code:


  "Appending data from static dataframe" must "produce twice as much data" in {
    logLinesStream.writeStream
      .format("parquet")
      .option("path", path.toString)
      .outputMode("append")
      .start()
      .processAllAvailable()
    spark.read.format("parquet").load(path.toString).count mustBe 1159

    logLinesDF.write.format("parquet").mode("append").save(path.toString)
    spark.read.format("parquet").load(path.toString).count mustBe 2*1159
  }

Does anyone have an idea what I am doing wrong here?

Thanks in advance,
 Eugen Wintersberger
