Hi,
I managed to make mine work using the *foreachBatch* function in writeStream.
"foreach" performs custom write logic on each row and "foreachBatch"
performs custom write logic on each micro-batch, here through the SendToBigQuery function. foreachBatch(SendToBigQuery) expects a function taking 2 parameters: first, the micro-batch DataFrame, and second, the unique id of that micro-batch.
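A minimal sketch of how that wiring can look, assuming the Google spark-bigquery-connector is on the classpath; the table name, the temporary GCS bucket, and the "rate" source are placeholders, not the real job:

from pyspark.sql import SparkSession, DataFrame

def SendToBigQuery(df: DataFrame, batch_id: int) -> None:
    # Each micro-batch arrives as an ordinary DataFrame, so the normal
    # batch writer applies; table and bucket names are placeholders.
    (df.write
       .format("bigquery")
       .option("table", "my_dataset.my_table")
       .option("temporaryGcsBucket", "my-temp-bucket")
       .mode("append")
       .save())

spark = SparkSession.builder.appName("send_to_bigquery").getOrCreate()

# Stand-in streaming source; replace with the real readStream definition.
streaming_df = spark.readStream.format("rate").load()

query = (streaming_df.writeStream
         .foreachBatch(SendToBigQuery)
         .start())

query.awaitTermination()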
Thanks Jungtaek.
I am stuck on how to add rows to BigQuery. The batch Spark API in PySpark does it fine; however, here we are talking about Structured Streaming with PySpark. This is my code, which reads and displays the data on the console fine:
class MDStreaming:
    def __init__(self, spark_session, spark_context):
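(The snippet is cut off above. A minimal runnable sketch of such a console-printing job, with the "rate" source and the option values only as stand-ins for the real input, might look like:)

from pyspark.sql import SparkSession

class MDStreaming:
    def __init__(self, spark_session, spark_context):
        self.spark = spark_session
        self.sc = spark_context

    def start(self):
        # "rate" is only a built-in demo source standing in for the real feed.
        df = self.spark.readStream.format("rate").option("rowsPerSecond", 1).load()
        return (df.writeStream
                  .outputMode("append")
                  .format("console")
                  .start())

if __name__ == "__main__":
    spark = SparkSession.builder.appName("MDStreaming").getOrCreate()
    job = MDStreaming(spark, spark.sparkContext)
    job.start().awaitTermination()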
If your code doesn't require "end-to-end exactly-once", then you could leverage foreachBatch, which enables you to use a batch sink.
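For example, a tiny sketch of that pattern with a plain Parquet batch sink (the path and the source are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("foreach_batch_demo").getOrCreate()

# Any streaming DataFrame works; "rate" is just a built-in demo source.
events = spark.readStream.format("rate").load()

def write_batch(df, batch_id):
    # Inside foreachBatch the micro-batch is an ordinary DataFrame,
    # so a regular batch sink (Parquet here) can be reused unchanged.
    df.write.mode("append").parquet("/tmp/foreach_batch_demo")

query = events.writeStream.foreachBatch(write_batch).start()
query.awaitTermination()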
If your code requires "end-to-end exactly-once", then well, that's a different story. I'm not familiar with BigQuery and have no idea how its sink is implemented, but