Apologies if this has been discussed before, I searched but couldn’t find it.
What is the rationale behind picking backticks for identifier delimiters in
Spark?
In the [SQL 92 spec](
https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), the delimited
identifier is unambiguously defined to use double quotes.
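To make the contrast concrete, here is a small illustrative snippet (table and column names are invented). The first statement follows the SQL-92 `<delimited identifier>` production, which encloses the identifier in double quotes; the second uses the backtick style Spark SQL accepts by default, which matches MySQL/Hive conventions:

```sql
-- SQL-92 delimited identifiers: double quotes
SELECT "my column" FROM "my table";

-- Spark SQL default: backticks (MySQL/Hive style)
SELECT `my column` FROM `my table`;
```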
Thank you for the clarification, Jungtaek 🙏 Indeed, it doesn't sound like
a highly demanded feature among end users; I haven't seen it come up much on
StackOverflow or the mailing lists. I was just curious about the reasons.
Using arbitrary stateful processing could indeed be a workaround! But
IMHO it
We wouldn't like to expose the internal mechanism to the public.
As you are a very detail-oriented engineer tracking major changes, you
might notice that we "changed" the definition of a late record while fixing
late records. Previously, a late record was defined as a record having an
event-time timestamp earlier than the current watermark.
A slight correction/clarification: we now take the "previous" watermark to
determine late records, because the outputs of upstream stateful operators
are valid inputs for non-first stateful operators; dropping records in a
downstream operator based on the same criteria (the current watermark) would
drop valid records emitted by previous (upstream) stateful operators. Please
look back which
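A toy Python simulation of that argument (not Spark code; the timestamps and the `filter_late` helper are invented for illustration). It shows how filtering a downstream operator's input against the *current* watermark can drop a perfectly valid record that an upstream windowed aggregation just emitted, while filtering against the *previous* watermark keeps it:

```python
# Illustrative simulation: why a non-first stateful operator should judge
# lateness against the previous watermark, not the current one.

def filter_late(event_times, watermark):
    """Keep only records whose event time is after the watermark."""
    return [t for t in event_times if t > watermark]

# Suppose an upstream windowed aggregation finalizes a window ending at
# time 10 and emits its result (timestamped 10) in the same micro-batch
# in which the watermark advances from 5 to 12.
prev_watermark = 5
curr_watermark = 12
upstream_output = [10]  # valid, just-emitted upstream result

# Filtering with the current watermark wrongly drops the valid record:
assert filter_late(upstream_output, curr_watermark) == []

# Filtering with the previous watermark keeps it:
assert filter_late(upstream_output, prev_watermark) == [10]
```

The point of the sketch is only the ordering issue: an upstream stateful operator's output is, by construction, never "late" with respect to the watermark that was in effect when it was produced.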
I'd like some way to expose watermarks to the user. The watermark does
affect how records are processed, so it is relevant to users.
`current_watermark()` is a good option.
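If such a function were added (to be clear, this is the proposed API under discussion, not an existing Spark function; the table and column names are invented), usage might look something like:

```sql
-- Hypothetical: expose the operator's watermark alongside each record
SELECT *, current_watermark() AS wm
FROM events;
```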
The implementation of this might be engine-specific, but it is a very
relevant concept for authors of streaming pipelines.
Ideally