To be more explicit, the easiest thing to do in the short term is to use
your own instance of KafkaConsumer to get the offsets for the
timestamps you're interested in, using offsetsForTimes, and use those
for the start / end offsets. See
https://kafka.apache.org/10/javadoc/?org/apache/kafka/clients/c
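
Something like the following (a rough, untested sketch; the broker
address, the topic name "events", and the timestamp are placeholders,
and the resulting JSON is the format the Kafka source's
"startingOffsets" option expects):

    import java.util.Properties
    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition
    import org.apache.kafka.common.serialization.ByteArrayDeserializer

    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.deserializer", classOf[ByteArrayDeserializer].getName)
    props.put("value.deserializer", classOf[ByteArrayDeserializer].getName)
    val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)

    // One entry per partition of the topic, all with the target timestamp (ms).
    val startTs = java.sql.Timestamp.valueOf("2019-01-23 01:00:00").getTime
    val search = consumer.partitionsFor("events").asScala
      .map(p => new TopicPartition(p.topic, p.partition) -> java.lang.Long.valueOf(startTs))
      .toMap.asJava

    // For each partition: the earliest offset whose timestamp is >= startTs
    // (the value is null when a partition has no such record).
    val offsets = consumer.offsetsForTimes(search).asScala
    val startingOffsets = offsets.collect { case (tp, oat) if oat != null =>
      s""""${tp.partition}": ${oat.offset}"""
    }.mkString("""{"events": {""", ", ", "}}")
    consumer.close()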
Hello,
Sorry for the late reply.
You're right, what I'm doing is a one-time query, not Structured
Streaming. It's probably best to describe my use case:
I'd like to expose live data residing in Kafka (via JDBC/ODBC) with the
power of Spark's distributed SQL engine. As the JDBC server I use the
Spark Thrift Server.
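
Simplified, the setup is something like this (a sketch with placeholder
broker and topic names; "spark" is a SparkSession):

    val kafkaDf = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()
    kafkaDf.createOrReplaceTempView("kafka_table")

    // The timestamp predicate is evaluated by Spark after reading;
    // it is not pushed down to Kafka.
    spark.sql("""select count(*) from kafka_table
                 where timestamp > cast('2019-01-23 1:00' as TIMESTAMP)
                   and timestamp < cast('2019-01-23 1:01' as TIMESTAMP)""").show()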
Hey Tomas,
From your description, you just ran a batch query rather than a Structured
Streaming query. The Kafka data source doesn't support filter push-down
right now, but that's definitely doable. One workaround here is setting
proper "startingOffsets" and "endingOffsets" options when loading the data.
Hi Tomas,
As a general note, I don't fully understand your use case. You've mentioned
Structured Streaming, but your query is more like a one-time SQL statement.
Kafka doesn't support predicates in the way it's integrated with Spark. What
can be done from the Spark perspective is to look up the offsets for a
specific timestamp and use them as the start / end offsets.
Hello,
I'm trying to read from Kafka via Spark Structured Streaming, limited to a
specific time range:
select count(*) from kafka_table where timestamp > cast('2019-01-23 1:00' as
TIMESTAMP) and timestamp < cast('2019-01-23 1:01' as TIMESTAMP);
The problem is that the timestamp predicate is not pushed down to Kafka, so
the whole topic is read.