The connector issues CQL requests through the Java driver under the hood, which means it responds to changes in the database the way a normal application would. As a result, a retry may return a different set of data than the original request if the underlying database has changed in the meantime.
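For illustration, here is a minimal sketch (keyspace "ks", table "users", and the contact point are hypothetical) showing why: each Spark action re-executes the CQL scan through the driver, so the same DataFrame read twice can return different rows if the table changed in between.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: "ks"/"users" and the host are placeholder values.
val spark = SparkSession.builder()
  .appName("cassandra-read-sketch")
  .config("spark.cassandra.connection.host", "127.0.0.1")
  .getOrCreate()

val users = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "users"))
  .load()

users.count()  // first scan issues CQL requests against Cassandra
// ... rows may be updated or deleted in Cassandra here ...
users.count()  // re-executes the scan and may see the changed data
```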
On Fri, Jun 26, 2020, 9:42 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> I'm not sure how it is implemented, but in general I wouldn't expect such
> behavior from connectors that read from non-streaming storage. The query
> result may depend on "when" the records are fetched.
>
> If you need to reflect the changes in your query, you'll probably want to
> find a way to retrieve "change logs" from your external storage (or have
> your system/product produce change logs if your external storage doesn't
> support them), and adopt them in your query. There's a keyword you can
> google to read further: "Change Data Capture".
>
> Otherwise, you can apply the traditional approach: run a batch query
> periodically and replace the entire output.
>
> On Thu, Jun 25, 2020 at 1:26 PM Rahul Kumar <rk20.stor...@gmail.com> wrote:
>
>> Hello everyone,
>>
>> I was wondering how the Cassandra Spark connector deals with deleted or
>> updated records during a readStream operation. If a record was already
>> fetched into Spark memory and it then got updated or deleted in the
>> database, does that get reflected in a streaming join?
>>
>> Thanks,
>> Rahul
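As a rough sketch of the "periodic batch query" approach mentioned above (all names here are hypothetical: keyspace "ks", table "users", output path, and the refresh interval), the idea is simply to re-read the whole table on a schedule and overwrite the previous output so deletes and updates are reflected:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch only: placeholder keyspace/table/path; not a production scheduler.
val spark = SparkSession.builder()
  .appName("periodic-snapshot-sketch")
  .config("spark.cassandra.connection.host", "127.0.0.1")
  .getOrCreate()

while (true) {
  // Full batch read of the current state of the Cassandra table.
  val snapshot = spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> "ks", "table" -> "users"))
    .load()

  // Replace the previous output entirely, so rows deleted or updated in
  // Cassandra since the last run disappear from (or change in) the output.
  snapshot.write
    .mode(SaveMode.Overwrite)
    .parquet("/tmp/users_snapshot")

  Thread.sleep(10 * 60 * 1000L)  // wait 10 minutes before the next refresh
}
```

This trades freshness and efficiency for simplicity; if the table is large or you need lower latency, the Change Data Capture route described above is the better fit.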