Thanks, Konstantine. Overall, this KIP looks interesting and really useful,
and for the most part is spot on. I do have a number of questions/comments
about specifics:

   1. The topic records have a value that includes the connector name, the
   number of the task that last reported the topic as used, and the topic
   name. There's no mention of record timestamps, but I wonder if it'd be
   useful to record them. One challenge might be that a connector does not
   write to a topic for a while, or the task remains running for long periods
   of time, and therefore the worker doesn't record that this topic has been
   newly written to since the task was restarted. IOW, the semantics of the
   timestamp may be a bit murky. Have you thought about recording the
   timestamp, and if so what are the pros and cons?
   - The "Recording active topics" section says the following:
      "As soon as a worker detects the addition of a topic to a connector's
      set of active topics, all the connector's tasks that inspect source or
      sink records will cease to post update messages to the
      status.storage.topic."
      This probably means the timestamp won't be very useful.
   2. The KIP says "the Kafka record value stores the ID of the task that
   succeeded to store a topic status record last." However, this is a bit
   unclear: is it really storing the last task that successfully wrote to that
   topic (as this would require very frequent writes to this topic), or is it
   more that this is the task that was last *recorded* as having written to
   the topic? (Here, "recorded" could be a bit of a gray area, since this
   would depend on how the worker periodically records this information.)
   Any kind of clarity here might be helpful.
   3. In the "Recording active topics" section (and the surrounding
   sections), the term "task" is used ambiguously. For example, "when its tasks
   start processing their first records ... these tasks will start inspecting
   which is the Kafka topic of each of these records". IIUC, the first "task"
   mentioned is the connector's task, and the second is the worker's task. Do
   we need to distinguish this more clearly?
   4. Maybe I missed it, but does this KIP explicitly say that the
   Connector API is unchanged? It's probably worth pointing out to help
   assuage any concerns that connector implementations have to change to make
   use of this feature.
   5. In the "Resetting a connector's set of active topics" section the
   behavior is not exactly clear. Consider a user running connector "A": the
   connector has been fully started and is processing records, and the worker
   has recorded topic usage records. Then the user resets the active topics
   for connector A while the connector is still running. If the connector
   writes to no new topics before the tasks are rebalanced, is it correct
   that Connect would report no active topics? And after the tasks are
   rebalanced, will the worker record any topics used by connector A?
   6. In the "Restaring" (misspelled) section: "Reconfiguring a source
   connector has also no altering effect for a source connector. However, when
   reconfiguring a sink connector if the new configuration no longer includes
   any of the previously tracked topics, these topics will be removed from the
   set of active topics for this sink connector by appending tombstone
   messages appropriately after the reconfiguration of the connector." Would
   it be better to not automatically reset a connector's active topics when a
   sink connector is restarted? Isn't that more consistent with the
   "Resetting" behavior and the goals at the top of the KIP: "it'd be useful
   for users, operators and applications to know which are the topics that a
   connector has used since it was first created"?
   7. For the `PUT /connectors/{name}/topics/reset` endpoint, the KIP says
   "this request can be reapplied after the deletion of the connector". IOW,
   even though a connector with that name doesn't exist, we can still make
   this request? How does this compare with other methods such as "status"?
   8. What are the security implications of this proposal?
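
To make the timestamp question in point 1 a bit more concrete, here is a rough
sketch. It is not Connect code: the class, field, and method names are all
hypothetical, and it assumes a worker writes a topic status record only the
first time a topic shows up for a connector, as the quoted KIP text suggests.
Under that assumption a timestamp would capture "first seen", not "last used":

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the worker's view of active topics.
public class ActiveTopicsSketch {

    // Hypothetical record value: connector name, task number, topic name,
    // plus the timestamp field that the KIP does not currently mention.
    static final class TopicStatus {
        final String connector;
        final int task;
        final String topic;
        final long discoveredAtMs;

        TopicStatus(String connector, int task, String topic, long discoveredAtMs) {
            this.connector = connector;
            this.task = task;
            this.topic = topic;
            this.discoveredAtMs = discoveredAtMs;
        }
    }

    // connector name -> (topic name -> status)
    private final Map<String, Map<String, TopicStatus>> active = new ConcurrentHashMap<>();

    // Returns the status that would be written, or empty if the topic is
    // already known for this connector and nothing would be written.
    Optional<TopicStatus> recordIfNew(String connector, int task, String topic, long nowMs) {
        TopicStatus candidate = new TopicStatus(connector, task, topic, nowMs);
        TopicStatus previous = active
                .computeIfAbsent(connector, c -> new ConcurrentHashMap<>())
                .putIfAbsent(topic, candidate);
        return previous == null ? Optional.of(candidate) : Optional.empty();
    }

    // Returns the stored status, or null if the topic was never recorded.
    TopicStatus lookup(String connector, String topic) {
        Map<String, TopicStatus> topics = active.get(connector);
        return topics == null ? null : topics.get(topic);
    }

    public static void main(String[] args) {
        ActiveTopicsSketch tracker = new ActiveTopicsSketch();
        tracker.recordIfNew("A", 0, "orders", 1_000L); // first sighting: a record is written
        tracker.recordIfNew("A", 1, "orders", 9_000L); // later use: nothing is written
        System.out.println(tracker.lookup("A", "orders").discoveredAtMs); // prints 1000
    }
}
```

In this sketch the stored task number is also the one that first reported the
topic, which is the same ambiguity raised in point 2.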

As you can see, most of these can probably be addressed without much work.

Best regards,

Randall

On Mon, Jan 13, 2020 at 11:05 PM Konstantine Karantasis <
konstant...@confluent.io> wrote:

> Hi all.
>
> I just posted KIP-558: Track the set of actively used topics by connectors
> in Kafka Connect
>
> Wiki link here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-558%3A+Track+the+set+of+actively+used+topics+by+connectors+in+Kafka+Connect
>
> I think it's a nice extension to follow up on KIP-158 and a useful feature
> to the ever increasing number of applications that are built around Kafka
> Connect.
> Would love to hear what you think.
>
> Best,
> Konstantine
>
