The JDBC connector I started implementing just handles this manually, and it
isn't much code (it could be made into a simple utility):
https://github.com/confluentinc/copycat-jdbc/blob/master/src/main/java/io/confluent/copycat/jdbc/JdbcSourceTask.java#L152
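The core of it is just "sleep inside poll() until the next scheduled fetch."
A stripped-down sketch of that pattern (this is not the actual connector code;
the generic record type R and the fetchBatch() hook are placeholders for
illustration):

import java.util.List;

// Sketch of a source task that manages its own poll interval inside poll().
public abstract class IntervalPollingTask<R> {
    private final long pollIntervalMs;
    private long nextPollMs;

    protected IntervalPollingTask(long pollIntervalMs) {
        this.pollIntervalMs = pollIntervalMs;
        this.nextPollMs = System.currentTimeMillis();
    }

    // The framework lets poll() hold on to the thread, so the task can simply
    // block until the next scheduled fetch and then return a batch of records.
    public List<R> poll() throws InterruptedException {
        long now = System.currentTimeMillis();
        if (now < nextPollMs)
            Thread.sleep(nextPollMs - now);
        nextPollMs = System.currentTimeMillis() + pollIntervalMs;
        return fetchBatch();
    }

    // Connector-specific work, e.g. run a JDBC query and convert rows to records.
    protected abstract List<R> fetchBatch();
}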
Given the current APIs, sources can just handle this on their own if they want
to, because the expectation is that when we call `poll()` on them, they can
hold on to control of the thread indefinitely. So I think this is mainly a
question for sinks, like the Camus-like example you mentioned. And I definitely
think this is a valid use case -- if I want hourly files in HDFS, it's probably
better to just run the job once per hour and quickly dump all that data to HDFS
than to stream it gradually.

A different option from your suggestion would be to expose the upcoming
pause/resume functionality of the consumer (assuming you agree with my analysis
that this is primarily a sink connector issue). In that case, sink connectors
could just pause their inputs and sleep during the time processing should not
occur (rough sketch at the very bottom of this mail). I'm not sure whether the
batch mode or exposing pause/resume is better -- both add more API surface
area.

-Ewen

On Thu, Aug 13, 2015 at 10:23 PM, Gwen Shapira <g...@confluent.io> wrote:
> Hi Team Kafka,
>
> (sorry for the flood, this is the last one! promise!)
>
> If you tried out PR-99, you know that CopyCat now does ongoing
> export/import. So it will continuously read data from a source and write it
> to Kafka (or vice versa). This is great for tailing logs and replicating
> from the MySQL binlog.
>
> But I'm wondering if there's a need for a batch mode too.
> This can be useful for:
> * A Camus-like thing. You can stream data to HDFS, but the benefits are
>   limited and there are some known issues there.
> * Dumping large parts of an RDBMS at once.
>
> Do you agree that this need exists? Or is streaming export/import good
> enough?
>
> Also, does anyone have ideas for how they would like the batch mode to work?
>
> Gwen

--
Thanks,
Ewen
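P.S. To make the pause/resume option a bit more concrete, here's a very rough
sketch of how a sink that wants hourly HDFS dumps might use it. None of this is
a real API -- the context interface, method names, and helpers below are all
hypothetical placeholders:

import java.util.Collection;

// Made-up stand-in for exposing the consumer's upcoming pause/resume to sinks.
interface HypotheticalSinkContext {
    void pause();   // stop delivering records to put()
    void resume();  // start delivering records again
}

public class HourlyDumpSinkSketch<R> {
    private final HypotheticalSinkContext context;

    public HourlyDumpSinkSketch(HypotheticalSinkContext context) {
        this.context = context;
    }

    // Called by the framework with each batch of records read from Kafka.
    public void put(Collection<R> records) {
        writeToCurrentFile(records);
        if (caughtUpForThisHour()) {
            rotateFile();                      // close out this hour's file
            context.pause();                   // stop fetching from Kafka
            sleepUntil(nextHourBoundaryMs());  // idle through the quiet period
            context.resume();                  // start the next hourly dump
        }
    }

    private void writeToCurrentFile(Collection<R> records) { /* connector-specific */ }

    private void rotateFile() { /* connector-specific */ }

    // Placeholder: in practice this might check consumer lag or record timestamps.
    private boolean caughtUpForThisHour() { return false; }

    private long nextHourBoundaryMs() {
        long hourMs = 60 * 60 * 1000L;
        return (System.currentTimeMillis() / hourMs + 1) * hourMs;
    }

    private void sleepUntil(long deadlineMs) {
        long waitMs = deadlineMs - System.currentTimeMillis();
        if (waitMs > 0) {
            try {
                Thread.sleep(waitMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}

The point is just that once the sink has caught up for the hour, it can pause
its inputs and sleep until the next run, so the framework never has to know
about "batch mode" at all.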