Re: Kafka Connect / Access to OffsetStorageReader from SourceConnector

2017-02-22 Thread Tianji Li
Hi Florian, Just curious, what 'shared storage' do you guys use to keep the files before they are ingested into Kafka? In our case, we could not find a nice distributed, shared file system that is NOT HDFS-like and runs before Kafka. So we use individual hard disks on connector machines and keep of

Re: Kafka Connect / Access to OffsetStorageReader from SourceConnector

2017-02-21 Thread Jason Gustafson
Hey Florian, It seems reasonable to me to let the connector track task progress through offsets. I recall there have been other use cases for communication between tasks and connectors (perhaps Ewen or someone else will jump in here and mention them), so I'm not sure if this could fall un

Re: Kafka Connect / Access to OffsetStorageReader from SourceConnector

2017-02-20 Thread Florian Hussonnois
Hi Jason, Yes, this is the idea. The connector assigns a subset of files to each task. A task stores the size of the file, the byte offset, and the byte size of the last sent record as its source offset. A file is finished when recordBytesOffsets + recordBytesSize = fileBytesSize. The connector shou
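
The completion rule described in this message can be sketched in plain Java. This is only an illustration under stated assumptions, not the connector's actual code: the class FileProgress, its helper methods, and the use of the three field names as map keys are hypothetical. The offset is modelled as a simple map, and values are read back as Number on the assumption that a stored offset may not keep its exact numeric type after deserialization.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Kafka Connect: models the source offset
// from the message above as a plain map and applies the completion rule.
final class FileProgress {

    // Key names reuse the identifiers from the message; using them as map
    // keys is an assumption made for this sketch.
    static final String FILE_SIZE = "fileBytesSize";
    static final String RECORD_OFFSET = "recordBytesOffsets";
    static final String RECORD_SIZE = "recordBytesSize";

    // Builds the offset a task would store after sending a record.
    public static Map<String, Object> sourceOffset(long fileBytesSize,
                                                   long recordBytesOffsets,
                                                   long recordBytesSize) {
        Map<String, Object> offset = new HashMap<>();
        offset.put(FILE_SIZE, fileBytesSize);
        offset.put(RECORD_OFFSET, recordBytesOffsets);
        offset.put(RECORD_SIZE, recordBytesSize);
        return offset;
    }

    // A file is finished when
    // recordBytesOffsets + recordBytesSize == fileBytesSize.
    // A missing offset (null) means nothing has been sent yet.
    public static boolean isFinished(Map<String, ?> offset) {
        if (offset == null) {
            return false;
        }
        long fileSize = ((Number) offset.get(FILE_SIZE)).longValue();
        long recordOffset = ((Number) offset.get(RECORD_OFFSET)).longValue();
        long recordSize = ((Number) offset.get(RECORD_SIZE)).longValue();
        return recordOffset + recordSize == fileSize;
    }

    public static void main(String[] args) {
        // Last record starts at byte 90 and is 10 bytes long in a 100-byte file.
        System.out.println(isFinished(sourceOffset(100L, 90L, 10L)));
        // Last sent record ends at byte 50 of the same file.
        System.out.println(isFinished(sourceOffset(100L, 40L, 10L)));
    }
}
```

A connector with read access to the stored offsets could run a check like isFinished over each assigned file to decide when to request a task reconfiguration, which is the use case this thread asks about.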

Re: Kafka Connect / Access to OffsetStorageReader from SourceConnector

2017-02-17 Thread Jason Gustafson
Hey Florian, Can you explain a bit more how having access to the offset storage from the connector helps in your use case? I guess you are planning to use offsets to be able to tell when a task has finished a file? Thanks, Jason On Fri, Feb 17, 2017 at 4:45 AM, Florian Hussonnois wrote: > Hi K

Kafka Connect / Access to OffsetStorageReader from SourceConnector

2017-02-17 Thread Florian Hussonnois
Hi Kafka Team, I'm developing a connector which needs to monitor the progress of its tasks in order to be able to request a task reconfiguration in some situations. Our connector is pretty simple. It's used to stream thousands of files into Kafka. The connector scans directories then schedules