> On Jun 10, 2015, at 1:26 PM, Todd Palino <tpal...@gmail.com> wrote:
>
> For us, group ID is a configuration parameter of the application. So we
> store it in configuration files (generally on disk) and maintain it there
> through our configuration and deployment infrastructure. As you pointed
> out, hard coding the group ID into the application is not usually a good
> pattern.
>
> If you want to reset, you have a couple choices. One is that you can just
> switch group names and start fresh. Another is that you can shut down the
> consumer and delete the existing consumer group, then restart. You could
> also stop, edit the offsets to set them to something specific (if you need
> to roll back to a specific point, for example), and restart.
>
Thanks Todd. That helps. The "on disk" storage doesn't work well if you are running consumers on ephemeral nodes like EC2 machines, but in that case, I guess you would save the group ID in some other data store ("on disk, but elsewhere") associated with your "application cluster" rather than with any one node of the cluster.

I often hear about people saving their offsets using the consumer, and monitoring offsets for lag. I don't hear much about people deleting or changing/setting offsets by other means. How is it usually done? Are there tools to change the offsets, or do people go into ZooKeeper to change them directly? Or, for broker-stored offsets, do they use the Kafka APIs?

-James

> -Todd
>
>
> On Wed, Jun 10, 2015 at 1:20 PM, James Cheng <jch...@tivo.com> wrote:
>
>> Hi,
>>
>> How are people specifying/persisting/resetting the consumer group
>> identifier ("group.id") when using the high-level consumer?
>>
>> I understand how it works. I specify some string and all consumers that
>> use that same string will help consume a topic. The partitions will be
>> distributed amongst them for consumption. And when they save their offsets,
>> the offsets will be saved according to the consumer group. That all makes
>> sense to me.
>>
>> What I don't understand is the best way to set and persist them, and reset
>> them if needed. For example, do I simply hardcode the string in my code? If
>> so, then all deployed instances will have the same value (that's good). If
>> I want to bring up a test instance of that code, or a new installation,
>> though, then it will also share the load (that's bad).
>>
>> If I pass in a value to my instances, that lets me have different test and
>> production instances of the same code (that's good), but then I have to
>> persist my consumer group id somewhere outside of the process (on disk, in
>> ZooKeeper, etc). Which then means I need some way to manage *that*
>> identifier (that's... just how it is?).
>>
>> What if I decide that I want my app to start over? In the case of
>> log-compacted streams, I want to throw away any processing I did and start
>> "from the beginning". Do I change my consumer group, which effectively resets
>> everything? Or do I delete my saved offsets, and then resume with the same
>> consumer group? The latter is functionally equivalent to the former.
>>
>> Thanks,
>> -James
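
To make the two ideas discussed above concrete (keeping the group ID in configuration rather than in code, and resetting by switching group names), here is a minimal Python sketch. It is only an illustration, not anything posted in the thread: the environment variable names KAFKA_GROUP_ID and KAFKA_GROUP_GENERATION, the default base name "my-app", and both helper functions are hypothetical. The "auto.offset.reset" value of "smallest" is an assumption that the 0.8-era high-level consumer is in use, so that a group with no saved offsets starts from the earliest available message.

```python
import os


def resolve_group_id(env=None, default_base="my-app"):
    """Build the consumer group ID from configuration instead of hardcoding it.

    KAFKA_GROUP_ID and KAFKA_GROUP_GENERATION are hypothetical variable names.
    Changing the generation yields a new group name, so the consumer starts
    with no saved offsets (the "switch group names" option).
    """
    env = os.environ if env is None else env
    base = env.get("KAFKA_GROUP_ID", default_base)
    generation = env.get("KAFKA_GROUP_GENERATION", "")
    return f"{base}-{generation}" if generation else base


def consumer_config(env=None):
    """Return consumer properties as a plain dict; no Kafka client is created."""
    return {
        "group.id": resolve_group_id(env),
        # Assumed setting: a group without saved offsets starts from the
        # earliest available message.
        "auto.offset.reset": "smallest",
    }


if __name__ == "__main__":
    print(consumer_config())
```

With this shape, a test instance can be given its own KAFKA_GROUP_ID so it does not share load with production, and bumping KAFKA_GROUP_GENERATION is one way to start over without deleting the old group's offsets.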