Hi guys,

We're getting ready to release our first Samza project into production soon, and after the development phase we have a few questions about scalability, version upgrades, state management, and recovering from failures. Hopefully what I'm asking here is something you already have experience with.
- We're using key-value storage (backed by RocksDB) to keep track of the requests made by a user. The value is a serialized bean with the information needed to track the user. For this scenario:
  - I assume caching will help a lot with serialization/deserialization of the value, but have you guys used values of a type other than primitives?
  - In the event of stopping/starting a task, what is the recommended way to restore the key-value storage state from the changelog stream (Kafka)? I'm seeing this issue in JIRA: https://issues.apache.org/jira/browse/SAMZA-625, but I'm not sure what the advised way is to init a StreamTask so that it catches up with the changelog before it starts processing new events.
- Regarding the "Containers and resource allocation" section ( http://samza.apache.org/learn/documentation/0.9/container/samza-container.html ): if we scale up / down, how do we make sure that the stream tasks, as they are moved from one container to another, are initialized in the right state before they start processing events (sort of related to the second question above)?
- When doing version upgrades of your tasks, what is the best way to change the version with the least delay to event consumption? This would include recreating the old state in the new version of the task.

Please let me know if these questions are too general, and I'll be glad to add specifics on any of them.

By the way, I'm going to the Meetup this Tuesday; I hope I can meet many of you there!

José L. Barrueta
Stormpath