Hi guys,

We're getting ready to release our first Samza project into production
soon, and now that the development phase is over we have a few questions
about scalability, version upgrades, state management and recovering from
failures. Hopefully these are things you have already run into.

- We're using Key-Value Storage (backed by RocksDB) to keep track of the
requests made by each user. The value is a serialized bean with the
information needed to track that user. For this scenario:

  - I assume caching will help a lot with serialization/deserialization of
the value, but have you used value types other than primitives?
  - When a task is stopped and started again, what is the recommended way
to restore the key-value store state from the changelog stream (Kafka)?
I'm seeing this issue in JIRA:
https://issues.apache.org/jira/browse/SAMZA-625, but I'm not sure what the
advised way is to initialize a StreamTask so that it catches up with the
changelog before it starts processing new events.
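For context, here is a minimal sketch of the kind of value described above,
assuming plain Java serialization. The names `UserRequests` and
`UserRequestsSerde` and their fields are hypothetical, not taken from our
code. In a real job the two methods would live in a class implementing
Samza's `org.apache.samza.serializers.Serde<UserRequests>` (`toBytes` /
`fromBytes`), created through a `SerdeFactory`. The sketch keeps them
standalone so it compiles without the Samza jars.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical value bean: the requests made by a single user.
class UserRequests implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String userId;
    private final List<String> requestIds = new ArrayList<>();

    UserRequests(String userId) { this.userId = userId; }

    void addRequest(String requestId) { requestIds.add(requestId); }
    String getUserId() { return userId; }
    List<String> getRequestIds() { return requestIds; }
}

// Hypothetical serde. In a Samza job this would implement
// org.apache.samza.serializers.Serde<UserRequests>.
class UserRequestsSerde {
    // Bean -> bytes, as stored in RocksDB and written to the changelog.
    byte[] toBytes(UserRequests value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush(); // make sure everything reaches the byte buffer
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bytes -> bean, paid on every read that has to go to the store.
    UserRequests fromBytes(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (UserRequests) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This round trip is the serialization/deserialization cost the caching
question refers to. Java serialization is used here only because it needs no
extra dependencies. A JSON or Avro serde would have the same two-method shape.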

- Regarding the "Containers and resource allocation" section
(http://samza.apache.org/learn/documentation/0.9/container/samza-container.html):
if we scale up or down, how do we make sure that stream tasks, as they are
moved from one container to another, are initialized in the right state
before they start processing events? (This is related to the second
question above.)

- When upgrading a task to a new version, what is the best way to switch
versions with the least delay in consuming events, including recreating the
old state in the new version of the task?

Please let me know if these questions are too general, and I'll be glad to
add specifics to any of them.

By the way, I'm going to the Meetup this Tuesday; I hope to meet many of
you there!

José L. Barrueta
Stormpath
