One thing worth exploring is having both a transformation and an ingestion library in Kafka that share the same framework for fault tolerance, resource isolation, and management. The biggest difference I see between these two use cases is the API and data model.
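As a rough illustration of that split, here is a minimal sketch. None of these interfaces exist in Kafka or Samza; the names and shapes are made up purely to show two use-case-specific APIs sharing one runtime for fault tolerance, resource isolation, and management:

    // Hypothetical sketch: none of these types exist in Kafka or Samza.
    // Ingestion speaks the source's data model, with opaque source offsets.
    interface IngestionTask {
        void poll(RecordSink sink);
    }

    interface RecordSink {
        void send(String topic, byte[] key, byte[] value, String sourceOffset);
    }

    // Transformation speaks Kafka's data model, with long offsets.
    interface TransformTask {
        void process(String topic, int partition, long offset, byte[] key, byte[] value);
    }

    // One runtime hosts either kind of task: same fault tolerance,
    // resource isolation, and management underneath, different APIs on top.
    final class Worker {
        void run(Runnable taskLoop) {
            new Thread(taskLoop).start();  // restart/monitoring logic would live here
        }
    }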
On Jul 2, 2015, at 8:59 AM, Jay Kreps <j...@confluent.io> wrote:

Hey Garry,

Yeah, that's super frustrating. I'd be happy to chat more about this if you'd be interested. I think Chris and I started with the idea of "what would it take to make Samza a kick-ass ingestion tool," but ultimately we came around to the idea that ingestion and transformation have pretty different needs, and coupling the two made things hard.

For what it's worth, I think copycat (KIP-26) actually will do what you are looking for.

With regard to your point about Slider, I don't necessarily disagree. I think getting good YARN support is quite doable, and we can make that work well. The issue this proposal solves is that it is technically pretty hard to support multiple cluster management systems the way things are now: you need to write an "app master" or "framework" for each, and they are all a little different, so testing is really hard. In the absence of this we have been stuck with just YARN, which has fantastic penetration in the Hadoopy part of the org but zero penetration elsewhere. Given the huge amount of work being put into Slider, Marathon, and AWS tooling, not to mention the umpteen related packaging technologies people want to use (Docker, Kubernetes, various cloud-specific deploy tools, etc.), I really think it is important to get this right.

-Jay

On Thu, Jul 2, 2015 at 4:17 AM, Garry Turkington <g.turking...@improvedigital.com> wrote:

Hi all,

I think the question below re whether Samza becomes a sub-project of Kafka highlights the broader point around migration. Chris mentions Samza's maturity is heading towards a v1 release, but I'm not sure it feels right to launch a v1 and then immediately plan to deprecate most of it.

From a selfish perspective, I have some guys who have started working with Samza, and building some new consumers/producers was next up. It sounds like that is absolutely not the direction to go. I need to look into the KIP in more detail, but for me the attractiveness of adding new Samza consumers/producers -- even if all they were really doing was getting data into and out of Kafka -- was to avoid having to worry about the lifecycle management of external clients. If there is a generic Kafka ingress/egress layer that I can plug a new connector into, with a lot of the heavy lifting re scale and reliability done for me, then it gives me everything that pushing new consumers/producers would. If not, it complicates my operational deployments.

Which is similar to my other question with the proposal: if we build a fully available, standalone Samza plus the requisite shims to integrate with Slider etc., I suspect the former may be a lot more work than we think. We may make it much easier for a newcomer to get something running, but having them step up to a reliable production deployment may still dominate mailing list traffic, if for different reasons than today.

Don't get me wrong -- I'm comfortable with making the Samza dependency on Kafka much more explicit, and I absolutely see the benefits in the reduction of duplication and clashing terminologies/abstractions that Chris and Jay describe. Samza as a library would likely be a very nice tool to add to the Kafka ecosystem. I just have the concerns above re the operational side.
Garry

-----Original Message-----
From: Gianmarco De Francisci Morales [mailto:g...@apache.org]
Sent: 02 July 2015 12:56
To: dev@samza.apache.org
Subject: Re: Thoughts and observations on Samza

Very interesting thoughts. From outside, I have always perceived Samza as a computing layer over Kafka.

The question, maybe a bit provocative, is "should Samza be a sub-project of Kafka, then?" Or does it make sense to keep it as a separate project with separate governance?

Cheers,

--
Gianmarco

On 2 July 2015 at 08:59, Yan Fang <yanfang...@gmail.com> wrote:

Overall, I agree with coupling with Kafka more tightly, because Samza is de facto based on Kafka and should leverage what Kafka has. At the same time, Kafka does not need to reinvent what Samza already has. I also like the idea of separating ingestion and transformation.

But it is a little difficult for me to imagine what Samza will look like, and I feel Chris and Jay differ a little in terms of how Samza should look.

*** Will it look like what Jay's code shows (a client of Kafka), with the user's application code calling this client?

1. If we make Samza a library of Kafka (as the code shows), how do we implement auto-balance and fault-tolerance? Are they taken care of by the Kafka broker, or by some other mechanism such as a "Samza worker" (just making up the name)?

2. What about other features, such as auto-scaling, shared state, and monitoring?

*** If we have Samza standalone (is this what Chris suggests?):

1. We still need to ingest data from Kafka and produce to it. Then it becomes the same as what Samza looks like now, except it does not rely on YARN anymore.

2. If it is standalone, how can it leverage Kafka's metrics, logs, etc.? Use Kafka code as a dependency?

Thanks,

Fang, Yan
yanfang...@gmail.com

On Wed, Jul 1, 2015 at 5:46 PM, Guozhang Wang <wangg...@gmail.com> wrote:

Read through the code example and it looks good to me. A few thoughts regarding deployment:

Today Samza deploys as an executable runnable, like:

    deploy/samza/bin/run-job.sh --config-factory=... --config-path=file://...

And this proposal advocates deploying Samza more as an embedded library in user application code (ignoring the terminology, since it is not the same as the prototype code):

    StreamTask task = new MyStreamTask(configs);
    Thread thread = new Thread(task);
    thread.start();

I think both of these deployment modes are important for different types of users. That said, I think making Samza purely standalone is still sufficient for either runnable or library mode.

Guozhang
On Tue, Jun 30, 2015 at 11:33 PM, Jay Kreps <j...@confluent.io> wrote:

Looks like gmail mangled the code example; it was supposed to look like this:

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:4242");
    StreamingConfig config = new StreamingConfig(props);
    config.subscribe("test-topic-1", "test-topic-2");
    config.processor(ExampleStreamProcessor.class);
    config.serialization(new StringSerializer(), new StringDeserializer());
    KafkaStreaming container = new KafkaStreaming(config);
    container.run();

-Jay

On Tue, Jun 30, 2015 at 11:32 PM, Jay Kreps <j...@confluent.io> wrote:

Hey guys,

This came out of some conversations Chris and I were having around whether it would make sense to use Samza as a kind of data ingestion framework for Kafka (which ultimately led to KIP-26, "copycat"). This combined with complaints around config and YARN, and with the discussion around how best to do a standalone mode.

So the thought experiment was: given that Samza was basically already totally Kafka-specific, what if you just embraced that and turned it into something less like a heavyweight framework and more like a third Kafka client -- a kind of "producing consumer" with state management facilities. Basically a library. Instead of a complex stream processing framework, this would actually be a very simple thing, not much more complicated to use or operate than a Kafka consumer. As Chris said, when we thought about it, a lot of what Samza (and the other stream processing systems) were doing seemed like kind of a hangover from MapReduce.

Of course you need to ingest/output data to and from the stream processing. But when we actually looked into how that would work, Samza isn't really an ideal data ingestion framework, for a bunch of reasons. To really do that right you need a pretty different internal data model and set of APIs. So what if you split them, and had an API for Kafka ingress/egress (copycat, AKA KIP-26) and a separate API for Kafka transformation (Samza)?

This would also allow really embracing the same terminology and conventions. One complaint about the current state is that the two systems feel kind of bolted together. Terminology like "stream" vs. "topic" and different config and monitoring systems mean you have to learn Kafka's way, then learn Samza's slightly different way, then kind of understand how they map to each other, which, having walked a few people through it, is surprisingly tricky for folks to get.

Since I have been spending a lot of time on airplanes, I hacked up an earnest but still somewhat incomplete prototype of what this would look like. This is just unceremoniously dumped into Kafka, as it required a few changes to the new consumer. Here is the code:
https://github.com/jkreps/kafka/tree/streams/clients/src/main/java/org/apache/kafka/clients/streaming

For the purpose of the prototype I just liberally renamed everything to try to align it with Kafka, with no regard for compatibility.

Using it would look something like this:

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:4242");
    StreamingConfig config = new StreamingConfig(props);
    config.subscribe("test-topic-1", "test-topic-2");
    config.processor(ExampleStreamProcessor.class);
    config.serialization(new StringSerializer(), new StringDeserializer());
    KafkaStreaming container = new KafkaStreaming(config);
    container.run();

KafkaStreaming is basically the SamzaContainer; StreamProcessor is basically StreamTask.
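The thread never shows the StreamProcessor interface itself, only that it plays StreamTask's role. As a rough illustration, here is a guess at what ExampleStreamProcessor might look like, assuming a process(key, value, collector) shape; the interface definitions below are assumptions, not the prototype's real API:

    // Assumed shapes; the thread only tells us StreamProcessor ~ StreamTask.
    interface Collector<K, V> {
        void send(String topic, K key, V value);
    }

    interface StreamProcessor<K, V> {
        void process(K key, V value, Collector<K, V> collector);
    }

    // A guess at ExampleStreamProcessor under those shapes: transform each
    // record consumed from the subscribed topics and produce it back to Kafka.
    class ExampleStreamProcessor implements StreamProcessor<String, String> {
        @Override
        public void process(String key, String value, Collector<String, String> collector) {
            collector.send("test-output", key, value.toUpperCase());
        }
    }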
So rather than putting all the class names in a file and then having the job assembled by reflection, you just instantiate the container programmatically. Work is balanced over however many instances are alive at any time (i.e., if an instance dies, its tasks are added to the existing containers without shutting them down).

We would provide some glue for running this stuff on YARN via Slider, on Mesos via Marathon, and on AWS using some of their tools, but from the point of view of these frameworks the stream processing jobs are just stateless services that can come and go and expand and contract at will. There is no more custom scheduler.

Here are some relevant details:

1. It is only ~1300 lines of code. It would get larger if we productionized it, but not vastly larger; we really do get a ton of leverage out of Kafka.
2. Partition management is fully delegated to the new consumer. This is nice, since any partition management strategy available to the Kafka consumer is now also available to Samza (and vice versa), with the exact same configs.
3. It supports state, as well as state reuse.

Anyhow, take a look; hopefully it is thought provoking.

-Jay
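To make Jay's point 2 concrete: strategy selection would go through the consumer's own config key rather than anything Samza-specific. A sketch under that assumption; the property key is the new consumer's, while the assignor class name is from later Kafka releases and is only illustrative here:

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:4242");
    // The Kafka consumer's own setting picks the strategy; no Samza-specific config.
    props.put("partition.assignment.strategy",
              "org.apache.kafka.clients.consumer.RoundRobinAssignor");
    StreamingConfig config = new StreamingConfig(props);  // prototype class from above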
On Tue, Jun 30, 2015 at 6:55 PM, Chris Riccomini <criccom...@apache.org> wrote:

Hey all,

I have had some discussions with Samza engineers at LinkedIn and Confluent, and we came up with a few observations about Samza's design that I want to call out, along with some proposed changes:

* Samza is dependent upon a dynamic deployment system.
* Samza is too pluggable.
* Samza's SystemConsumer/SystemProducer and Kafka's consumer APIs are trying to solve a lot of the same problems.

All three of these issues are related, but I'll address them in order.

Deployment

Samza strongly depends on the use of a dynamic deployment scheduler such as YARN, Mesos, etc. When we initially built Samza, we bet that there would be one or two winners in this area, that we could support them, and that the rest would go away. In reality, there are many variations. Furthermore, many people still prefer to just start their processors like normal Java processes and use traditional deployment tools such as Fabric, Chef, Ansible, etc. Forcing a deployment system on users makes the Samza start-up process really painful for first-time users.

Dynamic deployment as a requirement was also a bit of a misfire, because of a fundamental misunderstanding of the difference between batch jobs and stream processing jobs. Early on, we made a conscious effort to favor the Hadoop (Map/Reduce) way of doing things, since it worked and was well understood. One thing we missed is that batch jobs have a definite beginning and end, while stream processing jobs (usually) don't. This leads to a much simpler scheduling problem for stream processors: you basically just need to find a place to start the processor, and start it. The way we run grids at LinkedIn, there's no concept of a cluster being "full"; we always add more machines. The problem with coupling Samza to a scheduler is that Samza (as a framework) now has to handle deployment. This pulls in a bunch of things such as configuration distribution (the config stream), shell scripts (bin/run-job.sh, JobRunner), packaging (all the .tgz stuff), etc.

Another reason for requiring dynamic deployment was to support data locality: if you want locality, you need to put your processors close to the data they're processing. Upon further investigation, though, this feature is not that beneficial. There is some good discussion of its problems on SAMZA-335. Again, we took the Map/Reduce path, but there are some fundamental differences between HDFS and Kafka: HDFS has blocks, while Kafka has partitions. This leaves less optimization potential for stream processors on top of Kafka.

This feature is also used as a crutch. Samza doesn't have any built-in fault-tolerance logic; instead, it depends on the dynamic deployment scheduler to handle restarts when a processor dies. This has made it very difficult to write a standalone Samza container (SAMZA-516).

Pluggability

In some cases pluggability is good, but I think we've gone too far with it. Currently, Samza has:

* Pluggable config.
* Pluggable metrics.
* Pluggable deployment systems.
* Pluggable streaming systems (SystemConsumer, SystemProducer, etc.).
* Pluggable serdes.
* Pluggable storage engines.
* Pluggable strategies for just about every component (MessageChooser, SystemStreamPartitionGrouper, ConfigRewriter, etc.).

There are probably more that I've forgotten. Some of these are useful, but some have proven not to be.
This all comes at a cost: complexity. This complexity makes it harder for our users to pick up and use Samza out of the box. It also makes it difficult for Samza developers to reason about the container's characteristics, since those characteristics change depending on which plugins are used.

The issues with pluggability are most visible in the System APIs. What Samza really requires to be functional is Kafka as its transport layer. But we've conflated two unrelated use cases into one API:

1. Get data into/out of Kafka.
2. Process the data in Kafka.

The current System API supports both of these use cases. The problem is, we actually want different features for each use case. By papering over the two use cases and providing a single API, we've introduced a ton of leaky abstractions.

For example, what we'd really like in (2) is to have monotonically increasing longs for offsets (like Kafka). This would be at odds with (1), though, since different systems have different SCNs/offsets/UUIDs/vectors. There was discussion both on the mailing list and on the SQL JIRAs about the need for this.

The same holds true for replayability. Kafka allows us to rewind when we have a failure; many other systems don't. In some cases, systems return null for their offsets (e.g. WikipediaSystemConsumer) because they have no offsets at all.

Partitioning is another example. Kafka supports partitioning, but many systems don't. We model those systems as having a single partition. Still other systems model partitioning differently (e.g. Kinesis).

The SystemAdmin interface is also a mess. Creating streams in a system-agnostic way is almost impossible, as is modeling system metadata (replication factor, partitions, location, etc.). The list goes on.
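The offset mismatch is easy to see in code. A small illustration; the names are made up, but the shape mirrors the SystemConsumer problem described above:

    // Illustrative only: the conflated API has to treat offsets as opaque
    // strings, since sources expose SCNs, UUIDs, vectors, or nothing at all
    // (WikipediaSystemConsumer returns null).
    final class OffsetLeak {
        static String nextOpaque(String offset) {
            return offset;  // no generic way to advance, compare, or rewind
        }

        // What transformation over Kafka actually wants: dense ordered longs,
        // which make rewind, comparison, and checkpoint arithmetic trivial.
        static long nextKafka(long offset) {
            return offset + 1;
        }
    }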
Duplicate work

At the time we began writing Samza, Kafka's consumer and producer APIs had a relatively weak feature set. On the consumer side, you had two options: use the high-level consumer, or the simple consumer. The problem with the high-level consumer was that it controlled your offsets, partition assignments, and the order in which you received messages. The problem with the simple consumer is that it's not simple; it's basic. You end up having to handle a lot of really low-level stuff that you shouldn't. We spent a lot of time making Samza's KafkaSystemConsumer very robust. It also allows us to support some cool features:

* Per-partition message ordering and prioritization.
* Tight control over partition assignment, to support joins, global state (if we want to implement it :)), etc.
* Tight control over offset checkpointing.

What we didn't realize at the time is that these features should actually be in Kafka. A lot of Kafka consumers (not just Samza stream processors) end up wanting to do things like joins and partition assignment. The Kafka community has come to the same conclusion: they're adding a ton of upgrades to their new consumer implementation, which to a large extent duplicates work we've already done in Samza.

On top of this, Kafka ended up taking a very similar approach to Samza's KafkaCheckpointManager implementation for handling offset checkpointing. Like Samza, Kafka's new offset management feature stores offset checkpoints in a topic and allows you to fetch them from the broker.

A lot of this seems like a waste, since we could have shared the work if it had been done in Kafka from the get-go.
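For reference, the broker-backed offset management mentioned above looks roughly like this through the new consumer API. A sketch; the bootstrap address, group id, topic, and offset value are placeholders:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:4242");
    props.put("group.id", "my-processor");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);

    TopicPartition tp = new TopicPartition("test-topic-1", 0);
    // Commit a checkpoint to the offsets topic via the broker...
    consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(42L)));
    // ...and fetch it back later, e.g. after a restart.
    OffsetAndMetadata checkpoint = consumer.committed(tp);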
Vision

All of this leads me to a rather radical proposal. Samza is relatively stable at this point; I'd venture to say we're near a 1.0 release. I'd like to propose that we take what we've learned and begin thinking about Samza beyond 1.0. What would we change if we were starting from scratch? My proposal is to:

1. Make Samza standalone the *only* way to run Samza processors, and eliminate all direct dependencies on YARN, Mesos, etc.
2. Make a definitive call to support only Kafka as the stream processing layer.
3. Eliminate Samza's metrics, logging, serialization, and config systems, and simply use Kafka's instead.

This would fix all of the issues I outlined above. It should also shrink the Samza code base pretty dramatically. Supporting only a standalone container will allow Samza to be executed on YARN (using Slider), on Mesos (using Marathon/Aurora), or on most other in-house deployment systems. This should make life a lot easier for new users. Imagine having the hello-samza tutorial without YARN; the drop in mailing list traffic would be pretty dramatic.

Coupling with Kafka seems long overdue to me. The reality is, everyone I'm aware of is using Samza with Kafka; we basically require it already in order for most features to work. Those using other systems are generally using them for ingest into Kafka (1), and then doing the processing on top. There is already discussion in Kafka (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767) about making ingestion into Kafka extremely easy.

Once we make the call to couple with Kafka, we can leverage a ton of their ecosystem. We no longer have to maintain our own config, metrics, etc.; we can all share the same libraries and make them better. This will also allow us to share the consumer/producer APIs, and will let us leverage their offset management and partition management rather than having our own. All of the coordinator stream code would go away, as would most of the YARN AppMaster code. We'd probably have to push some partition management features into the Kafka broker, but they're already moving in that direction with the new consumer API. The features we have for partition assignment aren't unique to Samza, and seem like they should be in Kafka anyway. There will always be some niche usages that require extra care, and hence full control over partition assignments, much like Kafka's low-level consumer API; these would continue to be supported.

These changes will be good for the Samza community. They'll make Samza easier to use, and make it easier for developers to add new features.

Obviously this is a fairly large (and somewhat backwards-incompatible) change. If we choose to go this route, it's important that we openly communicate how we're going to provide a migration path from the existing APIs to the new ones (if we make incompatible changes). At a minimum, we'd probably need to provide a wrapper to allow existing StreamTask implementations to continue running on the new container. It's also important that we openly communicate about the timing and stages of the migration.

If you made it this far, I'm sure you have opinions. :) Please send your thoughts and feedback.

Cheers,
Chris