Yi,

Thanks for your feedback. Replies inline.


>> in YARN, are you sure that we persist the JobModel to coordinator
stream? I checked the code and didn't find that. The JobModel was simply
generated and served from the memory in the JobCoordinator.

Yes, that’s right. Updated the SEP based upon this feedback.


>> In YARN, we don't run stream processor yet. We are still running
SamzaContainer as for now.


I think it’s simpler to represent it logically as a processor for both
YARN and standalone. Choosing StreamProcessor (or SamzaContainer) for a
deployment model is an implementation detail which doesn’t have to be
mentioned in the design doc. If you have strong opinions about this,
let’s discuss offline.


>>  When describing the differences between YARN and standalone model, item
3 and 4 are not basic facts this design depends on. Instead, those seem to
be detailed design choices already. Suggest to move to requirements in the
design, not the difference between YARN and standalone model

Done. Updated the SEP based upon this feedback.


>>  A further question on systems that provides VM and file system
isolation (Locker G3): in this case, even two processors are on a single
physical host, if they are in two different VMs, they can not share file
system, which means that their state stores can not be shared. And in
addition, if the system provides some common shared directory, then, what's
the locationId in such case?


LocationId represents the physical execution environment of the
StreamProcessor. All the processors which run from a locationId should be
able to share (read/write) their local state stores: any store created by
a processor running from a locationId should be readable and writable by
other processors running from the same locationId. If the containerized
environment supports a shared data volume between VMs on the host, then
locationId will be the physical hostname (example: Kubernetes). If not,
locationId will be a unique ID identifying the VM among all the VMs
(example: Locker G3). For Locker G3, locationId will be the combination
of sliceID and sliceInstanceId.
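
To make the rule above concrete, here is a rough sketch of how I picture
the choice being made. This is only an illustration, not what is in the
SEP or in Samza: the class name, the method name, the boolean flag and
the "-" separator used to combine sliceID and sliceInstanceId are all
assumptions on my part.

```java
import java.util.Objects;

// Illustrative sketch only; none of these names exist in Samza.
public final class LocationIdSketch {
    private LocationIdSketch() {}

    /**
     * Picks a locationId for a processor.
     *
     * @param sharedVolumeAcrossVms true if VMs on the same host can share
     *                              a data volume (the Kubernetes-like case)
     * @param hostname              physical hostname of the machine
     * @param sliceId               VM slice identifier (Locker G3-like case)
     * @param sliceInstanceId       instance identifier within the slice
     */
    public static String resolveLocationId(boolean sharedVolumeAcrossVms,
                                           String hostname,
                                           String sliceId,
                                           String sliceInstanceId) {
        if (sharedVolumeAcrossVms) {
            // Processors on the same host can read/write each other's
            // stores, so the host itself is the unit of locality.
            return Objects.requireNonNull(hostname, "hostname");
        }
        // No shared file system between VMs: the VM is the unit of
        // locality. The "-" separator is an assumption for this sketch.
        Objects.requireNonNull(sliceId, "sliceId");
        Objects.requireNonNull(sliceInstanceId, "sliceInstanceId");
        return sliceId + "-" + sliceInstanceId;
    }

    public static void main(String[] args) {
        System.out.println(resolveLocationId(true, "host-1", "slice-7", "3"));
        System.out.println(resolveLocationId(false, "host-1", "slice-7", "3"));
    }
}
```

With this, two processors in different VMs on the same physical host get
the same locationId only when the environment lets them share a volume.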


I have updated the SEP with these details. Please take a look at this when
you get a chance.


Cancelling this vote for now.


Thanks.
