> On Oct. 7, 2016, 9:04 p.m., Jake Maes wrote:
> > samza-operator/src/main/java/org/apache/samza/operators/api/internal/Operators.java,
> >  line 61
> > <https://reviews.apache.org/r/47994/diff/3/?file=1522711#file1522711line61>
> >
> >     3 thoughts on this line:
> >     1. Why should this be static? Wouldn't this preclude you from having 
> > two tasks run the same operator DAG in the same container/process?
> >     
> >     2. And why here instead of the MessageStream or ChainedOperators 
> > classes? I would expect the topology to be an instantiated thing rather 
> > than a global map. At a minimum since this map and ChainedOperators encode 
> > similar information (subscribers to an operator or message stream) they 
> > should be consolidated to one source of truth for structural/topology info.
> >     
> >     3. Does the order of the Operators in the list have any meaning? e.g. 
> > does it implicitly define the order of processing, or is it just for 
> > consistency, or is the List used to allow duplicates?

Hi, Jake, thanks for the comments. Let me try to answer it one-by-one:
1. The key to this map is the MessageStream object, which will be separate 
instances for each input topic partition. Hence, two tasks w/ the same operator 
DAG will only share the SystemMessageStream and will have their own 
MessageStream and operator objects. Not sure why sharing the same topology info 
between two tasks is necessary.
2. The reason I put this map in Operators.java is due to packaging and access 
mode. In the implementation, I tried to achieve the following two goals: a) 
restrict the direct dependency from any operator.api class to operator.impl 
s.t. we can potentially package API classes separately. Hence, creating the 
operator map directly in ChainedOperators in impl class is not chosen; b) don't 
expose any internal classes (i.e. Operator class is not exposed to user at all) 
via public API classes and methods. Hence, recording the subscribers in 
MessageStream class is not chosen since it inevitably requires a public access 
method in this API class to get the list of operators, which should not be 
exposed/accessed by the programmer. The existance of the multiple layers of 
topology is strictly following the three-layers in the API design: programming 
layer (MessageStream/Windows/...), representation layer (Operators, etc. in 
operator.api.internal), and implementation layer (OperatorImpl, ChainedOper
 ators, etc.). In each layer, the map is the single source of truth. Classes in 
different layers only access the map in its own layer. A single consolidated 
source of truth will break the layering design and does not allow packaging the 
API-only classes separately. Hope this explains the motivation and thoughts 
behind the design choices. I am open to any better suggestion to achive the 
above two goals.
3. So far, I don't see a strong reason for or against a List vs Set. Maybe it 
would be better to keep it as Collection s.t. we have freedom in choosing its 
implementation? 

I will keep this issue open to see whether we can find any better ideas for now.


> On Oct. 7, 2016, 9:04 p.m., Jake Maes wrote:
> > samza-operator/src/main/java/org/apache/samza/operators/api/internal/Operators.java,
> >  line 78
> > <https://reviews.apache.org/r/47994/diff/3/?file=1522711#file1522711line78>
> >
> >     This can be simplified using the new java 8 "getOrDefault" method for 
> > maps.
> >     
> >     Also, should it really be null if there are no subscribers or 
> > Collections.emptyList()?

Nice suggestion. I will use JDK8! And returning the empty list does simplify 
the logic in ChainedOperators. Thanks!


> On Oct. 7, 2016, 9:04 p.m., Jake Maes wrote:
> > samza-operator/src/main/java/org/apache/samza/operators/impl/OperatorFactory.java,
> >  line 42
> > <https://reviews.apache.org/r/47994/diff/3/?file=1522713#file1522713line42>
> >
> >     Another global map. We should be super clear about why these are being 
> > used and what the assumptions are. This type of code can be very fragile if 
> > we're assuming singletons and that assumption is later broken.

The explanation is pretty much the same as I put above. I have a markdown file 
explaining the layered design. It seems that that is not enough to help 
understanding the layered representation of the DAG (from programming to 
representation to implementation). I will try to embed something in the code 
then.

Closing this one since the first issue is similar and is kept open.


- Yi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47994/#review151818
-----------------------------------------------------------


On Oct. 5, 2016, 7:50 a.m., Yi Pan (Data Infrastructure) wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47994/
> -----------------------------------------------------------
> 
> (Updated Oct. 5, 2016, 7:50 a.m.)
> 
> 
> Review request for samza, Boris Shkolnik, Chris Pettitt, Chinmay Soman, Jake 
> Maes, Navina Ramesh, Jagadish Venkatraman, and Xinyu Liu.
> 
> 
> Bugs: SAMZA-915
>     https://issues.apache.org/jira/browse/SAMZA-915
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> SAMZA-915: implementation of StreamPipeline and operator runtime impl classes
> 
> 
> Diffs
> -----
> 
>   
> samza-operator/src/main/java/org/apache/samza/operators/api/MessageStream.java
>  PRE-CREATION 
>   
> samza-operator/src/main/java/org/apache/samza/operators/api/internal/Operators.java
>  PRE-CREATION 
>   
> samza-operator/src/main/java/org/apache/samza/operators/impl/ChainedOperators.java
>  PRE-CREATION 
>   
> samza-operator/src/main/java/org/apache/samza/operators/impl/OperatorFactory.java
>  PRE-CREATION 
>   
> samza-operator/src/main/java/org/apache/samza/operators/impl/OperatorImpl.java
>  PRE-CREATION 
>   
> samza-operator/src/main/java/org/apache/samza/operators/impl/join/PartialJoinOpImpl.java
>  PRE-CREATION 
>   
> samza-operator/src/main/java/org/apache/samza/operators/impl/window/SessionWindowImpl.java
>  PRE-CREATION 
>   
> samza-operator/src/test/java/org/apache/samza/operators/api/TestMessageStream.java
>  PRE-CREATION 
>   
> samza-operator/src/test/java/org/apache/samza/operators/api/internal/TestOperators.java
>  PRE-CREATION 
>   
> samza-operator/src/test/java/org/apache/samza/operators/impl/TestChainedOperators.java
>  PRE-CREATION 
>   
> samza-operator/src/test/java/org/apache/samza/operators/impl/TestOperatorFactory.java
>  PRE-CREATION 
>   
> samza-operator/src/test/java/org/apache/samza/operators/impl/TestOperatorImpl.java
>  PRE-CREATION 
>   
> samza-operator/src/test/java/org/apache/samza/operators/impl/TestOutputMessage.java
>  PRE-CREATION 
>   
> samza-operator/src/test/java/org/apache/samza/operators/impl/TestSimpleOperatorImpl.java
>  PRE-CREATION 
>   
> samza-operator/src/test/java/org/apache/samza/operators/impl/TestSinkOperatorImpl.java
>  PRE-CREATION 
>   
> samza-operator/src/test/java/org/apache/samza/operators/impl/data/serializers/SqlAvroSerdeTest.java
>  PRE-CREATION 
>   
> samza-operator/src/test/java/org/apache/samza/operators/impl/window/TestSessionWindowImpl.java
>  PRE-CREATION 
>   
> samza-operator/src/test/java/org/apache/samza/task/BroadcastOperatorTask.java 
> PRE-CREATION 
>   
> samza-operator/src/test/java/org/apache/samza/task/InputJsonSystemMessage.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/47994/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew clean build.
> 
> 
> Thanks,
> 
> Yi Pan (Data Infrastructure)
> 
>

Reply via email to