> On Sept. 16, 2016, 8:58 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/tutorials/versioned/samza-async-user-guide.md, line 71
> > <https://reviews.apache.org/r/50174/diff/7/?file=1500250#file1500250line71>
> >
> >     The explanation of "processAsync() will always be invoked in a single 
> > thread" is off-the-target. I think that you might simply want to focus on 
> > "by default, AsyncStreamTask in Samza guarantees in-order process of 
> > messages in a task, by making sure the processAsync() can only be called 
> > after the callback of previous processAsync() is triggered". Whether 
> > processAsync() is invoked in a single thread or not does not directly 
> > explain the default in-order guarantee.
> >     
> >     P.S. with this in-order guarantee, doesn't it defeat the whole purpose 
> > of async processing? Naturally, there is a question on what performance 
> > benefit we gain w/ the default processAsync()?

The performance benefit is parallelism among tasks: processAsync() can be 
invoked independently for each task, so one task waiting on its callback does 
not hold up the others.
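
To make this concrete, here is a toy, single-threaded Java model of the 
behavior described above. It is not Samza code: the Task class, the run() 
method and the message names are all hypothetical, and the I/O completion is 
simulated. It only shows that each task receives its next message after its 
previous callback has fired, while different tasks have calls outstanding at 
the same time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Toy model (not Samza code): each task allows one in-flight message at a time. */
final class InOrderTaskSketch {

    /** Hypothetical stand-in for a task's pending messages and in-flight flag. */
    static final class Task {
        final String name;
        final Queue<String> pending = new ArrayDeque<>();
        boolean inFlight = false;

        Task(String name, String... messages) {
            this.name = name;
            pending.addAll(Arrays.asList(messages));
        }
    }

    /** Runs the simulated loop and returns the log of dispatch/complete events. */
    static List<String> run() {
        List<Task> tasks = Arrays.asList(
                new Task("task0", "a1", "a2"),
                new Task("task1", "b1", "b2"));
        List<String> log = new ArrayList<>();
        Queue<Runnable> callbacks = new ArrayDeque<>();

        boolean progress = true;
        while (progress) {
            progress = false;
            // Dispatch: a task gets its next message only after its previous callback fired.
            for (Task t : tasks) {
                if (!t.inFlight && !t.pending.isEmpty()) {
                    String msg = t.pending.poll();
                    t.inFlight = true;
                    log.add("dispatch " + t.name + ":" + msg);
                    callbacks.add(() -> {
                        t.inFlight = false;
                        log.add("complete " + t.name + ":" + msg);
                    });
                    progress = true;
                }
            }
            // Simulated I/O completion: fire every outstanding callback.
            while (!callbacks.isEmpty()) {
                callbacks.poll().run();
                progress = true;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In the log, task0 and task1 are both dispatched before either completes 
(parallelism among tasks), but a2 is never dispatched before a1 completes 
(in-order within a task).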


> On Sept. 16, 2016, 8:58 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/tutorials/versioned/samza-async-user-guide.md, line 73
> > <https://reviews.apache.org/r/50174/diff/7/?file=1500250#file1500250line73>
> >
> >     I felt that the flow works better for me if we organize the explanation 
> > in the following order:
> >     - Sync process w/ multi-thread: more parallelism for remote I/O among 
> > tasks, in-order execution within a task
> >     - AsyncStreamTask w/ in-order execution in a task (alternative to sync 
> > task): default AsyncStreamTask, more parallelism for remote I/O among 
> > tasks, in-order execution within a task
> >     - AsyncStreamTask: more parallelism among and within the tasks, 
> > out-of-order execution

Thanks for the suggestion! I like this better too, so I reordered the doc. The 
third bullet (parallelism within a task) applies to both the sync and async 
cases. It now reads:

- Sync with multi-threading, in-order
- Async, in-order
- Out-of-order for both.


> On Sept. 16, 2016, 8:58 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/tutorials/versioned/samza-async-user-guide.md, line 119
> > <https://reviews.apache.org/r/50174/diff/7/?file=1500250#file1500250line119>
> >
> >     "in the order of the message arrivals".
> >     
> >     Also, what do you refer to as "strict ordering of the output" here? In 
> > StreamTask w/ multi-threading, don't we guarantee the in-order processing 
> > within the task? We can't guarantee the ordering between the tasks anyways.

task.max.concurrency applies to both the async and sync cases, so multiple 
process() calls can run in parallel for a single StreamTask, and their output 
is then not strictly ordered.


> On Sept. 16, 2016, 8:58 p.m., Yi Pan (Data Infrastructure) wrote:
> > docs/learn/tutorials/versioned/samza-async-user-guide.md, line 125
> > <https://reviews.apache.org/r/50174/diff/7/?file=1500250#file1500250line125>
> >
> >     My original question on the doc w/ process() call is: I think that we 
> > guarantee that process()/processAsync() and window(), commit() within a 
> > single task instance are mutually exclusive to each other. window() and 
> > commit() are also exclusive to themselves. But multiple processAsync() 
> > calls can be invoked in different threads simultaneously, right? Hence, in 
> > a single task instance, the variables used in processAsync() calls need to 
> > be thread-safe. Am I interpreting the current implementation right?

Actually it's different: processAsync() is always invoked from a single 
thread (for all tasks). The reason is that asynchronous calls do not require 
threads to block on them. The callback, however, can be invoked from multiple 
threads. Samza controls when to invoke processAsync()/window()/commit() for 
each task.
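
As a rough illustration of that threading model, here is a minimal Java 
sketch. It is an assumption-laden model, not the Samza run loop: the Callback 
interface, the I/O pool and the class name are hypothetical, with Callback 
only loosely playing the role of the callback passed to processAsync(). One 
thread makes every processAsync() call and returns immediately, while the 
callbacks complete on pool threads.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy model (not Samza code): one dispatching thread, callbacks on other threads. */
final class SingleThreadDispatchSketch {

    /** Hypothetical callback type for this sketch. */
    interface Callback {
        void complete();
    }

    // Thread-safe sets, because callbacks record their thread names concurrently.
    final Set<String> invokeThreads = ConcurrentHashMap.newKeySet();
    final Set<String> callbackThreads = ConcurrentHashMap.newKeySet();

    // Stands in for whatever threads a non-blocking client uses to finish its calls.
    private final ExecutorService ioPool = Executors.newFixedThreadPool(4);

    /** Returns immediately; the simulated remote call finishes on an I/O pool thread. */
    void processAsync(String message, Callback callback) {
        invokeThreads.add(Thread.currentThread().getName());
        ioPool.execute(callback::complete);
    }

    /** Dispatches every message from the calling thread, then waits for all callbacks. */
    boolean run(int messages) {
        CountDownLatch done = new CountDownLatch(messages);
        for (int i = 0; i < messages; i++) {
            processAsync("msg-" + i, () -> {
                callbackThreads.add(Thread.currentThread().getName());
                done.countDown();
            });
        }
        try {
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            ioPool.shutdown();
        }
    }

    public static void main(String[] args) {
        SingleThreadDispatchSketch sketch = new SingleThreadDispatchSketch();
        System.out.println("all callbacks fired: " + sketch.run(8));
        System.out.println("processAsync threads: " + sketch.invokeThreads);
        System.out.println("callback threads: " + sketch.callbackThreads);
    }
}
```

Since the callbacks run on several threads, any state they touch has to be 
thread-safe, which is why the sketch records thread names in concurrent sets.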


- Xinyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50174/#review149262
-----------------------------------------------------------


On Sept. 19, 2016, 5:23 p.m., Xinyu Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50174/
> -----------------------------------------------------------
> 
> (Updated Sept. 19, 2016, 5:23 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Update samza web docs with new multithreading api, core and configs.
> 
> 
> Diffs
> -----
> 
>   docs/Gemfile.lock 8a236e6835cd82cfdfe5833c5b83f1c5e63ef814 
>   docs/learn/documentation/versioned/api/overview.md 
> 6712344e84e19883b857e00549db2acb101c7e0e 
>   docs/learn/documentation/versioned/container/event-loop.md 
> 116238312df7071747cbbc14bc9c46f558755195 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> 54c52981c3055b398ee60af50eeaf2592ed0e64f 
>   docs/learn/tutorials/versioned/index.md 
> b4d687a63638aca4f876af88556de9973acfd718 
>   docs/learn/tutorials/versioned/samza-async-user-guide.md PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50174/diff/
> 
> 
> Testing
> -------
> 
> Test the web pages locally.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>
