Hi Till,

Thanks for your feedback. I hadn't thought about this before. It would be
better to provide such tools than to let users deal with testing at the
operator level directly. The test harness was introduced for contributors,
who tend to know more about the internal design, so that they could test
their patches easily. For users it is sometimes not intuitive to use.
That's why I want to provide some examples in the docs, to ease that pain
in the beginning.

If Aljoscha has already put some effort into this and such a tool will be
available in the near future, I look forward to seeing it happen. And if
there is anything I can help with, I'm glad to give a hand.

To follow up on the discussion about users' requirements, I can provide
some from my own experience. What I want to achieve is unit testing of my
operators together with Flink features such as state, event time and
checkpointing.

From the testing doc [1] I provided, I have summarized some scenarios I
would like to test in my Flink application:

   1. Easily control the stream records to observe the changes in state and
   output, whether it is a one-input or a two-input stream operator.
   2. Manually control watermark progress to test the "out of order"
   problem, the "late data" problem, or the behavior of the "event time
   service".
   3. Preserve state from every version to test state evolution and
   compatibility, or just to test a change of Flink version.
   4. Test a customized stateful source function. In the current version, a
   source function still needs to be implemented as a long-running method.
   A way to easily block it and verify its state would be helpful.
   5. A simple way to expose state, so that the exact value stored in it can
   be verified instead of being tested indirectly through the outputs. I
   achieved this by getting the state backend from the test harness and
   referring to some examples in the Flink project to access the keyed
   state I needed, but this goes too deep into the internal implementation
   to use as an example in my testing doc [1].
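
To make scenario 2 more concrete, here is a minimal plain-Java sketch of
the kind of test I have in mind. It is only an illustration under my own
assumptions: `WatermarkSketch` and its methods are hypothetical names made
up for this mail, not Flink's test harness API. The toy operator buffers
records until a watermark passes their timestamp and treats records at or
behind the current watermark as late.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy stand-in for an event-time operator under test. NOT Flink's API. */
public class WatermarkSketch {
    // Records waiting for a watermark, ordered by event timestamp.
    private final TreeMap<Long, List<String>> buffer = new TreeMap<>();
    private final List<String> output = new ArrayList<>();
    private final List<String> late = new ArrayList<>();
    private long currentWatermark = Long.MIN_VALUE;

    /** Buffers a record, or marks it late if the watermark already passed it. */
    public void processElement(String value, long timestamp) {
        if (timestamp <= currentWatermark) {
            late.add(value);
            return;
        }
        buffer.computeIfAbsent(timestamp, k -> new ArrayList<>()).add(value);
    }

    /** Advances event time and emits every buffered record it covers. */
    public void processWatermark(long watermark) {
        currentWatermark = Math.max(currentWatermark, watermark);
        while (!buffer.isEmpty() && buffer.firstKey() <= currentWatermark) {
            output.addAll(buffer.pollFirstEntry().getValue());
        }
    }

    public List<String> getOutput() { return output; }

    public List<String> getLate() { return late; }

    public static void main(String[] args) {
        WatermarkSketch op = new WatermarkSketch();
        op.processElement("b", 20L);   // arrives out of order
        op.processElement("a", 10L);
        op.processWatermark(15L);      // the test decides when time advances
        op.processElement("c", 12L);   // behind the watermark, so late
        op.processWatermark(25L);
        System.out.println("output=" + op.getOutput() + " late=" + op.getLate());
    }
}
```

The point is that the test, not the runtime, decides when event time
advances, so out-of-order and late records can be checked step by step.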

Best,
Tony Wei

[1] https://github.com/apache/flink/compare/master...tony810430:flink-testing-doc

2018-09-28 21:48 GMT+08:00 Till Rohrmann <trohrm...@apache.org>:

> Hi Tony,
>
> I think this is a long sought-after feature to provide better testing tools
> for our users. Thus, I'm strongly in favour of adding something like this.
> If I remember correctly, Aljoscha already spent some brain cycles on this
> and he also gave a training about the current state at FF SF 2018. I've
> pulled him in to give more details.
>
> The first two things we can do are to collect requirements from our users
> and to see what the current state is. Based on that we could plan which
> things to add in which order.
>
> Cheers,
> Till
>
> On Fri, Sep 28, 2018 at 3:13 PM Tony Wei <tony19920...@gmail.com> wrote:
>
> > Hi all,
> >
> > @Ken, thanks for your positive feedback. I have had a similar experience
> > with the test harness, and that's why I want to add more content to the
> > testing doc to prevent this kind of problem.
> >
> > Does anyone have any feedback or advice? I would like to collect more
> > opinions from developers and users. Please let me know what you think
> > about this topic. All suggestions are welcome. Thank you.
> >
> > Best,
> > Tony Wei
> >
> > 2018-09-26 2:20 GMT+08:00 Ken Krugler <kkrugler_li...@transpac.com>:
> >
> > > Hi Tony,
> > >
> > > I think this would be great - we’ve been building out tests using
> > > AbstractStreamOperator, and the lack of documentation has made it
> > > challenging.
> > >
> > > For example, there was this exchange I had with Piotr about a month
> > > ago:
> > >
> > > > You made a small mistake when restoring from state using the test
> > > > harness, one that I myself have also made in the past. The problem
> > > > is with the ordering of those calls:
> > > >
> > > >         result.open();
> > > >         if (savedState != null) {
> > > >             result.initializeState(savedState);
> > > >         }
> > > >
> > > > Open is supposed to be called after initializeState, and if you look
> > > > into the code of AbstractStreamOperatorTestHarness#open, if it is
> > > > called before initializeState, it will initialize the harness
> > > > without any state.
> > > >
> > > > Unfortunately, this is implicit behaviour that doesn’t throw any
> > > > error (the test harness is not part of Flink’s public API). I will
> > > > try to fix this: https://issues.apache.org/jira/browse/FLINK-10159
> > > — Ken
> > >
> > > > On Sep 25, 2018, at 3:30 AM, Tony Wei <tony19920...@gmail.com>
> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > It seems that more and more users on the user mailing list are
> > > > asking how to do unit testing with Flink features like state or
> > > > timers. The community usually suggests using
> > > > `AbstractStreamOperator` and points to an example from the Flink
> > > > GitHub repo. I have sorted out some examples and written them down
> > > > in the testing documentation [1], and I would like to contribute
> > > > them back to Flink.
> > > >
> > > > The reason I am asking on the dev mailing list first is that
> > > > `AbstractStreamOperator` is an internal API and could change at any
> > > > time. I'm not sure whether it is worth providing these examples in
> > > > the testing documentation, so I want to collect some feedback before
> > > > I open a JIRA ticket.
> > > >
> > > > If this is feasible and valuable, I will open the corresponding JIRA
> > > > ticket, and we can discuss in more detail which examples are good to
> > > > have in the document and how to structure the content.
> > > >
> > > > I would really appreciate any feedback from you. Thanks in advance.
> > > >
> > > > Best Regards,
> > > > Tony Wei
> > > >
> > > > [1]
> > > > https://github.com/apache/flink/compare/master...tony810430:flink-testing-doc
> > >
> > > --------------------------
> > > Ken Krugler
> > > +1 530-210-6378
> > > http://www.scaleunlimited.com
> > > Custom big data solutions & training
> > > Flink, Solr, Hadoop, Cascading & Cassandra
> > >
> > >
> >
>
