Re: [DISCUSS] Streams DSL/StateStore Refactoring

Damian Guy Thu, 22 Jun 2017 04:06:44 -0700

Thanks everyone. My latest attempt is below. It builds on the fluent
approach, but i think it is slightly nicer.
I agree with some of what Eno said about mixing configy stuff in the DSL,
but i think that enabling caching and enabling logging are things that
aren't actually config. I'd probably not add withLogConfig(...) (even
though it is below) as this is actually config and we already have a way of
doing that, via the StateStoreSupplier. Arguably we could use the
StateStoreSupplier for disabling caching etc, but as it stands that is a
bit of a tedious process for someone that just wants to use the default
storage engine, but not have caching enabled.


There is also an orthogonal concern that Guozhang alluded to.... If you
want to plug in a custom storage engine and you want it to be logged etc,
you would currently need to implement that yourself. Ideally we can provide
a way where we will wrap the custom store with logging, metrics, etc. I
need to think about where this fits, it is probably more appropriate on the
Stores API.

final KeyValueMapper<String, String, Long> keyMapper = null;
// count with mapped key
final KTable<Long, Long> count = stream.grouped()
        .withKeyMapper(keyMapper)
        .withKeySerde(Serdes.Long())
        .withValueSerde(Serdes.String())
        .withQueryableName("my-store")
        .count();

// windowed count
final KTable<Windowed<String>, Long> windowedCount = stream.grouped()
        .withQueryableName("my-window-store")
        .windowed(TimeWindows.of(10L).until(10))
        .count();

// windowed reduce
final Reducer<String> windowedReducer = null;
final KTable<Windowed<String>, String> windowedReduce = stream.grouped()
        .withQueryableName("my-window-store")
        .windowed(TimeWindows.of(10L).until(10))
        .reduce(windowedReducer);

final Aggregator<String, String, Long> aggregator = null;
final Initializer<Long> init = null;

// aggregate
final KTable<String, Long> aggregate = stream.grouped()
        .withQueryableName("my-aggregate-store")
        .aggregate(aggregator, init, Serdes.Long());

final StateStoreSupplier<KeyValueStore<String, Long>> stateStoreSupplier = null;

// aggregate with custom store
final KTable<String, Long> aggWithCustomStore = stream.grouped()
        .withStateStoreSupplier(stateStoreSupplier)
        .aggregate(aggregator, init);

// disable caching
stream.grouped()
        .withQueryableName("name")
        .withCachingEnabled(false)
        .count();

// disable logging
stream.grouped()
        .withQueryableName("q")
        .withLoggingEnabled(false)
        .count();

// override log config
final Reducer<String> reducer = null;
stream.grouped()
        .withLogConfig(Collections.singletonMap("segment.size", "10"))
        .reduce(reducer);


If anyone wants to play around with this you can find the code here:
https://github.com/dguy/kafka/tree/dsl-experiment

Note: It won't actually work as most of the methods just return null.

Thanks,
Damian


On Thu, 22 Jun 2017 at 11:18 Ismael Juma <[email protected]> wrote:

> Thanks Damian. I think both options have pros and cons. And both are better
> than overload abuse.
>
> The fluent API approach reads better, no mention of builder or build
> anywhere. The main downside is that the method signatures are a little less
> clear. By reading the method signature, one doesn't necessarily knows what
> it returns. Also, one needs to figure out the special method (`table()` in
> this case) that gives you what you actually care about (`KTable` in this
> case). Not major issues, but worth mentioning while doing the comparison.
>
> The builder approach avoids the issues mentioned above, but it doesn't read
> as well.
>
> Ismael
>
> On Wed, Jun 21, 2017 at 3:37 PM, Damian Guy <[email protected]> wrote:
>
> > Hi,
> >
> > I'd like to get a discussion going around some of the API choices we've
> > made in the DLS. In particular those that relate to stateful operations
> > (though this could expand).
> > As it stands we lean heavily on overloaded methods in the API, i.e, there
> > are 9 overloads for KGroupedStream.count(..)! It is becoming noisy and i
> > feel it is only going to get worse as we add more optional params. In
> > particular we've had some requests to be able to turn caching off, or
> > change log configs,  on a per operator basis (note this can be done now
> if
> > you pass in a StateStoreSupplier, but this can be a bit cumbersome).
> >
> > So this is a bit of an open question. How can we change the DSL overloads
> > so that it flows, is simple to use and understand, and is easily extended
> > in the future?
> >
> > One option would be to use a fluent API approach for providing the
> optional
> > params, so something like this:
> >
> > groupedStream.count()
> >    .withStoreName("name")
> >    .withCachingEnabled(false)
> >    .withLoggingEnabled(config)
> >    .table()
> >
> >
> >
> > Another option would be to provide a Builder to the count method, so it
> > would look something like this:
> > groupedStream.count(new
> > CountBuilder("storeName").withCachingEnabled(false).build())
> >
> > Another option is to say: Hey we don't need this, what are you on about!
> >
> > The above has focussed on state store related overloads, but the same
> ideas
> > could  be applied to joins etc, where we presently have many join methods
> > and many overloads.
> >
> > Anyway, i look forward to hearing your opinions.
> >
> > Thanks,
> > Damian
> >
>

Re: [DISCUSS] Streams DSL/StateStore Refactoring

Reply via email to