Re: Multi-stream SQL-like processing

2020-11-05 Thread Krzysztof Zarzycki
wrote: > Yes. The dynamism might be a problem. > Does Kafka Connect support discovering new tables and synchronizing them > dynamically? > > Best, > Jark > > On Thu, 5 Nov 2020 at 04:39, Krzysztof Zarzycki > wrote: > >> Hi Jark, thanks for joining the discu

Re: Multi-stream SQL-like processing

2020-11-04 Thread Krzysztof Zarzycki
WHERE table = 'table2' > > ... > > The configuration tool can use `StatementSet` to package all the INSERT > INTO queries together and submit them in one job. > With the `StatementSet`, the job will share the common source task, so the > tables in MySQL are only read
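The `StatementSet` approach mentioned in the reply can be sketched roughly as below. This is a hedged sketch only: the table names (`cdc_source`, `sink_table1`, `sink_table2`) are hypothetical placeholders for whatever the configuration tool would generate.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.StatementSet;
import org.apache.flink.table.api.TableEnvironment;

public class StatementSetSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().inStreamingMode().build());

        // Package all generated INSERT INTO queries into one StatementSet.
        StatementSet set = tEnv.createStatementSet();
        set.addInsertSql(
            "INSERT INTO sink_table1 SELECT * FROM cdc_source WHERE `table` = 'table1'");
        set.addInsertSql(
            "INSERT INTO sink_table2 SELECT * FROM cdc_source WHERE `table` = 'table2'");

        // All statements are optimized together and submitted as one job,
        // so they share the common source task and the MySQL tables are
        // read only once.
        set.execute();
    }
}
```

The point of grouping is deduplication of the source: without the `StatementSet`, each `INSERT INTO` would become its own job with its own scan of the source tables.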

Multi-stream SQL-like processing

2020-11-02 Thread Krzysztof Zarzycki
Hi community, I would like to confront one idea with you. I was thinking that Flink SQL could be Flink's answer to Kafka Connect (more powerful, with advantages like being decoupled from Kafka). Flink SQL would be the configuration language for Flink "connectors", which sounds great! But one thing d

Re: Dynamic Flink SQL

2020-04-07 Thread Krzysztof Zarzycki
such approach is significantly better justifying the work by us, maybe also the community, on overcoming these difficulties. > > thanks, > > maciek > > > > On 27/03/2020 10:18, Krzysztof Zarzycki wrote: > > I want to do a bit different hacky PoC: > * I will write a

Complex graph-based sessionization (potential use for stateful functions)

2020-03-30 Thread Krzysztof Zarzycki
Hi! Interesting problem to solve ahead :) I need to implement a streaming sessionization algorithm (split a stream of events into groups of correlated events). It's pretty non-standard, as we DON'T have a key like user id that separates the stream into substreams which we just need to chunk based on

Re: Dynamic Flink SQL

2020-03-27 Thread Krzysztof Zarzycki
e "restart with savepoint" pattern discussed above. [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/experimental.html#reinterpreting-a-pre-partitioned-data-stream-as-keyed-stream [2] https://flink.apache.org/news/2020/03/24/demo-fraud-detection-2.html > O
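The experimental API linked under [1] can be sketched as follows. This is a hedged illustration, not the thread author's code; the source and key extractor are placeholders, and correctness requires that the input really is partitioned exactly as Flink's own `keyBy` would partition it.

```java
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReinterpretSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder for a stream that is already partitioned by key
        // upstream (e.g. coming from a keyed Kafka topic).
        DataStream<String> prePartitioned = env.fromElements("a", "b", "c");

        // Treat the pre-partitioned stream as keyed WITHOUT a shuffle.
        KeyedStream<String, String> keyed =
            DataStreamUtils.reinterpretAsKeyedStream(
                prePartitioned,
                new KeySelector<String, String>() {
                    @Override
                    public String getKey(String value) {
                        return value;
                    }
                });

        keyed.print();
        env.execute("reinterpret-sketch");
    }
}
```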

Re: Dynamic Flink SQL

2020-03-25 Thread Krzysztof Zarzycki
ce once. The resulting code would be minimal > and easy to maintain. If the performance is not satisfying, you can always > make it more complicated. > > Best, > > Arvid > > > On Mon, Mar 23, 2020 at 7:02 PM Krzysztof Zarzycki > wrote: > >> Dear Flink community!

Dynamic Flink SQL

2020-03-23 Thread Krzysztof Zarzycki
Dear Flink community! In our company we have implemented a system that realizes the dynamic business rules pattern. We spoke about it during Flink Forward 2019 https://www.youtube.com/watch?v=CyrQ5B0exqU. The system is a great success and we would like to improve it. Let me briefly mention what the

Re: Join a datastream with tables stored in Hive

2019-12-16 Thread Krzysztof Zarzycki
are also aiming to resolve this shortcoming maybe in 1 or 2 releases. > > Best, > Kurt > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/query_configuration.html#idle-state-retention-time > > > On Sat, Dec 14, 2019 at 3:41 AM Krzyszt
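The idle-state retention setting referenced under [1] can be sketched as below. A hedged example only: the retention times are arbitrary, and the exact API has shifted across Flink versions (this matches the `TableConfig` method available around the 1.9 era discussed here).

```java
import org.apache.flink.api.common.time.Time;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class RetentionSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().build());

        // Bound the state of unbounded streaming joins: keys idle for
        // at least 12h become eligible for cleanup, and are removed at
        // the latest after 24h of inactivity.
        tEnv.getConfig().setIdleStateRetentionTime(
            Time.hours(12), Time.hours(24));
    }
}
```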

Re: Join a datastream with tables stored in Hive

2019-12-13 Thread Krzysztof Zarzycki
ich >> support look up, >> like HBase. >> >> Both solutions are not ideal now, and we also aims to improve this maybe >> in the following >> release. >> >> Best, >> Kurt >> >> >> On Fri, Dec 13, 2019 at 1:44 AM Krzysztof Zarz
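The lookup-style join mentioned in the quoted reply is usually written as a processing-time temporal join in Flink SQL. A hedged sketch, with hypothetical table and column names:

```sql
-- Enrich a fast stream with a dimension table that supports lookup
-- (e.g. backed by HBase or JDBC). Names are placeholders.
SELECT e.event_id, d.category
FROM events AS e
JOIN dim_table FOR SYSTEM_TIME AS OF e.proc_time AS d
  ON e.dim_id = d.id;
```

Each probe goes against the current contents of the dimension table rather than a fully materialized streaming join state.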

Join a datastream with tables stored in Hive

2019-12-12 Thread Krzysztof Zarzycki
Hello dear Flinkers, If this kind of question was already asked on the groups, I'm sorry for the duplicate. Feel free to just point me to the thread. I have to solve a probably pretty common case of joining a datastream to a dataset. Let's say I have the following setup: * I have a high-pace stream of events

Re: Does Kafka connector leverage Kafka message keys?

2016-05-12 Thread Krzysztof Zarzycki
If I can throw in my 2 cents, I agree with what Elias says. Without that feature (not re-partitioning already partitioned Kafka data), Flink is in a bad position for common, simpler processing that doesn't involve shuffling at all, for example simple readKafka-enrich-writeKafka. The systems like the new

multi-application correlated savepoints

2016-05-10 Thread Krzysztof Zarzycki
Hi! I'm thinking about using a great Flink feature: savepoints. I would like to be able to stop my streaming application, roll back its state and restart it (for example to update code or to fix a bug). Let's say I would like to travel back in time and reprocess some data. But what if I had
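The stop/rollback/restart cycle described above maps onto the savepoint CLI roughly as follows. A hedged sketch: job id, savepoint paths and the jar name are placeholders, and flag details vary by Flink version.

```shell
# Trigger a savepoint for the running job.
flink savepoint <jobId> hdfs:///savepoints/

# Stop the job (in newer versions, cancel-with-savepoint combines both steps).
flink cancel <jobId>

# Restart the (possibly updated) application from the savepoint.
flink run -s hdfs:///savepoints/savepoint-<id> my-app.jar
```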

Re: RocksDB state checkpointing is expensive?

2016-04-07 Thread Krzysztof Zarzycki
copy the whole RocksDB > database to HDFS (or whatever filesystem you chose as a backup location). > As far as I know, Stephan will start working on adding support for > incremental snapshots this week or next week. > > Cheers, > Aljoscha > > On Thu, 7 Apr 2016 at 09:55 Krz

RocksDB state checkpointing is expensive?

2016-04-07 Thread Krzysztof Zarzycki
Hi, I saw the documentation and source code of the state management with RocksDB and before I use it, I'm concerned about one thing: Am I right that currently, when state is being checkpointed, the whole RocksDB state is snapshotted? There is no incremental, diff-based snapshotting, is there? If so, this seems
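For context: the incremental snapshotting asked about here did land later (Flink 1.3+). A hedged configuration sketch, with a placeholder checkpoint directory:

```
# flink-conf.yaml — enable incremental RocksDB checkpoints so only the
# delta since the last checkpoint is uploaded, not the whole database.
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: hdfs:///flink/checkpoints
```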

Re: Working with protobuf wrappers

2015-12-02 Thread Krzysztof Zarzycki
> wrote: >>> >>>> Hi Kryzsztof, >>>> >>>> it's true that we once added the Protobuf serializer automatically. >>>> However, due to versioning conflicts (see >>>> https://issues.apache.org/jira/browse/FLINK-1635), we removed it >>&

Working with protobuf wrappers

2015-11-30 Thread Krzysztof Zarzycki
Hi! I'm trying to use generated Protobuf wrappers compiled with protoc and pass them as objects between functions of Flink. I'm using Flink 0.10.0. Unfortunately, I get an exception at runtime: [...] Caused by: com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serial
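Since FLINK-1635 removed the automatic Protobuf serializer (see the reply above), the usual workaround is to register a Protobuf-aware Kryo serializer manually. A hedged sketch: `MyProto` is a placeholder for a protoc-generated class, and the `ProtobufSerializer` here assumes the `com.twitter:chill-protobuf` dependency is on the classpath.

```java
import com.twitter.chill.protobuf.ProtobufSerializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ProtobufRegistrationSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // Tell Kryo to serialize the generated Protobuf class with a
        // Protobuf-aware serializer instead of failing on the immutable
        // wrapper's unsupported default construction path.
        env.getConfig().registerTypeWithKryoSerializer(
            MyProto.class, ProtobufSerializer.class);
    }
}
```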

Re: Apache Flink and serious streaming stateful processing

2015-10-14 Thread Krzysztof Zarzycki
Hi guys! I'm sorry I abandoned this thread, but I had to give up Flink for some time. Now I'm back and would like to resurrect it. Flink has evolved rapidly in the meantime, so maybe new features will let me do what I want. By the way, I heard really only good stuff about you fro

Apache Flink and serious streaming stateful processing

2015-06-30 Thread Krzysztof Zarzycki
. These two are supported by Samza, whose code and API are not excellent, but which at least allows serious stream processing that does not require repeating the processing pipeline in batch (Hadoop). I'm looking forward to seeing features like these in Flink. Or they are already there and I'