YARN parallelism vs Config

2019-07-19 Thread Dominik Wosiński
Hey, I was wondering about the relation between the parallelism set in the YARN properties file and the configured parallelism. Currently, as far as I know, there is only one execution of the `writeYarnPropertiesFile` method, and it sets the parallelism in the YARN properties to the number of workers * the number of slots per worker.

Kafka Checkpointing weird behavior

2019-09-02 Thread Dominik Wosiński
Hey, I just want to understand something, because I am observing weird behavior of the Kafka Consumer > 0.8. So the idea is: if we enable checkpointing and enable committing offsets on checkpoints, which AFAIK is enabled by default, then for versions of Kafka > 0.8 we should see the changes in the

Storing offsets in Kafka

2019-09-04 Thread Dominik Wosiński
Hey, I was wondering whether something has changed for the KafkaConsumer, since I am using Kafka 2.0.0 with Flink and wanted to use group offsets, but there seems to be no change in the topic where Kafka stores its offsets; after a restart Flink uses `auto.offset.reset`, so it seems that there is no

Re: Storing offsets in Kafka

2019-09-05 Thread Dominik Wosiński
your setting falls in one of the two cases? > > Thanks, > > Jiangjie (Becket) Qin > > > > > On Wed, Sep 4, 2019 at 9:03 PM Dominik Wosiński wrote: > > > Hey, > > I was wondering whether something has changed for KafkaConsumer, since I > am > >

Checkpointing clarification

2019-09-06 Thread Dominik Wosiński
Hello, I have a slight doubt about checkpointing in Flink and wanted to clarify my understanding. Flink uses barriers internally to keep track of the records that were processed. The documentation[1] describes it as if the checkpoint only happens when the barriers are transferred to the sink. So l

Re: Checkpointing clarification

2019-09-08 Thread Dominik Wosiński
Okay, thanks for clarifying. I have a follow-up question here. If we consider Kafka offset commits, this basically means that the offsets committed during the checkpoint are not necessarily the offsets that were really processed by the pipeline and written to the sink? I mean, if there is a window i
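
The situation the question describes can be sketched with a toy model (not Flink code; the class and numbers are made up): the offset committed at checkpoint time reflects the source's read position, even while a window still buffers the records and nothing has reached the sink.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: the offset committed back to Kafka on a checkpoint is the
// source's read position, not the count of records the sink has written.
public class CheckpointModel {
    long sourceOffset = 0;                         // next offset the source will read
    final List<Long> windowBuffer = new ArrayList<>();
    long recordsWrittenToSink = 0;

    // Source reads n records; they land in a window buffer, not the sink.
    void consume(int n) {
        for (int i = 0; i < n; i++) windowBuffer.add(sourceOffset++);
    }

    // On checkpoint, the committed offset is simply the read position.
    long offsetCommittedOnCheckpoint() { return sourceOffset; }

    // Only when the window fires do the records reach the sink.
    void fireWindow() {
        recordsWrittenToSink += windowBuffer.size();
        windowBuffer.clear();
    }
}
```

So a committed offset of 10 is perfectly consistent with zero records written downstream.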

Flink CEP greedy match of single pattern

2020-02-21 Thread Dominik Wosiński
Hey, I have a question regarding CEP. Assume I have a stream of readings from various sensors. The application is running in EventTime, so according to the CEP docs the events are buffered and sorted by timestamp ascending. So, I want to record the situations when the reading from a sensor goes abov

AfterMatchSkipStrategy for timed out patterns

2020-03-11 Thread Dominik Wosiński
Hey all, I was wondering whether for CEP the *AfterMatchSkipStrategy* is applied during matching, or if the results are simply removed after the match. The question results from experiments I was doing with CEP. Say I have readings from some sensor and I want to detect events over some

Watermark generation in Temporal Table Join

2020-03-17 Thread Dominik Wosiński
Hey Guys, I have observed a weird behavior when using the Temporal Table Join and the way it pushes the Watermark forward. Generally, I think the question is: *when is the Watermark pushed forward by the Temporal Table Join?* The issue I have noticed is that the Watermark seems to be pushed forward even

Re: Watermark generation in Temporal Table Join

2020-03-19 Thread Dominik Wosiński
es then I can help to > analyze and debug? > > Best, > Kurt > > > On Tue, Mar 17, 2020 at 9:53 PM Dominik Wosiński wrote: > > > Hey Guys, > > I have observed a weird behavior on using the Temporal Table Join and the > > way it pushes the Watermark forward.

Re: Watermark generation in Temporal Table Join

2020-03-22 Thread Dominik Wosiński
Hey, generally, that's what I thought more or less. I think I understand the behavior itself, thanks for explaining it to me. But what actually concerns me is the fact that this *assignTimestampsAndWatermarks* is required if You will select this Long field, which basically means that the type of s

Re: Flink CEP greedy match of single pattern

2020-03-25 Thread Dominik Wosiński
Hey, thanks for the answer. But if I add the *AfterMatchSkipStrategy*, it simply seems to emit event by event, so in the case described above it emits: [400], [500]. Shouldn't the *greedy* quantifier guarantee that this will be matched as many times as possible, thus creating [400, 500]? Thanks

Re: Flink CEP greedy match of single pattern

2020-03-25 Thread Dominik Wosiński
P.S. So now my pattern looks like this:

Pattern.begin[AccelVector](EventPatternName, AfterMatchSkipStrategy.skipPastLastEvent())
  .where(_.data() > Threshold)
  .oneOrMore
  .greedy
  .consecutive()
  .within(Time.minutes(1))

Wed, 25 Mar 2020 at 10:03 Dominik Wosiński wrote:
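
The grouping the thread expects from a greedy, consecutive `oneOrMore` can be sketched outside of CEP. This is a toy simulation, not Flink's NFA implementation; the class name, threshold, and values are made up to mirror the [400, 500] example:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of greedy + consecutive oneOrMore matching with
// skipPastLastEvent(): maximal runs of consecutive readings above the
// threshold form a single match each, rather than one match per event.
public class GreedyMatch {
    public static List<List<Integer>> matches(int[] readings, int threshold) {
        List<List<Integer>> result = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (int r : readings) {
            if (r > threshold) {
                current.add(r);              // extend the current greedy run
            } else if (!current.isEmpty()) {
                result.add(current);         // run broken: emit one match
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) result.add(current);
        return result;
    }
}
```

Under this model the sequence 100, 400, 500, 200 with threshold 300 yields a single match [400, 500], which is the behavior the question expects from the greedy quantifier.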

Testing DataStreams

2019-10-02 Thread Dominik Wosiński
Hello, I have a question, since I am observing quite weird behavior. In the documentation[1] the example of FlinkMiniCluster usage shows that we can expect the results to appear in the same order as they were injected into the stream by use of *fromElements()*. I mean that the Java version of the code i

Re: Testing DataStreams

2019-10-07 Thread Dominik Wosiński
> .setParallelism(2); > > This is not: > > env.fromElements(1L, 21L, 22L) > .map(x -> x * 2) > .setParallelism(2) > > .setParallelism(1); > > In other words, if you never reduce the parallelism your functions should > be fine. > If you h

Ignore operator failure

2019-10-14 Thread Dominik Wosiński
Hey, I have a question that I have not been able to find an answer for in the docs nor in any other source. Suppose we have a business system and we are using an Elasticsearch sink, not for the business case itself, but rather for keeping info on the data that is flowing through the system. Th

Avro Schema Resolution Compatibility

2019-11-04 Thread Dominik Wosiński
Hey, I have a question regarding Avro types and schema evolution. According to the docs, the schema resolution is compatible with the Avro docs [1]. But I have done some testing. For example, I created a record, wrote it to Kafka, then changed the order of the fields in the schema and tried to
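
The relevant point from the Avro spec is that schema resolution matches fields by name, not by position, so reordering fields is a compatible change as long as the reader has access to the writer's schema. A toy model of that matching rule (not the Avro implementation; field names and values are made up):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of Avro schema resolution: values written in the writer's field
// order are re-associated with the reader's fields BY NAME, so a reordered
// schema still reads the same logical record.
public class ResolveByName {
    static List<Object> resolve(List<String> writerFields, List<Object> writtenValues,
                                List<String> readerFields) {
        Map<String, Object> byName = new HashMap<>();
        for (int i = 0; i < writerFields.size(); i++) {
            byName.put(writerFields.get(i), writtenValues.get(i));
        }
        List<Object> result = new ArrayList<>();
        for (String f : readerFields) result.add(byName.get(f));
        return result;
    }
}
```

Note this only works when the writer's schema is available at read time; if only the reader's schema is known, positional decoding of the reordered binary data fails, which may be what the experiment in this thread ran into.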

Re: Avro Schema Resolution Compatibility

2019-11-04 Thread Dominik Wosiński
new > snapshot will be written entirely with the new updated avro schema. > > Hope this clarifies how the integration with Avro in Flink works. > > Best, > > Dawid > > > [1] > > https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/state/schema

HadoopInputFormat Custom Partitioning

2019-11-07 Thread Dominik Wosiński
Hey, I wanted to ask whether the *HadoopInputFormat* currently supports some custom partitioning scheme? Say I have 200 files in HDFS, each having the partitioning key in its name; can we ATM use HadoopInputFormat to distribute reading across multiple TaskManagers using that key? Best Regards, Dom.
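
What the question asks for could be sketched as a custom split assignment derived from the file name. This is a hypothetical illustration, not an existing HadoopInputFormat feature; the `key=<k>` naming convention and class are made up:

```java
// Hypothetical sketch: route files to subtasks by a "key=<k>" fragment in the
// file name, so all files sharing a key are read by the same subtask.
public class FilePartitioner {
    // Extract the key between "key=" and the next '_' in the file name.
    static String keyOf(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        int start = name.indexOf("key=") + "key=".length();
        int end = name.indexOf('_', start);
        return end < 0 ? name.substring(start) : name.substring(start, end);
    }

    // Same key => same subtask index, mimicking a custom split assigner.
    static int subtaskFor(String path, int parallelism) {
        return Math.floorMod(keyOf(path).hashCode(), parallelism);
    }
}
```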

Flink Read thousands of files with batch

2019-11-10 Thread Dominik Wosiński
Hey, I have a very specific use case. I have a history of records stored as Parquet in S3. I would like to read and process them with Flink. The issue is that the number of files is quite large (>100k). If I provide the full list of files to the HadoopInputFormat that I am using, it will fail with AskT

Re: Flink Read thousands of files with batch

2019-11-12 Thread Dominik Wosiński
>> `akka.ask.timeout` is the default RPC timeout, while some RPCs may >> override >> this timeout for their own purpose, e.g. the RPCs from the web usually use >> `web.timeout`. >> Providing the detailed call stack of the AskTimeoutException may help to >> identif

Re: Flink Read thousands of files with batch

2019-11-12 Thread Dominik Wosiński
an take a look and try.[2] > > [1] https://issues.apache.org/jira/browse/FLINK-14722 > [2] > > https://github.com/JingsongLi/flink/commit/90c021ab8e7a175c6644c345e63383d828c415d7 > > Best, > Jingsong Lee > > On Tue, Nov 12, 2019 at 6:49 PM Dominik Wosiński wrote: > >

Re: Flink Read thousands of files with batch

2019-11-12 Thread Dominik Wosiński
I have managed to locate the issue with the timeout; changing `web.timeout` was the solution. However, now I am getting the following error: 2019-11-12 16:58:00,741 INFO org.apache.parquet.hadoop.ParquetInputFormat - Total input paths to process : 671 2019-11-12 16:58:04,878 INFO org.

BroadcastState enforce processing

2020-01-20 Thread Dominik Wosiński
Hey, I have a question since I have observed the following situation: we have two streams A and B that are read from Kafka. Let's say A carries a set of rules for processing and B is the stream that we will process. Since there is no guarantee that elements from A will be pr
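
The usual workaround for this ordering problem is to buffer data elements in state until the first rule arrives on the broadcast side, then replay the buffer. A minimal sketch of that idea outside of Flink (a toy class, not a real `KeyedBroadcastProcessFunction`; names are mine):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of the buffer-until-rules-arrive pattern: elements that arrive
// before any rule are queued; the first rule drains the queue.
public class BufferingProcessor {
    private final List<String> rules = new ArrayList<>();
    private final Deque<String> buffer = new ArrayDeque<>();
    private final List<String> output = new ArrayList<>();

    // Broadcast side: a new rule arrives; drain anything we buffered.
    public void onRule(String rule) {
        rules.add(rule);
        while (!buffer.isEmpty()) process(buffer.poll());
    }

    // Data side: buffer if no rules are known yet, otherwise process directly.
    public void onElement(String element) {
        if (rules.isEmpty()) buffer.add(element);
        else process(element);
    }

    private void process(String element) {
        output.add(element + " via " + rules.get(0));
    }

    public List<String> getOutput() { return output; }
}
```

In a real job the buffer would live in keyed or operator state so it survives restarts.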

Join of three streams with different frequency

2020-01-20 Thread Dominik Wosiński
Hello, I wanted to ask whether the idea and general concept that I have are correct, or if there is anything better in Flink to use. Say that I have 3 different streams: - currency_codes, which has the name of the currency. It has the following fields (*currency_iso_code, tst, curren

Re: Join of three streams with different frequency

2020-01-21 Thread Dominik Wosiński
-stable/dev/table/streaming/temporal_tables.html > < > https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/temporal_tables.html > > > > > On 20 Jan 2020, at 17:02, Dominik Wosiński wrote: > > > > Hello, > > I wanted to ask whether the idea and t

Re: Join of three streams with different frequency

2020-01-21 Thread Dominik Wosiński
lease-1.9/dev/table/streaming/joins.html#event-time-temporal-joins > > > > Piotrek > > > On 21 Jan 2020, at 11:25, Dominik Wosiński wrote: > > > > Hey, > > I have considered the Temporal Table Joins, but as far as I know from the > > docs, it is only curre

Re: Join of three streams with different frequency

2020-01-29 Thread Dominik Wosiński
Hey, but isn't the temporal table join generating output only on watermarks? I have found such info here: https://stackoverflow.com/questions/54441594/how-to-use-flink-temporal-tables. But since one of the tables will have data that changes very rarely, this would mean that using a temporal table

Re: [DISCUSS] Release Flink 1.5.3

2018-08-16 Thread Dominik Wosiński
+1, I agree that frequent releases are good for users. Dominik. Sent from Mail for Windows 10. From: Stefan Richter Sent: Thursday, 16 August 2018 10:57 To: dev Subject: Re: [DISCUSS] Release Flink 1.5.3 +1 sounds good. Stefan > On 16.08.2018 at 09:55, Piotr Nowojski wrote: > >

Re: How to handle tests that need docker?

2018-08-16 Thread Dominik Wosiński
IMHO, such tests should be off by default, as they impose extra requirements on people trying to build Flink from source, and one would have to install Docker only for the tests to pass. But of course they should be enabled for CI builds. Best Regards, Dominik. Sent from Mail for

SQL Client Limitations

2018-08-20 Thread Dominik Wosiński
Hey, do we have a list of the current limitations of the SQL Client available somewhere, or is the only way to go through JIRA issues? For example: I tried to combine a Group By Tumble Window and an Inner Join in one query, and it seems that this is not currently possible, and I was wondering whether it's an issue

Flink SQL Client Kafka keyed serialization

2018-10-23 Thread Dominik Wosiński
Hey, I don't think we currently support this, but it would be a good idea to have a Kafka sink with support for keys. I worked on something similar to the SQL Client before it was created, but the keys in Kafka are crucial to us, and this is currently the limitation that keeps us from switchin

Re: Flink SQL Client Kafka keyed serialization

2018-10-24 Thread Dominik Wosiński
36ca230d@%3Cdev.flink.apache.org%3E > [2] > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y > > On Tue, 23 Oct 2018 at 14:46, Dominik Wosiński < > wos...@gmail.com> wrote: > > > Hey, > > I don't think we are currently supp

Re: [DISCUSS] Flink SQL DDL Design

2018-11-04 Thread Dominik Wosiński
+1, Thanks for the proposal. I guess this is a long-awaited change. This can vastly increase the functionalities of the SQL Client as it will be possible to use complex extensions like for example those provided by Apache Bahir[1]. Best Regards, Dom. [1] https://github.com/apache/bahir-flink s

Re: Flip23

2018-11-05 Thread Dominik Wosiński
Hey Boris, we have developed something very similar for our needs, but we faced some issues when running it in HA mode, mainly because TensorFlow uses native functions, which caused problems in combination with automatic job restarts. As far as I remember, the issue w

Re: [ANNOUNCE] Apache Flink 1.7.0 released

2018-12-01 Thread Dominik Wosiński
Thanks Till for being the release manager! Thanks everyone, and great job. Best Regards, Dom. Fri, 30 Nov 2018 at 13:19, vino yang wrote: > Thanks Till for your great work, also thanks to the whole community! > > Thanks, vino. > > Timo Walther wrote on Fri, 30 Nov 2018 at 7:42 PM: > > Thanks for being

Re: [DISCUSS] Bot for stale PRs on GitHub

2019-01-13 Thread Dominik Wosiński
Hey, I agree with Timo here that we should introduce labels that will improve communication for PRs. IMHO this will show what percentage of PRs is really stale and not just abandoned due to misunderstandings or other communication issues. Best Regards, Dom.

Re: Data not getting passed between operators

2019-01-17 Thread Dominik Wosiński
Hey, Could You please format Your snippet? It is very hard to understand what is going on in there. Best Regards, Dom. Wed, 16 Jan 2019 at 13:05, Ramya Ramamurthy wrote: > Hi > > I have a Flink 1.7 with Kafka 0.11 and ES 6.5 setup. > > I can see the Flink Kafka Consumer consuming messages, bu

Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-22 Thread Dominik Wosiński
Hey! I also think that creating a separate branch for Blink in the Flink repo is a better idea than creating a fork, as IMHO it will allow merging changes more easily. Best Regards, Dom. Tue, 22 Jan 2019 at 10:09, Ufuk Celebi wrote: > Hey Stephan and others, > > thanks for the summary. I'm ve

Why isn't state supported in AsyncFunction?

2019-04-30 Thread Dominik Wosiński
Hey, currently, when trying to get state inside the *open* method of a Flink RichAsyncFunction, we get a *State is not supported in rich async functions* exception. Why doesn't AsyncFunction support state? Thanks in advance, Best Regards, Dom.

WebUI Custom Metrics

2019-06-26 Thread Dominik Wosiński
Hey, I am building jobs that use Typesafe Config under the hood. The configs tend to grow big. I was wondering whether there is a possibility to use the WebUI to show the config that the job was run with; currently the only idea is to log the config and check it in the logs, but with dozens of job

[DISCUSS] Introduce MongoDB connector for Flink

2020-09-30 Thread Dominik Wosiński
Hey all, I've seen several tasks with propositions for this, both in Flink and in Bahir[1]. I haven't seen any discussion, nor has any work actually been done to implement this (that I know of). So I would like to start a discussion about whether this is something You think would be beneficial to have a

Temporal Table Ordering

2020-11-23 Thread Dominik Wosiński
Hey, I have a question about the ordering of messages in the Temporal Table. I can observe that for one of my jobs the order of the input is correct but the order of the output is not. Say I have two streams that both have an *id* field which will be used for the join and also for Kafka partitionin

State Processor API SQL State

2020-12-01 Thread Dominik Wosiński
Hey, is it currently possible to obtain the state that was created by a SQL query via the State Processor API? I am able to load the checkpoint via the State Processor API, but I wasn't able to think of a way to access the internal state of my JOIN query. Best Regards, Dom.

Re: State Processor API SQL State

2020-12-02 Thread Dominik Wosiński
join/stream/StreamingJoinOperator.java#L83 > > Best > Yun Tang > > > > > From: Dominik Wosiński > Sent: Tuesday, December 1, 2020 21:05 > To: dev > Subject: State Processor API SQL State > > Hey, > Is it currently possible to obtain the state that was create

Flink Table from KeyedStream

2021-01-21 Thread Dominik Wosiński
Hey, I was wondering whether it's currently possible to use a KeyedStream to create a properly partitioned Table in Flink 1.11? I have a use case where I want to first join two streams using Flink SQL and then process them via a *KeyedProcessFunction*. So I do something like: implicit val env = Stream

Re: Flink Table from KeyedStream

2021-01-21 Thread Dominik Wosiński
Hey, Thanks for the answer. That's what I've been observing but wanted to know for sure. Best Regards, Dom.

Watermarks when reading from file

2021-03-01 Thread Dominik Wosiński
Hey, I have a question regarding a DataStream created from multiple files in S3. I have several files in AWS S3, say the path is s3://files/, and then there are several folders for different days, so in the end the full paths look like: s3://files/day=1/file.parquet, s3://files/day=2/file.parquet. I

Re: Watermarks when reading from file

2021-03-01 Thread Dominik Wosiński
Hey, thanks for the answer. As I've mentioned in the email, the data range is only 30 days; for the tests I've used the data from October, so I basically have timestamps starting at midnight of 1 October 2020 and finishing at 23:59 on 30 October 2020, so if I understand correctly this shouldn't cause

Re: Watermarks when reading from file

2021-03-01 Thread Dominik Wosiński
Hey Till, you were obviously right, my bad here. My math was incorrect. The correct reasoning is that indeed the first 5 days of October will be added to window number 1 and the rest of the days will end up in the second window. Solved! Thanks a lot, Best Regards, Dom.
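
The window math in this thread follows from Flink's alignment of tumbling event-time windows to the epoch. The formula below is the one from `TimeWindow.getWindowStartWithOffset`; the example timestamps in the test are mine, chosen only to show the alignment:

```java
// Flink aligns tumbling windows to the epoch (plus an optional offset):
//   start = timestamp - (timestamp - offset + size) % size
// so a window boundary generally does NOT coincide with the first timestamp
// in the data, which is why the first window can hold fewer days than later ones.
public class WindowAlign {
    static long windowStart(long timestamp, long offset, long size) {
        return timestamp - (timestamp - offset + size) % size;
    }
}
```

For example, with a 7-day window and offset 0, any timestamp in the first 7 days after the epoch maps to window start 0, and day 8 falls into the window starting at day 7.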

Incompatible RAW types in Table API

2021-08-09 Thread Dominik Wosiński
Hey all, I think I've hit some weird issue in Flink TypeInformation generation. I have the following code:

val stream: DataStream[Event] = ...
tableEnv.createTemporaryView("TableName", stream)
val table = tableEnv
  .sqlQuery("SELECT id, timestamp, eventType from TableName")
tableEnvironment.toAppen

Re: Incompatible RAW types in Table API

2021-08-09 Thread Dominik Wosiński
owing error: *Invalid argument type at position 0. Data type RAW('org.test.OneEnum', '...') expected but RAW('org.test.OneEnum', '...') passed.* Mon, 9 Aug 2021 at 15:13, Dominik Wosiński wrote: > Hey all, > > I think I've hit some weird

Re: Incompatible RAW types in Table API

2021-08-18 Thread Dominik Wosiński
elp is to explicitly define the types: > > > > E.g. you can also use `DataTypes.of(TypeInformation)` both in > > `ScalarFunction.getTypeInference` and > > `StreamTableEnvironment.toDataStream()`. > > > > I hope this helps. > > > > Timo > > > > On 09.08.21 16:27, Dominik Wos

Common type required when creating TableSchema

2021-08-18 Thread Dominik Wosiński
Hey, I've stumbled across a weird behavior and was wondering whether this is intentional for some reason or the result of a weird bug. Basically, currently if we want to create an *org.apache.flink.table.api.Schema* that has one of the types defined as *RAW* (an AVRO enum in my case), it's probably not