Strange behaviour of windows

2015-12-07 Thread Dawid Wysakowicz
ts is empty). Do I understand the mechanism correctly and should my code work as I described? If not could you please explain a little bit? The code I've attached to this email. I would be grateful. Regards Dawid Wysakowicz import java.util.concurrent.TimeUn

Re: Strange behaviour of windows

2015-12-07 Thread Dawid Wysakowicz
Forgot to mention. I've checked it both on 0.10 and current master. 2015-12-07 20:32 GMT+01:00 Dawid Wysakowicz : > Hi, > > I have recently experimented a bit with windowing and the event-time mechanism > in Flink and either I do not understand how it should work or there is som

Re: Strange behaviour of windows

2015-12-08 Thread Dawid Wysakowicz
think you understood the watermarks quite well. :D > > Let us know if you need more information. > > Cheers, > Aljoscha > > > > On 07 Dec 2015, at 20:34, Dawid Wysakowicz > wrote: > > > > Forgot to mention. I've checked it both on 0.10 and current master.

Re: [SQL DDL] How to extract timestamps from Kafka message's metadata

2020-08-11 Thread Dawid Wysakowicz
I'm afraid it is not supported yet. The discussion[1] to support it started in the past, but unfortunately it has not concluded yet. One approach I can think of to work around this limitation is to provide your own Format[2]. Unfortunately it is not the most straightforward solution. Be

Re: Matching largest event pattern without duplicates

2020-08-11 Thread Dawid Wysakowicz
Hi James, I think it is not easy to achieve with the CEP library. Adding the consecutive quantifier to the oneOrMore strategy should eliminate a few of the unwanted cases from your example (`b:c`, `b`, `a`, `c`), but it would not eliminate the `c:a`. The problem is you need to skip to the first du

Re: [Flink-KAFKA-KEYTAB] Kafkaconsumer error Kerberos

2020-08-11 Thread Dawid Wysakowicz
Hi, As far as I know the approach 2) is the supported way of setting up Kerberos authentication in Flink. In the second approach have you tried setting the `sasl.kerberos.service.name` in the configuration of your KafkaConsumer/Producer[1]? I think this might be the issue. Best, Dawid [1] https

Re: Flink issue in emitting data to same sideoutput from onTimer and processElement

2020-08-14 Thread Dawid Wysakowicz
ka.ms/ghei36> > > > *From:* Jaswin Shah > *Sent:* Friday, August 14, 2020 3:09:21 PM > *To:* user@flink.apache.org ; Dawid Wysakowicz > ; Yun Tang > *Subject:* Flink issue in emitting data to same sideoutput from > onTimer an

[DISCUSS] Removing deprecated methods from DataStream API

2020-08-17 Thread Dawid Wysakowicz
Hi devs and users, I wanted to ask you what you think about removing some of the deprecated APIs around the DataStream API. The APIs I have in mind are: * RuntimeContext#getAllAccumulators (deprecated in 0.10) * DataStream#fold and all related classes and methods such as FoldFunction,

Re: [DISCUSS] Removing deprecated methods from DataStream API

2020-08-17 Thread Dawid Wysakowicz
estigated to see if they are > actually easy to remove. > > Cheers, > Kostas > > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz > wrote: >> Hi devs and users, >> >> I wanted to ask you what do you think about removing some of the deprecated >> APIs aroun

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-18 Thread Dawid Wysakowicz
Hi all, @Klou Nice write-up. One comment: I would suggest using a different configuration parameter name. The way I understand the proposal, the BATCH/STREAMING/AUTOMATIC setting affects not only the scheduling mode but the types of shuffles as well. How about `execution.mode`? Or `execution-runtime-

Re: [DISCUSS] Removing deprecated methods from DataStream API

2020-08-19 Thread Dawid Wysakowicz
> Best, >  Yun > > --Original Mail -- > *Sender:*Kostas Kloudas <mailto:kklou...@gmail.com>> > *Send Date:*Tue Aug 18 03:52:44 2020 > *Recipients:*Dawid Wysakowicz <mailto:dwysakow...@ap

Re: Table API Kafka Connector Sink Key Definition

2020-08-20 Thread Dawid Wysakowicz
Hi Yuval, Unfortunately setting the key or timestamp (or other metadata) from the SQL API is not supported yet. There is an ongoing discussion to support it[1]. Right now your option would be to change the code of KafkaTableSink and write your own version of KafkaSerializationSchema as Till menti

Re: JSON to Parquet

2020-08-25 Thread Dawid Wysakowicz
Hi Averell, If you can describe the JSON schema I'd suggest looking into the SQL API. (And I think you do need to define your schema upfront. If I am not mistaken Parquet must know the common schema.) Then you could do sth like: CREATE TABLE json ( // define the schema of your json data ) WIT
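
A minimal sketch of that suggestion, expressed through the Java Table API (schema, paths, and options are illustrative; assumes the filesystem connector with the json and parquet formats is on the classpath):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class JsonToParquet {
        public static void main(String[] args) {
            TableEnvironment env = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inBatchMode().build());

            // Source: JSON files; the schema has to be declared upfront.
            env.executeSql(
                "CREATE TABLE json_in (id BIGINT, name STRING) WITH ("
                + " 'connector' = 'filesystem',"
                + " 'path' = 'file:///tmp/json-in',"
                + " 'format' = 'json')");

            // Sink: Parquet files with the same declared schema.
            env.executeSql(
                "CREATE TABLE parquet_out (id BIGINT, name STRING) WITH ("
                + " 'connector' = 'filesystem',"
                + " 'path' = 'file:///tmp/parquet-out',"
                + " 'format' = 'parquet')");

            env.executeSql("INSERT INTO parquet_out SELECT * FROM json_in");
        }
    }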

Re: Flink Table API/SQL debugging, testability

2020-08-25 Thread Dawid Wysakowicz
Hi, What exactly are you looking for? I think the simplest stub for a test could be sth like: final TableEnvironment env = TableEnvironment.create(...); TableResult result = env.fromValues(...) .select(...) .execute(); try (Clo
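
A minimal, self-contained variant of that stub (assuming Flink 1.11+, where Table#execute and TableResult#collect are available):

    import static org.apache.flink.table.api.Expressions.$;

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.api.TableResult;
    import org.apache.flink.types.Row;
    import org.apache.flink.util.CloseableIterator;

    public class TableApiSmokeTest {
        public static void main(String[] args) throws Exception {
            TableEnvironment env = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inBatchMode().build());

            // fromValues produces a single column named f0 for scalar values.
            TableResult result = env.fromValues(1, 2, 3)
                    .select($("f0").plus(1))
                    .execute();

            // Collect the rows back to the client and assert on them.
            try (CloseableIterator<Row> it = result.collect()) {
                while (it.hasNext()) {
                    System.out.println(it.next());
                }
            }
        }
    }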

Re: Default Flink Metrics Graphite

2020-08-25 Thread Dawid Wysakowicz
Hi Vijay, I think the problem might be that you are using a wrong version of the reporter. You say you used flink-metrics-graphite-1.10.0.jar from 1.10 as a plugin, but it was migrated to plugins in 1.11 only[1]. I'd recommend trying it out with the same 1.11 version of Flink and Graphite report

Re: Why consecutive calls of orderBy are forbidden?

2020-08-26 Thread Dawid Wysakowicz
Hi, I think you are hitting a bug here. What you are trying to do should be possible. Would you like to open a bug for it? However, the bug applies to the legacy batch planner (you are using the BatchTableEnvironment), which is no longer maintained; there have already been discussions to drop it

Re: Idle stream does not advance watermark in connected stream

2020-08-26 Thread Dawid Wysakowicz
Hi Kien, I am afraid this is a valid bug. I am not 100% sure but the way I understand the code the idleness mechanism applies to input channels, which means e.g. when multiple parallel instances shuffle their results to downstream operators. In case of a two-input operator, combining the watermark

Re: Default Flink Metrics Graphite

2020-08-26 Thread Dawid Wysakowicz
d, > > I have 1.10.0 version of flink. What is alternative for this version ? > > Regards, > Vijay > >> >> On Aug 25, 2020, at 11:44 PM, Dawid Wysakowicz >> wrote: >> >>  >> >> Hi Vijay, >> >> I think the problem might be tha

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-28 Thread Dawid Wysakowicz
@Aljoscha Let me bring back to the ML some of the points we discussed offline. Ad. 1 Yes, I agree it's not just about scheduling. It includes more changes to the runtime. We might need to make it more prominent in the write-up. Ad. 2 You have a good point here that switching the default value for

Re: Idle stream does not advance watermark in connected stream

2020-08-31 Thread Dawid Wysakowicz
ke operators aware of idleness > such > that they can take this into account when computing the combined > output > watermark. > > Best, > Aljoscha > > On 26.08.20 10:02, Dawid Wysakowicz wrote: > > Hi Kien, > > > > I a

Re: Unit Test for KeyedProcessFunction with out-of-core state

2020-09-07 Thread Dawid Wysakowicz
Hi Alexey, There is no mock for RocksDB. Moreover I am not sure what would be the use case for one. If you want to test specifically against RocksDB then you can use it in the test harness Gordon mentioned. On 04/09/2020 16:31, Alexey Trenikhun wrote: > Hi Gordon, > We already use [1]. Unfortunat

Re: FLINK DATASTREAM Processing Question

2020-09-07 Thread Dawid Wysakowicz
Hi, You can see the execution plan via StreamExecutionEnvironment#getExecutionPlan(). You can visualize it in[1]. You can also submit your job and check the execution plan in the Web UI. As for which option is preferred, that is very subjective. As long as in option b) both maps are cha

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-08 Thread Dawid Wysakowicz
ing that back in.. > > > On 28.08.20 13:54, Dawid Wysakowicz wrote: >> @Aljoscha Let me bring back to the ML some of the points we discussed >> offline. >> >> Ad. 1 Yes I agree it's not just about scheduling. It includes more >> changes to the runtime. W

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-08 Thread Dawid Wysakowicz
where we work off > remaining timers but don't add new ones? Do we silently ignore adding > new ones? > > By the way, I assume WAIT means we wait for processing-time to > actually reach the time of pending timers? Or did you have something > else in mind with this? > >

Re: Flink Stateful Functions API

2020-09-14 Thread Dawid Wysakowicz
Hi, Not sure if there is "developer" documentation for the protocol. I am cc'ing Igal and Gordon who know better than I if there is one. To give you some hints though: if I am correct, the Python API is implemented as so-called remote functions [1][2], which communicate with Flink via HTTP/gRP

Re: Flink Table API and not recognizing s3 plugins

2020-09-14 Thread Dawid Wysakowicz
Hi Dan, As far as I have checked in the code, the FileSystemSink will try to create staging directories from the client. I think it might be problematic, as your case shows. We might need to revisit that part. I am cc'ing Jingsong who worked on the FileSystemSink. As a workaround you might try putting

Re: Emit event to kafka when finish sink

2020-09-15 Thread Dawid Wysakowicz
Hi, I am not sure if I understand your first solution, but it sounds rather complicated. I think implementing a custom operator could be a valid approach. You would have to make sure it is run with a parallelism of 1. You could additionally implement the BoundedOneInput interface and notify the exter
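
A minimal sketch of such a custom operator (the Kafka notification itself is left as a hypothetical stub, since the thread does not show it):

    import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
    import org.apache.flink.streaming.api.operators.BoundedOneInput;
    import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
    import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

    public class FinishNotifyingOperator<T> extends AbstractStreamOperator<T>
            implements OneInputStreamOperator<T, T>, BoundedOneInput {

        @Override
        public void processElement(StreamRecord<T> element) {
            output.collect(element); // pass every record through unchanged
        }

        @Override
        public void endInput() {
            // Called once the bounded input is exhausted -- emit the
            // "finished" event to Kafka here (hypothetical helper call):
            // kafkaProducer.send(new ProducerRecord<>("done-topic", "FINISHED"));
        }
    }

It would be wired in with parallelism 1, e.g. stream.transform("finish-notifier", stream.getType(), new FinishNotifyingOperator<>()).setParallelism(1).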

Re: Error: Avro "CREATE TABLE" with nested rows and missing fields

2020-09-16 Thread Dawid Wysakowicz
Hi Dan, I'd say this is a result of a few assumptions. 1. We try to separate the concept of a format from the connector. Therefore we did not make too many assumptions about which connector a format works with. 2. Avro needs the original schema that the incoming record was serialized wit

Re: Flink Table SQL, Kafka, partitions and unnecessary shuffling

2020-09-16 Thread Dawid Wysakowicz
Hi Dan, I am afraid there is no mechanism to do that purely in the Table API yet. Or I am not aware of one. If the reinterpretAsKeyedStream works for you, you could use this approach and convert a DataStream (with the reinterpretAsKeyedStream applied) to a Table[1] and then continue with the Table

Re:

2020-09-16 Thread Dawid Wysakowicz
It should work the way you're describing. Can you share a reproducible example? Best, Dawid On 14/08/2020 11:38, Jaswin Shah wrote: > Hi, > > I have a coProcessFunction which emits data to same side output from > processElement1 method and on timer method. But, data is not getting > emitted to si

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-17 Thread Dawid Wysakowicz
Hi, Could you share exactly how you configure Avro & Kafka? Do you use the Table API or the DataStream API? Do you use the ConfluentRegistryDeserializationSchema that comes with Flink or did you build a custom DeserializationSchema? Could you maybe share the code for instantiating the source with us? It

Re: Error: Avro "CREATE TABLE" with nested rows and missing fields

2020-09-17 Thread Dawid Wysakowicz
file. Best, Dawid [1] https://issues.apache.org/jira/browse/FLINK-16048 On 16/09/2020 21:20, Dan Hill wrote: > Interesting.  How does schema evolution work with Avro and Flink?  > E.g. adding new fields or enum values. > > On Wed, Sep 16, 2020 at 12:13 PM Dawid Wysakowicz >

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-17 Thread Dawid Wysakowicz
eserializer = new SpecificRecordSerDe<>( > PayloadRecord.class, > PayloadRecord.getClassSchema().toString(), > this.schemaRegistry); > > FlinkKafkaConsumer consumer = new FlinkKafkaConsumer( > this.inputTopic, > deserializer, > this.sour

Re: valuestate(descriptor) using a custom caseclass

2020-09-18 Thread Dawid Wysakowicz
Hi Martin, I am not sure what the exact problem is. Is it that the ProcessFunction is not invoked, or is the problem with the values in your state? As for the question of the case class and ValueState: the best way to do it is to provide the TypeInformation explicitly. If you do not provide the TypeI

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-21 Thread Dawid Wysakowicz
> if (deserializer == null) { > synchronized (lock) { > if (deserializer == null) { > deserializer = > ConfluentRegistryAvroDeserializationSchema .forSpecific(tClass, > this.schemaRegistryUrl); >

Re: Flink Table SQL and writing nested Avro files

2020-09-21 Thread Dawid Wysakowicz
Hi Dan, I think the best I can suggest is this: SELECT ROW(left.field0, left.field1, left.field2, ...), ROW(right.field0, right.field1, right.field2, ...) FROM ... You will need to list all the fields manually, as SQL does not allow for asterisks in regular function c
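
A minimal sketch of that query through the Java Table API (table and column names are illustrative; assumes env is a TableEnvironment with left_table and right_table registered):

    import org.apache.flink.table.api.Table;

    Table nested = env.sqlQuery(
        "SELECT"
        + "  ROW(l.id, l.name)  AS left_record,"   // nest the left side's fields
        + "  ROW(r.id, r.score) AS right_record"   // nest the right side's fields
        + " FROM left_table l JOIN right_table r ON l.id = r.id");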

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-22 Thread Dawid Wysakowicz
> >>          this.subject = subject; > >>      } > >> > >>      @Override > >>      public ProducerRecord serialize(T element, > @Nullable Long timestamp) { > >>          if (this.serializer == null) { > >>              synchronized (lo

Re: Live updating Serialization Schemas in Flink

2020-10-06 Thread Dawid Wysakowicz
Hi, Unfortunately I don't have a nice solution for you. I would also generally discourage such a pattern. Usually, multiple/dynamic schemas are used with the help of a schema registry. In that case you have some sort of an id serialized along with records which you can use to look up the schema.

Re: state access causing segmentation fault

2020-10-08 Thread Dawid Wysakowicz
Hi, It should be absolutely fine to use multiple state objects. I am not aware of any limits to that. A minimal, reproducible example would definitely be helpful. For those kinds of exceptions, I'd look into the serializers you use. Other than that I cannot think of an obvious reason for that kind

Re: Flink Kafka offsets

2020-10-13 Thread Dawid Wysakowicz
Hey Rex, I agree the documentation might be slightly misleading. To get the full picture of that configuration I'd suggest having a look at the DataStream Kafka connector page[1]. The Table connector is just a wrapper around the DataStream one. Let me also try to clarify it a bit more. In case of

Re: Flink is failing for all Jobs if one job gets failed

2020-10-13 Thread Dawid Wysakowicz
Hi, As far as I understand it, it is not a Flink problem. It's your code that is failing to compile the code it gets. It's also quite hard to actually figure out how it is used from within Flink. Best, Dawid On 13/10/2020 10:42, saksham sapra wrote: > > I am working on flink local, i have crea

Re: Required context properties mismatch in connecting the flink with mysql database

2020-10-14 Thread Dawid Wysakowicz
Hi, I think the problem is that you are using BatchTableEnvironment, which is deprecated and does not support newer features such as the FLIP-95 sources/sinks. I am sorry it is not more prominent in the documentation. I am not too familiar with the python API, and I am not sure if a unified Table

Re: ValidationException using DataTypeHint in Scalar Function

2020-10-27 Thread Dawid Wysakowicz
Hey Steve, You should be able to do it via the bridgedTo parameter. You can additionally specify the serializer you want to use via the rawSerializer parameter: @FunctionHint( input = { @DataTypeHint(value = "RAW", bridgedTo = Map.class[, rawSerializer = ...
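
A minimal sketch of a complete scalar function using that hint (names are illustrative, and the optional rawSerializer part is omitted):

    import java.util.Map;
    import org.apache.flink.table.annotation.DataTypeHint;
    import org.apache.flink.table.annotation.FunctionHint;
    import org.apache.flink.table.functions.ScalarFunction;

    @FunctionHint(
        input = @DataTypeHint(value = "RAW", bridgedTo = Map.class),
        output = @DataTypeHint("INT"))
    public class MapSizeFunction extends ScalarFunction {
        public int eval(Map<?, ?> map) {
            return map.size(); // work directly on the raw map
        }
    }

Registered, for example, with env.createTemporarySystemFunction("mapSize", MapSizeFunction.class).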

Re: ValidationException using DataTypeHint in Scalar Function

2020-10-29 Thread Dawid Wysakowicz
.TypeInferenceOperandChecker.checkOperandTypesOrError(TypeInferenceOperandChecker.java:126) > ... 50 more* > * > > > [1]  > https://github.com/apache/flink/blob/release-1.11.0/flink-core/src/main/java/org/apache/flink/api/common/typeutils/base/MapSerializer.java > > O

Re: Table Print SQL Connector

2020-10-29 Thread Dawid Wysakowicz
You should be able to use the "print" sink. Remember though that the "print" sink prints into the stdout/stderr of TaskManagers, not the Client, where you submit the query. This is different from the TableResult, which collects results in the client. BTW, for printing you can use TableResult#print,
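
A minimal sketch contrasting the two options (assumes env is a TableEnvironment with a table named source already registered):

    // Option 1: the "print" connector -- rows end up in the TaskManagers'
    // stdout/stderr, not on the client.
    env.executeSql("CREATE TABLE print_sink (word STRING) WITH ('connector' = 'print')");
    env.executeSql("INSERT INTO print_sink SELECT word FROM source");

    // Option 2: collect the results to the client and print them there.
    env.sqlQuery("SELECT word FROM source").execute().print();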

Re: Table Print SQL Connector

2020-10-30 Thread Dawid Wysakowicz
d, amount, transaction_time]) (1/1) (attempt #0) with > attempt id > 1fd25ab4d3d51009542ebda2bbadb55d_4b71d4c67c3b183d6f63a06700c86645_0_0 > to 2358fbac-908d-4aa2-b643-c32d44b40193 @ localhost (dataPort=-1) with > allocation id 45ea6506d06f73f61a36db764ce07ba7 > 2020-10-29 18:18:3

Re: [Flink::Test] access registered accumulators via harness

2020-10-30 Thread Dawid Wysakowicz
Hi Rinat, First of all, sorry for some nitpicking in the beginning, but your message might be a bit misleading for some. If I understood your message correctly you are referring to Metrics, not accumulators, which are a different concept[1]. Or were you indeed referring to accumulators? Now on to

Re: Logging when building and testing Flink

2020-10-30 Thread Dawid Wysakowicz
You should be able to globally override the configuration file used by the surefire plugin, which executes tests, like this: mvn '-Dlog4j.configuration=[path]/log4j2-on.properties' clean install Bear in mind there is a minor bug in our surefire configuration now: https://issues.apache.org/jira/browse/F

Re: Logging when building and testing Flink

2020-10-30 Thread Dawid Wysakowicz
Small correction to my previous email. The previously mentioned problem is actually not a problem. You can just pass the log4j.configurationFile explicitly: mvn '-Dlog4j.configurationFile=[path]/log4j2-on.properties' clean install Best, Dawid On 23/10/2020 09:48, Juha Mynttinen wrote: > Hey the

Re: Can I get the filename as a column?

2020-10-30 Thread Dawid Wysakowicz
I am afraid there is no such functionality available yet. I think though it is a valid request. I think we can use the upcoming FLIP-107 metadata columns for this purpose and expose the file name as a metadata column of a filesystem source. Would you like to create a JIRA issue for it? Best, Dawi

Re: Can I get the filename as a column?

2020-10-30 Thread Dawid Wysakowicz
FLINK-19903 > [2]: > https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors > [3]: https://issues.apache.org/jira/browse/FLINK-15869 > > > On Fri, Oct 30, 2020 at 1:29 PM Ruben Laguna <mailto:ruben.lag...@gmail.com>> wrote: &g

Re: ValidationException using DataTypeHint in Scalar Function

2020-11-09 Thread Dawid Wysakowicz
Hi Steve, Unfortunately the information you posted still does not explain how you ended up with *RAW('java.util.Map', ?)* for your input type. It would be best if you could share an example that I could use to reproduce it. I tried putting down some potential approaches: I tested it with a class ge

Re: Table API, accessing nested fields

2020-11-10 Thread Dawid Wysakowicz
You should use the get method: val table = tenv .fromDataStream(stream) .select($"context".get("url"), $"name") Best, Dawid On 10/11/2020 10:15, Ori Popowski wrote: > > How can I access nested fields e.g. in select statements? > > For example, this won't work: > > val tabl

Re: IllegalStateException Printing Plan

2020-11-17 Thread Dawid Wysakowicz
Hi Rex, The executeInsert method, as the name suggests, executes the query. Therefore after the method there is nothing in the topology and thus you get the exception. You can either explain the userDocsTable: userDocsTable.explain() or you can explain a statement set if you want to postpone the

Re: IllegalStateException Printing Plan

2020-11-18 Thread Dawid Wysakowicz
erators defined in streaming > topology. Cannot execute." Ordering doesn't seem to make a difference > here. > > Anything else I can try to get the JSON? > > Thanks! > > On Tue, Nov 17, 2020 at 1:24 AM Dawid Wysakowicz > mailto:dwysakow...@apache.org>> wrote: >

Re: Dynamic ad hoc query deployment strategy

2020-11-24 Thread Dawid Wysakowicz
Hi, Really sorry for the late reply. To the best of my knowledge there is no way to "attach" to a source/reader of a different job. Every job would read the source separately. `The GenericInMemoryCatalog is an in-memory implementation of a catalog. All objects will be available only f

Re: Flink 1.11 avro format question

2020-11-25 Thread Dawid Wysakowicz
Hi, Just wanted to comment on: How to map the nullable types to union(null, something)? In our schema definition, we follow the Avro recommended definition, list 'null' as the first type. I've also spotted that problem and it will be fixed in 1.12 in https://issues.apache.org/jira/browse/FLINK-2

Re: Flink 1.11 avro format question

2020-11-30 Thread Dawid Wysakowicz
> > -- > > Thanks, > Hongjian Peng > > > At 2020-11-25 22:26:32, "Dawid Wysakowicz" wrote: > > Hi, > > Just wanted to comment on: > > How to map the nullable types to union(null, something)? In our > schema definition, we fol

Re: TextFile source && KeyedWindow triggers --> Unexpected execution order

2020-12-09 Thread Dawid Wysakowicz
Hi Marta, Do you mean you want to emit results every 5 minutes based on the wall time (processing time)? If so you should use the ContinuousProcessingTimeTrigger instead of ContinuousEventTimeTrigger, which emits results based on the event time. Does that solve your problem? Best, Dawid On
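
A minimal sketch of swapping in the processing-time trigger (window size and input type are illustrative):

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger;

    DataStream<Tuple2<String, Long>> input = ...; // assumed (word, count) stream
    input
        .keyBy(value -> value.f0)
        .window(TumblingEventTimeWindows.of(Time.hours(1)))
        // fire every 5 minutes of wall-clock time, regardless of event time
        .trigger(ContinuousProcessingTimeTrigger.of(Time.minutes(5)))
        .sum(1)
        .print();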

Re: Batch loading into postgres database

2020-12-09 Thread Dawid Wysakowicz
Your approach looks rather good to me. In the version that queries for the JobStatus you must remember that there are states such as INITIALIZING, which just tell you that the job was submitted. In 1.12 we introduced the TableResult#await method, which is a shortcut over what you did in th
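
A minimal sketch of the 1.12 shortcut (table names are hypothetical):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.api.TableResult;

    public class BatchLoad {
        public static void main(String[] args) throws Exception {
            TableEnvironment env = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inBatchMode().build());
            TableResult result = env.executeSql(
                    "INSERT INTO postgres_sink SELECT * FROM staging");
            // Blocks until the insert job finishes; throws if the job fails.
            result.await();
        }
    }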

Re: Flink UDF registration from jar at runtime

2020-12-10 Thread Dawid Wysakowicz
Hi Jakub, As Guowei said, the UDF must be present in the user classloader. It must be there when compiling the program and when executing on the cluster. As of now the TableEnvironment uses the Thread context classloader as the "user classloader" when compiling the query. Therefore you can do the t
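
A minimal sketch of that trick (paths and class names are hypothetical; tableEnv is an existing TableEnvironment):

    import java.net.URL;
    import java.net.URLClassLoader;
    import org.apache.flink.table.functions.ScalarFunction;

    // Load the jar containing the UDF and make it the context classloader
    // before the TableEnvironment compiles the query.
    URLClassLoader udfLoader = new URLClassLoader(
            new URL[]{new URL("file:///path/to/udfs.jar")},
            Thread.currentThread().getContextClassLoader());
    Thread.currentThread().setContextClassLoader(udfLoader);

    Class<? extends ScalarFunction> udfClass = Class
            .forName("com.example.MyUdf", true, udfLoader)
            .asSubclass(ScalarFunction.class);
    tableEnv.createTemporarySystemFunction("myUdf", udfClass);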

Re: Flink UDF registration from jar at runtime

2020-12-10 Thread Dawid Wysakowicz
ill results in a ClassNotFoundException when > executing the environment. (The class is located outside of the > classpath and is loaded succesfully, instances of it behave as expected) > Did I possibly missunderstand what you were proposing? > > Kind regards, > > Jakub >

Re: How to register TableSinks

2021-01-04 Thread Dawid Wysakowicz
Hi Patrick. Happy New Year to you too ;) The method you are referring to was deprecated along with the TableSink interface altogether, in favour of a much improved and feature-rich new Source & Sink API. You can find extensive documentation on this new API here[1]. Therefore if you use the old TableSink interf

Re: StreamingFileSink with ParquetAvroWriters

2021-01-14 Thread Dawid Wysakowicz
Hi Jan Could you make sure you are packaging that dependency with your job jar? There are instructions how to configure your build setup[1]. Especially the part how to build a jar with dependencies might come in handy[2]. Best, Dawid [1] https://ci.apache.org/projects/flink/flink-docs-release-1

Re: Elasticsearch config maxes can not be disabled

2021-01-14 Thread Dawid Wysakowicz
Hi, First of all, what Flink version are you using? You are right, it is a mistake in the documentation of sink.bulk-flush.max-actions. It should say: Can be set to '-1' to disable it. I created a ticket[1] to track that. And as far as I can tell (I quickly checked), it should work. A

Re: error accessing S3 bucket 1.12

2021-01-14 Thread Dawid Wysakowicz
Hi Billy, I think you might be hitting the same problem as described in this thread[1]. Does your bucket meet all the name requirements as described here[2] (e.g. does it have an underscore)? Best, Dawid [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Unable-to-set-S3-like-ob

Re: Enrich stream with SQL api

2021-01-14 Thread Dawid Wysakowicz
Hi Marek, I am afraid I don't have a good answer for your question. The problem indeed is that the JDBC source can work only as a bounded source. As you correctly pointed out, as of now mixing bounded with unbounded sources does not work with checkpointing, which we want to address in FLIP-147

Re: Enrich stream with SQL api

2021-01-15 Thread Dawid Wysakowicz
meProvider-org.apache.flink.table.connector.source.LookupTableSource.LookupContext- > > > czw., 14 sty 2021 o 20:07 Dawid Wysakowicz <mailto:dwysakow...@apache.org>> napisał(a): > > Hi Marek, > > I am afraid I don't have a good answer for your question. The >

Re: AW: StreamingFileSink with ParquetAvroWriters

2021-01-15 Thread Dawid Wysakowicz
rg/projects/flink/flink-docs-release-1.11/dev/project-configuration.html#create-project > >   > > Best, > > Jan > >   > >   > > *Von:*Dawid Wysakowicz > *Gesendet:* Donnerstag, 14. Januar 2021 12:42 > *An:* Jan Oelschlegel ; > user@flink.apache.org >

Re: Elasticsearch config maxes can not be disabled

2021-01-15 Thread Dawid Wysakowicz
'index'= '${sys:graph.flink.index_name}', > 'format'= 'json', > 'sink.bulk-flush.max-actions'= '0', > 'sink.bulk-flush.max-size'= '0', > 'sink.bulk-flush.interval'= '1s', > '

Re: Elasticsearch config maxes can not be disabled

2021-01-18 Thread Dawid Wysakowicz
QL and change '0's to '-1'. We received "Caused by: > java.lang.IllegalArgumentException: Could not parse value '-1' for key > 'sink.bulk-flush.max-size'." > > On Fri, Jan 15, 2021 at 6:04 AM Dawid Wysakowicz > mailto:dwysako

Re: DataStream API: Best way for reading csv file

2021-01-26 Thread Dawid Wysakowicz
Hi Jan, First of all I'd rather recommend the Table API for processing structured data. However, if you are convinced you want to use the DataStream API, the CsvInputFormat supports the java.sql.Date type. You can try that, or, what I would suggest, parse the Date field as a string and then parse it
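
A minimal sketch of the suggested parse-it-yourself route (a file layout of "id,date" per line is assumed):

    import java.time.LocalDate;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CsvDates {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            env.readTextFile("file:///path/to/input.csv")
               .map(line -> {
                   String[] parts = line.split(",");
                   // expects ISO-8601 dates, e.g. 2021-01-26
                   return Tuple2.of(parts[0], LocalDate.parse(parts[1]));
               })
               // lambdas lose generic info, so declare the result type explicitly
               .returns(Types.TUPLE(Types.STRING, Types.LOCAL_DATE))
               .print();
            env.execute("csv-dates");
        }
    }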

Re: Converting non-mini-batch to mini-batch from checkpoint or savepoint

2021-01-26 Thread Dawid Wysakowicz
I am pulling in Jark and Godfrey who are more familiar with the internals of the planner. On 21/01/2021 01:43, Rex Fenley wrote: > Just tested this and I couldn't restore from a savepoint. If I do a > new job from scratch, can I tune the minibatch parameters and restore > from a savepoint without

Re: Comment in source code of CoGroupedStreams

2021-01-26 Thread Dawid Wysakowicz
For the problem of the uid you can follow Guowei's advice. As for the comment, I think it means that all elements of a single key must fit into the memory when they're passed as iterators to the CoGroupFunction. Best, Dawid On 21/01/2021 21:32, Sudharsan R wrote: > Is this comment in the file >

Re: A few questions about minibatch

2021-01-26 Thread Dawid Wysakowicz
I am pulling Jark and Godfrey who are more familiar with the planner internals. Best, Dawid On 22/01/2021 20:11, Rex Fenley wrote: > Hello, > > Does anyone have any more information here? > > Thanks! > > On Wed, Jan 20, 2021 at 9:13 PM Rex Fenley > wrote: > > Hi,

Re: Difference between table.exec.source.idle-timeout and setIdleStateRetentionTime ?

2021-01-26 Thread Dawid Wysakowicz
Hi, The difference is that *table.exec.source.idle-timeout* is used for dealing with source idleness[1]. The problem is that a watermark cannot advance if some of the partitions become idle (do not produce any records). The watermark is always the minimum of the watermarks of all input partitions. The
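
A minimal sketch of setting the two (unrelated) options side by side (values are illustrative):

    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    TableEnvironment env = TableEnvironment.create(
            EnvironmentSettings.newInstance().build());

    // Source idleness: a partition silent for 30 s stops holding back
    // the watermark.
    env.getConfig().getConfiguration()
       .setString("table.exec.source.idle-timeout", "30 s");

    // Idle *state* retention: state for keys not seen for 1-2 hours is dropped.
    env.getConfig().setIdleStateRetentionTime(Time.hours(1), Time.hours(2));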

Re: flink slot communication

2021-01-26 Thread Dawid Wysakowicz
Hi, If tasks end up in the same TaskManager, they use LocalInputChannel(s), which do not go through the network but read directly from local partitions. I am also pulling in @Piotr who might give you some more insights, or correct me if I am wrong. [1] https://ci.apache.org/projects/flink/flink-d

Re: Conflicts between the JDBC and postgresql-cdc SQL connectors

2021-01-26 Thread Dawid Wysakowicz
Hi, Unfortunately I am not familiar with the packaging of flink-connector-postgres-cdc. Maybe @Jark could help here? However, I think the problem that you cannot find the connector is caused by a missing entry in the resulting Manifest file. If there are overlapping classes maven does not e

Re: Difference between table.exec.source.idle-timeout and setIdleStateRetentionTime ?

2021-01-27 Thread Dawid Wysakowicz
ch that no new event comes in for those > partial matches. > > thanks. > >   > > *From: *Dawid Wysakowicz > *Date: *Tuesday, January 26, 2021 at 3:14 AM > *To: *Dcosta, Agnelo (HBO) , > user@flink.apache.org > *Subject: *Re: Difference between table.exec.source.id

Re: Schema registry deserialization: Kryo NPE error

2020-03-02 Thread Dawid Wysakowicz
Hi Nitish, Just to slightly extend on Arvid's reply. As Arvid said, the Kryo serializer comes from the call to TypeExtractor.getForClass(classOf[GenericRecord]). As a GenericRecord is not a POJO, this call will produce a GenericTypeInfo which uses Kryo serialization. For a reference example I would
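
For reference, a minimal sketch of the registry-aware alternative (topic, registry URL, and schema are illustrative; assumes flink-avro-confluent-registry is on the classpath):

    import java.util.Properties;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroDeserializationSchema;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":"
            + "[{\"name\":\"f\",\"type\":\"string\"}]}");

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "kafka:9092");
    props.setProperty("group.id", "my-group");

    // Schema-registry-aware deserializer: records arrive as GenericRecord,
    // avoiding the GenericTypeInfo/Kryo fallback described above.
    FlinkKafkaConsumer<GenericRecord> consumer = new FlinkKafkaConsumer<>(
            "my-topic",
            ConfluentRegistryAvroDeserializationSchema.forGeneric(
                    schema, "http://registry:8081"),
            props);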

Re: Flink EventTime Processing Watermark is always coming as 9223372036854725808

2020-03-02 Thread Dawid Wysakowicz
Hi Anuj, What parallelism does your source have? Do all of your source tasks produce records? The watermark is always the minimum of the timestamps seen from all the upstream operators. Therefore if some of them do not produce records the watermark will not progress. You can read more about Watermarks and how t

Re: Kerberos authentication for SQL CLI

2020-03-24 Thread Dawid Wysakowicz
Hi Gyula, As far as I can tell SQL cli does not support Kerberos natively. SQL CLI submits all the queries to a running Flink cluster. Therefore if you kerberize the cluster the queries will use that configuration. On a different note. Out of curiosity. What would you expect the SQL CLI to use th

Re: Importance of calling mapState.clear() when no entries in mapState

2020-03-24 Thread Dawid Wysakowicz
I think there should be no reason to do that. Best, Dawid On 24/03/2020 09:29, Ilya Karpov wrote: > Hi, > > given: > - flink 1.6.1 > - stateful function with MapState mapState = //init logic; > > Is there any reason I should call mapState.clear() if I know beforehand that > there are no entrie

Re: How to calculate one alarm strategy for each device or one alarm strategy for each type of IOT device

2020-03-24 Thread Dawid Wysakowicz
Hi, Could you elaborate a bit more on what you want to achieve? What have you tried so far? Could you share some code with us? What problems are you facing? From the vague description you provided you should be able to design it with e.g. KeyedProcessFunction[1] Best, Dawid [1] https://ci.apac

Re: Lack of KeyedBroadcastStateBootstrapFunction

2020-03-24 Thread Dawid Wysakowicz
Hi, I am not very familiar with the State Processor API, but from a brief look at it, I think you are right. I think the State Processor API does not support mixing different kinds of states in a single operator for now. At least not in a nice way. Probably you could implement the KeyedBroadcastSt

Re: Kerberos authentication for SQL CLI

2020-03-24 Thread Dawid Wysakowicz
> I was just wondering why is this different from how the CliFrontend > works which also installs the security context on the Client side. > > I guess same arguments should apply for the SQL CLI, whatever they > might be :) > > Gyula > > On Tue, Mar 24, 2020 at 12:30

Re: How to calculate one alarm strategy for each device or one alarm strategy for each type of IOT device

2020-03-25 Thread Dawid Wysakowicz
Hi, I think what you are doing makes sense in principle. Probably you don't want to store all the data until you have enough, but compute only what's necessary on the fly. So e.g. for your example I would store only how many consecutive events with a temperature higher than 10 you have seen so far.

Re: When i use the Tumbling Windows, find lost some record

2020-03-26 Thread Dawid Wysakowicz
Hi, Can you share more details on what you mean by losing some records? Can you share what data you are ingesting, what the expected results are, and what actual results you are getting? Without that it's impossible to help you. So far your code looks rather correct. Best, Dawid On 2

Re: Fault tolerance in Flink file Sink

2020-04-24 Thread Dawid Wysakowicz
Hi Eyal, First of all I would say a local filesystem is not the right choice for what you are trying to achieve. I don't think you can achieve a true exactly-once policy in this setup. Let me elaborate why. Let me clarify a bit how the StreamingFileSink works. The interesting bit is how it behaves

Re: Fault tolerance in Flink file Sink

2020-04-24 Thread Dawid Wysakowicz
Forgot to cc Kostas On 23/04/2020 12:11, Eyal Pe'er wrote: > > Hi all, > I am using Flink streaming with Kafka consumer connector > (FlinkKafkaConsumer) and file Sink (StreamingFileSink) in a cluster > mode with exactly once policy. > > The file sink writes the files to the local disk. > > I’ve no

Re: Cannot map nested Tuple fields to table columns

2020-04-27 Thread Dawid Wysakowicz
Hi Gyula, I think you are hitting a bug with the naming/aliasing of the fields of a Tuple. The bug is in the org.apache.flink.table.typeutils.FieldInfoUtils#isReferenceByPosition method, as it does not work correctly for aliases. Would you mind creating an issue for it? You should be able to alia

Re: Cannot map nested Tuple fields to table columns

2020-04-27 Thread Dawid Wysakowicz
lds even after aliasing and also tuple fields > (f1, f0) so I assume reordering will still work if tuple and row > aliasing is fixed. > > I will open a JIRA for this! > > Thanks! > Gyula > > On Mon, Apr 27, 2020 at 4:58 PM Dawid Wysakowicz > mailto:dwysakow...@apache.or

Re: [DISCUSS] Hierarchies in ConfigOption

2020-04-29 Thread Dawid Wysakowicz
Hi all, I also wanted to share my opinion. When talking about the ConfigOption hierarchy we use for configuring a Flink cluster, I would be a strong advocate for keeping a yaml/hocon/json/... compatible style. Those options are primarily read from a file and thus should at least try to follow common p

Re: Using logicalType in the Avro table format

2020-04-30 Thread Dawid Wysakowicz
Hi Gyula, I have not verified it locally yet, but I think you are hitting yet another problem of the unfinished migration from the old TypeInformation-based type system to the new type system based on DataTypes. As far as I understand the problem, the information about the bridging class (java.sql.Time

Re: Flink consuming rate increases slowly

2020-05-13 Thread Dawid Wysakowicz
Hi Eyal, Could you explain your job a bit more? Did you increase the parallelism of your job? What does it do? Does it perform any time-based operations? How do you measure the processing rate? Best, Dawid On 10/05/2020 21:18, Chen Qin wrote: > Hi Eyal, > > It’s unclear what warmup phase does i

Re: Flink BLOB server port exposed externally

2020-05-13 Thread Dawid Wysakowicz
Hi Omar, Theoretically I think it could be possible to change the address on which the BlobServer runs (even to localhost). There is no configuration option for it now and the BlobServer always binds to the wildcard address. One important aspect to consider here is that the BlobServer must be accessible f

Re: Register time attribute while converting a DataStream to Table

2020-05-13 Thread Dawid Wysakowicz
Hi, Unfortunately, consuming an upsert stream is not supported yet. It's not as easy as adding the type information there as you suggested. Even if you do that it will still be considered to be an append message internally by the planner. There is an ongoing effort (FLIP-95[1]) to support

Re: How to read UUID out of a JDBC table source

2020-05-18 Thread Dawid Wysakowicz
Hi Dario, What version of Flink are you using? Are you implementing your own TableSource or do you use the JdbcTableSource that comes with Flink? Which planner do you use, blink or the old one? Unfortunately the rework of the Table API type system is not finished yet, therefore there are still rough e

Re: Timeout Callbacks issue -Flink

2020-05-25 Thread Dawid Wysakowicz
Hi Jaswin, I can't see any obvious problems in your code. It looks rather correct. What exactly do you mean by "callback is coming earlier than registered callback timeout"? Could you explain that with some examples? As for the different timezones: Flink does not make any assumptions on the ti
