Re: QueryableState startup regression in 1.8.0 (migration from 1.7.2)

2019-04-29 Thread Ufuk Celebi
Actually, I couldn't even find a mention of this flag in the docs here: https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/queryable_state.html – Ufuk On Mon, Apr 29, 2019 at 8:45 AM Ufuk Celebi wrote: > I didn't find this as part of the > https://flink.apache.org/new

Re: RocksDB backend with deferred writes?

2019-04-29 Thread Congxian Qiu
Hi David, when you flush data to the DB, you can reference the serialization logic [1] and store the serialized bytes in RocksDB. [1]  https://github.com/apache/flink/blob/c4b0e8f68c5c4bb2ba60b358df92ee5db1d857df/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/stre
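
A minimal sketch of that approach (not taken from the linked file): serialize a value with Flink's own TypeSerializer into the bytes you would put into RocksDB. The helper below is hypothetical; IOException handling is left to the caller.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.core.memory.DataOutputViewStreamWrapper;

public class StateBytes {
    // Turn any Flink-serializable value into the byte[] to store in RocksDB.
    static <T> byte[] toBytes(T value, Class<T> clazz) throws IOException {
        TypeSerializer<T> serializer =
                TypeInformation.of(clazz).createSerializer(new ExecutionConfig());
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        serializer.serialize(value, new DataOutputViewStreamWrapper(baos));
        return baos.toByteArray();
    }
}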

Re: Re: How to let Flink 1.7.X run Flink session cluster on YARN in Java 7 default environment

2019-04-29 Thread 胡逸才
Thanks Tang: following your suggestion, I removed the unnecessary parameters from the command line and added your parameters to flink-config.xml; it now runs successfully on YARN in the Java 7 environment. At 2019-04-28 11:54:18, "Yun Tang" wrote: Hi Zhangjun Thanks for your rep

Re: Exceptions when launching counts on a Flink DataSet concurrently

2019-04-29 Thread Fabian Hueske
Hi Juan, count() and collect() trigger the execution of a job. Since Flink does not cache intermediate results (yet), all operations from the sink (count()/collect()) to the sources are executed. So in a sense a DataSet is immutable (given that the input of the sources does not change) but completel

Re: Working around lack of SQL triggers

2019-04-29 Thread Fabian Hueske
Hi, I don't think that (the current state of) Flink SQL is a good fit for your requirements. Each query will be executed as an independent job. So there won't be any sharing of intermediate results. You can do some of this manually if you use the Table API, but even then it won't allow for early r

Re: Emitting current state to a sink

2019-04-29 Thread Fabian Hueske
Hi Avi, I'm not sure if you cannot emit data from the keyed state when you receive a broadcasted message. The Context parameter of the processBroadcastElement() method in the KeyedBroadcastProcessFunction has the applyToKeyedState() method. The method takes a KeyedStateFunction that is applied to
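
A minimal sketch of the pattern Fabian describes, assuming String inputs and a ValueState<Long> named "count"; the Collector passed to processBroadcastElement() emits each key's state downstream:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class EmitStateOnBroadcast
        extends KeyedBroadcastProcessFunction<String, String, String, Tuple2<String, Long>> {

    private final ValueStateDescriptor<Long> countDesc =
            new ValueStateDescriptor<>("count", Long.class);

    @Override
    public void processElement(String value, ReadOnlyContext ctx,
                               Collector<Tuple2<String, Long>> out) throws Exception {
        ValueState<Long> count = getRuntimeContext().getState(countDesc);
        Long cur = count.value();
        count.update(cur == null ? 1L : cur + 1);
    }

    @Override
    public void processBroadcastElement(String trigger, Context ctx,
                                        Collector<Tuple2<String, Long>> out) throws Exception {
        // applyToKeyedState() visits the "count" state of every key seen so far
        ctx.applyToKeyedState(countDesc,
                (key, state) -> out.collect(Tuple2.of(key, state.value())));
    }
}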

Re: FileInputFormat that processes files in chronological order

2019-04-29 Thread Fabian Hueske
Hi Sergei, It depends whether you want to process the file with the DataSet (batch) or DataStream (stream) API. Averell's answer was addressing the DataStream API part. The DataSet API does not have any built-in support to distinguish files (or file splits) by folders and process them in order. F

Re: Apache Flink - Question about dynamically changing window end time at run time

2019-04-29 Thread Fabian Hueske
Hi Mans, I don't know if that would work or not. I would need to dig into the source code for that. TBH, I would recommend checking whether you can implement the logic using a (Keyed-)ProcessFunction. IMO, process functions are a lot easier to reason about than Flink's windowing framework. You can manag
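
A minimal sketch of the (Keyed-)ProcessFunction alternative, with a hypothetical Event type: the "window end" is just an event-time timer, so it can be chosen per element at runtime.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class DynamicEndWindow
        extends KeyedProcessFunction<String, DynamicEndWindow.Event, Long> {

    // hypothetical input: a value plus the event time at which its "window" should close
    public static class Event {
        public String key;
        public long value;
        public long endTime;
    }

    private transient ValueState<Long> sum;

    @Override
    public void open(Configuration parameters) {
        sum = getRuntimeContext().getState(new ValueStateDescriptor<>("sum", Long.class));
    }

    @Override
    public void processElement(Event e, Context ctx, Collector<Long> out) throws Exception {
        Long cur = sum.value();
        sum.update(cur == null ? e.value : cur + e.value);
        ctx.timerService().registerEventTimeTimer(e.endTime); // end time decided at runtime
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Long> out) throws Exception {
        out.collect(sum.value());
        sum.clear();
    }
}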

Re: Data Locality in Flink

2019-04-29 Thread Fabian Hueske
Hi, The method that I described in the SO answer is still implemented in Flink. Flink tries to assign splits to tasks that run on local TMs. However, files are not split per line (this would be horribly inefficient) but in larger chunks depending on the number of subtasks (and in case of HDFS the

Re: Emitting current state to a sink

2019-04-29 Thread Avi Levi
Thanks! Works like a charm :) On Mon, Apr 29, 2019 at 12:11 PM Fabian Hueske wrote: > Hi Avi, > > I'm not sure if you cannot emit data from the keyed state when you > receive a broadcasted message. > The Con

Re: Emitting current state to a sink

2019-04-29 Thread Fabian Hueske
Nice! Thanks for the confirmation :-) On Mon, Apr 29, 2019 at 13:21, Avi Levi wrote: > Thanks! Works like a charm :) > > On Mon, Apr 29, 2019 at 12:11 PM Fabian Hueske wrote: >> Hi Avi, >> >> I'm n

Re: Read mongo datasource in Flink

2019-04-29 Thread Flavio Pompermaier
I'm not aware of an official source/sink... if you want, you could try to exploit the Mongo HadoopInputFormat as in [1]. The linked project uses a pretty old version of Flink, but it should not be a big problem to update the Maven dependencies and the code to a newer version. Best, Flavio [1] https://g
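
A rough sketch of that approach, assuming the mongo-hadoop connector is on the classpath; the exact key/value classes differ between mongo-hadoop versions, so treat the BSONWritable types below as an assumption:

import com.mongodb.hadoop.io.BSONWritable;
import com.mongodb.hadoop.mapred.MongoInputFormat;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.mapred.JobConf;

public class MongoRead {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        JobConf conf = new JobConf();
        conf.set("mongo.input.uri", "mongodb://localhost:27017/mydb.mycollection"); // assumed URI
        DataSet<Tuple2<BSONWritable, BSONWritable>> docs = env.createInput(
                new HadoopInputFormat<>(new MongoInputFormat(), BSONWritable.class,
                        BSONWritable.class, conf));
        docs.first(10).print();
    }
}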

Re: Version "Unknown" - Flink 1.7.0

2019-04-29 Thread Vishal Santoshi
Ok, I will check. On Fri, Apr 12, 2019, 4:47 AM Chesnay Schepler wrote: > have you compiled Flink yourself? > > Could you check whether the flink-dist jar contains a > ".version.properties" file in the root directory? > > On 12/04/2019 03:42, Vishal Santoshi wrote: > > Hello ZILI, > I run

Re: Read mongo datasource in Flink

2019-04-29 Thread Wouter Zorgdrager
For a framework I'm working on, we actually implemented a (basic) Mongo source [1]. It's written in Scala and uses Json4s [2] to parse the data into a case class. It uses a Mongo observer to iterate over a collection and emit it into a Flink context. Cheers, Wouter [1]: https://github.com/codefee

Re: Read mongo datasource in Flink

2019-04-29 Thread Hai
Hi Flavio: That's good, thank you. I will try it later ~ Regards Original Message: Sender: Flavio Pompermaier pomperma...@okkam.it, Recipient: hai...@magicsoho.com, Cc: user u...@flink.apache.org, Date: Monday, Apr 29, 2019 19:56, Subject: Re: Read mongo datasource in Flink I'm not aware of an officia

Re: Read mongo datasource in Flink

2019-04-29 Thread Hai
Thanks for your sharing ~ That's great! Original Message: Sender: Wouter Zorgdrager w.d.zorgdra...@tudelft.nl, Recipient: hai...@magicsoho.com, Cc: user u...@flink.apache.org, Date: Monday, Apr 29, 2019 20:05, Subject: Re: Read mongo datasource in Flink For a framework I'm working on, we actually implem

Re: Read mongo datasource in Flink

2019-04-29 Thread Flavio Pompermaier
But what about parallelism with this implementation? From what I see there's only a single thread querying Mongo and fetching all the data... am I wrong? On Mon, Apr 29, 2019 at 2:05 PM Wouter Zorgdrager wrote: > For a framework I'm working on, we actually implemented a (basic) Mongo > source [1].

Re: Data Locality in Flink

2019-04-29 Thread Flavio Pompermaier
Hi Fabian, I wasn't aware that "race-conditions may happen if your splits are very small as the first data source task might rapidly request and process all splits before the other source tasks do their first request". What happens exactly when a race condition arises? Is this exception internally

Re: Read mongo datasource in Flink

2019-04-29 Thread Wouter Zorgdrager
Yes, that is correct. This is a really basic implementation that doesn't take parallelism into account. I think you need something like this [1] to get that working. [1]: https://docs.mongodb.com/manual/reference/command/parallelCollectionScan/#dbcmd.parallelCollectionScan On Mon, 29 Apr 2019 at 1

Re: Apache Flink - Question about dynamically changing window end time at run time

2019-04-29 Thread M Singh
Sounds great, Fabian. I was just trying to see if I can use the higher-level DataStream APIs. I appreciate your advice and help. Mans On Monday, April 29, 2019, 5:41:36 AM EDT, Fabian Hueske wrote: Hi Mans, I don't know if that would work or not. Would need to dig into the source c

Re: Data Locality in Flink

2019-04-29 Thread Fabian Hueske
Hi Flavio, These types of race conditions are not failure cases, so no exception is thrown. It only means that a single source task reads all (or most of) the splits and no splits are left for the other tasks. This can be a problem if a record represents a large amount of IO or an intensive compu

RE: kafka partitions, data locality

2019-04-29 Thread Smirnov Sergey Vladimirovich (39833)
Hi Stefan, Thanks for clarifying! But it still remains an open question for me, because we use the keyBy method and I did not find any public interface for key reassignment (something like partitionCustom for DataStream). As I heard, there is some internal mechanism with key groups and mapping keys to groups. I
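
For reference, the key-group mapping lives in the internal class KeyGroupRangeAssignment (not a public, stable API, so it may change between versions); a small sketch of how a key lands on a subtask:

import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

public class KeyGroupDemo {
    public static void main(String[] args) {
        int maxParallelism = 128; // the number of key groups
        int parallelism = 4;
        Object key = "user-42";   // hypothetical key
        int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(key, maxParallelism);
        int subtask = KeyGroupRangeAssignment.assignKeyToParallelOperator(
                key, maxParallelism, parallelism);
        System.out.printf("key group %d -> subtask %d%n", keyGroup, subtask);
    }
}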

Re: Data Locality in Flink

2019-04-29 Thread Flavio Pompermaier
Thanks Fabian, that's clearer... many times you don't know whether or not to rebalance a dataset, because it depends on the specific use case and dataset distribution. An automatic way of choosing whether a Dataset could benefit from a rebalance could be VERY nice (at least for batch) but I fea

POJO with private fields and toAppendStream of StreamTableEnvironment

2019-04-29 Thread Sung Gon Yi
In https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/common.html#convert-a-table-into-a-datastream-or-dataset , POJO data type is available to conve

Re: Version "Unknown" - Flink 1.7.0

2019-04-29 Thread Vishal Santoshi
#Generated by Git-Commit-Id-Plugin #Wed Apr 03 22:57:42 PDT 2019 git.commit.id.abbrev=4caec0d git.commit.user.email=aljoscha.kret...@gmail.com git.commit.message.full=Commit for release 1.8.0\n git.commit.id=4caec0d4bab497d7f9a8d9fec4680089117593df git.commit.message.short=Commit for release

Re: POJO with private fields and toAppendStream of StreamTableEnvironment

2019-04-29 Thread Timo Walther
Hi Sung, private fields are only supported if you specify getters and setters accordingly. Otherwise you need to use `Row.class` and perform the mapping in a subsequent map() function manually via reflection. Regards, Timo On 29.04.19 at 15:44, Sung Gon Yi wrote: In https://ci.apache.org/
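
A minimal sketch of the Row.class fallback, assuming a table with columns (name: String, value: Integer) and the P class from this thread; tableEnv and table are assumed to exist:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.types.Row;

DataStream<Row> rows = tableEnv.toAppendStream(table, Row.class);
DataStream<P> pojos = rows
        .map(r -> {
            P p = new P();
            p.setName((String) r.getField(0));   // field order follows the table schema
            p.setValue((Integer) r.getField(1));
            return p;
        })
        .returns(P.class); // help Flink's type extraction for the lambda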

Re: Read mongo datasource in Flink

2019-04-29 Thread Kenny Gorman
Just a thought: a robust, high-performance way to potentially achieve your goals is Debezium->Kafka->Flink. https://debezium.io/docs/connectors/mongodb/ Good robust handling of various topologies, reasonably good scaling properties, good resta

Re: Emitting current state to a sink

2019-04-29 Thread M Singh
Hi Avi: Can you please elaborate on (or include an example/code snippet of) how you were able to collect the keyed states from the processBroadcastElement method using applyToKeyedState? I am trying to understand which collector you used to emit the state since the broadcasted e

Re: [DISCUSS] Temporarily remove support for job rescaling via CLI action "modify"

2019-04-29 Thread Gary Yao
Since there were no objections so far, I will proceed with removing the code [1]. [1] https://issues.apache.org/jira/browse/FLINK-12312 On Wed, Apr 24, 2019 at 1:38 PM Gary Yao wrote: > The idea is to also remove the rescaling code in the JobMaster. This will > make > it easier to remove the Ex

Write simple text into hdfs

2019-04-29 Thread Hai
Hi, Could anyone give a simple way to write a DataSet<String> into HDFS? I looked up the official documentation and didn't find it; am I missing something? Many thanks.

Re: Write simple text into hdfs

2019-04-29 Thread Ken Krugler
DataSet.writeAsText("hdfs://...") should work. — Ken > On Apr 29, 2019, at 8:00 AM, Hai wrote: > > Hi, > > Could anyone give a simple way to write a DataSet<String> into hdfs? > > I look up the official document, and didn't find that, am I missing some > thing ? > > Many thanks.
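
A minimal sketch, with an assumed namenode address and output path:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.core.fs.FileSystem;

public class WriteToHdfs {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> data = env.fromElements("a", "b", "c");
        data.writeAsText("hdfs://namenode:8020/tmp/out", FileSystem.WriteMode.OVERWRITE);
        env.execute("write to hdfs"); // writeAsText is a sink, so the job must be executed
    }
}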

Re: Working around lack of SQL triggers

2019-04-29 Thread deklanw
Hi, Thanks for the reply. I had already almost completely lost hope in using Flink SQL. You have confirmed that. But, like I said, I don't know how to reduce the large amount of boilerplate I foresee this requiring with the DataStream API. Can you help me with that? You mention "parameterizable

Flink Load multiple file

2019-04-29 Thread Soheil Pourbafrani
Hi, I want to load multiple files and apply the processing logic to them. After some searching, I found that with the following code I can load all the files in the directory named "input" into Flink: TextInputFormat tif = new TextInputFormat(new Path("input")); DataSet<String> raw = env.readFile(tif, "input"); If
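
For nested directories, the DataSet API's documented recursive enumeration flag can be combined with readTextFile; a minimal sketch assuming the files live under "input":

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class ReadAllFiles {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        Configuration parameters = new Configuration();
        parameters.setBoolean("recursive.file.enumeration", true); // also read subdirectories
        DataSet<String> lines = env.readTextFile("input").withParameters(parameters);
        lines.print();
    }
}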

Flink session window not progressing

2019-04-29 Thread Henrik Feldt
Hi guys, I'm doing a PoC with Flink and I was wondering if you could help me. I've asked a question here https://stackoverflow.com/questions/55907954/flink-session-window-sink-timestamp-not-progressing with some images. However, in summary my question is this: why doesn't my session window pr

Re: Flink session window not progressing

2019-04-29 Thread Henrik Feldt
Thinking more about this: it might just be me reacting to the sink having a zero rate of output. In fact, I have about two gigs of messages left in the queue until it's up to date, so I may just be running a slow calculation (because I've run a batch job to backfill to after stream). Per

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-29 Thread an0
Thanks very much. It definitely explains the problem I'm seeing. However, something I need to confirm: you say "Watermarks are broadcasted/forwarded anyway." Do you mean that in assignTimestampsAndWatermarks.keyBy.window, it doesn't matter what data flows through a specific key's stream, all key str

Flink heap memory

2019-04-29 Thread Rad Rad
Hi, I would like to know the amount of heap memory (in bytes) currently used by a specific job running on a Flink cluster. Regards.

Re: Flink session window not progressing

2019-04-29 Thread Konstantin Knauf
Hi Henrik, yes, the output count of a sink (and the input count of sources) is always zero, because only Flink internal traffic is reflected in these metrics. There is a Jira issue to change this [1]. Cheers, Konstantin [1] https://issues.apache.org/jira/browse/FLINK-7286 On Mon, Apr 29, 201

Re: assignTimestampsAndWatermarks not work after KeyedStream.process

2019-04-29 Thread M Singh
Hi An0: Here is my understanding: each operator's watermark is the lowest watermark of all its input streams. When a watermark for an operator is updated, the lowest one becomes the new watermark for that operator and is forwarded to that operator's output streams. So, if one of the

Timestamp and key preservation over operators

2019-04-29 Thread Averell
Hello, I extracted timestamps using BoundedOutOfOrdernessTimestampExtractor from my sources, have a WindowFunction, and found that my timestamps have been lost. To do another window operation, I need to extract timestamps again. I tried to find documentation for that but haven't found any. Could you pl

How to verify what maxParallelism is set to?

2019-04-29 Thread Sean Bollin
Hi all, How do you verify what max parallelism is set to on the job level? I do not see it in the 1.6 UI, for example. I’m setting maxParallelism to 4096 on the StreamExecutionEnvironment before execution but printing out the maxParallelism in an operator still displays -1. Since this is such

Re: POJO with private fields and toAppendStream of StreamTableEnvironment

2019-04-29 Thread Sung Gon Yi
> On 29 Apr 2019, at 11:12 PM, Timo Walther wrote: > > Hi Sung, > > private fields are only supported if you specify getters and setters > accordingly. Otherwise you need to use `Row.class` and perform the mapping in > a subsequent map() function manually via reflection. > > Regards, > Timo

Re: POJO with private fields and toAppendStream of StreamTableEnvironment

2019-04-29 Thread Sung Gon Yi
Sorry. I sent an empty reply. I tried again with getter/setter. And it works. Thanks. — import lombok.Getter; import lombok.Setter; import java.io.Serializable; @Getter @Setter public class P implements Serializable { private String name; private Integer value; } — > On 29 Ap

Re: How to verify what maxParallelism is set to?

2019-04-29 Thread Guowei Ma
Hi, StreamExecutionEnvironment is used to set a global default maxParallelism. If an operator's maxParallelism is -1, the operator will be assigned the maxParallelism set on the StreamExecutionEnvironment. >>> Any API or way I can verify? I can't find any easy way to do that. But you could use
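
A minimal sketch of checking the environment-level default; as discussed, reading the effective per-operator value back is not straightforward:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setMaxParallelism(4096);
// prints 4096; operators left at -1 pick this default up when the job graph is generated
System.out.println(env.getMaxParallelism());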

Re: How to verify what maxParallelism is set to?

2019-04-29 Thread Sean Bollin
Thanks! Do you know if it's possible somehow to verify the global maxParallelism other than calling .getMaxParallelism? Either through an API call or the UI? On Mon, Apr 29, 2019 at 8:12 PM Guowei Ma wrote: > > Hi, > StreamExecutionEnvironment is used to set a default maxParallelism for > global

can we do Flink CEP on event stream or batch or both?

2019-04-29 Thread kant kodali
Hi All, I have the following questions. 1) Can we do Flink CEP on an event stream or a batch? 2) If we can do streaming, I wonder how long we can keep the stream stateful? I also wonder whether anyone has successfully done stateful streaming for days or months (with or without CEP)? Or is stateful stream
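
For reference, a minimal sketch of what CEP on a keyed stream looks like, assuming an existing DataStream<Event> named stream and a hypothetical Event type with userId and type fields:

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

Pattern<Event, ?> pattern = Pattern.<Event>begin("login")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event e) { return "login".equals(e.type); }
        })
        .followedBy("purchase")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event e) { return "purchase".equals(e.type); }
        })
        .within(Time.minutes(10)); // pattern state for a key is dropped after this window

PatternStream<Event> matches = CEP.pattern(stream.keyBy(e -> e.userId), pattern);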

Re: Flink heap memory

2019-04-29 Thread Konstantin Knauf
Hi Rad, heap memory can only be measured at the JVM (TaskManager/JobManager) level. If you have multiple jobs running in the same cluster, you cannot easily separate their memory footprints unless you only run TaskManagers with a single task slot, so that one TaskManager is always only execu

Re: Timestamp and key preservation over operators

2019-04-29 Thread Guowei Ma
Hi, Most operators preserve the input element's timestamp if it has one. Window is a special case: the timestamp of elements emitted by a window is the maxTimestamp of the window that was triggered. Different window types have different implementations (GlobalWindow/TimeWindow/custom windows). Keyby
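
A minimal sketch of what this means in practice: because window output carries window.maxTimestamp, a second event-time window works without re-extracting timestamps. Event (with ts/key fields and a merge method) is a hypothetical type, and stream is an assumed DataStream<Event>.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

DataStream<Event> withTs = stream.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(10)) {
            @Override
            public long extractTimestamp(Event e) { return e.ts; }
        });

withTs.keyBy(e -> e.key)
        .window(TumblingEventTimeWindows.of(Time.minutes(1)))
        .reduce((a, b) -> a.merge(b))
        // elements emitted here are timestamped with window.maxTimestamp,
        // so no second timestamp extractor is needed
        .keyBy(e -> e.key)
        .window(TumblingEventTimeWindows.of(Time.minutes(5)))
        .reduce((a, b) -> a.merge(b));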