Re: Slow watermark advances

2018-04-13 Thread Xingcan Cui
Yes, Chengzhi. That’s exactly what I mean. But you should be careful with the semantics of your pipeline. The problem cannot be gracefully solved if there’s a natural time offset between the two streams. Best, Xingcan > On 14 Apr 2018, at 4:00 AM, Chengzhi Zhao wrote: > > Hi Xingcan, > > Tha

Re: Slow watermark advances

2018-04-13 Thread Chengzhi Zhao
Hi Xingcan, Thanks for your quick response, now I understand it better. To clarify, do you mean to add a static time when I override the extractTimestamp function? For example, override def extractTimestamp(element: MyEvent, previousElementTimestamp: Long): Long = { val timestamp = elemen
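For illustration, a minimal sketch of the kind of shifted extractor being discussed, using the BoundedOutOfOrdernessTimestampExtractor mentioned elsewhere in the thread; the MyEvent type, its eventTime field, and the offset value are placeholders, not the original code:

    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
    import org.apache.flink.streaming.api.windowing.time.Time

    case class MyEvent(eventTime: Long, payload: String) // hypothetical event type

    // Shift the extracted timestamp by a static offset (positive or negative) so the
    // watermark of the lagging stream lines up with the other stream of the connected pair.
    class ShiftedExtractor(maxOutOfOrderness: Time, offsetMs: Long)
        extends BoundedOutOfOrdernessTimestampExtractor[MyEvent](maxOutOfOrderness) {
      override def extractTimestamp(element: MyEvent): Long = element.eventTime + offsetMs
    }

    // Usage: streamB.assignTimestampsAndWatermarks(new ShiftedExtractor(Time.seconds(10), 3600000L))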

Re: Is Flink able to do real time stock market analysis?

2018-04-13 Thread Fabian Hueske
Hi Ivan, You can certainly do these things with Flink. Michael pointed you in a good direction if you want to implement the logic in the DataStream API / ProcessFunctions. Flink's SQL support should also be able to handle the use case you described. The "ingredients" would be - a TUMBLE window [
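For illustration, a rough sketch of the kind of SQL query Fabian is pointing at, assuming a StreamTableEnvironment tableEnv with a registered table Trades(symbol, price, rowtime); the table name, columns and window size are placeholders:

    val result = tableEnv.sqlQuery(
      """
        |SELECT
        |  symbol,
        |  TUMBLE_END(rowtime, INTERVAL '1' MINUTE) AS windowEnd,
        |  AVG(price) AS avgPrice
        |FROM Trades
        |GROUP BY symbol, TUMBLE(rowtime, INTERVAL '1' MINUTE)
      """.stripMargin)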

Re: Access logs for a running Flink app in YARN cluster

2018-04-13 Thread Tao Xia
Thank you all for the tips. Will give a try. On Fri, Apr 13, 2018 at 12:13 PM, Gary Yao wrote: > Hi, > > I see two options: > > 1. You can login to the slave machines, which run the NodeManagers, and > access > the container logs. The path of the container logs can be configured in > yarn-site.x

Re: Access logs for a running Flink app in YARN cluster

2018-04-13 Thread Gary Yao
Hi, I see two options: 1. You can login to the slave machines, which run the NodeManagers, and access the container logs. The path of the container logs can be configured in yarn-site.xml with the key yarn.nodemanager.log-dirs. In my tests with EMR, the logs are stored at /var/log/hadoop-yarn/con

Trying to understand KafkaConsumer_records_lag_max

2018-04-13 Thread Julio Biason
Hi guys, We are in the final stages of moving our Flink pipeline from staging to production, but I just found something kinda weird: we are graphing some Flink metrics, like flink_taskmanager_job_task_operator_KafkaConsumer_records_lag_max. If I got this right, that's "kafka head offset - flink c

Re: Recovering snapshot state saved with TypeInformation generated by the implicit macro from the scala API

2018-04-13 Thread Gábor Gévay
Hello, A bit of an ugly hack, but maybe you could manually create a class named exactly io.relayr.counter.FttCounter$$anon$71$$anon$33, and copy-paste into it the code that the macro is expanded into [1]? Best, Gábor [1] https://stackoverflow.com/questions/11677609/how-do-i-print-an-expanded-ma
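For reference, a sketch of the approach the linked Stack Overflow answer describes, assuming an sbt build; the flag makes scalac print what each macro expands to, so the generated TypeInformation body can be copied into a hand-written class:

    // build.sbt
    scalacOptions += "-Ymacro-debug-lite"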

Scaling down Graphite metrics

2018-04-13 Thread ashish pok
All, We love Flink's OOTB metrics, but boy there is a ton :) Any way to scale them down (both the frequency and the metrics themselves)? Flink apps are becoming a huge source of data right now. Thanks, -- Ashish

Re: Slow watermark advances

2018-04-13 Thread Xingcan Cui
Hi Chengzhi, currently, the watermarks of the two streams of a connected stream are forcibly synchronized, i.e., the watermark is decided by the stream with a larger delay. Thus the window trigger is also affected by this mechanism. As a workaround, you could try to add (or subtract) a static

Slow watermark advances

2018-04-13 Thread Chengzhi Zhao
Hi, flink community, I had an issue with slow watermark advances and need some help here. So here is what happened: I have two streams -- A and B, and they perform a co-process to join together, and A has another stream as output. A --> Output B --> (Connect A) --> Output I used BoundedOutOfOrderne

Re: Any metrics to get the shuffled and intermediate data in flink

2018-04-13 Thread Darshan Singh
Thanks, I could see those on UI. Thanks On Fri, Apr 13, 2018 at 3:12 PM, TechnoMage wrote: > If you look at the web UI for flink it will tell you the bytes received > and sent for each stage of a job. I have not seen any similar metric for > persisted state per stage, which would be nice to ha

Re: Any metrics to get the shuffled and intermediate data in flink

2018-04-13 Thread TechnoMage
If you look at the web UI for flink it will tell you the bytes received and sent for each stage of a job. I have not seen any similar metric for persisted state per stage, which would be nice to have as well. Michael > On Apr 13, 2018, at 6:37 AM, Darshan Singh wrote: > > Hi > > Is there an

Any metrics to get the shuffled and intermediate data in flink

2018-04-13 Thread Darshan Singh
Hi Is there any useful metric in Flink which tells me that a given operator read, say, 1 GB of data and shuffled (or anything else) and wrote (to temp or anywhere else), say, 1 or 2 GB of data? One of my jobs is failing with disk space, and there are many sort, group and join operations ha

How to rebalance a table without converting to dataset

2018-04-13 Thread Darshan Singh
Hi I have a table and I want to rebalance the data so that each partition is equal. I can convert to a dataset, rebalance, and then convert back to a table, but I couldn't find any rebalance on the Table API. Does anyone have a better idea for rebalancing table data? Thanks
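For illustration, a minimal sketch of the DataSet round trip described above, assuming a BatchTableEnvironment tableEnv and an existing Table myTable (the names are placeholders):

    import org.apache.flink.api.scala._
    import org.apache.flink.table.api.Table
    import org.apache.flink.table.api.scala._
    import org.apache.flink.types.Row

    // Convert to a DataSet, rebalance its partitions, then register the result as a Table again.
    val rebalanced: Table = tableEnv.fromDataSet(
      tableEnv.toDataSet[Row](myTable).rebalance())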

Re: Flink - Kafka Connector

2018-04-13 Thread Alexandru Gutan
You will be able to use it. Kafka 1.1.0 has backwards compatibility with the v1.0, 0.11 and 0.10 connectors as far as I know. On 13 April 2018 at 15:12, Lehuede sebastien wrote: > Hi All, > > I'm very new to Flink (and to streaming applications in general) so > sorry for my newbie question. >
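For illustration, a minimal sketch of wiring up the 0.11 connector, assuming the flink-connector-kafka-0.11 dependency is on the classpath; the broker address, group id and topic name are placeholders:

    import java.util.Properties
    import org.apache.flink.api.common.serialization.SimpleStringSchema
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092") // placeholder broker
    props.setProperty("group.id", "my-group")                // placeholder group id

    val stream: DataStream[String] = env.addSource(
      new FlinkKafkaConsumer011[String]("my-topic", new SimpleStringSchema(), props))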

Flink - Kafka Connector

2018-04-13 Thread Lehuede sebastien
Hi All, I'm very new to Flink (and to streaming applications in general) so sorry for my newbie question. I plan to do some tests with Kafka and Flink and use the Kafka connector for that. I found information on this page: https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/co

Simulating Time-based and Count-based Custom Windows with ProcessFunction

2018-04-13 Thread m@xi
Hello Flinkers! Around here and there one may find posts about sliding windows in Flink. I have read that with Flink's default sliding windows, the system maintains each window separately in memory, which in my case is prohibitive. Therefore, I want to implement my own sliding windows through Proc
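For illustration, one possible shape of a hand-rolled sliding aggregate on a keyed stream: buffer elements in keyed list state, register an event-time timer per element, and on each firing evict expired elements and emit an aggregate. The Reading type, window size and sum aggregate are placeholders, not the poster's actual logic:

    import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
    import org.apache.flink.streaming.api.functions.ProcessFunction
    import org.apache.flink.util.Collector
    import scala.collection.JavaConverters._

    case class Reading(key: String, ts: Long, value: Double) // hypothetical event type

    class SlidingSum(windowSizeMs: Long) extends ProcessFunction[Reading, Double] {

      // Elements currently inside the window, kept in keyed list state.
      private lazy val buffer: ListState[Reading] =
        getRuntimeContext.getListState(
          new ListStateDescriptor[Reading]("buffer", classOf[Reading]))

      override def processElement(r: Reading,
                                  ctx: ProcessFunction[Reading, Double]#Context,
                                  out: Collector[Double]): Unit = {
        buffer.add(r)
        // Fire once the watermark passes this element's timestamp.
        ctx.timerService().registerEventTimeTimer(r.ts)
      }

      override def onTimer(timestamp: Long,
                           ctx: ProcessFunction[Reading, Double]#OnTimerContext,
                           out: Collector[Double]): Unit = {
        // Keep only elements still inside (timestamp - windowSizeMs, timestamp] and emit a sum.
        val elements = Option(buffer.get()).map(_.asScala.toList).getOrElse(Nil)
        val kept = elements.filter(_.ts > timestamp - windowSizeMs)
        buffer.clear()
        kept.foreach(buffer.add)
        out.collect(kept.map(_.value).sum)
      }
    }

    // Usage sketch: readings.keyBy(_.key).process(new SlidingSum(60 * 1000L))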

Classloading issues after changing to 1.4

2018-04-13 Thread eSKa
Hello, I still have a problem after upgrading from Flink 1.3.1 to 1.4.2. Our scenario looks like this: we have a container running on top of YARN. The machine that starts it has Flink installed and also loads some classpath libraries (e.g. Hadoop) into the container. There is a separate REST service that gets

heartbeat.timeout in 1.4 document

2018-04-13 Thread makeyang
In the code of Flink 1.4: HeartbeatManagerOptions HEARTBEAT_TIMEOUT = key("heartbeat.timeout").defaultValue(5L); but this config is not found in https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/config.html -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nab

Unable to launch job with 100 SQL queries in yarn cluster

2018-04-13 Thread Adrian Hains
Hi, We are having trouble scaling up Flink to execute a collection of SQL queries on a yarn cluster. Has anyone run this kind of workload on a cluster? Any tips on how to get past this issue? With a high number of Flink SQL queries (100 instances of the query at the bottom of this message), the Fl