Re: Expressing Flink array aggregation using Table / SQL API

2019-03-11 Thread Kurt Young
Hi Piyush, Could you try adding clientId to your aggregate function, keeping track of the map inside your new aggregate function, and assembling whatever result you need on emit? The SQL will look like: SELECT userId, some_aggregation(clientId, eventType, `timestamp`, dataField) FROM my_kafka_stream_t
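
A minimal sketch of the accumulator idea described above, written in plain Java with no Flink dependency. The class name `PerClientAccumulator`, the string encoding of events, and the output format are all hypothetical; in a real job this logic would sit behind the `accumulate` and `getValue` methods of a user-defined aggregate function, which is not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical accumulator: groups one user's events by clientId. */
final class PerClientAccumulator {
    // clientId -> encoded events, in arrival order (encoding is an assumption)
    private final Map<String, List<String>> byClient = new LinkedHashMap<>();

    /** Called once per input row for the current userId group. */
    void accumulate(String clientId, String eventType, long timestamp, String dataField) {
        byClient.computeIfAbsent(clientId, k -> new ArrayList<>())
                .add(eventType + "@" + timestamp + ":" + dataField);
    }

    /** Assembles the result on emit: one entry per clientId. */
    List<String> getValue() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byClient.entrySet()) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }
}
```

The map keeps per-client state inside a single accumulator, so the query only needs to group by userId while the function decides how to assemble the array it emits.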

Re: How to join stream and dimension data in Flink?

2019-03-11 Thread 徐涛
Hi Hequn, I want to implement a join between a stream and a dimension table in Flink SQL. I found there is a new feature named Temporal Tables delivered in Flink 1.7, and I think it could be used to achieve the join between a stream and a dimension table, but I am not sure about that. Could anyone help me with it

Re: Flink credit based flow control

2019-03-11 Thread zhijiang
Hi Brian, Actually I also thought of adding the metrics you mentioned after contributing the credit-based flow control. They should help with performance tuning in some cases. If you want to add this metric, you could trace the related process in `ResultSubpartition`. When the backlog is increased during a

Expressing Flink array aggregation using Table / SQL API

2019-03-11 Thread Piyush Narang
Hi folks, I’m getting started with Flink and trying to figure out how to express aggregating some rows into an array to finally sink data into an AppendStreamTableSink. My data looks something like this: userId, clientId, eventType, timestamp, dataField I need to compute some custom aggregation

Re: Discrepancy between the part length file's length and the part file length during recover

2019-03-11 Thread Vishal Santoshi
This seems strange. When I pull (copyToLocal) the part file to the local FS, it has the same length as reported by the length file. The fileStatus from Hadoop seems to have a wrong length. This seems to be true for all discrepancies of this type. It might be that the block information did not g

K8s job cluster and cancel and resume from a save point ?

2019-03-11 Thread Vishal Santoshi
There are some issues I see and would like to get some feedback on. 1. On cancellation with savepoint with a target directory, the k8s job does not exit (it is not a deployment). I would assume that on cancellation the JVM should exit, after cleanup etc., and thus the pod should too. That does not

Re: Random forest - Flink ML

2019-03-11 Thread Flavio Pompermaier
I know there's an ongoing, promising effort to improve Flink ML in the Streamline project [1], but I don't know why it is not more widely considered or advertised. Best, Flavio [1] https://h2020-streamline-project.eu/apache-flink/ On Mon, 11 Mar 2019, 15:40 Avi Levi wrote: > Hi, > According to Til

Flink credit based flow control

2019-03-11 Thread Brian Ramprasad
Hi, I am trying to use the most recent version of Flink over a high-latency network, and I am trying to measure how long a sender may wait for credits before it can send buffers to the receiver. Does anyone know in which function/class I can measure, at the sender side, the time spent waiting to

Custom Partition/Task Placement Algorithm

2019-03-11 Thread mbilalce . dev
Hi, I am experimenting with Gelly and Graph partitioning. I wanted to perform some experiments where I can control the placement of partitions of the graph. As far as I know there is no out of the box way of implementing such a capability in Flink. So I am looking for guidance regarding the c

Re: Side Output from AsyncFunction

2019-03-11 Thread Ken Krugler
Hi Mike, 1. Depending on what you need the side output for, you can use metrics to track some things. But yes, that’s a very limited subset of all use cases. 2. As you mentioned, you could output a combo record. Using an Either

Migrating Existing TTL State to 1.8

2019-03-11 Thread Ning Shi
It's exciting to see the TTL state cleanup feature in 1.8. I have a question regarding the migration of existing TTL state to the newer version. Looking at the Pull Request [1] that introduced this feature, it seems that Flink is leveraging RocksDB's compaction filter to remove stale state. I ass

Side Output from AsyncFunction

2019-03-11 Thread Mikhail Pryakhin
Hello Flink experts! My streaming pipeline makes async IO calls via the recommended AsyncFunction. The pipeline evolves and I've encountered a requirement to side output additional events from the function. As it turned out the side output feature is only available in the following functions: Pr

Re: Flink zookeeper HA problem

2019-03-11 Thread Harris, Mark
Sometimes it's the simplest things - the 40 or so jobs we have seem to take longer to reload on cluster startup than in Flink 1.6, and it was timing out. After increasing the timeout value to over 5 minutes, everything works again. From: Harris, Mark Sent: 07

Re: Set partition number of Flink DataSet

2019-03-11 Thread Ken Krugler
Hi Qi, I’m guessing you’re calling createInput() for each input file. If so, then instead you want to do something like: Job job = Job.getInstance(); for each file… FileInputFormat.addInputPath(job, new org.apache.hadoop.fs.Path(file path)); env.createI

Random forest - Flink ML

2019-03-11 Thread Avi Levi
Hi, According to Till's comment I understand that flink-ml is going to be ditched. What will be the alternative? Looking for a "rando

Set partition number of Flink DataSet

2019-03-11 Thread qi luo
Hi, We’re trying to distribute batch input data to (N) HDFS files, partitioned by hash, using the DataSet API. What I’m doing is something like: env.createInput(…) .partitionByHash(0) .setParallelism(N) .output(…) This works well for a small number of files. But when we need to distribute to
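
To illustrate the general idea of spreading records over N outputs by key hash, here is a minimal plain-Java sketch. It is only an illustration: `HashBuckets`, `bucketOf`, and `distribute` are hypothetical names, and the modulo-of-`hashCode` rule is an assumption that does not reproduce Flink's internal hash partitioning.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: assigns keys to one of n buckets by hash code. */
final class HashBuckets {
    /** floorMod keeps the bucket index non-negative even for negative hash codes. */
    static int bucketOf(Object key, int n) {
        return Math.floorMod(key.hashCode(), n);
    }

    /** Distributes keys into n lists; equal keys always share a bucket. */
    static List<List<String>> distribute(List<String> keys, int n) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String k : keys) {
            buckets.get(bucketOf(k, n)).add(k);
        }
        return buckets;
    }
}
```

Each bucket here plays the role of one output file, so the number of buckets corresponds to N in the message above.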

Re: [DISCUSS] Create a Flink ecosystem website

2019-03-11 Thread Niels Basjes
Hi, The Beam project has something in this area that is simply a page within their documentation website: https://beam.apache.org/documentation/sdks/java-thirdparty/ Niels Basjes On Fri, Mar 8, 2019 at 11:39 PM Bowen Li wrote: > > Confluent hub for Kafka is another good example of this kind. I

Re: RMQSource synchronous message ack

2019-03-11 Thread gcandal
First of all, thanks for your time and quick response. I'm not completely sure I understood your example, but is this what you mean: - Sink processes A, B, C - Checkpoint persisted with A, B, C - Notify checkpoint starts - Notify checkpoints ACKs A - Notify checkpoints ACKs B - Job crashes - Job

flink-io FileNotFoundException

2019-03-11 Thread Alexander Smirnov
Hi everybody, I am using Flink 1.4.2 and periodically my job goes down with the following exception in the logs. Relaunching the job does not help, only restarting the whole cluster. Is there a JIRA issue for that? Will upgrading to 1.5 help? java.io.FileNotFoundException: /tmp/flink-io-20a15b29-183

Re: Joining two streams of different priorities

2019-03-11 Thread Fabian Hueske
Hi, This is not possible with Flink. Events in transport channels cannot be reordered, and functions cannot pick which input to read from. There are some upcoming changes for the unified batch-stream integration that will make it possible to choose which input to read from, but this is not there yet, AFAIK. Best,