date:20190311

Re: Expressing Flink array aggregation using Table / SQL API

2019-03-11 Thread Kurt Young

Hi Piyush, Could you try adding clientId to your aggregate function, tracking the map inside your new aggregate function, and assembling whatever result you need on emit? The SQL would look like: SELECT userId, some_aggregation(clientId, eventType, `timestamp`, dataField) FROM my_kafka_stream_t
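
The suggestion above (track a map inside the aggregate function, then assemble the result on emit) might be sketched roughly as follows. This is a plain-Java stand-in with no Flink dependency; it only mirrors the createAccumulator / accumulate / getValue shape of a Table API aggregate function. The class name, the choice of a map keyed by clientId, and the per-client event count are all assumptions made for illustration, since the thread does not say what the custom aggregation computes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a Flink Table API aggregate function; it mirrors the
// createAccumulator / accumulate / getValue shape but does not use Flink classes.
class PerClientAggregationSketch {

    // Accumulator: one entry per clientId seen for the current userId group.
    // Keying the map by clientId is an assumption, not taken from the thread.
    static class Acc {
        final Map<String, Integer> eventsPerClient = new TreeMap<>();
    }

    static Acc createAccumulator() {
        return new Acc();
    }

    // Called once per input row. Counting events per client is a placeholder
    // for whatever custom aggregation is actually needed.
    static void accumulate(Acc acc, String clientId, String eventType,
                           long timestamp, String dataField) {
        acc.eventsPerClient.merge(clientId, 1, Integer::sum);
    }

    // Called on emit: assemble an array-like result from the tracked map.
    static List<String> getValue(Acc acc) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : acc.eventsPerClient.entrySet()) {
            result.add(e.getKey() + ":" + e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Acc acc = createAccumulator();
        accumulate(acc, "clientA", "click", 1L, "x");
        accumulate(acc, "clientB", "view", 2L, "y");
        accumulate(acc, "clientA", "view", 3L, "z");
        System.out.println(getValue(acc)); // prints [clientA:2, clientB:1]
    }
}
```

In a real job the same three methods would live in a class extending Flink's aggregate function base class and be registered under the name used in the SQL (some_aggregation in the message above).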

Re: How to join stream and dimension data in Flink?

2019-03-11 Thread 徐涛

Hi Hequn, I want to implement a join between a stream and a dimension table in Flink SQL. I found a new feature named Temporal Tables, delivered in Flink 1.7, and I think it could be used to achieve the join between a stream and a dimension table, but I am not sure about that. Could anyone help me with it

Re: Flink credit based flow control

2019-03-11 Thread zhijiang

Hi Brian, Actually I also thought of adding the metrics you mentioned after contributing the credit-based flow control. They should help with performance tuning in some cases. If you want to add this metric, you could trace the related process in `ResultSubpartition`. When the backlog is increased during a

Expressing Flink array aggregation using Table / SQL API

2019-03-11 Thread Piyush Narang

Hi folks, I’m getting started with Flink and trying to figure out how to express aggregating some rows into an array to finally sink data into an AppendStreamTableSink. My data looks something like this: userId, clientId, eventType, timestamp, dataField I need to compute some custom aggregation

Re: Discrepancy between the part length file's length and the part file length during recover

2019-03-11 Thread Vishal Santoshi

This seems strange. When I pull (copyToLocal) the part file to the local FS, it has the same length as reported by the length file. The fileStatus from Hadoop seems to have a wrong length. This seems to be true for all discrepancies of this type. It might be that the block information did not g

K8s job cluster and cancel and resume from a save point ?

2019-03-11 Thread Vishal Santoshi

There are some issues I see and would like some feedback on. 1. On Cancellation With SavePoint with a Target Directory, the k8s job does not exit (it is not a deployment). I would assume that on cancellation the JVM should exit, after cleanup etc., and thus the pod should too. That does not

Re: Random forest - Flink ML

2019-03-11 Thread Flavio Pompermaier

I know there's an ongoing, promising effort to improve Flink ML in the Streamline project [1], but I don't know why it is not more widely considered/advertised. Best, Flavio [1] https://h2020-streamline-project.eu/apache-flink/ On Mon, 11 Mar 2019, 15:40 Avi Levi wrote: > Hi, > According to Til

Flink credit based flow control

2019-03-11 Thread Brian Ramprasad

Hi, I am trying to use the most recent version of Flink over a high-latency network, and I am trying to measure how long a sender may wait for credits before it can send buffers to the receiver. Does anyone know in which function/class I can measure, on the sender side, the time spent waiting to

Custom Partition/Task Placement Algorithm

2019-03-11 Thread mbilalce . dev

Hi, I am experimenting with Gelly and Graph partitioning. I wanted to perform some experiments where I can control the placement of partitions of the graph. As far as I know there is no out of the box way of implementing such a capability in Flink. So I am looking for guidance regarding the c

Re: Side Output from AsyncFunction

2019-03-11 Thread Ken Krugler

Hi Mike, 1. Depending on what you need the side output for, you can use metrics to track some things. But yes, that’s a very limited subset of all use cases. 2. As you mentioned, you could output a combo record. Using an Either
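
The "combo record" idea in point 2 could look roughly like the sketch below: the async call returns one record type that carries either a main result or an additional event, and a downstream step splits the two. The Either type here is a minimal hypothetical stand-in written for this example (it is not Flink's own class), and the element types and sample values are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal hypothetical Either type used as a combined record; a stand-in
// for illustration only, not Flink's own Either class.
final class EitherSketch<L, R> {
    private final L left;
    private final R right;
    private final boolean isLeft;

    private EitherSketch(L left, R right, boolean isLeft) {
        this.left = left;
        this.right = right;
        this.isLeft = isLeft;
    }

    static <L, R> EitherSketch<L, R> left(L value) {
        return new EitherSketch<>(value, null, true);
    }

    static <L, R> EitherSketch<L, R> right(R value) {
        return new EitherSketch<>(null, value, false);
    }

    boolean isLeft() { return isLeft; }
    L left() { return left; }
    R right() { return right; }

    // Downstream step: route left values to the "side" list and right values
    // to the main list, which is what a side output would otherwise do.
    static <L, R> void split(List<EitherSketch<L, R>> combined,
                             List<L> sideOut, List<R> mainOut) {
        for (EitherSketch<L, R> record : combined) {
            if (record.isLeft()) {
                sideOut.add(record.left());
            } else {
                mainOut.add(record.right());
            }
        }
    }

    public static void main(String[] args) {
        List<EitherSketch<String, Integer>> combined = new ArrayList<>();
        combined.add(EitherSketch.right(42));
        combined.add(EitherSketch.left("lookup failed for key 7"));
        combined.add(EitherSketch.right(43));

        List<String> sideOut = new ArrayList<>();
        List<Integer> mainOut = new ArrayList<>();
        split(combined, sideOut, mainOut);
        System.out.println(mainOut + " " + sideOut);
    }
}
```

In a streaming job the split step would typically be a function that does support side outputs, placed directly after the async operator.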

Migrating Existing TTL State to 1.8

2019-03-11 Thread Ning Shi

It's exciting to see the TTL state cleanup feature in 1.8. I have a question regarding the migration of existing TTL state to the newer version. Looking at the Pull Request [1] that introduced this feature, it seems that Flink is leveraging RocksDB's compaction filter to remove stale state. I ass

Side Output from AsyncFunction

2019-03-11 Thread Mikhail Pryakhin

Hello Flink experts! My streaming pipeline makes async IO calls via the recommended AsyncFunction. The pipeline evolves and I've encountered a requirement to side output additional events from the function. As it turned out the side output feature is only available in the following functions: Pr

Re: Flink zookeeper HA problem

2019-03-11 Thread Harris, Mark

Sometimes it's the simplest things - the 40 or so jobs we have seem to take longer to reload on cluster start up than in Flink 1.6, and it was timing out. After increasing the timeout to over 5 minutes, everything works again. From: Harris, Mark Sent: 07

Re: Set partition number of Flink DataSet

2019-03-11 Thread Ken Krugler

Hi Qi, I’m guessing you’re calling createInput() for each input file. If so, then instead you want to do something like: Job job = Job.getInstance(); for each file… FileInputFormat.addInputPath(job, new org.apache.hadoop.fs.Path(file path)); env.createI

Random forest - Flink ML

2019-03-11 Thread Avi Levi

Hi, According to Till's comment, I understand that flink-ml is going to be ditched. What will be the alternative? Looking for a "rando

Set partition number of Flink DataSet

2019-03-11 Thread qi luo

Hi, We’re trying to distribute batch input data to (N) HDFS files, partitioned by hash, using the DataSet API. What I’m doing is like: env.createInput(…) .partitionByHash(0) .setParallelism(N) .output(…) This works well for a small number of files. But when we need to distribute to
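
As a rough illustration of what partitioning by hash into N outputs means, here is a plain-Java sketch with no Flink dependency. It assigns each key to one of N buckets so that equal keys always land in the same bucket. The class and method names are hypothetical, and the modulo-of-hashCode scheme is an assumption chosen for simplicity; it is not claimed to be the hashing Flink uses internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of hash partitioning into N buckets.
class HashPartitionSketch {

    // floorMod keeps the index in [0, numPartitions) even for negative hash codes.
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Distributes keys into numPartitions lists; each list stands in for one output file.
    static List<List<String>> partitionByHash(List<String> keys, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, numPartitions)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user-1", "user-2", "user-1", "user-3");
        List<List<String>> partitions = partitionByHash(keys, 3);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```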

Re: [DISCUSS] Create a Flink ecosystem website

2019-03-11 Thread Niels Basjes

Hi, The Beam project has something in this area that is simply a page within their documentation website: https://beam.apache.org/documentation/sdks/java-thirdparty/ Niels Basjes On Fri, Mar 8, 2019 at 11:39 PM Bowen Li wrote: > > Confluent hub for Kafka is another good example of this kind. I

Re: RMQSource synchronous message ack

2019-03-11 Thread gcandal

First of all, thanks for your time and quick response. I'm not completely sure I understood your example, but is this what you mean: - Sink processes A, B, C - Checkpoint persisted with A, B, C - Notify checkpoint starts - Notify checkpoints ACKs A - Notify checkpoints ACKs B - Job crashes - Job

flink-io FileNotFoundException

2019-03-11 Thread Alexander Smirnov

Hi everybody, I am using Flink 1.4.2 and periodically my job goes down with the following exception in the logs. Relaunching the job does not help, only restarting the whole cluster does. Is there a JIRA issue for that? Will upgrading to 1.5 help? java.io.FileNotFoundException: /tmp/flink-io-20a15b29-183

Re: Joining two streams of different priorities

2019-03-11 Thread Fabian Hueske

Hi, This is not possible with Flink. Events in transport channels cannot be reordered, and functions cannot pick which input to read from. There are some upcoming changes for the unified batch-stream integration that will make it possible to choose which input to read from, but this is not there yet, AFAIK. Best,

