Hello everyone,
Here is my use case: I have an event data stream which I need to key by a
certain field (tenant id), and then for each tenant's events I need to
independently perform ML clustering using FlinkML's OnlineKMeans component.
I am using Java.
I tried different approaches but none of the
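One pattern worth noting here: as far as I know, FlinkML's OnlineKMeans is built on the Table API and trains a single global model, so it does not plug directly into a keyBy(tenantId) DataStream topology. A hand-rolled sketch of the per-tenant alternative keeps each tenant's centroids in keyed state; everything below (class name, seeding, learning-rate update) is illustrative, not FlinkML's implementation:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Input: (tenantId, featureVector). Output: that tenant's current centroids.
public class PerTenantKMeans
        extends KeyedProcessFunction<String, Tuple2<String, double[]>, double[][]> {

    private final int k;
    private final double lr; // step size of the incremental update
    private transient ValueState<double[][]> centroids; // scoped per tenant

    public PerTenantKMeans(int k, double lr) {
        this.k = k;
        this.lr = lr;
    }

    @Override
    public void open(Configuration parameters) {
        centroids = getRuntimeContext().getState(
                new ValueStateDescriptor<>("centroids", double[][].class));
    }

    @Override
    public void processElement(Tuple2<String, double[]> in, Context ctx,
                               Collector<double[][]> out) throws Exception {
        double[] p = in.f1;
        double[][] c = centroids.value();
        if (c == null) { // naive seeding; real code would seed from the first k distinct points
            c = new double[k][p.length];
            for (int i = 0; i < k; i++) {
                for (int j = 0; j < p.length; j++) {
                    c[i][j] = p[j] + i * 1e-3; // small offset to break symmetry
                }
            }
        }
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < k; i++) { // find the nearest centroid
            double d = 0;
            for (int j = 0; j < p.length; j++) {
                double diff = c[i][j] - p[j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        for (int j = 0; j < p.length; j++) { // move the winner toward the point
            c[best][j] += lr * (p[j] - c[best][j]);
        }
        centroids.update(c);
        out.collect(c);
    }
}

// usage: events.keyBy(e -> e.f0).process(new PerTenantKMeans(4, 0.05));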
Hi Martijn,
Yes, that's what I meant: the throughput in the process function(s) didn't
change, so even though they were busy 100% of the time with parallelism=2,
they were processing data quickly enough.
Regards,
Alexis.
On Fri, Dec 16, 2022 at 14:20, Martijn Visser <martijnvis...@apach
Hi,
Backpressure implies that it's actually a later operator that is busy. So
in this case, that would be your process function that can't handle the
incoming load from your Kafka source.
Best regards,
Martijn
On Tue, Dec 13, 2022 at 7:46 PM Alexis Sarda-Espinosa <sarda.espin...@gmail.com> wrote:
Hello,
I have a Kafka source (the new one) in Flink 1.15 that's followed by a
process function with parallelism=2. Some days, I see long periods of
backpressure in the source. During those times, the pool-usage metrics of
all tasks stay between 0 and 1%, but the process function appears 100% busy.
The statefun docs have some nice examples of how to use Kafka and Kinesis
for ingress/egress in conjunction with a function. Is there some
documentation or example code I could reference to do the same with a GCP
Pub/Sub topic? Thanks.
Dave
Yes, since the two streams have the same type, you can union the two
streams, key the resulting stream, and then apply something like a
RichFlatMapFunction. Or you can connect the two streams (again, they'll
need to be keyed so you can use state), and apply a RichCoFlatMapFunction.
You can use whic
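For illustration, a minimal sketch of the connect variant (the Event type and its key accessor are placeholders): both inputs are keyed the same way, so flatMap1 and flatMap2 see the same keyed state for a given key.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

public class MergeBothStreams extends RichCoFlatMapFunction<Event, Event, Event> {

    private transient ValueState<Long> count; // shared by both inputs for the same key

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void flatMap1(Event e, Collector<Event> out) throws Exception {
        emit(e, out);
    }

    @Override
    public void flatMap2(Event e, Collector<Event> out) throws Exception {
        emit(e, out); // same key => same state variable as flatMap1
    }

    private void emit(Event e, Collector<Event> out) throws Exception {
        Long c = count.value();
        count.update(c == null ? 1L : c + 1);
        out.collect(e);
    }
}

// usage: both streams keyed by the same function
// one.connect(other).keyBy(Event::getKey, Event::getKey).flatMap(new MergeBothStreams());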
I've gone through the example as well as the documentation and I still
couldn't understand whether my use case requires joining.
1. What would happen if I didn't join?
2. As the 2 incoming data streams have the same type, if joining is
absolutely necessary then just a union (oneStream.union(anotherS
Let me make the example more concrete. Say O1 gets as input a data stream T1
which it splits into two using some function and produces DataStreams of
type T2 and T3, each of which is partitioned by the same key function TK.
Now after O2 processes a stream, it could sometimes send the stream to O3
For an example of a similar join implemented as a RichCoFlatMap, see [1].
For more background, the Flink docs have a tutorial [2] on how to work with
connected streams.
[1] https://github.com/apache/flink-training/tree/master/rides-and-fares
[2]
https://ci.apache.org/projects/flink/flink-docs-stab
1. yes - the same key would affect the same state variable
2. you need a join to have the same operator process both streams
Matthias
On Wed, Mar 24, 2021 at 7:29 AM vishalovercome wrote:
> stream partitioned by keyBy
> If operator O3 receives inputs from two operators and both inputs have the
> same type and value for a key, will the two streams end up in the same
> sub-task and therefore affect the same state variables keyed to that
> particular key? Do the streams themselves have to have the same type, or is
> it enough that just the keys of each of the input streams have the same
> type and value?
that each slot can contain a source task. With the config option
cluster.evenly-spread-out-slots set to true, slots will be evenly
distributed across all available TaskManagers in most cases.
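For reference, that is a single line in flink-conf.yaml:

cluster.evenly-spread-out-slots: true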
Thanks,
Zhu Zhu
Ken Krugler wrote on Fri, Aug 7, 2020 at 5:28 AM:
Hi all,
Was there any change in how sub-tasks get allocated to TMs, from Flink 1.9 to
1.10?
Specifically, for consecutively numbered sub-tasks (e.g. 0, 1, 2), did it become
more or less likely that they'd be allocated to the same Task Manager?
Asking because a workflow that ran fine in 1.9 now
Hi,
To subscribe, you have to send a mail to user-subscr...@flink.apache.org
On Wed, 15 Apr 2020 at 7:33 AM, lamber-ken wrote:
> user@flink.apache.org
Hello all!
I have a setup composed of several streaming pipelines. These have
different deployment lifecycles: I want to be able to modify and redeploy
the topology of one while the other is still up. I am thus putting them in
different jobs.
The problem is I have a co-location constraint between one subtask of each
pipeline; I'd like them to run on the same TaskSlots, much like if they
were sharing a TaskSlot, or at least have them on the same JVM.
A semi-official feature
"DataStream.getTransformation().setCoLocationGroupKey(stringKey)" [1]
exists for this, but seems to be tied to the sub-tasks actually being able
to be co-located on the same Task Slot.
The documentation mentions [2] that it might be impossible to
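For reference, a minimal sketch of how that feature is applied within a single job (the stream contents are placeholders); as discussed above, it does not help across job boundaries:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<String> a = env.fromElements("a1", "a2");
DataStream<String> b = env.fromElements("b1", "b2");

// sub-tasks of both transformations are requested to run in the same task slots
a.getTransformation().setCoLocationGroupKey("shared-group");
b.getTransformation().setCoLocationGroupKey("shared-group");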
Hi Jary,
All the Flink's mailing list information can be found here[1].
[1]: https://flink.apache.org/community.html#mailing-lists
Best,
Vino
Benchao Li wrote on Thu, Jan 2, 2020 at 4:56 PM:
Hi Jary,
You need to send an email to user-subscr...@flink.apache.org
to subscribe, not user@flink.apache.org.
Jary Zhen wrote on Thu, Jan 2, 2020 at 4:53 PM:
--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc
On Thu, Mar 29, 2018 at 8:59 AM, Juho Autio wrote:

> I built a new Flink distribution from the release-1.5 branch yesterday.
>
> The first time I tried to run a job with it, it ended up in some stalled
> state: the job didn't manage to (re)start, but what makes it worse is
> that it didn't exit as failed either.
>
> Next time I tried running the same job (but on a new EMR cluster & all
> from scratch) it just worked normally.
>
> On the problematic run, the YARN job was started and the Flink UI was
> being served, but the Flink UI kept showing status CREATED for all
> sub-tasks and nothing seemed to be happening.
>
> I found this in the JobManager log first (could be unrelated):
>
> 2018-03-28 15:26:17,449 INFO
> org.apache.flink.r
stream keyed by (A,B) are already being processed by the local task.
Reshuffling as a result of rekeying would have no benefit and would double
the network traffic. That is why I suggested subKey(B) may be a good way to
clearly indicate that the new key just sub-partitions the existing key
partition wi
Hi Elias,
sorry for the delay, this must have slipped through the cracks after Flink
Forward. I did spend some time thinking about this, and we have had the idea
for a while now to add an operation like "keyByWithoutPartitioning()" (name
not final ;-) that would allow the user to tell the system that we d
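For reference, the experimental helper that later shipped for this kind of case is DataStreamUtils.reinterpretAsKeyedStream, which tells the runtime a stream is already partitioned instead of reshuffling it. A sketch (the tuple types are placeholders, and note the documented contract: the stream must be partitioned exactly as keyBy of the given selector would partition it, so rekeying A -> (A, B) this way is the idea under discussion, not a blessed recipe):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.KeyedStream;

DataStream<Tuple2<String, String>> input = ...; // stream of (A, B) records

// stream already keyed by A
KeyedStream<Tuple2<String, String>, String> byA = input.keyBy(t -> t.f0);

// reinterpret as keyed by (A, B) without a network shuffle
KeyedStream<Tuple2<String, String>, Tuple2<String, String>> byAB =
        DataStreamUtils.reinterpretAsKeyedStream(
                byA,
                t -> Tuple2.of(t.f0, t.f1),
                Types.TUPLE(Types.STRING, Types.STRING));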
Anyone?
On Fri, Apr 21, 2017 at 10:15 PM, Elias Levy wrote:
This is something that has come up before on the list, but in a different
context. I have a need to rekey a stream but would prefer the stream to
not be repartitioned. There is no gain to repartitioning, as the new
partition key is a composite of the stream key, going from a key of A to a
key of (A, B).
Yet it seems it was not designed to address the scenario above.
https://ci.apache.org/projects/flink/flink-docs-master/concepts/concepts.html#workers-slots-resources
My question is: what is the best way to fan out a large number of sub-tasks
in parallel within a task?

public void testFanOut() throws Exception {
    env =
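A sketch of one common approach (the workload here is a stand-in): keep the source narrow and raise the downstream operator's parallelism, letting rebalance() distribute records round-robin across the fanned-out sub-tasks.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FanOutSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> input = env.fromElements("a", "b", "c"); // stand-in source

        input.rebalance()               // round-robin to all downstream sub-tasks
             .map(String::toUpperCase)  // the fanned-out work
             .setParallelism(32)        // 32 parallel sub-tasks for this operator
             .print();

        env.execute("fan-out sketch");
    }
}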
(that might be relevant to how the flag needs to be applied)
See the code snippet below:

DataStream<EndSongCleanedPq> inStream =
    env.readFile(new AvroInputFormat<>(new Path(filePath), EndSongCleanedPq.class), filePath);

Usually it gets the data from Kafka. It's an anomaly detection program that
learns from the stream itself. The reason I want to read from files is to
test different settings of the algorithm and compare them.
I don't need to replay things in the exact order (which is not possible
with parallel reads anyway) and I have written the program so it can deal
with out-of-order events.
I only need the subfolders to be processed roughly in order. It's fine to
process some stuff from 01 before everything from 00 is finished; if I get
records from all 24 subfolders at the same time, things will break though.
If I set the flag, will it try to get data from all subdirectories in
parallel, or will it go subdirectory by subdirectory?
Also, can you point me to some documentation or something where I can see
how to set the flag?
cheers Martin
On Wed, Feb
Hi!
Going through nested folders is pretty simple: there is a flag on the
FileInputFormat that makes sure those are read.
The tricky part is that all "00" files should be read before the "01"
files. If you still want parallel reads, that means you need to sync at
some point, wait for all parallel
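For reference, assuming the flag meant here is FileInputFormat's nested file enumeration, a minimal sketch (reusing the types from the snippet in this thread):

import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.avro.AvroInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;

AvroInputFormat<EndSongCleanedPq> format =
        new AvroInputFormat<>(new Path(filePath), EndSongCleanedPq.class);

// recurse into subdirectories instead of reading only the top-level files
format.setNestedFileEnumeration(true);

DataStream<EndSongCleanedPq> inStream = env.readFile(format, filePath);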
Hi,
I have a streaming machine learning job that usually runs with input from
Kafka. To tweak the models I need to run on some old data from HDFS.
Unfortunately the data on HDFS is spread out over several subfolders.
Basically I have a folder per date with one subfolder for each hour, and
within those are the a
Hey Martin,
I don't think anybody has used Google Cloud Pub/Sub with Flink yet.
There are no tutorials for implementing streaming sources and sinks, but
Flink has a few connectors that you can use as a reference.
For the sources, you basically have to extend RichSourceFunction
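For what it's worth, a skeleton of what such a source could look like; PubSubClient and its pull() are hypothetical stand-ins for whatever Google Cloud client you wrap, not an existing connector:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

public class PubSubSource extends RichSourceFunction<String> {

    private volatile boolean running = true;
    private transient PubSubClient client; // hypothetical wrapper around the Google Cloud SDK

    @Override
    public void open(Configuration parameters) {
        client = new PubSubClient("my-subscription"); // hypothetical
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            String message = client.pull(); // hypothetical blocking pull
            if (message != null) {
                // emit under the checkpoint lock so emission and state stay consistent
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(message);
                }
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    @Override
    public void close() throws Exception {
        if (client != null) {
            client.close(); // hypothetical
        }
    }
}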
Hej,
Has anyone tried to connect Flink Streaming to Google Cloud Pub/Sub and
has a code example for me?
If I have to implement my own sources and sinks, are there any good
tutorials for that?
cheers Martin