Re: Running a performance benchmark load test of Flink on new CPUs

2021-11-08 Thread Vijay Balakrishnan
that could be useful >> to run on different CPUs. >> >> Hope those help, >> Austin >> >> [1]: https://github.com/knaufk/flink-faker >> [2]: https://github.com/apache/flink-benchmarks >> >> On Fri, Nov 5, 2021 at 5:14 PM Vijay Balakrishnan >> wrote:

Re: Running a performance benchmark load test of Flink on new CPUs

2021-11-08 Thread Vijay Balakrishnan
[1]: https://github.com/knaufk/flink-faker > [2]: https://github.com/apache/flink-benchmarks > > On Fri, Nov 5, 2021 at 5:14 PM Vijay Balakrishnan > wrote: > >> Hi, >> I am a newbie to running a performance benchmark load test of Flink on >> new CPUs. >> Is there an *existing workload generator*

Running a performance benchmark load test of Flink on new CPUs

2021-11-05 Thread Vijay Balakrishnan
Hi, I am a newbie to running a performance benchmark load test of Flink on new CPUs. Is there an *existing workload generator* that I can use with Kafka and then ingest it with the Flink KafkaConnector & test the performance against various new chips on servers? Measuring CPU performance etc., vCPU usage
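Alongside flink-faker and flink-benchmarks linked in the replies above, a minimal hand-rolled generator is sketched below; the broker address, topic name, key spread, and JSON payload shape are assumptions to adapt, not part of the thread.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class BenchmarkLoadGenerator {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        long totalRecords = 10_000_000L; // tune to the target load
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (long i = 0; i < totalRecords; i++) {
                String payload = "{\"id\":" + i + ",\"ts\":" + System.currentTimeMillis() + "}";
                // spread keys so every partition (and hence every Flink subtask) gets traffic
                producer.send(new ProducerRecord<>("benchmark-topic", Long.toString(i % 64), payload));
            }
        }
    }
}
```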

ConnectionPool to DB and parallelism of operator question

2020-10-05 Thread Vijay Balakrishnan
Hi, Basic question on parallelism of operators and ConnectionPool to DB: Will this result in 82 * 300 connections to InfluxDB or just 300 connections to InfluxDB? main() { sink = createInfluxMonitoringSink(..); keyStream.addSink(sink).setParallelism(82); //will this result in 82 * 300 connections?
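The answer hinges on where the pool is created: open() of a rich sink runs once per parallel subtask, so a pool of 300 opened there with parallelism 82 yields 82 * 300 connections cluster-wide. A sketch (MyConnectionPool and its factory are hypothetical):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class InfluxMonitoringSink extends RichSinkFunction<String> {
    private transient MyConnectionPool pool; // hypothetical pool class; transient so it is never serialized

    @Override
    public void open(Configuration parameters) {
        // Runs once per parallel subtask: parallelism 82 * pool size 300 = 24,600 connections.
        pool = MyConnectionPool.create("http://influxdb:8086", 300);
    }

    @Override
    public void invoke(String value, Context context) {
        pool.write(value);
    }

    @Override
    public void close() {
        if (pool != null) {
            pool.close();
        }
    }
}
```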

Get only the 1st gz file from an s3 folder

2020-09-14 Thread Vijay Balakrishnan
Hi, Able to read *.gz files from an s3 folder. I want to *get the 1st gz file* from the s3 folder and then sort only the 1st gz file into an Ordered Map as below and get the orderedMap.*getFirstKey() as a 1st event timestamp*. I want to then *pass this 1st event timestamp to all TaskManagers along
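One way to do the listing/sorting steps outside of Flink, sketched with the AWS SDK v1; the bucket, prefix, and the timestamp parser are assumptions:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.S3ObjectSummary;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.TreeMap;
import java.util.zip.GZIPInputStream;

public class FirstGzFile {
    public static void main(String[] args) throws Exception {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // S3 lists keys in lexicographic order, so the first matching key is the "1st" file
        String firstKey = s3.listObjectsV2(new ListObjectsV2Request()
                        .withBucketName("my-bucket").withPrefix("prefix/"))
                .getObjectSummaries().stream()
                .map(S3ObjectSummary::getKey)
                .filter(k -> k.endsWith(".gz"))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no .gz files under prefix"));

        TreeMap<Long, String> ordered = new TreeMap<>(); // sorts records by event timestamp
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(s3.getObject("my-bucket", firstKey).getObjectContent())))) {
            String line;
            while ((line = reader.readLine()) != null) {
                ordered.put(extractTimestamp(line), line);
            }
        }
        System.out.println("1st event timestamp: " + ordered.firstKey());
    }

    // hypothetical: parse the event timestamp out of a record line
    private static long extractTimestamp(String line) {
        return Long.parseLong(line.split(",")[0]);
    }
}
```

The resulting timestamp could then reach all TaskManagers via, e.g., a broadcast stream or job parameters.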

Re: Struggling with reading the file from s3 as Source

2020-09-14 Thread Vijay Balakrishnan
in IntelliJ IDEA On Mon, Sep 14, 2020 at 12:13 PM Vijay Balakrishnan wrote: > Hi Robert, > Thanks for the link. > Is there a simple example I can use as a starting template for using S3 > with pom.xml ? > > I copied the flink-s3-fs-hadoop-1.11.1.jar into the plugins/s3-fs-

Re: Struggling with reading the file from s3 as Source

2020-09-14 Thread Vijay Balakrishnan
ger wrote: > Hi Vijay, > > Can you post the error you are referring to? > Did you properly set up an s3 plugin ( > https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/) ? > > On Fri, Sep 11, 2020 at 8:42 AM Vijay Balakrishnan > wrote: > >> Hi,

Struggling with reading the file from s3 as Source

2020-09-10 Thread Vijay Balakrishnan
Hi, I want to *get data from S3 and process and send to Kinesis.* 1. Get gzip files from an s3 folder(s3://bucket/prefix) 2. Sort each file 3. Do some map/processing on each record in the file 4. send to Kinesis Idea is: env.readTextFile(s3Folder) .sort(SortFunction) .map(MapFunction) .sink(Kines
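A rough shape of that pipeline, assuming the flink-s3-fs-hadoop plugin is installed under plugins/; note the DataStream API has no global sort operator, so the sort step has to happen inside a keyed/windowed operator or before ingestion (e.g., per file, as in the entry above). Stream, region, and the map logic are placeholders:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;

import java.util.Properties;

public class S3ToKinesisJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // requires the flink-s3-fs-hadoop jar in plugins/s3-fs-hadoop/
        DataStream<String> lines = env.readTextFile("s3://bucket/prefix");

        Properties producerConfig = new Properties();
        producerConfig.setProperty(AWSConfigConstants.AWS_REGION, "us-west-2");

        FlinkKinesisProducer<String> sink =
                new FlinkKinesisProducer<>(new SimpleStringSchema(), producerConfig);
        sink.setDefaultStream("output-stream");
        sink.setDefaultPartition("0"); // vary per record in production, or records pile onto one shard

        lines.map(S3ToKinesisJob::process) // per-record map/processing
             .addSink(sink);

        env.execute("s3-to-kinesis");
    }

    private static String process(String line) { return line; } // hypothetical processing
}
```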

Re: Count of records coming into the Flink App from KDS and at each step through Execution Graph

2020-07-30 Thread Vijay Balakrishnan
t 5 seconds, then new MeterView(5) should do > the trick. > If you want to have a rate-per-5-seconds, then you will need to implement > a custom meter. Note that I would generally discourage this as it will not > work properly with some metric systems which assume rates to be per-second.

Count of records in the Stream for a time window of 5s

2020-07-30 Thread Vijay Balakrishnan
Hi, Trying to get a count of records in the Stream for a time window of 5s. Always getting a count of 1?? Sent in 10 records. Expect the count to be 10 at the end. Tried to follow the advice here from Fabian Hueske- https://stackoverflow.com/questions/45606999/how-to-count-the-number-of-records-p
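The usual cause of "always 1" is keying by a field that is unique per record, so every window holds a single element. For a global per-5s count, a non-keyed windowAll works; a sketch (Event stands in for the actual record type):

```java
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.time.Time;

public class CountExample {
    public static DataStream<Long> countPer5s(DataStream<Event> events) {
        return events
                .timeWindowAll(Time.seconds(5)) // one global 5s window (runs at parallelism 1)
                .aggregate(new CountAggregate());
    }

    public static class CountAggregate implements AggregateFunction<Event, Long, Long> {
        @Override public Long createAccumulator() { return 0L; }
        @Override public Long add(Event value, Long acc) { return acc + 1; }
        @Override public Long getResult(Long acc) { return acc; }
        @Override public Long merge(Long a, Long b) { return a + b; }
    }
}
```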

Re: Count of records coming into the Flink App from KDS and at each step through Execution Graph

2020-07-27 Thread Vijay Balakrishnan
I might not know all custom event_names in advance .counter("myCounter"); Pardon my confusion here. TIA, On Mon, Jul 27, 2020 at 10:00 AM Vijay Balakrishnan wrote: > Hi David, > Thanks for your reply. > I am already using the PrometheusReporter. I am trying to figure out how
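When the metric names are only known at runtime, counters can be registered lazily the first time each event_name is seen; a sketch (Event and getName are placeholders):

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

import java.util.HashMap;
import java.util.Map;

public class EventCounter extends RichMapFunction<Event, Event> {
    private transient Map<String, Counter> counters;

    @Override
    public void open(Configuration parameters) {
        counters = new HashMap<>();
    }

    @Override
    public Event map(Event event) {
        // register one counter per event_name on first sight, since names are not known in advance
        counters.computeIfAbsent(event.getName(),
                name -> getRuntimeContext().getMetricGroup().counter(name))
                .inc();
        return event;
    }
}
```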

Re: Count of records coming into the Flink App from KDS and at each step through Execution Graph

2020-07-27 Thread Vijay Balakrishnan
Maximilian Bode [2]. > > Best, > David > > [1] > https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#reporter > [2] > https://flink.apache.org/features/2019/03/11/prometheus-monitoring.html > > On Fri, Jul 24, 2020 at 12:57 AM Vijay Balakrishnan

Re: FlinkKinesisProducer blocking ?

2020-07-23 Thread Vijay Balakrishnan
s that you are experiencing, > is that the FlinkKinesisProducer needs to flush all pending records in the > buffer before the checkpoint can complete for the sink. > That would also apply backpressure upstream. > > Gordon > > On Fri, Jul 10, 2020 at 7:02 AM Vijay Balakrishnan

Re: MaxConnections understanding on FlinkKinesisProducer via KPL

2020-07-22 Thread Vijay Balakrishnan
> FlinkKinesisProducer would need to flush pending records on checkpoints > (which ultimately also applies backpressure upstream). > > BR, > Gordon > > On Wed, Jul 22, 2020 at 5:21 AM Vijay Balakrishnan > wrote: > >> Hi, >> Trying to tune

MaxConnections understanding on FlinkKinesisProducer via KPL

2020-07-21 Thread Vijay Balakrishnan
Hi, Trying to tune the KPL and FlinkKinesisProducer for Kinesis Data stream(KDS). Getting following errors: 1. Throttling at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727) org.apache.flink.streaming.runtime.tasks.OperatorCha
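For reference, the knobs that usually matter in this thread, sketched with illustrative values; the numbers, stream name, and region are assumptions to tune, not recommendations:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;

import java.util.Properties;

public class ProducerSetup {
    static FlinkKinesisProducer<String> buildProducer() {
        Properties cfg = new Properties();
        cfg.setProperty(AWSConfigConstants.AWS_REGION, "us-west-2");
        // KPL pass-through settings (field names from the KPL's KinesisProducerConfiguration):
        cfg.setProperty("AggregationEnabled", "true");
        cfg.setProperty("MaxConnections", "24");   // HTTPS connections from this KPL instance
        cfg.setProperty("ThreadPoolSize", "10");
        cfg.setProperty("RecordTtl", "30000");     // drop records older than 30s instead of retrying forever
        cfg.setProperty("RateLimit", "100");       // % of a shard's throughput one producer may use

        FlinkKinesisProducer<String> producer =
                new FlinkKinesisProducer<>(new SimpleStringSchema(), cfg);
        producer.setDefaultStream("output-stream");
        producer.setDefaultPartition("0");
        // bound the in-flight buffer so checkpoints apply backpressure instead of growing the queue:
        producer.setQueueLimit(10_000);
        return producer;
    }
}
```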

Re: FlinkKinesisProducer blocking ?

2020-07-09 Thread Vijay Balakrishnan
nodes in my case but occupying 80 slots/vCPUs. Is my understanding correct and will this be the reason that the KPL gets flooded with too many pending requests at regular intervals ?? TIA, On Thu, Jul 9, 2020 at 12:15 PM Vijay Balakrishnan wrote: > Thanks,Gordon for your reply. > > I do

Re: FlinkKinesisProducer blocking ?

2020-07-09 Thread Vijay Balakrishnan
Thanks, Gordon, for your reply. I do not set a queueLimit and so the default unbounded queueSize is 2147483647. So, it should just be dropping records being produced from the 80 (parallelism) * 10 (ThreadPoolSize) = 800 threads based on RecordTtl. I do not want backpressure as you said it effectively

Validating my understanding of SHARD_DISCOVERY_INTERVAL_MILLIS

2020-07-09 Thread Vijay Balakrishnan
Hi, I see these 2 constants- SHARD_GETRECORDS_INTERVAL_MILLIS & SHARD_DISCOVERY_INTERVAL_MILLIS. My understanding was SHARD_GETRECORDS_INTERVAL_MILLIS defines how often records are fetched from Kinesis Data Stream(KDS). Code seems to be doing this in ShardConsumer.run()-->getRecords() SHARD_DISCO
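That reading matches the constant names; a sketch of where each interval plugs in (stream name, region, and the interval values are assumptions):

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

import java.util.Properties;

public class ConsumerSetup {
    static FlinkKinesisConsumer<String> buildConsumer() {
        Properties cfg = new Properties();
        cfg.setProperty(AWSConfigConstants.AWS_REGION, "us-west-2");
        // how often each ShardConsumer calls getRecords() on its shard:
        cfg.setProperty(ConsumerConfigConstants.SHARD_GETRECORDS_INTERVAL_MILLIS, "200");
        // how often the consumer re-lists the stream's shards to pick up resharding (new/closed shards):
        cfg.setProperty(ConsumerConfigConstants.SHARD_DISCOVERY_INTERVAL_MILLIS, "10000");
        return new FlinkKinesisConsumer<>("my-stream", new SimpleStringSchema(), cfg);
    }
}
```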

FlinkKinesisProducer blocking ?

2020-07-07 Thread Vijay Balakrishnan
Hi, current setup: Kinesis stream 1 -> Kinesis Analytics Flink -> Kinesis stream 2 | > Firehose Delivery stream Curl error: org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.LogInputStreamReader - [2020-07-02 15:22:32.203053] [0x07f4][0x7ffbced15700] [err

Re: Insufficient number of network buffers- what does Total mean on the Flink Dashboard

2020-06-12 Thread Vijay Balakrishnan
o 1/4 of that is > much closer to 29GB if we consider there are some rounding errors and > accuracy loss. > > > Thank you~ > > Xintong Song > > > > On Fri, Jun 12, 2020 at 4:33 PM Vijay Balakrishnan > wrote: > >> Thx, Xintong for a great answer. Much apprec

Re: Insufficient number of network buffers- what does Total mean on the Flink Dashboard

2020-06-12 Thread Vijay Balakrishnan
(total - Max(cutoff-min, total * cutoff-ratio)) * (1 - networkFraction) = (102GB - Max(600MB, 102GB * 0.25)) * (1 - 0.48) = 40.6GB > > Have you specified a custom "-Xmx" parameter? > > Thank you~ > > Xintong Song > > > > On Fri, Jun 12, 2020 at 7:50 AM Vij

Re: FlinkKinesisProducer sends data to only 1 shard. Is it because I don't have "AggregationEnabled" set to false ?

2020-06-05 Thread Vijay Balakrishnan
t 6:43 PM Vijay Balakrishnan wrote: > Hi, > Looks like I am sending a Map to Kinesis and it is being > sent to 1 partition only. *How can I make this distribute across multiple > partitions/shards on the Kinesis Data stream with this Map* > data ? > > *Sending t

Re: FlinkKinesisProducer sends data to only 1 shard. Is it because I don't have "AggregationEnabled" set to false ?

2020-06-04 Thread Vijay Balakrishnan
ls.EVENT_TIMESTAMP, influxDBPoint.getTimestamp()); mapObj.put(Utils.MEASUREMENT, influxDBPoint.getMeasurement()); mapObj.put(Utils.TAGS, influxDBPoint.getTags()); mapObj.put(Utils.FIELDS, influxDBPoint.getFields()); TIA, On Thu, Jun 4, 2020 at 5:35 PM Vijay Balakrishnan wrote: > Hi, > My FlinkKines

FlinkKinesisProducer sends data to only 1 shard. Is it because I don't have "AggregationEnabled" set to false ?

2020-06-04 Thread Vijay Balakrishnan
Hi, My FlinkKinesisProducer sends data to only 1 shard. Is it because I don't have "AggregationEnabled" set to false ? flink_connector_kinesis_2.11 : flink version 1.9.1 //Setup Kinesis Producer Properties kinesisProducerConfig = new Properties(); kinesisProducerConfig.setProperty
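A constant default partition key pins every record to one shard; supplying a custom KinesisPartitioner that varies the key spreads records across shards. A sketch, assuming an existing producerConfig:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
import org.apache.flink.streaming.connectors.kinesis.KinesisPartitioner;

// inside job setup, given an existing producerConfig:
FlinkKinesisProducer<String> producer =
        new FlinkKinesisProducer<>(new SimpleStringSchema(), producerConfig);
producer.setDefaultStream("output-stream");
producer.setCustomPartitioner(new KinesisPartitioner<String>() {
    @Override
    public String getPartitionId(String element) {
        // a per-record key; a constant here would pin all records to one shard
        return String.valueOf(element.hashCode());
    }
});
```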

Re: Flink Dashboard UI Tasks hard limit

2020-05-29 Thread Vijay Balakrishnan
(4gb). That means increasing the "max" > does not help in your case. It is the "fraction" that you need to increase. > > Thank you~ > > Xintong Song > > > > On Thu, May 28, 2020 at 9:30 AM Vijay Balakrishnan > wrote: >

Re: Flink Dashboard UI Tasks hard limit

2020-05-27 Thread Vijay Balakrishnan
. >>> - Use `SingleOutputStreamOperator#setParallelism()` to set >>> parallelism for a specific operator. (Only supported for subclasses of >>> `SingleOutputStreamOperator`.) >>>- When submitting your job, use `-p ` as an argument >>>

Re: Flink Dashboard UI Tasks hard limit

2020-05-22 Thread Vijay Balakrishnan
formation about your use case? > >- What kind of job are your executing? Is it a streaming or batch >processing job? >- Which Flink deployment do you use? Standalone? Yarn? >- It would be helpful if you can share the Flink logs. > > > Thank you~ > > Xintong Song

Re: Flink Dashboard UI Tasks hard limit

2020-05-20 Thread Vijay Balakrishnan
tasks equate to number of open files. I am using 15 slots per TaskManager on AWS m5.4xlarge which has 16 vCPUs. TIA. On Tue, May 19, 2020 at 3:22 PM Vijay Balakrishnan wrote: > Hi, > > Flink Dashboard UI seems to show tasks having a hard limit for Tasks > column around 18000 on a Ubun

Pre-process data before it hits the Source

2019-11-25 Thread Vijay Balakrishnan
Hi, Need to pre-process data(transform incoming data to a different format) before it hits the Source I have defined. How can I do that ? I tried to use a .map on the DataStream but that is too late as the data has already hit the Source I defined. FlinkKinesisConsumer> kinesisConsumer = getMonito
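Nothing in a Flink job runs before the source, so the earliest hook for reshaping the incoming format is the source's DeserializationSchema itself; a sketch (MyEvent and fromLegacyFormat are hypothetical):

```java
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class TransformingSchema implements DeserializationSchema<MyEvent> {
    @Override
    public MyEvent deserialize(byte[] bytes) throws IOException {
        String raw = new String(bytes, StandardCharsets.UTF_8);
        // transform the wire format here, before any downstream operator sees it
        return MyEvent.fromLegacyFormat(raw);
    }

    @Override
    public boolean isEndOfStream(MyEvent nextElement) {
        return false;
    }

    @Override
    public TypeInformation<MyEvent> getProducedType() {
        return TypeInformation.of(MyEvent.class);
    }
}
```

The schema is then handed straight to the consumer, e.g. new FlinkKinesisConsumer<>("my-stream", new TransformingSchema(), consumerConfig).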

currentWatermark for Event Time is not increasing fast enough to go past the window.maxTimestamp

2019-10-17 Thread Vijay Balakrishnan
Hi, *Event Time Window: 15s* My currentWatermark for Event Time processing is not increasing fast enough to go past the window maxTimestamp. I have reduced *bound* used for watermark calculation to just *10 ms*. I have increased the parallelInput to process input from Kinesis in parallel to 2 slots

Re: add() method of AggregateFunction not called even though new watermark is emitted

2019-10-15 Thread Vijay Balakrishnan
Object groupByValueObj = inputMap.get(groupBy); return groupByValueObj != null; });*/ //String metric = Objects.requireNonNull(inputMetricSelector).getMetric(); TIA, Vijay On Tue, Oct 15, 2019 at 9:34 AM Vijay Balakrishnan wrote: > Hi Theo, > It gets to the FilterFunction during th

Re: add() method of AggregateFunction not called even though new watermark is emitted

2019-10-15 Thread Vijay Balakrishnan
ut > all events? Or is the filter function itself not even called? (Due to your > comments suggesting it.) > > Best regards > Theo > > -- > *From: *"Vijay Balakrishnan" > *To: *"Dawid Wysakowicz" > *CC: *"user"

Re: add() method of AggregateFunction not called even though new watermark is emitted

2019-10-14 Thread Vijay Balakrishnan
e upstream operators? The watermark > for a particular operator is a minimum of watermarks received from all of > the upstream operators. Therefore, if some of them do not produce any, the > resulting watermark will not advance. > > Best, > > Dawid > On 11/10/2019 21:

add() method of AggregateFunction not called even though new watermark is emitted

2019-10-11 Thread Vijay Balakrishnan
Hi, Here is my issue with *Event Processing* with the *add() method of MGroupingWindowAggregate not being called* even though a new watermark is fired 1. *Ingest data from Kinesis (works fine)* 2. *Deserialize* in MonitoringMapKinesisSchema(*works fine* and get json back) 3. I do *assign Monitoring

Re: Calculate a 4 hour time window for sum,count with sum and count output every 5 mins

2019-08-01 Thread Vijay Balakrishnan
fic reason. > I am using it because I am computing the HyperLogLog over a window. > *--* > *-- Felipe Gutierrez* > > *-- skype: felipe.o.gutierrez* > *--* *https://felipeogutierrez.blogspot.com* > > > On Mon, Jul 1, 2019 at 12:3

Re: Converting Metrics from a Reporter to a Custom Events mapping

2019-08-01 Thread Vijay Balakrishnan
nd sends them > directly to a Kinesis Stream? > > Best, > Haibo > > At 2019-07-16 00:01:36, "Vijay Balakrishnan" > wrote: > > Hi, > I need to capture the Metrics sent from a Flink app to a Reporter and > transform them to an Events API format I have des

Converting Metrics from a Reporter to a Custom Events mapping

2019-07-15 Thread Vijay Balakrishnan
Hi, I need to capture the Metrics sent from a Flink app to a Reporter and transform them to an Events API format I have designed. I have been looking at the Reporters( https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#list-of-all-variables) and have used them but what w
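A skeleton of a custom reporter against Flink's MetricReporter and Scheduled interfaces; the Events API mapping and client are left abstract, as they are specific to the design in question:

```java
import org.apache.flink.metrics.Metric;
import org.apache.flink.metrics.MetricConfig;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.metrics.reporter.MetricReporter;
import org.apache.flink.metrics.reporter.Scheduled;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class EventsApiReporter implements MetricReporter, Scheduled {
    private final Map<String, Metric> metrics = new ConcurrentHashMap<>();

    @Override
    public void open(MetricConfig config) { /* init the Events API / Kinesis client here */ }

    @Override
    public void close() { /* release the client */ }

    @Override
    public void notifyOfAddedMetric(Metric metric, String name, MetricGroup group) {
        metrics.put(group.getMetricIdentifier(name), metric);
    }

    @Override
    public void notifyOfRemovedMetric(Metric metric, String name, MetricGroup group) {
        metrics.remove(group.getMetricIdentifier(name));
    }

    @Override
    public void report() {
        // called on the configured interval: map each metric to the Events API format and send it
        metrics.forEach((id, m) -> { /* eventsClient.send(toEvent(id, m)); */ });
    }
}
```

The reporter is then registered in flink-conf.yaml via metrics.reporter.<name>.class and metrics.reporter.<name>.interval.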

Re: Calculate a 4 hour time window for sum,count with sum and count output every 5 mins

2019-06-30 Thread Vijay Balakrishnan
i Aroch wrote: > >> Hi Vijay, >> >> When using windows, you may use the 'trigger' to set a Custom Trigger >> which would trigger your *ProcessWindowFunction* accordingly. >> >> In your case, you would probably use: >> >>> *.trigger(

Re: Calculate a 4 hour time window for sum,count with sum and count output every 5 mins

2019-06-17 Thread Vijay Balakrishnan
apKeyStateDescriptor); >globalGroupKeyState = > context.globalState().getMapState(globalMapKeyStateDescriptor); > ... > //get data fromm state > Object timedGroupStateObj = timedGroupKeyState.get(groupKey); > > //how do i push the data out every 5 mins to the sink during

Re: NotSerializableException: DropwizardHistogramWrapper inside AggregateFunction

2019-06-17 Thread Vijay Balakrishnan
1] > https://howtodoinjava.com/java/serialization/custom-serialization-readobject-writeobject/ > > Am Do., 6. Juni 2019 um 23:04 Uhr schrieb Vijay Balakrishnan < > bvija...@gmail.com>: > >> HI, >> I have a class defined : >> >> public class MGroupingW

Calculate a 4 hour time window for sum,count with sum and count output every 5 mins

2019-06-17 Thread Vijay Balakrishnan
Hi, Need to calculate a 4 hour time window for count, sum with the current calculated results being output every 5 mins. How do I do that? Currently, I calculate results for 5 sec and 5 min time windows fine on the KeyedStream. Time timeWindow = getTimeWindowFromInterval(interval); //eg: timeWindow =
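The trigger-based pattern suggested in the replies above keeps the 4-hour window but fires it early and repeatedly; a sketch using Flink's ContinuousEventTimeTrigger (the aggregate function and sink are placeholders):

```java
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ContinuousEventTimeTrigger;

// given a keyedStream and a sink:
keyedStream
        .window(TumblingEventTimeWindows.of(Time.hours(4)))
        // fire every 5 minutes of event time; each firing emits the running sum/count,
        // and the window's state is kept until the 4h window finally closes
        .trigger(ContinuousEventTimeTrigger.of(Time.minutes(5)))
        .aggregate(new SumCountAggregate())
        .addSink(sink);
```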

NotSerializableException: DropwizardHistogramWrapper inside AggregateFunction

2019-06-06 Thread Vijay Balakrishnan
Hi, I have a class defined: public class MGroupingWindowAggregate implements AggregateFunction.. { > private final Map keyHistMap = new TreeMap<>(); > } > In the constructor, I initialize it. > public MGroupingWindowAggregate() { > Histogram minHist = new Histogram(new > SlidingTimeWindowReservo
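One common way around the NotSerializableException: keep the Dropwizard histogram out of the serialized function object entirely and register it in open() of a rich window function instead; a sketch, with the class name and type parameters illustrative:

```java
import com.codahale.metrics.SlidingTimeWindowReservoir;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.dropwizard.metrics.DropwizardHistogramWrapper;
import org.apache.flink.metrics.Histogram;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.util.concurrent.TimeUnit;

public class HistogramWindowFunction
        extends ProcessWindowFunction<Long, Long, String, TimeWindow> {

    private transient Histogram minHist; // transient: never shipped with the job graph

    @Override
    public void open(Configuration parameters) {
        minHist = getRuntimeContext().getMetricGroup().histogram("minHist",
                new DropwizardHistogramWrapper(new com.codahale.metrics.Histogram(
                        new SlidingTimeWindowReservoir(5, TimeUnit.MINUTES))));
    }

    @Override
    public void process(String key, Context ctx, Iterable<Long> values, Collector<Long> out) {
        for (Long v : values) {
            minHist.update(v);
            out.collect(v);
        }
    }
}
```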

FlinkKinesisConsumer not getting data from Kinesis at a constant speed -lag of about 30-55 secs

2019-05-17 Thread Vijay Balakrishnan
Hi, In using FlinkKinesisConsumer, I am seeing a lag of about 30-55 secs in fetching data from Kinesis after it has done 1 or 2 fetches even though data is getting put in the Kinesis data stream at a high clip. I used ConsumerConfigConstants.SHARD_GETRECORDS_MAX of 1 (tried with 5000, 200 etc)

Re: Flink - Type Erasure Exception trying to use Tuple6 instead of Tuple

2019-05-09 Thread Vijay Balakrishnan
return count; } } TIA, On Wed, May 1, 2019 at 1:39 PM Vijay Balakrishnan wrote: > Hi, > Had asked this questions earlier as topic - "Flink - Type Erasure > Exception trying to use Tuple6 instead of Tuple" > > Having issues defining a generic Tupl

Re: Type Erasure - Usage of class Tuple as a type is not allowed. Use a concrete subclass (e.g. Tuple1, Tuple2, etc.

2019-05-09 Thread Vijay Balakrishnan
understand it, what do you believe to be missing? > > If, for a given job, the number/types of fields are fixed you could look > into using Row. > > On 01/05/2019 22:40, Vijay Balakrishnan wrote: > > Hi, > Had asked this questions earlier as topic - "Flink - Type Erasure &

Type Erasure - Usage of class Tuple as a type is not allowed. Use a concrete subclass (e.g. Tuple1, Tuple2, etc.

2019-05-01 Thread Vijay Balakrishnan
Hi, Had asked this questions earlier as topic - "Flink - Type Erasure Exception trying to use Tuple6 instead of Tuple" Having issues defining a generic Tuple instead of a specific Tuple1,Tuple2 etc. Exception in thread "main" org.apache.flink.api.common.functions.InvalidTypesException: Usage of cl

Re: Flink - Type Erasure Exception trying to use Tuple6 instead of Tuple

2019-05-01 Thread Vijay Balakrishnan
it. >> >> Tim >> >> >> On Fri, Apr 5, 2019 at 7:53 AM Chesnay Schepler >> wrote: >> >>> > I tried using [ keyBy(KeySelector, TypeInformation) ] >>> >>> What was the result of this approach? >>> >>> On 03

Re: Flink - Type Erasure Exception trying to use Tuple6 instead of Tuple

2019-04-11 Thread Vijay Balakrishnan
d if it doesn't apply just ignore it. >> >> Tim >> >> >> On Fri, Apr 5, 2019 at 7:53 AM Chesnay Schepler >> wrote: >> >>> > I tried using [ keyBy(KeySelector, TypeInformation) ] >>> >>> What was

Re: Timestamp Watermark Assigner bound question

2019-04-10 Thread Vijay Balakrishnan
> 2. > https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_timestamps_watermarks.html#timestamp-assigners--watermark-generators > Best, > Guowei > > > On Wed, Apr 10, 2019 at 7:41 AM, Vijay Balakrishnan wrote: > >> Hi, >> I have created a TimestampAssigner as follows.

Timestamp Watermark Assigner bound question

2019-04-09 Thread Vijay Balakrishnan
Hi, I have created a TimestampAssigner as follows. I want to use monitoring.getEventTimestamp() with Event Time processing and collect aggregated stats over time window intervals of 5 secs, 5 mins etc. Is this the right way to create the TimeWaterMarkAssigner with a bound? I want to collect t
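For a fixed bound, Flink's built-in BoundedOutOfOrdernessTimestampExtractor already implements that pattern; a sketch against the Monitoring type from the thread (the millis-based timestamp is an assumption):

```java
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

// given a DataStream<Monitoring> stream:
DataStream<Monitoring> withTs = stream.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Monitoring>(Time.milliseconds(10)) {
            @Override
            public long extractTimestamp(Monitoring m) {
                return m.getEventTimestamp(); // event time in epoch millis (assumed)
            }
        });
```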

Re: Flink - Type Erasure Exception trying to use Tuple6 instead of Tuple

2019-04-03 Thread Vijay Balakrishnan
stions/50945509/apache-flink-return-type-of-function-could-not-be-determined-automatically-due/50947554 > > Tim > > On Tue, Apr 2, 2019, 7:03 PM Vijay Balakrishnan > wrote: > >> Hi, >> I am trying to use the KeyedStream with Tuple to handle different types

Flink - Type Erasure Exception trying to use Tuple6 instead of Tuple

2019-04-02 Thread Vijay Balakrishnan
Hi, I am trying to use the KeyedStream with Tuple to handle different types of Tuples including Tuple6. Keep getting the Exception: *Exception in thread "main" org.apache.flink.api.common.functions.InvalidTypesException: Usage of class Tuple as a type is not allowed. Use a concrete subclass (e.g. Tu
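When the key arity is fixed for a given job, the erasure error can be avoided by handing keyBy an explicit TypeInformation instead of the raw Tuple class, as the replies above suggest; a sketch with a 2-field key (Event and its getters are placeholders):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.KeyedStream;

// given a DataStream<Event> input:
KeyedStream<Event, Tuple2<String, Integer>> keyed = input.keyBy(
        new KeySelector<Event, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> getKey(Event e) {
                return Tuple2.of(e.getName(), e.getId());
            }
        },
        Types.TUPLE(Types.STRING, Types.INT)); // concrete arity + field types, so no erasure
```

If the number of key fields genuinely varies across jobs, Row with an explicit RowTypeInfo is the alternative the replies point to.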

Re: Watermark not firing to push data

2018-12-18 Thread Vijay Balakrishnan
esult.CONTINUE; } } On Mon, Dec 17, 2018 at 10:00 AM Vijay Balakrishnan wrote: > Hi, > Thx for your reply and pointers on the currentLowWatermark. Looks like the > Flink UI has tab for Watermarks itself for an Operator. > > I dump 5 records into the Kinesis Data Stream and am try

Re: Watermark not firing to push data

2018-12-17 Thread Vijay Balakrishnan
er way to verify if it > is a watermark problem. > > Best, Hequn > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/monitoring/debugging_event_time.html > > > On Sat, Dec 15, 2018 at 12:59 AM Vijay Balakrishnan > wrote: > >> Hi, >> Observa

Watermark not firing to push data

2018-12-14 Thread Vijay Balakrishnan
Hi, Observations on Watermarks: Read this great article: https://data-artisans.com/blog/watermarks-in-apache-flink-made-easy * Watermark means when for any event TS, when to stop waiting for arrival of earlier events. * Watermark t means all events with Timestamp < t have already arrived. * When t

Re: Encountered the following Expired Iterator exception in getRecords using FlinkKinesisConsumer

2018-12-14 Thread Vijay Balakrishnan
> I'm suspecting that this is the issue: > https://issues.apache.org/jira/browse/FLINK-11164. > > One more thing to clarify to be sure of this: > Do you have multiple shards in the Kinesis stream, and if yes, are some of > them actually empty? > Meaning that, even though you menti

Re: Encountered the following Expired Iterator exception in getRecords using FlinkKinesisConsumer

2018-12-13 Thread Vijay Balakrishnan
records were written to the Kinesis stream at all. > 3. After a period of time, you received the “Encountered an unexpected > expired iterator” warning in the logs, and the job failed with the > misleading AmazonKinesisException? > > Cheers, > Gordon > > On 13 December 2018 at

Encountered the following Expired Iterator exception in getRecords using FlinkKinesisConsumer

2018-12-12 Thread Vijay Balakrishnan
Hi, Using FlinkKinesisConsumer in a long running Flink Streaming app consuming from a Kinesis Stream. Encountered the following Expired Iterator exception in getRecords(): org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer [] - Encountered an unexpected expired iterator The err

Re: Using FlinkKinesisConsumer through a proxy

2018-11-30 Thread Vijay Balakrishnan
> https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/section-client-configuration.html > > > On 7 November 2018 at 1:19:02 AM, Vijay Balakrishnan (bvija...@gmail.com) > wrote: > > Hi Gordon, > This still didn't work :( > > Tried a few co

Re: Never gets into ProcessWindowFunction.process()

2018-11-09 Thread Vijay Balakrishnan
v/connectors/kinesis.html#event-time-for-consumed-records > > On Fri, Nov 9, 2018 at 6:58 PM Vijay Balakrishnan > wrote: > >> Hi, >> Any help is appreciated.Dug into this. *I can see the deserialized >> output log from FlinkKinesisConsumer deserialization but it ke

Re: Never gets into ProcessWindowFunction.process()

2018-11-09 Thread Vijay Balakrishnan
Collector out) throws Exception { logger.debug("@@never gets here@@Window5SecProcessing - Entered process ");// ... } On Mon, Nov 5, 2018 at 4:10 PM Vijay Balakrishnan wrote: > Hi, > Running in IntelliJ IDE on a Mac with 4 vProcessors. > Code compiles fine. It never gets into

Re: Never gets into ProcessWindowFunction.process()

2018-11-09 Thread Vijay Balakrishnan
you would need to > provide > an executable example. The log only shows that all offered slots are > occupied > by tasks of your job. > > Best, > Gary > > On Tue, Nov 6, 2018 at 1:10 AM Vijay Balakrishnan > wrote: > >> Hi, >> Running in IntelliJ IDE on

Re: Using FlinkKinesisConsumer through a proxy

2018-11-06 Thread Vijay Balakrishnan
eers, > Gordon > > [1] https://issues.apache.org/jira/browse/FLINK-9188 > [2] https://issues.apache.org/jira/browse/FLINK-10492 > > On Thu, Oct 4, 2018, 7:57 PM Aljoscha Krettek wrote: > >> Hi, >> >> I'm looping in Gordon and Thomas, they might have some

Re: Parallelize an incoming stream into 5 streams with the same data

2018-11-06 Thread Vijay Balakrishnan
5))) >> .(); > > then, you can perform a windowAll after the TumblingEventTimeWindow to get > the final total count. > > Best, > Hequn > > > > On Fri, Nov 2, 2018 at 6:20 AM Vijay Balakrishnan > wrote: > >> Thanks,Hequn. >> If I have to do

Never gets into ProcessWindowFunction.process()

2018-11-05 Thread Vijay Balakrishnan
Hi, Running in IntelliJ IDE on a Mac with 4 vProcessors. Code compiles fine. It never gets into the Window5SecProcessing's process().I am able to get data from the Kinesis Consumer and it is deserialized properly when I debug the code. It gets into the Window5SecProcessing.open() method for initial

Re: Parallelize an incoming stream into 5 streams with the same data

2018-11-01 Thread Vijay Balakrishnan
cate all data to each task and option 3 split data > into smaller groups without duplication. > > Best, Hequn > > On Fri, Oct 26, 2018 at 7:01 AM Vijay Balakrishnan > wrote: > >> Hi, >> I need to broadcast/parallelize an incoming stream(inputStream) into 5 >> str

Parallelize an incoming stream into 5 streams with the same data

2018-10-25 Thread Vijay Balakrishnan
Hi, I need to broadcast/parallelize an incoming stream(inputStream) into 5 streams with the same data. Each stream is keyed by different keys to do various grouping operations on the set. Do I just use inputStream.keyBy(5 diff keys) and then just use the DataStream to perform windowing/grouping op
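No explicit broadcast operator is needed for this: attaching several keyBy chains to the same DataStream forwards every record to each chain. A sketch with two of the five keys (field names, aggregates, and sinks assumed):

```java
// given a DataStream<Event> inputStream:
inputStream.keyBy(e -> e.getCam())
        .timeWindow(Time.seconds(5))
        .aggregate(new CamAggregate())
        .addSink(camSink);

inputStream.keyBy(e -> e.getRegion())
        .timeWindow(Time.seconds(5))
        .aggregate(new RegionAggregate())
        .addSink(regionSink);
```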

Guava conflict when using flink kinesis consumer with grpc protobuf

2018-10-24 Thread Vijay Balakrishnan
Hi, I have a dependency on guava in grpc protobuf as follows: com.google.guava guava 26.0-jre I also use Flink Kinesis Connector in the same project: org.apache.flink flink-connector-kinesis_${scala.binary.version} ${flink.version} This Flink Kinesis connector has a dep

Re: Using FlinkKinesisConsumer through a proxy

2018-10-03 Thread Vijay Balakrishnan
the code in com.amazonaws.ClientConfiguration On Tue, Oct 2, 2018 at 3:49 PM Vijay Balakrishnan wrote: > HI, > How do I use FlinkKinesisConsumer using the Properties through a proxy ? > Getting a Connection issue through the proxy. > Works outside the proxy. > > Properties kin

Using FlinkKinesisConsumer through a proxy

2018-10-02 Thread Vijay Balakrishnan
Hi, How do I use FlinkKinesisConsumer with Properties through a proxy? Getting a connection issue through the proxy. Works outside the proxy. Properties kinesisConsumerConfig = new Properties(); kinesisConsumerConfig.setProperty(AWSConfigConstants.AWS_REGION, region); if (lo
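Per FLINK-10492 (cited in the replies above), newer connector versions map consumer properties prefixed with aws.clientconfig. onto fields of com.amazonaws.ClientConfiguration, which is one way to route through a proxy; the exact property names and version support are an assumption to verify against the connector in use:

```java
Properties kinesisConsumerConfig = new Properties();
kinesisConsumerConfig.setProperty(AWSConfigConstants.AWS_REGION, region);
// assumed: bean-mapped onto ClientConfiguration.setProxyHost / setProxyPort
kinesisConsumerConfig.setProperty("aws.clientconfig.proxyHost", "proxy.example.com");
kinesisConsumerConfig.setProperty("aws.clientconfig.proxyPort", "3128");
```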

Re: AsyncFunction used twice with Asyncdatastream.unorderedWait - 2nd function's asyncInvoke not getting called

2018-07-27 Thread Vijay Balakrishnan
ameraWithCube cameraWithCube) throws Exception { ;}});*/* Vijay On Thu, Jul 26, 2018 at 10:39 PM Vijay Balakrishnan wrote: > Hi, > > I have 2 AsyncFunctions SampleCopyAsyncFunction and > SampleSinkAsyncFunction called with AsyncDataStream.unorderedWait. The 1st > AsyncDataStream.unord

AsyncFunction used twice with Asyncdatastream.unorderedWait - 2nd function's asyncInvoke not getting called

2018-07-26 Thread Vijay Balakrishnan
Hi, I have 2 AsyncFunctions SampleCopyAsyncFunction and SampleSinkAsyncFunction called with AsyncDataStream.unorderedWait. The 1st AsyncDataStream.unorderedWait’s SampleCopyAsyncFunction .asyncInvoke gets called properly but the 2nd SampleSinkAsyncFunction.asyncInvoke never gets called(though open

Re: How to partition within same physical node in Flink

2018-06-26 Thread Vijay Balakrishnan
al nodes). > > Btw. why is it important that all records of the same cam are processed by > the same physical node? > > Fabian > > 2018-06-25 21:36 GMT+02:00 Vijay Balakrishnan : > >> I see a .slotSharingGroup for SingleOutputStreamOperator >> <https://ci.apache

Re: How to partition within same physical node in Flink

2018-06-25 Thread Vijay Balakrishnan
Mon, Jun 25, 2018 at 10:10 AM Vijay Balakrishnan wrote: > Thanks, Fabian. > Been reading your excellent book on Flink Streaming.Can't wait for more > chapters. > Attached a pic. > > [image: partition-by-cam-ts.jpg] > > I have records with seq# 1 and cam1 and cam2.

Re: How to partition within same physical node in Flink

2018-06-25 Thread Vijay Balakrishnan
s in two different operators are > sent to the same slot. > Sharing information by side-passing it (e.g., via a file on a machine or > in a static object) is an anti-pattern and should be avoided. > > Best, Fabian > > 2018-06-24 20:52 GMT+02:00 Vijay Balakrishnan : > >> Hi

How to partition within same physical node in Flink

2018-06-24 Thread Vijay Balakrishnan
Hi, Need to partition by cameraWithCube.getCam() 1st using parallelCamTasks(passed in as args). Then within each partition, need to partition again by cameraWithCube.getTs() but need to make sure each of the 2nd partition by getTS() runs on the same physical node ? How do I achieve that ? DataS

Docker Containers on YARN when using Flink on EMR

2018-06-17 Thread Vijay Balakrishnan
Hi, Trying to use Docker Containers to be launched from YARN when using Flink on EMR on Ubuntu. Can't seem to launch a Docker Container from YARN Resource Manager while starting up the ./flink-yarn-session or Submitting a Flink job ./bin/flink run ... Following the docs here: https://ci.apache.org/

Using Flink RocksDBStateBackend with EFS as an NFS mount for large image data

2018-06-02 Thread Vijay Balakrishnan
Hi, We have big image data(about 20 MB each) coming in at high frequency/volume from a video stream from many cameras. The current design thought is to store this data in the 1st step of the Flink Dataflow in EFS(NAS) and access the EFS data from the 3rd step in the dataflow(may be in a totally d

Re: Recovering from 1 of the nodes/slots of a Task Manager failing without resetting entire state during Recovery

2018-05-29 Thread Vijay Balakrishnan
me the IndividualStrategy (entire job consists of unconnected tasks) & PipelinedRegionStrategy (weakly connected component of tasks that communicate via pipelined data exchange) with an example? TIA, Vijay On Tue, May 15, 2018 at 3:16 PM Vijay Balakrishnan wrote: > Hi, > I have been goi

Re: How to sleep for 1 sec and then call keyBy for partitioning

2018-05-16 Thread Vijay Balakrishnan
2018 at 1:41 PM Jörn Franke wrote: > Just some advice - do not use sleep to simulate a heavy task. Use real > data or generated data to simulate. This sleep is garbage from a software > quality point of view. Furthermore, it is often forgotten etc. > > On 16. May 2018, at 22:32, Vi

How to sleep for 1 sec and then call keyBy for partitioning

2018-05-16 Thread Vijay Balakrishnan
Hi, Newbie question - What I am trying to do is the following: CameraWithCubeSource source sends data-containing tuples of (cameraNbr,TS). 1. Need to partition data by cameraNbr. *2. Then sleep for 1 sec to simulate a heavy process in the task.* *3. Then need to partition data by TS and finally get

Recovering from 1 of the nodes/slots of a Task Manager failing without resetting entire state during Recovery

2018-05-15 Thread Vijay Balakrishnan
Hi, I have been going through the book "Real time streaming with Apache Flink". How do I recover state for just a single node/slot in a TaskManager without having the recovery reset the application state for all the Task Managers ? They mention the following: *Reset the state of the whole applicat