Hi,
Basically I am running a Flink batch job. My requirement is the following: I have 10
tables holding raw data in PostgreSQL. I want to aggregate that data by creating
a tumble window of 10 minutes, and I need to store the aggregated data into
aggregated PostgreSQL tables.
My pseudo code somewhat looks l
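A rough sketch of such a pipeline for one of the ten tables, using the Table API in batch mode (this is not the poster's pseudo code; all table names, columns, the time attribute, and connection settings below are invented placeholders):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TumbleAggregation {
    public static void main(String[] args) {
        // Batch-mode Table environment (Flink 1.11+ style)
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inBatchMode().build());
        // Placeholder DDL for one raw table and its aggregated counterpart
        tEnv.executeSql("CREATE TABLE raw_events (event_time TIMESTAMP(3), amount DOUBLE) "
                + "WITH ('connector' = 'jdbc', 'url' = 'jdbc:postgresql://localhost:5432/db', "
                + "'table-name' = 'raw_events', 'username' = 'user', 'password' = 'pw')");
        tEnv.executeSql("CREATE TABLE agg_events (window_start TIMESTAMP(3), total DOUBLE) "
                + "WITH ('connector' = 'jdbc', 'url' = 'jdbc:postgresql://localhost:5432/db', "
                + "'table-name' = 'agg_events', 'username' = 'user', 'password' = 'pw')");
        // 10-minute tumble window aggregation, written back to PostgreSQL
        tEnv.executeSql("INSERT INTO agg_events "
                + "SELECT TUMBLE_START(event_time, INTERVAL '10' MINUTE), SUM(amount) "
                + "FROM raw_events "
                + "GROUP BY TUMBLE(event_time, INTERVAL '10' MINUTE)");
    }
}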
like to use CEP API, you can use Table API
(StreamTableEnvironment) to read from HBase and call `toAppendStream`
directly afterwards to further process in the DataStream API. This also works
for bounded streams, thus you can do "batch" processing.
Regards,
Timo
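As a minimal sketch of that Table API to DataStream hand-off (the HBase table registration is only hinted at; the table name and row layout are placeholders):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class TableToDataStream {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
        // Assumes an HBase-backed table was registered beforehand, e.g.
        // tEnv.executeSql("CREATE TABLE hbase_source (...) WITH ('connector' = 'hbase-1.4', ...)");
        Table source = tEnv.from("hbase_source");
        // Bridge into the DataStream API and continue processing there
        DataStream<Row> stream = tEnv.toAppendStream(source, Row.class);
        stream.map(Row::toString).print();
        env.execute("table-to-datastream");
    }
}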
could help here.
Piotrek
[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
Mon, 28 Sep 2020 at 15:14 s_penakalap...@yahoo.com wrote:
Hi All,
Need your help in Flink Batch processing: scenario described below:
we have multiple vehicles, we get data from each vehicle at a very high speed,
1 record per minute. Thresholds can be set by the owner for each vehicle.
Say: we have 3 vehicles, threshold is set for 2 vehicles. Vehicle 1
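For the threshold check itself, a bare-bones DataStream sketch (the record shape, values, and the in-memory threshold map are invented for illustration; a real job would read both from the database):

import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ThresholdCheck {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // (vehicleId, reading): one record per vehicle per minute
        DataStream<Tuple2<String, Double>> records = env.fromElements(
                Tuple2.of("vehicle1", 80.0),
                Tuple2.of("vehicle2", 55.0),
                Tuple2.of("vehicle3", 70.0));
        // Owner-set thresholds; vehicle3 has none, so it is never flagged
        Map<String, Double> thresholds = new HashMap<>();
        thresholds.put("vehicle1", 75.0);
        thresholds.put("vehicle2", 60.0);
        records.filter(r -> thresholds.containsKey(r.f0) && r.f1 > thresholds.get(r.f0))
               .print(); // stand-in for a real alert sink
        env.execute("threshold-check");
    }
}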
On Tue, Jul 7, 2020 at 10:53 AM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:
> Hey Xiaolong,
>
> Thanks for the suggestions. Just to make sure I understand, are you saying
> to run the download and decompression in the Job Manager before executing
> the job?
>
> I think another way to e
Hey Chesnay,
Thanks for the advice, and easy enough to do it in a separate process.
Best,
Austin
On Tue, Jul 7, 2020 at 10:29 AM Chesnay Schepler wrote:
I would probably go with a separate process.
Downloading the file could work with Flink if it is already present in
some supported filesystem. Decompressing the file is supported for
selected formats (deflate, gzip, bz2, xz), but this seems to be an
undocumented feature, so I'm not sure how us
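For what it's worth, reading a compressed file directly looks like the sketch below; Flink picks the decompressor from the file extension (the path is made up):

import org.apache.flink.api.java.ExecutionEnvironment;

public class ReadCompressed {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // .deflate, .gz, .bz2 and .xz are the extensions the file input
        // formats recognize, per the note above
        env.readTextFile("file:///data/input.csv.gz")
           .first(5)
           .print(); // print() triggers execution for DataSet jobs
    }
}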
Hey all,
I need to ingest a tar file containing ~1GB of data in around 10 CSVs. The
data is fairly connected and needs some cleaning, which I'd like to do with
the Batch Table API + SQL (but have never used before). I've got a small
prototype loading the uncompressed CSVs and applying the necessar
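In case it helps anyone later, here is a sketch of the separate-process extraction step using Apache Commons Compress (paths invented); Flink can then load the plain CSVs:

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;

public class UntarCsvs {
    public static void main(String[] args) throws IOException {
        try (TarArchiveInputStream tar = new TarArchiveInputStream(
                new BufferedInputStream(new FileInputStream("/data/input.tar")))) {
            TarArchiveEntry entry;
            while ((entry = tar.getNextTarEntry()) != null) {
                if (!entry.isFile()) continue;
                File out = new File("/data/extracted", entry.getName());
                out.getParentFile().mkdirs();
                try (OutputStream os = new FileOutputStream(out)) {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = tar.read(buf)) > 0) {
                        os.write(buf, 0, n);
                    }
                }
            }
        }
    }
}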
I was curious about suggested best practices as they relate to running batch
processes on Flink. Does anyone have any good guides on good default
settings and configuration?
One question I'm really curious about is what suggestions there might be
for the relationship of memory of TaskManagers? Number of
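Not a definitive answer, but these are the knobs I would look at first, shown here programmatically for a local run (the values are placeholders, not recommendations, and the key names are from recent Flink versions):

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class BatchConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setString("taskmanager.memory.process.size", "4g"); // total TaskManager memory
        conf.setString("taskmanager.numberOfTaskSlots", "4");    // slots per TaskManager
        conf.setString("parallelism.default", "8");              // default job parallelism
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(conf);
        env.fromElements(1, 2, 3).print();
    }
}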
Hello,
I am looking for a batch processing framework that will read data in
batches from MongoDB, enrich it using another data source, and then
load it into Elasticsearch. Is Flink a good framework for such a use case?
Regards,
Gaurav
ready focusing on realizing the ideas mentioned in FLIP-1,
> wish to contribute to Flink in the coming months.
>
> Best,
>
> Zhijiang
>
------
From: Si-li Liu
Sent: Friday, February 17, 2017 11:22
To: user
Subject: Re: Flink batch processing fault tolerance
Hi,
It's the reason why I gave up using Flink for my current project and picked up
the traditional Hadoop framework again.
2017-02-17 10:56 GMT+08:00 Renjie Liu:
https://cwiki.apache.org/confluence/di
Hi Aljoscha,
Could you share your plans for resolving it?
Best,
Anton
From: Aljoscha Krettek [mailto:aljos...@apache.org]
Sent: Thursday, February 16, 2017 2:48 PM
To: user@flink.apache.org
Subject: Re: Flink batch processing fault tolerance
Hi,
yes, this is indeed true. We had some plans for how to resolve this but
they never materialised because of the focus on Stream Processing. We might
unite the two in the future and then you will get fault-tolerant
batch/stream processing in the same API.
Best,
Aljoscha
On Wed, 15 Feb 2017 at 0
Hi, all:
I'm learning Flink's docs and am curious about the fault tolerance of batch
processing jobs. It seems that when one task execution fails, the whole job
will be restarted. Is that true? If so, isn't it impractical to deploy large
Flink batch jobs?
--
Liu, Renjie
Software Engineer, MVAD
If your environment is not kerberized (or if you can afford to restart the
job every 7 days), a checkpoint-enabled Flink job with windowing and a
count trigger would be ideal for your requirement.
Check the APIs on Flink windows.
I had something like this that worked
stream.keyBy(0).countW
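Presumably the cut-off line continued along these lines (the types, sample values, and the window size of 100 are guesses, not the original code):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CountWindowExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Tuple2<String, Integer>> stream = env.fromElements(
                Tuple2.of("a", 1), Tuple2.of("a", 2), Tuple2.of("b", 3));
        stream.keyBy(0)         // key by the first tuple field
              .countWindow(100) // fires once 100 elements arrived for a key
              .sum(1)           // aggregate the second field
              .print();         // with this tiny sample the window never fills;
                                // shown for structure only
        env.execute("count-window");
    }
}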
Hi,
Flink works very well with Kafka if you wish to stream data. Following is how
I am streaming data with Kafka and Flink.
FlinkKafkaConsumer08 kafkaConsumer = new
FlinkKafkaConsumer08<>(KAFKA_AVRO_TOPIC, avroSchema, properties);
DataStream messageStream = env.addSource(kafkaConsumer);
Is th
rk and you don't need a separate batch process. A similar
architecture using Spark Streaming (for both batch and streaming) is
demonstrated by Cloudera's Oryx 2.0 project - see http://oryx.io
On Thu, Jul 21, 2016 at 12:41 PM, milind parikh wrote:
At this point in time, imo, batch processing is not why you should be
considering Flink.
That said, I predict that stream processing (and event processing) will
become the dominant methodology as we begin to gravitate towards the "I can't
wait; I want it now" phenomenon. In that
Thanks Milind & Till,
This is what I thought from my reading of the documentation but it is nice to
have it confirmed by people more knowledgeable.
Supplementary to this question is whether Flink is the best choice for batch
processing at this point in time or whether I would be better to
to enable processing of that file.
If you really needed to have provenance around processing, you could route
data processing through Nifi before Flink.
Regards
Milind
On Jul 19, 2016 9:37 PM, "Leith Mudge" wrote:
I am currently working on an architecture for a big data streaming and batch
processing platform. I am planning on using Apache Kafka for a distributed
messaging system to handle data from streaming data sources and then pass on to
Apache Flink for stream processing. I would also like to use
Flink's DataStream API also allows reading files from disk (local, hdfs,
etc.). So you don't have to set up Kafka to make this work (if you have it
already, you can of course use it).
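i.e. something like this minimal sketch (the path is made up):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReadFromDisk {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Reads the file as a bounded stream; hdfs:// paths work as well
        env.readTextFile("file:///tmp/input.txt").print();
        env.execute("read-from-disk");
    }
}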
On Mon, Apr 11, 2016 at 11:08 AM, Ufuk Celebi wrote:
On Mon, Apr 11, 2016 at 10:26 AM, Raul Kripalani wrote:
> Would appreciate the feedback of the community. Even if it's to inform that
> currently this iterative, batch, windowed approach is not possible, that's
> ok!
Hey Raul!
What you describe should work with Flink. This is actually the way to
Basically I have dumps of timeseries data (10y in ticks) for which I need to
calculate many metrics in an exploratory manner based on event time. NOTE: I
don't have the metrics beforehand, it's gonna be an exploratory and
iterative data analytics effort.
Flink doesn't seem to support windows on batch processing, so I'm thinking
about emulating batch by using the Kafka stream connector and rewinding the
data stream for every new metric.
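For the event-time windowing itself, the usual pattern in more recent Flink versions looks roughly like the sketch below (the tick shape, the monotonous-timestamps assumption, and the one-minute max-price metric are placeholders):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class EventTimeMetrics {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Hypothetical ticks: (timestampMillis, price)
        DataStream<Tuple2<Long, Double>> ticks = env.fromElements(
                Tuple2.of(1_000L, 10.0), Tuple2.of(2_000L, 10.5), Tuple2.of(61_000L, 11.0));
        ticks.assignTimestampsAndWatermarks(
                    WatermarkStrategy.<Tuple2<Long, Double>>forMonotonousTimestamps()
                            .withTimestampAssigner((tick, ts) -> tick.f0))
             .windowAll(TumblingEventTimeWindows.of(Time.minutes(1)))
             .maxBy(1) // one example metric: max price per minute
             .print();
        env.execute("event-time-metrics");
    }
}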
At the moment, the system can only deal with lost slots (nodes) if either
there are some excess slots which have not been used before, or if the failed
node is restarted. The latter is the case for YARN applications, for
example. There the application master will restart containers which have
died.
Thank you, Till!
Is the current (in-progress) implementation also considering the problem
of losing the task slots of the failed node(s), something related to
[2]?
[2] https://issues.apache.org/jira/browse/FLINK-3047
Best,
Ovidiu
> On 22 Feb 2016, at 18:13, Till Rohrmann wrote:
Hi Ovidiu,
at the moment Flink's batch fault tolerance restarts the whole job in case
of a failure. However, parts of the logic to do partial backtracking such
as intermediate result partitions and the backtracking algorithm are
already implemented or exist as a PR [1]. So we hope to complete the
Hi
In case of failure of a node, what does 'Fault tolerance for programs in
the DataSet API works by retrying failed executions' [1] mean?
- work already done by the rest of the nodes is not lost, only work of the
lost node is recomputed, and job execution will continue
or
- the entire job execution is
Hi Stephan,
thank you very much for your answer. I was happy to meet Robert in Munich last
week and he proposed that for our problem, batch processing is the way to go.
We also talked about how exactly to guarantee in this context that no data is
lost even in the case the job dies while
using an upsert command. If I understand the documentation correctly, the
DataSet API would be the natural candidate for this problem.
My first question is about the checkpointing system. Apparently (e.g. [1] and
[2]) it does not apply to batch processing. So how does Flink handle failures
during batch processing? For the use case described above, 'at least once'
semantics would suffice.
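For the upsert itself, a minimal sketch with the old flink-jdbc JDBCOutputFormat (the table, columns, and connection settings are invented):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;
import org.apache.flink.types.Row;

public class UpsertSink {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        Row row = new Row(2);
        row.setField(0, 1);    // id
        row.setField(1, 42.0); // amount
        DataSet<Row> rows = env.fromElements(row);
        // PostgreSQL upsert: insert, or update the existing row on conflict
        rows.output(JDBCOutputFormat.buildJDBCOutputFormat()
                .setDrivername("org.postgresql.Driver")
                .setDBUrl("jdbc:postgresql://localhost:5432/db")
                .setQuery("INSERT INTO totals (id, amount) VALUES (?, ?) "
                        + "ON CONFLICT (id) DO UPDATE SET amount = EXCLUDED.amount")
                .finish());
        env.execute("jdbc-upsert");
    }
}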
(2) Streaming operators and user functions are long lived. They are
> started once and live to the end of the stream, or the machine failure.
>
> Greetings,
> Stephan
Hi All,
I see that the way batch processing works in Flink is quite different from
Spark's. It's all about using the streaming engine in Flink.
I have a couple of questions:
1. Is there any support for checkpointing in batch processing too, or is that
only for streaming?
2. I want to