I am having all the Accumulators store their data on /tmp, as this is the
default.
When running on Docker, this is mapped to my VM's "/" partition.
A lot of accumulators cause high disk usage (low free space), so the pods are evicted.
Is there a way to set the Accumulators' persistence to a different path than /tmp?
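For anyone hitting the same problem, a minimal sketch of pointing Flink's temporary files somewhere else, assuming the files in question are governed by io.tmp.dirs (the path /data/flink-tmp is just an example). On a real cluster the equivalent is an io.tmp.dirs entry in flink-conf.yaml:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.configuration.CoreOptions;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    Configuration conf = new Configuration();
    // Redirect temporary/spill files away from /tmp (assumption: these are
    // the files filling up the root partition).
    conf.setString(CoreOptions.TMP_DIRS, "/data/flink-tmp");
    StreamExecutionEnvironment env =
            StreamExecutionEnvironment.createLocalEnvironment(1, conf);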
Hi, I am trying to do some performance tests on my Flink deployment.
I am implementing an extremely simplistic use case
I built a ZMQ Source
The topology is ZMQ Source -> Mapper -> DiscardingSink (a sink that does
nothing).
Data is pushed via ZMQ at a very high rate.
When the incoming rate…
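A sketch of the topology under test (ZmqSource stands in for the custom source described above and is not shown; the String types are illustrative):

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.DiscardingSink;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.addSource(new ZmqSource())               // custom ZMQ source, emits String
       .map(new MapFunction<String, String>() {
           @Override
           public String map(String value) {     // trivial pass-through mapper
               return value;
           }
       })
       .addSink(new DiscardingSink<String>());   // ships with Flink; drops every record
    env.execute("zmq-throughput-test");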
        Table tab = batchTableEnvironment.sqlQuery(
                "select recordType_ from ParquetTable where id > 22");
        DataSet<Row> result = batchTableEnvironment.toDataSet(tab, Row.class);
        result.print();
    }
}
From: Peter Huang
Sent: Monday, November 18, 2019 7:22 PM
To: dev
Cc: user@flink.apache.org
Subject: Re: SQL for Avro GenericRecords
I have tried to persist generic Avro records in a Parquet file and then read
them back via ParquetTableSource, using SQL.
It seems that the SQL is not executed properly!
The persisted records are:
Id , type
333,Type1
22,Type2
333,Type1
22,Type2
333,Type1
22,Type2
333,Type1
22,Type2
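For context, this is roughly how the table is registered and queried. This is a sketch: the file path is illustrative, avroSchema is assumed to be the Avro Schema of the records (not shown), and the schema conversion via parquet-avro's AvroSchemaConverter is my assumption about how the Parquet schema is obtained:

    import org.apache.flink.formats.parquet.ParquetTableSource;
    import org.apache.parquet.avro.AvroSchemaConverter;

    ParquetTableSource parquetSource = ParquetTableSource.builder()
            .path("/data/records.parquet")                                    // illustrative path
            .forParquetSchema(new AvroSchemaConverter().convert(avroSchema))  // Avro -> Parquet schema
            .build();
    batchTableEnvironment.registerTableSource("ParquetTable", parquetSource);
    Table tab = batchTableEnvironment.sqlQuery(
            "select id, type from ParquetTable where id > 22");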
This seems like a controversial subject… on purpose 😊
I have my data lake in Parquet files. Should I use Flink's batch mode to run
ad-hoc queries over the historical data?
Or should I use a dedicated "database", e.g. Drill / Dremio / Hive and their
likes?
What advantage will Flink give me…
Thanks Fabian.
Is there any advantage to using broadcast state vs. just a CoMapFunction on 2
connected streams?
From: Fabian Hueske
Sent: Thursday, September 5, 2019 12:59 PM
To: Hanan Yehudai
Cc: flink-u...@apache.org
Subject: Re: Join with slow changing dimensions/ streams
Hi,
Flink…
I have a very common use case: enriching the stream with some dimension
tables.
E.g. the event stream has a SERVER_ID, and another file has the LOCATION
associated with each SERVER_ID (a dimension table in a CSV file).
In SQL I would simply join.
But when using the Flink streaming API, as far…
I'm not sure what you mean by "use a process function and not a window process
function", as the window operator takes in a ProcessWindowFunction.
From: Fabian Hueske
Sent: Monday, August 26, 2019 1:33 PM
To: Hanan Yehudai
Cc: user@flink.apache.org
Subject: Re: tumbling event time window
The WM will be the highest EVENT_TIME in my set of files.
Thanks
From: Fabian Hueske
Sent: Monday, August 26, 2019 12:38 PM
To: Hanan Yehudai
Cc: user@flink.apache.org
Subject: Re: tumbling event time window, parallel…
Hi,
The paths of the files to read are distributed across all readers…
…use a ContinuousEventTimeTrigger to make sure the window is
calculated? I got the processing to trigger multiple times, so I'm not sure
exactly how this type of trigger works.
Thanks
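For reference, this is how such a trigger is attached (a sketch; the key, the window size, the firing interval, and MyWindowFunction are illustrative). ContinuousEventTimeTrigger fires repeatedly as event time advances, which would explain the multiple firings per window:

    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.triggers.ContinuousEventTimeTrigger;

    stream.keyBy(event -> event.key)
          .window(TumblingEventTimeWindows.of(Time.hours(1)))
          // fire every 5 minutes of event time while the window is open,
          // plus a final firing when the watermark passes the window end
          .trigger(ContinuousEventTimeTrigger.of(Time.minutes(5)))
          .process(new MyWindowFunction());  // hypothetical ProcessWindowFunction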
From: Fabian Hueske
Sent: Monday, August 26, 2019 11:06 AM
To: Hanan Yehudai
Cc: user
I have an issue with tumbling windows running in parallel.
I run a Job on a set of CSV files.
When the parallelism is set to 1, I get the proper results.
When it runs in parallel, I get no output.
Is it due to the fact that the parallel streams take the MAX(watermark) from all
the parallel sources…
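For what it's worth, a downstream operator tracks the minimum watermark across all of its parallel inputs, so a parallel source instance that is assigned no files never advances its watermark and can hold the windows back. A sketch of per-record timestamp/watermark assignment (the Event type and its eventTime field are illustrative):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    DataStream<Event> withTimestamps = input.assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(5)) {
                @Override
                public long extractTimestamp(Event event) {
                    return event.eventTime; // epoch millis from the CSV record
                }
            });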
It helps! Thank you 😊
From: Aljoscha Krettek
Sent: 20 May 2019 12:45
To: Hanan Yehudai
Cc: user@flink.apache.org
Subject: Re: monitor finished files on a Continuous Reader
Hi,
I think what you're trying to achieve is not possible with the out-of-the-box
file source. The problem is that it is…
Hi,
I'm looking for a way to delete/rename files that are done loading.
I'm using env.readFile, monitoring a directory for all new files; once a file
is done with, I would like to delete it.
Is there a way to monitor the closed splits in the continuous reader? Is there
a different way to…
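For context, the setup in question looks roughly like this (a sketch; the directory path and the 10-second scan interval are illustrative). As noted in the reply above, the out-of-the-box reader does not expose a callback for completed splits:

    import org.apache.flink.api.java.io.TextInputFormat;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

    TextInputFormat format = new TextInputFormat(new Path("/data/incoming"));
    DataStream<String> lines = env.readFile(
            format,
            "/data/incoming",
            FileProcessingMode.PROCESS_CONTINUOUSLY,  // keep monitoring the directory
            10_000L);                                 // re-scan every 10 seconds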