Hi Kien,
From my point of view, RocksDB native metrics can be classified into the 5 parts
below, and you can select the ones you're interested in to enable. Enabling those
metrics can cause about a 10% performance regression, and this might impact
the overall performance as not all jobs are state
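For reference, a rough sketch of turning a couple of those metric groups on. The key names are the state.backend.rocksdb.metrics.* options; whether a per-job Configuration is honored depends on the deployment mode, otherwise the same keys go into flink-conf.yaml:
```
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbNativeMetrics {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Enable only the metric groups you care about; every enabled native
        // property adds a little overhead on top of the normal RocksDB work.
        conf.setString("state.backend.rocksdb.metrics.estimate-num-keys", "true");
        conf.setString("state.backend.rocksdb.metrics.estimate-live-data-size", "true");

        // getExecutionEnvironment(Configuration) exists from 1.12; on older
        // versions (or on a session cluster) put the same keys in flink-conf.yaml.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);

        env.fromElements(1, 2, 3).print();
        env.execute("rocksdb-metrics-sample");
    }
}
```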
my code is:
https://paste.ubuntu.com/p/gVGrj2V7ZF/
it complains
A group window expects a time attribute for grouping in a stream environment.
but the data already has a time attribute.
How can I fix it?
Thanks for your help.
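In case it helps others reading the thread: a group window only accepts a column that Flink knows is a time attribute, and a plain TIMESTAMP column is not enough. A minimal sketch of declaring an event-time attribute with a WATERMARK so it can be used in a group window (table, columns, and the datagen source are made up, not taken from the pasted code):
```
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class GroupWindowTimeAttribute {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // The WATERMARK clause is what turns `ts` into an event-time attribute,
        // which TUMBLE/HOP/SESSION group windows require.
        tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  product STRING," +
                "  amount INT," +
                "  ts TIMESTAMP(3)," +
                "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
                ") WITH ('connector' = 'datagen')");

        tEnv.executeSql(
                "SELECT product, SUM(amount) AS total, TUMBLE_END(ts, INTERVAL '1' MINUTE) AS w_end " +
                "FROM orders " +
                "GROUP BY product, TUMBLE(ts, INTERVAL '1' MINUTE)").print();
    }
}
```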
The complete code is:
https://paste.ubuntu.com/p/hpWB87kT6P/
The result is:
2> (true,1,diaper,4)
7> (true,3,rubber,2)
4> (true,1,beer,3)
7> (false,3,rubber,2)
7> (true,3,rubber,8)
What is the meaning of true/false in the result
after running the above code?
Thanks for your help~!
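For context, the boolean is the retract flag of a retract stream: true rows are additions (or the new version of an updated result), false rows retract a previously emitted result. A small sketch of where it comes from (made-up data, Java):
```
import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class RetractStreamExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        DataStream<Tuple2<String, Integer>> orders = env.fromElements(
                Tuple2.of("beer", 3), Tuple2.of("diaper", 4), Tuple2.of("beer", 1));
        tEnv.createTemporaryView("orders", orders, $("product"), $("amount"));

        Table totals = tEnv.sqlQuery(
                "SELECT product, SUM(amount) AS total FROM orders GROUP BY product");

        // toRetractStream wraps every change as (flag, row):
        //   (true,  row) -> the row is added (or is the new version of an updated result)
        //   (false, row) -> the previously emitted row is retracted
        tEnv.toRetractStream(totals, Row.class).print();

        env.execute();
    }
}
```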
Just a data point: we actually enabled all RocksDB metrics by default
(including for very large jobs in terms of parallelism and state size). We
didn't see any significant performance impact. There is probably a small
impact; at least, it didn't jump out for our workload.
On Tue, Dec 8, 2020 at 9:00 A
https://issues.apache.org/jira/browse/FLINK-20533
There is no workaround in the current Flink releases, but you could
compile the reporter based on the PR that I opened.
On 12/8/2020 10:38 PM, Fanbin Bu wrote:
thank you Chesnay. I did verify that count works with Datadog.
Please link here t
Great, thank you very much :)
After a bit more playing around with it today, I figured out that what I needed
to call was:
statementSet.execute().getJobClient().get().getJobExecutionResult(getClass().getClassLoader()).get()
The fact that getJobExecutionResult required a classloader is what threw me
off. Since I’m using an a
Thank you Chesnay. I did verify that count works with Datadog. Please
link here the ticket once you create it. Meanwhile, is there any workaround
for now?
Fanbin
On Tue, Dec 8, 2020 at 2:56 AM Chesnay Schepler wrote:
> It appears that the datadog reporter does not report histograms. I'll file
I need to elaborate on my use case. I would like the SQL API to do
aggregation for me in an SQL TUMBLING window.
But I want the next window to perform business logic on all the
records just aggregated in a DataStream ProcessWindowFunction.
This would be a mix of SQL and DataStream API.
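A rough sketch of that mix (all names and the datagen source are made up): the TUMBLE aggregation runs in SQL, its append-only result is converted back to a DataStream, and a (non-keyed) ProcessAllWindowFunction then sees all rows of the next window:
```
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
import org.apache.flink.util.Collector;

public class SqlThenDataStreamWindow {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Made-up source so the sketch is self-contained; any table with a
        // processing-time attribute would do.
        tEnv.executeSql(
                "CREATE TABLE sensor (" +
                "  sensor_id INT," +
                "  reading DOUBLE," +
                "  pt AS PROCTIME()" +
                ") WITH ('connector' = 'datagen')");

        // Step 1: SQL does the per-window aggregation.
        Table aggregated = tEnv.sqlQuery(
                "SELECT sensor_id, SUM(reading) AS total " +
                "FROM sensor " +
                "GROUP BY sensor_id, TUMBLE(pt, INTERVAL '15' MINUTE)");

        // Group-window results are final, so an append-only stream is enough here.
        DataStream<Row> rows = tEnv.toAppendStream(aggregated, Row.class);

        // Step 2: collect everything the SQL window emitted and run arbitrary
        // business logic over it in one window function.
        rows.windowAll(TumblingProcessingTimeWindows.of(Time.minutes(15)))
            .process(new ProcessAllWindowFunction<Row, String, TimeWindow>() {
                @Override
                public void process(Context ctx, Iterable<Row> elements, Collector<String> out) {
                    int count = 0;
                    for (Row ignored : elements) {
                        count++; // business logic over all aggregated rows goes here
                    }
                    out.collect("window " + ctx.window() + " had " + count + " aggregated rows");
                }
            })
            .print();

        env.execute();
    }
}
```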
On Tue, De
GIVEN two windows (ProcessWindowFunction), window A, and window B,
AND window A is a tumbling processing time window of 15 minutes
AND 20 records entered window A, which performs its business logic.
How can I ensure that window B will process exactly all the records
that left window A within the
Hi,
This exception looks like it was thrown by a downstream Task/TaskManager
when trying to read a message/packet from some upstream Task/TaskManager
and the connection between the two TaskManagers was reset (closed abruptly).
So it's the case:
> involves communicating with other non-collocated tas
I believe it was solved in 1.11 by FLINK-15911 [1]
I tried setting taskmanager.rpc.port to 1 for 1.12 and got
tcp6       0      0 :::1                    :::*                    LISTEN      13768/java
[1]
https://issues.apache.org/jira/browse/FLINK-15911
Regards,
Roman
On Tue, Dec 8, 2020
Hello,
I'd like to better understand the delete behavior of AggregateFunctions. Let's
assume there's an aggregate of `user_id` to a set of `group_ids` for groups
belonging to that user.
`user_id_1 -> [group_id_1, group_id_2, etc.]`
Now let's assume sometime later that deletes arrive for all rows which
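For concreteness, this is roughly the shape of such an aggregate as a user-defined Table API AggregateFunction; on a retracting input Flink calls retract() instead of accumulate(), so the set has to support removal (class and field names are made up, and this is only a sketch):
```
import java.util.HashSet;
import java.util.Set;
import org.apache.flink.table.functions.AggregateFunction;

/** Collects the group_ids seen for a user into a set (illustrative sketch). */
public class GroupIdSet extends AggregateFunction<String, GroupIdSet.Acc> {

    /** Accumulator; depending on the Flink version the Set field may need type hints. */
    public static class Acc {
        public Set<String> groupIds = new HashSet<>();
    }

    @Override
    public Acc createAccumulator() {
        return new Acc();
    }

    // Called for inserts / the "after" side of updates.
    public void accumulate(Acc acc, String groupId) {
        acc.groupIds.add(groupId);
    }

    // Called for deletes / the "before" side of updates on a retracting input.
    public void retract(Acc acc, String groupId) {
        acc.groupIds.remove(groupId);
    }

    @Override
    public String getValue(Acc acc) {
        return acc.groupIds.toString();
    }
}
```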
I set up the following lookup cache values:
'lookup.cache.max-rows' = '20'
'lookup.cache.ttl' = '1min'
for a jdbc connector.
This table currently only has about 2 records in it. However,
since I set the TTL to 1 minute, I expected the job to query that
table every minute.
The documentat
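A sketch of a JDBC table with those cache options (connection details made up). Note that the cache is lazy: an expired entry is re-read from the database the next time it is looked up, the connector does not poll the table on a fixed schedule:
```
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class JdbcLookupCacheSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Cached rows are evicted after 1 minute (or once 20 rows are exceeded);
        // the database is queried again only when a lookup misses the cache.
        tEnv.executeSql(
                "CREATE TABLE dim_table (" +
                "  id BIGINT," +
                "  name STRING" +
                ") WITH (" +
                "  'connector' = 'jdbc'," +
                "  'url' = 'jdbc:postgresql://localhost:5432/mydb'," +
                "  'table-name' = 'dim_table'," +
                "  'lookup.cache.max-rows' = '20'," +
                "  'lookup.cache.ttl' = '1min'" +
                ")");
    }
}
```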
I'm trying to figure out a way to make the Flink JobManager (in HA) connect to
ZooKeeper over SSL/TLS. It doesn't seem like there are native properties,
like Kafka has, that support this interaction yet. Is this true, or is there
some way I can go about doing this?
Hello, Piotr.
Thank you.
This is an error logged to the taskmanager just before it became "lost" to
the jobmanager (i.e., reported as "lost" in the jobmanager log just before
the job restart). In what context would this particular error (not the
root-root cause you referred to) be thrown from a t
I've noticed that the jobmanager ports all listen on all interfaces by default, as
well as the data port on the taskmanager.
The only exception is the taskmanager RPC port,
```
bash-4.2$ netstat -lpn | grep 612
tcp        0      0 172.20.54.176:6121      0.0.0.0:*               LISTEN      54/java
tcp
Hi Kien,
I am pulling in Yun who might know better.
Regards,
Roman
On Sun, Dec 6, 2020 at 3:52 AM Truong Duc Kien
wrote:
> Hi all,
>
> We are thinking about enabling RocksDB metrics to better monitor our
> pipeline. However, since they will have performance impact, we will have to
> be select
Thank you for the clarification.
On Tue, Dec 8, 2020 at 8:14 AM Khachatryan Roman
wrote:
>
> Hi Marco,
>
> Yes, if TTL is not configured then the state will never expire (will stay
> forever until deleted explicitly).
>
> Regards,
> Roman
>
>
> On Tue, Dec 8, 2020 at 5:09 PM Marco Villalobos
>
Thank you very much!
On Tue, Dec 8, 2020 at 8:26 AM Khachatryan Roman
wrote:
>
> Hi Marco,
>
> You can find the list of the supported time units in TimeUtils javadoc [1]:
> DAYS: "d", "day"
> HOURS: "h", "hour"
> MINUTES: "min", "minute"
> SECONDS: "s", "sec", "second"
> MILLISECONDS: "ms", "mill
Hi Marco,
You can find the list of the supported time units in TimeUtils javadoc [1]:
DAYS: "d", "day"
HOURS: "h", "hour"
MINUTES: "min", "minute"
SECONDS: "s", "sec", "second"
MILLISECONDS: "ms", "milli", "millisecond"
MICROSECONDS: "µs", "micro", "microsecond"
NANOSECONDS: "ns", "nano", "nanosec
Hi Marco,
Yes, if TTL is not configured then the state will never expire (will stay
forever until deleted explicitly).
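If expiration is wanted at some point, a minimal sketch of opting a state descriptor into TTL (names made up):
```
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

public class TtlExample {
    public static void main(String[] args) {
        StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.days(7))                                   // entries expire 7 days after...
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)  // ...the last write
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .build();

        ValueStateDescriptor<Long> descriptor =
                new ValueStateDescriptor<>("last-value", Long.class);
        descriptor.enableTimeToLive(ttlConfig); // without this call the state is kept forever
    }
}
```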
Regards,
Roman
On Tue, Dec 8, 2020 at 5:09 PM Marco Villalobos
wrote:
> After reading
>
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/state/stat
Thanks, Randal,
Yes, I think the only way is to partition the stream the same way as
kinesis does (as I wrote before).
Regards,
Roman
On Tue, Dec 8, 2020 at 1:38 PM Randal Pitt wrote:
> Hi Roman,
>
> We're using a custom watermarker that uses a histogram to calculate a "best
> fit" event time
After reading
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/state/state.html
It is unclear to me how long keyed state will exist if it has no TTL.
Is it cached forever, unless explicitly cleared or overwritten?
Can somebody please explain this to me?
Thank you.
Hi Narasimha,
I investigated your problem and it is caused by multiple issues. First, vvp in
general cannot really handle multi-job submissions per jar because the complete
deployment lifecycle in vvp is scoped around a single Flink job id.
Therefore vvp sets a generated Flink job id during submiss
Scenario:
a Kafka stream enriched with tables in PostgreSQL.
Let's pretend that Postgres has organizations, departments, and
persons tables, and we want to join the full name onto the Kafka stream
that has the person id. I also want to determine if the person id is
missing.
This requires a left
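A rough sketch of that kind of enrichment query (all names and connection details are made up; it assumes the JDBC table is used as a lookup/temporal table and the Kafka table has a processing-time attribute). The LEFT JOIN keeps events whose person id has no match, and those come back with NULL name columns:
```
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class EnrichmentLeftJoinSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Kafka stream with a processing-time attribute used for the lookup join.
        tEnv.executeSql(
                "CREATE TABLE events (" +
                "  event_id STRING, person_id BIGINT," +
                "  proc_time AS PROCTIME()" +
                ") WITH ('connector' = 'kafka', 'topic' = 'events', " +
                "        'properties.bootstrap.servers' = 'localhost:9092', 'format' = 'json')");

        // Postgres dimension table used as a JDBC lookup table.
        tEnv.executeSql(
                "CREATE TABLE persons (" +
                "  id BIGINT, full_name STRING" +
                ") WITH ('connector' = 'jdbc', " +
                "        'url' = 'jdbc:postgresql://localhost:5432/shop', 'table-name' = 'persons')");

        // LEFT JOIN keeps events with no matching person; person_missing flags them.
        tEnv.sqlQuery(
                "SELECT e.event_id, p.full_name, " +
                "       CASE WHEN p.id IS NULL THEN TRUE ELSE FALSE END AS person_missing " +
                "FROM events AS e " +
                "LEFT JOIN persons FOR SYSTEM_TIME AS OF e.proc_time AS p " +
                "ON e.person_id = p.id")
            .execute()
            .print();
    }
}
```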
In
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/jdbc.html
there are no allowable dimensions specified for the lookup.cache.ttl.
Can somebody please provide a list of valid values and their meaning? I
know 's' for seconds is supported. How do I specify minutes?
Hi Guowei,
1. Unfortunately the UDF and the job are not in the same fatjar. Essentially
there is only one "fatjar" containing the Flink environment + the job; the UDF
is separate.
2. Yes, that is correct.
3. As explained in 1. I don't submit job jars to the Flink environment,
instea
Hi Roman,
We're using a custom watermarker that uses a histogram to calculate a "best
fit" event time, as the data we receive can be very out of order.
As you can see we're using the timestamp from the first event in the batch,
so we're essentially sampling the timestamps rather than using them all.
Hi Randal,
Can you share the code for the 1st approach
(FlinkKinesisConsumer.setPeriodicWatermarkAssigner)?
I think the 2nd approach (flatMap) can be improved by partitioning the
stream the same way kinesis does (i.e. same partition key).
Regards,
Roman
On Mon, Dec 7, 2020 at 2:44 PM Randal Pi
Hi Deep,
It seems that the TypeInformation array in your code has 2 elements, but we
only need one here. This approach treats the entire csv file as a Row which has
only one column, so there should be only one `BasicTypeInfo.STRING_TYPE_INFO`
in the array. And if you use the TextInputFormat i
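A sketch of what that can look like end to end (path and names are placeholders, and it assumes the two-argument RowCsvInputFormat constructor):
```
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.io.RowCsvInputFormat;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.types.Row;

public class SingleColumnCsvExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Exactly one STRING entry: every parsed Row has a single column.
        TypeInformation<?>[] fieldTypes =
                new TypeInformation<?>[] {BasicTypeInfo.STRING_TYPE_INFO};

        RowCsvInputFormat format =
                new RowCsvInputFormat(new Path("/tmp/input.csv"), fieldTypes);

        env.createInput(format, new RowTypeInfo(fieldTypes))
           .map((Row row) -> (String) row.getField(0))
           .returns(BasicTypeInfo.STRING_TYPE_INFO)
           .print();

        env.execute();
    }
}
```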
Hi,
As far as I know, TableAggregateFunction is not supported yet in batch
mode[1]. You can try to use it in stream mode.
[1] https://issues.apache.org/jira/browse/FLINK-10978
Best,
Xingbo
On Tue, Dec 8, 2020 at 6:05 PM, Leonard Xu wrote:
> Hi, appleyuchi
>
> Sorry for the late reply,
> but could you descr
It appears that the datadog reporter does not report histograms. I'll
file an issue to fix that.
On 12/8/2020 4:42 AM, Fanbin Bu wrote:
Hi,
I followed [1] to define my own metric as:
val dropwizardHistogram = new com.codahale.metrics.Histogram(new
SlidingWindowReservoir(500))
histogram = ge
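For completeness, the registration pattern from that doc looks roughly like this (shown in Java; the function and metric name are made up), using the DropwizardHistogramWrapper from flink-metrics-dropwizard:
```
import com.codahale.metrics.SlidingWindowReservoir;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.dropwizard.metrics.DropwizardHistogramWrapper;
import org.apache.flink.metrics.Histogram;

public class LatencyHistogramMapper extends RichMapFunction<Long, Long> {

    private transient Histogram histogram;

    @Override
    public void open(Configuration parameters) {
        com.codahale.metrics.Histogram dropwizardHistogram =
                new com.codahale.metrics.Histogram(new SlidingWindowReservoir(500));
        // Registers the wrapped Dropwizard histogram under the operator's metric group.
        histogram = getRuntimeContext()
                .getMetricGroup()
                .histogram("myHistogram", new DropwizardHistogramWrapper(dropwizardHistogram));
    }

    @Override
    public Long map(Long value) {
        histogram.update(value);
        return value;
    }
}
```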
Awesome, thanks!
On Tue, Dec 8, 2020 at 11:55 AM Xingbo Huang wrote:
> Hi,
>
> This problem has been fixed [1] in 1.12.0, 1.10.3, and 1.11.3, but release-1.11.3
> and release-1.12.0 have not been released yet (the VOTE has passed). I ran
> your job in release-1.12, and the plan is correct.
>
>
> [1] h
Hi, appleyuchi
Sorry for the late reply,
but could you describe your problem more or post your exception stack? The doc
you posted contains a section on defining and registering functions.
And I suggest you post directly in the email your entire code that can reproduce
the problem, thus the com
Hi,
This problem has been fixed [1] in 1.12.0, 1.10.3, and 1.11.3, but release-1.11.3
and release-1.12.0 have not been released yet (the VOTE has passed). I ran
your job in release-1.12, and the plan is correct.
[1] https://issues.apache.org/jira/browse/FLINK-19675
Best,
Xingbo
László Ciople wrote on 2020
Hello,
I am trying to use Flink v1.11.2 with Python and the Table API to read and
write back messages to Kafka topics. I am trying to filter messages based
on the output of a UDF which returns a boolean. It seems that Flink ignores
the WHERE clause in my queries and every input message is received
Hi Kye,
This error is almost certainly not the primary cause of the failure. This
error means that the node reporting it has detected some fatal failure on
the other side of the wire (connection reset by peer), but the original
error is somehow too slow or unable to propagate to the JobManager bef