Hi,
We use Flink to process our streaming data and need to convert categorical
variables to a binary encoding for input to an ML algorithm. Are there any
references on how to do this in Flink?
Regards,
Adarsh
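Flink did not ship a ready-made categorical encoder at the time, so one common approach is to build a dictionary of category values first and apply the encoding inside a map function. A minimal sketch of the encoding logic itself in plain Java; the class and method names are illustrative, not a Flink API:

```java
import java.util.Arrays;
import java.util.List;

public class OneHotEncoder {
    private final List<String> categories; // known category values, collected beforehand

    public OneHotEncoder(List<String> categories) {
        this.categories = categories;
    }

    // Returns a binary vector with a 1 at the category's index; all zeros if unknown.
    public double[] encode(String value) {
        double[] vec = new double[categories.size()];
        int idx = categories.indexOf(value);
        if (idx >= 0) {
            vec[idx] = 1.0;
        }
        return vec;
    }

    public static void main(String[] args) {
        OneHotEncoder enc = new OneHotEncoder(Arrays.asList("red", "green", "blue"));
        System.out.println(Arrays.toString(enc.encode("green"))); // [0.0, 1.0, 0.0]
    }
}
```

In a Flink job this `encode` call would typically live inside a `MapFunction`, with the dictionary either broadcast or known up front.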
Hi,
Using Flink 1.2.0, I ran into this issue:
https://issues.apache.org/jira/browse/FLINK-6117
The issue is fixed in version 1.3.0, but I have some reasons to try to
find a workaround.
What I did:
1. Changed the source according to
https://github.com/apache
I am still struggling to solve this problem.
I have no doubt that the JOB should automatically restart after restarting
the TASK MANAGER in YARN MODE. Is that a misunderstanding on my part?
The problem seems to be that *the JOB MANAGER still tries to connect to the
old TASK MANAGER even after a new TASK MANAGER container has been created.*
Hello,
Has anybody experienced the following error on AWS EMR 5.8.0 with Flink 1.3.1?
java.lang.ClassCastException:
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto
cannot be cast to com.google.protobuf.Message
Thanks,
Hi All,
I looked into an earlier email thread about broadcasting config through a
connected stream, and I couldn't find the conclusion.
I can't use the approach below, since I need the config to be published to
all operator instances, but I also need keyed state for external querying.
streamToBeConfigured.
We have the same issue. We are finding that we cannot express the data flow in
a natural way because of unnecessary spilling. Instead, we're building our own
operators which combine multiple steps together and essentially hide it from
Flink, or sometimes we even have to read an input dataset once p
Thanks Gordon for your response. I have around 80 parallel flatmap operator
instances, and each instance requires 3 states. Of these, one is user
state, in which each operator will have unique users' data, and I need this
data to be queryable. The other two states are kind of static states which
ar
Thanks a lot everyone. I have the user data ingested from kafka and it is
keyed by userid. There are around 80 parallel flatmap operator instances
after keyby, and there are around a few million users. The map state includes
userid as the key and some value. I guess I will try the approach that
Aljosc
Thank you, Kien!
On Tue, Sep 5, 2017 at 8:01 AM, Kien Truong wrote:
> Hi,
>
> In my experience, RocksDB uses very little CPU, and doesn't need a
> dedicated CPU.
>
> However, it's quite disk intensive. You'd need fast, ideally dedicated
> SSDs to achieve the best performance.
>
> Regards,
>
> Ki
Hi,
This is mostly correct, but you cannot register a timer in open() because we
don't have an active key there. You can only register a timer in process()
and onTimer().
In your case, I would suggest somehow clamping the timestamp to the nearest 2
minute (or whatever) interval, or keeping an e
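The clamping suggested here is simple modular arithmetic: round each timestamp up to the next interval boundary so that all registrations within the same window coalesce onto one timer. A small sketch, using the 2-minute interval from this thread as the example:

```java
public class TimerClamp {
    static final long INTERVAL_MS = 2 * 60 * 1000; // 2 minutes

    // Rounds a timestamp up to the next interval boundary, so repeated
    // registrations within the same window land on the same timer.
    static long clampToNextInterval(long timestampMs) {
        return (timestampMs / INTERVAL_MS + 1) * INTERVAL_MS;
    }

    public static void main(String[] args) {
        System.out.println(clampToNextInterval(125_000)); // 240000 (next 2-min boundary)
    }
}
```

Registering a timer for `clampToNextInterval(timestamp)` instead of the raw timestamp keeps the number of distinct timers per key bounded.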
Hi Gábor,
thank you very much for your explanation, that makes a lot of sense.
Best regards,
Urs
On 05.09.2017 14:32, Gábor Gévay wrote:
> Hi Urs,
>
> Yes, the 1/10th ratio is just a very loose rule of thumb. I would
> suggest to try both the SORT and HASH strategies with a workload that
> is a
Hi,
In my experience, RocksDB uses very little CPU, and doesn't need a
dedicated CPU.
However, it's quite disk intensive. You'd need fast, ideally dedicated
SSDs to achieve the best performance.
Regards,
Kien
On 9/5/2017 1:15 PM, Bowen Li wrote:
Hi guys,
Does RocksDB need a dedicated C
Hi,
You can register a processing-time timer inside the onTimer and the open
functions to have a timer that runs periodically.
Pseudo-code example:

    ValueState lastRuntime;

    void open() {
        ctx.timerService().registerProcessingTimeTimer(current.timestamp + 6);
    }

    void onTimer() {
        // Run the p
Hi Navneeth,
Currently, I don't think there is any built-in functionality to trigger
onTimer periodically.
As for the second part of your question, do you mean that you want to query
on which key the fired timer was registered from? I think this also isn't
possible right now.
I'm looping in Aljos
I have a streaming application that has a keyBy operator followed by an
operator working on the keyed values (a custom sum operator). If the map
operator and the aggregate operator are running on the same Task Manager, will
Flink always serialize and deserialize the tuples, or is there an optimization
Hi,
I'm running a Flink batch job that reads almost 1 TB of data from S3 and
then performs operations on it. A list of filenames is distributed among
the TM's and each subset of files is read from S3 from each TM. This job
errors out at the read step due to the following error:
java.lang.Excepti
Hi!
Yes, backpressure should also increase the latency value calculated from
LatencyMarkers.
LatencyMarkers are special events that flow along with the actual stream
records, so they should also be affected by backpressure.
Are you asking because you observed otherwise?
Cheers,
Gordon
Hi Navneeth,
Answering your three questions separately:
1. Yes. Your MapState will be backed by RocksDB, so when removing an entry
from the map state, the state will be removed from the local RocksDB as
well.
2. If state classes are not POJOs, they will be serialized by Kryo, unless a
custom ser
Hi Urs,
Yes, the 1/10th ratio is just a very loose rule of thumb. I would
suggest to try both the SORT and HASH strategies with a workload that
is as similar as possible to your production workload (similar data,
similar parallelism, etc.), and see which one is faster for your
specific use case.
Hi Tony,
Currently, the functionality that you described does not exist in the
consumer. When a topic is deleted, as far as I know, the consumer would
simply consider the partitions as unreachable and continue to try fetching
records from them until they are up again.
I'm not entirely sure if a re
Hello,
There is a Flink Improvement Proposal to redesign the iterations:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66853132
This will address the termination issue.
Best,
Gábor
On Mon, Sep 4, 2017 at 11:00 AM, Xingcan Cui wrote:
> Hi Peter,
>
> That's a good idea, but
Hi all,
we have a DataSet pipeline which reads CSV input data and then
essentially does a combinable GroupReduce via first(n).
In our first iteration (readCsvFile -> groupBy(0) -> sortGroup(0) ->
first(n)), we got a jobgraph like this:
source --[Forward]--> combine --[Hash Partition on 0, Sort]-
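For context, the logical result of the groupBy -> sortGroup -> first(n) chain is: for each key, sort the group and keep its first n records. A plain-Java sketch of that semantics only, not the Flink DataSet API; here the sort is on the value field for illustration:

```java
import java.util.*;
import java.util.stream.*;

public class FirstNPerGroup {
    // For (key, value) pairs: group by key, sort each group, keep the first n —
    // the logical outcome of groupBy(...).sortGroup(...).first(n).
    static Map<String, List<String>> firstN(List<String[]> records, int n) {
        return records.stream()
                .collect(Collectors.groupingBy(
                        r -> r[0],
                        Collectors.collectingAndThen(
                                Collectors.mapping(r -> r[1], Collectors.toList()),
                                vals -> {
                                    Collections.sort(vals);
                                    return vals.subList(0, Math.min(n, vals.size()));
                                })));
    }

    public static void main(String[] args) {
        List<String[]> recs = Arrays.asList(
                new String[]{"a", "3"}, new String[]{"a", "1"},
                new String[]{"a", "2"}, new String[]{"b", "9"});
        System.out.println(firstN(recs, 2)); // {a=[1, 2], b=[9]}
    }
}
```

Flink can run this as a combinable group-reduce, which is why the combine step shows up in the job graph above.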
How are you determining that your data is stale? Also, if you want to know the
key, why don't you store the key in your state as well?
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/