Hi all,
we came across some interesting behaviour today.
We enabled object reuse on a streaming job that looks like this:
stream = env.addSource(source)
stream.map(mapFnA).addSink(sinkA)
stream.map(mapFnB).addSink(sinkB)
Operator chaining is enabled, so the optimizer fuses all operations into a
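The hazard we were worried about can be simulated outside Flink. This is a plain-Python sketch (no Flink API; `Record`, `map_fn_a`, `map_fn_b` are hypothetical stand-ins for our operators): with object reuse, both chained maps receive the same record instance, so an in-place mutation in mapFnA becomes visible to mapFnB.

```python
# Minimal simulation (plain Python, not Flink) of the object-reuse hazard:
# with reuse enabled, both chained consumers see the *same* record instance,
# so an in-place mutation in map_fn_a leaks into map_fn_b's input.

class Record:
    def __init__(self, value):
        self.value = value

def map_fn_a(record):
    record.value += 1      # mutates the record in place
    return record

def map_fn_b(record):
    return record.value * 10

def run_chained(records, reuse):
    seen_by_b = []
    for rec in records:
        input_for_a = rec if reuse else Record(rec.value)  # copy when reuse is off
        input_for_b = rec if reuse else Record(rec.value)
        map_fn_a(input_for_a)                 # consumed by sink A
        seen_by_b.append(map_fn_b(input_for_b))
    return seen_by_b

print(run_chained([Record(1), Record(2)], reuse=False))  # [10, 20] - maps isolated
print(run_chained([Record(1), Record(2)], reuse=True))   # [20, 30] - mutation leaks
```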
Hi all,
we are struggling with RateLimitExceededExceptions with the Kinesis
Producer. The Flink documentation claims that the Flink Producer
overrides the RateLimit setting from Amazon's default of 150 to 100.
I am wondering whether we'd need 100/($sink_parallelism) in order for it
to work correctly.
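To make the question concrete, here is the back-of-the-envelope arithmetic, under the assumption that the RateLimit setting is a per-producer percentage of a shard's capacity, so parallel sink subtasks writing to the same shard add up (the numbers are illustrative, not measured):

```python
# Assumption: RateLimit is a per-producer percentage of shard capacity,
# so N parallel sinks hitting the same shard multiply the pressure.

def aggregate_rate_limit(per_producer_limit, sink_parallelism):
    """Worst-case combined limit if every subtask writes to the same shard."""
    return per_producer_limit * sink_parallelism

# Flink's setting of 100 with 4 parallel sinks could allow 400% of capacity:
print(aggregate_rate_limit(100, 4))        # 400
# dividing by parallelism keeps the worst-case aggregate at 100%:
print(aggregate_rate_limit(100 / 4, 4))    # 100.0
```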
via zip -d should
> be enough (I sure hope so, since this is one of the worst issues I've come
> across).
>
>
> Federico D'Ambrosio
>
> On 25 Sep 2017, 9:51 AM, "Urs Schoenenberger"
> wrote:
>
>> Hi Federico,
>>
>> just
Hi Federico,
just guessing, but are you explicitly setting the Main-Class manifest
attribute for the jar that you are building?
Should be something like
mainClass in (Compile, packageBin) :=
Some("org.yourorg.YourFlinkJobMainClass")
Best,
Urs
On 23.09.2017 17:53, Federico D'Ambrosio wrote:
>
also depend on the ratio " #total records/#records that fit
>> into a single Sorter/Hashtable".
>>
>> I'm CC'ing Fabian, just to be sure. He knows that stuff better than I do.
>>
>> Best,
>> Aljoscha
>>
>>> On 31. Aug 20
Hi all,
we have a DataSet pipeline which reads CSV input data and then
essentially does a combinable GroupReduce via first(n).
In our first iteration (readCsvFile -> groupBy(0) -> sortGroup(0) ->
first(n)), we got a jobgraph like this:
source --[Forward]--> combine --[Hash Partition on 0, Sort]-
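For reference, the semantics we rely on (keep at most n records per group) can be pinned down in plain Python; this is only a sketch of why first(n) is combinable, not Flink code, and it ignores that first(n) is not order-deterministic:

```python
# Plain-Python sketch of why a "first(n)" GroupReduce is combinable:
# a partial "first n per key" result can be merged with another partial
# result and truncated to n again without changing the final cardinality.

from itertools import islice

def first_n(groups, n):
    """groups: dict key -> list of records; keep at most n per key."""
    return {k: list(islice(iter(v), n)) for k, v in groups.items()}

def combine(partial_a, partial_b, n):
    """Merging two partial results and re-truncating is still a valid first(n)."""
    merged = {}
    for part in (partial_a, partial_b):
        for k, v in part.items():
            merged.setdefault(k, []).extend(v)
    return first_n(merged, n)

a = first_n({"x": [1, 2, 3], "y": [4]}, 2)        # {'x': [1, 2], 'y': [4]}
b = first_n({"x": [5, 6], "y": [7, 8, 9]}, 2)     # {'x': [5, 6], 'y': [7, 8]}
print(combine(a, b, 2))                           # {'x': [1, 2], 'y': [4, 7]}
```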
Hi,
you need to enable checkpointing for your job. Flink uses ".pending"
extensions to mark parts that have been completely written, but are not
included in a checkpoint yet.
Once you enable checkpointing, the .pending extensions will be removed
whenever a checkpoint completes.
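The lifecycle is easy to picture as a rename on checkpoint completion; this is a plain-Python sketch of that mechanism (file names and helper names are hypothetical, not the sink's actual code):

```python
# Sketch of the ".pending" lifecycle: a finished part file keeps the
# .pending suffix until a checkpoint completes, then it is renamed.

import os
import tempfile

def close_part_file(directory, name, data):
    """A part file that is fully written but not yet covered by a checkpoint."""
    path = os.path.join(directory, name + ".pending")
    with open(path, "w") as f:
        f.write(data)
    return path

def on_checkpoint_complete(directory):
    """Finalize every completed-but-pending part file."""
    for entry in os.listdir(directory):
        if entry.endswith(".pending"):
            src = os.path.join(directory, entry)
            os.rename(src, src[: -len(".pending")])

with tempfile.TemporaryDirectory() as d:
    close_part_file(d, "part-0-0", "records...")
    print(sorted(os.listdir(d)))   # ['part-0-0.pending']
    on_checkpoint_complete(d)
    print(sorted(os.listdir(d)))   # ['part-0-0']
```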
Regards,
Urs
On
Hi all,
I was wondering about the heuristics for CombineHint:
Flink uses SORT by default, but the doc for HASH says that we should
expect it to be faster if the number of keys is less than 1/10th of the
number of records.
HASH should be faster if it is able to combine a lot of records, which
happens
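The intuition can be checked numerically: a hash-based combiner emits at most one partial aggregate per distinct key it sees, so its benefit grows as #keys / #records shrinks. A plain-Python illustration (numbers are made up, not a benchmark):

```python
# Illustration of the CombineHint heuristic: a hash combiner collapses
# each distinct key into one partial aggregate, so few keys relative to
# records means a big reduction in the data shipped to the reducers.

from collections import Counter

def hash_combine(records):
    """Collapse (key, value) records into one partial sum per key."""
    out = Counter()
    for key, value in records:
        out[key] += value
    return list(out.items())

# 12 records, 3 distinct keys (ratio 1/4): combine shrinks 12 -> 3
records = [(k, 1) for k in "abc" * 4]
combined = hash_combine(records)
print(len(records), "->", len(combined))   # 12 -> 3
```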
Hi Karthik,
maybe I'm misunderstanding, but there are a few things in your
description that seem strange to me:
- Your "slow" operator seems to be slow not because it's compute-heavy,
but because it's waiting for a response. Is AsyncIO (
https://ci.apache.org/projects/flink/flink-docs-release-1.3
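The idea behind that suggestion, sketched with Python's asyncio rather than Flink's API (the service call is a hypothetical stand-in): when an operator is slow because it waits on responses rather than computes, keeping many requests in flight hides the latency.

```python
# Not Flink's AsyncIO API - just the underlying idea, in Python's asyncio:
# overlapping the waits of many lookups instead of paying them one by one.

import asyncio

async def slow_lookup(record):
    await asyncio.sleep(0.01)   # stand-in for a remote call
    return record * 2

async def sequential(records):
    # waits are paid one after another: total time ~ n * latency
    return [await slow_lookup(r) for r in records]

async def concurrent(records):
    # all lookups are in flight at once: total time ~ one latency
    return await asyncio.gather(*(slow_lookup(r) for r in records))

records = [1, 2, 3]
print(asyncio.run(sequential(records)))   # [2, 4, 6]
print(asyncio.run(concurrent(records)))   # [2, 4, 6]
```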
Hi Gyula,
I don't know the cause unfortunately, but we observed a similar issue
on Flink 1.1.3. The problem seems to be gone after upgrading to 1.2.1.
Which version are you running on?
Urs
On 12.07.2017 09:48, Gyula Fóra wrote:
> Hi,
>
> I have noticed a strange behavior in one of our jobs: ev
Hi,
if I use DataStream::partitionCustom, will the partition number that my
custom Partitioner returns always be equal to getIndexOfThisSubtask
in the following operator?
A test case with different parallelisms seems to suggest this is true,
but the Javadoc seems ambiguous to me since the Partitioner
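The property being asked about can be phrased as a small routing simulation; this only illustrates the mapping in question (partition number selecting the downstream channel directly), it does not prove Flink behaves this way:

```python
# Simulation of the mapping in question: if the custom partitioner's
# return value directly selects the downstream channel, then each record
# arrives at the subtask whose index equals that return value.

def custom_partitioner(key, num_partitions):
    return hash(key) % num_partitions

def route(records, parallelism):
    """Deliver each (key, value) record to the channel chosen by the partitioner."""
    subtasks = [[] for _ in range(parallelism)]
    for key, value in records:
        channel = custom_partitioner(key, parallelism)
        subtasks[channel].append((key, value))
    return subtasks

records = [("a", 1), ("b", 2), ("a", 3)]
for index, received in enumerate(route(records, 4)):
    # every record lands at the subtask index equal to the partition number
    for key, _ in received:
        assert custom_partitioner(key, 4) == index
print("mapping holds")
```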
Hi Greg,
do you have a link where I could read up on the rationale behind
avoiding Kryo? I'm currently facing a similar decision and would like to
get some more background on this.
Thank you very much,
Urs
On 21.06.2017 12:10, Greg Hogan wrote:
> The recommendation has been to avoid Kryo where possible
mitted.
> Another thing to consider is the state backend. You'll probably have to use
> the RocksDBStateBackend to be able to spill state to disk.
>
> Hope this helps,
> Fabian
>
>
> 2017-06-16 17:00 GMT+02:00 Urs Schoenenberger <
> urs.schoenenber...@tngtech.com>:
Hi,
I'm working on a batch job (roughly 10 billion records of input, 10
million groups) that is essentially a 'fold' over each group, that is, I
have a function
AggregateT addToAggrate(AggregateT agg, RecordT record) {...}
and want to fold this over each group in my DataSet.
My understanding is
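To fix the intended semantics, here is the fold written in plain Python (the aggregate type is made concrete as a (count, sum) pair purely for illustration; the function name mirrors the snippet above):

```python
# Plain-Python version of the per-group fold described above: fold
# addToAggregate over each group's records, starting from an initial
# aggregate (here a running (count, sum) pair, chosen for illustration).

from collections import defaultdict

def add_to_aggregate(agg, record):
    count, total = agg
    return (count + 1, total + record)

def fold_per_group(keyed_records, initial=(0, 0)):
    """keyed_records: iterable of (key, record) pairs."""
    aggregates = defaultdict(lambda: initial)
    for key, record in keyed_records:
        aggregates[key] = add_to_aggregate(aggregates[key], record)
    return dict(aggregates)

data = [("g1", 5), ("g2", 1), ("g1", 7)]
print(fold_per_group(data))   # {'g1': (2, 12), 'g2': (1, 1)}
```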