Hi, what version of Flink did you use?
Best,
Kurt
On Fri, Apr 21, 2017 at 12:19 AM, JAVIER RODRIGUEZ BENITO <
javier.rodriguezben...@telefonica.com> wrote:
> Hi,
>
>
>
> I think there is an inconsistent behaviour in the parseRecord function of
> GenericCsvInputFormat, but I would like anybody to confirm
One additional note:
In FlinkKafkaConsumer 0.9+, the current read offset should already exist in
Flink metrics.
See https://issues.apache.org/jira/browse/FLINK-4186.
But yes, this is still missing for 0.8, so you need to directly query ZK for
this.
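If it helps, here is a rough sketch of reading a committed offset directly from
ZooKeeper with the plain ZooKeeper client. The connection string, group, topic,
and partition are placeholders; Kafka 0.8's high-level consumers keep offsets
under /consumers/<group>/offsets/<topic>/<partition>:

import org.apache.zookeeper.ZooKeeper;
import java.nio.charset.StandardCharsets;

public class ZkOffsetReader {
    public static void main(String[] args) throws Exception {
        // Placeholder ZK quorum; for brevity this doesn't wait for the
        // session to be fully established before issuing the read.
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, event -> { });

        // Standard Kafka 0.8 layout: /consumers/<group>/offsets/<topic>/<partition>
        String path = "/consumers/my-group/offsets/my-topic/0";

        // The offset is stored as the string representation of a long.
        byte[] data = zk.getData(path, false, null);
        System.out.println("Committed offset: " + new String(data, StandardCharsets.UTF_8));

        zk.close();
    }
}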
Cheers,
Gordon
On 21 April 2017 at 8:28:09 A
Hi Sandeep,
It isn’t fixed yet, so I think external tools like the Kafka offset checker
still won’t work.
If you’re using 0.8 and are currently stuck with this issue, you can still
directly query ZK to get the offsets.
I think for FlinkKafkaConsumer09 the offset is exposed to Flink's metric system.
I debugged the process a bit by repeating the job on a sub-slice of the entire
data (using the id value to filter data with Parquet push-down filters), and
all slices completed successfully :(
So I tried to increase the parallelism (from 1 slot per TM to 4) to see if
this was somehow a factor of stress
Hello,
the MetricQueryService is used by the webUI to fetch metrics from
the JobManager and all TaskManagers. It is only used when the
webUI is accessed.
Based on the logs you gave, the TaskManager isn't killed by the
JobManager; instead, the JobManager only detected that the TaskManager
Hi Folks,
I’m trying to use FlinkML 1.2 from Java … getting this:
SVM svm = new SVM()
.setBlocks(env.getParallelism())
.setIterations(100)
.setRegularization(0.001)
.setStepsize(0.1)
.setSeed(42);
svm.fit(labelledTraining);
The type org.apache.flink.api.scala.DataSet cannot be resolved
Hey all,
So we are experimenting with large keyed state in Flink 1.2 on a single task
manager, and we keep having our task manager killed by the job manager after
about 10 minutes due to this exception:
Fetching metrics failed.
akka.pattern.AskTimeoutException: Ask timed out on
[Actor
Hello,
You could also try using a profiler that shows what objects are using
what amount of memory. E.g., JProfiler or Java Flight Recorder [1].
Best,
Gábor
[1]
https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks001.html
On Thu, Apr 20, 2017 at 6:00 PM, Newport, B
Hi,
I think there is an inconsistent behaviour in the parseRecord function of
GenericCsvInputFormat, but I would like anybody to confirm it.
When using readCsvFile and mapping to POJO objects with fields typed as String,
the result of the parsing is different depending on the field position when
having
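For context, the setup looks roughly like this (the POJO and its field names
below are made up for illustration):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class CsvPojoExample {

    // Hypothetical POJO with String fields in different positions.
    public static class Record {
        public String id;
        public String name;
        public Record() {}
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // readCsvFile maps the CSV columns onto the listed POJO fields in order.
        DataSet<Record> records = env.readCsvFile("/path/to/input.csv")
                .pojoType(Record.class, "id", "name");

        records.print();
    }
}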
Ok
The consensus seems to be that it’s us, not Flink ☺ So we’ll look harder at what
we’re doing in case there is anything silly. We are using 16K network buffers,
BTW, which is around 0.5 GB with the defaults.
From: Till Rohrmann [mailto:trohrm...@apache.org]
Sent: Thursday, April 20, 2017 11:52 AM
Hi Mary,
groupBy + reduceGroup works across all partitions of a DataSet. This
means that elements from each partition are grouped (potentially creating a
new partitioning) and then the reduceGroup function is executed for each
group.
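For illustration, a minimal sketch of that pattern on the DataSet API (the
tuple type and the summing logic are just an example):

import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class GroupReduceExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<String, Integer>> input = env.fromElements(
                Tuple2.of("a", 1), Tuple2.of("b", 2), Tuple2.of("a", 3));

        // groupBy(0) repartitions the data by key, so each group is handled as
        // a whole no matter how the input was partitioned before.
        DataSet<Tuple2<String, Integer>> sums = input
                .groupBy(0)
                .reduceGroup(new GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
                    @Override
                    public void reduce(Iterable<Tuple2<String, Integer>> values,
                                       Collector<Tuple2<String, Integer>> out) {
                        String key = null;
                        int sum = 0;
                        for (Tuple2<String, Integer> value : values) {
                            key = value.f0;
                            sum += value.f1;
                        }
                        out.collect(Tuple2.of(key, sum));
                    }
                });

        sums.print();
    }
}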
Cheers,
Till
On Thu, Apr 20, 2017 at 5:14 PM, Mary m wrot
Hi Billy,
if you didn't split the different data sets up into different slot sharing
groups, then your maximum parallelism is 40. Thus, it should be enough to
assign 40^2 * 20 * 4 = 128000 network buffers. If that is not enough
because you have more than 4 shuffling steps running in parallel, then
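For reference, if I remember the 1.2 configuration key correctly, that value
would go into flink-conf.yaml like this (with the default 32 KB segment size,
128000 buffers are roughly 4 GB of network memory):

# flink-conf.yaml
taskmanager.network.numberOfBuffers: 128000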
I think that if you have a lot of memory available, the GC gets kind of lazy.
In our case, the issue was just the latency caused by the GC, because we were
loading more data than could fit in memory. Hence, optimizing the code gave
us a lot of improvements. FlatMaps are also dangerous as objects
Hi,
If groupBy + reduceGroup is used, does each groupBy + reduceGroup take place
on a single partition? If yes, what happens if we have more groups than
partitions?
Cheers, Mary
Your reuse idea kind of implies that it’s a GC generation rate issue, i.e. it’s
not collecting fast enough so it’s running out of memory versus heap that’s
actually anchored, right?
From: Stefano Bortoli [mailto:stefano.bort...@huawei.com]
Sent: Thursday, April 20, 2017 10:33 AM
To: Newport, Bi
Hello,
Since ExecutionEnvironment#execute() blocks until the job is finished,
you should be able to just do this:
data.writeAsText();
env.execute();
{ do Map-Job }
Note that your current solution is wrong, as it translates to this:
DataSet result = ...
result.writeAsText();
if (result.count()
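Spelled out a bit more, the intended structure would be something like this
(the paths and the map logic are placeholders):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class TwoStageJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // First job: preprocessing, written out to a file.
        DataSet<String> preprocessed = env.readTextFile("/path/to/input");
        preprocessed.writeAsText("/path/to/intermediate");
        env.execute("preprocessing"); // blocks until the file has been written

        // Second job: the map step that depends on the written file.
        DataSet<String> mapped = env.readTextFile("/path/to/intermediate")
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String value) {
                        return value.toUpperCase();
                    }
                });
        mapped.writeAsText("/path/to/output");
        env.execute("map job");
    }
}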
Hi Billy,
The only suggestion I can give is to check your code very carefully for useless
variable allocations, and foster reuse as much as possible. Don’t create a new
collection on every map execution; rather, clear and reuse the collected output
of the flatMap, and so on. In the past we run lon
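As a rough illustration of the reuse idea (the tokenizing logic here is
invented; only the allocation pattern matters):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Emits (token, 1) pairs but reuses a single output tuple instead of creating
// a new one per record, which keeps allocation (and GC) pressure down.
// This is only safe as long as downstream operators don't cache references
// to the emitted objects.
public class ReusingTokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {

    private final Tuple2<String, Integer> reuse = new Tuple2<>();

    @Override
    public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
        for (String token : line.split(" ")) {
            reuse.f0 = token;
            reuse.f1 = 1;
            out.collect(reuse);
        }
    }
}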
I don’t think our functions are memory heavy; they are typically cogroups that
merge the records on the left with the records on the right.
We’re currently requiring 720 GB of heap to do our processing, which frankly
appears ridiculous to us. Could too much parallelism be causing the problem?
Lookin
Hi
I am working on a use case where I want to start a timer for a given event
type and, when that timer expires, perform a certain action. This can
be done using a ProcessFunction.
But I also want to cancel a scheduled timer in case of some other types of
events. I also checked the implementatio
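For what it's worth, a bare-bones sketch of the registration side (the event
check and the timeout are made up; as far as I know there is no API in 1.2 to
delete a registered timer, so a common workaround is to keep a flag in keyed
state and simply ignore the firing in onTimer):

import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Assumes it is applied on a keyed stream, since timers need keyed state.
public class TimeoutFunction extends ProcessFunction<String, String> {

    private static final long TIMEOUT_MS = 60_000L;

    @Override
    public void processElement(String event, Context ctx, Collector<String> out) throws Exception {
        if (event.startsWith("start")) {
            // Register a processing-time timer TIMEOUT_MS from now.
            long fireAt = ctx.timerService().currentProcessingTime() + TIMEOUT_MS;
            ctx.timerService().registerProcessingTimeTimer(fireAt);
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        // The action to perform when the timer expires.
        out.collect("timeout fired at " + timestamp);
    }
}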
Hi,
I have a program that does some preprocessing with Flink objects and at the
end writes the result with "result.writeAsText("...")".
After that I call a method that is basically a MapReduce-Job (actually only a
Map-Job) which depends on the written file.
So what is the smartest way to dela