Don’t we need to set the number of slots to 24 (4 sources + 16 mappers + 4
sinks)?
Or is there a way to set the number of slots per TaskManager instead of
globally, so that they are at least equally dispatched among the nodes?
As for the sink deployment: that’s not good news; I mean we w
Dear all,
I want to convert the data from each window of a stream to a DataSet. What is
the best way to do that? So, while streaming, at the end of each window I
want to convert that data to a DataSet and possibly apply DataSet
transformations to it.
Any suggestions?
-best
-sane
Fabian,
Thank a lot for your response. Really appreciated. I've some additional
questions (please see inline)
On Wed, Feb 3, 2016 at 2:42 PM, Fabian Hueske wrote:
> Hi,
>
> 1) At the moment, state is kept on the JVM heap in a regular HashMap.
>
Is this state replicated across JVMs in a cluster
Hi Sane,
Currently, the DataSet and DataStream APIs are strictly separated. Thus, this
is not possible at the moment.
What kind of operation do you want to perform on the data of a window?
Why do you want to convert the data into a data set?
-Matthias
On 02/04/2016 10:11 AM, Sane Lee wrote:
> Dear all
I had that problem/question some time ago, too.
The quick fix is to just put the line number in the line itself. Go for it.
However, we worked out a solution for another distributed processing
system that did the following:
Read each partition, count the lines, broadcast a map
"partition->lin
Hi Matthias,
This does not necessarily need to be in API functions. I just want to get a
roadmap to add this functionality. Should I save each window's data to
disk and create a new dataset environment in parallel? Or change trigger
functionality maybe?
I have large windows. As I asked in a previous q
I also have a similar scenario. Any suggestion would be appreciated.
On Thu, Feb 4, 2016 at 10:29 AM Jeyhun Karimov wrote:
> Hi Matthias,
>
> This need not to be necessarily in api functions. I just want to get a
> roadmap to add this functionality. Should I save each window's data into
> disk an
Hi,
I’ve got two more questions on a different topic…
First one:
Is there a way to monitor the buffers’ status? In order to find bottlenecks in
our application, we thought it could be useful to be able to have a look at the
different exchange buffers’ status, to know if they are full (or as an exa
Hi Gwen,
let me answer the second question: There is a JIRA to reintroduce the
configuration parameter: https://issues.apache.org/jira/browse/FLINK-2213.
I will try to get a fix for this into the 1.0 release.
I think I removed it back then because users were unable to define the number
of vcores ind
Concerning the first question:
What you are looking for is backpressure monitoring. If a task cannot push
its data to the next task, it is backpressured.
This pull request adds a first version of backpressure monitoring:
https://github.com/apache/flink/pull/1578
We will try and get it merged soo
Hi Soumya,
Operator state is partitioned across JVMs and not replicated. However, it
is checkpointed (e.g., to HDFS) at regular intervals to guarantee
fault-tolerance with exactly-once semantics. In case of a failure, all
operator states are recovered from checkpoints.
The state backend is respon
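A minimal sketch of enabling that behavior (the interval and HDFS path below are illustrative assumptions, not taken from this thread):

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Checkpoint all operator state every 5 seconds.
env.enableCheckpointing(5000);
// Keep checkpoints on HDFS rather than on the JobManager's heap.
env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));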
Hi!
If I understand you correctly, what you are looking for is a kind of
periodic batch job, where the input data for each batch is a large window.
We have actually thought about this kind of application before. It is not
on the short-term roadmap that we shared a few weeks ago, but I think it
w
Anyone looking into this? Java 7 reached its end of life in April 2015 with
its last public update (number 80), and the ability to run Java 8 jobs will
be more and more important in the future. IMHO, the default target of the
Maven compiler plugin should be set to 1.8 in the 1.0 release. In most of
Hi all,
I am new to Flink. I wrote a simple program and I want it to output to a CSV
file.
timeWindowAll(Time.of(3, TimeUnit.MINUTES))
.apply(new Function1())
.writeAsCsv("file:///user/someuser/Documents/somefile.csv");
When I change the sink to .print(), it works and outputs some results.
I w
Hi!
I have been running Java 8 for a year without an issue. The code is compiled for
target Java 7, but can be run with Java 8.
User code that is targeted for Java 8 can be run if Flink is run with Java
8.
The initial error you got was because you probably compiled with Java 8 as
the target, and ran it
Hey Radu,
As you are using the streaming API I assume that you call env.execute() in
both cases. Is that the case?
Do you see any errors appearing? My first guess would be that if your data
type is not a tuple type, writeAsCsv does not work by default.
Best,
Marton
On Thu, Feb 4, 2016 at 11:36 A
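To illustrate Marton's point, a hedged sketch: writeAsCsv works out of the box on tuple types, so mapping the window output to a tuple first should help (the Result type and its fields are invented for illustration):

stream
    .timeWindowAll(Time.of(3, TimeUnit.MINUTES))
    .apply(new Function1())
    // writeAsCsv requires a tuple type, so convert the POJO first.
    .map(new MapFunction<Result, Tuple2<String, Long>>() {
        @Override
        public Tuple2<String, Long> map(Result r) {
            return new Tuple2<>(r.key, r.count);
        }
    })
    .writeAsCsv("file:///user/someuser/Documents/somefile.csv");
env.execute(); // without this, the sink never runs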
I've tested several configurations (also changing my compilation to 1.7, but
then sesame 4 was causing the error [1]):
1. Flink compiled with Java 1.7 (default), run within Eclipse with
Java 8: OK
2. Flink compiled with Java 1.7 (default), running the cluster with Java
8: not able to
Hi Marton,
Thanks to your comment I managed to get it working. At least it outputs the
results. However, what I need is to output each window's result separately.
Now, it outputs the results of parallel working windows (I think) and
appends the new results to them. For example, if I have parallelism
Ok thanks !
All that’s left is to wait then.
B.R.
From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] On Behalf Of Stephan
Ewen
Sent: jeudi 4 février 2016 11:19
To: user@flink.apache.org
Subject: Re: Internal buffers supervision and yarn vCPUs
Concerning the first question:
What you ar
> On 04 Feb 2016, at 12:02, Gwenhael Pasquiers
> wrote:
>
> Ok thanks !
>
> All that’s left is to wait then.
If you have spare time and are working with the current snapshot version, it
would be great to get some feedback on the pull request. :-)
– Ufuk
I found that the merge method for DataStream does not exist in the latest
version. What is its equivalent? Shall I use union or join?
Thanks
Ashutosh
Very good, you are absolutely right. :D
> On 04 Feb 2016, at 05:07, Nirmalya Sengupta
> wrote:
>
> on
Hi,
sorry for the inconvenience. The new name of merge is union.
Regards,
Aljoscha
> On 04 Feb 2016, at 13:01, Ashutosh Kumar wrote:
>
> I found that merge method for datastream does not exist in latest version .
> What is the equivalent for it ? Shall I use union or join ?
>
> Thanks
>
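For reference, a minimal sketch of the renamed API (stream names invented):

DataStream<String> a = env.socketTextStream("localhost", 9000);
DataStream<String> b = env.socketTextStream("localhost", 9001);
// union replaces merge: it combines streams of the same type.
DataStream<String> merged = a.union(b);

join, in contrast, pairs elements of two streams on a key within a window, so union is the drop-in replacement for merge.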
Hi Radu,
It is indeed interesting to know how each window could be registered separately
- I am not sure if any of the existing mechanisms in Flink supports this.
I think you need to create your own output sink. It is a bit tricky to pass the
window sequence number (actually I do not think such
You can get the end time of a window from the TimeWindow object which is
passed to the AllWindowFunction. This is basically a window ID / index.
I would go for a custom output sink which writes records to files based on
their timestamp.
IMO, this would be cleaner & easier than implementing the file
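A hedged sketch of that suggestion, using the AllWindowFunction signature of the 1.x line (the Event type and the output shape are invented):

stream
    .timeWindowAll(Time.of(3, TimeUnit.MINUTES))
    .apply(new AllWindowFunction<Event, Tuple2<Long, Event>, TimeWindow>() {
        @Override
        public void apply(TimeWindow window, Iterable<Event> values,
                          Collector<Tuple2<Long, Event>> out) {
            // Tag each record with the window end time, which acts as a
            // window ID that a custom sink can use to route records to files.
            for (Event e : values) {
                out.collect(new Tuple2<>(window.getEnd(), e));
            }
        }
    });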
I'm wondering what kind of transformations you want to apply to the window
that you cannot apply with the DataStream API.
Would it be sufficient for you to have the windows as files in HDFS and
then run batch jobs against the windows on disk? If so, you could use our
filesystem sink, which creates fil
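If that fits, a rough sketch with the rolling filesystem sink (class names from flink-connector-filesystem as far as I recall; the path and bucket format are assumptions):

RollingSink<String> sink = new RollingSink<>("hdfs:///windows");
// One bucket (directory) per minute, so finished buckets can be
// picked up by periodic batch jobs.
sink.setBucketer(new DateTimeBucketer("yyyy-MM-dd--HHmm"));
stream.addSink(sink);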
Sorry, I was confused about the number of slots; it's good now.
However, is disableChaining or disableOperatorChaining working properly?
I called those methods everywhere I could, but it still seems that some of my
operators are being chained together: I can't go over 16 used slots where I
should be
Hi Shikhar,
the currently open windows are also part of the operator state. Whenever a
window operator receives a barrier it will checkpoint the state of the user
function and additionally all uncompleted windows. This also means that the
window operator does not buffer the barriers. Once it has t
Hi Gwen!
You actually do not need 24 slots, but only as many as the highest parallelism
(16). Slots do not hold individual tasks, but "pipelines".
Here is an illustration of how that works.
https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/config.html#configuring-taskmanager-process
To your other question, there are two things in Flink:
(1) Chaining. Tasks are folded together into one task, run by one thread.
(2) Resource groups: Tasks stay separate, have separate threads, but share
a slot (which means share memory resources). See the link in my previous
mail for an explanat
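For illustration, a hedged sketch of the chaining controls (operator names invented):

// Per operator: give a heavy operator its own thread, or cut the chain.
stream
    .map(new HeavyMapper()).disableChaining()
    .map(new LightMapper()).startNewChain();

// Or globally, for the whole job:
env.disableOperatorChaining();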
Hi all,
as mentioned before I am trying to import the RowMatrix from Spark to Flink…
In the code I already ran into a dead end… In the function
multiplyGramianMatrixBy() (see end of mail) there is the line:
rows.context.broadcast(v) (rows is a DataSet[Vector])
What exactly is this line doing? Do
Hi Flavio,
To address your points:
1) It runs. That's fine.
2) It doesn't work to run a Java 8 compiled Flink job on a Java 7
Flink cluster if you use Java 8 non-backwards-compatible features in
your job.
3) I compile Flink daily with Java 8. Also, we have Travis CI tests
which use OpenJDK and O
Okay;
Then I guess that the best we can do is to disable chaining (we really want one
thread per operator since they are doing long operations) and have the same
parallelism for sinks as for mapping: that way each map will have its own sink
and there will be no exchanges between Flink instances.
Hi Sourigna,
it turned out to be a bug in the GradientDescent implementation which
cannot handle sparse gradients. That is not so problematic by itself,
because the sum of gradient vectors is usually dense even if the individual
gradient vectors are sparse. We simply forgot to initialize the initi
Flink compiles correctly using Java 8 as long as you leave Java 1.7 source
and target in the Maven Java compiler plugin.
If you change them to 1.8, flink-core doesn't compile anymore.
On Thu, Feb 4, 2016 at 4:23 PM, Maximilian Michels wrote:
> Hi Flavio,
>
> To address your points:
>
> 1) It runs. That'
Hi Radu,
what does the log of the TaskManager 10.204.62.80:57910 say?
Cheers,
Till
On Wed, Feb 3, 2016 at 6:00 PM, Radu Tudoran
wrote:
> Hello,
>
> I am facing an error for which I cannot figure out the cause. Any idea
> what could cause such an error?
>
> java.lang.Exc
Hey,
I am actually facing a similar issue lately, where the JobManager releases
the task slots as it cannot contact the TaskManager.
Meanwhile the TaskManager is also trying to connect to the JobManager and
fails multiple times. This happens on multiple TaskManagers seemingly
randomly. So the TM
Hi,
Well… yesterday when I looked into it there was no additional info beyond the one
I have sent. Today I reproduced the problem and I could see the following in the log file:
akka.actor.ActorInitializationException: exception during creation
at akka.actor.ActorInitializationException$.apply(Actor.scal
@Gyula Do you see log messages about quarantined actor systems?
There may be an issue with Akka Death watches that once the connection is
lost, it cannot be re-established unless the TaskManager is restarted
http://doc.akka.io/docs/akka/current/scala/remoting.html#Lifecycle_and_Failure_Recovery_M
Yes exactly, it says it is quarantined.
Gyula
On Thu, Feb 4, 2016 at 4:09 PM Stephan Ewen wrote:
> @Gyula Do you see log messages about quarantined actor systems?
>
> There may be an issue with Akka Death watches that once the connection is
> lost, it cannot be re-established unless the
Okay, here are the docs for the Akka version we are using:
http://doc.akka.io/docs/akka/2.3.14/scala/remoting.html#Lifecycle_and_Failure_Recovery_Model
It says that after a remote deathwatch trigger, the actor system must be
restarted before it can connect again.
We probably need to do the follow
Hi Lydia,
Spark and Flink are not identical. Thus, you'll find concepts in both systems
which won't have a corresponding counterpart in the other system. For
example, rows.context.broadcast(v) broadcasts the value v so that you can
use it on all Executors. Flink follows a slightly different concept whe
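The closest Flink counterpart on the DataSet API is a broadcast set; a minimal sketch (the row type and the dot-product body are invented for illustration):

DataSet<double[]> v = env.fromElements(new double[]{1.0, 2.0});
DataSet<Double> result = rows
    .map(new RichMapFunction<double[], Double>() {
        private List<double[]> broadcastV;
        @Override
        public void open(Configuration parameters) {
            // Fetched once per parallel task, by name.
            broadcastV = getRuntimeContext().getBroadcastVariable("v");
        }
        @Override
        public Double map(double[] row) {
            double[] vec = broadcastV.get(0);
            double dot = 0.0;
            for (int i = 0; i < row.length; i++) {
                dot += row[i] * vec[i];
            }
            return dot;
        }
    })
    // Attach the broadcast set under the name used in open().
    .withBroadcastSet(v, "v");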
We should probably add to the TaskManager a "restart on quarantined"
strategy anyways.
We can detect it as follows:
http://stackoverflow.com/questions/32471088/akka-cluster-detecting-quarantined-state
On Thu, Feb 4, 2016 at 5:18 PM, Stephan Ewen wrote:
> Okay, here are the docs for the Akka ve
I see. Did you perform a full "mvn clean package -DskipTests" after
you changed the source level to 1.8?
On Thu, Feb 4, 2016 at 4:33 PM, Flavio Pompermaier wrote:
> Flink compiles correctly using java 8 as long as you leave java 1.7 source
> and target in the maven java compiler.
> If you change
yes I did
On Thu, Feb 4, 2016 at 5:44 PM, Maximilian Michels wrote:
> I see. Did you perform a full "mvn clean package -DskipTests" after
> you changed the source level to 1.8?
>
> On Thu, Feb 4, 2016 at 4:33 PM, Flavio Pompermaier
> wrote:
> > Flink compiles correctly using java 8 as long as y
I added a new Ticket: https://issues.apache.org/jira/browse/FLINK-3336
This will implement the data shipping pattern that you mentioned in your
initial mail. I also have the pull request almost ready.
> On 04 Feb 2016, at 16:25, Gwenhael Pasquiers
> wrote:
>
> Okay ;
>
> Then I guess that t
Hi,
I'm new to Flink. I was working with Flink and Kafka before, and it was working
fine.
But now Flink is not able to consume data from Kafka, even with a newly
created topic.
This is the log:
15:12:59,684 INFO
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer - No prior
offsets found f
Hi,
I don't understand what you are trying to achieve. If you want to read the
topic from the beginning, use a different group.id.
Flink should consume data from the topic when you produce something into it.
As you can see from the log statement, it's at the "INFO" log level, hence
it's not an issue
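A minimal consumer sketch with a fresh group.id (the connector class matches the Kafka 0.8 connector of that era as far as I recall; addresses and names are assumptions):

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("zookeeper.connect", "localhost:2181");
// A group.id with no committed offsets yet...
props.setProperty("group.id", "my-fresh-group");
// ...plus starting from the earliest offset (Kafka 0.8 syntax).
props.setProperty("auto.offset.reset", "smallest");

DataStream<String> stream = env.addSource(
    new FlinkKafkaConsumer082<>("my-topic", new SimpleStringSchema(), props));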
Hello,
I have a few questions regarding the garbage collector’s stats on TaskManagers, and
any help or further documentation would be great.
I have collected stats at a 1-second polling interval on 7 TaskManagers, through
the relative request (/taskmanagers//) of the Monitoring REST
API while a job, t
For example, I will do aggregate operations with other windows (n-window
aggregations) that are already outputted.
I tried your suggestion and used the filesystem sink, outputting to HDFS.
I got k files in the HDFS directory, where k is the parallelism (I
used a single machine).
These files get bigg