Re: Tuple performance and the curious JIT compiler

2016-03-07 Thread Stephan Ewen
Hi Greg!

Sounds very interesting.

Do you have a hunch what "virtual" Tuple methods are being used that become
less jit-able? In many cases, tuples use only field accesses (like
"vakle.f1") in the user functions.

I have to dig into the serializers, to see if they could suffer from that.
The "getField(pos)" method for example should always have many overrides
(though few would be loaded at any time, because one usually does not use
all Tuple classes at the same time).

Greetings,
Stephan


On Fri, Mar 4, 2016 at 11:37 PM, Greg Hogan  wrote:

> I am noticing what looks like the same drop-off in performance when
> introducing TupleN subclasses as expressed in "Understanding the JIT and
> tuning the implementation" [1].
>
> I start my single-node cluster, run an algorithm which relies purely on
> Tuples, and measure the runtime. I execute a separate jar which executes
> essentially the same algorithm but using Gelly's Edge (which subclasses
> Tuple3 but does not add any extra fields) and now both the Tuple and Edge
> algorithms take twice as long.
>
> Has this been previously discussed? If not I can work up a demonstration.
>
> [1] https://flink.apache.org/news/2015/09/16/off-heap-memory.html
>
> Greg
>


[jira] [Created] (FLINK-3584) WebInterface: Broken graphical display in Chrome

2016-03-07 Thread Niels Basjes (JIRA)
Niels Basjes created FLINK-3584:
---

 Summary: WebInterface: Broken graphical display in Chrome
 Key: FLINK-3584
 URL: https://issues.apache.org/jira/browse/FLINK-3584
 Project: Flink
  Issue Type: Bug
  Components: Webfrontend
Reporter: Niels Basjes


When running a Flink (streaming) application there is a very nice graphical 
representation of the flow in the jobtracker webinterface.

We found that when using Firefox we can move and scale this graphics very 
nicely and also there are arrows between the boxes that show which direction 
the data flows.

When using Google Chrome to look at the exact same data we find that we can 
scale the image but not move it around. And the arrows are missing.

Something that doesn't work in both cases (but what we do expect should work) 
is the option to move up/down the separator between the graphical at the top 
and the textual overview at the bottom. Or is this simply a missing feature?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3585) Deploy scripts don't support spaces in paths

2016-03-07 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-3585:
-

 Summary: Deploy scripts don't support spaces in paths
 Key: FLINK-3585
 URL: https://issues.apache.org/jira/browse/FLINK-3585
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.0.0, 1.1.0
Reporter: Maximilian Michels
Assignee: Maximilian Michels
Priority: Minor
 Fix For: 1.1.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Remote TaskManager Connection Problem

2016-03-07 Thread Deepak Jha
Hi Stephan,
Thanks for the response. I was able to resolve the issue, I was using
localhost in jobmanager name instead of container name... There were few
more issues which I would like to mention
- I'm using S3 for storage/checkpoint in Flink HA mode, I realized that I
have to set fs.hdfs.hadoopconf in conf/flink-conf.yaml and add
core-site.xml in conf/ .. Since I'm deploying it on AWS I had to place
hadoop-aws.jar as well


On Fri, Mar 4, 2016 at 1:22 AM, Stephan Ewen  wrote:

> The  pull request https://github.com/apache/flink/pull/1758 should improve
> the TaskManager's network interface selection.
>
>
> On Fri, Mar 4, 2016 at 10:19 AM, Stephan Ewen  wrote:
>
> > Hi!
> >
> > This registration phase means that the TaskManager tries to tell the
> > JobManager that it is available.
> > If that fails, there can be two reasons
> >
> >   1) Network communication not possible to the port
> >   1.1) JobManager IP really not reachable (not the case, as you
> > described)
> >   1.2) TaskManager selected a wrong network interface to work with
> >   2) JobManager not listening
> >
> >
> > To look into 1.2, can you check the TaskManager log at the beginning,
> > where it says what interface/hostname the TaskManager selected to use?
> >
> > Thanks,
> > Stephan
> >
> >
> >
> >
> >
> >
> > On Fri, Mar 4, 2016 at 2:48 AM, Deepak Jha  wrote:
> >
> >> Hi All,
> >> I've created 2 docker containers on my local machine, one running
> >> JM(192.168.99.104) and other running TM. I was expecting to see TM in
> the
> >> JM UI but it did not happen. On looking into the TM logs I see following
> >> lines
> >>
> >>
> >> 01:29:50,862 DEBUG org.apache.flink.runtime.taskmanager.TaskManager
> >>  - Starting TaskManager process reaper
> >> 01:29:50,868 INFO  org.apache.flink.runtime.filecache.FileCache
> >>  - User file cache uses directory
> >> /tmp/flink-dist-cache-be63f351-2bce-48ef-bbc4-fb0f40fecd49
> >> 01:29:51,093 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> >>  - Starting TaskManager actor at
> >> akka://flink/user/taskmanager#1222392284.
> >> 01:29:51,095 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> >>  - TaskManager data connection information: 140efeb188cc
> >> (dataPort=6122)
> >> 01:29:51,096 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> >>  - TaskManager has 1 task slot(s).
> >> 01:29:51,097 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> >>  - Memory usage stats: [HEAP: 386/494/494 MB, NON HEAP: 30/31/-1 MB
> >> (used/committed/max)]
> >> 01:29:51,104 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> >>  - Trying to register at JobManager akka.tcp://
> >> flink@192.168.99.104:6123/user/jobmanager (attempt 1, timeout: 500
> >> milliseconds)
> >> 01:29:51,633 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> >>  - Trying to register at JobManager akka.tcp://
> >> flink@192.168.99.104:6123/user/jobmanager (attempt 2, timeout: 1000
> >> milliseconds)
> >> 01:29:52,652 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> >>  - Trying to register at JobManager akka.tcp://
> >> flink@192.168.99.104:6123/user/jobmanager (attempt 3, timeout: 2000
> >> milliseconds)
> >> 01:29:54,672 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> >>  - Trying to register at JobManager akka.tcp://
> >> flink@192.168.99.104:6123/user/jobmanager (attempt 4, timeout: 4000
> >> milliseconds)
> >> 01:29:58,693 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> >>  - Trying to register at JobManager akka.tcp://
> >> flink@192.168.99.104:6123/user/jobmanager (attempt 5, timeout: 8000
> >> milliseconds)
> >> 01:30:06,702 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> >>  - Trying to register at JobManager akka.tcp://
> >> flink@192.168.99.104:6123/user/jobmanager (attempt 6, timeout: 16000
> >> milliseconds)
> >>
> >>
> >> However, from TM i am able to reach JM on port 6123
> >> root@140efeb188cc:/# nc -v 192.168.99.104 6123
> >> Connection to 192.168.99.104 6123 port [tcp/*] succeeded!
> >>
> >>
> >> masters file on TM contains
> >> 192.168.99.104:8080
> >>
> >> Did anyone face this issue with remote JM/TM combination ?
> >>
> >> --
> >> Thanks,
> >> Deepak Jha
> >>
> >
> >
>



-- 
Thanks,
Deepak Jha


Dynamically repartitioned sources

2016-03-07 Thread Maxim
I'm looking at using Flink for a streaming project that has to use some
internal systems as event sources. They are very similar to Kafka in their
semantic. The data is partitioned and each partition can be replayed from a
specified offset.

The first system creates and deletes such partitions dynamically based on
load. It provides an API to get list of partitions as well as their state
(open, closed for append).

The second system has a fixed set of a few thousand partitions, but they
are allocated to a dynamic set of hosts and each host provides poll API
that returns events from all partitions that currently reside on it. The
metadata API that returns current mapping of partitions to hosts is
provided.

I found a thread

that
mentioned that changing parallelism is one of the high priority items for
this year. Has any work started on it? And would it support the type of
dynamic sources we have?

I could try adding such support myself if it would help to speed things up.

Thanks,

Maxim.


Re: Tuple performance and the curious JIT compiler

2016-03-07 Thread Greg Hogan
The issue is not with the Tuple hierarchy (running Gelly examples had no
effect on runtime, and as you note there aren't any subclass overrides) but
with CopyableValue. I had been using IntValue exclusively but had switched
to using LongValue for graph generation. CopyableValueComparator and
CopyableValueSerializer are now working with multiple types.

If I create IntValue- and LongValue-specific versions of
CopyableValueComparator and CopyableValueSerializer and modify
ValueTypeInfo to return these then I see the expected performance.

Greg

On Mon, Mar 7, 2016 at 5:18 AM, Stephan Ewen  wrote:

> Hi Greg!
>
> Sounds very interesting.
>
> Do you have a hunch what "virtual" Tuple methods are being used that become
> less jit-able? In many cases, tuples use only field accesses (like
> "vakle.f1") in the user functions.
>
> I have to dig into the serializers, to see if they could suffer from that.
> The "getField(pos)" method for example should always have many overrides
> (though few would be loaded at any time, because one usually does not use
> all Tuple classes at the same time).
>
> Greetings,
> Stephan
>
>
> On Fri, Mar 4, 2016 at 11:37 PM, Greg Hogan  wrote:
>
> > I am noticing what looks like the same drop-off in performance when
> > introducing TupleN subclasses as expressed in "Understanding the JIT and
> > tuning the implementation" [1].
> >
> > I start my single-node cluster, run an algorithm which relies purely on
> > Tuples, and measure the runtime. I execute a separate jar which executes
> > essentially the same algorithm but using Gelly's Edge (which subclasses
> > Tuple3 but does not add any extra fields) and now both the Tuple and Edge
> > algorithms take twice as long.
> >
> > Has this been previously discussed? If not I can work up a demonstration.
> >
> > [1] https://flink.apache.org/news/2015/09/16/off-heap-memory.html
> >
> > Greg
> >
>


Re: Machine Learning on Apache Fink

2016-03-07 Thread Dmitriy Lyubimov
still in the works (for mahout). but soon.

On Sat, Jan 9, 2016 at 3:46 AM, Ashutosh Kumar 
wrote:

> I see lot of study materials and even book available for ml on spark. Spark
> seems to be more mature for analytics related work as of now. Please
> correct me if I am wrong. As I have already built my collection and data
> pre processing layers on flink , I want to use Flink for analytics as well.
> Thanks in advance.
>
>
> Ashutosh
>
> On Sat, Jan 9, 2016 at 3:32 PM, Ashutosh Kumar  >
> wrote:
>
> > I am looking for some study material and examples on machine learning .
> > Are classification , recommendation and clustering libraries available ?
> > What is the timeline for Flink as backend for Mahout? I am building a
> meta
> > data driven framework over Flink . While building data collection and
> > transformation modules was cool , I am struggling since I started
> analytics
> > module. Thanks in advance.
> > Ashutosh
> >
>


[jira] [Created] (FLINK-3586) Risk of data overflow while use sum/count to calculate AVG value

2016-03-07 Thread Chengxiang Li (JIRA)
Chengxiang Li created FLINK-3586:


 Summary: Risk of data overflow while use sum/count to calculate 
AVG value
 Key: FLINK-3586
 URL: https://issues.apache.org/jira/browse/FLINK-3586
 Project: Flink
  Issue Type: Sub-task
  Components: Table API
Reporter: Chengxiang Li
Priority: Minor


Now, we use {{(sum: Long, count: Long}} to store AVG partial aggregate data, 
which may have data overflow risk, we should use unbounded data type(such as 
BigInteger) to store them for necessary data types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)