Here is some code from my current solution; if there is a better way of
doing it, let me know.
// element and key types assumed (a LogEvent type keyed by its host String);
// the generics were lost in the archive
KeyedStream<LogEvent, String> hostStream = inStream
    .keyBy(t -> t.getHost());
DataStream<Tuple2<Integer, String>> freqStream = hostStream
    .timeWindow(Time.of(WINDOW_SIZE, TimeUnit.MILLISECONDS))
    .fold(new Tuple2<>(0, ""), new Count());
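The Count function is not shown in the snippet; it might look like this (a
sketch, assuming the stream elements are of a hypothetical LogEvent type with
a getHost() accessor):

import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.java.tuple.Tuple2;

public static class Count implements FoldFunction<LogEvent, Tuple2<Integer, String>> {
    @Override
    public Tuple2<Integer, String> fold(Tuple2<Integer, String> acc, LogEvent value) {
        // increment the count and remember which host the window belongs to
        return new Tuple2<>(acc.f0 + 1, value.getHost());
    }
}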
Update on the status so far: I suspect I found a problem in a secure
setup.
I have created a very simple Flink topology consisting of a streaming
Source (that outputs the timestamp a few times per second) and a Sink (that
puts that timestamp into a single record in HBase).
Running this on a non-
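The source described would look roughly like this (a minimal sketch, not the
actual code from that setup):

import org.apache.flink.streaming.api.functions.source.SourceFunction;

public static class TimestampSource implements SourceFunction<Long> {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (running) {
            ctx.collect(System.currentTimeMillis()); // emit the current timestamp
            Thread.sleep(250);                       // a few times per second
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}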
Hi,
where are you storing the results of each window computation? Maybe you
could also store them from inside a custom WindowFunction where you just count
the elements and then store the result.
On the other hand, adding a (1) field and doing a window reduce (à la
WordCount) is going to be wa
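A counting WindowFunction along those lines might look like this (a sketch,
using the WindowFunction signature of recent Flink versions; LogEvent is an
assumed element type):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public static class CountWindow
        implements WindowFunction<LogEvent, Tuple2<String, Long>, String, TimeWindow> {
    @Override
    public void apply(String host, TimeWindow window,
                      Iterable<LogEvent> values, Collector<Tuple2<String, Long>> out) {
        long count = 0;
        for (LogEvent ignored : values) {
            count++; // just count the elements in the window
        }
        // emit the count here, or write it to the store directly
        out.collect(new Tuple2<>(host, count));
    }
}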
Hi Stephan,
Thanks for the response. Actually I am using Maven, and below are all the Flink
dependencies I have in my pom.xml file.
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-core</artifactId>
    <version>0.9.1</version>
    <scope>provided</scope>
</dependency>
Great!
I'll watch the issue and give it a test once I see a working patch.
Niels Basjes
On Tue, Nov 3, 2015 at 1:03 PM, Maximilian Michels wrote:
> Hi Niels,
>
> Thanks a lot for reporting this issue. I think it is a very common setup
> in corporate infrastructure to have restrictive firewall
Hi Niels,
Thanks a lot for reporting this issue. I think it is a very common setup in
corporate infrastructure to have restrictive firewall settings. For Flink
1.0 (and probably in a minor 0.10.X release) we will have to address this
issue to ensure proper integration of Flink.
I've created a JIRA
Hej,
I want to do the following thing:
1. Split a Stream of incoming Logs by host address.
2. For each Key, create time based windows
3. Count the number of items in the window
4. Feed it into a statistical model that is maintained for each host
Since I don't have fields to sum upon, I use a (win
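One way to create a field to sum over, as suggested elsewhere in the thread,
is the WordCount-style approach (a sketch; LogEvent and getHost() are assumed
names):

import java.util.concurrent.TimeUnit;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.time.Time;

DataStream<Tuple2<String, Integer>> counts = inStream
    .map(new MapFunction<LogEvent, Tuple2<String, Integer>>() {
        @Override
        public Tuple2<String, Integer> map(LogEvent e) {
            return new Tuple2<>(e.getHost(), 1); // attach a constant 1 to sum over
        }
    })
    .keyBy(0)                                              // key by host
    .timeWindow(Time.of(WINDOW_SIZE, TimeUnit.MILLISECONDS))
    .sum(1);                                               // items per host and window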
Hi,
I forgot to answer your other question:
On Mon, Nov 2, 2015 at 4:34 PM, Robert Metzger wrote:
> so the problem is that you can not submit a job to Flink using the
> "/bin/flink" tool, right?
> I assume Flink and its TaskManagers properly start and connect to each
> other (the number of Task
Hi,
Sounds like RDF problems to me :)
To build an index you could do the following:
triplet := <subject, predicate, object>
(0) build set of all triplets (with strings)
triplets = triplets1.union(triplets2)
(1) assign unique long ids to each vertex
vertices = triplets.flatMap(t => [t.subject, t.object]).distinct()
vertexWithID = vertic
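In the Java DataSet API, steps (0) and (1) could look like this (a sketch;
representing a triplet as Tuple3<String, String, String> is an assumption):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.utils.DataSetUtils;
import org.apache.flink.util.Collector;

// (0) build the set of all triplets (with strings)
DataSet<Tuple3<String, String, String>> triplets = triplets1.union(triplets2);

// (1) collect the distinct vertices (subjects and objects) ...
DataSet<String> vertices = triplets
    .flatMap(new FlatMapFunction<Tuple3<String, String, String>, String>() {
        @Override
        public void flatMap(Tuple3<String, String, String> t, Collector<String> out) {
            out.collect(t.f0); // subject
            out.collect(t.f2); // object
        }
    })
    .distinct();

// ... and assign a unique long id to each of them
DataSet<Tuple2<Long, String>> vertexWithID = DataSetUtils.zipWithUniqueId(vertices);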
Converting String ids into Long ids can be quite expensive, so you should
make sure it pays off.
The safe way to do it is to get all unique String ids (project, distinct),
do zipWithUniqueId, and join all DataSets that have the String id with the
new long id. So it is a full sort for the unique an
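A sketch of that join step with the Java API (vertexWithID is the <id, String>
dictionary from zipWithUniqueId as above; the Tuple3 triplet type is an
assumption):

import org.apache.flink.api.common.functions.JoinFunction;

// translate the String subject of each triplet into its long id;
// the object field (f2) would need a second, analogous join
DataSet<Tuple3<Long, String, String>> translated = triplets
    .join(vertexWithID)
    .where(0)     // the String subject in the triplet
    .equalTo(1)   // the String side of <id, vertex>
    .with(new JoinFunction<Tuple3<String, String, String>,
                           Tuple2<Long, String>,
                           Tuple3<Long, String, String>>() {
        @Override
        public Tuple3<Long, String, String> join(
                Tuple3<String, String, String> t, Tuple2<Long, String> v) {
            return new Tuple3<>(v.f0, t.f1, t.f2);
        }
    });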
Hi Martin,
thanks for the suggestion, but unfortunately in my use case I have another
problem: I have to join triplets when f2 == f0... Is there any way to translate
the references as well? I.e., if I have 2 tuples t1 and t2, when I apply that
function I obtain something like <1, t1> and <2, t2>.
I'd like to be able to join the
Hi Flavio,
If you just want to assign a unique Long identifier to each element in
your dataset, you can use the DataSetUtils.zipWithUniqueId() method [1].
Best,
Martin
[1]
https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java#L131
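For example, a minimal usage sketch:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.DataSetUtils;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> ids = env.fromElements("a", "b", "c");
// each element is paired with a unique (not necessarily consecutive) long id
DataSet<Tuple2<Long, String>> withIds = DataSetUtils.zipWithUniqueId(ids);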
Hi to all,
I was reading the thread about the Neo4j connector and an old question came
to my mind.
In my Flink job I have Tuples with String ids that I use for joins, and I'd
like to convert them to long (because Flink should improve the memory usage
and the processing time quite a lot, if I'm not wrong).