Hi Fabian,
Ah, that is good stuff, thanks. I’ll have to evaluate, but I believe that
everything I’m calculating can be done this way, though it looks like
FoldFunction + WindowFunction is better suited for my use case.
I may still need a custom trigger as well, though. Some sessions may never
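A minimal sketch of the FoldFunction + WindowFunction combination I have in mind (the stream of (sessionId, value) pairs, the key, and the window size are illustrative assumptions, not from the thread):

import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// events: DataStream<Tuple2<String, Long>> of (sessionId, value) pairs
DataStream<Tuple2<String, Long>> perSessionSums = events
    .keyBy(0)
    .timeWindow(Time.minutes(10))
    .apply(
        Tuple2.of("", 0L),
        // FoldFunction: folds each element into the accumulator as it
        // arrives, so the window only ever stores one value per key
        new FoldFunction<Tuple2<String, Long>, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> fold(Tuple2<String, Long> acc,
                                             Tuple2<String, Long> event) {
                return Tuple2.of(event.f0, acc.f1 + event.f1);
            }
        },
        // WindowFunction: sees only the folded result when the window fires
        new WindowFunction<Tuple2<String, Long>, Tuple2<String, Long>, Tuple, TimeWindow>() {
            @Override
            public void apply(Tuple key, TimeWindow window,
                              Iterable<Tuple2<String, Long>> folded,
                              Collector<Tuple2<String, Long>> out) {
                out.collect(folded.iterator().next());
            }
        });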
Hi Fabian,
Thanks for the response. It turns out this was a red herring. I knew how many
events I was sending through the process, and the output of each type of window
aggregate was coming out to be about half of what I expected. It turns out,
however, that I hadn’t realized that the job wa
Hi Max,
Thanks for the feedback. The SnappyCodec is specified in
io.compression.codecs. But, for whatever reason, the issue appears to be that
Flink is unable to pick up the native Snappy libraries. Snappy compression
works fine from w/in the Hadoop cluster (e.g. using a Hive query), but not
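One thing I may try, purely an assumption on my part, is pointing the JVM at Hadoop's native library directory via flink-conf.yaml:

# flink-conf.yaml -- the path is illustrative; use your Hadoop
# install's native library directory
env.java.opts: -Djava.library.path=/usr/lib/hadoop/lib/native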
Is it possible to process the same stream in two different ways? I can’t find
anything in the documentation that definitively states this is possible, nor
anything stating it isn't. My attempt had some unexpected results,
which I’ll explain below:
Essentially, I have a stream of dat
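A minimal sketch of what I mean by "two different ways" (the tuple stream and MySource are hypothetical):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// one source ...
DataStream<Tuple2<String, Integer>> stream = env.addSource(new MySource());

// ... consumed by two independent pipelines; each one sees every element

// view 1: per-key sums over one-minute time windows
stream
    .keyBy(0)
    .timeWindow(Time.minutes(1))
    .sum(1)
    .print();

// view 2: per-key maxima over 100-element count windows
stream
    .keyBy(0)
    .countWindow(100)
    .max(1)
    .print();

env.execute("two views of one stream");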
Hi Till,
Thanks for the response. I’m hoping not to have to change Flink’s lib folder.
The path I specified does exist on each node, and -C is supposed to add the
path to the classpath on all nodes in the cluster. I might try to bundle the
snappy jar within my job jar to see if that works.
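Roughly, the two approaches look like this (paths and version are illustrative):

# ship the jar to every node's user classpath with -C / --classpath;
# the file must exist under the same path on all nodes
bin/flink run -C file:///opt/libs/snappy-java-1.1.4.jar myJob.jar

# or: bundle snappy-java into the job jar (e.g. with the Maven shade
# plugin) so no external classpath entry is needed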
I have flink running on a standalone cluster, but with a Hadoop cluster
available to it. I’m using a RollingSink and a SequenceFileWriter, and I’ve
set compression to Snappy:
.setWriter(new SequenceFileWriter("org.apache.hadoop.io.compress.SnappyCodec",
    SequenceFile.CompressionType.BLOCK))
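For completeness, the surrounding setup looks roughly like this (the base path and Writable key/value types are illustrative assumptions):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.fs.RollingSink;
import org.apache.flink.streaming.connectors.fs.SequenceFileWriter;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// stream: DataStream<Tuple2<LongWritable, Text>> of key/value pairs
RollingSink<Tuple2<LongWritable, Text>> sink =
    new RollingSink<Tuple2<LongWritable, Text>>("hdfs:///data/output")
        .setWriter(new SequenceFileWriter<LongWritable, Text>(
            "org.apache.hadoop.io.compress.SnappyCodec",
            SequenceFile.CompressionType.BLOCK));
stream.addSink(sink);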
Hi Aljoscha,
Thanks for the response.
To answer your question, the base path did not exist. But, I think I found the
issue. I believe I had some rogue task managers running. As a troubleshooting
step, I attempted to restart my cluster. However, after shutting down the
cluster I noticed tha
I recently tried enabling checkpointing in a job (that previously worked w/o
checkpointing) and received the following failure on job execution:
java.io.FileNotFoundException:
/tmp/flink-io-202fdf67-3f8c-47dd-8ebc-2265430644ed/a426eb27761575b3b79e464719bba96e16a1869d85bae292a2ef7eb72fa8a14c.0.buf
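For reference, checkpointing was enabled in the usual way (the interval below is illustrative):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60000); // take a checkpoint every 60 seconds

The .buf file above lives under the task managers' temp directories
(taskmanager.tmp.dirs, which defaults to the system temp dir), so one
unconfirmed guess is that something is cleaning /tmp out from under the
running job.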
Hi Robert,
Thanks for the info on 1.2.
I copied over all 4 classes in fs.bucketing. I had to make some changes in my
version of BucketingSink. BucketingSink relies on an updated
StreamingRuntimeContext that provides a TimeServiceProvider, which the version
from 1.1.2 doesn’t.
It was easy en
Hi Robert,
Thanks! I’ll likely pursue option #2 and see if I can copy over the code from
org.apache.flink….fs.bucketing.
Do you know a general timeline for when 1.2 will be released or perhaps a
location where I could follow its progress?
Thanks again!
Hi Flinksters,
At one stage in my data stream, I want to save the stream to a set of rolling
files where the file name used (i.e. the bucket) is chosen based on an
attribute of each data record. Specifically, I’m using a windowing function to
create aggregates of certain metrics and I want to
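Conceptually, what I'm after is something like the following, written against the 1.2-style BucketingSink API (MetricAggregate and getMetricName are illustrative names):

import org.apache.flink.streaming.connectors.fs.Clock;
import org.apache.flink.streaming.connectors.fs.bucketing.Bucketer;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.hadoop.fs.Path;

// routes each record to a directory named after the metric it carries
public class MetricBucketer implements Bucketer<MetricAggregate> {
    @Override
    public Path getBucketPath(Clock clock, Path basePath, MetricAggregate record) {
        return new Path(basePath, record.getMetricName());
    }
}

BucketingSink<MetricAggregate> sink =
    new BucketingSink<MetricAggregate>("hdfs:///metrics")
        .setBucketer(new MetricBucketer());
aggregates.addSink(sink);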