Re: multiple processing of streams

2016-10-24 Thread robert.lancaster
Hi Fabian, Ah, that is good stuff, thanks. I’ll have to evaluate, but I believe that everything I’m calculating can be done this way, though it looks like FoldFunction + WindowFunction is better suited for my use case. I may still need a custom trigger as well, though. Some sessions may never…
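The FoldFunction + WindowFunction combination mentioned here can be sketched without Flink: the fold keeps a small accumulator up to date as each event arrives, and the window function runs once when the window fires. A minimal, Flink-free simulation of the pattern; all names below are illustrative, not Flink API.

```python
def fold(acc, event):
    # Incremental step: update a (count, sum) accumulator per event,
    # instead of buffering every element of the window.
    count, total = acc
    return (count + 1, total + event["value"])

def window_function(key, window_end, acc):
    # Runs once when the window fires; combines the final accumulator
    # with window metadata.
    count, total = acc
    return {"key": key, "window_end": window_end, "avg": total / count}

def run_window(key, window_end, events, initial=(0, 0)):
    acc = initial
    for e in events:          # events arrive one at a time
        acc = fold(acc, e)
    return window_function(key, window_end, acc)

print(run_window("session-1", 1000, [{"value": 2}, {"value": 4}]))
# {'key': 'session-1', 'window_end': 1000, 'avg': 3.0}
```

The point of the pattern is that per-window state stays constant-size, while the window function still gets access to the key and window metadata.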

Re: multiple processing of streams

2016-10-24 Thread robert.lancaster
Hi Fabian, Thanks for the response. It turns out this was a red herring. I knew how many events I was sending through the process, and the output of each type of window aggregate was about half of what I expected. It turns out, however, that I hadn’t realized that the job wa…

Re: native snappy library not available

2016-10-20 Thread robert.lancaster
Hi Max, Thanks for the feedback. The SnappyCodec is specified in io.compression.codecs. But, for whatever reason, the issue appears to be that Flink is unable to pick up the native Snappy libraries. Snappy compression works fine from w/in the Hadoop cluster (e.g. using a Hive query), but not…

multiple processing of streams

2016-10-19 Thread robert.lancaster
Is it possible to process the same stream in two different ways? I can’t find anything in the documentation definitively stating this is possible, nor anything stating it isn’t. My attempt had some unexpected results, which I’ll explain below: Essentially, I have a stream of dat…
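Consuming one stream in two different ways is, conceptually, just attaching two independent transformations to the same source; in Flink, both pipelines receive every element of the same DataStream. A toy, Flink-free sketch of the idea (the two "pipelines" are illustrative):

```python
events = [3, 1, 4, 1, 5]

# "Pipeline" A: a (single, global) windowed sum over the stream.
total = sum(events)

# "Pipeline" B: count the events matching a predicate.
big = len([e for e in events if e >= 3])

print(total, big)  # 14 3
```

Both computations see all five elements; neither consumes the stream at the expense of the other.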

Re: native snappy library not available

2016-10-19 Thread robert.lancaster
Hi Till, Thanks for the response. I’m hoping not to have to change Flink’s lib folder. The path I specified does exist on each node, and -C is supposed to add the path to the classpath on all nodes in the cluster. I might try to bundle the snappy jar within my job jar to see if that works.
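For reference, a hedged sketch of what a -C invocation can look like. -C expects a URL (e.g. file://…) that must resolve to the same path on every node; the jar name, path, and entry class below are made up for illustration.

```shell
# Add a jar to the user classpath on all nodes, then run the job.
flink run \
  -C file:///opt/libs/snappy-java.jar \
  -c com.example.MyStreamingJob \
  my-job.jar
```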

native snappy library not available

2016-10-19 Thread robert.lancaster
I have Flink running on a standalone cluster, but with a Hadoop cluster available to it. I’m using a RollingSink and a SequenceFileWriter, and I’ve set compression to Snappy: .setWriter(new SequenceFileWriter("org.apache.hadoop.io.compress.SnappyCodec", SequenceFile.CompressionType.BLOCK)) H…
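One commonly suggested direction for this kind of error (an assumption for this particular setup, not a verified fix) is to make sure the task manager JVMs can find Hadoop’s native libraries; the paths below are illustrative.

```shell
# In flink-conf.yaml, pass java.library.path to the Flink JVMs:
#   env.java.opts: -Djava.library.path=/usr/lib/hadoop/lib/native
#
# Sanity check on a node: is the native Snappy library actually there?
ls /usr/lib/hadoop/lib/native | grep -i snappy
```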

Re: job failure with checkpointing enabled

2016-10-17 Thread robert.lancaster
Hi Aljoscha, Thanks for the response. To answer your question, the base path did not exist. But, I think I found the issue. I believe I had some rogue task managers running. As a troubleshooting step, I attempted to restart my cluster. However, after shutting down the cluster I noticed tha…

job failure with checkpointing enabled

2016-10-14 Thread robert.lancaster
I recently tried enabling checkpointing in a job (that previously worked w/o checkpointing) and received the following failure on job execution: java.io.FileNotFoundException: /tmp/flink-io-202fdf67-3f8c-47dd-8ebc-2265430644ed/a426eb27761575b3b79e464719bba96e16a1869d85bae292a2ef7eb72fa8a14c.0.buf…

Re: bucketing in RollingSink

2016-10-13 Thread robert.lancaster
Hi Robert, Thanks for the info on 1.2. I copied over all 4 classes in fs.bucketing. I had to make some changes in my version of BucketingSink. BucketingSink relies on an updated StreamingRuntimeContext that provides a TimeServiceProvider, which the version from 1.1.2 doesn’t. It was easy en…

Re: bucketing in RollingSink

2016-10-12 Thread robert.lancaster
Hi Robert, Thanks! I’ll likely pursue option #2 and see if I can copy over the code from org.apache.flink….fs.bucketing. Do you know a general timeline for when 1.2 will be released or perhaps a location where I could follow its progress? Thanks again!

bucketing in RollingSink

2016-10-12 Thread robert.lancaster
Hi Flinksters, At one stage in my data stream, I want to save the stream to a set of rolling files where the file name used (i.e. the bucket) is chosen based on an attribute of each data record. Specifically, I’m using a windowing function to create aggregates of certain metrics and I want to…
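The per-record bucketing described here amounts to deriving the output path from a field of each record. A small, Flink-free simulation of that routing (in Flink 1.2’s BucketingSink this would be a custom Bucketer; the field and path names below are illustrative):

```python
import os

def bucket_path(base_path, record):
    # Choose the bucket from a record attribute rather than from time.
    return os.path.join(base_path, "metric=" + record["metric"])

records = [{"metric": "latency", "value": 12},
           {"metric": "errors", "value": 1},
           {"metric": "latency", "value": 9}]

# Group values by the bucket path each record maps to.
buckets = {}
for r in records:
    buckets.setdefault(bucket_path("/data/agg", r), []).append(r["value"])

print(sorted(buckets))
# ['/data/agg/metric=errors', '/data/agg/metric=latency']
```

Records sharing an attribute value land in the same bucket, which is exactly the behavior the thread is after for per-metric output files.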