Re: Error with HBase and Solr job

2016-12-23 Thread Flavio Pompermaier
Found the problem... I compiled my Flink dist using Maven 3.3.9 but I didn't rerun install from the dist directory as specified at https://ci.apache.org/projects/flink/flink-docs-release-1.1/setup/building.html#dependency-shading . Why is it necessary to recompile flink-dist? Best, Flavio On Fri, Dec
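
[Editor's note: with Maven 3.3+, shading of some dependencies only takes effect when flink-dist is rebuilt against the freshly installed artifacts, which is why the docs require the second pass. A sketch of the documented two-step build, assuming a Flink source checkout:]

```shell
# First pass: build and install all modules. Due to a Maven 3.3+
# limitation, some dependencies are not yet properly shaded here.
mvn clean install -DskipTests

# Second pass: rebuild flink-dist so the fat jar is assembled from the
# properly shaded artifacts now in the local repository.
cd flink-dist
mvn clean install
```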

Re: Streaming pipeline failing to complete checkpoints at scale

2016-12-23 Thread Cliff Resnick
Thanks Stephan, those are great suggestions and I look forward to working through them. The Thread.yield() in the tight loop looks particularly interesting. In a similar vein, I have noticed that the ContinuousFileReader sometimes lags behind the ContinuousFileWatcher under load and seems to stay that

Re: Error with HBase and Solr job

2016-12-23 Thread Flavio Pompermaier
Ok, thanks Stephan! On 23 Dec 2016 21:16, "Stephan Ewen" wrote: > I would just google for Guava conflict and "shading". It's a fairly common > problem with Guava, there are quite a few guides out there on how to address > that. > > On Fri, Dec 23, 2016 at 9:12 PM, Flavio Pompermaier > wrote: > >> Wha

Re: Error with HBase and Solr job

2016-12-23 Thread Stephan Ewen
I would just google for Guava conflict and "shading". It's a fairly common problem with Guava, there are quite a few guides out there on how to address that. On Fri, Dec 23, 2016 at 9:12 PM, Flavio Pompermaier wrote: > What do you mean exactly...? Do you have a snippet of how I should edit > the pom

Re: Error with HBase and Solr job

2016-12-23 Thread Flavio Pompermaier
What do you mean exactly...? Do you have a snippet of how I should edit the pom? On 23 Dec 2016 19:31, "Stephan Ewen" wrote: > You have the classic Guava version conflict. Flink itself shades Guava > away, but there may be multiple conflicting dependencies in your case > (HBase / Solr). > > I

Re: state size in relation to cluster size and processing speed

2016-12-23 Thread Seth Wiesman
Watermarks are generated using the PeriodicWatermarkAssigner using a timestamp field from within the records. We are processing log data from an S3 bucket and logs are always processed in chronological order using a custom ContinuousFileMonitoringFunction but the standard ContinousFileReaderOper
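
[Editor's note: a plain-Java sketch of the bookkeeping a periodic watermark assigner typically performs; class and method names are illustrative, not Flink API. It makes concrete why watermarks advance faster when more data is read per unit of wall time:]

```java
// Sketch: track the maximum event timestamp seen, and emit a watermark
// that trails it by a fixed out-of-orderness bound. Because the
// watermark is derived from observed timestamps, reading records faster
// advances event time faster. Names are illustrative, not Flink API.
public class PeriodicWatermarkSketch {
    private final long maxOutOfOrdernessMs;
    private long currentMaxTimestamp = Long.MIN_VALUE;

    public PeriodicWatermarkSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    // Called per record: remember the largest event timestamp seen.
    public long extractTimestamp(long eventTimestamp) {
        currentMaxTimestamp = Math.max(currentMaxTimestamp, eventTimestamp);
        return eventTimestamp;
    }

    // Called periodically: the watermark trails the max timestamp by the
    // out-of-orderness bound.
    public long getCurrentWatermark() {
        return currentMaxTimestamp - maxOutOfOrdernessMs;
    }
}
```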

Re: Streaming pipeline failing to complete checkpoints at scale

2016-12-23 Thread Stephan Ewen
Hi Cliff! Sorry to hear that you are running into so much trouble with the checkpointing. Here are a few things that helped another user who was running into a similar issue: (1) Make sure you use a rather new version of the 1.2 branch (there have been some important fixes added to the net
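
[Editor's note: a minimal, illustrative sketch of the Thread.yield() suggestion, not Flink code. A tight polling loop that yields each iteration gives other runnable threads, such as checkpoint handling, a chance to be scheduled instead of monopolizing the core:]

```java
import java.util.Queue;

// Illustrative only: a busy loop that calls Thread.yield() on each
// iteration. yield() is merely a scheduler hint and may be a no-op,
// but under contention it can let starved threads make progress.
public class YieldingLoop {
    public static int drain(Queue<Integer> queue) {
        int processed = 0;
        while (!queue.isEmpty()) {
            Integer item = queue.poll();
            if (item != null) {
                processed += item;
            }
            Thread.yield(); // hint: let other threads run
        }
        return processed;
    }
}
```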

Dynamic Scaling

2016-12-23 Thread Govindarajan Srinivasaraghavan
Hi, We have a computation-heavy streaming Flink job which will be processing around 100 million messages at peak time and around 1 million messages in non-peak time. We need the capability to dynamically scale so that the computation operator can scale up and down during high or low work loads resp

Re: Flink rolling upgrade support

2016-12-23 Thread Gyula Fóra
Hi! I think in many cases it is more convenient to have a savepoint-and-stop operation to use for upgrading the cluster/job but it should not be required. If the output of your job needs to be exactly once and you don't have an external deduplication mechanism then even the current fault-tolerance
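
[Editor's note: the savepoint-and-stop upgrade flow being discussed can be sketched with the CLI of that era; the job ID, savepoint path, and jar name are placeholders:]

```shell
# Trigger a savepoint for the running job (job ID is a placeholder)
bin/flink savepoint <jobId>

# Cancel the job once the savepoint has completed
bin/flink cancel <jobId>

# Resume the upgraded job from the savepoint
bin/flink run -s <savepointPath> upgraded-job.jar
```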

Re: Flink rolling upgrade support

2016-12-23 Thread Aljoscha Krettek
Hi Greg, yes certainly, there are more requirements to this than the quick sketch I gave above and that seems to be one of them. Cheers, Aljoscha On Thu, 22 Dec 2016 at 17:54 Greg Hogan wrote: > Aljoscha, > > For the second, possible solution is there also a requirement that the > data sinks ha

Re: RocksDB Windows / Flink 1.1.4 - requires Hadoop?

2016-12-23 Thread Torok, David
Thanks for the prompt reply, Stephan. It is working perfectly now! Best Regards, Dave >Hi Dave! > >In 1.1.x, the default mode of RocksDB uses some Hadoop utilities, regardless >of the filesystem used. I think that was a design mistake and we rectified >that in the upcoming 1.2.x > >For 1.1.4, I

Re: state size in relation to cluster size and processing speed

2016-12-23 Thread Aljoscha Krettek
Hi, how are you generating your watermarks? Could it be that they advance faster when the job is processing more data? Cheers, Aljoscha On Fri, 16 Dec 2016 at 21:01 Seth Wiesman wrote: > Hi, > > > > I’ve noticed something peculiar about the relationship between state size > and cluster size and

Re: Events are assigned to wrong window

2016-12-23 Thread Aljoscha Krettek
Hi, could you please share code (and example data) for producing this output. I'd like to have a look. Cheers, Aljoscha On Wed, 21 Dec 2016 at 16:29 Nico wrote: > Hi @all, > > I am using a TumblingEventTimeWindows.of(Time.seconds(20)) for testing. > During this I found a strange behavior (at le
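
[Editor's note: when debugging "wrong window" reports, it helps to remember that tumbling event-time window assignment is simple arithmetic. A sketch, assuming non-negative timestamps and no window offset:]

```java
// For tumbling windows, an event with timestamp ts (ms) falls into the
// window starting at ts - (ts % windowSizeMs), which covers the
// half-open interval [start, start + windowSizeMs). Sketch assumes
// non-negative timestamps and no window offset.
public class TumblingWindowMath {
    public static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }
}
```

For example, with Time.seconds(20), an event at t = 61,500 ms lands in the window [60,000, 80,000).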

Re: Error with HBase and Solr job

2016-12-23 Thread Stephan Ewen
You have the classic Guava version conflict. Flink itself shades Guava away, but there may be multiple conflicting dependencies in your case (HBase / Solr). I would try to see which of the tools/libraries (HBase, Solr, ...) depend on Guava and create a shaded version of one of them On Fri, Dec 2
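
[Editor's note: one common way to do the shading Stephan describes is a maven-shade-plugin relocation in the job's pom; the shaded package name below is illustrative:]

```xml
<!-- Illustrative maven-shade-plugin config: relocate Guava classes
     inside the job jar so conflicting dependencies (HBase / Solr)
     each resolve the Guava version they expect. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>my.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```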

Re: RocksDB Windows / Flink 1.1.4 - requires Hadoop?

2016-12-23 Thread Stephan Ewen
Hi Dave! In 1.1.x, the default mode of RocksDB uses some Hadoop utilities, regardless of the filesystem used. I think that was a design mistake and we rectified that in the upcoming 1.2.x For 1.1.4, I would use "enableFullyAsyncSnapshots()" on the RocksDB state backend - that mode should also not

RocksDB Windows / Flink 1.1.4 - requires Hadoop?

2016-12-23 Thread Torok, David
Hi, I tried to use the newly-supported RocksDB backend (Flink 1.1.4) on my Windows laptop. However, it is not creating any state and is throwing an NPE while trying to call: org.apache.flink.streaming.util.HDFSCopyFromLocal$1.run(HDFSCopyFromLocal.java:47) which eventually gets down to no

Error with HBase and Solr job

2016-12-23 Thread Flavio Pompermaier
Hi to all, I have a source HBase table and I have to write to a Solr index. Unfortunately when I try to run the program on the cluster (Flink 1.1.1) I think I have some problem with dependencies. Can someone suggest a fix? This is the error I have just after launching the job: Caused by: o

[ANNOUNCE] Flink 1.1.4 Released

2016-12-23 Thread Ufuk Celebi
The Flink PMC is pleased to announce the availability of Flink 1.1.4. The official release announcement: https://flink.apache.org/news/2016/12/21/release-1.1.4.html Release binaries: http://apache.lauf-forum.at/flink/flink-1.1.4 Please update your Maven dependencies to the new 1.1.4 version and
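
[Editor's note: updating the Maven dependency means bumping the version in the pom; the artifact ID below carries the Scala 2.10 suffix used by the 1.1.x binaries, adjust to your setup:]

```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.10</artifactId>
  <version>1.1.4</version>
</dependency>
```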

Streaming pipeline failing to complete checkpoints at scale

2016-12-23 Thread Cliff Resnick
We are running a DataStream pipeline using Exactly Once/Event Time semantics on 1.2-SNAPSHOT. The pipeline sources from S3 using the ContinuousFileReaderOperator. We use a custom version of the ContinuousFileMonitoringFunction since our source directory changes over time. The pipeline transforms an

Finding the address of the jobmanager cluster leader

2016-12-23 Thread Jim Raney
Hello all, Is there any simple way to get the current address of the jobmanager cluster leader from any of the jobmanagers? I know it gives you a redirect on the webui, but for monitoring and other purposes I'd like to be able to find the address of the current jobmanager leader. I've spent