Re: Kafka state backend?

2016-04-07 Thread Zach Cox
some of the difficulties with this that stem from > the differences in the guarantees that Flink and Samza try to give. > > Cheers, > Aljoscha > > On Tue, 5 Apr 2016 at 22:24 Zach Cox wrote: > >> Hi - as clarified in another thread [1] stateful operators store all of >

Re: Back Pressure details

2016-04-06 Thread Zach Cox
;t think that's the case for your > setup. ;-) > > On Wed, Apr 6, 2016 at 9:27 PM, Zach Cox wrote: > > The new back pressure docs are great, thanks Ufuk! I'm sure those will > help > > others as well. > > > > In the Source => A => B => Sink ex

Re: Checkpoint state stored in backend, and deleting old checkpoint state

2016-04-06 Thread Zach Cox
elated to that is also making the checkpointing asynchronous, so that > normal operations do not see any disruption any more. > > Greetings, > Stephan > > On Tue, Apr 5, 2016 at 10:25 PM, Zach Cox wrote: > >> Thanks for the details Konstantin and Ufuk! >> >> >> On

Re: Back Pressure details

2016-04-06 Thread Zach Cox
). It often works better at the extremes, e.g. when there is no > back pressure at all or very high back pressure. > > – Ufuk > > > On Tue, Apr 5, 2016 at 10:47 PM, Zach Cox wrote: > > Hi - I'm trying to identify bottlenecks in my Flink streaming job, and am > > cur

Back Pressure details

2016-04-05 Thread Zach Cox
Hi - I'm trying to identify bottlenecks in my Flink streaming job, and am curious about the Back Pressure view in the job manager web UI. If there are already docs for Back Pressure please feel free to just point me to those. :) When "Sampling in progress..." is displayed, what exactly is happenin

Re: Checkpoint state stored in backend, and deleting old checkpoint state

2016-04-05 Thread Zach Cox
Thanks for the details Konstantin and Ufuk! On Tue, Apr 5, 2016 at 2:39 PM Konstantin Knauf < konstantin.kn...@tngtech.com> wrote: > Hi Ufuk, > > I thought so, but I am not sure when and where ;) I will let you know, > if I come across it again. > > Cheers, > > Konstantin > > On 05.04.2016 21:10

Kafka state backend?

2016-04-05 Thread Zach Cox
Hi - as clarified in another thread [1] stateful operators store all of their current state in the backend on each checkpoint. Just curious if Kafka topics with log compaction have ever been considered as a possible state backend? Samza [2] uses RocksDB as a local state store, with all writes also

Checkpoint state stored in backend, and deleting old checkpoint state

2016-04-05 Thread Zach Cox
Hi - I have some questions regarding Flink's checkpointing, specifically related to storing state in the backends. So let's say an operator in a streaming job is building up some state. When it receives barriers from all of its input streams, does it store *all* of its state to the backend? I thin

Re: Upserts with Flink-elasticsearch

2016-03-29 Thread Zach Cox
You can just create a new UpdateRequest instance directly using its constructor [1] like this: return new UpdateRequest() .index(index) .type(type) .id(element) .source(json); [1] http://javadoc.kyubu.de/elasticsearch/HEAD/or

Re: Upserts with Flink-elasticsearch

2016-03-28 Thread Zach Cox
Hi Madhukar - with the current Elasticsearch sink in Flink 1.0.0 [1], I don't think an upsert is possible, since IndexRequestBuilder can only return an IndexRequest. In Flink 1.1, the Elasticsearch 2.x sink [2] provides a RequestIndexer [3] that you can pass an UpdateRequest to do an upsert. Than

Accumulators checkpointed?

2016-03-15 Thread Zach Cox
Are accumulators stored in checkpoint state? If a job fails and restarts, are all accumulator values lost, or are they restored from checkpointed state? Thanks, Zach

Re: asm IllegalArgumentException with 1.0.0

2016-03-14 Thread Zach Cox
PM Andrew Whitaker < andrew.whita...@braintreepayments.com> wrote: > We're having the same issue (we also have a dependency on > flink-connector-elasticsearch). It's only happening to us in IntelliJ > though. Is this the case for you as well? > > On Thu, Mar 10, 2

Re: asm IllegalArgumentException with 1.0.0

2016-03-10 Thread Zach Cox
After some poking around I noticed that flink-connector-elasticsearch_2.10-1.0.0.jar contains shaded asm classes. If I remove that dependency from my project then I do not get the IllegalArgumentException. On Thu, Mar 10, 2016 at 11:51 AM Zach Cox wrote: > Here are the jars on the classp

Re: asm IllegalArgumentException with 1.0.0

2016-03-10 Thread Zach Cox
ncy shading changed a bit between RC4 and RC5 - maybe a different > minor ASM version is now included in the "test" scope. > > Can you share the dependencies of the problematic project? > > On Thu, Mar 10, 2016 at 12:26 AM, Zach Cox wrote: > >> I also noticed when I t

Re: ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-10 Thread Zach Cox
>>> > >>> – Ufuk > >>> > >>> > >>> On Thu, Mar 10, 2016 at 10:13 AM, Aljoscha Krettek < > aljos...@apache.org> wrote: > >>>> Hi, > >>>> you’re right, this should be changed to > “setStreamTimeCharacteristic(Ev

Re: asm IllegalArgumentException with 1.0.0

2016-03-09 Thread Zach Cox
provided on the Flink task managers? -Zach On Wed, Mar 9, 2016 at 5:16 PM Zach Cox wrote: > Hi - after upgrading to 1.0.0, I'm getting this exception now in a unit > test: > >IllegalArgumentException: (null:-1) > org.apache.flink.shaded.org.objectweb.asm.ClassVisit

asm IllegalArgumentException with 1.0.0

2016-03-09 Thread Zach Cox
Hi - after upgrading to 1.0.0, I'm getting this exception now in a unit test: IllegalArgumentException: (null:-1) org.apache.flink.shaded.org.objectweb.asm.ClassVisitor.(Unknown Source) org.apache.flink.shaded.org.objectweb.asm.ClassVisitor.(Unknown Source) org.apache.flink.api.scala.InnerClo

ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-09 Thread Zach Cox
Hi - I'm upgrading our app to 1.0.0 and noticed ExecutionConfig no longer has an enableTimestamps() method. Do we just not need to call that at all now? The docs still say to call it [1] - do they just need to be updated? Thanks, Zach [1] https://ci.apache.org/projects/flink/flink-docs-release-1

Re: InetSocketAddress is not serializable

2016-03-04 Thread Zach Cox
java.net.InetSocketAddress is Serializable [1] and implementations of java.util.List (e.g. java.util.ArrayList [2]) are also usually Serializable. Unfortunately, for some reason InetSocketTransportAddress [3] is no longer Serializable in Elasticsearch 2.x. :( -Zach [1] http://docs.oracle.com/jav

Re: InetSocketAddress is not serializable

2016-03-04 Thread Zach Cox
I ran into the same issue upgrading to Elasticsearch 2, here's how I solved it: https://gist.github.com/zcox/59e486be7aeeca381be0#file-elasticsearch2sink-java-L110 -Zach On Fri, Mar 4, 2016 at 7:30 AM HungChang wrote: > Hi, > > I'm building the connector for ElasticSearch2. One main issue for

Re: Watermarks with repartition

2016-02-26 Thread Zach Cox
ice write up. Would you maybe be interested in integrating > this as some sort of internal documentation in Flink? So that prospective > contributors can get to know this stuff. > > Cheers, > Aljoscha > > On 26 Feb 2016, at 18:32, Zach Cox wrote: > > > > Thanks for t

Re: Watermarks with repartition

2016-02-26 Thread Zach Cox
t; On 26 Feb 2016, at 00:19, Zach Cox wrote: > > > > I think I found the information I was looking for: > > > > RecordWriter broadcasts each emitted watermark to all outgoing channels > [1]. > > > > StreamInputProcessor tracks the max watermark received on

Re: Watermarks with repartition

2016-02-25 Thread Zach Cox
.java#L103 [2] https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/StreamInputProcessor.java#L147 On Thu, Feb 25, 2016 at 3:31 PM Zach Cox wrote: > Hi - how are watermarks passed along parallel tasks where there is a > repartition?

Watermarks with repartition

2016-02-25 Thread Zach Cox
Hi - how are watermarks passed along parallel tasks where there is a repartition? For example, say I have a simple streaming job computing hourly counts per key, something like this: val environment = StreamExecutionEnvironment.getExecutionEnvironment environment.setParallelism(2) environment.setS

Re:

2016-02-23 Thread Zach Cox
t will be very helpful to > include the following: > - exact Chrome and OS X version > - the exectuion plan as JSON (via env.getExecutionPlan()) > - screenshot > > Thanks! > > – Ufuk > > > On Tue, Feb 23, 2016 at 3:46 PM, Zach Cox wrote: > > Hi - I typically

[no subject]

2016-02-23 Thread Zach Cox
Hi - I typically use the Chrome browser on OS X, and notice that with 1.0.0-rc0 the job graph visualization displays the nodes in the graph, but not any of the edges. Also the graph does not move around when dragging the mouse. The job graph visualization seems to work perfectly in Safari and Fire

Re: Using numberOfTaskSlots to control parallelism

2016-02-20 Thread Zach Cox
Thanks for the input Aljoscha and Ufuk! I will try out the #2 approach and report back. Thanks, Zach On Sat, Feb 20, 2016 at 7:26 AM Ufuk Celebi wrote: > On Sat, Feb 20, 2016 at 10:12 AM, Aljoscha Krettek > wrote: > > IMHO the only change for 2) is that you possibly get better machine > utili

Using numberOfTaskSlots to control parallelism

2016-02-19 Thread Zach Cox
What would the differences be between these scenarios? 1) one task manager with numberOfTaskSlots=1 and one job with parallelism=1 2) one task manager with numberOfTaskSlots=10 and one job with parallelism=10 In both cases all of the job's tasks get executed within the one task manager's jvm. Ar

Re: Changing parallelism

2016-02-18 Thread Zach Cox
ond (low parallelism) operator in the intermediate job. > > > Greetings, > Stephan > > > On Thu, Feb 18, 2016 at 9:24 AM, Ufuk Celebi wrote: > >> Hey Zach! >> >> Sounds like a great use case. >> >> On Wed, Feb 17, 2016 at 3:16 PM, Zach Cox wrote: &g

Re: Availability for the ElasticSearch 2 streaming connector

2016-02-18 Thread Zach Cox
Suneel Marthi wrote: > Thanks Zach, I have a few minor changes too locally; I'll push a PR out > tomorrow that has ur changes too. > > On Wed, Feb 17, 2016 at 5:13 PM, Zach Cox wrote: > >> I recently did exactly what Robert described: I copied the code from this >>

Re: Availability for the ElasticSearch 2 streaming connector

2016-02-17 Thread Zach Cox
I recently did exactly what Robert described: I copied the code from this (closed) PR https://github.com/apache/flink/pull/1479, modified it a bit, and just included it in my own project that uses the Elasticsearch 2 java api. Seems to work well. Here are the files so you can do the same: https://

Re: 1.0-SNAPSHOT downloads

2016-02-17 Thread Zach Cox
ds > > Cheers, > Max > > On Mon, Feb 15, 2016 at 6:29 PM, Zach Cox wrote: > > Hi - are there binary downloads of the Flink 1.0-SNAPSHOT tarballs, like > > there are for 0.10.2 [1]? I'm testing out an application built against > the > > 1.0-SNAPSHOT dependencies

Changing parallelism

2016-02-17 Thread Zach Cox
Hi - we are building a stateful Flink streaming job that will run indefinitely. One part of the job builds up state per key in a global window that will need to exist for a very long time. We will definitely be using the savepoints to restore job state after new code deploys. We were planning to b

1.0-SNAPSHOT downloads

2016-02-15 Thread Zach Cox
Hi - are there binary downloads of the Flink 1.0-SNAPSHOT tarballs, like there are for 0.10.2 [1]? I'm testing out an application built against the 1.0-SNAPSHOT dependencies from Maven central, and want to make sure I run them on a Flink 1.0-SNAPSHOT cluster that matches up with those jars. Thanks