Re: Sink - Cassandra

2017-05-16 Thread Nick Dimiduk
Yes the approach works fine for my use-case; has been in "production" for quite some time. My implementation has some untested scenarios around job restarting and failures, so of course your mileage may vary. On Mon, May 15, 2017 at 5:58 AM, nragon wrote: > Hi Nick, > > I'm trying to integrate H

Re: External DB as sink - with processing guarantees

2016-03-11 Thread Nick Dimiduk
Pretty much anything you can write to from a Hadoop MapReduce program can be a Flink destination. Just plug in the OutputFormat and go. Re: output semantics, your mileage may vary. Flink should do you fine for at least once. On Friday, March 11, 2016, Josh wrote: > Hi all, > > I want to use an
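
A minimal sketch of that wiring (assuming the flink-hadoop-compatibility dependency; the output path and record types are illustrative):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.hadoop.mapred.HadoopOutputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextOutputFormat;

    public class HadoopSinkSketch {
      public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Tuple2<Text, IntWritable>> data =
            env.fromElements(Tuple2.of(new Text("flink"), new IntWritable(1)));

        // Wrap the Hadoop OutputFormat so Flink can drive it as a sink.
        JobConf jobConf = new JobConf();
        FileOutputFormat.setOutputPath(jobConf, new Path("/tmp/flink-hadoop-out"));
        data.output(new HadoopOutputFormat<Text, IntWritable>(
            new TextOutputFormat<Text, IntWritable>(), jobConf));

        env.execute("hadoop-outputformat-sink");
      }
    }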

Log4j configuration on YARN

2016-03-11 Thread Nick Dimiduk
Can anyone tell me where I must place my application-specific log4j.properties to have it honored when running on a YARN cluster? Placing it in my application jar doesn't work. Editing the log4j files under flink/conf doesn't work. My goal is to set the log level for 'com.mycompany' classes used in my flink app
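
For context, the setting in question is ordinary log4j 1.x configuration; only the placement is the open question. A sketch ('com.mycompany' stands in for the real package):

    # raise the level for application classes; leave the rootLogger as shipped
    log4j.logger.com.mycompany=DEBUG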

Re: Frequent exceptions killing streaming job

2016-02-26 Thread Nick Dimiduk
Feb 25, 2016 at 6:03 PM, Nick Dimiduk > wrote: > >> For what it's worth, I dug into the TM logs and found that this exception >> was not the root cause, merely a symptom of other backpressure building in >> the flow (actually, lock contention in another part of the stack).

Re: Frequent exceptions killing streaming job

2016-02-25 Thread Nick Dimiduk
ch. > > Sadly, it seems that the Kafka 0.9 consumer API does not yet support > requesting the latest offset of a TopicPartition. I'll ask about this on > their ML. > > > > > On Sun, Jan 17, 2016 at 8:28 PM, Nick Dimiduk wrote: > >> On Sunday, January 17, 20

Re: streaming using DeserializationSchema

2016-02-12 Thread Nick Dimiduk
the file as byte arrays somehow to make it work. > What read function did you use? The mapper is not hard to write but the > byte array stuff gives me a headache. > > cheers Martin > > > > > On Fri, Feb 12, 2016 at 9:12 PM, Nick Dimiduk wrote: > >> Hi Martin, >>

Re: streaming using DeserializationSchema

2016-02-12 Thread Nick Dimiduk
Hi Martin, I have the same use case: I wanted to be able to load from dumps of data in the same format as is on the Kafka queue. I created a new application main; call it the "job" instead of the "flow". I refactored my flow-building code a bit so all of it can be reused via a factory method.
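
A minimal sketch of that refactoring (all names hypothetical): the transformations live behind a factory method, and each main binds them to a different source.

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class FlowFactory {

      // The reusable "flow": everything between source and sink.
      public static DataStream<String> buildFlow(DataStream<String> lines) {
        return lines
            .filter(line -> !line.isEmpty())
            .map(String::toUpperCase);   // stand-in for the real business logic
      }

      // The "job" main: replay a dump file in the same wire format. The "flow"
      // main would instead hand buildFlow() the Kafka consumer stream.
      public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        buildFlow(env.readTextFile(args[0])).print();
        env.execute("replay-from-dump");
      }
    }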

Re: OutputFormat vs SinkFunction

2016-02-09 Thread Nick Dimiduk
accordingly. It is not like >> OutputFormats are dangerous but all SinkFunctions are failure-proof. >> >> Consolidating the two interfaces would make sense. It might be a bit late >> for the 1.0 release because I see that we would need to find a consensus >> first and t

Re: OutputFormat vs SinkFunction

2016-02-08 Thread Nick Dimiduk
tolerance aware) so > that it can easily be used with something akin to OutputFormats. > > What do you think? > > -Aljoscha > > On 08 Feb 2016, at 19:40, Nick Dimiduk > wrote: > > > > On Mon, Feb 8, 2016 at 9:51 AM, Maximilian Michels > wrote: > > Changing the cl

Re: OutputFormat vs SinkFunction

2016-02-08 Thread Nick Dimiduk
are similar. This one is working fine with my test code. https://gist.github.com/ndimiduk/18820fcd78412c6b4fc3 On Mon, Feb 8, 2016 at 6:07 PM, Nick Dimiduk wrote: > >> In my case, I have my application code that is calling addSink, for which >> I'm writing a test that needs to u
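
For readers without the gist handy, an adapter of this shape conveys the idea (a sketch, not necessarily the gist's exact contents; note it takes no part in checkpointing, which is the fault-tolerance concern raised in this thread):

    import org.apache.flink.api.common.io.OutputFormat;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    // Exposes any Flink OutputFormat through the SinkFunction interface.
    // OutputFormat extends Serializable, so the wrapped format ships with the job.
    public class OutputFormatSink<T> extends RichSinkFunction<T> {
      private final OutputFormat<T> format;

      public OutputFormatSink(OutputFormat<T> format) {
        this.format = format;
      }

      @Override
      public void open(Configuration parameters) throws Exception {
        format.configure(parameters);
        format.open(getRuntimeContext().getIndexOfThisSubtask(),
                    getRuntimeContext().getNumberOfParallelSubtasks());
      }

      @Override
      public void invoke(T value) throws Exception {
        format.writeRecord(value);
      }

      @Override
      public void close() throws Exception {
        format.close();
      }
    }

Later Flink versions ship much the same adapter behind DataStream#writeUsingOutputFormat.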

Re: OutputFormat vs SinkFunction

2016-02-08 Thread Nick Dimiduk
I. > > If you need the lifecycle methods in streaming, there is > RichSinkFunction, which implements OutputFormat and SinkFunction. In > addition, it gives you access to the RuntimeContext. You can pass this > directly to the "addSink(sinkFunction)" API method. > > Cheers

OutputFormat vs SinkFunction

2016-02-07 Thread Nick Dimiduk
Heya, Is there a plan to consolidate these two interfaces? They appear to provide identical functionality, differing only in lifecycle management. I found myself writing an adaptor so I can consume an OutputFormat where a SinkFunction is expected; there's not much to it. This seems like code that

Re: Mixing Batch & Streaming

2016-01-28 Thread Nick Dimiduk
If the dataset is too large for a file, you can put it behind a service and have your stream operators query the service for enrichment. You can even support updates to that dataset in a style very similar to the "lambda architecture" discussed elsewhere. On Thursday, January 28, 2016, Fabian Hues
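
A sketch of that enrichment pattern (LookupClient and its wiring are hypothetical stand-ins for a real service client):

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class EnrichFromService extends RichMapFunction<String, String> {

      /** Hypothetical client for the service fronting the large dataset. */
      interface LookupClient {
        String lookup(String key);
      }

      private transient LookupClient client;

      @Override
      public void open(Configuration parameters) {
        // Connect once per parallel task. Because every record consults the
        // service, updates to the dataset are visible without a job restart.
        client = key -> "attributes-for-" + key;   // stub; a real impl would RPC
      }

      @Override
      public String map(String key) {
        return key + "," + client.lookup(key);
      }
    }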

Re: Flink 0.10.1 and HBase

2016-01-25 Thread Nick Dimiduk
Hi Christophe, What HBase version are you using? Have you looked at using the shaded client jars? Those should at least isolate HBase/Hadoop's Guava version from that used by your application. -n On Monday, January 25, 2016, Christophe Salperwyck < christophe.salperw...@gmail.com> wrote: > Hi a

Re: Frequent exceptions killing streaming job

2016-01-17 Thread Nick Dimiduk
thrown from the consumer, not the runtime. > On Sun, Jan 17, 2016 at 12:42 AM, Nick Dimiduk > wrote: > >> This goes back to the idea that streaming applications should never go >> down. I'd much rather consume at max capacity and knowingly drop some >> portion of

Re: Frequent exceptions killing streaming job

2016-01-16 Thread Nick Dimiduk
catch up. >> What would you as an application developer expect to handle the situation? >> >> >> Just out of curiosity: What's the throughput you have on the Kafka topic? >> >> >> On Fri, Jan 15, 2016 at 10:13 PM, Nick Dimiduk >> wrote: >>

Frequent exceptions killing streaming job

2016-01-15 Thread Nick Dimiduk
Hi folks, I have a streaming job that consumes from a Kafka topic. The topic is pretty active so the local-mode single worker is obviously not able to keep up with the fire-hose. I expect the job to skip records and continue on. However, I'm getting an exception from the LegacyFetcher which kil

Re: Flink v0.10.2

2016-01-14 Thread Nick Dimiduk
I would also find a 0.10.2 release useful :) On Wed, Jan 13, 2016 at 1:24 AM, Welly Tambunan wrote: > Hi Robert, > > We are on deadline for demo stage right now before production for > management so it would be great to have 0.10.2 for stable version within > this week if possible ? > > Cheers >

Re: Sink - Cassandra

2016-01-05 Thread Nick Dimiduk
Hi Sebastian, I've had preliminary success with a streaming job that is Kafka -> Flink -> HBase (actually, Phoenix) using the Hadoop OutputFormat adapter. A little glue was required but it seems to work okay. My guess is it would be the same for Cassandra. Maybe that can get you started? Good luck

Re: Tiny topology shows '0' for all stats.

2015-12-15 Thread Nick Dimiduk
For my own understanding, are you suggesting that FLINK-2944 (or a subtask) is the appropriate place to implement exposure of metrics such as bytes and records in and out of streaming sources and sinks? On Tue, Dec 15, 2015 at 5:24 AM, Niels Basjes wrote: > Hi, > > @Ufuk: I added the env.disableOperato

Re: Published test artifacts for flink streaming

2015-12-14 Thread Nick Dimiduk
Hi Alex, How's your infra coming along? I'd love to up my unit testing game with your improvements :) -n On Mon, Nov 23, 2015 at 12:20 AM, lofifnc wrote: > Hi Nick, > > This is easily achievable using the framework I provide. > createDataStream(Input input) does actually return a > DataStreamS

Re: Accumulators/Metrics

2015-12-14 Thread Nick Dimiduk
environment, I hope this will be released as open source > anytime soon since the Otto Group believes in open source ;-) If you would > like to know more about it, feel free to ask ;-) > > Best > Christian (Kreutzfeldt) > > > Nick Dimiduk wrote > > I'm much more interest

Re: Using Kryo for serializing lambda closures

2015-12-08 Thread Nick Dimiduk
serialize with Java Serialization, but not > out of the box with Kryo (especially when immutable collections are > involved). Also classes that have no default constructors, but have checks > on invariants, etc can fail with Kryo arbitrarily. > > > > On Tue, Dec 8, 2015 at 8

Re: Using Kryo for serializing lambda closures

2015-12-08 Thread Nick Dimiduk
not a satisfying solution, if you would like to use Java > 8 lambda functions. > > Best, Fabian > > 2015-12-08 19:38 GMT+01:00 Nick Dimiduk : > >> That's what I feared. IMO this is very limiting when mixing in other >> projects where a user does not have cont

Re: Using Kryo for serializing lambda closures

2015-12-08 Thread Nick Dimiduk
Serializable objects. > The serializer registration only applies to the data which is processed by > the Flink job. Thus, for the moment I would try to get rid of the > ColumnInfo object in your closure. > > Cheers, > Till > ​ > > On Mon, Dec 7, 2015 at 10:02 PM, Nick Dimiduk

Using Kryo for serializing lambda closures

2015-12-07 Thread Nick Dimiduk
Hello, I've implemented a (streaming) flow using the Java API and Java8 Lambdas for various map functions. When I try to run the flow, job submission fails because of an unserializable type. This is not a type of data used within the flow, but rather a small collection of objects captured in the c
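
The workaround suggested in the replies above generalizes to a sketch like this: keep only serializable constructor arguments in the function object and rebuild the offending object per task in open(), so it never travels in the closure at all (ColumnInfo and its methods are hypothetical):

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class ColumnMapper extends RichMapFunction<String, String> {

      /** Stand-in for the problem type; it need not be Serializable. */
      static class ColumnInfo {
        static ColumnInfo parse(String spec) { return new ColumnInfo(); }
        String extract(String value) { return value; }
      }

      private final String columnSpec;           // serializable constructor arg
      private transient ColumnInfo columnInfo;   // rebuilt per task, never shipped

      public ColumnMapper(String columnSpec) {
        this.columnSpec = columnSpec;
      }

      @Override
      public void open(Configuration parameters) {
        columnInfo = ColumnInfo.parse(columnSpec);
      }

      @Override
      public String map(String value) {
        return columnInfo.extract(value);
      }
    }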

Re: Using Hadoop Input/Output formats

2015-12-04 Thread Nick Dimiduk
>>> > ...HadoopInputFormat[LongWritable, Text](
>>> >   new TextInputFormat,
>>> >   classOf[LongWritable],
>>> >   classOf[Text],
>>> >   new JobConf()
>>> > ))
>>> >
>>> > The Java version is very similar.
>>>

Re: Using Hadoop Input/Output formats

2015-11-24 Thread Nick Dimiduk
page. > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/hadoop_compatibility.html > > > On Nov 25, 2015, at 3:04 AM, Nick Dimiduk wrote: > > > > Hello, > > > > Is it possible to use existing Hadoop Input and OutputFormats with > Fli

Using Hadoop Input/Output formats

2015-11-24 Thread Nick Dimiduk
Hello, Is it possible to use existing Hadoop Input and OutputFormats with Flink? There's a lot of existing code that conforms to these interfaces; it seems a shame to have to re-implement it all. Perhaps some adapter shim..? Thanks, Nick

Re: Published test artifacts for flink streaming

2015-11-20 Thread Nick Dimiduk
Very interesting Alex! One other thing I find useful in building data flows is using "builder" functions that hide the details of wiring up specific plumbing on generic input parameters. For instance: void wireFoo(DataSource source, SinkFunction sink) { ... }. It would be great to have test tools
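
A sketch of such a builder (names illustrative): the production main passes the Kafka stream and the real sink, while a test passes fromElements() input and a collecting sink.

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.sink.SinkFunction;

    public final class FooFlow {
      private FooFlow() {}

      // All the plumbing lives here; callers choose both endpoints.
      public static void wireFoo(DataStream<String> source, SinkFunction<String> sink) {
        source
            .filter(s -> !s.isEmpty())
            .map(String::trim)       // stand-in for the real transformations
            .addSink(sink);
      }
    }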

Re: Published test artifacts for flink streaming

2015-11-18 Thread Nick Dimiduk
(sorry for the unclear terminology) is a sink with > parallelism 1, so all data is collected by the same function instance. > > Any of this helpful? > > Stephan > > > On Wed, Nov 18, 2015 at 5:13 PM, Nick Dimiduk wrote: > >> Sorry Stephan but I don't follow how

Re: Published test artifacts for flink streaming

2015-11-18 Thread Nick Dimiduk
gathers the elements in a list, and the close() function > validates the result. > > Validating the results may involve sorting the list where elements were > gathered (to make the order deterministic) or using a hash set if it is only > about distinct counts. > > Hope that helps.
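
Concretely, such a test sink might look like the sketch below; the static list works because a local-mode test runs the whole flow in one JVM.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    public class CollectingSink extends RichSinkFunction<String> {
      // One shared list; run the sink at parallelism 1 as described above.
      static final List<String> RESULTS =
          Collections.synchronizedList(new ArrayList<String>());

      @Override
      public void invoke(String value) {
        RESULTS.add(value);
      }

      @Override
      public void close() {
        // Sort to make the order deterministic before the test asserts.
        synchronized (RESULTS) {
          Collections.sort(RESULTS);
        }
      }
    }

A test wires it with stream.addSink(new CollectingSink()).setParallelism(1), calls env.execute(), then asserts against CollectingSink.RESULTS.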

Re: Error handling

2015-11-16 Thread Nick Dimiduk
n wrote: >> > >> > Hi Nick! >> > >> > The errors outside your UDF (such as network problems) will be handled >> by Flink and cause the job to go into recovery. They should be >> transparently handled. >> > >> > Just make sure you acti

Re: Creating a representative streaming workload

2015-11-16 Thread Nick Dimiduk
All those should apply for streaming too... On Mon, Nov 16, 2015 at 11:06 AM, Vasiliki Kalavri < vasilikikala...@gmail.com> wrote: > Hi, > > thanks Nick and Ovidiu for the links! > > Just to clarify, we're not looking into creating a generic streaming > benchmark. We have quite limited time and r

Re: Error handling

2015-11-16 Thread Nick Dimiduk
error cases that raise exceptions, apparently outside of my UDF's try block, which cause the whole streaming job to terminate. > > On 11 Nov 2015, at 21:49, Nick Dimiduk wrote: > > > > Heya, > > > > I don't see a section in the online manual dedicated to thi

Re: Creating a representative streaming workload

2015-11-16 Thread Nick Dimiduk
Why not use an existing benchmarking tool -- is there one? Perhaps you'd like to build something like YCSB [0] but for streaming workloads? Apache Storm is the OSS framework that's been around the longest. Search for "apache storm benchmark" and you'll get some promising hits. Looks like IBMStream

Re: Published test artifacts for flink streaming

2015-11-12 Thread Nick Dimiduk
On Thu, Nov 5, 2015 at 6:58 PM, Nick Dimiduk wrote: > Thanks Stephan, I'll check that out in the morning. Generally speaking, it > would be great to have some single-jvm example tests for those of us > getting started. Following the example of WindowingIntegrationTest is > mostl

Re: Accumulators/Metrics

2015-11-12 Thread Nick Dimiduk
messages to query the JobManager on > > a job's status and accumulators. I'm wondering if you two could engage > > in any way. > > > > Cheers, > > Max > > > > On Wed, Nov 11, 2015 at 6:44 PM, Nick Dimiduk > wrote: > >> Hello,

Re: Flink, Kappa and Lambda

2015-11-11 Thread Nick Dimiduk
The first and third points here aren't very fair -- they apply to all data systems. Systems downstream of your database can lose data in the same way; the database retention policy expires old data, downstream fails, and back to the tapes you must go. Likewise with the third, a bug in any ETL system can caus

Error handling

2015-11-11 Thread Nick Dimiduk
Heya, I don't see a section in the online manual dedicated to this topic, so I want to raise the question here: How should errors be handled? Specifically I'm thinking about streaming jobs, which are expected to "never go down". For example, errors can be raised at the point where objects are seri
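
For the UDF-level half of the question, one common shape (a sketch; not a built-in Flink facility) is to catch per-record failures inside the function and drop or divert the bad record rather than let the exception propagate:

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.util.Collector;

    public class SafeParse implements FlatMapFunction<String, Integer> {
      @Override
      public void flatMap(String value, Collector<Integer> out) {
        try {
          out.collect(Integer.parseInt(value.trim()));
        } catch (NumberFormatException e) {
          // Count, log, or route the bad record elsewhere; emitting nothing
          // lets the stream continue instead of killing the job.
        }
      }
    }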

Accumulators/Metrics

2015-11-11 Thread Nick Dimiduk
Hello, I'm interested in exposing metrics from my UDFs. I see FLINK-1501 exposes task manager metrics via a UI; it would be nice to plug into the same MetricRegistry to register my own (ie, gauges). I don't see this exposed via runtime context. This did lead me to discovering the Accumulators API.
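
For reference, registering a custom accumulator from a UDF looks roughly like this (names illustrative):

    import org.apache.flink.api.common.accumulators.IntCounter;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class CountingMap extends RichMapFunction<String, String> {
      private final IntCounter recordsSeen = new IntCounter();

      @Override
      public void open(Configuration parameters) {
        getRuntimeContext().addAccumulator("records-seen", recordsSeen);
      }

      @Override
      public String map(String value) {
        recordsSeen.add(1);
        return value;
      }
    }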

Re: Implementing samza table/stream join

2015-11-11 Thread Nick Dimiduk
repositories for the >> release candidate code: >> >> http://people.apache.org/~mxm/flink-0.10.0-rc8/ >> >> https://repository.apache.org/content/repositories/orgapacheflink-1055/ >> >> Greetings, >> Stephan >> >> >> On Wed, Nov 11, 2015

Re: Implementing samza table/stream join

2015-11-10 Thread Nick Dimiduk
iables > [3] https://gist.github.com/fhueske/4ea5422edb5820915fa4 > > <https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#using-the-keyvalue-state-interface> > > > > 2015-11-10 19:02 GMT+01:00 Nick Dimiduk : > >> Hello, >> >> I&

Implementing samza table/stream join

2015-11-10 Thread Nick Dimiduk
Hello, I'm interested in implementing a table/stream join, very similar to what is described in the "Table-stream join" section of the Samza key-value state documentation [0]. Conceptually, this would be an extension of the example provided in the javadocs for RichFunction#open [1], where I have a
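
The simplest version of that extension might look like the sketch below, with loadTable() as a hypothetical stand-in for reading the table store; Fabian's reply above points to the key/value state interface for the stateful variant.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class TableStreamJoin extends RichMapFunction<String, String> {
      private transient Map<String, String> table;

      @Override
      public void open(Configuration parameters) {
        table = loadTable();   // load (or periodically reload) the table side
      }

      @Override
      public String map(String key) {
        String attrs = table.get(key);
        return key + "," + (attrs != null ? attrs : "<no-match>");
      }

      // Hypothetical stand-in for reading the table from its backing store.
      private static Map<String, String> loadTable() {
        Map<String, String> t = new HashMap<String, String>();
        t.put("user-1", "premium");
        return t;
      }
    }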

Re: Published test artifacts for flink streaming

2015-11-06 Thread Nick Dimiduk
should do the job. Thus, there shouldn’t be a need to dig deeper > than the DataStream for the first version. > > Cheers, > Till > > > On Fri, Nov 6, 2015 at 3:58 AM, Nick Dimiduk wrote: >> >> Thanks Stephan, I'll check that out in the morning. Generally speakin

Re: Published test artifacts for flink streaming

2015-11-05 Thread Nick Dimiduk
an use "DataStreamUtils.collect(stream)", so you need to > stop reading it once you know the stream is exhausted... > > Stephan > > > On Thu, Nov 5, 2015 at 10:38 PM, Nick Dimiduk > wrote: > >> Hi Robert, >> >> It seems "type" was what

Re: Published test artifacts for flink streaming

2015-11-05 Thread Nick Dimiduk
> tag by test-jar:
>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-streaming-core</artifactId>
>   <version>${flink.version}</version>
>   <type>test-jar</type>
>   <scope>test</scope>
> </dependency>
>
> On Thu, Nov 5, 2015 at 8:18 PM, Nick Dimiduk wrote:
>> Hello,
>>
>> I'm attempting integration tests for my

Published test artifacts for flink streaming

2015-11-05 Thread Nick Dimiduk
Hello, I'm attempting integration tests for my streaming flows. I'd like to produce an input stream of java objects and sink the results into a collection for verification via JUnit asserts. StreamExecutionEnvironment provides methods for the former, however, how to achieve the latter is not evide

Re: IT's with POJO's

2015-11-05 Thread Nick Dimiduk
> @Override
> public void run(SourceContext ctx) throws Exception {
>   int i = 0;
>   while (running) {
>     ctx.collect(i++);
>   }
> }
>
> @Override
> public void cancel() {
>   running = false;
> }
> });
>
> Let me know if you need

IT's with POJO's

2015-11-04 Thread Nick Dimiduk
Heya, I'm writing my first flink streaming application and have a flow that passes type checking and compiles. I've written a simple end-to-end test with junit, using StreamExecutionEnvironment#fromElements() to provide a stream of valid and invalid test objects; the objects are POJOs. It seems I