Yes, the approach works fine for my use case; it has been in "production" for
quite some time. My implementation has some untested scenarios around job
restarting and failures, so of course your mileage may vary.
On Mon, May 15, 2017 at 5:58 AM, nragon wrote:
> Hi Nick,
>
> I'm trying to integrate H
Pretty much anything you can write to from a Hadoop MapReduce program can
be a Flink destination. Just plug in the OutputFormat and go.
Re: output semantics, your mileage may vary. Flink should serve you fine for
at-least-once.
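For illustration, the wiring can be as small as the following sketch. It assumes the flink-hadoop-compatibility HadoopOutputFormat wrapper around Hadoop's mapred TextOutputFormat; the output path and the sample elements are placeholders, and writeUsingOutputFormat gives no stronger guarantees than the format itself.

    import org.apache.flink.api.java.hadoop.mapred.HadoopOutputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextOutputFormat;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Point the Hadoop format at its destination; the path is a placeholder.
    JobConf conf = new JobConf();
    FileOutputFormat.setOutputPath(conf, new Path("/tmp/flink-out"));

    // HadoopOutputFormat adapts the mapred OutputFormat to Flink's own OutputFormat interface.
    HadoopOutputFormat<Text, IntWritable> format =
        new HadoopOutputFormat<>(new TextOutputFormat<Text, IntWritable>(), conf);

    env.fromElements(Tuple2.of(new Text("a"), new IntWritable(1)),
                     Tuple2.of(new Text("b"), new IntWritable(2)))
       .writeUsingOutputFormat(format);

    env.execute("hadoop-output-format-sink");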
On Friday, March 11, 2016, Josh wrote:
> Hi all,
>
> I want to use an
Can anyone tell me where I must place my application-specific
log4j.properties to have them honored when running on a YARN cluster?
Putting it in my application jar doesn't work, and neither does editing the
log4j files under flink/conf.
My goal is to set the log level for 'com.mycompany' classes used in my
Flink app.
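For reference, this is the kind of file I'm trying to get honored (a sketch; com.mycompany stands in for the application packages and DEBUG is just an example level):

    log4j.rootLogger=INFO, file
    log4j.appender.file=org.apache.log4j.FileAppender
    log4j.appender.file.file=${log.file}
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n

    # the part I actually care about
    log4j.logger.com.mycompany=DEBUG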
eb 25, 2016 at 6:03 PM, Nick Dimiduk wrote:
>
>> For what it's worth, I dug into the TM logs and found that this exception
>> was not the root cause, merely a symptom of other backpressure building in
>> the flow (actually, lock contention in another part of the stack).
ch.
>
> Sadly, it seems that the Kafka 0.9 consumer API does not yet support
> requesting the latest offset of a TopicPartition. I'll ask about this on
> their ML.
>
>
>
>
> On Sun, Jan 17, 2016 at 8:28 PM, Nick Dimiduk wrote:
>
>> On Sunday, January 17, 20
e file as byte arrays somehow to make it work.
> What read function did you use? The mapper is not hard to write but the
> byte array stuff gives me a headache.
>
> cheers Martin
>
>
>
>
> On Fri, Feb 12, 2016 at 9:12 PM, Nick Dimiduk wrote:
>
>> Hi Martin,
>>
Hi Martin,
I have the same use case. I wanted to be able to load from dumps of data in
the same format as is on the Kafka queue. I created a new application main;
call it the "job" instead of the "flow". I refactored my code a bit for
building the flow so that all of it can be reused via a factory method.
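Roughly, the shape is the following (a sketch, not my actual code; Event, Enrich, MySink, RawBytesInputFormat, the topic, and the dump path are all placeholders):

    // Shared wiring, reused by both entry points.
    public static DataStream<Event> buildFlow(DataStream<byte[]> raw) {
        return raw.map(bytes -> Event.parse(bytes))
                  .keyBy(event -> event.getKey())
                  .map(new Enrich());
    }

    // "Flow" main: live consumption (env is that main's StreamExecutionEnvironment).
    DataStream<byte[]> live = env.addSource(kafkaSource);   // the FlinkKafkaConsumer for the topic
    buildFlow(live).addSink(new MySink());

    // "Job" main: replay a dump written in the same format as the Kafka payloads.
    DataStream<byte[]> replay = env.readFile(new RawBytesInputFormat(), "/path/to/dump");
    buildFlow(replay).addSink(new MySink());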
accordingly. It is not like
>> OutputFormats are dangerous but all SinkFunctions are failure-proof.
>>
>> Consolidating the two interfaces would make sense. It might be a bit late
>> for the 1.0 release because I see that we would need to find a consensus
>> first and t
lerance aware) so
> that it can easily be used with something akin to OutputFormats.
>
> What do you think?
>
> -Aljoscha
> > On 08 Feb 2016, at 19:40, Nick Dimiduk wrote:
> >
> > On Mon, Feb 8, 2016 at 9:51 AM, Maximilian Michels wrote:
> > Changing the cl
are similar. This one is working fine with my test code.
https://gist.github.com/ndimiduk/18820fcd78412c6b4fc3
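For reference, the shape such an adapter can take (a sketch of the idea, not the gist's contents; the class name is mine and the wrapped OutputFormat must be Serializable):

    import org.apache.flink.api.common.io.OutputFormat;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    // Lets an OutputFormat be used wherever a SinkFunction is expected.
    public class OutputFormatSink<T> extends RichSinkFunction<T> {
      private final OutputFormat<T> format;

      public OutputFormatSink(OutputFormat<T> format) {
        this.format = format;
      }

      @Override
      public void open(Configuration parameters) throws Exception {
        // Map the sink's parallel instance onto the format's task slots.
        int task = getRuntimeContext().getIndexOfThisSubtask();
        int numTasks = getRuntimeContext().getNumberOfParallelSubtasks();
        format.configure(parameters);
        format.open(task, numTasks);
      }

      @Override
      public void invoke(T value) throws Exception {
        format.writeRecord(value);
      }

      @Override
      public void close() throws Exception {
        format.close();
      }
    }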
On Mon, Feb 8, 2016 at 6:07 PM, Nick Dimiduk wrote:
>
>> In my case, I have my application code that is calling addSink, for which
>> I'm writing a test that needs to u
I.
>
> If you need the lifecycle methods in streaming, there is
> RichSinkFunction, which implements OutputFormat and SinkFunction. In
> addition, it gives you access to the RuntimeContext. You can pass this
> directly to the "addSink(sinkFunction)" API method.
>
> Cheers
Heya,
Is there a plan to consolidate these two interfaces? They appear to provide
identical functionality, differing only in lifecycle management. I found
myself writing an adaptor so I can consume an OutputFormat where a
SinkFunction is expected; there's not much to it. This seems like code that
If the dataset is too large for a file, you can put it behind a service and
have your stream operators query the service for enrichment. You can even
support updates to that dataset in a style very similar to the "lambda
architecture" discussed elsewhere.
On Thursday, January 28, 2016, Fabian Hues
Hi Christophe,
What HBase version are you using? Have you looked at using the shaded
client jars? Those should at least isolate HBase/Hadoop's Guava version
from that used by your application.
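Pulling in the shaded client is typically just a dependency swap, something like this in a Maven build (the version property is a placeholder):

    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-shaded-client</artifactId>
      <version>${hbase.version}</version>
    </dependency>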
-n
On Monday, January 25, 2016, Christophe Salperwyck <
christophe.salperw...@gmail.com> wrote:
> Hi a
own from the
consumer, not the runtime.
> On Sun, Jan 17, 2016 at 12:42 AM, Nick Dimiduk wrote:
>
>> This goes back to the idea that streaming applications should never go
>> down. I'd much rather consume at max capacity and knowingly drop some
>> portion of
catch up.
>> What would you as an application developer expect to handle the situation?
>>
>>
>> Just out of curiosity: What's the throughput you have on the Kafka topic?
>>
>>
>> On Fri, Jan 15, 2016 at 10:13 PM, Nick Dimiduk
>> wrote:
>>
&
Hi folks,
I have a streaming job that consumes from a Kafka topic. The topic is
pretty active, so the local-mode single worker is obviously not able to keep
up with the fire hose. I expect the job to skip records and continue on.
However, I'm getting an exception from the LegacyFetcher which kil
I would also find a 0.10.2 release useful :)
On Wed, Jan 13, 2016 at 1:24 AM, Welly Tambunan wrote:
> Hi Robert,
>
> We are on deadline for demo stage right now before production for
> management so it would be great to have 0.10.2 for stable version within
> this week if possible ?
>
> Cheers
>
Hi Sebastian,
I've had preliminary success with a streaming job that is Kafka -> Flink ->
HBase (actually, Phoenix) using the Hadoop OutputFormat adapter. A little
glue was required but it seems to work okay. My guess is it would be the
same for Cassandra. Maybe that can get you started?
Good luck
For my own understanding, are you suggesting that FLINK-2944 (or a subtask)
is the appropriate place to implement exposure of metrics such as bytes and
records in/out of streaming sources and sinks?
On Tue, Dec 15, 2015 at 5:24 AM, Niels Basjes wrote:
> Hi,
>
> @Ufuk: I added the env.disableOperato
Hi Alex,
How's your infra coming along? I'd love to up my unit testing game with
your improvements :)
-n
On Mon, Nov 23, 2015 at 12:20 AM, lofifnc wrote:
> Hi Nick,
>
> This is easily achievable using the framework I provide.
> createDataStream(Input input) does actually return a
> DataStreamS
onment, I hope this will be released as open source
> anytime soon since the Otto Group believes in open source ;-) If you would
> like to know more about it, feel free to ask ;-)
>
> Best
> Christian (Kreutzfeldt)
>
>
> Nick Dimiduk wrote
> > I'm much more interest
serialize with Java Serialization, but not
> out of the box with Kryo (especially when immutable collections are
> involved). Also classes that have no default constructors, but have checks
> on invariants, etc can fail with Kryo arbitrarily.
>
>
>
> On Tue, Dec 8, 2015 at 8
not a satisfying solution, if you would like to use Java
> 8 lambda functions.
>
> Best, Fabian
>
> 2015-12-08 19:38 GMT+01:00 Nick Dimiduk :
>
>> That's what I feared. IMO this is very limiting when mixing in other
>> projects where a user does not have cont
Serializable objects.
> The serializer registration only applies to the data which is processed by
> the Flink job. Thus, for the moment I would try to get rid of the
> ColumnInfo object in your closure.
>
> Cheers,
> Till
>
>
> On Mon, Dec 7, 2015 at 10:02 PM, Nick Dimiduk
Hello,
I've implemented a (streaming) flow using the Java API and Java8 Lambdas
for various map functions. When I try to run the flow, job submission fails
because of an unserializable type. This is not a type of data used within
the flow, but rather a small collection of objects captured in the c
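For context, the usual way I know to avoid this is to capture only something Serializable and rebuild the offending object on the worker in a RichFunction's open(). A sketch (ColumnInfo, Row, and the parse/project helpers are stand-ins for the real types):

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class ProjectingMapper extends RichMapFunction<Row, Row> {
      private final String columnSpec;            // Serializable description, shipped with the job
      private transient ColumnInfo columnInfo;    // rebuilt per task, never serialized

      public ProjectingMapper(String columnSpec) {
        this.columnSpec = columnSpec;
      }

      @Override
      public void open(Configuration parameters) throws Exception {
        // Reconstruct the non-serializable object on the task manager.
        columnInfo = ColumnInfo.parse(columnSpec);
      }

      @Override
      public Row map(Row value) throws Exception {
        return value.project(columnInfo);
      }
    }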
nputFormat[LongWritable, Text](
>>> > new TextInputFormat,
>>> > classOf[LongWritable],
>>> > classOf[Text],
>>> > new JobConf()
>>> > ))
>>> >
>>> > The Java version is very similar.
>>>
page.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/hadoop_compatibility.html
>
> > On Nov 25, 2015, at 3:04 AM, Nick Dimiduk wrote:
> >
> > Hello,
> >
> > Is it possible to use existing Hadoop Input and OutputFormats with
> Fli
Hello,
Is it possible to use existing Hadoop Input and OutputFormats with Flink?
There's a lot of existing code that conforms to these interfaces; it seems a
shame to have to re-implement it all. Perhaps some adapter shim...?
Thanks,
Nick
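A sketch of the flink-hadoop-compatibility route referenced up-thread (mapred API; the input path is a placeholder):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    JobConf jobConf = new JobConf();
    FileInputFormat.addInputPath(jobConf, new Path("hdfs:///data/in"));

    // Wrap the existing mapred InputFormat so Flink can read from it directly.
    HadoopInputFormat<LongWritable, Text> input =
        new HadoopInputFormat<>(new TextInputFormat(), LongWritable.class, Text.class, jobConf);

    DataSet<Tuple2<LongWritable, Text>> lines = env.createInput(input);
    lines.print();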
Very interesting Alex!
One other thing I find useful in building data flows is using "builder"
functions that hide the details of wiring up specific plumbing on generic
input parameters. For instance, a void wireFoo(DataSource source,
SinkFunction sink) { ... }. It would be great to have test tools
ry for the unclear terminology) is a sink with
> parallelism 1, so all data is collected by the same function instance.
>
> Any of this helpful?
>
> Stephan
>
>
> On Wed, Nov 18, 2015 at 5:13 PM, Nick Dimiduk wrote:
>
>> Sorry Stephan but I don't follow how
athers the elements in a list, and the close() function
> validates the result.
>
> Validating the results may involve sorting the list where elements were
> gathered (to make the order deterministic) or using a hash set if it is only
> about distinct counts.
>
> Hope that helps.
&g
n wrote:
>> >
>> > Hi Nick!
>> >
>> > The errors outside your UDF (such as network problems) will be handled
>> by Flink and cause the job to go into recovery. They should be
>> transparently handled.
>> >
>> > Just make sure you acti
All those should apply for streaming too...
On Mon, Nov 16, 2015 at 11:06 AM, Vasiliki Kalavri <
vasilikikala...@gmail.com> wrote:
> Hi,
>
> thanks Nick and Ovidiu for the links!
>
> Just to clarify, we're not looking into creating a generic streaming
> benchmark. We have quite limited time and r
error cases that cause exceptions,
apparently, outside of my UDF try block that cause the whole streaming job
to terminate.
> > On 11 Nov 2015, at 21:49, Nick Dimiduk wrote:
> >
> > Heya,
> >
> > I don't see a section in the online manual dedicated to thi
Why not use an existing benchmarking tool -- is there one? Perhaps you'd
like to build something like YCSB [0] but for streaming workloads?
Apache Storm is the OSS framework that's been around the longest. Search
for "apache storm benchmark" and you'll get some promising hits. Looks like
IBMStream
On Thu, Nov 5, 2015 at 6:58 PM, Nick Dimiduk wrote:
> Thanks Stephan, I'll check that out in the morning. Generally speaking, it
> would be great to have some single-jvm example tests for those of us
> getting started. Following the example of WindowingIntegrationTest is
> mostl
messages to query the JobManager on
> > a job's status and accumulators. I'm wondering if you two could engage
> > in any way.
> >
> > Cheers,
> > Max
> >
> > On Wed, Nov 11, 2015 at 6:44 PM, Nick Dimiduk
> wrote:
> >> Hello,
> >&g
The first and third points here aren't very fair -- they apply to all data
systems. Systems downstream of your database can lose data in the same way:
the database retention policy expires old data, downstream fails, and back
to the tapes you must go. Likewise with the third, a bug in any ETL system can
caus
Heya,
I don't see a section in the online manual dedicated to this topic, so I
want to raise the question here: How should errors be handled? Specifically
I'm thinking about streaming jobs, which are expected to "never go down".
For example, errors can be raised at the point where objects are seri
Hello,
I'm interested in exposing metrics from my UDFs. I see FLINK-1501 exposes
task manager metrics via a UI; it would be nice to plug into the same
MetricRegistry to register my own (ie, gauges). I don't see this exposed
via runtime context. This did lead me to discovering the Accumulators API.
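For the record, the counter flavor of that API is straightforward from any rich function; a sketch (a counter rather than a gauge, since that is what accumulators readily support):

    import org.apache.flink.api.common.accumulators.LongCounter;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class CountingMapper extends RichMapFunction<String, String> {
      private final LongCounter seen = new LongCounter();

      @Override
      public void open(Configuration parameters) throws Exception {
        // Register the accumulator; results are reported per job via the JobManager.
        getRuntimeContext().addAccumulator("records-seen", seen);
      }

      @Override
      public String map(String value) throws Exception {
        seen.add(1L);
        return value;
      }
    }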
repositories for the
>> release candidate code:
>>
>> http://people.apache.org/~mxm/flink-0.10.0-rc8/
>>
>> https://repository.apache.org/content/repositories/orgapacheflink-1055/
>>
>> Greetings,
>> Stephan
>>
>>
>> On Wed, Nov 11, 2015
iables
> [3] https://gist.github.com/fhueske/4ea5422edb5820915fa4
>
> <https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#using-the-keyvalue-state-interface>
>
>
>
> 2015-11-10 19:02 GMT+01:00 Nick Dimiduk :
>
>> Hello,
>>
>> I&
Hello,
I'm interested in implementing a table/stream join, very similar to what is
described in the "Table-stream join" section of the Samza key-value state
documentation [0]. Conceptually, this would be an extension of the example
provided in the javadocs for RichFunction#open [1], where I have a
should do the job. Thus, there shouldn’t be a need to dig deeper
> than the DataStream for the first version.
>
> Cheers,
> Till
>
>
> On Fri, Nov 6, 2015 at 3:58 AM, Nick Dimiduk wrote:
>>
>> Thanks Stephan, I'll check that out in the morning. Generally speakin
an use "DataStreamUtils.collect(stream)", so you need to
> stop reading it once you know the stream is exhausted...
>
> Stephan
>
>
> On Thu, Nov 5, 2015 at 10:38 PM, Nick Dimiduk wrote:
>
>> Hi Robert,
>>
>> It seems "type" was what
> tag by test-jar:
>
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-streaming-core</artifactId>
>     <version>${flink.version}</version>
>     <type>test-jar</type>
>     <scope>test</scope>
> </dependency>
>
>
> On Thu, Nov 5, 2015 at 8:18 PM, Nick Dimiduk wrote:
>>
>> Hello,
>>
>> I'm attempting integration tests for my
Hello,
I'm attempting integration tests for my streaming flows. I'd like to
produce an input stream of Java objects and sink the results into a
collection for verification via JUnit asserts.
StreamExecutionEnvironment provides methods for the former, however,
how to achieve the latter is not evide
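One pattern that fits here is a sink that appends to a static collection which the test asserts on after execute() returns; a sketch (class names and the local environment setup are mine, parallelism pinned to 1 to keep ordering deterministic):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.SinkFunction;
    import org.junit.Assert;
    import org.junit.Test;

    public class FlowIT {

      // Static because the sink instance is serialized and executed by the local
      // cluster, so instance fields would not be visible to the test thread.
      private static final List<Integer> RESULTS =
          Collections.synchronizedList(new ArrayList<Integer>());

      public static class CollectSink implements SinkFunction<Integer> {
        @Override
        public void invoke(Integer value) throws Exception {
          RESULTS.add(value);
        }
      }

      @Test
      public void testFlow() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(1);
        env.fromElements(1, 2, 3).addSink(new CollectSink());
        env.execute("collect-sink-test");

        Assert.assertEquals(Arrays.asList(1, 2, 3), new ArrayList<Integer>(RESULTS));
      }
    }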
> @Override
> public void run(SourceContext<Integer> ctx) throws Exception {
>     int i = 0;
>     while (running) {
>         ctx.collect(i++);
>     }
> }
>
> @Override
> public void cancel() {
>     running = false;
> }
> });
>
>
> Let me know if you need
Heya,
I'm writing my first Flink streaming application and have a flow that
passes type checking and compiles. I've written a simple end-to-end
test with JUnit, using StreamExecutionEnvironment#fromElements() to
provide a stream of valid and invalid test objects; the objects are
POJOs.
It seems I