d almost completely
>> > > lose insight into what was going on if there was a slowdown.
>> > > 3. The current approach we are using for creating dynamic jobs is
>> > > building a common jar and then starting it with the configuration
>> > > data for the individual job. Does this sound reasonable?
>> > >
>> > >
>> > > If any of these questions are answered elsewhere, I apologize; I
>> > > couldn't find any of this being discussed elsewhere.
>> > >
>> > > Thanks for your help.
>> > >
>> > > David
>> > --
>> > Konstantin Knauf * konstantin.kn...@tngtech.com * TNG Technology Consulting GmbH
>>
>
>
--
Jamie Grier
data Artisans, Director of Applications Engineering
@jamiegrier <https://twitter.com/jamiegrier>
ja...@data-artisans.com
When I use start-cluster.sh in git cmd, it asks me to type in the
>> password for the other machines. After I did, it showed "nohup: command not
>> found".
>>
>>
>> I am just wondering: is it possible to set up clusters on Windows
>> machines? If yes, can anyone point me
from both streams?
>
> 3. Is state 'operator'-scoped?
>
> - source -> KeyBy -> mapWithStateX_1 -> keyBy -> mapWithStateX_2
>
> Assume both maps try to use state with the same name
> (ValueStateDescriptor(“StateX”…)). But despite that, they will have
> different
aggregating the logs
> for
> one hour before starting to process?
>
> There is no direct way to turn a DataStream into a DataSet. I addressed
the point about doing the computation incrementally above, though. You do
this with a ReduceFunction. But again, there doesn't exist an
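For illustration, a rough sketch of that incremental-reduce shape (types,
names, and values here are invented, not from the original thread):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class IncrementalHourlyAgg {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(Tuple2.of("host-a", 1L), Tuple2.of("host-a", 3L))
           .keyBy(0)                  // key by log source
           .timeWindow(Time.hours(1)) // one-hour windows
           // the ReduceFunction runs per element, so the window keeps only
           // the running aggregate, not a full hour of buffered input
           .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))
           .print();

        env.execute("incremental-hourly-agg");
    }
}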
me. As the input streams update their event time, so does the operator."
>
> https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/event_time.html#watermarks-in-parallel-streams
>
> Thanks for your help,
> Chris
>
>
>
>
-Jamie
-Jamie
chedEnd;
> }
> }
>
> This class returns the content of the whole file as a string.
>
> Is this the right approach?
> It seems to work when run locally with local files but I wonder if it would
> run into problems when tested in a cluster.
>
> Thanks in advance.
Hi all,
I'm looking to learn if/how others are running Flink jobs in such a way
that they can survive failure of a single Amazon AWS availability zone.
If you're currently doing this I would love a reply detailing your setup.
Thanks!
-Jamie
Hey Cliff, can you provide the stack trace of the issue you were seeing?
We recently ran into a similar issue that we're still debugging. Did it
look like this:
java.lang.IllegalStateException: Could not initialize operator state
> backend.
> at
> org.apache.flink.streaming.api.operators.Abstract
Anybody else seen this? I'm running both the JM and TM on the same host in
this setup. This was working fine w/ Flink 1.5.3.
On the TaskManager:
00:31:30.268 INFO o.a.f.r.t.TaskExecutor - Could not resolve
ResourceManager address akka.tcp://flink@localhost:6123/user/resourcemanager,
retrying i
Anybody else seen this and know the solution? We're dead in the water with
Flink 1.5.4.
On Sun, Sep 23, 2018 at 11:46 PM alex wrote:
> We started to see the same errors after upgrading to Flink 1.6.0 from 1.4.2. We
> have one JM and 5 TMs on Kubernetes. The JM is running in HA mode. Taskmanagers
> somet
interpreted
as the hostname for the jobmanager to bind to.
The solution was just to remove `cluster` from that command.
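For context, the before/after probably looks something like this (the exact
command is reconstructed from the description above, not quoted from logs):

# broken: "cluster" gets parsed as the hostname for the JobManager to bind to
bin/jobmanager.sh start cluster

# fixed: drop the stray argument
bin/jobmanager.sh start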
On Tue, Sep 25, 2018 at 10:15 AM Jamie Grier wrote:
> Anybody else seen this and know the solution? We're dead in the water
> with Flink 1.5.4.
>
> On Sun, Sep 23,
Hi Vishal,
No, there is no way to do this currently.
On Wed, Nov 21, 2018 at 10:22 AM Vishal Santoshi
wrote:
> Any one ?
>
> On Tue, Nov 20, 2018 at 12:48 PM Vishal Santoshi <
> vishal.santo...@gmail.com> wrote:
>
>> Is it possible to have checkpointing but reset the kafka offsets to
>> latest
Hi Chang,
The partitioning steps, like keyBy(), are not operators. In general you can
let Flink's fluent-style API tell you the answer: if you can call .uid()
in the API and it compiles, then the thing just before it is an operator ;)
-Jamie
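A tiny sketch of that compile-time check (contents invented for illustration):

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UidCheck {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "a").uid("source")  // source is an operator: uid() compiles
           .keyBy(new KeySelector<String, String>() {  // keyBy is a partitioning step:
               @Override                               // the returned KeyedStream has no uid()
               public String getKey(String value) { return value; }
           })
           .map(String::toUpperCase).uid("uppercase")  // map is an operator again
           .print();

        env.execute("uid-check");
    }
}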
On Wed, Nov 21, 2018 at 5:59 AM Chang Liu wrote:
Hi Avi,
The typical approach would be as you've described in #1. #2 is not
necessary -- #1 is already doing basically exactly that.
-Jamie
On Wed, Nov 21, 2018 at 3:36 AM Avi Levi wrote:
> Hi ,
> I am very new to flink so please be gentle :)
>
> *The challenge:*
> I have a road sensor that s
What you're describing is not possible. There is no runtime context or
metrics you can use at that point.
The best you can probably do (at least for start time) is just keep a flag
in your function and log a metric once and only once when it first starts
executing.
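A rough sketch of that flag idea (class and metric names are made up):

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Gauge;

public class StartMarkingMapper extends RichMapFunction<String, String> {
    private transient boolean started;

    @Override
    public void open(Configuration parameters) {
        started = false; // reset on every (re)start of this task instance
    }

    @Override
    public String map(String value) {
        if (!started) {
            started = true;
            final long startMillis = System.currentTimeMillis();
            // expose the moment the first element was actually processed
            getRuntimeContext().getMetricGroup().gauge("firstElementProcessedAt",
                new Gauge<Long>() {
                    @Override
                    public Long getValue() { return startMillis; }
                });
        }
        return value;
    }
}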
On Wed, Nov 21, 2018 at 5:18 A
Flink is designed such that local state is backed up to a highly available
system such as HDFS or S3. When a TaskManager fails, state is recovered
from there.
I suggest reading this:
https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/state/state_backends.html
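A minimal sketch of that setup in code (the checkpoint path and interval are
example values):

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // snapshot state to a highly available filesystem
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
        env.enableCheckpointing(60_000); // checkpoint every 60 seconds
        // ... build and execute the job as usual ...
    }
}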
On Fri, Jan 11, 2019 at
There are a lot of different ways to deploy Flink. It would be easier to
answer your question with a little more context about your use case but in
general I would advocate the following:
1) Don't run a "permanent" Flink cluster and then submit jobs to it.
Instead what you should do is run an "ephemeral" cluster per job.
e, Jan 15, 2019 at 6:27 AM bastien dine wrote:
> Hello Jamie,
>
> Does #1 apply to batch jobs too?
>
> Regards,
>
> --
>
> Bastien DINE
> Data Architect / Software Engineer / Sysadmin
> bastiendine.io
>
>
> On Mon, Jan 14, 2019 at 8:39 PM,
Avi,
The stack trace there is pretty much a red herring. That happens whenever
a job shuts down for any reason and is not a root cause. To diagnose this
you will want to look at all the TaskManager logs as well as the JobManager
logs. If you have a way to easily grep these (all of them at once)
+1 to what Zhenghua said. You're abusing the metrics system I think.
Rather just do a stream.keyBy().sum() and then write a Sink to do something
with the data -- for example push it to your metrics system if you wish.
However, from experience, many metrics systems don't like that sort of
thing.
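A minimal sketch of that shape (the events and the final sink are placeholders):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyedCountToSink {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("click", "view", "click")
           .map(e -> Tuple2.of(e, 1L))
           .returns(Types.TUPLE(Types.STRING, Types.LONG))
           .keyBy(0) // key by event type
           .sum(1)   // running count per key
           // swap print() for a SinkFunction that pushes to your metrics system
           .print();

        env.execute("keyed-count");
    }
}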
I don't think I understood all of your question but with regard to the
watermarking and keys.. You are correct that watermarking (event time
advancement) is not per key. Event-time is a local property of each Task
in an executing Flink job. It has nothing to do with keys. It has only to
do with
If I'm understanding you correctly, you're just trying to do some data
reduction so that you write data for each key once every five minutes
rather than for every CDC update. Is that correct? You also want to keep
the state for the most recent key you've ever seen so you don't apply
writes out of order.
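If that reading is right, a rough sketch with a keyed process function and a
processing-time timer (all types and names assumed):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// emits at most one (latest) update per key every 5 minutes
public class FiveMinuteReducer extends KeyedProcessFunction<String, String, String> {
    private transient ValueState<String> latest;
    private transient ValueState<Long> pendingTimer;

    @Override
    public void open(Configuration parameters) {
        latest = getRuntimeContext().getState(
            new ValueStateDescriptor<>("latest", String.class));
        pendingTimer = getRuntimeContext().getState(
            new ValueStateDescriptor<>("pendingTimer", Long.class));
    }

    @Override
    public void processElement(String update, Context ctx, Collector<String> out)
            throws Exception {
        latest.update(update); // keep only the most recent update per key
        if (pendingTimer.value() == null) {
            long fireAt = ctx.timerService().currentProcessingTime() + 5 * 60 * 1000L;
            ctx.timerService().registerProcessingTimeTimer(fireAt);
            pendingTimer.update(fireAt);
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out)
            throws Exception {
        out.collect(latest.value()); // flush the newest value seen this interval
        pendingTimer.clear();
    }
}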
I'm not sure if this is required. It's quite convenient to be able to just
grab a single tarball and you've got everything you need.
I just did this for the latest binary release and it was 273MB and took
about 25 seconds to download. Of course I know connection speeds vary
quite a bit but I don
e. Timestamps are from one month ago. And I'm searching for a way to
> dump this data into a working Flink application which has already processed
> this data (watermarks are far away from those dates).
>
> On Fri 18. Jan 2019 at 03:22, Jamie Grier wrote:
>
>> I don't think I
Oh sorry.. Logically, since the ContinuousProcessingTimeTrigger never
PURGES but only FIRES, what I said is semantically true. The window
contents are never cleared.
What I missed is that in this case, since you're using a function that
incrementally reduces on the fly rather than processing all the buffered
elements at once, the never-cleared window state is just the single
reduced value.
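For reference, the shape being discussed looks roughly like this (window
sizes and types are illustrative):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger;

public class EarlyFiringReduce {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(Tuple2.of("k", 1L), Tuple2.of("k", 2L))
           .keyBy(0)
           .timeWindow(Time.hours(1))
           // FIREs every minute without PURGE; because the reduce keeps only
           // one accumulated value per window, never clearing stays cheap
           .trigger(ContinuousProcessingTimeTrigger.of(Time.minutes(1)))
           .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))
           .print();

        env.execute("early-firing-reduce");
    }
}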
Sorry my earlier comment should read: "It would just read all the files in
order and NOT worry about which data rows are in which files"
On Fri, Jan 18, 2019 at 10:00 AM Jamie Grier wrote:
> Hmm.. I would have to look into the code for the StreamingFileSink more
> closely t
gt; Keep in mind that every snapshot is written to a different folder. And
> they are supposed to represent the state of the whole table at a point in
> time.
>
> On Fri, Jan 18, 2019, 8:26 AM Jamie Grier
>> Oh sorry.. Logically, since the ContinuousProcessingTimeTrigger neve
Run each job individually as described here:
https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/deployment/yarn_setup.html#run-a-single-flink-job-on-yarn
Yes they will run concurrently and be completely isolated from each other.
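From the linked docs, the per-job submission looks roughly like this (the
container counts and memory sizes are just example values):

./bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096 ./path/to/your-job.jar

Each such submission starts its own YARN application, so the jobs share
nothing but the underlying YARN cluster.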
-Jamie
On Sun, Jan 27, 2019 at 6:08 AM Eran Twili
wr
Vishal, the answer to your question about IngestionTime is "no".
Ingestion time in this context means the time the data was read by Flink,
not the time it was written to Kafka.
To get the effect you're looking for you have to set
TimeCharacteristic.EventTime and follow the instructions here:
https
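A minimal sketch of that setup (the record type and timestamp field are
invented for illustration):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

public class EventTimeSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        env.fromElements(Tuple2.of("a", 1_000L), Tuple2.of("b", 2_000L))
           // use the timestamp carried in the record (f1),
           // tolerating 10 seconds of out-of-orderness
           .assignTimestampsAndWatermarks(
               new BoundedOutOfOrdernessTimestampExtractor<Tuple2<String, Long>>(
                       Time.seconds(10)) {
                   @Override
                   public long extractTimestamp(Tuple2<String, Long> record) {
                       return record.f1;
                   }
               })
           .print();

        env.execute("event-time");
    }
}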
This is awesome, Stephan! Thanks for doing this.
-Jamie
On Tue, Feb 26, 2019 at 9:29 AM Stephan Ewen wrote:
> Here is the pull request with a draft of the roadmap:
> https://github.com/apache/flink-web/pull/178
>
> Best,
> Stephan
>
> On Fri, Feb 22, 2019 at 5:18 AM Hequn Cheng wrote:
>
>> H
> <dependency>
>     <groupId>com.h2database</groupId>
>     <artifactId>h2</artifactId>
>     <scope>runtime</scope>
> </dependency>
> <dependency>
>     <groupId>org.apache.activemq</groupId>
>     <artifactId>activemq-broker</artifactId>
> </dependency>
> <dependency>
>     <groupId>org.apache.activemq</groupId>
>     <artifactId>activemq-pool</artifactId>
> </dependency>
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-streaming-java_2.11</artifactId>
>     <version>${flink.version}</version>
> </dependency>
.api.java.tuple.Tuple]*
>
> Is this a Scala issue? Should I switch over to Java?
>
>
> Thanks!
> Eamon
>
-Jamie
user data for training
> models and for aggregating real time user events into features for model
> execution.
>
> thanks, Mindis
>
>
>
>
-Jamie
-Jamie
>
> Thanks,
> Prabhu
recovery a function
> of HA?
>
> 5. How would I migrate the RocksDB state once I move to HA mode? Is there
> a straight forward path?
>
> Thanks for your time,
>
> Ryan
>
-Jamie
-Jamie
ble
> estimation. This approach is based on estimation and may add execution
> latency to those windows.
>
> Which would be suggested way in general?
>
> Thanks,
> Chen
>
>
>
>
-Jamie
trigger a savepoint. I am not sure this is the way to go. My
>> > state backend is HDFS and I can see that the checkpoint path has the
>> > data that has been buffered in the window.
>> >
>> > I want to start the job in a way such that it will read the
es (>4MB).
>
> How can I disable the progress messages displayed in the console from my
> Java code (through the Configuration object)?
>
> Thanks,
> Andres
>
-Jamie
n this, for every symbol I
> want to find the max of the stock price in the last 10 mins. I want to
> generate watermarks specific to each key rather than across the stream. Is
> this possible in Flink?
>
> --
> Regards,
> Madhukara Phatak
> http://datamantra.io/
>
-Jamie
running application? Should I start multiple execution contexts?
>
>
>
>
> way to set the JVM opts for YARN?
>
> Thanks,
> Prabhu
>
> On Wed, Aug 3, 2016 at 7:03 PM, Prabhu V wrote:
>
>> Hi,
>>
>> Is there a way to set JVM options on the YARN application manager and
>> task manager with Flink?
>>
>> Thanks,
containers.
>
> Thanks,
> Prabhu
>
> On Thu, Aug 4, 2016 at 3:07 PM, Jamie Grier
> wrote:
>
>> Use *env.java.opts*
>>
>> This will be respected by the YARN client.
>>
>>
>>
>> On Thu, Aug 4, 2016 at 11:10 AM, Prabhu
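For reference, that option lives in flink-conf.yaml; a sketch with example
values only:

# flink-conf.yaml
env.java.opts: -XX:+UseG1GC -Dsome.property=value

Later versions also offer jobmanager/taskmanager-specific variants of this
option if the two need different settings.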
some hints on how Flink manages the window buffer and
> how streaming manages its memory. I see this page on batch API memory
> management and wonder what the equivalent is for streaming:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741525
>
> --
> Cheers,
>
this be done?
>
The typical way to do this is to consume that configuration as a stream and
hold the configuration internally in the state of a particular user
function.
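A common shape of that pattern, sketched (types and thresholds invented; a
production version would keep the config in checkpointed Flink state):

import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

// data records arrive on input 1, configuration updates on input 2
public class ConfigurableFilter implements CoFlatMapFunction<String, Integer, String> {
    private int maxLength = Integer.MAX_VALUE; // current configuration

    @Override
    public void flatMap1(String value, Collector<String> out) {
        if (value.length() <= maxLength) {
            out.collect(value);
        }
    }

    @Override
    public void flatMap2(Integer newMax, Collector<String> out) {
        maxLength = newMax; // update the held configuration
    }
}

Wired up as something like:
dataStream.connect(configStream).flatMap(new ConfigurableFilter());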
>
> Thanks
>
-Jamie
. Please advise on the better approach to handle these kinds of
> scenarios and how other applications are handling them. Thanks.
>
-Jamie
;task_name"
>
> PS: $subtask is the templating variable that I'm using in order to have
> multiple subtask values. I have tried the 'All' option for this templating
> variable: this gives me an incorrect plot showing negative values, while
> individually selecting subtask values from the templating variable
> drop-down yields the correct result.
>
> Thank you!
>
> Regards,
> Anchit
>
>
>
-Jamie
Another note. In the example the template variable type is "custom" and
the values have to be enumerated manually. So in your case you would have
to configure all the possible values of "subtask" to be 0-49.
On Tue, Nov 1, 2016 at 2:43 PM, Jamie Grier wrote:
> This
> [attachment: Correct_for_specific_subtask.png]
>
> Regards,
> Anchit
>
>
>
er that fires on every new element, with up
> to 10 elements at a time. The result would be windows of sizes: 1 element,
> then 2, 3, ..., 9, 10, 10, 10, ... Is there a way to achieve this with
> predefined triggers, or is a custom trigger the only way to go here?
>
> Best regards,
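For what it's worth, a predefined combination that produces exactly that
pattern is a sliding count window, sketched here (types and values assumed):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlidingCountWindows {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(Tuple2.of("k", 1L), Tuple2.of("k", 2L), Tuple2.of("k", 3L))
           .keyBy(0)
           // size 10, slide 1: fires on every element with the last up-to-10
           // elements, i.e. windows of 1, 2, ..., 10, 10, 10, ...
           .countWindow(10, 1)
           .sum(1)
           .print();

        env.execute("sliding-count-windows");
    }
}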
er, not the way it assigns windows, it makes sense
> now.
>
> Regarding #4, after doing some more tests I think it's more complex than I
> first thought. I'll probably create another thread explaining that
> specific question in more detail.
>
> Thanks,
> Matt
>
> On Wed, De
multiple sinks as well?
>
> On Dec 14, 2016 10:46 PM, wrote:
>
>> Got it. Thanks!
>>
>> On Dec 15, 2016, at 02:58, Jamie Grier wrote:
>>
>> Ahh, sorry, for #2: A single Flink job can have as many sources as you
>> like. They can be combined in multiple ways.
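A quick sketch of two sources combined (the sources here are stand-ins):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MultiSource {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> a = env.fromElements("from-source-a");
        DataStream<String> b = env.fromElements("from-source-b");

        // union() merges streams of the same type; connect() pairs two
        // differently-typed streams for a CoMap/CoFlatMap
        a.union(b).print();

        env.execute("multi-source");
    }
}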
flink-docs-release-1.1/concepts/concepts.html
On Thu, Dec 15, 2016 at 3:21 PM, Jamie Grier
wrote:
> All streams can be parallelized in Flink even with only one source. You
> can have multiple sinks as well.
>
> On Thu, Dec 15, 2016 at 7:56 AM, Meghashyam Sandeep V <
> vr1megh
here would
> be any potential scaling limitations as the processing capacity increases.
>
> Thanks
> Govind
>
-Jamie
code on GitHub (test files) where it's done using the
> underlying Akka framework. I don't mind doing it the same way and creating
> an actor to get notification messages, but I don't know the best way, and
> there probably is a better one.
>
>
>
> Thanks in advance,
>
>
>
essions identified for windows.
>
> 4. I also may have an additional requirement of writing out each event
> enriched with current session and profile data. I basically could do this
> again with a generic window function and write out each event with the
> collector when iterating, b
--
> While this executes, it breaks the assignment of the keys to the tasks:
> The "ExpensiveOperation" is now not executed on the same nodes anymore all
> the time (visible by the prefixes in the print()).
>
> What am I doing wrong? Is the only chance to set the whole parallel
the topic to write to, and for that I need to be able to read
> the key. Is there a way to do this?
>
>
> Is there a better way to do this, rather than using a KeyedStream?
>
>
> Paul
>
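For what it's worth, the Kafka producer's KeyedSerializationSchema from that
era exposes a per-record topic hook, so a KeyedStream shouldn't be needed; a
sketch (record type and routing logic are invented):

import java.nio.charset.StandardCharsets;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;

// routes each record to a topic derived from its content
public class KeyRoutingSchema implements KeyedSerializationSchema<String> {
    @Override
    public byte[] serializeKey(String element) {
        return element.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public byte[] serializeValue(String element) {
        return element.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String getTargetTopic(String element) {
        // pick the destination topic per record; returning null
        // falls back to the producer's default topic
        return element.startsWith("a") ? "topic-a" : "topic-b";
    }
}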
-Jamie