You are right Aljoscha. The job graph is split after introducing the partitioner.
I was under the impression that if parallelism is set to 1, everything would be chained
together.
Can you explain how data will flow for map -> partitioner -> flatmap if
parallelism is 1? It would also be great if you could point me to the right doc.
Hi Gordon,
any news on this?
On Mon, Jun 12, 2017 at 9:54 AM, Tzu-Li (Gordon) Tai
wrote:
> This seems like a shading problem then.
> I’ve tested this again with Maven 3.0.5, even without building against CDH
> Hadoop binaries the flink-dist jar contains non-shaded Guava dependencies.
>
> Let me
Hi Aljoscha,
we're still investigating possible solutions here. Yes, as you correctly
said, there are links between data of different keys, so we can only proceed
with the next job once we are 100% sure that all input data has
been consumed and no other data will be read until this last jobs
Hi Aljoscha,
thanks for the great suggestions, I wasn't aware of
AsyncDataStream.unorderedWait
and BucketingSink setBucketer().
Most probably that's exactly what I was looking for... (I just need to find
the time to test it.)
Just one last question: what are you referring to with "you could use a
diffe
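
For reference, a minimal sketch of what wiring up a BucketingSink with a bucketer
could look like; the output path, the stream variable and the date format are
placeholders, not from this thread, and package names can differ between Flink versions:

    import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
    import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

    // Write enriched records to HDFS, one sub-directory per hour.
    BucketingSink<String> sink = new BucketingSink<>("hdfs:///enriched");
    sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HH"));
    enrichedStream.addSink(sink);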
Hi Stefan
I could solve the issue by building frocksdb with a patch for the ppc
architecture. However, this patch is already applied to the latest version
of rocksdb, whereas frocksdb does not seem to have been updated in a while. It would be
nice to have it updated with this patch.
Thanks
Ziyad
On Tue, Jun 6, 201
Thanks Aljoscha Krettek, I will try the same.
On Thu, Jun 15, 2017 at 3:11 PM, Aljoscha Krettek
wrote:
> Hi,
>
> How would you evaluate such a query? I think the answer could be that you
> have to keep all that older data around so that you can evaluate when a new
> event arrives. In Flink, you c
Hi Greg,
I wanted to ask if there is any news about the implementation or opportunities?
Thanks and best regards,
Marc
On 12 Jun 2017, at 19:28, Kaepke, Marc <marc.kae...@haw-hamburg.de> wrote:
I’m working on an implementation of SemiClustering [1].
I used two graph models (Pregel aka. vert
Hi,
Before adding the partitioner all your functions are chained together. That is,
everything is executed in one Thread and sending elements from one function to
the next is essentially just a method call. By introducing a partitioner you
break this chain and therefore your job now has to send
Hi,
I have a streaming job which is running with parallelism 1 as of now. (This
job will run with parallelism > 1 in the future.)
So I have added a custom partitioner to partition the data based on one tuple
field.
The flow is:
source -> map -> partitioner -> flatmap -> sink
The partitioner is ad
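
For illustration, a rough sketch of such a flow; the tuple type, the field index
and the modulo routing are just assumptions, not the actual job:

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.functions.Partitioner;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class PartitionedFlow {

        // Routes records to a channel based on the chosen tuple field.
        public static class FieldPartitioner implements Partitioner<Integer> {
            @Override
            public int partition(Integer key, int numPartitions) {
                return key % numPartitions;
            }
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(1);

            DataStream<Tuple2<Integer, String>> source =
                    env.fromElements(Tuple2.of(1, "a"), Tuple2.of(2, "b"), Tuple2.of(3, "c"));

            source
                    // map stays chained to the source: same thread, plain method calls
                    .map(new MapFunction<Tuple2<Integer, String>, Tuple2<Integer, String>>() {
                        @Override
                        public Tuple2<Integer, String> map(Tuple2<Integer, String> value) {
                            return value;
                        }
                    })
                    // the custom partitioner breaks the chain; records are now
                    // serialized and sent to the downstream task
                    .partitionCustom(new FieldPartitioner(), 0)
                    .flatMap(new FlatMapFunction<Tuple2<Integer, String>, String>() {
                        @Override
                        public void flatMap(Tuple2<Integer, String> value, Collector<String> out) {
                            out.collect(value.f1);
                        }
                    })
                    .print();

            env.execute("partitioner demo");
        }
    }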
So your suggestion is that I create an archive of all the files in the resources,
then get this file from the distributed cache and extract it to a path,
and use that path as my resource path?
But at what point should I clear the temp path?
I think the code that submits the job can create an archive of all the files in
the “resources”, thus making sure that they stay together. This file would then
be placed in the distributed cache. When executing, the contents of the archive
can be extracted again and be used, since they still main
Ok, just trying to make sure I understand everything: You have this:
1. A bunch of data in HDFS that you want to enrich
2. An external service (Solr/ES) that you query for enriching the data rows
stored in 1.
3. You need to store the enriched rows in HDFS again
I think you could just do this (ro
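
The suggestion above is cut off, but one common pattern for this kind of enrichment
is Flink's async I/O. Here is a sketch with placeholder types and a dummy lookup
(not taken from the thread); note that older releases used an AsyncCollector
callback instead of ResultFuture:

    import java.util.Collections;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;

    import org.apache.flink.streaming.api.datastream.AsyncDataStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.async.ResultFuture;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

    public class EnrichmentFunction extends RichAsyncFunction<String, String> {

        @Override
        public void asyncInvoke(String row, ResultFuture<String> resultFuture) {
            // Run the external lookup on a separate thread so the operator
            // thread never blocks, then hand the enriched row back to Flink.
            CompletableFuture
                    .supplyAsync(() -> lookup(row))
                    .thenAccept(enriched -> resultFuture.complete(Collections.singleton(enriched)));
        }

        private String lookup(String row) {
            // Placeholder for the actual Solr/Elasticsearch query.
            return row + ",enriched";
        }
    }

    // Wiring: at most 100 requests in flight, 30 second timeout, order not preserved.
    // DataStream<String> enriched =
    //         AsyncDataStream.unorderedWait(rows, new EnrichmentFunction(), 30, TimeUnit.SECONDS, 100);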
Yes.
My resource files are Python or other scripts that reference each other by
relative paths.
What I want is for all the resource files of one job to be placed in one directory,
and the resource files of different jobs must not be placed in the same directory.
The DistributedCache cannot guarantee this.
That returns a String specifying the resource path.
Any suggestions about this?
What I want is to copy the resources to a specific path on the task manager, and pass
that specific path to my operator.
I’m sensing this is related to your other question about adding a method to the
RuntimeContext. Would it be possible to extract the resources from the Jar when
submitting the program and place them in the distributed cache? Files can be
registered using StreamExecutionEnvironment.registerCache
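
A rough sketch of that approach, assuming a Flink version where the environment
exposes registerCachedFile as mentioned above (in older versions the method lives
on ExecutionEnvironment); the archive path, the cache name and the unpacking step
are placeholders:

    import java.io.File;

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ResourceExample {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Ship the archived resources with the job.
            env.registerCachedFile("hdfs:///path/to/resources.zip", "job-resources");
            // ... build the rest of the pipeline and call env.execute() ...
        }

        // Inside an operator running on a task manager:
        public static class ScriptMapper extends RichMapFunction<String, String> {

            private transient File resourceDir;

            @Override
            public void open(Configuration parameters) throws Exception {
                File archive = getRuntimeContext().getDistributedCache().getFile("job-resources");
                // Unpack the archive into a job-local directory so the scripts keep
                // their relative paths to each other (extraction code omitted).
                resourceDir = archive.getParentFile();
            }

            @Override
            public String map(String value) {
                return value + " (resources in " + resourceDir + ")";
            }
        }
    }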
What would getUserResourceDir() return? In general, Flink devs are very
reluctant to add new methods to such central interfaces because it’s not
easy to fix them if they turn out to be broken or unneeded.
Best,
Aljoscha
> On 15. Jun 2017, at 12:40, yunfan123 wrote:
>
> It seems ok, but flink-storm not
Hi,
Yes, I can’t think of cases right now where placing the extractor after a union
makes sense. In general, I think it’s always best to place the timestamp
extractor as close to the sources (or in the sources, for Kafka) as possible.
Right now it would be quite hard (and probably a bit hacky)
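
As an illustration of assigning timestamps right at the sources and only then
doing the union; the tuple layout, the example elements and the 10-second
out-of-orderness bound are assumptions:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class TimestampsNearSource {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

            // Each element carries its event timestamp in the second field.
            DataStream<Tuple2<String, Long>> eventsA = env
                    .fromElements(Tuple2.of("a", 1000L), Tuple2.of("b", 2000L))
                    .assignTimestampsAndWatermarks(
                            new BoundedOutOfOrdernessTimestampExtractor<Tuple2<String, Long>>(Time.seconds(10)) {
                                @Override
                                public long extractTimestamp(Tuple2<String, Long> element) {
                                    return element.f1;
                                }
                            });

            DataStream<Tuple2<String, Long>> eventsB = env
                    .fromElements(Tuple2.of("c", 1500L))
                    .assignTimestampsAndWatermarks(
                            new BoundedOutOfOrdernessTimestampExtractor<Tuple2<String, Long>>(Time.seconds(10)) {
                                @Override
                                public long extractTimestamp(Tuple2<String, Long> element) {
                                    return element.f1;
                                }
                            });

            // Union only after both inputs already have timestamps and watermarks.
            eventsA.union(eventsB).print();

            env.execute("timestamps before union");
        }
    }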
Hello Aljoscha,
Fortunately, I found the program in Google's caches :) I've attached it
below for reference. I'm stunned by how accurately you have hit the
point given the few pieces of information I left in the original text. +1
Yes, it's exactly as you explained. Can you think of a scenario where
It seems ok, but flink-storm does not support the Storm codeDir.
I'm working on making flink-storm support the codeDir.
To support the code dir, I have to add a new function such as
getUserResourceDir() (which may return null) to the Flink RuntimeContext.
I know this may be a big change. What do you think of this
Thank you a lot Carst, Flink runs at a higher level than I imagined.
I will try some experiments!
Hi,
@Ted:
> Is it possible to prune (unneeded) field(s) so that heap requirement is
> lower ?
The XmlInputFormat [0] splits the raw data into smaller chunks, which
are then further processed. I don't think I can reduce the field's
(Tuple2) sizes. The major difference to Mahout's
XmlInputFormat i
Hi,
Trying to revive this somewhat older thread: have you made any progress? I
think going with a ProcessFunction that keeps all your state internally and
periodically outputs to, say, Elasticsearch using a sink seems like the way to
go? You can do the periodic emission using timers in the Proc
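
A bare-bones sketch of that pattern; the tuple types, the one-minute interval and
the keying are assumptions, and the actual Elasticsearch sink would simply be
attached downstream of this function:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    // Applied on a keyed stream, e.g. input.keyBy(0).process(new PeriodicEmitter()).
    public class PeriodicEmitter extends ProcessFunction<Tuple2<String, Long>, Tuple2<String, Long>> {

        private static final long INTERVAL_MS = 60_000;   // emit once a minute

        private transient ValueState<Tuple2<String, Long>> sum;

        @Override
        public void open(Configuration parameters) {
            sum = getRuntimeContext().getState(new ValueStateDescriptor<>(
                    "sum", TypeInformation.of(new TypeHint<Tuple2<String, Long>>() {})));
        }

        @Override
        public void processElement(Tuple2<String, Long> value, Context ctx,
                                   Collector<Tuple2<String, Long>> out) throws Exception {
            Tuple2<String, Long> current = sum.value();
            if (current == null) {
                current = Tuple2.of(value.f0, 0L);
                // First element for this key: schedule the first periodic emission.
                ctx.timerService().registerProcessingTimeTimer(
                        ctx.timerService().currentProcessingTime() + INTERVAL_MS);
            }
            current.f1 += value.f1;
            sum.update(current);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx,
                            Collector<Tuple2<String, Long>> out) throws Exception {
            Tuple2<String, Long> current = sum.value();
            if (current != null) {
                // Downstream, e.g. an Elasticsearch sink, receives the periodic snapshot.
                out.collect(current);
            }
            // Keep the emission going.
            ctx.timerService().registerProcessingTimeTimer(timestamp + INTERVAL_MS);
        }
    }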
Hi,
I’m afraid the Storm compatibility layer has not been touched in a while and
was never more than a pet project. Do you have an actual use case for running
Storm topologies on top of Flink? To be honest, I think it might be easier to
simply port spouts to Flink SourceFunctions and bolts to a
Hi,
How would you evaluate such a query? I think the answer could be that you have
to keep all that older data around so that you can evaluate when a new event
arrives. In Flink, you could use a ProcessFunction for that and use a MapState
that keeps events bucketed into one-week intervals. This
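
A sketch of that MapState idea; the event representation, the matching logic and the
week arithmetic are placeholders rather than a complete solution, and the function
is again meant to run on a keyed stream:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flink.api.common.state.MapState;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    // Events are (id, epoch-millis timestamp) pairs.
    public class WeekBucketedEvaluator extends ProcessFunction<Tuple2<String, Long>, String> {

        private static final long WEEK_MS = 7L * 24 * 60 * 60 * 1000;

        private transient MapState<Long, List<Tuple2<String, Long>>> buckets;

        @Override
        public void open(Configuration parameters) {
            buckets = getRuntimeContext().getMapState(new MapStateDescriptor<>(
                    "weekly-events",
                    TypeInformation.of(Long.class),
                    TypeInformation.of(new TypeHint<List<Tuple2<String, Long>>>() {})));
        }

        @Override
        public void processElement(Tuple2<String, Long> event, Context ctx,
                                   Collector<String> out) throws Exception {
            long week = event.f1 / WEEK_MS;

            // Evaluate the query against the buckets that are still relevant.
            List<Tuple2<String, Long>> lastWeek = buckets.get(week - 1);
            if (lastWeek != null && !lastWeek.isEmpty()) {
                out.collect(event.f0 + " has " + lastWeek.size() + " events in the previous week");
            }

            // Store the new event in its own bucket for later evaluations.
            List<Tuple2<String, Long>> bucket = buckets.get(week);
            if (bucket == null) {
                bucket = new ArrayList<>();
            }
            bucket.add(event);
            buckets.put(week, bucket);
        }
    }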
Hi Ninad,
I discussed a bit with Gordon and we have some follow-up questions and some
theories as to what could be happening.
Regarding your first case (the one where data loss is observed): How do you
ensure that you only shut down the brokers once Flink has read all the data
that you expect
I think the only way is adding more managed memory.
The large record handler only takes effect on the reduce side, where it is used by the
merge sorter. According to
the exception, it is thrown during the combining phase, which only uses an
in-memory sorter that doesn't
have the large record handling mechanism.
Be