Re: How to share text file across tasks at run time in flink.

Robert Metzger Mon, 29 Aug 2016 04:45:00 -0700

Hi,

you could use Zookeeper if you want to dynamically change the DB name /
credentials at runtime.


The GlobalJobParameters are immutable at runtime, so you cannot pass
updates through them to the cluster.
They are intended for parameters that apply to all operators (the entire
job) and for display in the web interface.
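
Because the job parameters are fixed once the job is submitted, any value that changes at runtime has to be re-read from a place every worker can reach. Below is a minimal sketch of that idea in plain Java. It is not Flink or ZooKeeper API: the class name `ReloadingConfig` is hypothetical, and a properties file on a shared path stands in for the external store (with ZooKeeper you would more likely react to a watch than compare modification times).

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Properties;

/**
 * Hypothetical helper (not part of Flink): holds key/value settings read from
 * a file that every worker can reach, and re-reads the file when it changes.
 */
final class ReloadingConfig {
    private final Path file;
    private Properties current = new Properties();
    private FileTime loadedVersion; // modification time of the copy held in memory

    ReloadingConfig(Path file) throws IOException {
        this.file = file;
        reload();
    }

    /** Returns the value for key, re-reading the file first if it was modified. */
    synchronized String get(String key) throws IOException {
        if (!Files.getLastModifiedTime(file).equals(loadedVersion)) {
            reload();
        }
        return current.getProperty(key);
    }

    /** Replaces the in-memory settings with the current file contents. */
    private void reload() throws IOException {
        // Read the timestamp before the contents, so a write that happens
        // during loading is picked up by the next call to get().
        FileTime version = Files.getLastModifiedTime(file);
        Properties fresh = new Properties();
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            fresh.load(in);
        }
        current = fresh;
        loadedVersion = version;
    }
}
```

In a Flink job, one option would be to create such an object in the `open()` method of a RichMapFunction and call `get("db.name")` (an example key) while processing events. The file would have to live on a file system that all TaskManagers can access.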

Regards,
Robert


On Thu, Aug 25, 2016 at 7:03 AM, Baswaraj Kasture <kbaswar...@gmail.com>
wrote:

> Thanks to all for your inputs.
> Yeah, I could put all these common configurations/rules in a DB, and the
> workers can pick them up dynamically at run time.
> In this case, do the DB configuration/connection details need to be
> hard-coded? Is there any way a worker can pick up the DB name/credentials
> etc. dynamically at run time?
>
> I am going through the feature/API documentation, but how about using
> function closures and setGlobalJobParameters/getGlobalJobParameters?
>
> +Baswaraj
>
> On Wed, Aug 24, 2016 at 5:17 PM, Maximilian Michels <m...@apache.org>
> wrote:
>
>> Hi!
>>
>> 1. The community is working on adding side inputs to the DataStream
>> API. That will allow you to easily distribute data to all of your
>> workers.
>>
>> 2. In the meantime, you could use `.broadcast()` on a DataStream to
>> broadcast data to all workers. You still have to join that data with
>> another stream though.
>>
>> 3. The easiest method of all is to simply load your file in the
>> RichMapFunction's open() method. The file can reside in a distributed
>> file system which is accessible by all workers.
>>
>> Cheers,
>> Max
>>
>> On Wed, Aug 24, 2016 at 6:45 AM, Jark Wu <wuchong...@alibaba-inc.com>
>> wrote:
>> > Hi,
>> >
>> > I think what Baswaraj wants is exactly something like the Storm
>> > Distributed Cache API[1] (if I’m not misunderstanding).
>> >
>> > The distributed cache feature in storm is used to efficiently distribute
>> > files (or blobs, which is the equivalent terminology for a file in the
>> > distributed cache and is used interchangeably in this document) that are
>> > large and can change during the lifetime of a topology, such as
>> > geo-location data, dictionaries, etc. Typical use cases include phrase
>> > recognition, entity extraction, document classification, URL re-writing,
>> > location/address detection and so forth. Such files may be several KB to
>> > several GB in size. For small datasets that don't need dynamic updates,
>> > including them in the topology jar could be fine. But for large files,
>> > the startup times could become very large. In these cases, the
>> > distributed cache feature can provide fast topology startup, especially
>> > if the files were previously downloaded for the same submitter and are
>> > still in the cache. This is useful with frequent deployments, sometimes
>> > few times a day with updated jars, because the large cached files will
>> > remain available without changes. The large cached blobs that do not
>> > change frequently will remain available in the distributed cache.
>> >
>> >
>> > We can look into whether this is a common use case and how to implement
>> > it in Flink.
>> >
>> > [1] http://storm.apache.org/releases/2.0.0-SNAPSHOT/distcache-blobstore.html
>> >
>> >
>> > - Jark Wu
>> >
>> > On Aug 23, 2016, at 9:45 PM, Lohith Samaga M <lohith.sam...@mphasis.com> wrote:
>> >
>> > Hi
>> > Maybe you could use Cassandra to store and fetch all such reference data.
>> > This way the reference data can be updated without restarting your
>> > application.
>> >
>> > Lohith
>> >
>> >
>> >
>> > ---- Baswaraj Kasture wrote ----
>> >
>> > Thanks Kostas!
>> > I am using the DataStream API.
>> >
>> > I have a few config/property files (key-value text files) and also have
>> > business rule files (JSON).
>> > These rules and configurations are needed when we process incoming events.
>> > Is there any way to share them with the task nodes from the driver program?
>> > I think this is a very common use case, and I am sure other users may face
>> > similar issues.
>> >
>> > +Baswaraj
>> >
>> > On Mon, Aug 22, 2016 at 4:56 PM, Kostas Kloudas
>> > <k.klou...@data-artisans.com> wrote:
>> >>
>> >> Hello Baswaraj,
>> >>
>> >> Are you using the DataSet (batch) or the DataStream API?
>> >>
>> >> If you are using the former, you can use a broadcast variable for your
>> >> task.
>> >> If you are using the DataStream API, then there is no proper support
>> >> for that.
>> >>
>> >> Thanks,
>> >> Kostas
>> >>
>> >> On Aug 20, 2016, at 12:33 PM, Baswaraj Kasture <kbaswar...@gmail.com>
>> >> wrote:
>> >>
>> >> I am running a Flink standalone cluster.
>> >>
>> >> I have a text file that needs to be shared across tasks when I submit my
>> >> application. In other words, I want to put this text file on the
>> >> classpath of the running tasks.
>> >>
>> >> How can we achieve this with Flink?
>> >>
>> >> In Spark, spark-submit has a --jars option that puts all the specified
>> >> files on the classpath of the executors (executors run in separate JVMs
>> >> and are spawned dynamically, so it is possible).
>> >>
>> >> Flink's task managers run tasks in separate threads under the
>> >> TaskManager JVM (?), so how can we make this text file accessible to all
>> >> tasks spawned by the current application?
>> >>
>> >> Using HDFS or NFS, or including the file in the program jar, is one way
>> >> that I know of, but I am looking for a solution that allows me to provide
>> >> the text file at run time and still have it accessible in all tasks.
>> >> Thanks.
>> >>
>> >>
>> >
>> >
>>
>
>
