Adding to Fabian's and Sebastian's answer:
Variable in Closure (global variable) ------------------------------------------------------ - Happens when you reference some variable in the program from a function. The variable becomes part of the Function's closure. - The variable is distributed with the CODE. It is part of the function object and is distributed with by the TaskDeployment messages. - Data needs to be available in the driver program (cannot be a Flink DataSet, which lives distributedly) - Should be used for constants or config parameters or simple scalar values. Summary: Small data that is available on the client (driver program) Broadcast set ------------------------------------------------------ - Refers to data that is produced by a Flink operation (DataSet) and that lives in the cluster, rather than on the client (or in the driver program) - Data distribution is part of the distributed data flow and happens through the Flink network stack - Can be much larger than the closure variables. - Should be used when you want to make an intermediate result of a Flink computation accessible to all functions. Greetings, Stephan On Mon, Apr 27, 2015 at 10:23 AM, Fabian Hueske <fhue...@gmail.com> wrote: > You should also be aware that the value of a static variable is only > accessible within the same JVM. > Flink is a distributed system and runs in multiple JVMs. So if you set a > value in one JVM it is not visible in another JVM (on a different node). > > In general, I would avoid to use static variables in Flink programs. > > Best, Fabian > > 2015-04-26 9:54 GMT+02:00 Sebastian <s...@apache.org>: > >> Hi Hung, >> >> A broadcast variable can also refer to an intermediate result of a Flink >> computation. >> >> Best, >> Sebastian >> >> >> On 25.04.2015 21:10, HungChang wrote: >> >>> Hi, >>> >>> What would be the difference between using global variable and >>> broadcasting >>> it? >>> >>> A toy example: >>> >>> // Using global >>> {{... >>> private static int num = 10; >>> } >>> >>> public class DivByTen implements FlatMapFunction<Tuple1<Double>, >>> Tuple1<Double>> { >>> @Override >>> public void flatMap(Tuple1<Double>value, Collector<Tuple1<Double>> >>> out) >>> { >>> out.collect(new Tuple1<Double>(value/ num)); >>> } >>> }} >>> >>> // Using broadcasting : >>> {... >>> public static class DivByTen extends >>> RichGMapFunction<Tuple1<Double>, >>> Tuple1<Double>>{ >>> >>> private long num; >>> >>> @Override >>> public void open(Configuration parameters) throws >>> Exception { >>> super.open(parameters); >>> num = getRuntimeContext().<Integer> >>> getBroadcastVariable( >>> "num").get(0); >>> } >>> >>> @Override >>> public void map(Tuple1<Double>value, >>> Collector<Tuple1<Double>> out)) >>> throws Exception{ >>> out.collect(new Tuple1<Double>(value/num)); >>> } >>> } >>> } >>> >>> Best regards, >>> >>> Hung >>> >>> >>> >>> -- >>> View this message in context: >>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Difference-between-using-a-global-variable-and-broadcasting-a-variable-tp1128.html >>> Sent from the Apache Flink User Mailing List archive. mailing list >>> archive at Nabble.com. >>> >>> >