Hi hagersaleh,
You should understand why the error occurs with large-scale data: broadcast
variables can only handle data that fits on a single machine.
What I meant was using an external system such as Redis or HBase. The
connection to the external system could be initialized in `ope
Hi Chiwan Park,
I do not understand this solution. Please explain more.
--
View this message in context:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/when-use-broadcast-variable-and-run-on-bigdata-display-this-error-please-help-tp2455p2676.html
Sent from the Apache Flink User Ma
Chiwan has a good point. Once the data that needs to be available to all
machines is too large for one machine, there is no good solution any more.
The best approach is an external store to which all nodes have access. It
is not going to be terribly fast, though.
If you are in the situation that y
Hi hagersaleh,
Sorry for the late reply.
I think using an external system could be a solution for large-scale data. To
use an external system, you have to implement rich functions such as
RichFilterFunction or RichMapFunction.
Regards,
Chiwan Park
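
The idea of opening the connection once per task and then looking up values per record can be sketched as follows. This is plain Python, not the Flink API: `InMemoryStore`, `LookupMapper` and `run_task` are hypothetical names, and the in-memory dictionary only stands in for an external system such as Redis or HBase.

```python
class InMemoryStore:
    """Hypothetical stand-in for an external key-value store (e.g. Redis, HBase)."""

    def __init__(self, data):
        self._data = dict(data)
        self.connections = 0  # counts how often a connection was opened

    def connect(self):
        self.connections += 1
        return self

    def get(self, key):
        return self._data.get(key)


class LookupMapper:
    """Mimics the open/map/close lifecycle of a rich function (assumption: simplified)."""

    def __init__(self, store):
        self._store = store
        self._conn = None

    def open(self):
        # Called once per task, before any record is processed.
        self._conn = self._store.connect()

    def map(self, record):
        # Called once per record; enriches it with a value from the store.
        key, value = record
        return (key, value, self._conn.get(key))

    def close(self):
        self._conn = None


def run_task(mapper, records):
    """Drives the lifecycle the way a runtime would for one parallel task."""
    mapper.open()
    try:
        return [mapper.map(r) for r in records]
    finally:
        mapper.close()
```

Because the lookup data stays in the external store, no single task has to hold all of it in memory; the price is one lookup per record.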
> On Aug 30, 2015, at 1:30 AM, hagersaleh
Are there any ways to use a broadcast variable with big data?
Note: As the content of broadcast variables is kept in-memory on each node,
it should not become too large. For simpler things like scalar values you
can simply make parameters part of the closure of a function, or use the
withParameters(...) method to pass in a configuration.
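
The two options for small scalar values can be illustrated with a minimal sketch. It is plain Python rather than Flink code: `make_threshold_filter`, `ConfiguredFilter` and the `"limit"` key are hypothetical, and the dictionary passed to `open` only plays the role of a configuration object.

```python
def make_threshold_filter(threshold):
    """Option 1: the scalar is captured in the closure of the function."""

    def keep(value):
        return value > threshold

    return keep


class ConfiguredFilter:
    """Option 2: the scalar arrives through a configuration passed to open()."""

    def __init__(self):
        self._limit = None

    def open(self, parameters):
        # 'parameters' is a plain dict standing in for a configuration object.
        self._limit = parameters["limit"]

    def filter(self, value):
        return value > self._limit
```

In both cases only a single number travels with the function, so nothing large has to be kept in memory on each node.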
When to use a broadcast variable?
Distribute data with a broadcast variable when:
- The data is large
- The data has been produced by some form of computation and is already a
  DataSet (distributed result)
Typical use case: redistribute intermediate results, such as trained models
from link
Because the broadcast variable is stored in full at each operator.
If you use a hash join, both inputs can be hash partitioned, which reduces
the amount of memory needed for each operator, I think.
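
The memory difference can be made concrete with a small sketch. It is plain Python, not Flink's join implementation: `broadcast_load`, `hash_partition` and `partitioned_hash_join` are hypothetical helpers, and a list of lists stands in for the parallel operator instances.

```python
def broadcast_load(records, parallelism):
    """With a broadcast variable, every operator instance holds all records."""
    return [len(records)] * parallelism


def hash_partition(records, parallelism):
    """Assigns each (key, value) record to one partition by the hash of its key."""
    parts = [[] for _ in range(parallelism)]
    for key, value in records:
        parts[hash(key) % parallelism].append((key, value))
    return parts


def partitioned_hash_join(left, right, parallelism):
    """Joins two inputs on the key after hash partitioning both of them.

    Each partition builds a hash table over its share of 'left' only,
    then probes it with its share of 'right'.
    """
    result = []
    left_parts = hash_partition(left, parallelism)
    right_parts = hash_partition(right, parallelism)
    for left_part, right_part in zip(left_parts, right_parts):
        table = {}
        for key, value in left_part:
            table.setdefault(key, []).append(value)
        for key, other in right_part:
            for value in table.get(key, []):
                result.append((key, value, other))
    return result
```

Since equal keys always land in the same partition, the join result is complete even though no instance ever sees the whole input.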
> On 20.08.2015 at 12:14, hagersaleh wrote:
>
> why this is not good broadcast v
Why is a broadcast variable not a good choice for big data?
As you can see from the exceptions, your broadcast variable is too large to fit
into main memory.
I think storing that amount of data in a broadcast variable is not the best
approach. I would suggest using a DataSet for this instead.
> On 20.08.2015 at 11:56, hagersaleh wrote:
>
> pleas
please help