Thanks for your answers :)

Best regards/祝好,

Chang Liu 刘畅


> On 17 Sep 2018, at 17:25, Kostas Kloudas <k.klou...@data-artisans.com> wrote:
> 
> Hi Chang,
> 
> You can find some of the answers inline:
> 
>> On Sep 17, 2018, at 3:47 PM, Chang Liu <fluency...@gmail.com> wrote:
>> 
>> Dear All,
>> 
>> I am helping my team set up a Flink cluster, and we would like it to be 
>> highly available and easy to scale.
>> 
>> We would like to set up a minimal cluster environment that can be easily 
>> scaled in the future. This is my simple proposal:
>> - 2 nodes
>> - each node runs a Flink instance, a YARN instance, and an HDFS instance
>> - Flink, YARN and HDFS all run in cluster mode.
>> 
>> <image.jpeg>
>> 
>> Based on this, my questions are:
>> By using HDFS as the file system, we can achieve fault tolerance (by 
>> recovering the checkpointed state when a job fails). Question: so Flink 
>> itself is not capable of keeping and maintaining distributed, persistent 
>> state using just the local Linux file system, right?
>> Then, my follow-up is: if you have a Flink cluster (multiple nodes) and you 
>> use the local Linux file system to keep the state checkpoints, what will 
>> happen if a Flink job fails and Flink restarts the job and tries to recover 
>> the state from the checkpoints?
> 
> For both the above:
> When a task fails, the whole job (all of its tasks) is restarted, and the 
> tasks may be rescheduled on different machines.
> If you use a local FS and try to fetch state remotely upon recovery, how 
> would the new nodes be able to locate the state on a remote machine?
> 
> This is why Flink uses a distributed file system.
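
The point above can be illustrated with a toy model. This is only a sketch in plain Python, not Flink code: the `Node` class, the `write_checkpoint` and `restore_checkpoint` helpers, and the job name `job-1` are all hypothetical, and a plain dict stands in for HDFS. It shows that a checkpoint written to one machine's local disk cannot be found when the job is rescheduled on another machine, while one written to shared storage can.

```python
class Node:
    """A machine with its own private (local) disk. Hypothetical toy model."""

    def __init__(self, name):
        self.name = name
        self.local_disk = {}  # visible only to tasks running on this node


def write_checkpoint(storage, job_id, state):
    """Store a copy of the job state under the job id."""
    storage[job_id] = dict(state)


def restore_checkpoint(storage, job_id):
    """Return the stored state, or None if this storage has no checkpoint."""
    return storage.get(job_id)


node_a, node_b = Node("node-a"), Node("node-b")
shared_fs = {}  # stands in for a distributed file system such as HDFS

# The job runs on node-a and checkpoints its state to both kinds of storage.
state = {"count": 42}
write_checkpoint(node_a.local_disk, "job-1", state)
write_checkpoint(shared_fs, "job-1", state)

# The job fails and is rescheduled on node-b.
restored_local = restore_checkpoint(node_b.local_disk, "job-1")
restored_shared = restore_checkpoint(shared_fs, "job-1")

print("from node-b local disk:", restored_local)
print("from shared storage:", restored_shared)
```

In this model the local-disk lookup on node-b finds nothing, because the data only ever existed on node-a, while the shared lookup returns the saved state regardless of where the job restarts.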
> 
>> If Flink is deployed and managed on YARN, does that mean that if YARN is 
>> down, Flink is down?
> 
> Well, it depends on which component fails. And I am not sure about all of 
> them, but you could try it and see.
> 
>> If we have Flink managed by YARN, what is the purpose of using ZooKeeper? I 
>> did not really understand this part: 
>> https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/jobmanager_high_availability.html
>> My question is: why can't YARN provide this JobManager HA by itself, so 
>> that we have to add ZooKeeper?
> 
> YARN can make sure that a new job master starts, but that master will have to 
> fetch the state of the previous job master in order to know which jobs are 
> running, their progress, etc.
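
The division of labour described above can be sketched as a toy model. Again this is plain Python rather than Flink or YARN code: the `JobManager` class, the `restart_by_resource_manager` function, and the checkpoint path are hypothetical, and a dict stands in for the external HA store. In a real setup, as far as I understand it, ZooKeeper handles leader election and holds pointers to metadata kept in a storage directory, which this sketch collapses into a single store.

```python
class JobManager:
    """Toy job master: its in-memory bookkeeping is lost when the process dies."""

    def __init__(self, ha_store=None):
        self.ha_store = ha_store
        # A fresh process knows nothing unless it can read an external store.
        self.running_jobs = dict(ha_store) if ha_store is not None else {}

    def submit(self, job_id, checkpoint_path):
        self.running_jobs[job_id] = checkpoint_path
        if self.ha_store is not None:
            # Persist the bookkeeping outside the process.
            self.ha_store[job_id] = checkpoint_path


def restart_by_resource_manager(ha_store=None):
    """What the resource manager alone provides: a brand-new process."""
    return JobManager(ha_store)


# Hypothetical path, used only for illustration.
path = "hdfs:///checkpoints/job-1/chk-7"

# Case 1: no external store. The restarted master has no idea what was running.
jm = JobManager()
jm.submit("job-1", path)
without_store = restart_by_resource_manager().running_jobs

# Case 2: an external store survives the failure, so the new master recovers.
ha_store = {}
jm = JobManager(ha_store)
jm.submit("job-1", path)
with_store = restart_by_resource_manager(ha_store).running_jobs

print("without external store:", without_store)
print("with external store:", with_store)
```

Restarting the process and remembering what the process was doing are separate concerns here, which is the distinction the answer above draws between YARN and the HA store.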
> 
>> How do you think I should place the different components of the 
>> architecture on nodes (servers)? Do I keep an instance of each of 
>> Flink/YARN/HDFS on every single server, or do I put each of them on 
>> completely different servers? Some of my considerations:
>> - if we put them on different servers, there will be a lot of network 
>>   latency between Flink <-> HDFS and YARN <-> HDFS
>> - but if I put all 3 components (Flink/YARN/HDFS) on every server, they 
>>   can also compete with each other for resources, right?
> 
> You are right that you have to consider the above before deciding on your 
> setup.
> 
>> Correct me if I am wrong: one thing for sure is that on every node where 
>> a Flink instance is running, there should also be a YARN instance running, 
>> right?
>> 
>> 
>> Many thanks in advance!
>> 
>> Best regards/祝好,
>> 
>> Chang Liu 刘畅
> 
> I hope this helps,
> Kostas
