Re: Revisiting Python / pandas UDF (new proposal)

2020-01-03 Thread Li Jin
Hyukjin, Thanks for putting this together. I took a look at the proposal and left some comments. At a high level, I like using type hints to specify input/output types, but I am not so sure about using type hints for cardinality. I have commented in more detail in the doc. Li On Thu, Jan 2, 2020 at 9:42 AM

Checkpoint and recomputation

2020-01-03 Thread Li Jin
Hi dear devs, I recently came across the checkpoint functionality in Spark and found (a little surprisingly) that checkpoint causes the DataFrame to be computed twice unless cache is called before checkpoint. My guess is that this is probably hard to fix and/or maybe the checkpoint feature is not very freq

Question about spark on k8s

2020-01-03 Thread JackyLee
Hello, devs. In our scenario, we run Spark on Kata-like containers and found that the code hard-codes the Kube-DNS domain. If Kube-DNS is not configured in the environment, tasks fail. My question is: why did we hard-code the Kube-DNS domain name in the code? Isn't it better to read domain name f