Hi, we have a cluster with 4 nodes running CDH 5.4. For the past two days I have been trying to connect my laptop to the server using spark <master ip:port>, but it has been unsuccessful. The server contains data that needs to be cleaned and analysed. The cluster and its nodes run Linux, and I connect to the nodes over SSH.
Question 1: Would it be better to work directly on the nodes rather than trying to connect my laptop to them?
Question 2: If so, can you suggest any Python and R IDEs that I can install on the nodes to make this work?

Thanks for your help.
Sincerely,
Ashish Dutt