Re: Question about Spark cluster memory usage monitoring

2018-09-23 Thread Jan Brabec (janbrabe)
Hello,
Maybe Ganglia (http://ganglia.info/) might be useful for you? I have only shallow experience with it, but it might be what you are looking for.
Best,
Jan
From: Muhib Khan
Date: Thursday 20 September 2018 at 23:39
To: "Liu, Jialin"
Cc: "user@spark.apache.org"
Subject: Re: Question about

Is it possible to implement Vector Space Model using PySpark

2018-09-23 Thread Soheil Pourbafrani
Hi,
I want to implement the Vector Space Model for texts using Spark. As a first step, I compute the vector of words (the dictionary) for the files and make it a broadcast variable so that it is accessible to all executors.
Vector_of_Words = selected_data.select('full_text').rdd\
    .map(lambda x : x[0].encode("
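The approach described in this message (build a dictionary, broadcast it, then vectorize each text on the executors) could be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the helper names `tokenize`, `to_tf_vector`, `cosine`, and `vectorize_with_spark` are hypothetical, the whitespace tokenizer is deliberately naive, and the input is assumed to be a plain list of strings rather than the `selected_data` DataFrame from the message.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def to_tf_vector(text, vocab_index):
    """Map a text to a sparse term-frequency dict {term_index: count}.

    Terms that are not in the vocabulary are ignored.
    """
    counts = Counter(tokenize(text))
    return {vocab_index[t]: c for t, c in counts.items() if t in vocab_index}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(i, 0) for i, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def vectorize_with_spark(spark, texts):
    """Build the vocabulary, broadcast it, and vectorize texts on executors.

    `spark` is an existing SparkSession; `texts` is a list of strings
    (an assumption made for this sketch).
    """
    sc = spark.sparkContext
    rdd = sc.parallelize(texts)
    # Collect the distinct terms on the driver and assign each one an index.
    vocab = sorted(rdd.flatMap(tokenize).distinct().collect())
    vocab_bc = sc.broadcast({term: i for i, term in enumerate(vocab)})
    # Each executor reads the broadcast dictionary instead of shipping it per task.
    return rdd.map(lambda t: to_tf_vector(t, vocab_bc.value)).collect()
```

With PySpark installed, this could be tried locally by creating a session with `SparkSession.builder.master("local[*]").getOrCreate()` and passing it to `vectorize_with_spark` together with a small list of texts. Collecting the vocabulary on the driver only works while the dictionary fits in driver memory; for larger vocabularies, a hashing-based vectorizer may be a better fit.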

Re: Lightweight pipeline execution for single row

2018-09-23 Thread Michael Artz
Are you using the scheduler in fair mode instead of FIFO mode?
> On Sep 22, 2018, at 12:58 AM, Jatin Puri wrote:
>
> Hi.
>
> What tactics can I apply for such a scenario?
>
> I have a pipeline of 10 stages. Simple text processing. I train the data with
> the pipeline and