Re: Is spark a right tool for updating a dataframe repeatedly

2016-10-17 Thread Mike Metzger
n memory and there is no spill over to disk, it might not be a big issue (of course there will still be memory, CPU and network overhead/latency). If you are looking at storing the data on disk (e.g. as part of a checkpoint or explicit storage), t

Re: Is spark a right tool for updating a dataframe repeatedly

2016-10-17 Thread Mungeol Heo
nt or explicit storage), then there can be substantial I/O activity. From: Xi Shen Date: Monday, October 17, 2016 at 2:54 AM To: Divya Gehlot, Mungeol Heo Cc: "user @spark" Subject: Re: Is spark a right tool for updati

Re: Is spark a right tool for updating a dataframe repeatedly

2016-10-17 Thread Thakrar, Jayesh
17, 2016 at 2:54 AM To: Divya Gehlot, Mungeol Heo Cc: "user @spark" Subject: Re: Is spark a right tool for updating a dataframe repeatedly I think most of the "big data" tools, like Spark and Hive, are not designed to edit data. They are only designed to query data. I won

Re: Is spark a right tool for updating a dataframe repeatedly

2016-10-17 Thread Xi Shen
I think most of the "big data" tools, like Spark and Hive, are not designed to edit data. They are only designed to query data. I wonder in what scenario you would need to update a large volume of data repeatedly. On Mon, Oct 17, 2016 at 2:00 PM Divya Gehlot wrote: > If my understanding is correct a

Re: Is spark a right tool for updating a dataframe repeatedly

2016-10-16 Thread Divya Gehlot
If my understanding of your query is correct: in Spark, DataFrames are immutable, so you can't update a DataFrame in place. You have to create a new DataFrame that reflects the update. Thanks, Divya On 17 October 2016 at 09:50, Mungeol Heo wrote: > Hello, everyone. As I mentioned at the title,
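
To illustrate the immutability point above, here is a minimal sketch of what "creating a new DataFrame" can look like in Scala. It assumes Spark 2.x or later running in local mode. The column names `id` and `value`, the sample rows, and the helper `withUpdatedValue` are hypothetical and not taken from the thread. It is one possible approach, not necessarily what the original poster needs.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object ImmutableUpdateSketch {
  // Hypothetical helper: returns a NEW DataFrame in which rows matching `id`
  // carry `newValue`. The input DataFrame `df` is left unchanged.
  def withUpdatedValue(df: DataFrame, id: Int, newValue: Int): DataFrame =
    df.withColumn(
      "value",
      when(col("id") === id, lit(newValue)).otherwise(col("value"))
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("immutable-update-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val original = Seq((1, 10), (2, 20), (3, 30)).toDF("id", "value")

    // The "update" is a transformation that yields a second DataFrame.
    val updated = withUpdatedValue(original, id = 2, newValue = 99)

    original.show() // still holds the original values
    updated.show()  // the row with id 2 now has value 99

    spark.stop()
  }
}
```

Each call produces another DataFrame on top of the previous one, so repeated updates build up a chain of transformations instead of modifying data in place.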