date:20170218

Re: Will .count() always trigger an evaluation of each row?

2017-02-18 Thread Matei Zaharia

Count works differently on DataFrames and Datasets than on RDDs. On RDDs, it always evaluates everything, but on a DataFrame/Dataset it turns into the equivalent of "select count(*) from ..." in SQL, which for some data formats (e.g. Parquet) can be answered without scanning the data. On the other hand thoug
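To make the RDD vs. DataFrame distinction concrete, here is a toy Python model, not Spark's implementation; the class names ToyRDD and ToyParquetDataFrame and their methods are hypothetical. It assumes that an RDD-like count() pushes every element through its lazy transformations, while a DataFrame-like count() is planned like SELECT COUNT(*): projections cannot change the row count, so they are pruned, and the total comes from per-file row counts kept in footer metadata (Parquet files do store such counts).

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Hypothetical stand-in for an RDD (not Spark's API).

    Transformations are lazy; count() pushes every element through them.
    """

    def __init__(self, partitions: List[List[int]],
                 fn: Callable[[int], int] = lambda x: x):
        self.partitions = partitions
        self.fn = fn

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        g = self.fn
        return ToyRDD(self.partitions, lambda x: f(g(x)))

    def count(self) -> int:
        n = 0
        for part in self.partitions:
            for x in part:
                self.fn(x)  # every element is evaluated, like RDD.count()
                n += 1
        return n


class ToyParquetDataFrame:
    """Hypothetical stand-in for a DataFrame over Parquet-like files.

    Each file records its row count in footer metadata. count() is planned
    like SELECT COUNT(*): projections cannot change the row count, so they
    are pruned and no rows are read.
    """

    def __init__(self, files: List[dict],
                 projections: Optional[List[Callable[[int], int]]] = None):
        self.files = files  # each file: {"num_rows": int, "rows": [...]}
        self.projections = projections or []

    def select(self, f: Callable[[int], int]) -> "ToyParquetDataFrame":
        # Lazy: only records the projection.
        return ToyParquetDataFrame(self.files, self.projections + [f])

    def count(self) -> int:
        # Projections are ignored; the answer comes from metadata alone.
        return sum(meta["num_rows"] for meta in self.files)


calls = []


def expensive(x: int) -> int:
    calls.append(x)  # records each evaluation
    return x * 2


rdd = ToyRDD([[1, 2], [3]]).map(expensive)
print(rdd.count(), len(calls))  # 3 3

calls.clear()
df = ToyParquetDataFrame([{"num_rows": 2, "rows": [1, 2]},
                          {"num_rows": 1, "rows": [3]}]).select(expensive)
print(df.count(), len(calls))  # 3 0
```

Both counts agree, but only the RDD-like path calls the transformation, which is why count() on a DataFrame is not a reliable way to force per-row work.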

Re: Design document - MLlib's statistical package for DataFrames

2017-02-18 Thread Holden Karau

It's at the bottom of every message (although some mail clients hide it for some reason): send an email to dev-unsubscr...@spark.apache.org

On Sat, Feb 18, 2017 at 11:07 AM Pritish Nawlakhe <prit...@nirvana-international.com> wrote:
> Hi
>
> Would anyone know how to unsubscribe to this list?

RE: Design document - MLlib's statistical package for DataFrames

2017-02-18 Thread Pritish Nawlakhe

Hi,

Would anyone know how to unsubscribe to this list?

Thank you!!

Regards,
Pritish
Nirvana International Inc.
Big Data, Hadoop, Oracle EBS and IT Solutions
VA - SWaM, MD - MBE Certified Company
prit...@nirvana-international.com
http://www.nirvana-international.com
Twitter: @nirvanainternat

Re: Will .count() always trigger an evaluation of each row?

2017-02-18 Thread Sean Owen

I think the right answer is "don't do that" but if you really had to you could trigger a Dataset operation that does nothing per partition. I presume that would be more reliable because the whole partition has to be computed to make it available in practice. Or, go so far as to loop over every elem
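Sean's fallback of looping over every element can be illustrated with another toy model. ToyDataset, foreach_partition, no_op and drain are hypothetical names (in real Spark the corresponding method is foreachPartition on a Dataset or DataFrame). The sketch assumes each partition is handed to the callback as a lazy iterator, which is the case where a callback that ignores its input computes nothing and draining the iterator is the stronger guarantee.

```python
from typing import Callable, Iterator, List


class ToyDataset:
    """Hypothetical stand-in for a Dataset (not Spark's API): each partition
    is produced as a lazy iterator, so rows are computed only when pulled."""

    def __init__(self, partitions: List[List[int]], fn: Callable[[int], int]):
        self.partitions = partitions
        self.fn = fn

    def _iter_partition(self, part: List[int]) -> Iterator[int]:
        # Generator: self.fn runs only when a row is requested.
        return (self.fn(x) for x in part)

    def foreach_partition(self, f: Callable[[Iterator[int]], None]) -> None:
        for part in self.partitions:
            f(self._iter_partition(part))


def no_op(rows: Iterator[int]) -> None:
    # Ignores the iterator; with lazy iterators no row is computed.
    pass


def drain(rows: Iterator[int]) -> None:
    # Loops over every element, forcing each row to be computed.
    for _ in rows:
        pass


evaluated = []


def transform(x: int) -> int:
    evaluated.append(x)  # records each evaluation
    return x + 1


ds = ToyDataset([[1, 2], [3, 4, 5]], transform)
ds.foreach_partition(no_op)
print(len(evaluated))  # 0
ds.foreach_partition(drain)
print(len(evaluated))  # 5
```

Whether a no-op callback really skips work in Spark depends on how each partition is produced (for example, whether an upstream stage already materialized it); the toy only models the fully lazy case.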

