count() behaves differently on DataFrames and Datasets than on RDDs. On an RDD
it always evaluates everything, but on a DataFrame/Dataset it turns into the
equivalent of "select count(*) from ..." in SQL, which for some data formats
(e.g. Parquet) can be answered without scanning the data. On the other hand thoug
It's at the bottom of every message (although some mail clients hide it for
some reason): send an email to dev-unsubscr...@spark.apache.org
On Sat, Feb 18, 2017 at 11:07 AM Pritish Nawlakhe <
prit...@nirvana-international.com> wrote:
> Hi
>
> Would anyone know how to unsubscribe from this list?
>
>
Hi
Would anyone know how to unsubscribe from this list?
Thank you!!
Regards
Pritish
Nirvana International Inc.
Big Data, Hadoop, Oracle EBS and IT Solutions
VA - SWaM, MD - MBE Certified Company
prit...@nirvana-international.com
http://www.nirvana-international.com
Twitter: @nirvanainternat
I think the right answer is "don't do that", but if you really had to, you
could trigger a Dataset operation that does nothing per partition. I
presume that would be more reliable because the whole partition has to be
computed to make it available in practice. Or, go so far as to loop over
every elem