If you want to ensure the persisted RDD has actually been computed,
just run foreach with a dummy function first to force evaluation.
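A minimal sketch of that trick (hypothetical data and names; assumes Spark is on the classpath):

```scala
import org.apache.spark.sql.SparkSession

object ForceEvaluation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("force-eval").getOrCreate()
    val sc = spark.sparkContext

    val rdd = sc.parallelize(1 to 100).map(_ * 2)
    rdd.persist()        // only marks the RDD for caching; nothing runs yet
    rdd.foreach(_ => ()) // dummy action: forces computation and fills the cache
    spark.stop()
  }
}
```

Any action would do (count, for example), but a no-op foreach avoids pulling results back to the driver.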
--
Michael Mior
michael.m...@gmail.com
On Thu, Sep 24, 2020 at 00:38, Arya Ketan wrote:
>
> Thanks, we were able to validate the same behaviour.
>
>
It's fairly common for adapters (Calcite's abstraction of a data
source) to push down predicates. However, the API certainly looks
quite different from Catalyst's.
--
Michael Mior
mm...@apache.org
On Mon, Jan 13, 2020 at 09:45, Jason Nerothin wrote:
>
> The implementatio
If you put a * in the path, Spark will look for a file or directory named
*. To read all the files in a directory, just remove the star.
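A short sketch of reading a whole directory (hypothetical temp files; Spark assumed on the classpath):

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object ReadDirectory {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("read-dir").getOrCreate()

    // Hypothetical setup: a directory containing two small files.
    val dir = Files.createTempDirectory("demo")
    Files.write(dir.resolve("a.txt"), "one\n".getBytes)
    Files.write(dir.resolve("b.txt"), "two\n".getBytes)

    // Passing the directory itself (no wildcard) reads every file inside it.
    val lines = spark.sparkContext.textFile(dir.toString)
    println(lines.count()) // 2
    spark.stop()
  }
}
```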
--
Michael Mior
michael.m...@gmail.com
On Jun 22, 2017 17:21, "saatvikshah1994" wrote:
> Hi,
>
> I've downloaded and kept the same
arted yet.
That said, I'm not very familiar with either project, so perhaps there are
some big concerns I'm not aware of.
--
Michael Mior
mm...@apache.org
2017-06-21 3:19 GMT-04:00 Rick Moritz :
> Keeping it inside the same program/SparkContext is the most performant
> s
It's still in the early stages, but check out Deep Learning Pipelines from
Databricks
https://github.com/databricks/spark-deep-learning
--
Michael Mior
mm...@apache.org
2017-06-20 0:36 GMT-04:00 Gaurav1809 :
> Hi All,
>
> Similar to how we have machine learning library called
able WHERE (mycolumn BETWEEN 1 AND 2) AND
(myudfsearchfor(\"start\\\"end\"))"
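To unpack the escaping (table and UDF names here are hypothetical): in Scala source, `\"` produces a double quote in the string, while `\\\"` produces a backslash followed by a quote, which the SQL parser in turn reads as an escaped quote inside the literal.

```scala
object EscapeDemo {
  def main(args: Array[String]): Unit = {
    // At runtime this Scala literal resolves to a query containing
    //   myudfsearchfor("start\"end")
    // so once SQL unescapes it, the UDF receives the text  start"end
    val query = "SELECT * FROM mytable WHERE (mycolumn BETWEEN 1 AND 2) AND " +
      "(myudfsearchfor(\"start\\\"end\"))"
    println(query)
  }
}
```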
--
Michael Mior
mm...@apache.org
2017-06-15 12:05 GMT-04:00 mark.jenki...@baesystems.com <
mark.jenki...@baesystems.com>:
> Hi,
>
> I have a query sqlContext.sql(“SELECT *
While I'm not sure why you're seeing an increase in partitions with such a
small data file, it's worth noting that the second parameter to textFile is
the *minimum* number of partitions, so there's no guarantee you'll get
exactly that number.
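A quick way to see this (hypothetical temp file; Spark assumed on the classpath):

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object MinPartitionsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("min-parts").getOrCreate()

    // Hypothetical input: a tiny text file written on the fly.
    val path = Files.createTempFile("demo", ".txt")
    Files.write(path, "a\nb\nc\nd\n".getBytes)

    // The second argument is a lower bound on the partition count,
    // not an exact target: Spark may split the input further.
    val rdd = spark.sparkContext.textFile(path.toString, 3)
    println(rdd.getNumPartitions) // at least 3
    spark.stop()
  }
}
```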
--
Michael Mior
mm...@apache.org