DataFrames have a partitionBy function too (on the writer, df.write.partitionBy; df.repartition is the in-memory counterpart).
You can avoid a shuffle if one of your datasets is small enough to
broadcast.
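
To make the broadcast idea concrete, here is a minimal plain-Python sketch of a broadcast (map-side) hash join. It is not Spark code: the function name broadcast_hash_join and the sample orders/customers data are made up for illustration. In PySpark itself the usual way to ask for this is the broadcast hint from pyspark.sql.functions, as in large_df.join(broadcast(small_df), "key"), where large_df and small_df are placeholder names.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows):
    """Inner-join every partition of a large dataset against a small one.

    large_partitions: list of partitions, each a list of (key, value) pairs.
    small_rows: list of (key, value) pairs, assumed small enough to copy
    to every worker (hypothetical data, for illustration only).
    """
    # Build the lookup table once. On a cluster, this is the object that
    # would be broadcast to every executor.
    lookup = defaultdict(list)
    for key, value in small_rows:
        lookup[key].append(value)

    joined_partitions = []
    for partition in large_partitions:
        # Each partition is joined where it already sits, so rows of the
        # large dataset never move between partitions (no shuffle).
        joined = [
            (key, left, right)
            for key, left in partition
            for right in lookup.get(key, [])
        ]
        joined_partitions.append(joined)
    return joined_partitions


# Two partitions of a "large" dataset and one small lookup dataset.
orders = [
    [(1, "book"), (2, "pen")],
    [(1, "lamp"), (3, "desk")],
]
customers = [(1, "Ada"), (2, "Grace")]

result = broadcast_hash_join(orders, customers)
```

The partition layout of the large side is unchanged after the join, which is the point: only the small side is copied around. This only pays off while the small dataset really does fit in memory on each worker.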
On Thu., 4 Jul. 2019, 7:34 am Mkal, wrote:
> Please keep in mind I'm fairly new to Spark.
> I have some Spark code where I load two text files as datasets and after
> some
>
Hello,
I also second Gourav's point regarding the book "Spark: The Definitive Guide".
It is great for learning both Scala- and Python-based Spark. But as others
mentioned, you will need to keep reading the documentation, as Spark is
still undergoing a lot of improvements. I list additional resourc
Okay, this is all something I would disagree with.
Dr. Matei Zaharia created Spark.
Then he and Bill Chambers wrote a book on Spark recently.
He is still the main thinking power behind Spark (look at his research at
Stanford).
The name of the book is "Spark: The Definitive Guide", it's the best ev
Thanks!!!
On Fri, 5 Jul 2019 at 15:38, Chris Teoh wrote:
> Scala is better suited to data engineering work. It also has better
> integration with other components like HBase, Kafka, etc.
>
> Python is great for data scientists as there are more data science
> libraries available in Python.
>
> O
Scala is better suited to data engineering work. It also has better
integration with other components like HBase, Kafka, etc.
Python is great for data scientists as there are more data science
libraries available in Python.
On Fri., 5 Jul. 2019, 7:40 pm Vikas Garg, wrote:
> Is there any disadva
Is there any disadvantage of using Python? I have gone through multiple
articles which say that Python has advantages over Scala.
Scala is super fast in comparison but Python has more pre-built libraries
and options for analytics.
Should I still go with Scala?
On Fri, 5 Jul 2019 at 13:07, Kurt
Since you are a data engineer, I would start by learning Scala. The parts of
Scala you would need to learn are pretty basic. Start with the examples on
the Spark website, which are given in multiple languages. Think of
Scala as a typed version of Python. You will find that the error messages
te