Hi There,
I am trying to process millions of records with Spark/Scala integrated with
Stanford NLP (3.4.1).
Since I am working with social media data, I have to use NLP for theme
generation (POS tagging) and sentiment calculation.
I have to deal with Twitter data and non-Twitter data separately. So I
Evan,
Could you please look into this post? The link is below. Any thoughts or
suggestions are really appreciated.
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-partition-issue-with-Stanford-NLP-td23048.html
--
Hi There,
I am using mapPartitions to do some processing and caching the result, as
shown below.
I am saving the output in both formats (Parquet and text file), and the
recomputation happens both times. Even though I added the cache, it is not
working as expected.
Below is the code snippet. Any help is really