Re: Spark partition issue with Stanford NLP

2015-05-27 Thread mathewvinoj
Hi there, I am trying to process millions of records with Spark/Scala integrated with Stanford NLP (3.4.1). Since I am using social media data, I have to use NLP for theme generation (POS tagging) and sentiment calculation. I have to deal with Twitter data and non-Twitter data separately. So I

Re: Spark and Stanford CoreNLP

2015-05-27 Thread mathewvinoj
Evan, could you please look into this post? The link is below. Any thoughts or suggestions are really appreciated. http://apache-spark-user-list.1001560.n3.nabble.com/Spark-partition-issue-with-Stanford-NLP-td23048.html

spark cache issue while doing saveAsTextFile and saveAsParquetFile

2015-07-14 Thread mathewvinoj
Hi there, I am using mapPartitions to do some processing and caching the result, as below. I am storing the output in both formats (Parquet and text file), and recomputation happens both times. Even though I added the cache, it is not working as expected. Below is the code snippet. Any help is really