Re: NLTK with Spark Streaming

2017-12-01 Thread ashish rawat
Thanks Nicholas, but the problem for us is that we want to use the NLTK Python library, since our data scientists train with it. Rewriting the inference logic using some other library would be time consuming, and in some cases it may not even work because some functions are unavailable elsewhere.

Re: NLTK with Spark Streaming

2017-11-28 Thread Nicholas Hakobian
Depending on your needs, it's fairly easy to write a lightweight Python wrapper around the Databricks spark-corenlp library: https://github.com/databricks/spark-corenlp

Nicholas Szandor Hakobian, Ph.D.
Staff Data Scientist
Rally Health
nicholas.hakob...@rallyhealth.com
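The "lightweight wrapper" Nicholas describes usually follows the standard pattern for exposing a JVM-side Column function to PySpark: convert the Python `Column` to its Java counterpart, call the Scala function through the py4j gateway, and wrap the result back into a Python `Column`. A minimal sketch of that shape, with the gateway call injected so it runs without a JVM; the exact spark-corenlp entry point mentioned in the docstring is an assumption to verify against the library version:

```python
def wrap_jvm_column_fn(java_fn, to_java_column=lambda c: c, to_py_column=lambda c: c):
    """Return a Python-callable wrapper around a JVM-side column function.

    In a real wrapper, to_java_column would be pyspark.sql.column._to_java_column,
    to_py_column would be pyspark.sql.column.Column, and java_fn would be the
    spark-corenlp function reached through the py4j gateway (the exact path
    depends on the library version -- an assumption, check before use).
    """
    def wrapper(col):
        # convert in, call the JVM-side function, convert the result back out
        return to_py_column(java_fn(to_java_column(col)))
    return wrapper
```

With identity converters and a plain Python callable standing in for the JVM function, the wrapper behaves like an ordinary function, which makes the shape easy to test locally before wiring in the gateway.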

Re: NLTK with Spark Streaming

2017-11-26 Thread ashish rawat
Thanks Holden and Chetan. Holden - have you tried it out? Do you know the right way to do it? Chetan - yes, if we use a Java NLP library, there should not be any issue in integrating it with Spark Streaming, but as I pointed out earlier, we want to give data scientists the flexibility to use the language

Re: NLTK with Spark Streaming

2017-11-26 Thread Chetan Khatri
But you can still use the Stanford NLP library and distribute it through Spark, right?

Re: NLTK with Spark Streaming

2017-11-26 Thread Holden Karau
So it’s certainly doable (it’s not super easy, mind you), but until the Arrow UDF release goes out it will be rather slow.
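The "arrow udf" Holden alludes to shipped in Spark 2.3 as `pandas_udf`: Arrow moves whole column batches across the JVM/Python boundary, so Python is entered once per batch rather than once per row, which is where most of the classic Python UDF overhead goes. A sketch of the batch-level function such a UDF would run; the stand-in tokenizer and the Spark wiring in the comment are illustrative assumptions, not the thread's actual code:

```python
def tokenize_batch(texts):
    """Batch-at-a-time tokenization: the body a vectorized (Arrow) UDF runs.

    str.split stands in for the real NLTK call so the sketch is runnable
    anywhere; in the actual job this is where NLTK would be invoked.
    """
    return [t.split() for t in texts]

# With Spark >= 2.3 this would be wrapped roughly like (untested wiring):
#   import pandas as pd
#   from pyspark.sql.functions import pandas_udf
#   from pyspark.sql.types import ArrayType, StringType
#   tokenize = pandas_udf(lambda s: pd.Series(tokenize_batch(s)),
#                         ArrayType(StringType()))
```

The point of the batch shape is that any per-call fixed cost (serialization, interpreter entry, library dispatch) is amortized over the whole batch instead of being paid per record.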

NLTK with Spark Streaming

2017-11-25 Thread ashish rawat
Hi, Has someone tried running NLTK (Python) with Spark Streaming (Scala)? I was wondering if this is a good idea, and what are the right Spark operators to do this? The reason we want to try this combination is that we don't want to run our transformations in Python (pyspark), but after the transfo
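If the NLTK step does end up in a PySpark streaming job, the operator that usually fits is mapPartitions, which pays the library's setup cost once per partition instead of once per record. A minimal sketch, with a whitespace-split fallback so it runs even where NLTK or its tokenizer data is absent; on a real cluster, NLTK and its corpora being installed on every executor is an assumption about the deployment:

```python
def tokenize_partition(lines):
    """Tokenize every line in one partition, paying NLTK setup once.

    Falls back to whitespace splitting if NLTK (or its tokenizer data)
    is unavailable, so the sketch stays runnable anywhere.
    """
    try:
        from nltk.tokenize import word_tokenize
        word_tokenize("probe")  # LookupError here means tokenizer data is missing
        tokenize = word_tokenize
    except Exception:
        tokenize = str.split
    for line in lines:
        yield tokenize(line)

# With a DStream or RDD this would be applied as:
#   tokens = lines.mapPartitions(tokenize_partition)
```

A per-record UDF would re-enter Python for every line; the per-partition form keeps the import-and-probe step out of the hot loop.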