Thanks Nicholas, but the problem for us is that we want to use the NLTK
Python library, since our data scientists train with it. Rewriting the
inference logic in another library would be time-consuming, and in some
cases it may not even be possible, because some of the functions we rely on
are unavailable elsewhere.
Depending on your needs, it's fairly easy to write a lightweight Python
wrapper around the Databricks spark-corenlp library:
https://github.com/databricks/spark-corenlp
Nicholas Szandor Hakobian, Ph.D.
Staff Data Scientist
Rally Health
nicholas.hakob...@rallyhealth.com
On Sun, Nov 26, 2017 at 8:
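The wrapper suggestion above can be sketched generically: forward calls from Python to a JVM-side function namespace. Everything here is a hedged sketch, not spark-corenlp's actual API surface; a fake namespace stands in for the py4j proxy so it runs without a live SparkContext, and the JVM path and Column wrapping mentioned in the comments are assumptions about a real session.

```python
from types import SimpleNamespace

def wrap_column_function(jvm_namespace, name):
    # Return a Python callable that forwards to a JVM-side column function.
    # In a live session, jvm_namespace would be the py4j proxy
    # spark.sparkContext._jvm.com.databricks.spark.corenlp.functions
    # (assumed path), and each result would be wrapped in
    # pyspark.sql.Column before being used in a DataFrame expression.
    jfunc = getattr(jvm_namespace, name)

    def wrapper(*cols):
        return jfunc(*cols)

    return wrapper

# Fake namespace standing in for the py4j proxy so the sketch runs locally.
fake_functions = SimpleNamespace(tokenize=lambda col: ["tokenize", col])
tokenize_col = wrap_column_function(fake_functions, "tokenize")
```

The point of the indirection is that each wrapped function stays a one-liner on the Python side, while all the NLP work happens in the JVM, avoiding per-row Python serialization.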
Thanks Holden and Chetan.
Holden - have you tried it out? Do you know the right way to do it?
Chetan - yes, if we use a Java NLP library, there should not be any issue
integrating it with Spark Streaming, but as I pointed out earlier, we want
to give our data scientists the flexibility to use the language
But you can still use the Stanford NLP library and distribute it through
Spark, right?
On Sun, Nov 26, 2017 at 3:31 PM, Holden Karau wrote:
> So it’s certainly doable (it’s not super easy, mind you), but until the
> Arrow UDF release goes out it will be rather slow.
>
> On Sun, Nov 26, 2017 at 8:01 AM a
So it’s certainly doable (it’s not super easy, mind you), but until the
Arrow UDF release goes out it will be rather slow.
On Sun, Nov 26, 2017 at 8:01 AM ashish rawat wrote:
> Hi,
>
> Has someone tried running NLTK (Python) with Spark Streaming (Scala)? I
> was wondering if this is a good idea a
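A minimal local sketch of the per-partition pattern this exchange points at: run the Python NLP code inside `rdd.mapPartitions` so any heavyweight setup happens once per partition, not once per row. The regex tokenizer below is a stand-in where a real job would call `nltk.word_tokenize`; nothing here touches a live cluster, and the setup comments describe assumptions, not tested behavior.

```python
import re

def tokenize(text):
    # Stand-in for nltk.word_tokenize so the sketch runs without NLTK.
    # (Plain-Python UDFs pay a per-row serialization cost, which is why
    # the Arrow-backed UDFs mentioned above are expected to be faster.)
    return re.findall(r"[a-z']+", text.lower())

def process_partition(rows):
    # In PySpark this would be passed to rdd.mapPartitions(process_partition),
    # so heavyweight setup (nltk.download, model loading) could go here and
    # run once per partition rather than once per row.
    for text in rows:
        yield tokenize(text)

# Simulate one partition locally with a plain iterator:
tokens = list(process_partition(["NLTK with Spark Streaming"]))
```

Here `tokens` is `[["nltk", "with", "spark", "streaming"]]` - one token list per input row, which is the shape a downstream Spark stage would consume.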
Hi,
Has someone tried running NLTK (Python) with Spark Streaming (Scala)? I was
wondering if this is a good idea, and what are the right Spark operators to
do this? The reason we want to try this combination is that we don't want
to run our transformations in Python (PySpark), but after the
transfo