https://github.com/JohnSnowLabs/spark-nlp#packages-cheatsheet
Change
spark = sparknlp.start()
to
spark = sparknlp.start(spark32=True)
On Tue, Apr 19, 2022 at 21:10, Bjørn Jørgensen wrote:
> Yes, there are some that have that issue.
>
> Please open a new issue at
> https://github.com/JohnSnowLa
Yes, there are some that have that issue.
Please open a new issue at https://github.com/JohnSnowLabs/spark-nlp/issues
and they will help you.
On Tue, Apr 19, 2022 at 20:33, Xavier Gervilla <
xavier.gervi...@datapta.com> wrote:
> Thank you for your advice, I had small knowledge of Spark NLP and
I don't want to use groupBy, since I want to keep the rows separate for the
subsequent transformations. Instead, I want to group on several attributes
(I am using partitionBy here) and count the frequency of each distinct group
of records (with respect to the attributes mentioned first).
On Tue, Apr 19
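To make the request above concrete, here is a minimal plain-Python sketch of the intended semantics: every input row is kept, and each row gets the size of the group it belongs to. This is not Spark code. The helper name add_group_counts and the sample rows are hypothetical; only the output column name qidsFreqs comes from the thread. In Spark itself, a count over a window partitioned by the same attributes should give the equivalent result.

```python
from collections import Counter


def add_group_counts(rows, group_cols, out_col="qidsFreqs"):
    """Return copies of the rows, each annotated with the size of its group.

    Hypothetical helper for illustration only; it mimics a per-partition
    count that keeps rows separate instead of collapsing them.
    """
    # Key each row by the tuple of its grouping attribute values.
    keys = [tuple(row[col] for col in group_cols) for row in rows]
    # Count how many rows share each key.
    freqs = Counter(keys)
    # Emit one output row per input row; the inputs are not modified.
    return [{**row, out_col: freqs[key]} for row, key in zip(rows, keys)]


# Made-up sample data: the first two rows share the same (age, zip) group.
rows = [
    {"age": 30, "zip": "1000", "name": "a"},
    {"age": 30, "zip": "1000", "name": "b"},
    {"age": 41, "zip": "2000", "name": "c"},
]
result = add_group_counts(rows, ["age", "zip"])
for row in result:
    print(row)
```

A plain groupBy(...).count() would instead return one row per group, which is why it does not fit when the individual rows are needed later.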
Just .groupBy(...).count() ?
On Tue, Apr 19, 2022 at 6:24 AM marc nicole wrote:
> Hello guys,
>
> I want to group a dataset (initDataset) by certain column attributes
> (e.g., List groupByQidAttributes) and then count the occurrences of the
> associated grouped rows. How do I achieve that neatly?
> I
Don't call collect(): it pulls all the data into the driver's memory. Use
count() instead.
On Tue, Apr 19, 2022 at 5:34 AM wilson wrote:
> Hello,
>
> Do you know why, for a big dataset, a general RDD job can complete but
> collect() fails with a memory overflow?
>
> for instance, for a dataset which has xxx million o
I have no context on ML, but your "streaming" query exposes the possibility
of memory issues.
>>> flattenedNER.registerTempTable("df")
>>>
>>>
>>> querySelect = "SELECT col as entity, avg(sentiment) as sentiment,
>>> count(col) as count FROM df GROUP BY col"
>>> finalDF = spark.sql(querySele
Hello guys,
I want to group a dataset (initDataset) by certain column attributes
(e.g., List groupByQidAttributes) and then count the occurrences of the
associated grouped rows. How do I achieve that neatly?
I tried through the following code:
Dataset groupedRowsDF = initDataset.withColumn("qidsFreqs",