[window aggregate][debug] Rows not dropping with watermark and window

2022-04-27 Thread Xavier Gervilla
Hi team, With your help last week I was able to adapt a project I'm developing and apply a sentiment analysis and NER retrieval to streaming tweets. One of the next steps in order to ensure that memory doesn't collapse is applying windows and watermarks to discard tweets after some time. Howe

Re: [Spark Streaming] [Debug] Memory error when using NER model in Python

2022-04-20 Thread Xavier Gervilla
_df = df.select(explode_outer(map_keys(col(col_name.distinct()             keys = list(map(lambda row: row[0], keys_df.collect()))             key_cols = list(map(lambda f: col(col_name).getItem(f)             .alias(str(col_name + sep + f)), keys))             drop_column_list = [col_nam

[Spark Streaming] [Debug] Memory error when using NER model in Python

2022-04-18 Thread Xavier Gervilla
Hi Team,https://stackoverflow.com/questions/71841814/is-there-a-way-to-prevent-excessive-ram-consumption-with-the-spark-configuration I'm developing a project that retrieves tweets on a 'host' app, streams them to Spark and with different operations with DataFrames obtains the Sentiment of th