Hi team,
With your help last week I was able to adapt a project I'm developing and apply
sentiment analysis and NER extraction to streaming tweets. One of the next
steps, to keep memory from growing without bound, is to apply windows and
watermarks so that old tweets are discarded after some time.
from pyspark.sql.functions import col, explode_outer, map_keys

# Collect the distinct keys that appear in the map column
keys_df = df.select(explode_outer(map_keys(col(col_name)))).distinct()
keys = list(map(lambda row: row[0], keys_df.collect()))

# Build one column per key, named "<col_name><sep><key>"
key_cols = list(map(lambda f: col(col_name).getItem(f)
                    .alias(str(col_name + sep + f)), keys))
drop_column_list = [col_name]
Hi Team,

https://stackoverflow.com/questions/71841814/is-there-a-way-to-prevent-excessive-ram-consumption-with-the-spark-configuration
I'm developing a project that retrieves tweets on a 'host' app, streams them to
Spark, and, through different DataFrame operations, obtains the sentiment of
the tweets.