Running pyspark job from virtual environment

2021-01-16 Thread rajat kumar
Hey Users, I want to run a Spark job from a virtual environment using Python. Please note that I am creating the virtual env with python3 -m venv env. I see that there are three Python-related variables we have to set: PYTHONPATH, PYSPARK_DRIVER_PYTHON, and PYSPARK_PYTHON. I have two questions: 1. If I want to use Virt

Dynamic Spark metrics creation

2021-01-16 Thread יורי אולייניקוב
Hi all, I have a Spark application with arbitrary stateful aggregation implemented with FlatMapGroupsWithStateFunction. I want to collect some statistics about incoming events inside FlatMapGroupsWithStateFunction. The statistics are derived from an event property which on the one hand has dynamic val