Unable to load MongoDB atlas data via PySpark because of BsonString error

2024-06-08 Thread Perez
Hi Team, this is the problem https://stackoverflow.com/questions/78593858/unable-to-load-mongodb-atlas-data-via-pyspark-jdbc-in-glue I can't go ahead with *StructType* approach since my input record is huge and if the underlying attributes are added or removed my code might fail. I can't change

Re: [SPARK-48463] Mllib Feature transformer failing with nested dataset (Dot notation)

2024-06-08 Thread Someshwar Kale
Hi Chhavi, Currently there is no way to handle backtick(`) spark StructType. Hence the field name a.b and `a.b` are completely different within StructType. To handle that, I have added a custom implementation fixing StringIndexer# validateAndTransformSchema. You can refer to the code on my github

Re: [SPARK-48463] Mllib Feature transformer failing with nested dataset (Dot notation)

2024-06-08 Thread Chhavi Bansal
Hi Someshwar, Thanks for the response, I have added my comments to the ticket . Thanks, Chhavi Bansal On Thu, 6 Jun 2024 at 17:28, Someshwar Kale wrote: > As a fix, you may consider adding a transformer to rename columns (perhaps > replace all

Re: OOM issue in Spark Driver

2024-06-08 Thread Andrzej Zera
Hey, do you perform stateful operations? Maybe your state is growing indefinitely - a screenshot with state metrics would help (you can find it in Spark UI -> Structured Streaming -> your query). Do you have a driver-only cluster or do you have workers too? What's the memory usage profile at worker