Re: Spark Parquet write OOM

2022-03-01 Thread Yang,Jie(INF)
This is a DirectByteBuffer OOM, so plan 2 may not work. We can increase the DirectByteBuffer capacity by configuring `-XX:MaxDirectMemorySize`, which is a Java option. However, we'd better check the size of the memory being allocated, because `-XX:MaxDirectMemorySize` and `-Xmx` should

Spark Parquet write OOM

2022-03-01 Thread Anil Dasari
Hello everyone, we are writing a Spark DataFrame to S3 in Parquet format, and it is failing with the exception below. I wanted to try the following to avoid the OOM: 1. increase the default SQL shuffle partitions to reduce the load on the Parquet writer tasks, and 2. increase user memory (reduce memory f

Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-03-01 Thread Mich Talebzadeh
I checked this process of gracefully terminating the topic when the flag is set to terminate the topic. In this case the topic is called md => market data. The first two batches and then you set the termination flag on Topic market data => md, batchId is 236, at 2022-03-01 20:52:00.099259 +---

Re: Spark 3.1.2 full thread dumps

2022-03-01 Thread Lalwani, Jayesh
This (https://www.elastic.co/blog/benchmarking-and-sizing-your-elasticsearch-cluster-for-logs-and-metrics) has the math for sizing the cluster. There is a similar document (https://docs.aws.amazon.com/opensearch-service/latest/developerguide/sizing-domains.html) on sizing your cluster on AWS.

Re: can dataframe API deal with subquery

2022-03-01 Thread Gourav Sengupta
Hi, why would you want to do that? Regards, Gourav On Sat, Feb 26, 2022 at 8:00 AM wrote: > such as this table definition: > > > desc people; > +---+---+--+ > | col_name | data_type | comment | > +---+---