Hello All,
Hope this email finds you well. I have a dataframe of size 8 TB (Parquet,
Snappy-compressed). When I group it by a column, I get a much smaller
aggregated dataframe of about 700 rows (just two columns, key and count).
When I use it as shown below to broadcast this aggregated result, it thr
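Broadcasting a ~700-row table against an 8 TB fact table is the classic map-side join shape: the small side is shipped to every task as an in-memory lookup so the big side never shuffles. In PySpark this is `pyspark.sql.functions.broadcast(small_df)` inside the join; a minimal pure-Python sketch of the idea (names and sample data are illustrative only):

```python
# The "small side" of a broadcast join becomes an in-memory map that
# every task can consult without shuffling the large side.
small_side = {"us": 4200, "de": 1300, "jp": 700}  # key -> count; ~700 rows in practice

def join_partition(rows, lookup):
    """Join one partition of the big dataframe against the broadcast map."""
    return [(key, value, lookup[key]) for key, value in rows if key in lookup]

big_partition = [("us", "a"), ("fr", "b"), ("de", "c")]
print(join_partition(big_partition, small_side))
# → [('us', 'a', 4200), ('de', 'c', 1300)]
```

Each task does a hash lookup per row, which is why the aggregated side must fit comfortably in executor memory.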
Ivan,
Although these docs are Kubernetes-related, they might apply to your use case:
https://spark.apache.org/docs/latest/running-on-kubernetes.html#docker-images
The Spark distribution includes a script that can create the image for you;
it was added in 2.3. So if you downloaded a Spark 2.3+ distribu
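For reference, the script in question is `bin/docker-image-tool.sh` in the distribution root; the invocation below is a sketch based on the linked docs, with the registry and tag names as placeholders:

```shell
# Run from the root of an unpacked Spark 2.3+ distribution.
# Registry and tag below are placeholders for your own values.
./bin/docker-image-tool.sh -r my-registry.example.com/spark -t v2.4.0 build
./bin/docker-image-tool.sh -r my-registry.example.com/spark -t v2.4.0 push
```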
I have the same issue. Do you have a solution? Maybe Spark Streaming does not
support transactional messages. I use a Kafka stream to retrieve the
transactional messages. Maybe we can ask the Spark team to support this feature.
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
Hi!
I have a Structured Streaming application that reads from Kafka, performs
some aggregations, and writes to S3 in Parquet format.
Everything seems to work great, except that from time to time I get a
checkpoint error. At the beginning I thought it was a random error, but it
happened more than 3
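Intermittent checkpoint errors with S3 are a known pattern: Structured Streaming's checkpoint mechanism assumes an atomic rename, which S3 does not provide, so a common workaround is to point `checkpointLocation` at HDFS (or another rename-capable filesystem) while keeping the data output on S3. A hypothetical fragment, assuming an existing streaming DataFrame `df` and placeholder paths:

```python
# Config fragment (not self-contained): data goes to S3, but the
# checkpoint lives on HDFS, which supports atomic rename.
query = (df.writeStream
    .format("parquet")
    .option("path", "s3a://my-bucket/output/")                 # placeholder bucket
    .option("checkpointLocation", "hdfs:///checkpoints/my-query")  # placeholder path
    .start())
```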
Hi,
I'm looking for a ready-to-use Docker container that includes:
- Spark 2.4 or higher
- YARN 2.8.2 or higher
I'm looking for a way to submit Spark jobs on YARN.
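Whatever image you end up with, the submission itself is a standard `spark-submit` against YARN; a sketch, assuming `HADOOP_CONF_DIR` points at your cluster's YARN/HDFS config inside the container and with placeholder class and jar names:

```shell
# HADOOP_CONF_DIR must point at the cluster's Hadoop config directory.
export HADOOP_CONF_DIR=/etc/hadoop/conf
$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar
```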