Hi Team,

I am reading data from SQL Server tables through PySpark and storing the
data in S3 in Parquet format.
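
For context, the job is essentially the sketch below (host, database,
table, credentials, and bucket names are all placeholders):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sqlserver-to-s3").getOrCreate()

    # Read one SQL Server table over JDBC
    # (connection details here are placeholders).
    df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://myhost:1433;databaseName=mydb")
        .option("dbtable", "dbo.my_table")
        .option("user", "myuser")
        .option("password", "mypassword")
        .load()
    )

    # Write the table to S3 as Parquet; for the big tables the
    # resulting files run into the GBs.
    df.write.mode("overwrite").parquet("s3://my-bucket/my_table/")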

Some tables contain a lot of data, so the Parquet files written to S3 for
those tables run into the GBs.

I need help with the following:

I want each output partition (and hence each Parquet file) to be around
128 MB. How can I do that?

I don't know the data size of the tables in advance. Some tables have 2
columns but billions of records, while others have 200 columns but only
thousands of records.
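
One idea I have come across is to estimate the DataFrame size from the
optimizer statistics and derive a partition count from that, roughly like
the sketch below (it reuses the df from the sketch above). I am not sure
this is reliable, though: stats().sizeInBytes() is an internal API, and
for JDBC reads it may just fall back to spark.sql.defaultSizeInBytes
rather than a real estimate.

    # Rough idea: estimate the DataFrame size from optimizer statistics.
    # NOTE: this goes through internal APIs (_jdf, queryExecution), so it
    # may break between Spark versions, and the estimate can be a default
    # rather than a real size for JDBC sources.
    size_in_bytes = (
        df._jdf.queryExecution().optimizedPlan().stats().sizeInBytes().longValue()
    )

    target_bytes = 128 * 1024 * 1024  # aim for ~128 MB per output file
    num_partitions = max(1, size_in_bytes // target_bytes)

    df.repartition(num_partitions).write.mode("overwrite").parquet(
        "s3://my-bucket/my_table/"
    )

Or would the maxRecordsPerFile write option be a better fit here, even
though it limits record counts rather than bytes?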


Thanks in advance for your help.

Regards,
Vijay


On Wed, 23 Nov, 2022, 10:05 pm Mitch Shepherd, <mitch.sheph...@marklogic.com>
wrote:

> Hello,
>
> I’m wondering if anyone can point me in the right direction for a Spark
> connector developer guide.
>
> I’m looking for information on writing a new connector for Spark to move
> data between Apache Spark and other systems.
>
> Any information would be helpful. I found a similar thing for Kafka
> <https://docs.confluent.io/platform/current/connect/devguide.html> but
> haven’t been able to track down documentation for Spark.
>
> Best,
>
> Mitch
>
