Hi Team,

I am reading data from SQL Server tables through PySpark and writing it to S3 in Parquet format. Some of the tables hold a lot of data, so the files that land in S3 for those tables run into gigabytes. I need help with the following: I want each output partition to be roughly 128 MB. How can I assign that when I don't know the data size in the tables up front? Some tables have 2 columns but billions of records, while others have 200 columns but only thousands of records.
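One idea I had was to estimate the total size from a small sample and repartition before writing. Here is a rough sketch of that approach (the connection details, table name, and bucket path below are all placeholders, and the per-row size estimate is a crude string-length heuristic, so I'm not sure it is reliable):

    import math

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sqlserver-to-s3").getOrCreate()

    # Placeholder connection details -- replace with your own.
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://myhost:1433;databaseName=mydb")
          .option("dbtable", "dbo.my_table")
          .option("user", "my_user")
          .option("password", "my_password")
          .load())

    # Estimate bytes per row from a small sample, then scale by the row count.
    # This measures the string representation of in-memory rows; Parquet
    # compresses on write, so files on disk will usually come out smaller.
    row_count = df.count()
    sample_rows = df.limit(1000).collect()
    bytes_per_row = sum(len(str(r)) for r in sample_rows) / max(len(sample_rows), 1)

    target_bytes = 128 * 1024 * 1024  # aim for ~128 MB per output file
    num_partitions = max(1, math.ceil(row_count * bytes_per_row / target_bytes))

    (df.repartition(num_partitions)
       .write.mode("overwrite")
       .parquet("s3a://my-bucket/my_table/"))

I also saw the writer option maxRecordsPerFile, which caps the number of rows per file rather than bytes, but to pick a good value there I would still need some per-row size estimate. Is there a better way to do this?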
Thanks in advance for your help.

Regards,
Vijay

On Wed, 23 Nov, 2022, 10:05 pm Mitch Shepherd, <mitch.sheph...@marklogic.com> wrote:

> Hello,
>
> I’m wondering if anyone can point me in the right direction for a Spark
> connector developer guide.
>
> I’m looking for information on writing a new connector for Spark to move
> data between Apache Spark and other systems.
>
> Any information would be helpful. I found a similar thing for Kafka
> <https://docs.confluent.io/platform/current/connect/devguide.html> but
> haven’t been able to track down documentation for Spark.
>
> Best,
>
> Mitch