Hello experts,

I was wondering whether I could use the approach below to speed up data
loading in Spark.


from concurrent.futures import ThreadPoolExecutor, as_completed

def extract_data_from_mongodb(mongo_config):
    df = glueContext.create_dynamic_frame.from_options(
        connection_type="mongodb",
        connection_options=mongo_config
    )
    return df

# Bounds must be defined before they are referenced in mongo_config.
lower_bound = 0
upper_bound = 200
segment_size = 10

mongo_config = {
    "connection.uri": "mongodb://url",
    "database": "",
    "collection": "",
    "username": "",
    "password": "",
    "partitionColumn": "_id",
    "lowerBound": str(lower_bound),
    "upperBound": str(upper_bound),
}

# Split the key range into segments of 10 and submit each segment to its
# own thread.
segments = [(i, min(i + segment_size, upper_bound))
            for i in range(lower_bound, upper_bound, segment_size)]

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(execution, segment) for segment in segments]
    for future in as_completed(futures):
        try:
            future.result()
        except Exception as e:
            print(f"Error: {e}")

I am trying to use parallel threads so that the segments are pulled
concurrently. Is this an effective approach?
