Re: API Problem

2022-06-13 Thread Enrico Minack
Hi, you can parallelize sending the requests as follows (just sketching code):

    # gets an iterable of Pandas DataFrames
    def send(pdfs: Iterable[pd.DataFrame]) -> Iterable[pd.DataFrame]:
        responses = []
        # for each Pandas DataFrame (could be smaller than 4000 rows, reduce parallelism in tha
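A fuller sketch of the pattern being described, under assumptions: API_URL is a hypothetical endpoint, finalDF is the DataFrame from the thread, and the API accepts a JSON array of records. mapInPandas and the Arrow maxRecordsPerBatch option are standard PySpark (3.0+); capping the batch size lines up with the API's 4000-record limit:

    from typing import Iterator
    import pandas as pd
    import requests
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # cap each Arrow batch handed to mapInPandas at 4000 rows,
    # matching the API's per-call limit
    spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "4000")

    API_URL = "https://example.com/bulk"  # hypothetical endpoint

    def send(pdfs: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
        # each pdf is one Arrow batch of at most 4000 rows
        for pdf in pdfs:
            resp = requests.post(API_URL, data=pdf.to_json(orient="records"))
            yield pdf.assign(status_for_batch=resp.status_code)

    result = finalDF.mapInPandas(send, schema=finalDF.schema.add("status_for_batch", "long"))

Because send runs once per partition and iterates over batches, requests on different partitions go out in parallel across executors.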

Re: API Problem

2022-06-11 Thread Sid
Hi Enrico, Thanks for helping me understand the mistakes. My end goal is to achieve something like the below: 1. Generate the data 2. Send the data in batches of 4k records, since the API can accept 4k records in one call. 3. The record would be as the below:
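One way to realize steps 1 and 2 is to chunk each partition into 4000-record calls; a sketch, assuming a hypothetical API_URL and that the API accepts a JSON array of records (requests must be installed on the executors):

    import itertools
    import json
    import requests

    API_URL = "https://example.com/bulk"  # hypothetical endpoint

    def post_partition(rows):
        it = iter(rows)
        while True:
            chunk = list(itertools.islice(it, 4000))  # at most 4k records per call
            if not chunk:
                break
            payload = json.dumps([row.asDict() for row in chunk])
            requests.post(API_URL, data=payload)

    df.foreachPartition(post_partition)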

Re: API Problem

2022-06-10 Thread Enrico Minack
Hi, This adds a column with value "1" (string) *in all rows*: df = df.withColumn("uniqueID", lit("1")) This counts the rows for all rows that have the same uniqueID, *which are all rows*. The window does not make much sense. And it orders all rows that have the same uniqueID by uniqu
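Sid's exact window isn't shown in the excerpt, but the anti-pattern being described looks roughly like this (row_count is a hypothetical output name):

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # every row gets the same constant uniqueID ...
    df = df.withColumn("uniqueID", F.lit("1"))

    # ... so the window has a single partition spanning the whole
    # DataFrame: the count is just the total row count, and ordering
    # by the constant column orders nothing meaningful
    w = Window.partitionBy("uniqueID").orderBy("uniqueID")
    df = df.withColumn("row_count", F.count("*").over(w))

A constant partition key also forces all rows through a single task, so the window defeats parallelism on top of being semantically empty.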

Re: API Problem

2022-06-10 Thread Sid
Hi Enrico, Thanks for your time. Much appreciated. I am expecting the payload to be a JSON string per record, like below: {"A":"some_value","B":"some_value"} Where A and B are the columns in my dataset. On Fri, Jun 10, 2022 at 6:09 PM Enrico Minack wrote: > Sid, > > just recognized yo
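For reference, the expression that yields exactly that shape, one JSON object per row built from named columns, is the to_json(struct(...)) pattern already used in the thread:

    from pyspark.sql import functions as F

    # one JSON string per row, e.g. {"A":"some_value","B":"some_value"}
    df = df.withColumn("payload", F.to_json(F.struct("A", "B")))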

Re: API Problem

2022-06-10 Thread Enrico Minack
Sid, just recognized you are using Python API here. Then struct(*colsListToBePassed) should be correct, given it takes a list of strings. Your method call_to_cust_bulk_api takes argument payload, which is a Column. This is then used in custRequestBody. That is pretty strange
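A tiny check of the point being made: inside withColumn, the payload argument is a Column expression, not data (colsListToBePassed stands in for the thread's list of column names, example values):

    from pyspark.sql import functions as F

    colsListToBePassed = ["A", "B"]  # column names from the thread, example values
    payload = F.to_json(F.struct(*colsListToBePassed))
    print(type(payload))  # <class 'pyspark.sql.column.Column'> -- an unevaluated expression

A plain Python function handed this object at plan time never sees row values, which is why building a request body from it cannot work.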

Re: API Problem

2022-06-10 Thread Enrico Minack
Hi Sid, finalDF = finalDF.repartition(finalDF.rdd.getNumPartitions()).withColumn("status_for_batch", call_to_cust_bulk_api(policyUrl, to_json(struct(*colsListToBePassed You are calling withColumn with the result of call_to_cust_bulk_api as the second argument. That result
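The usual per-record fix is to wrap the call in a UDF, so the function receives the rendered JSON string for each row instead of a Column. A sketch under assumptions: call_api and its requests-based body stand in for the thread's call_to_cust_bulk_api, and the values of policyUrl and colsListToBePassed are invented:

    import requests
    from pyspark.sql import functions as F
    from pyspark.sql.types import IntegerType

    policyUrl = "https://example.com/bulk"  # endpoint variable from the thread, value assumed
    colsListToBePassed = ["A", "B"]         # column names from the thread, example values

    # wrapped as a UDF, the function runs per row and receives a real
    # JSON string instead of a Column expression
    @F.udf(returnType=IntegerType())
    def call_api(payload: str) -> int:
        resp = requests.post(policyUrl, data=payload)
        return resp.status_code

    finalDF = finalDF.withColumn(
        "status_for_batch",
        call_api(F.to_json(F.struct(*colsListToBePassed))),
    )

This issues one HTTP call per row; the 2022-06-13 mapInPandas sketch at the top of this thread is the batched variant.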

Re: API Problem

2022-06-10 Thread Sid
Hi Stelios, Thank you so much for your help. If I use lit it gives a 'Column is not iterable' error. Can you suggest a simple way of achieving my use case? I need to send the entire column, record by record, to the API in JSON format. TIA, Sid On Fri, Jun 10, 2022 at 2:51 PM Stelios Philippou w

Re: API Problem

2022-06-10 Thread Stelios Philippou
Sid Then the issue is in the way you are creating the data for that specific column. call_to_cust_bulk_api(policyUrl,to_json(struct(*colsListToBePassed))) Perhaps wrap that in a lit(call_to_cust_bulk_api(policyUrl,to_json(struct(*colsListToBePassed else you will need to start sendin

Re: API Problem

2022-06-10 Thread Sid
Still, it is giving the same error. On Fri, Jun 10, 2022 at 5:13 AM Sean Owen wrote: > That repartition seems to do nothing? But yes the key point is use col() > > On Thu, Jun 9, 2022, 9:41 PM Stelios Philippou wrote: > >> Perhaps >> >> >> finalDF.repartition(finalDF.rdd.getNumPartitions()).wi

Re: API Problem

2022-06-09 Thread Sean Owen
That repartition seems to do nothing? But yes the key point is use col() On Thu, Jun 9, 2022, 9:41 PM Stelios Philippou wrote: > Perhaps > > > finalDF.repartition(finalDF.rdd.getNumPartitions()).withColumn("status_for_batch > > To > > finalDF.repartition(finalDF.rdd.getNumPartitions()).withColum

Re: API Problem

2022-06-09 Thread Stelios Philippou
Perhaps

    finalDF.repartition(finalDF.rdd.getNumPartitions()).withColumn("status_for_batch

To

    finalDF.repartition(finalDF.rdd.getNumPartitions()).withColumn(col("status_for_batch")

On Thu, 9 Jun 2022, 22:32 Sid, wrote: > Hi Experts, > > I am facing one problem while passing a column to the m

API Problem

2022-06-09 Thread Sid
Hi Experts, I am facing one problem while passing a column to the method. The problem is described in detail here: https://stackoverflow.com/questions/72565095/how-to-pass-columns-as-a-json-record-to-the-api-method-using-pyspark TIA, Sid

Re: SparkR API problem with subsetting distributed data frame

2016-09-11 Thread Bene
?

Re: SparkR API problem with subsetting distributed data frame

2016-09-10 Thread Felix Cheung
if (abs(dLon) > pi) {
  if (dLon > 0) {
    dLon <- -(2 * pi - dLon);
  } else {
    dLon <- (2 * pi + dLon);
  }
}
bearing <- radians.to.degrees((atan2(dLon, dPhi) + 360)) %% 360;
return (bearing);
}
Anything more you need?

Re: SparkR API problem with subsetting distributed data frame

2016-09-10 Thread Bene

Re: SparkR API problem with subsetting distributed data frame

2016-09-10 Thread Felix Cheung
error object of type 'S4' is not subsettable Is there any way to do such a thing in SparkR? Any help would be greatly appreciated! Also let me know if you need more information, code etc. Thanks!

SparkR API problem with subsetting distributed data frame

2016-09-10 Thread Bene
f you need more information, code etc. Thanks!