Hi,
you can parallelize sending the requests as follows (just sketching code):
# gets an iterable of Pandas DataFrames
def send(pdfs: Iterable[pd.DataFrame]) -> Iterable[pd.DataFrame]:
    responses = []
    # for each Pandas DataFrame (could be smaller than 4000 rows;
    # reduce parallelism in that case)
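A fuller sketch of that pattern could look like the following (assumptions on my side: Spark 3.x so mapInPandas is available, the requests library as HTTP client, a placeholder policy_url; error handling omitted):

import pandas as pd
import requests
from typing import Iterator

def send(pdfs: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    # pdfs yields the rows of one partition as Pandas DataFrames
    for pdf in pdfs:
        # split into chunks of at most 4000 rows
        for start in range(0, len(pdf), 4000):
            chunk = pdf.iloc[start:start + 4000]
            # one POST per chunk; each record becomes {"A": ..., "B": ...}
            resp = requests.post(policy_url, json=chunk.to_dict(orient="records"))
            yield pd.DataFrame({"status_for_batch": [resp.status_code] * len(chunk)})

# each partition is sent by its own task, so partitions run in parallel
result = finalDF.mapInPandas(send, schema="status_for_batch int")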
Hi Enrico,
Thanks for helping me to understand the mistakes.
My end goal is to achieve something like the below:
1. Generate the data.
2. Send the data in batches of 4k records, since the API can accept 4k
records at once (a rough sketch of sizing such batches follows below).
3. Each record would be as below:
{"A":"some_value","B":"some_value"}
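One way to size those batches (just a sketch; it assumes roughly even partitions after repartition, and the numbers are illustrative):

import math

batch_size = 4000
# aim for one partition per ~4000-row batch
num_batches = math.ceil(finalDF.count() / batch_size)
batched = finalDF.repartition(num_batches)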
Hi,
This adds a column with value "1" (string) *in all rows*:
df = df.withColumn("uniqueID", lit("1"))
This counts the rows for all rows that have the same uniqueID,
*which are all rows*. The window does not make much sense.
And it orders all rows that have the same uniqueID by uniqueID.
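To illustrate with a toy DataFrame (assuming a SparkSession named spark), every row gets the same count because they all share uniqueID = "1":

from pyspark.sql import Window, functions as F

df = spark.range(5).withColumn("uniqueID", F.lit("1"))
w = Window.partitionBy("uniqueID")
# cnt is 5 for every row, i.e. just the total row count
df.withColumn("cnt", F.count("*").over(w)).show()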
Hi Enrico,
Thanks for your time. Much appreciated.
I am expecting the payload to be a JSON string, with each record like below:
{"A":"some_value","B":"some_value"}
Where A and B are the columns in my dataset.
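That is exactly the shape to_json(struct(...)) produces per row, for example (toy values, assuming a SparkSession named spark):

from pyspark.sql import functions as F

df = spark.createDataFrame([("some_value", "some_value")], ["A", "B"])
# each row becomes the string {"A":"some_value","B":"some_value"}
df.select(F.to_json(F.struct("A", "B")).alias("payload")).show(truncate=False)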
On Fri, Jun 10, 2022 at 6:09 PM Enrico Minack wrote:
> Sid,
>
> just recognized you are using Python API here.
Sid,
just recognized you are using Python API here. Then
struct(*colsListToBePassed) should be correct, given it takes a
list of strings.
Your method call_to_cust_bulk_api takes argument payload, which is a
Column. This is then used in custRequestBody. That is pretty
strange.
Hi Sid,
finalDF = finalDF.repartition(finalDF.rdd.getNumPartitions())
.withColumn("status_for_batch", call_to_cust_bulk_api(policyUrl,
to_json(struct(*colsListToBePassed))))
You are calling withColumn with the result of
call_to_cust_bulk_api as the second argument. That result must be a
Column for withColumn to accept it.
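For reference, one common pattern for handing each row's JSON string to a plain Python function is a UDF. This is only a sketch: call_to_cust_bulk_api's internals are not shown in the thread, policyUrl and colsListToBePassed come from Sid's code, and requests is assumed as the HTTP client:

import requests
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

@F.udf(returnType=StringType())
def call_api(payload: str) -> str:
    # payload arrives here as a plain JSON string per row, not as a Column
    resp = requests.post(policyUrl, data=payload,
                         headers={"Content-Type": "application/json"})
    return str(resp.status_code)

finalDF = finalDF.withColumn(
    "status_for_batch",
    call_api(F.to_json(F.struct(*colsListToBePassed))))

Note this issues one request per row; for true 4000-record batches, the mapInPandas-style sketch further up is a better fit.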
Hi Stelios,
Thank you so much for your help.
If I use lit, it gives a "column is not iterable" error.
Can you suggest a simple way of achieving my use case? I need to send the
entire column record by record to the API in JSON format.
TIA,
Sid
On Fri, Jun 10, 2022 at 2:51 PM Stelios Philippou wrote:
Sid
Then the issue is in the data, in the way you are creating it for that
specific column.
call_to_cust_bulk_api(policyUrl, to_json(struct(*colsListToBePassed)))
Perhaps wrap that in a
lit(call_to_cust_bulk_api(policyUrl, to_json(struct(*colsListToBePassed))))
else you will need to start sending
Still, it is giving the same error.
On Fri, Jun 10, 2022 at 5:13 AM Sean Owen wrote:
> That repartition seems to do nothing? But yes the key point is use col()
>
> On Thu, Jun 9, 2022, 9:41 PM Stelios Philippou wrote:
>
>> Perhaps
>>
>>
>> finalDF.repartition(finalDF.rdd.getNumPartitions()).withColumn("status_for_batch",
>> call_to_cust_bulk_api(policyUrl, to_json(struct(*colsListToBePassed))))
That repartition seems to do nothing? But yes the key point is use col()
On Thu, Jun 9, 2022, 9:41 PM Stelios Philippou wrote:
> Perhaps
>
>
> finalDF.repartition(finalDF.rdd.getNumPartitions()).withColumn("status_for_batch",
> call_to_cust_bulk_api(policyUrl, to_json(struct(*colsListToBePassed))))
>
> To
>
> finalDF.repartition(finalDF.rdd.getNumPartitions()).withColumn(col("status_for_batch"),
> call_to_cust_bulk_api(policyUrl, to_json(struct(*colsListToBePassed))))
Perhaps
finalDF.repartition(finalDF.rdd.getNumPartitions()).withColumn("status_for_batch",
call_to_cust_bulk_api(policyUrl, to_json(struct(*colsListToBePassed))))
To
finalDF.repartition(finalDF.rdd.getNumPartitions()).withColumn(col("status_for_batch"),
call_to_cust_bulk_api(policyUrl, to_json(struct(*colsListToBePassed))))
On Thu, 9 Jun 2022, 22:32 Sid, wrote:
> Hi Experts,
>
> I am facing one problem while passing a column to the method.
Hi Experts,
I am facing one problem while passing a column to the method. The problem
is described in detail here:
https://stackoverflow.com/questions/72565095/how-to-pass-columns-as-a-json-record-to-the-api-method-using-pyspark
TIA,
Sid
  # normalize dLon to the range [-pi, pi]
  if (abs(dLon) > pi) {
    if (dLon > 0) {
      dLon <- -(2 * pi - dLon);
    } else {
      dLon <- (2 * pi + dLon);
    }
  }
  # convert to degrees first, then shift into [0, 360)
  bearing <- (radians.to.degrees(atan2(dLon, dPhi)) + 360) %% 360;
  return (bearing);
}
Anything more you need?
--- error:
object of type 'S4' is not subsettable
Is there any way to do such a thing in SparkR? Any help would be greatly
appreciated! Also let me know if you need more information, code etc.
Thanks!