Re: How to perform distributed compute in similar way to Spark vector UDF

2019-11-20 Thread Denis Magda
If you need to traverse the local data on all the nodes, broadcast a compute task to all of them and run a ScanQuery with the setLocal flag set to true. You can also balance the load by taking a similar approach with an affinity call per partition: https://www.gridgain.com/docs/latest/
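A minimal Java sketch of both approaches, assuming a cache named "rows" with Long keys and String values (the cache name, value type, and process() helper are my own placeholders, not from the thread). The broadcast variant scans only the data local to each node; the per-partition variant routes one job to each partition's owner:

    import java.util.Collections;
    import javax.cache.Cache;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.query.QueryCursor;
    import org.apache.ignite.cache.query.ScanQuery;

    public class LocalScanExample {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start();

            // Variant 1: broadcast a closure; each node scans only its local entries.
            ignite.compute().broadcast(() -> {
                IgniteCache<Long, String> cache = Ignition.localIgnite().cache("rows");
                try (QueryCursor<Cache.Entry<Long, String>> cur =
                         cache.query(new ScanQuery<Long, String>().setLocal(true))) {
                    for (Cache.Entry<Long, String> e : cur)
                        process(e.getValue());
                }
            });

            // Variant 2: one job per partition, sent to that partition's owner,
            // so the cluster can balance work at partition granularity.
            int parts = ignite.affinity("rows").partitions();
            for (int p = 0; p < parts; p++) {
                int part = p; // effectively final copy for the lambda
                ignite.compute().affinityRun(Collections.singletonList("rows"), part, () -> {
                    IgniteCache<Long, String> cache = Ignition.localIgnite().cache("rows");
                    try (QueryCursor<Cache.Entry<Long, String>> cur =
                             cache.query(new ScanQuery<Long, String>(part).setLocal(true))) {
                        for (Cache.Entry<Long, String> e : cur)
                            process(e.getValue());
                    }
                });
            }
        }

        // Placeholder for the per-row work.
        private static void process(String row) { /* ... */ }
    }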

RE: Re: How to perform distributed compute in similar way to Spark vector UDF

2019-11-18 Thread Alexandr Shapkin
e/data-modeling/affinity-collocation#configuring-affinity-key From: camer314 Sent: Monday, November 18, 2019 6:43 AM To: user@ignite.apache.org Subject: Re: How to perform distributed compute in similar way to Spark vector UDF Reading a little more in the Java docs about AffinityKey, I am thinking that, muc
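For reference, the "configuring affinity key" section linked above covers declaring the affinity field in the cache configuration rather than with an annotation. A sketch, assuming a hypothetical key type com.example.RowKey with a batchId field:

    import org.apache.ignite.cache.CacheKeyConfiguration;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class RowCacheConfig {
        /** Declares RowKey.batchId as the affinity field of the "rows" cache. */
        public static CacheConfiguration<Object, Object> rowsCacheConfig() {
            CacheConfiguration<Object, Object> ccfg = new CacheConfiguration<>("rows");
            ccfg.setKeyConfiguration(
                new CacheKeyConfiguration("com.example.RowKey", "batchId"));
            return ccfg;
        }
    }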

Re: How to perform distributed compute in similar way to Spark vector UDF

2019-11-17 Thread camer314
Reading a little more in the Java docs about AffinityKey, I am thinking that, much like vector UDF batch sizing, one way I could easily achieve my result is to batch my rows into affinity keys. That is, the affinity key changes every 100,000 rows, for example. So cache keys [0...9] have aff
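A sketch of that batching idea as an Ignite key class (class and field names are my own, not from the thread): an @AffinityKeyMapped field derived from the row id keeps each 100,000-row batch on one node.

    import org.apache.ignite.cache.affinity.AffinityKeyMapped;

    /** Cache key that groups rows into 100,000-row batches. */
    public class RowKey {
        private final long rowId;

        /** Rows with the same batchId map to the same partition, hence the same node. */
        @AffinityKeyMapped
        private final long batchId;

        public RowKey(long rowId) {
            this.rowId = rowId;
            this.batchId = rowId / 100_000; // rows 0..99,999 -> batch 0, and so on
        }
    }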

How to perform distributed compute in similar way to Spark vector UDF

2019-11-17 Thread camer314
I asked this question on StackOverflow. However, I probably put too much weight on Spark. My question really is: how can I load in a large CSV file to t
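For the loading part of the question, one common Ignite pattern (a sketch under my own assumptions; the cache name "rows" must already exist and the file path is hypothetical) is to stream the CSV into a cache with IgniteDataStreamer, which batches entries and routes each batch to its owning node:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;
    import org.apache.ignite.Ignition;

    public class CsvLoader {
        public static void main(String[] args) throws IOException {
            Ignite ignite = Ignition.start();

            try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("rows");
                 BufferedReader reader = Files.newBufferedReader(Paths.get("big.csv"))) {
                long rowId = 0;
                String line;
                while ((line = reader.readLine()) != null)
                    streamer.addData(rowId++, line); // raw CSV line under a sequential key
            } // closing the streamer flushes any remaining batches
        }
    }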