Hi list,
Is there any documentation on how to approach cluster sizing? How do you
approach a new deployment?
Thanks,
Hi Danny,
You might need to reduce the number of partitions (or set userBlocks
and productBlocks directly in ALS). Using a large number of partitions
increases the shuffle size and memory requirements. If you have 16 x 16 =
256 cores, I would recommend 64 or 128 partitions instead of 2048.
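
For concreteness, a minimal sketch of the two options, assuming an existing
RDD[Rating] named `ratings` on the 16 x 16 = 256-core cluster described above;
the rank, iteration count, and lambda values are placeholders:

    import org.apache.spark.mllib.recommendation.{ALS, Rating}
    import org.apache.spark.rdd.RDD

    // Option 1: shrink the partition count of the input RDD itself.
    val fewerPartitions: RDD[Rating] = ratings.repartition(128)

    // Option 2: leave the input alone and set the block count inside ALS.
    // setBlocks sets both user and product blocks at once, which controls
    // the shuffle during factorization regardless of input partitioning.
    val model = new ALS()
      .setRank(10)          // placeholder
      .setIterations(10)    // placeholder
      .setLambda(0.01)      // placeholder
      .setBlocks(64)        // 64 or 128 for 256 cores, per the advice above
      .run(ratings)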
model.recommendProducts
Hi,
I'm having trouble building a recommender and would appreciate a few
pointers.
I have 350,000,000 events, stored in roughly 500,000 S3 files as
semi-structured JSON. Not all of these events are relevant to making
recommendations.
My code is (roughly):
case class Even
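
(The snippet above is cut off in the archive. As a rough reconstruction of
the pipeline being described: the Event fields, the s3n:// path, the
relevance filter, and the parseEvent helper below are all illustrative
assumptions, not the original code.)

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // Hypothetical event shape; the original definition is truncated.
    case class Event(userId: Int, productId: Int, action: String)

    // Stand-in parser: a real job would use a JSON library (json4s,
    // Jackson, ...) to handle the semi-structured records.
    def parseEvent(line: String): Option[Event] =
      line.split(",") match {
        case Array(u, p, a) => scala.util.Try(Event(u.toInt, p.toInt, a)).toOption
        case _              => None
      }

    val sc = new SparkContext("local[*]", "recommender-sketch")

    // ~500,000 small S3 files; the glob is illustrative.
    val events = sc.textFile("s3n://bucket/events/*").flatMap(parseEvent)

    // Drop events that are irrelevant to recommendations (assumed rule),
    // then treat what remains as implicit feedback.
    val ratings = events
      .filter(_.action == "purchase")
      .map(e => Rating(e.userId, e.productId, 1.0))
      .repartition(128)   // avoid one tiny partition per input file

    val model = ALS.trainImplicit(ratings, rank = 10, iterations = 10,
      lambda = 0.01, blocks = 64, alpha = 1.0)

trainImplicit fits events like views or clicks that carry no explicit score;
ALS.train would be the choice if the events were explicit ratings.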
Hi,
I am looking for any guidelines on Spark cluster sizing - are there any
best practices or links for estimating cluster specifications based on
input data size, transformations, etc.?
Thanks in advance for helping out.
Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>