>
> When folks have an app that needs more RAM than a typical server, what 
> are the first and most common steps they take? 
>

The first step would be to add more RAM; if you've maxed out physical RAM, 
you can add swap.

I'd also look at reducing the RAM usage first. Are you sure you can't 
stream the processing? Do you really need all the data at once to perform 
the transformation? If not, doing it in chunks would lower the RAM needed 
at any given time. Also make sure you release objects as soon as you no 
longer need them.
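
For example, here is a rough sketch of what I mean in Clojure. fetch-rows, 
denormalize and index-batch! are placeholders for your MySQL query, your 
transformation and an ElasticSearch bulk insert; the point is only that one 
batch of documents is in memory at a time:

    ;; Pull rows lazily, transform them, and index them in fixed-size
    ;; batches. doseq is eager and doesn't hold on to the head of the
    ;; sequence, so each batch becomes garbage once it has been indexed.
    (defn denormalize-in-chunks
      [fetch-rows denormalize index-batch! batch-size]
      (doseq [batch (partition-all batch-size (fetch-rows))]
        (index-batch! (mapv denormalize batch))))

Depending on your JDBC driver you may also need to ask for a streaming 
result set (a small fetch size) so the 4 million rows aren't all realized 
up front.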

Also, are you sure it needed 10 GB, as opposed to it merely used 10 GB? 
The JVM won't garbage collect if it doesn't need to. Did you try lowering 
the max heap size? Did it fail then?
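
For example, assuming you run it as an uberjar, you could cap the heap and 
see whether the job still completes:

    java -Xmx2g -jar your-app.jar

or, with Leiningen, add something like :jvm-opts ["-Xmx2g"] to project.clj. 
If it runs fine under a 2 GB cap, the 10 GB was just the JVM not bothering 
to collect, not a real requirement.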

I am not sure distributing this would help, because you can only 
distribute if you can chunk your operation. And if you can chunk your 
operation, you can also transform it in chunks on a single machine by 
reading one part at a time. Distribution only helps with the speed at 
which you can process the data.

My 2 cents

On Sunday, 12 November 2017 05:18:50 UTC-8, lawrence...@gmail.com wrote:
>
> I recently worked on a minor project that nevertheless needed to use 10 
> gigs of RAM. It ran on a reasonably powerful server, yet it taxed that 
> server. And I wondered, how are people scaling up such processes? If my 
> approach was naive, what does the less naive approach look like? 
>
> I wrote a simple app that pulled data from a MySQL database, denormalized 
> it, and then stored it in ElasticSearch. It pulled about 4 million 
> documents from MySQL. Parts of the data needed to be built up into complex 
> structures (maps, vectors) before being put into ElasticSearch. In the end, 
> the 4 million rows from MySQL became 1.5 million documents in ElasticSearch.
>
> I was wondering, what if, instead of 4 million documents, I needed to 
> process 400 million documents? I assume I would have to distribute the work 
> over several machines? I'm curious what some of the most common routes 
> for doing so are. Would this be the situation where people would start 
> to use something like Onyx or Storm or Hadoop? I looked at Spark, but it 
> seems to be for a different use case, more about querying than 
> denormalizing. 
> Likewise, dumping everything onto S3 and then using something like Athena 
> seems to be more for querying than denormalizing. 
>
> For unrelated reasons, I am moving toward the architecture where all data 
> is stored in Kafka. I suppose I could write a denormalizing app that reads 
> over Kafka and builds up the data and then inserts it into ElasticSearch, 
> though I suppose, on the narrow issue of memory usage, using Kafka is no 
> different from using MySQL.
>
> So, I'm asking about common patterns here. When folks have an app that 
> needs more RAM than a typical server, what are the first and most common 
> steps they take? 
