I don't know about MySQL specifics, but I did a similar job on top of
PostgreSQL and relied on its built-in temporary tables, indexes, and
spill-to-disk functionality; the dataset and computation in question were
already blowing up a Ruby process. There's no need to reinvent this in
Clojure if the database can do the work and it's already
deployed and operationalized.
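To make the "let the database do the work" point concrete, here's a minimal sketch of the pattern: stage the raw rows, denormalize with a join into a temporary table inside the database, and have the app only stream finished documents out. The schema and names are invented for illustration, and Python's stdlib sqlite3 stands in for Postgres so the sketch is self-contained; the Postgres version is the same shape.

```python
import sqlite3

# In-memory SQLite as a stand-in for PostgreSQL; the pattern is identical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (10, 1, 'Lambdas'), (11, 1, 'Macros'), (12, 2, 'COBOL');
""")

# Denormalize inside the database: one row per author, posts folded into a
# delimited column, built in a temp table so intermediate state lives in the
# DB (which can spill to disk) rather than in application memory.
conn.executescript("""
    CREATE TEMP TABLE author_docs AS
    SELECT a.id AS id, a.name AS name, group_concat(p.title, ',') AS titles
    FROM authors a JOIN posts p ON p.author_id = a.id
    GROUP BY a.id, a.name;
""")

# The app only streams finished documents (e.g. toward an Elasticsearch bulk
# indexer), never the full intermediate state.
docs = [
    {"id": i, "name": n, "titles": sorted(t.split(","))}
    for i, n, t in conn.execute("SELECT id, name, titles FROM author_docs ORDER BY id")
]
```

On Postgres you'd get the extras for free: the temp table is per-session, spills through temp files when it outgrows memory, and you can CREATE INDEX on it before reading the documents back out.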

We also made heavy use of JavaScript and PLV8 on that project for complex map
operations. The tests were written in Clojure :).

On Nov 12, 2017 8:18 AM, <lawrence.krub...@gmail.com> wrote:

> I recently worked on a minor project that nevertheless needed to use 10
> gigs of RAM. It ran on a reasonably powerful server, yet it taxed that
> server. And I wondered, how are people scaling up such processes? If my
> approach was naive, what does the less naive approach look like?
>
> I wrote a simple app that pulled data from a MySQL database, denormalized
> it, and then stored it in ElasticSearch. It pulled about 4 million
> documents from MySQL. Parts of the data needed to be built up into complex
> structures (maps, vectors) before being put into ElasticSearch. In the end,
> the 4 million rows from MySQL became 1.5 million documents in ElasticSearch.
>
> I was wondering, what if, instead of 4 million documents, I needed to
> process 400 million documents? I assume I would have to distribute the work
> over several machines? I'm curious what some of the most common routes for
> doing so are. Would this be the situation where people would start to use
> something like Onyx or Storm or Hadoop? I looked at Spark, but it seems to
> be for a different use case, more about querying than denormalizing.
> Likewise, dumping everything onto S3 and then using something like Athena
> seems to be more for querying than denormalizing.
>
> For unrelated reasons, I am moving toward the architecture where all data
> is stored in Kafka. I suppose I could write a denormalizing app that reads
> over Kafka and builds up the data and then inserts it to ElasticSearch,
> though I suppose, on the narrow issue of memory usage, using Kafka is no
> different from using MySQL.
>
> So, I'm asking about common patterns here. When folks have an app that
> needs more RAM than a typical server has, what are the first and most
> common steps they take?
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

