On Fri, Apr 16, 2010 at 9:06 AM, Kevin Smith <ksm...@basho.com> wrote:
> The only restrictions on map/reduce functions are a) they must return lists
> and b) the entire job must execute before the timeout period elapses (60
> seconds is the default). Javascript functions have the additional
> restriction of not being able to call back into Erlang code due to the
> current state of Erlang/Javascript integration.

Ok.

> The easiest way to do this would be to write your function in Erlang and use
> either httpc (packaged with Erlang) or ibrowse
> (http://github.com/cmullaparthi/ibrowse) to make the HTTP calls. If you're
> comfortable with Erlang & OTP I'd recommend making a separate OTP application
> to handle the HTTP calls and provide an API for your map/reduce functions to
> use. This design moves the HTTP calls out of the query flow and prevents a
> hanging HTTP call from timing out a query.

Makes sense. Thanks for the pointers.

I definitely have to experiment with this idea and see whether adding such
triggers directly from map functions can have a significant impact on
post-processing throughput for data coming out of the mapreduce framework.

Another option, in line with your separate OTP application suggestion, would
be to use intermediate queuing (rabbitmq, redis, ...) and just queue results,
which would be picked up by another external process in charge of feeding
them into the elasticsearch indexer. That process can then be tuned
independently to parallelize document inserts and optimize them for your
specific elasticsearch cloud characteristics.

I think this approach could be more efficient, in the case of very large
result sets, than doing a simple result set aggregation and re-feeding.

There is also the option of chunked/streaming result sets to consider. Along
the same lines, I could just set up a listener on the result stream and feed
it into the intermediate queue.

Thanks,
Colin

>
> --Kevin
> On Apr 15, 2010, at 11:20 PM, Colin Surprenant wrote:
>
>> I'll rephrase my question:
>>
>> Is it possible to call external http services from a map function in
>> JavaScript and/or Erlang? Any comments/pointers appreciated.
>>
>> Thanks,
>> Colin
>>
>> On Thu, Apr 15, 2010 at 12:44 PM, Colin Surprenant <colin....@gmail.com>
>> wrote:
>>> Hi,
>>>
>>> I am trying to figure what the fastest way would be to send a
>>> mapreduce result set for indexing into a searchengine system like
>>> elasticsearch.
>>>
>>> Of course, the trivial way to do it would be to simply gather the
>>> result set and push it back into the indexer using their http/rest
>>> api.
>>>
>>> Now, elasticsearch is distributed by nature and will allow parallel
>>> queries for document insertion for indexing. One way to leverage this
>>> would be to actually directly push a document from within a map
>>> function into the indexer using their rest api. This would completely
>>> distribute the index creation process and leverage the parallelism of
>>> elasticsearch.
>>>
>>> Would this be possible?
>>>
>>> Is this something I could do using the JavaScript mapreduce? and/or Erlang?
>>>
>>> Thanks,
>>> Colin
>>>
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
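
For illustration only, here is a minimal sketch of the intermediate-queuing idea discussed in this thread: results are put on a queue as they arrive, and a separate consumer drains the queue and hands documents to the indexer in batches. Everything in it is an assumption made for the example. An in-process `queue.Queue` stands in for rabbitmq/redis, and `drain_to_indexer`, `index_batch`, `fake_index_batch` and `run_demo` are hypothetical names, not part of Riak or elasticsearch. A real consumer would replace the stub with HTTP calls to the indexer.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker; real documents are dicts


def drain_to_indexer(results, index_batch, batch_size=100):
    """Pull documents off `results` and pass them to `index_batch` in batches.

    `index_batch` is a hypothetical callable standing in for the code that
    would push documents to the search indexer. Returns the number of
    documents forwarded.
    """
    batch, total = [], 0
    while True:
        doc = results.get()  # blocks until a producer queues something
        if doc is SENTINEL:
            break
        batch.append(doc)
        if len(batch) >= batch_size:
            index_batch(batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        index_batch(batch)
        total += len(batch)
    return total


def run_demo(num_docs=250, batch_size=100):
    """Queue `num_docs` fake results and index them with a stub indexer."""
    results = queue.Queue()  # stand-in for an external queue (assumption)
    indexed = []             # stand-in for the search index (assumption)

    def fake_index_batch(batch):
        indexed.append(list(batch))

    outcome = {}
    consumer = threading.Thread(
        target=lambda: outcome.update(
            total=drain_to_indexer(results, fake_index_batch, batch_size)
        )
    )
    consumer.start()
    for i in range(num_docs):  # the producer side: one put per result
        results.put({"id": i})
    results.put(SENTINEL)
    consumer.join()
    return outcome["total"], [len(b) for b in indexed]


if __name__ == "__main__":
    print(run_demo())
```

Because the consumer is decoupled from the producers, its batch size and degree of parallelism can be tuned without touching the query side, and a slow indexer call cannot push a query past its timeout.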