On Mon, Jun 9, 2014 at 1:51 AM, Mikhail Khludnev <[email protected]> wrote:

> - joins/caching - seem possible with Morphlines but still there is no such
> command
> - delta import - a scenario we must not forget to handle
> - threads (completely outside Morphline's concerns)
> - distributed processing - it would be great if we can partition
> datasource

Here are a few things to follow up on. The gap is that Morphline is built to
be invoked at the map stage of Hadoop, hence it's really slim itself and
relies on core Hadoop features. Thus, we either need to build such a harness
ourselves, reuse the old DIH ones, or check Flume (tbc). So, the TODO list
also includes:
- a web IDE for editing the DSL;
- long-running task tracking/status check and heartbeat with REST access;
- let's think one step ahead and consider threads. I suppose the most
efficient and safe idea is to partition the datasource and spawn a few
threads, each with its own Morphline pipe. Then it's better to call
SolrServer concurrently via
http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/loadSolr
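
To make the threading idea concrete, here is a rough sketch in plain Java of
"partition the datasource, spawn a few threads, one pipe per thread". It is
only an illustration under assumptions: the names PartitionedImport,
partition, run and pipeFactory are made up for this mail, and a plain
java.util.function.Consumer stands in for a Morphline pipe. No Kite or SolrJ
API is used here, so wiring in a real pipe and loadSolr is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical harness: not part of Morphlines, DIH, or Solr.
class PartitionedImport {

    /** Splits records into at most `parts` contiguous slices of near-equal size. */
    public static <T> List<List<T>> partition(List<T> records, int parts) {
        List<List<T>> slices = new ArrayList<>();
        int size = (records.size() + parts - 1) / parts; // ceiling division
        for (int from = 0; from < records.size(); from += size) {
            int to = Math.min(from + size, records.size());
            slices.add(records.subList(from, to));
        }
        return slices;
    }

    /**
     * Runs one worker per slice on a fixed pool. Each worker asks pipeFactory
     * for its own pipe (a stand-in for a Morphline pipe), so no pipe instance
     * is shared between threads. Returns the number of records processed.
     */
    public static <T> int run(List<T> records, int threads,
                              Supplier<Consumer<T>> pipeFactory) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<T> slice : partition(records, threads)) {
                results.add(pool.submit(() -> {
                    Consumer<T> pipe = pipeFactory.get(); // private to this worker
                    for (T record : slice) {
                        pipe.accept(record);
                    }
                    return slice.size();
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // a worker failure surfaces here
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("import worker failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The pipe is created inside the worker rather than passed in ready-made, so
whatever state it holds stays confined to one thread. Only the final sink
(SolrServer in our case) has to tolerate concurrent calls.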

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <[email protected]>
