We’ve been using Apache Camel. It doesn’t scale the way Flume would, but it does offer a lot of nice orchestration. It sits somewhere between DIH and Flume, and while it isn’t focused on Solr like some of the other pipelines out there, it has a lot of general-purpose features that can be useful.
On Jun 12, 2014, at 12:48 AM, [email protected] wrote:

> LOL I had the very same reaction Alexandre. Most of us don’t have all this
> big data software sitting around, even if it is free. Complexity.
>
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Thu, Jun 12, 2014 at 12:44 AM, Alexandre Rafalovitch <[email protected]>
> wrote:
> On Thu, Jun 12, 2014 at 7:43 AM, Wolfgang Hoschek <[email protected]>
> wrote:
> > On Hadoop, even the JDBC/SQL portion of DIH now seems mostly covered by a
> > combination of Sqoop and MapReduceIndexerTool, and perhaps a bit of Hive.
>
> I appreciate that if you are in the Big Data space, you already have
> most of these pieces and the installation space is not a concern
> either.
>
> But for the others, the statement above is probably why DIH is still
> around. It's an easy way to cover those essential "read from
> database", "partial update from database" scenarios. If one has to
> set up Sqoop+Hive+other bits to get it, it's probably too much to ask
> and might be too heavy to install. Certainly when they are starting
> with Solr.
>
> The question to me is: what is the _minimum_ set of technologies
> needed to be brought together to replace what DIH provides now. And
> what very Solr-specific gaps it leaves (includes progress indicator,
> SolrCloud, etc). And what's the space/complexity trade-off. Then,
> there is the rest of the questions. Such as: "Which tool/framework has
> the strongest overlapping community with Solr, so that everybody would
> benefit from adopting their platform".
>
> I think Morphline covers most, possibly all of the Entity Processors
> and Transformers in DIH. And maybe XML/File data sources too. But SQL
> data source is the main issue here. I can't tell whether Flume covers
> the DataSources scenario for SQL and makes it worth the upgrade.
>
> Regards,
> Alex.
>
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency

-----------------------------------------------------
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Co-Author: Apache Solr 3 Enterprise Search Server
