On Wed, Jul 31, 2013 at 11:20 AM, Pat Ferrel <[email protected]> wrote:

> A few architectural questions: http://bit.ly/18vbbaT
>
> I created a local instance of the LucidWorks Search on my dev machine. I
> can quite easily save the similarity vectors from the DRMs into docs at
> special locations and index them with LucidWorks. But to ingest the docs
> and put them in separate fields of the same index we need some new code
> (unless I've missed some Lucid config magic) that does the indexing and
> integrates with LucidWorks.
>
> I imagine two indexes. One index for the similarity matrix and optionally
> the cross-similarity matrix in two fields of type 'string'. Another index
> for users' history--we could put the docs there for retrieval by user ID.
> The user history docs then become the query on the similarity index and
> would return recommendations. Or any realtime collected or generated
> history could be used too.
>
> Is this what you imagined Ted? Especially WRT Lucid integration?
>

Yes.  And I note in a later email that you discovered how Lucid provides
lots of connectors for different formats.  XML is fine.  I have also used
CSV.
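
To make the two-index flow concrete, here is a minimal sketch of how a user's
history could become the query against the similarity index. It is only an
illustration in plain Python: the in-memory dict stands in for the search
index, the field name "indicators" and the item IDs are hypothetical, and the
overlap count is a stand-in for whatever relevance scoring the search engine
actually applies.

```python
from collections import Counter

# Hypothetical toy data: each "document" is an item whose indicator field
# lists the IDs of the items its similarity-matrix row marks as similar.
similarity_index = {
    "item_a": ["item_b", "item_c"],
    "item_b": ["item_a", "item_d"],
    "item_c": ["item_a"],
    "item_d": ["item_b"],
}


def build_query(history, field="indicators"):
    """Format a user's history as an OR query over one field."""
    return f"{field}:({' '.join(history)})"


def recommend(history, index, top_n=3):
    """Rank items by how many history items appear in their indicator field."""
    seen = set(history)
    scores = Counter()
    for item, indicators in index.items():
        if item in seen:
            continue  # do not recommend what the user already has
        overlap = len(seen.intersection(indicators))
        if overlap:
            scores[item] = overlap
    return [item for item, _ in scores.most_common(top_n)]


history = ["item_b", "item_c"]  # e.g. fetched from the user-history index
print(build_query(history))
print(recommend(history, similarity_index))
```

The history list could just as well come from realtime collected events as
from the stored user-history docs; only the query terms change.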


> Someone could probably donate their free tier EC2 instance and set this up
> pretty easily. Not sure if this would fit given free tier memory but maybe
> for small data sets.
>

It should fit, actually.

I can donate a real-ish machine as well.


>
> To get this available for actual use we'd need:
> 1-- An instance with an IP address somewhere to run the ingestion and
> customized LucidWorks Search.
> 2-- Synthetic data created using Ted's tool.
> 3-- Customized Solr indexing code for integration with LucidWorks? Not
> sure how this is done. I can do the Solr part but have not looked into
> Lucid integration yet.
> 4-- Flesh out the rest of Ted's outline but 1-3 will give a minimally
> running example.
>
> Assuming I've got this right, does someone want to help with these?
>

I will work on synthetic data later today.  I have a tool that does this
for Drill.  I plan to pull down MusicBrainz and use the tags on artists as
hidden variables to drive synthetic user behavior.  That should produce
reasonable-looking recommendations.
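
As a rough illustration of the hidden-variable idea (not the actual tool), the
sketch below gives each synthetic user a hidden tag preference and samples
artists that carry that tag, with a little noise. The catalog, the function
name and every parameter are hypothetical; real data would come from the
MusicBrainz artist tags.

```python
import random

# Hypothetical toy catalog; the real plan uses MusicBrainz artist tags.
ARTIST_TAGS = {
    "artist_1": {"rock", "indie"},
    "artist_2": {"rock", "metal"},
    "artist_3": {"jazz"},
    "artist_4": {"jazz", "soul"},
    "artist_5": {"electronic"},
}


def synthesize_history(rng, artist_tags, n_events=5, n_hidden=1, noise=0.1):
    """Generate one user's listening history from hidden tag preferences."""
    all_tags = sorted(set().union(*artist_tags.values()))
    hidden = set(rng.sample(all_tags, n_hidden))  # the user's hidden variables
    artists = sorted(artist_tags)
    matching = [a for a in artists if artist_tags[a] & hidden]
    history = []
    for _ in range(n_events):
        # Mostly pick artists that match the hidden tags, sometimes any artist.
        pool = artists if (rng.random() < noise or not matching) else matching
        history.append(rng.choice(pool))
    return hidden, history


rng = random.Random(42)  # fixed seed so runs are repeatable
for user_id in range(3):
    hidden, history = synthesize_history(rng, ARTIST_TAGS)
    print(user_id, sorted(hidden), history)
```

Because users who share a hidden tag end up with overlapping histories, the
co-occurrence structure that a recommender needs is present by construction.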

> Another way to approach this is to create a standalone codebase that
> requires Mahout and Solr and supplies an API something like the proposed
> Mahout SGD online recommender or Myrrix. This would be easier to consume
> but would lack all the UI and inspection code of LucidWorks.
>

I think that for a demo, the inspection is crucial.

Adding the API is easy and can even be done in the same instance where
LucidWorks is running.
