The input we need synthesized is a TSV or CSV log file that looks like
this:

u1      purchase        iphone
u1      purchase        ipad
u2      purchase        nexus-tablet
u2      purchase        galaxy
u3      purchase        surface
u4      purchase        iphone
u4      purchase        ipad
u1      view    iphone
u1      view    ipad
u1      view    nexus-tablet
u1      view    galaxy
u2      view    iphone
u2      view    ipad
u2      view    nexus-tablet
u2      view    galaxy
u3      view    surface
u4      view    iphone
u4      view    ipad
u4      view    nexus-tablet

This is the example in the GitHub project
solr-recommender/src/test/resources/logged-preferences/*

The columns can be in any order and can have other columns interspersed.
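
A minimal sketch of reading such a file when the column positions are configurable. The function name parse_log, its parameters, and the timestamp column in the sample are hypothetical, chosen only for illustration; this is not the solr-recommender code.

```python
import csv
from collections import defaultdict

def parse_log(lines, user_col=0, action_col=1, item_col=2, delimiter="\t"):
    """Group item ids as {action: {user: [items]}}.

    The column indexes are assumptions supplied by the caller, since the
    columns can be in any order with other columns in between.
    """
    needed = max(user_col, action_col, item_col)
    prefs = defaultdict(lambda: defaultdict(list))
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) <= needed:
            continue  # skip blank or short lines
        prefs[row[action_col]][row[user_col]].append(row[item_col])
    return prefs

# Hypothetical input with an extra timestamp column in front.
sample = [
    "2013-07-31\tu1\tpurchase\tiphone",
    "2013-07-31\tu1\tview\tgalaxy",
    "2013-07-31\tu2\tpurchase\tgalaxy",
]
prefs = parse_log(sample, user_col=1, action_col=2, item_col=3)
print(dict(prefs["purchase"]))  # {'u1': ['iphone'], 'u2': ['galaxy']}
```

For a CSV file the same call would take delimiter=",".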

For testing it would be nice to have data sets with one action, two actions,
and several. This implementation maps ids in memory, so the number of ids
generated should not be huge.

Ted can talk about the distribution of actions.
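
As a placeholder until the distribution is settled, something like the following could produce small test files with one, two, or several actions. It is only a sketch: the uniform sampling, the function name synthesize_log, and all of its parameters are assumptions, and it is not Ted's tool.

```python
import random

def synthesize_log(n_users, items, actions, max_events_per_action=3, seed=0):
    """Return tab-separated 'user<TAB>action<TAB>item' lines.

    Assumption: each user gets between 1 and max_events_per_action
    distinct items per action, drawn uniformly. A realistic distribution
    of actions would replace this sampling step.
    """
    rng = random.Random(seed)  # fixed seed keeps test data reproducible
    rows = []
    for action in actions:
        for u in range(1, n_users + 1):
            k = rng.randint(1, min(max_events_per_action, len(items)))
            for item in rng.sample(items, k):
                rows.append(f"u{u}\t{action}\t{item}")
    return rows

items = ["iphone", "ipad", "nexus-tablet", "galaxy", "surface"]
one_action = synthesize_log(4, items, ["purchase"])
two_actions = synthesize_log(4, items, ["purchase", "view"])
print("\n".join(two_actions))
```

Keeping n_users and the item list small keeps the number of ids small, which suits the in-memory id mapping.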

On Jul 31, 2013, at 11:42 AM, B Lyon <[email protected]> wrote:

I'm interested in helping as well.
Btw, I thought that what was stored in the Solr fields was the LLR-filtered
items (ids, I guess) for the could-be-recommended things.
On Jul 31, 2013 2:31 PM, "Andrew Psaltis" <[email protected]>
wrote:

>> Assuming I've got this right, does someone want to help with these?
> Pat -- I would be interested in helping in any way needed. I believe Ted's
> tool is a start, but does not handle all the cases envisioned in the design
> doc, although I could be wrong on this. Anyway I'm pretty open to helping
> wherever needed.
> 
> Thanks,
> Andrew
> 
> 
> 
> 
> 
> On 7/31/13 12:20 PM, "Pat Ferrel" <[email protected]> wrote:
> 
>> A few architectural questions: http://bit.ly/18vbbaT
>> 
>> I created a local instance of the LucidWorks Search on my dev machine. I
>> can quite easily save the similarity vectors from the DRMs into docs at
>> special locations and index them with LucidWorks. But to ingest the docs
>> and put them in separate fields of the same index we need some new code
>> (unless I've missed some Lucid config magic) that does the indexing and
>> integrates with LucidWorks.
>> 
>> I imagine two indexes. One index for the similarity matrix and optionally
>> the cross-similarity matrix in two fields of type 'string'. Another index
>> for users' history--we could put the docs there for retrieval by user ID.
>> The user history docs then become the query on the similarity index and
>> would return recommendations. Or any realtime collected or generated
>> history could be used too.
>> 
>> Is this what you imagined Ted? Especially WRT Lucid integration?
>> 
>> Someone could probably donate their free tier EC2 instance and set this
>> up pretty easily. Not sure if this would fit given free tier memory but
>> maybe for small data sets.
>> 
>> To get this available for actual use we'd need:
>> 1-- An instance with an IP address somewhere to run the ingestion and
>> customized LucidWorks Search.
>> 2-- Synthetic data created using Ted's tool.
>> 3-- Customized Solr indexing code for integration with LucidWorks? Not
>> sure how this is done. I can do the Solr part but have not looked into
>> Lucid integration yet.
>> 4-- Flesh out the rest of Ted's outline but 1-3 will give a minimally
>> running example.
>> 
>> Assuming I've got this right, does someone want to help with these?
>> 
>> Another way to approach this is to create a stand alone codebase that
>> requires Mahout and Solr and supplies an API something like the proposed
>> Mahout SGD online recommender or Myrrix. This would be easier to consume
>> but would lack all the UI and inspection code of LucidWorks.
>> 
>> 
>> 
>> 
> 
> 
