The input we need synthesized is a log file, TSV or CSV, that looks like this:
u1	purchase	iphone
u1	purchase	ipad
u2	purchase	nexus-tablet
u2	purchase	galaxy
u3	purchase	surface
u4	purchase	iphone
u4	purchase	ipad
u1	view	iphone
u1	view	ipad
u1	view	nexus-tablet
u1	view	galaxy
u2	view	iphone
u2	view	ipad
u2	view	nexus-tablet
u2	view	galaxy
u3	view	surface
u4	view	iphone
u4	view	ipad
u4	view	nexus-tablet

This is the example in the GitHub project under solr-recommender/src/test/resources/logged-preferences/*. The columns can be in any order and other columns can be interspersed. For testing it would be nice to have one action, two, and several. This implementation maps IDs in memory, so nothing huge as far as how many IDs are generated. Ted can talk about the distribution of actions.

On Jul 31, 2013, at 11:42 AM, B Lyon <[email protected]> wrote:

I'm interested in helping as well. BTW, I thought that what was stored in the Solr fields were the LLR-filtered items (IDs, I guess) for the could-be-recommended things.

On Jul 31, 2013 2:31 PM, "Andrew Psaltis" <[email protected]> wrote:

>> Assuming I've got this right, does someone want to help with these?
> Pat -- I would be interested in helping in any way needed. I believe Ted's
> tool is a start but does not handle all the cases envisioned in the design
> doc, although I could be wrong on this. Anyway, I'm pretty open to helping
> wherever needed.
>
> Thanks,
> Andrew
>
> On 7/31/13 12:20 PM, "Pat Ferrel" <[email protected]> wrote:
>
>> A few architectural questions: http://bit.ly/18vbbaT
>>
>> I created a local instance of LucidWorks Search on my dev machine. I can
>> quite easily save the similarity vectors from the DRMs into docs at
>> special locations and index them with LucidWorks. But to ingest the docs
>> and put them in separate fields of the same index, we need some new code
>> (unless I've missed some Lucid config magic) that does the indexing and
>> integrates with LucidWorks.
>>
>> I imagine two indexes. One index for the similarity matrix and optionally
>> the cross-similarity matrix in two fields of type 'string'. Another index
>> for users' history--we could put the docs there for retrieval by user ID.
>> The user history docs then become the query on the similarity index and
>> would return recommendations. Or any realtime collected or generated
>> history could be used too.
>>
>> Is this what you imagined, Ted? Especially WRT Lucid integration?
>>
>> Someone could probably donate their free-tier EC2 instance and set this
>> up pretty easily. Not sure if this would fit given free-tier memory, but
>> maybe for small data sets.
>>
>> To get this available for actual use we'd need:
>> 1-- An instance with an IP address somewhere to run the ingestion and
>> customized LucidWorks Search.
>> 2-- Synthetic data created using Ted's tool.
>> 3-- Customized Solr indexing code for integration with LucidWorks? Not
>> sure how this is done. I can do the Solr part but have not looked into
>> Lucid integration yet.
>> 4-- Flesh out the rest of Ted's outline, but 1-3 will give a minimally
>> running example.
>>
>> Assuming I've got this right, does someone want to help with these?
>>
>> Another way to approach this is to create a standalone codebase that
>> requires Mahout and Solr and supplies an API something like the proposed
>> Mahout SGD online recommender or Myrrix. This would be easier to consume
>> but would lack all the UI and inspection code of LucidWorks.
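For a concrete sense of how such a file could be synthesized for testing, here is a minimal sketch of a generator that writes tab-separated user/action/item rows like the example at the top of the thread. The class name, item list, user count, and the 80/20 view-to-purchase split are illustrative assumptions only; they are not part of the solr-recommender project or Ted's tool.

    import java.io.PrintWriter;
    import java.util.Random;

    // Minimal sketch of a synthetic log generator (assumed names and parameters,
    // not from the project). Emits tab-separated "user  action  item" rows.
    public class SyntheticLogSketch {
      public static void main(String[] args) throws Exception {
        String[] items = {"iphone", "ipad", "nexus-tablet", "galaxy", "surface"};
        int numUsers = 4;
        Random random = new Random(42);

        try (PrintWriter out = new PrintWriter("synthetic-log.tsv")) {
          for (int i = 0; i < 50; i++) {              // small sample; ID mapping is in-memory
            String user = "u" + (random.nextInt(numUsers) + 1);
            // Skew toward views; the realistic distribution of actions is Ted's call.
            String action = random.nextDouble() < 0.8 ? "view" : "purchase";
            String item = items[random.nextInt(items.length)];
            out.println(user + "\t" + action + "\t" + item);
          }
        }
      }
    }

Extending this to one, two, or several action types, or to extra interspersed columns, is just a matter of widening what gets written per row.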
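For the two-index arrangement Pat describes (user-history docs becoming the query against the similarity index), a rough SolrJ sketch of the query flow might look like the following. The core names (user_history, item_similarity) and field names (user_id, history, similar_items, item_id) are assumed for illustration and do not reflect an actual schema.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    // Rough sketch (SolrJ 4.x); core and field names are assumptions, not a real schema.
    public class RecommendSketch {
      public static void main(String[] args) throws Exception {
        HttpSolrServer historyIndex =
            new HttpSolrServer("http://localhost:8983/solr/user_history");
        HttpSolrServer itemIndex =
            new HttpSolrServer("http://localhost:8983/solr/item_similarity");

        // 1. Fetch the user's history doc by user ID.
        QueryResponse historyRsp = historyIndex.query(new SolrQuery("user_id:u1"));
        String history = (String) historyRsp.getResults().get(0).getFieldValue("history");
        // e.g. history == "iphone ipad nexus-tablet galaxy"

        // 2. Use the history items as the query against the similarity field;
        //    the top-scoring docs are the recommendations.
        SolrQuery recQuery = new SolrQuery("similar_items:(" + history + ")");
        recQuery.setRows(10);
        QueryResponse recRsp = itemIndex.query(recQuery);
        for (SolrDocument doc : recRsp.getResults()) {
          System.out.println(doc.getFieldValue("item_id"));
        }
      }
    }

As the thread notes, any realtime collected or generated history could be substituted for the stored history doc in step 2.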
