I'd vote for CSV then.

On Jul 31, 2013, at 12:00 PM, Ted Dunning <[email protected]> wrote:
> On Wed, Jul 31, 2013 at 11:20 AM, Pat Ferrel <[email protected]> wrote:
>
>> A few architectural questions: http://bit.ly/18vbbaT
>>
>> I created a local instance of LucidWorks Search on my dev machine. I can quite easily save the similarity vectors from the DRMs into docs at special locations and index them with LucidWorks. But to ingest the docs and put them in separate fields of the same index, we need some new code (unless I've missed some Lucid config magic) that does the indexing and integrates with LucidWorks.
>>
>> I imagine two indexes. One index for the similarity matrix and, optionally, the cross-similarity matrix, in two fields of type 'string'. Another index for users' history--we could put the docs there for retrieval by user ID. The user history docs then become the query on the similarity index and would return recommendations. Or any history collected or generated in real time could be used too.
>>
>> Is this what you imagined, Ted? Especially WRT Lucid integration?
>
> Yes. And I note in a later email that you discovered how Lucid provides lots of connectors for different formats. XML is fine. I have also used CSV.
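
For anyone following the thread, here is a minimal sketch of the two-index idea in plain Python. It does not use LucidWorks, Solr, or Mahout. The item IDs, the `id` and `similar` column names, and the helper functions are all hypothetical, and the plain overlap count is only a stand-in for whatever relevance scoring the search engine would actually apply. It writes one CSV doc per item, loads those docs as the "similarity index", and uses a user's history as the query.

```python
import csv
import io
from collections import Counter

# Hypothetical similarity rows: item ID -> IDs of its most similar items.
# In the thread these would come from the similarity matrix (DRM).
similarity_rows = {
    "item1": ["item2", "item3"],
    "item2": ["item1", "item4"],
    "item3": ["item1"],
    "item4": ["item1", "item2"],
}

# Hypothetical user-history "index": user ID -> items the user interacted with.
user_history = {"user42": ["item1", "item2"]}


def to_csv(rows):
    """Serialize one doc per item: its ID plus a space-delimited 'similar' field."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "similar"])
    for item_id, similar in rows.items():
        writer.writerow([item_id, " ".join(similar)])
    return buf.getvalue()


def load_index(csv_text):
    """Read the CSV docs back into a toy in-memory index: doc ID -> set of terms."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["id"]: set(row["similar"].split()) for row in reader}


def recommend(index, history, top_n=3):
    """Treat the history as the query against the 'similar' field.

    Each doc is scored by how many history items appear in its field.
    Items the user has already seen are skipped.
    """
    seen = set(history)
    scores = Counter()
    for doc_id, similar in index.items():
        if doc_id in seen:
            continue
        overlap = len(similar & seen)
        if overlap:
            scores[doc_id] = overlap
    return [doc_id for doc_id, _ in scores.most_common(top_n)]


csv_text = to_csv(similarity_rows)
index = load_index(csv_text)
recs = recommend(index, user_history["user42"])
print(recs)
```

The cross-similarity matrix would simply be a second column in the same CSV, queried with the matching part of the user's history.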
