Re: implementation of context-aware recommender in Mahout

2015-03-10 Thread Efi Koulouri
Things got clearer with your help!

Thank you very much

On 9 March 2015 at 01:50, Ted Dunning  wrote:

> Efi,
>
> Only you can really tell which is best for your efforts.  All the rest is
> our own partially informed opinions.
>
> Pre-filtering can often be accomplished in the search context by creating
> more than one indicator field and using different combinations of
> indicators for different tasks.  For instance, you could create indicators
> for last one, two, three, five and seven days.  Then when you query the
> engine, you can pick which indicators to try.  That way the same search
> engine can embody multiple recommendation engines.
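
A minimal sketch of this query-time selection (assuming a Solr-style index; the indicator field names indicators_1d and indicators_7d and the item IDs are hypothetical, not Mahout API):

    // Sketch only: build a Solr-style query from a user's recent history,
    // restricted to whichever time-window indicator fields are chosen at
    // query time. Field and item names are hypothetical.
    object IndicatorQuery {
      def build(history: Seq[String], fields: Seq[String]): String =
        fields.map(f => s"$f:(${history.mkString(" ")})").mkString(" OR ")

      def main(args: Array[String]): Unit = {
        val recent = Seq("item42", "item7", "item99")
        // "Pre-filter" to the last day and last week by picking those fields.
        println(build(recent, Seq("indicators_1d", "indicators_7d")))
        // indicators_1d:(item42 item7 item99) OR indicators_7d:(item42 item7 item99)
      }
    }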
>
> I would also tend toward search-based approaches for your testing, if only
> because any deployed system is likely to use a search approach, so testing
> that approach offline gives you the most realistic results.
>
>
> On Sun, Mar 8, 2015 at 10:21 AM, Efi Koulouri 
> wrote:
>
> > Thanks for your help!
> >
> > Actually, I want to build a recommender for experimental purposes,
> > following the pre-filtering and post-filtering approaches that I
> > described. I already have two datasets and I want to show the benefits
> > of using a "context-aware" recommender. So, the recommender is going to
> > work offline.
> >
> > I saw that the search engine approach is very interesting, but in my
> > case I think that building the recommender using the Java classes is
> > more appropriate, as I need to use both approaches (post-filtering and
> > pre-filtering). Am I right?
> >
> > On 8 March 2015 at 16:08, Ted Dunning  wrote:
> >
> > > By far the easiest way to build a recommender (especially for
> > > production) is to use the search engine approach (what Pat was
> > > recommending).
> > >
> > > Post-filtering can be done using the search engine far more easily
> > > than using Java classes.
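
To make the comparison concrete, a minimal sketch (assuming a Solr-style engine; the indicators and category fields are hypothetical): post-filtering is just one extra filter clause on the recommendation query.

    // Sketch only: post-filtering expressed as a search-engine filter clause.
    // "indicators" and "category" are hypothetical fields on item documents.
    object PostFilterQuery {
      def main(args: Array[String]): Unit = {
        val recQuery = "indicators:(item42 item7 item99)" // user's history
        val filter   = "category:electronics"             // post-filter
        // In Solr this is q=...&fq=...; the engine drops recommendations
        // that fail the filter, with no recommender-side Java code.
        println(s"q=$recQuery&fq=$filter")
      }
    }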
> > >
>


Re: implementation of context-aware recommender in Mahout

2015-03-10 Thread Ted Dunning
Glad to help.

You can help us by reporting your results when you get them.

We look forward to that!


On Tue, Mar 10, 2015 at 4:22 AM, Efi Koulouri  wrote:

> Things got clearer with your help!
>
> Thank you very much
>
> On 9 March 2015 at 01:50, Ted Dunning  wrote:
>
> > Efi,
> >
> > Only you can really tell which is best for your efforts.  All the rest is
> > our own partially informed opinions.
> >
> > Pre-filtering can often be accomplished in the search context by creating
> > more than one indicator field and using different combinations of
> > indicators for different tasks.  For instance, you could create indicators
> > for last one, two, three, five and seven days.  Then when you query the
> > engine, you can pick which indicators to try.  That way the same search
> > engine can embody multiple recommendation engines.
> >
> > I would also tend toward search-based approaches for your testing, if only
> > because any deployed system is likely to use a search approach, so testing
> > that approach offline gives you the most realistic results.
> >
> >
> > On Sun, Mar 8, 2015 at 10:21 AM, Efi Koulouri 
> > wrote:
> >
> > > Thanks for your help!
> > >
> > > Actually, I want to build a recommender for experimental purposes,
> > > following the pre-filtering and post-filtering approaches that I
> > > described. I already have two datasets and I want to show the benefits
> > > of using a "context-aware" recommender. So, the recommender is going
> > > to work offline.
> > >
> > > I saw that the search engine approach is very interesting, but in my
> > > case I think that building the recommender using the Java classes is
> > > more appropriate, as I need to use both approaches (post-filtering and
> > > pre-filtering). Am I right?
> > >
> > > On 8 March 2015 at 16:08, Ted Dunning  wrote:
> > >
> > > > By far the easiest way to build a recommender (especially for
> > > > production) is to use the search engine approach (what Pat was
> > > > recommending).
> > > >
> > > > Post-filtering can be done using the search engine far more easily
> > > > than using Java classes.
> > > >
> >
>


Re: mahout spark-itemsimilarity from command line

2015-03-10 Thread Jeff Isenhart
OK, so the solution to the issue was to add the following to my core-site.xml:

    <property>
      <name>fs.file.impl</name>
      <value>org.apache.hadoop.fs.LocalFileSystem</value>
      <description>The FileSystem for file: uris.</description>
    </property>

    <property>
      <name>fs.hdfs.impl</name>
      <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
      <description>The FileSystem for hdfs: uris.</description>
    </property>
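
As a sanity check for this fix, a minimal sketch (assuming the Hadoop client jars and the edited core-site.xml are on the classpath, and a namenode at hdfs://localhost:9000 as used elsewhere in this thread):

    // Sketch: confirm that the file: and hdfs: schemes now resolve to a
    // FileSystem implementation after the core-site.xml change.
    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem

    object FsSchemeCheck {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration() // reads core-site.xml from the classpath
        Seq("file:///", "hdfs://localhost:9000/").foreach { uri =>
          val fs = FileSystem.get(URI.create(uri), conf)
          println(s"$uri -> ${fs.getClass.getName}")
        }
      }
    }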

 On Monday, March 9, 2015 11:38 AM, Pat Ferrel  
wrote:

Mahout is on Spark 1.1.0 (before last week) and 1.1.1 as of the current master.
Running locally should use these, but make sure they are installed if you run
with anything other than --master local.

The next thing to try is to see which version of Hadoop both Mahout and Spark
are compiled for; it must be the one you have installed. Check the build
instructions for Spark at https://spark.apache.org/docs/latest/building-spark.html
(this page is for 1.2.1, but make sure you have the source for 1.1.0 or 1.1.1)
and for Mahout at http://mahout.apache.org/developers/buildingmahout.html

On Mar 9, 2015, at 11:20 AM, Jeff Isenhart  wrote:

Here is what I get with hadoop fs -ls:
-rw-r--r--  1 username supergroup    5510526 2015-03-09 11:10 transactions.csv
Yes, I am trying to run a local version of Spark (trying to run everything
locally at the moment), and when I run

    ./bin/mahout spark-itemsimilarity -i transactions.csv -o output -fc 1 -ic 2

I get:
15/03/09 11:18:30 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@10.0.1.20:50565/user/HeartbeatReceiver
Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:166)
  at org.apache.mahout.common.HDFSPathSearch.<init>(HDFSPathSearch.scala:36)
  at org.apache.mahout.drivers.ItemSimilarityDriver$.readIndexedDatasets(ItemSimilarityDriver.scala:152)
  at org.apache.mahout.drivers.ItemSimilarityDriver$.process(ItemSimilarityDriver.scala:213)
  at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:116)
  at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:114)
  at scala.Option.map(Option.scala:145)
  at org.apache.mahout.drivers.ItemSimilarityDriver$.main(ItemSimilarityDriver.scala:114)
  at org.apache.mahout.drivers.ItemSimilarityDriver.main(ItemSimilarityDriver.scala)

    On Monday, March 9, 2015 10:51 AM, Pat Ferrel  
wrote:


From the command line, can you run:

    hadoop fs -ls

And see SomeDir/transactions.csv? It looks like HDFS is not accessible from
wherever you are running spark-itemsimilarity.

Are you trying to run a local version of Spark? The default is "--master
local". This can still access a clustered HDFS if you are configured to
access it from your machine.


On Mar 9, 2015, at 10:35 AM, Jeff Isenhart  wrote:

bump...anybody??? 

    On Wednesday, March 4, 2015 9:22 PM, Jeff Isenhart 
 wrote:


I am having an issue getting a simple itemsimilarity example to work. I know
Hadoop is up and functional (I ran the example MapReduce program, anyway).
But when I run either of these:

    ./mahout spark-itemsimilarity -i "SomeDir/transactions.csv" -o "hdfs://localhost:9000/users/someuser/output" -fc 1 -ic 2

    ./mahout spark-itemsimilarity -i "SomeDir/transactions.csv" -o "SomeDir/output" -fc 1 -ic 2

I get:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:166)
  at org.apache.mahout.common.HDFSPathSearch.<init>(HDFSPathSearch.scala:36)
  at org.apache.mahout.drivers.ItemSimilarityDriver$.readIndexedDatasets(ItemSimilarityDriver.scala:152)
  at org.apache.mahout.drivers.ItemSimilarityDriver$.process(ItemSimilarityDriver.scala:213)
  at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:116)
  at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:114)
  at scala.Option.map(Option.scala:145)
  at org.apache.mahout.drivers.ItemSimilarityDriver$.main(ItemSimilarityDriver.scala:114)
  at org.apache.mahout.drivers.ItemSimilarityDriver.main(ItemSimilarityDriver.scala)
I am guessing there are some config settings I am missing.
Using: Mahout 1.0-SNAPSHOT, Hadoop 2.6.0

spark-item-similarity incremental update

2015-03-10 Thread Kevin Zhang
Hi,

Does anybody have any idea how to do an incremental update for the item
similarity? I mean, how can I apply the latest user action data, for example
today's data? Do I have to run it again on the entire dataset?

Thanks,
Kevin

Re: spark-item-similarity incremental update

2015-03-10 Thread Pat Ferrel
The latest user actions work just fine as the query against the model from the
last time you ran spark-itemsimilarity. Go to the demo site
https://guide.finderbots.com and run through the “trainer”; the things you pick
are instantly used to make recs. spark-itemsimilarity was not re-run. The only
times you really have to re-run it are:
1) you have new items with interactions. You can only recommend what you
trained with.
2) you have enough new user data to significantly change the model.

There is no incremental way to update the model (yet), but it can be rerun in a
few minutes, and as I said you get recs with realtime user history, even for
new users not in the training data.
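
A minimal sketch of using the latest actions as the query (the indicators field and item IDs are hypothetical): the index built by spark-itemsimilarity is the model and stays fixed; only the query changes as users act.

    // Sketch only: "incremental" recommendations without retraining.
    // Yesterday's model (the indicators index) answers today's query.
    object RealtimeRecs {
      def recQuery(recentActions: Seq[String]): String =
        s"indicators:(${recentActions.mkString(" ")})" // hypothetical field

      def main(args: Array[String]): Unit = {
        val todaysActions = Seq("item12", "item345")
        println(recQuery(todaysActions)) // send this to the search engine
      }
    }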

On Mar 10, 2015, at 3:07 PM, Kevin Zhang  
wrote:

Hi,

Does anybody have any idea how to do an incremental update for the item
similarity? I mean, how can I apply the latest user action data, for example
today's data? Do I have to run it again on the entire dataset?

Thanks,
Kevin



Re: spark-item-similarity incremental update

2015-03-10 Thread Kevin Zhang
I see. Thank you, Pat. 




On Tuesday, March 10, 2015 3:17 PM, Pat Ferrel  wrote:


The latest user actions work just fine as the query against the model from the
last time you ran spark-itemsimilarity. Go to the demo site
https://guide.finderbots.com and run through the “trainer”; the things you pick
are instantly used to make recs. spark-itemsimilarity was not re-run. The only
times you really have to re-run it are:
1) you have new items with interactions. You can only recommend what you
trained with.
2) you have enough new user data to significantly change the model.

There is no incremental way to update the model (yet), but it can be rerun in a
few minutes, and as I said you get recs with realtime user history, even for
new users not in the training data.


On Mar 10, 2015, at 3:07 PM, Kevin Zhang  
wrote:

Hi,

Does anybody have any idea how to do an incremental update for the item
similarity? I mean, how can I apply the latest user action data, for example
today's data? Do I have to run it again on the entire dataset?

Thanks,
Kevin

Re: spark-item-similarity incremental update

2015-03-10 Thread Pat Ferrel
Just to be clear, #1 was about new items, not users. New users will work as
long as you have history for them.

On Mar 10, 2015, at 3:34 PM, Kevin Zhang  
wrote:

I see. Thank you, Pat. 




On Tuesday, March 10, 2015 3:17 PM, Pat Ferrel  wrote:



The latest user actions work just fine as the query against the model from the
last time you ran spark-itemsimilarity. Go to the demo site
https://guide.finderbots.com and run through the “trainer”; the things you pick
are instantly used to make recs. spark-itemsimilarity was not re-run. The only
times you really have to re-run it are:
1) you have new items with interactions. You can only recommend what you
trained with.
2) you have enough new user data to significantly change the model.

There is no incremental way to update the model (yet), but it can be rerun in a
few minutes, and as I said you get recs with realtime user history, even for
new users not in the training data.


On Mar 10, 2015, at 3:07 PM, Kevin Zhang  
wrote:

Hi,

Does anybody have any idea how to do an incremental update for the item
similarity? I mean, how can I apply the latest user action data, for example
today's data? Do I have to run it again on the entire dataset?

Thanks,
Kevin