Thanks for the input, Pat. I ran the following command:
./bin/mahout spark-itemsimilarity -i demoItems.csv -o output4 -fc 1 -ic 2
--filter1 purchase --filter2 view
on this data:
u1,purchase,iphone
u1,purchase,ipad
u2,purchase,nexus
and now I'm seeing this error:
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
    at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
    at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
    at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
    at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
    at org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1029)
    at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
    at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
    at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:126)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
    at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
    at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at org.apache.spark.util.collection.AppendOnlyMap$$anon$1.foreach(AppendOnlyMap.scala:159)
    at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
    at java.lang.Runtime.loadLibrary0(Runtime.java:849)
    at java.lang.System.loadLibrary(System.java:1088)
    at org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
    ... 26 more
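The "Caused by" line says the Snappy native library isn't on java.library.path. One workaround I may try (an assumption on my part, not verified here) is to avoid the Snappy codec entirely by switching Spark's shuffle compression to the pure-Java LZF codec in $SPARK_HOME/conf/spark-defaults.conf:

# Assumption: bypass the failing native Snappy load by using the LZF codec
spark.io.compression.codec   lzf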
On Thursday, March 12, 2015 10:35 AM, Pat Ferrel <[email protected]>
wrote:
There are many ways to structure the input. The spark-itemsimilarity driver
can take only two actions, though the internal code, if you want to use it as a
library, will take any number. The CLI driver can optionally take input of the
form you mention but will extract a primary and a single secondary action per
execution. If you have more than two actions you can run the driver once for
every secondary action or use the lib interface (see the sketch below).
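If you go the library route, here is a minimal Scala sketch. It assumes Mahout 0.10-era APIs; the exact signature of SimilarityAnalysis.cooccurrencesIDSs and the way you load IndexedDatasets may differ in your build, so treat it as a starting point rather than the definitive call:

import org.apache.mahout.math.cf.SimilarityAnalysis
import org.apache.mahout.math.indexeddataset.IndexedDataset

// primaryIDS holds the primary action (e.g. purchase); the varargs hold any
// number of secondary actions, all loaded elsewhere as IndexedDatasets that
// share a user dictionary.
def indicators(primaryIDS: IndexedDataset,
               secondaryIDSs: IndexedDataset*): List[IndexedDataset] =
  // Returns one IndexedDataset per input: the first is the item-item
  // similarity matrix for the primary action, the rest are cross-occurrence
  // indicator matrices, one per secondary action.
  SimilarityAnalysis.cooccurrencesIDSs((primaryIDS +: secondaryIDSs).toArray)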
You can have your interactions in separate dirs of the form I mentioned in the
original answer, in which case you pass in -i and -i2 params. If you want to
mix actions in the same files, use the format you describe:
u1,item1,action1
u1,item10,action2
u1,item500,action3
u2,item2,action1
u2,item500,action3
...
The columns can be moved around and specified on the CLI. To use the above with
the CLI you would have to process action1 and action2 with one execution, and
action1 and action3 with another. This will create 4 outputs; the two
“similarity-matrix” dirs will be identical. That would give you indicators for
action1 (actually two identical indicators), action2, and action3.
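Concretely, the two executions would look something like this (the input file and output dir names here are placeholders):

mahout spark-itemsimilarity -i mixed-log.csv -o out-action2 -fc 1 -ic 2 --filter1 action1 --filter2 action2
mahout spark-itemsimilarity -i mixed-log.csv -o out-action3 -fc 1 -ic 2 --filter1 action1 --filter2 action3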
On Mar 12, 2015, at 9:52 AM, Jeff Isenhart <[email protected]> wrote:
Hmmm, then what about the "How to Use Multiple Actions" section that states:
For a mixed action log of the form:
u1,purchase,iphone
u1,purchase,ipad
u2,purchase,nexus
On Thursday, March 12, 2015 9:39 AM, Pat Ferrel <[email protected]>
wrote:
spark-itemsimilarity takes tuples:
user-id,item-id
You are looking at the collected input as a matrix. It would be collected from
something of the form:
u1,item1
u1,item10
u1,item500
u2,item2
u2,item500
...
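Collected into a matrix, those tuples would look roughly like this (rows are users, columns are items, 1 means the user acted on the item):

         item1  item2  item10  item500
u1         1      0       1       1
u2         0      1       0       1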
On Mar 11, 2015, at 8:24 PM, Jeff Isenhart <[email protected]> wrote:
I am trying to run the example found here:
http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
The data (demoItems.csv added to hdfs) is just copied from the example:
u1,purchase,iphone
u1,purchase,ipad
u2,purchase,nexus
...
But when I run
mahout spark-itemsimilarity -i demoItems.csv -o output2 -fc 1 -ic 2
I get empty _SUCCESS and part-00000 files in output2/indicator-matrix.
Any ideas?