Thanks for the input, Pat. I ran the following command:
./bin/mahout spark-itemsimilarity -i demoItems.csv -o output4 -fc 1 -ic 2
--filter1 purchase --filter2 view
on this data:
u1,purchase,iphone
u1,purchase,ipad
u2,purchase,nexus
and now I'm seeing this error:
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
    at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
    at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
    at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
    at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
    at org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1029)
    at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
    at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
    at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:126)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
    at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
    at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at org.apache.spark.util.collection.AppendOnlyMap$$anon$1.foreach(AppendOnlyMap.scala:159)
    at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
    at java.lang.Runtime.loadLibrary0(Runtime.java:849)
    at java.lang.System.loadLibrary(System.java:1088)
    at org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
    ... 26 more
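The "Caused by" line says the Snappy native library isn't on java.library.path. One workaround I may try (an assumption on my part, not verified here) is to avoid the Snappy codec entirely by switching Spark's shuffle compression to the pure-Java LZF codec in $SPARK_HOME/conf/spark-defaults.conf:

# Assumption: bypass the failing native Snappy load by using the LZF codec
spark.io.compression.codec   lzf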
On Thursday, March 12, 2015 10:35 AM, Pat Ferrel <[email protected]>
wrote:
There are many ways to structure the input. The spark-itemsimilarity driver
can take only two actions, though the internal code, if you want to use it as a
library, will take any number. The CLI driver can optionally take input of the
form you mention but will extract a primary and a single secondary action per
execution. If you have more than two actions you can run the driver once for
every secondary action or use the lib interface (see the sketch below).
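If you go the library route, here is a minimal Scala sketch. It assumes Mahout 0.10-era APIs; the exact signature of SimilarityAnalysis.cooccurrencesIDSs and the way you load IndexedDatasets may differ in your build, so treat it as a starting point rather than the definitive call:

import org.apache.mahout.math.cf.SimilarityAnalysis
import org.apache.mahout.math.indexeddataset.IndexedDataset

// primaryIDS holds the primary action (e.g. purchase); the varargs hold any
// number of secondary actions, all loaded elsewhere as IndexedDatasets that
// share a user dictionary.
def indicators(primaryIDS: IndexedDataset,
               secondaryIDSs: IndexedDataset*): List[IndexedDataset] =
  // Returns one IndexedDataset per input: the first is the item-item
  // similarity matrix for the primary action, the rest are cross-occurrence
  // indicator matrices, one per secondary action.
  SimilarityAnalysis.cooccurrencesIDSs((primaryIDS +: secondaryIDSs).toArray)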
You can have your interactions in separate dirs of the form I mentioned in the
original answer, in which case you pass in -i and -i2 params. If you want to
mix actions in the same files, use the format you describe:
u1,item1,action1
u1,item10,action2
u1,item500,action3
u2,item2,action1
u2,item500,action3
...
The columns can be moved around and specified on the CLI. To use the above with
the CLI you would have to process action1 and action2 with one execution, and
action1 and action3 with another. This will create 4 outputs; the two
“similarity-matrix” dirs will be identical. That would give you indicators for
action1 (actually two identical indicators), action2, and action3.
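Concretely, the two executions would look something like this (the input file and output dir names here are placeholders):

mahout spark-itemsimilarity -i mixed-log.csv -o out-action2 -fc 1 -ic 2 --filter1 action1 --filter2 action2
mahout spark-itemsimilarity -i mixed-log.csv -o out-action3 -fc 1 -ic 2 --filter1 action1 --filter2 action3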
On Mar 12, 2015, at 9:52 AM, Jeff Isenhart <[email protected]> wrote:
Hmmm, then what about the "How to Use Multiple Actions" section that states:
For a mixed action log of the form:
u1,purchase,iphone
u1,purchase,ipad
u2,purchase,nexus
On Thursday, March 12, 2015 9:39 AM, Pat Ferrel <[email protected]>
wrote:
spark-itemsimilarity takes tuples:
user-id,item-id
You are looking at the collected input as a matrix. It would be collected from
something of the form:
u1,item1
u1,item10
u1,item500
u2,item2
u2,item500
...
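Collected into a matrix, those tuples would look roughly like this (rows are users, columns are items, 1 means the user acted on the item):

         item1  item2  item10  item500
u1         1      0       1       1
u2         0      1       0       1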
On Mar 11, 2015, at 8:24 PM, Jeff Isenhart <[email protected]> wrote:
I am trying to run the example found here:
http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
The data (demoItems.csv added to hdfs) is just copied from the example:
u1,purchase,iphone
u1,purchase,ipad
u2,purchase,nexus
...
But when I run
mahout spark-itemsimilarity -i demoItems.csv -o output2 -fc 1 -ic 2
I get empty _SUCCESS and part-00000 files in output2/indicator-matrix.
Any ideas?