@Pat, I am aware of your blog and of Ted's practical machine learning books and webinars. I have learned a lot from you guys ;)
@Ted, it is a small 3-node cluster for a POC. The Spark executor is given 2g and YARN is configured accordingly. I am trying to avoid Spark memory caching. @Simon, I am using Mahout and not Spark because I need similarity, not matrix factorization. Actually, the approach of spark-itemsimilarity gives a good way of augmenting content recommendations with collaborative features. I found their approach more suitable for building a lambda architecture that supports recommendations based on content, collaborative features, and recent interactive events, in addition to other injected rules. I think no predefined recommendation server can fit all these requirements at once; for these reasons I am trying to use Mahout.

Hani Al-Shater | Data Science Manager - Souq.com <http://souq.com/>
Mob: +962 790471101 | Phone: +962 65821236 | Skype: [email protected] | [email protected] <[email protected]> | www.souq.com
Nouh Al Romi Street, Building number 8, Amman, Jordan

On Tue, Dec 23, 2014 at 5:23 PM, AlShater, Hani <[email protected]> wrote:

> @Pat, thanks for your answers. It seems that I had cloned the snapshot
> before the feature of configuring Spark was added. It works now in
> local mode.
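For reference, the executor memory and caching settings discussed above can be passed to the driver on the command line. A minimal sketch only, assuming the snapshot's option names (`--sparkExecutorMem` and the `-D:` pass-through for arbitrary Spark properties are taken from my build's help text; verify with `mahout spark-itemsimilarity --help`), and with hypothetical HDFS paths:

```shell
# Sketch only: option names may differ between snapshots; check --help first.
mahout spark-itemsimilarity \
  -i hdfs:///poc/interactions.csv \           # hypothetical input path
  -o hdfs:///poc/indicators \                 # hypothetical output path
  --master yarn-client \
  --sparkExecutorMem 2g \
  -D:spark.storage.memoryFraction=0.3 \
  -D:spark.yarn.executor.memoryOverhead=384
```

Lowering spark.storage.memoryFraction only helps if the job actually caches; if caching is avoided entirely, more of the executor heap is already free for shuffle and computation.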
> Unfortunately, after trying the new snapshot and Spark, submitting to
> the cluster in yarn-client mode raises the following error:
>
> Exception in thread "main" java.lang.AbstractMethodError
>         at org.apache.spark.Logging$class.log(Logging.scala:52)
>         at org.apache.spark.deploy.yarn.Client.log(Client.scala:39)
>         at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
>         at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:39)
>         at org.apache.spark.deploy.yarn.Client.logClusterResourceDetails(Client.scala:103)
>         at org.apache.spark.deploy.yarn.Client.runApp(Client.scala:60)
>         at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:81)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:323)
>         at org.apache.mahout.sparkbindings.package$.mahoutSparkContext(package.scala:95)
>         at org.apache.mahout.drivers.MahoutSparkDriver.start(MahoutSparkDriver.scala:81)
>         at org.apache.mahout.drivers.ItemSimilarityDriver$.start(ItemSimilarityDriver.scala:128)
>         at org.apache.mahout.drivers.ItemSimilarityDriver$.process(ItemSimilarityDriver.scala:211)
>         at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:116)
>         at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:114)
>         at scala.Option.map(Option.scala:145)
>         at org.apache.mahout.drivers.ItemSimilarityDriver$.main(ItemSimilarityDriver.scala:114)
>         at org.apache.mahout.drivers.ItemSimilarityDriver.main(ItemSimilarityDriver.scala)
>
> and submitting in yarn-cluster mode raises this error:
>
> Exception in thread "main" org.apache.spark.SparkException: YARN mode not available ?
>         at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1571)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:310)
>         at org.apache.mahout.sparkbindings.package$.mahoutSparkContext(package.scala:95)
>         at org.apache.mahout.drivers.MahoutSparkDriver.start(MahoutSparkDriver.scala:81)
>         at org.apache.mahout.drivers.ItemSimilarityDriver$.start(ItemSimilarityDriver.scala:128)
>         at org.apache.mahout.drivers.ItemSimilarityDriver$.process(ItemSimilarityDriver.scala:211)
>         at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:116)
>         at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:114)
>         at scala.Option.map(Option.scala:145)
>         at org.apache.mahout.drivers.ItemSimilarityDriver$.main(ItemSimilarityDriver.scala:114)
>         at org.apache.mahout.drivers.ItemSimilarityDriver.main(ItemSimilarityDriver.scala)
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:191)
>         at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1566)
>         ... 10 more
>
> My cluster consists of 3 nodes, and I am using Hadoop 2.4.0. I got
> Spark 1.1.0 and the Mahout snapshot, then compiled, packaged, and
> installed them into the local Maven repo. Am I missing something?
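The two failures above (an AbstractMethodError in org.apache.spark.Logging, and YarnClusterSchedulerBackend not found) are the classic symptoms of either a Spark binary that was built without YARN support, or a Mahout build compiled against a different Spark than the one on the cluster classpath. A hedged sketch of one consistent build, assuming Spark 1.1.0 and Hadoop 2.4.0 as stated in the thread (the Maven flags are the ones documented for Spark 1.1; the local paths are hypothetical):

```shell
# Build Spark 1.1.0 with YARN support against Hadoop 2.4
# (profile and property names per the Spark 1.1 "Building Spark" docs)
cd spark-1.1.0
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean install

# Build Mahout against the Spark artifacts just installed, so both
# resolve identical Spark jars from the local Maven repo
cd ../mahout
mvn -DskipTests clean install

# Make sure the driver picks up this Spark build at runtime
export SPARK_HOME=/path/to/spark-1.1.0   # hypothetical path
```

If the cluster nodes run a vendor-packaged Spark, the safest route is to point the Mahout build at that exact Spark version rather than a separately compiled one.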
>
> Thanks again
>
> Hani Al-Shater | Data Science Manager - Souq.com <http://souq.com/>
> Mob: +962 790471101 | Phone: +962 65821236 | Skype:
> [email protected] | [email protected] <[email protected]> |
> www.souq.com
> Nouh Al Romi Street, Building number 8, Amman, Jordan
>
> On Tue, Dec 23, 2014 at 11:17 AM, hlqv <[email protected]> wrote:
>
>> Hi Pat Ferrel,
>> Using the option --omitStrength outputs indexable data, but this leads
>> to less accuracy when querying, because the similarity strengths between
>> items are omitted. Can these strength values be kept in order to improve
>> accuracy in a search engine?
>>
>> On 23 December 2014 at 02:17, Pat Ferrel <[email protected]> wrote:
>>
>> > Also Ted has an ebook you can download:
>> > mapr.com/practical-machine-learning
>> >
>> > On Dec 22, 2014, at 10:52 AM, Pat Ferrel <[email protected]> wrote:
>> >
>> > Hi Hani,
>> >
>> > I recently read about Souq.com. A very promising project.
>> >
>> > If you are looking at spark-itemsimilarity for ecommerce-type
>> > recommendations, you may be interested in some slide decks and blog
>> > posts I've done on the subject. Check out:
>> >
>> > http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
>> > http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
>> > http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
>> >
>> > Also I put up a demo site that uses some of these techniques:
>> > https://guide.finderbots.com
>> >
>> > Good luck,
>> > Pat
>> >
>> > On Dec 21, 2014, at 11:44 PM, AlShater, Hani <[email protected]> wrote:
>> >
>> > Hi All,
>> >
>> > I am trying to use spark-itemsimilarity on a dataset of 160M user
>> > interactions. The job launches and runs successfully for small data
>> > (1M actions). However, when trying the larger dataset, some Spark
>> > stages continuously fail with out-of-memory exceptions.
>> >
>> > I tried changing spark.storage.memoryFraction from the Spark default
>> > configuration, but I face the same issue again. How should Spark be
>> > configured when using spark-itemsimilarity, and how can this
>> > out-of-memory issue be overcome?
>> >
>> > Can you please advise?
>> >
>> > Thanks,
>> > Hani.
>> >
>> > Hani Al-Shater | Data Science Manager - Souq.com <http://souq.com/>
>> > Mob: +962 790471101 | Phone: +962 65821236 | Skype:
>> > [email protected] | [email protected] <[email protected]> |
>> > www.souq.com
>> > Nouh Al Romi Street, Building number 8, Amman, Jordan
>> >
>> > --
>> >
>> > *Download free Souq.com <http://souq.com/> mobile apps for iPhone
>> > <https://itunes.apple.com/us/app/id675000850>, iPad
>> > <https://itunes.apple.com/ae/app/souq.com/id941561129?mt=8>, Android
>> > <https://play.google.com/store/apps/details?id=com.souq.app> or Windows
>> > Phone
>> > <http://www.windowsphone.com/en-gb/store/app/souq/63803e57-4aae-42c7-80e0-f9e60e33b1bc>
>> > **and never miss a deal! *
