from:"Ted Dunning"

Re: distributed cholesky on mahout

2018-04-19 Thread Ted Dunning

There was a variant of Cholesky decomposition in Mahout at one time not so long ago. I would guess that it is still there. It is difficult to make a truly distributed version of QR decomposition, but for the purposes of the randomized SVD in Mahout, it wasn't actually necessary to have a true QR.

Re: "LLR with time"

2017-11-12 Thread Ted Dunning

ffic driven > from external sources. > > Thanks for the detailed hints - now it's time to see what comes out of > this. > > Johannes > > On Sun, Nov 12, 2017 at 7:52 AM, Ted Dunning > wrote: > > > Events have the natural good quality that having a cold star

Re: "LLR with time"

2017-11-11 Thread Ted Dunning

talk from trevor grant but I'm really eager to attack > this after years of batch :) > > Thanks for your thoughts, I am happy I can rule something out given the > domain (poisson llr). Luckily the domain I'm working on is event > recommendations, so there is a natural de

Re: "LLR with time"

2017-11-11 Thread Ted Dunning

yield “hot in > Greece” > I think that this is a good approach. > > Ted’s “Christmas video” tag is what I was calling a business rule and can > be added to either of the above techniques. > But the (not) hotness feature might help with automated this. > > On Nov 11, 2

Re: "LLR with time"

2017-11-11 Thread Ted Dunning

So ... there are a few different threads here. 1) LLR but with time. Quite possible, but not really what Johannes is talking about, I think. See http://bit.ly/poisson-llr for a quick discussion. 2) time varying recommendation. As Johannes notes, this can make use of windowed counts. The problem i

Re: spark-itemsimilarity scalability / Spark parallelism issues (SimilarityAnalysis.cooccurrencesIDSs)

2017-08-16 Thread Ted Dunning

It is common with large numerical codes that things run faster in memory on just a few cores if the communication required outweighs the parallel speedup. The issue is that memory bandwidth is slower than the arithmetic speed by a very good amount. If you just have to move stuff into the CPU and m

Re: New logo

2017-05-06 Thread Ted Dunning

On Sat, May 6, 2017 at 2:43 PM, Scott C. Cote wrote: > Will you be wearing “one of those t-shirts” on Monday in Houston :) ? > Not likely. It is in the archive.

Re: New logo

2017-05-06 Thread Ted Dunning

nologies used it back in the 90s, however they used a > >very > >> > specific red one, and this isn't a deal breaker for me. > >> > > >> > Other thoughts: > >> > Based on the tattoo I saw- one could make an Enso using old mahout > >col

Re: New logo

2017-04-27 Thread Ted Dunning

I haven't been active enough to feel good about an out and out -1. Put me as -0 On Thu, Apr 27, 2017 at 3:54 PM, Pat Ferrel wrote: > Fair enough, I think Trevor feels the same. > > The blue man can continue, all it takes is a -1 > > > On Apr 27, 2017, at 3:50 PM, Te

Re: New logo

2017-04-27 Thread Ted Dunning

or opinion is welcome input) or > would you like to discontinue the contest. If the later, -1 now. > > > On Apr 27, 2017, at 3:42 PM, Ted Dunning wrote: > > I thought that none of the proposals were worth continuing with. > > > > On Thu, Apr 27, 2017 at 3:36 PM, Pat

Re: New logo

2017-04-27 Thread Ted Dunning

I thought that none of the proposals were worth continuing with. On Thu, Apr 27, 2017 at 3:36 PM, Pat Ferrel wrote: > Yes, -1 means you hate them all or think the designers are not worth > paying. We have to pay to continue, I’ll foot the bill (donations > appreciated) but don’t want to unles

Re: Reg:-Integrating Mahout with Solr

2017-04-02 Thread Ted Dunning

nts to be indexed by Solr has fairly large content in it and > 100+ users searching within it(once the solr search tool goes live). > Kindly guide me on the integration steps for mahout with Solr(with respect > all the stats mentioned). > > Thanks and Regards, > Arun > > On 2

Re: Reg:-Integrating Mahout with Solr

2017-04-01 Thread Ted Dunning

to use the LAN path for configurations and > index.I can use the larger document base. > > Thanks and Regards, > Arun > > On 2 April 2017 at 07:00, Ted Dunning wrote: > > > On Sat, Apr 1, 2017 at 6:21 PM, arun abraham > > wrote: > > > > > As

Re: Reg:-Integrating Mahout with Solr

2017-04-01 Thread Ted Dunning

On Sat, Apr 1, 2017 at 6:21 PM, arun abraham wrote: > As a first step I am trying to recommend min of two documents(As my > Solr document index is ~100 docs). > This is kind of weird. Can you say why you have so very few documents? There may be something special going on that will make this w

Re: Marketing

2017-03-24 Thread Ted Dunning

On Fri, Mar 24, 2017 at 8:27 AM, Pat Ferrel wrote: > maybe we should drop the name Mahout altogether. I have been told that there is a cool secondary interpretation of Mahout as well. I think that the Hebrew word is pronounced roughly like Mahout. מַהוּת The cool thing is that this word mean

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-01-31 Thread Ted Dunning

From my perspective, the state of the art of machine learning is with systems like Tensorflow and dl4j. If you can deal with the limits of a non-clustered GPU system, then Theano and Caffe are very useful. Keras papers over the difference between different back-ends nicely. Tensorflow and Theano c

Re: Scaling up spark Iitem similarity on big data data sets

2016-06-23 Thread Ted Dunning

This actually sounds like a very small problem. My guess is that there are bad settings for the interaction and frequency cuts. On Thu, Jun 23, 2016 at 11:07 AM, Pat Ferrel wrote: > In addition to increasing downsampling there are some other things to > note. The original OOM was caused by th

Re: mahout tf-idf vs lucene tf-idf

2016-06-04 Thread Ted Dunning

On Sat, Jun 4, 2016 at 10:14 AM, forme book wrote: > On the (Lucene side) has already by default this implementations, what I do > struggle to understand what is the advantage of having lucene.vector in > mahout when Lucene offer that feature out of the box ? > > Maybe I'm missing something big b

Re: LLR quick clarification

2016-05-12 Thread Ted Dunning

It just means that there is an association. Causation is much more difficult to ascertain. On Wed, May 4, 2016 at 6:06 AM, Nikaash Puri wrote: > Hi, > > Just wanted to clarify a small doubt. On running LLR with primary > indicator as view and secondary indicator as purchase. Say, one line of t
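
For readers who want to see what score is being interpreted here, the following is a minimal sketch of the usual log-likelihood ratio test on a 2x2 table of counts. The function names and the meaning assigned to the four cells (k11 = users with both the view and the purchase, k12 and k21 = users with only one of them, k22 = users with neither) are assumptions for illustration, not taken from the thread or from Mahout's API.

```python
from math import log


def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0.
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio for a 2x2 contingency table.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))
```

A large score says only that the two events co-occur more (or less) often than independence would predict. It says nothing about which one causes the other, which is the point made above.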

Re: Matrix inversion

2016-05-05 Thread Ted Dunning

Mahout is considerably better at sparse operations and optimizations than dense ones. Beyond that, I would expect that you would do better with traditional math libraries. And, are you really trying to invert a matrix? The common maxim is that this implies an error in your method because inversio

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Ted Dunning

On Thu, Apr 28, 2016 at 10:54 AM, Prakash Poudyal wrote: > Actually, I need to use fuzzy clustering to cluster the sentence in my > research. I found fuzzy k clustering algorithm in Apache Mahout, thus, I > am trying to use it for my purpose. > That's great. But that code is no longer supporte

Re: New Mahout "Samsara" Book

2016-02-25 Thread Ted Dunning

The project has moved in various ways since MiA was first published, but just covering Samsara leaves a lot of recommendation code that needs to be covered. There is room for another book. On Thu, Feb 25, 2016 at 9:32 AM, Suneel Marthi wrote: > The Mahout project has diverged from 'Mahout in

Re: Algorithms of prediction

2016-02-25 Thread Ted Dunning

On Thu, Feb 25, 2016 at 6:52 AM, wrote: > Thank you for your answer > What other tools you advise me to use? > Do you recommend Rhadoop? > Try h2o instead. Good R interface. Decent model building.

Re: What's the mr item-based recommend algorithm essay?

2016-02-20 Thread Ted Dunning

See here: https://ssc.io/pdf/rec11-schelter.pdf On Fri, Feb 19, 2016 at 3:16 AM, Lee S wrote: > Hi: >Does anybody know which paper the mr algorithm is based on? >

Re: Document similarity

2016-02-14 Thread Ted Dunning

Did you want textual similarity? Or semantic similarity? The actual semantics of a message can be opaque from the content, but clear from the usage. On Sun, Feb 14, 2016 at 5:29 AM, Charles Earl wrote: > David, > LDA or LSI can work quite nicely for similarity (YMMV of course depending > on

Re: Mahout - Recommenditemvalue with magnitude of 1

2015-11-29 Thread Ted Dunning

On Sun, Nov 29, 2015 at 9:36 PM, Niklas Ekvall wrote: > My conclusion is that recommenditembased in Mahout works better for ratings > than binary data, what is your conclusions? > Still operator error somewhere. Binary data works much better as a real recommender.

Re: Efficiently writing all the recommendation to a file

2015-11-20 Thread Ted Dunning

There are a few problems that you have. 1) user-based recommendation is often slower than item-based (sometimes MUCH slower). This can make a 2-10x difference in practice 2) pre-computing recommendations is usually much less efficient than computing them on the fly (because typically few users w

Re: Haters get Love too

2015-11-03 Thread Ted Dunning

On Tue, Nov 3, 2015 at 3:20 PM, Pat Ferrel wrote: > For the strict out there we did not directly isolate the two actions, > which is work remaining so some of the lift might be due to just having > more data but it’s a really good first step because more data doesn't > always translate to better

Re: Haters get Love too

2015-11-03 Thread Ted Dunning

No. Not entirely surprising, but it is *really* nice to get some public results on this. The treatment of the negatives as a separate cross term instead of just lumping them together is a very significant difference. On Tue, Nov 3, 2015 at 3:42 PM, Peter Jaumann wrote: > Fascinating!!! Not too

Re: matrix inversion in plan ?

2015-10-08 Thread Ted Dunning

> > > On Monday, October 5, 2015 2:25 PM, Ted Dunning < > ted.dunn...@gmail.com> wrote: > > > That isn't enough detail. > > How do you mean to compute degrees of freedom? Why do you need the inverse > to do this? > > Where did you get this al

Re: matrix inversion in plan ?

2015-10-04 Thread Ted Dunning

ore than interested to extend to complex double, when the > solver is ready for double data type. thanks, canal > > > On Monday, October 5, 2015 2:02 PM, Ted Dunning < > ted.dunn...@gmail.com> wrote: > > > On Sun, Oct 4, 2015 at 10:32 PM, go canal > wrote: >

Re: matrix inversion in plan ?

2015-10-04 Thread Ted Dunning

On Sun, Oct 4, 2015 at 10:32 PM, go canal wrote: > in fact i need to support both double and complex double for either > distributed memory based or out-of-core. Ahh... Well Mahout doesn't support complex anything. So this isn't going to help you.

Re: matrix inversion in plan ?

2015-10-04 Thread Ted Dunning

Jaumann < > peter.jauma...@gmail.com> wrote: > > > This should be done with a matrix solver indeed!!! > > > > On Oct 4, 2015 11:53 AM, "Ted Dunning" wrote: > > > > > > It is almost certain that starting with an inversion is a serious e

Re: matrix inversion in plan ?

2015-10-04 Thread Ted Dunning

version of a very large matrix. will have to revert back to scalapack or MR > based solutions I guess. > thanks, canal > > > On Saturday, October 3, 2015 11:31 PM, Ted Dunning > wrote: > > > I doubt seriously that Samsara will support matrix inversion per se.

Re: matrix inversion in plan ?

2015-10-03 Thread Ted Dunning

I doubt seriously that Samsara will support matrix inversion per se. The problem is a) it densifies sparse matrices b) it is much more costly than solving a linear system Samsara is roughly memory based, but different back-ends will try to spill to disk if necessary. It is likely that the resul
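
To illustrate the "solve a linear system instead of inverting" point, here is a small sketch that solves A x = b directly by Gaussian elimination with partial pivoting and never forms the inverse. It is plain dense Python for illustration only. The function name is hypothetical and this is not how Samsara or any Mahout back-end implements a solver.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x
```

For example, `solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])` gives the solution of that one system without computing the n x n inverse, which would in general be dense even when the input matrix is sparse.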

Re: Modifying kmeans algo

2015-09-23 Thread Ted Dunning

On Tue, Sep 22, 2015 at 5:51 PM, Ankit Goel wrote: > What I wanted to do was modify the clustering algorithm, in hopes of > experimenting with different versions of it. I'm not much hung over the MR > part of things, rather the clustering algo itself. > Have at it. All yours. > Secondly or a

Re: Modifying kmeans algo

2015-09-21 Thread Ted Dunning

On Mon, Sep 21, 2015 at 4:44 PM, Ankit Goel wrote: > If one wanted to modify the kmeans algorithm given with the mahout package, > how would/should one go about doing it? > If you want to modify the old map reduce code, please go right ahead. The project members will not be maintaining that cod

Re: [mahout 0.9 | k-means] methodology for selecting k to cluster very large datasets

2015-09-15 Thread Ted Dunning

My own feeling is that the right answer is to look at average squared distance on your training data and on held out data. As long as these values are nearly the same, you likely have a smaller (or equal) than optimal value of k. When the average squared distance is significantly less on the trai
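
The comparison described above can be sketched as follows. This is a toy, in-memory k-means with a deterministic initialization, written only to show the two quantities being compared (average squared distance to the nearest centroid on training data and on held-out data). All function names are illustrative assumptions, and it only reports the numbers for each candidate k rather than applying any decision rule.

```python
def sq_dist(p, q):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def avg_sq_dist(points, centroids):
    # Mean squared distance from each point to its nearest centroid.
    return sum(min(sq_dist(p, c) for c in centroids) for p in points) / len(points)


def kmeans(points, k, iterations=20):
    # Deterministic start: k points spread evenly through the input list.
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean; keep the old one if a group is empty.
        centroids = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def train_vs_heldout(train, heldout, ks):
    """Return (k, train_error, heldout_error) for each candidate k."""
    rows = []
    for k in ks:
        centroids = kmeans(train, k)
        rows.append((k, avg_sq_dist(train, centroids), avg_sq_dist(heldout, centroids)))
    return rows
```

Calling `train_vs_heldout(train, heldout, range(1, 20))` on a random split of the data gives one row per k, so the point where the two error columns start to diverge can be read off directly.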

Re: Is it possible to use Mahout Random Forests work with image(pixel) data in libsvm format?

2015-08-18 Thread Ted Dunning

Seems like a simple format translation. Why not just reformat the input file? On Tue, Aug 18, 2015 at 7:42 PM, Zhou Jiang wrote: > Hi All, > > The Default Random Forests MapReduce works with UCI glass data. > > ID f1 f2 f3 … fn L > > Is there a way to make it work with image data in libsvm fo

Re: Matrix inverse

2015-08-10 Thread Ted Dunning

Yes. That will solve a matrix problem. But it won't handle complex values. Check out JBlas for an in-core implementation that handles complex double matrices. For out of core version of QR, perhaps Dmitriy can turn up his writeup on the subject of block diagonal QR. On Mon, Aug 10, 2015 at 9:

Re: Kmeans clusterdump Interpretation

2015-07-20 Thread Ted Dunning

> in > > > the articles are from different news sources but are about the exact > same > > > thing. Intuitively it seems that these articles would get grouped > > > together. Any suggestions how I should go about that? So far I'm using > > > nutch to crawl, solr to

Re: Kmeans clusterdump Interpretation

2015-07-20 Thread Ted Dunning

The most central point in a cluster is often referred to as a medoid (similar to median, but multi-dimensional). The Mahout code does not compute medoids. In general, they are difficult to compute and implementing a full k-medoid clustering algorithm even more so. On Mon, Jul 20, 2015 at 6:25
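
As a concrete reading of the definition above, a medoid can be found by brute force: take the member of the cluster whose total distance to all other members is smallest. The sketch below does exactly that. The function name and the choice to pass the distance as a parameter are assumptions for illustration, and nothing like this is part of Mahout.

```python
def medoid(points, dist):
    """Return the member of points with the smallest total distance to the rest."""
    # Every point is compared with every other point, so this is O(n^2)
    # distance evaluations for a cluster of n points.
    return min(points, key=lambda p: sum(dist(p, q) for q in points))
```

For instance, `medoid([0, 1, 2, 3, 100], lambda a, b: abs(a - b))` returns 2, whereas the mean of the same values is 21.2. The quadratic number of distance computations is one reason this gets expensive on large clusters.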

Re: Realtime update of similarity matrices

2015-06-22 Thread Ted Dunning

James, This isn't an answer to your last question ... You have an excellent summary there. The only thing that you may have missed is that using cooccurrence/search-based recommendations allows you to improve results precisely because it gets you out of the business of tweaking algorithms and in

Re: Realtime update of similarity matrices

2015-06-19 Thread Ted Dunning

The standard approach is to re-run the off-line learning. It is possible, though not yet supported in Mahout tools, to do real-time updates. See here for some details: https://www.mapr.com/resources/videos/fully-real-time-recommendation-%E2%80%93-ted-dunning-sf-data-mining On Fri, Jun 19

Re: Streaming K-means

2015-06-01 Thread Ted Dunning

The streaming k-means works by building a sketch of the data which is then used to do real clustering. It might be that this sketch would be acceptable to do k-medoids, but that is definitely not guaranteed. Similarly, it might be possible to build a medoid sketch instead of a mean based sketch,

Re: Regression using MapReduce

2015-05-29 Thread Ted Dunning

Mahout is deprecating pretty much all of the classic MapReduce implementations in any case in favor of algorithms based fundamentally on a new linear algebra system known as Mahout-Samsara. On Fri, May 29, 2015 at 10:52 PM, Punit Naik wrote: > Hello all users > > I just wanted to know if Mahou

Re: Row Similarity

2015-05-14 Thread Ted Dunning

Actually, this is probably done more easily using a simple matrix multiplication. The reason for not using recommendation code for this is that your problem is entirely dense. How exactly you should go about this is a different question. Up to tens of thousands of stars, you can probably do this

Re: [VOTE] Apache Mahout 0.10.0 Release

2015-04-11 Thread Ted Dunning

> > andrew.mussel...@gmail.com> wrote: > > > > > >> After checking the binary tarball and zip, and running through all the > > >> examples on an EMR cluster, I am good with this release. > > >> > > >> +1 (binding) > > >> > > >

Re: [VOTE] Apache Mahout 0.10.0 Release

2015-04-10 Thread Ted Dunning

Ah... forgot this. +1 (binding) On Fri, Apr 10, 2015 at 11:14 PM, Ted Dunning wrote: > > I downloaded and tested the signatures and check-sums on {binary,source} x > {zip,tar} + pom. All were correct. > > One thing that I worry a little about is that the name of the artifact >

Re: [VOTE] Apache Mahout 0.10.0 Release

2015-04-10 Thread Ted Dunning

I downloaded and tested the signatures and check-sums on {binary,source} x {zip,tar} + pom. All were correct. One thing that I worry a little about is that the name of the artifact doesn't include "apache". Not sure that is a hard requirement, but it seems a good thing to do. On Fri, Apr 10,

Re: fast performance way of writing preferences to file?

2015-04-03 Thread Ted Dunning

Are you sure that the problem is writing the results? It seems to me that the real problem is the use of a user-based recommender. For such a small data set, for instance, a search-based recommender will be able to make recommendations in less than a millisecond with multiple recommendations poss

Re: adjusted cosine similarity for item-based recommender?

2015-04-03 Thread Ted Dunning

For practical recommendation systems, ratings are almost irrelevant. Ratings were prominent in the original academic work on recommendations largely because with the early research systems, users had no recordable interactions with content other than ratings. The Taste component of Mahout was writ

Re: Text clustering with SVD

2015-03-30 Thread Ted Dunning

Lanczos may be more accurate than SSVD, but if you use a power step or three, this difference goes away as well. The best way to select k is actually to pick a value k_max larger than you expect to need and then pick random vectors instead of singular vectors. To evaluate how many singular vectors

Re: Latent Semantic Analysis for Document Categorization

2015-03-30 Thread Ted Dunning

ents' I should look for. > > On Fri, Mar 27, 2015 at 2:45 AM, Ted Dunning > wrote: > > > Also, if you can include linking information between documents, you > should > > be able to substantially improve accuracy. Same goes for behavioral data > > like browsin

Re: Fw: Mahout dataset Vectorization

2015-03-26 Thread Ted Dunning

DFS in text format. > Destination IP address is not implicit infact its in the second row and > is a server. > Kindly suggest how i can do the kmeans clustering wrt timestamp or is > there a better way? > Regards,Raghuveer > > > > On Thursday, March 26, 2015 6:34 AM, Ted Dunning

Re: Latent Semantic Analysis for Document Categorization

2015-03-26 Thread Ted Dunning

Also, if you can include linking information between documents, you should be able to substantially improve accuracy. Same goes for behavioral data like browsing history. On Thu, Mar 26, 2015 at 6:10 AM, Hersheeta Chandankar < hersheetachandan...@gmail.com> wrote: > Thank you so much Chirag an

Re: Fw: Mahout dataset Vectorization

2015-03-25 Thread Ted Dunning

er ideas and how can i do it using JAVA code It would be really helpful > if you can show me a sample for this issue. Kindly suggest. > > Thanks, > Raghuveer > > On Tuesday, February 17, 2015 12:24 AM, Ted Dunning < > ted.dunn...@gmail.com> wrote: > > > > Please

Re: implementation of context-aware recommender in Mahout

2015-03-10 Thread Ted Dunning

Glad to help. You can help us by reporting your results when you get them. We look forward to that! On Tue, Mar 10, 2015 at 4:22 AM, Efi Koulouri wrote: > Things got clearer with your help! > > Thank you very much > > On 9 March 2015 at 01:50, Ted Dunning wrote: > > >

Re: implementation of context-aware recommender in Mahout

2015-03-08 Thread Ted Dunning

the search engine approach is very interesting but in my case I > think that building the recommender using the java classes is more > appropriate as I need to use both approaches (post filtering,pre > filtering). Am I right ? > > On 8 March 2015 at 16:08, Ted Dunning wrote: > > > The

Re: implementation of context-aware recommender in Mahout

2015-03-08 Thread Ted Dunning

The by far easiest way to build a recommender (especially for production) is to use the search engine approach (what Pat was recommending). Post filtering can be done using the search engine far more easily than using Java classes. On Sat, Mar 7, 2015 at 8:44 AM, Pat Ferrel wrote: > Ooops a s

Re: problem in recommender similarity computation (taste)

2015-03-08 Thread Ted Dunning

On Sat, Mar 7, 2015 at 3:05 AM, Tevfik Aytekin wrote: > There can be two solutions: > 1. There should be a parameter n, which determines the minimum number > of common ratings needed to compute a similarity otherwise the system > should return NaN. > 2. The similarity should be computed using all

Re: spark-itemsimilarity question: what's the difference between indicator-matrix and cross-indicator-matrix

2015-03-06 Thread Ted Dunning

The terms main and secondary are a bit confusing. The easiest definition is that cooccurrence analyzes the record of actions you want to recommend. Cross occurrence tries to transfer from one behavior to another. In practice, it has been common to conflate many behaviors into one precisely

Re: How can I manually specify user similarities in the user-based algorithm?

2015-02-16 Thread Ted Dunning

On Mon, Feb 16, 2015 at 1:25 AM, Eugenio Tacchini < eugenio.tacch...@gmail.com> wrote: > Yes, I need to implement a lookup function, I was wondering which is the > easiest way, since I am not a Java programmer and I've started using Mahout > since a few days ago. > Without Java programming, there

Re: Apache Mahout Project for GSOC 2015

2015-02-15 Thread Ted Dunning

We haven't had anyone volunteer as a mentor this year as far as I know. On Sun, Feb 15, 2015 at 12:36 PM, Prasad Priyadarshana Fernando < bpp...@gmail.com> wrote: > Hi, > > I am interested in doing a project on recommender system framework for GSOC > 2015. Can somebody tell me whether Apache is

Re: How can I manually specify user similarities in the user-based algorithm?

2015-02-15 Thread Ted Dunning

On Sat, Feb 14, 2015 at 6:05 AM, Eugenio Tacchini < eugenio.tacch...@gmail.com> wrote: > Hi Pat, I don't understand why it is not a Mahout problem, my goal is to > evaluate (RMSE) the output of a user based algorithm comparing different > user similarity measures, Mahout already has everything I n

Re: How can I manually specify user similarities in the user-based algorithm?

2015-02-13 Thread Ted Dunning

On Fri, Feb 13, 2015 at 11:11 AM, Eugenio Tacchini < eugenio.tacch...@gmail.com> wrote: > Is there anyone who can give me some hints about this task? > Another way to look at this is to try to wedge this into the item similarity code. There are hooks available in the map-reduce version of item s

Re: Documentation

2015-02-13 Thread Ted Dunning

On Fri, Feb 13, 2015 at 9:37 AM, Eugenio Tacchini < eugenio.tacch...@gmail.com> wrote: > If I need to use a classical user-based technique, however, the only > alternative is the Taste-oriented code, am I right? > Right. > Still, I can't see how > to perform a prediction for a a user/item coupl

Re: Neural Network in hadoop

2015-02-12 Thread Ted Dunning

That is a really old paper that basically pre-dates all of the recent important work in neural networks. You should look for works on Rectified Linear Units (ReLU), drop-out regularization, parameter servers (downpour sgd) and deep learning. Map-reduce as you have used it will not produce interes

Re: Documentation

2015-02-12 Thread Ted Dunning

I would go so far as to say that all of the old Taste-oriented code is strongly deprecated. The indicator-based approach that Pat refers to is the best way forward. On Thu, Feb 12, 2015 at 8:29 AM, Pat Ferrel wrote: > The new cooccurrence recommender that works with a search engine has > sever

Re: Own recommender

2015-01-21 Thread Ted Dunning

Juanjo, Using the Taste components, it will be almost impossible to get really high performance. For that, using the itemsimilarity program to feed a search index is the best alternative. The scala version of the itemsimilarity program is available in Scala and could be called fairly easily as a

Re: DTW distance measure and K-medioids, Hierarchical clustering

2015-01-15 Thread Ted Dunning

On Thu, Jan 15, 2015 at 1:47 PM, wrote: > So, to summarize, my idea of K-medoids with DTW as a distance measure > (implemented as a two phase mapreduce) doesn't sound like an unrealistic > idea? > Sounds like a fine idea. > I'm mostly afraid not to get in the situation that the algorithm needs

Re: DTW distance measure and K-medioids, Hierarchical clustering

2015-01-15 Thread Ted Dunning

On Thu, Jan 15, 2015 at 3:50 AM, Marko Dinic wrote: > > > Thank you for your answer. Maybe I made a wrong picture about my data when > giving sinusoid as an example, my time series are not periodic. I merely continued with that example because it illustrates how an alternative metric makes all

Re: DTW distance measure and K-medioids, Hierarchical clustering

2015-01-15 Thread Ted Dunning

to > > fit it in what's already implemented in Mahout (for clustering), but > > it's not so obvious to me. > > > > I'm open to suggestions, I'm still new to all of this. > > > > Thanks, > > Marko > > > > On Sat 10

Re: Own recommender

2015-01-15 Thread Ted Dunning

The old Taste code is not the state of the art. User-based recommenders built on that will be slow. On Thu, Jan 15, 2015 at 7:10 AM, Juanjo Ramos wrote: > Hi David, > You implement your custom algorithm and create your own class that > implements the UserSimilarity interface. > > When you the

Re: boost selected dimensions in kmeans clustering

2015-01-15 Thread Ted Dunning

On Thu, Jan 15, 2015 at 5:23 AM, Miguel Angel Martin junquera < mianmarjun.mailingl...@gmail.com> wrote: > My question is:.. > Is it better to scale up these dimensions directly in the tf-idf > sequence final mix file using this correction factors OR first do scale > up in each tf-vectors

Re: How to partition a file to smaller size for performing KNN in hadoop mapreduce

2015-01-14 Thread Ted Dunning

Have you considered implementing using something like Spark? That could be much easier than raw map-reduce. On Wed, Jan 14, 2015 at 10:06 PM, unmesha sreeveni wrote: > In KNN like algorithm we need to load model Data into cache for predicting > the records. > > Here is the example for KNN. > > >

Re: boost selected dimensions in kmeans clustering

2015-01-14 Thread Ted Dunning

The easiest way is to scale those dimensions up. On Wed, Jan 14, 2015 at 2:41 AM, Miguel Angel Martin junquera < mianmarjun.mailingl...@gmail.com> wrote: > hi all, > > > I am clustering using kmeans several text documents from distintct sources > and I have generated the sparse vectors of each
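
A minimal sketch of what scaling dimensions up means in practice, assuming sparse vectors are represented as {index: value} dictionaries (an assumption made here for illustration; Mahout's own vector classes are not used). Multiplying a coordinate by a weight w multiplies its contribution to the squared Euclidean distance by w squared, so k-means pays correspondingly more attention to it.

```python
def boost(vector, weights):
    """Scale selected coordinates of a sparse {index: value} vector."""
    # Coordinates without an entry in weights are left unchanged.
    return {i: v * weights.get(i, 1.0) for i, v in vector.items()}


def sparse_sq_dist(a, b):
    """Squared Euclidean distance between two sparse {index: value} vectors."""
    return sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in set(a) | set(b))
```

With `a = {0: 1.0, 3: 2.0}` and `b = {0: 1.0, 3: 1.0}`, the squared distance is 1.0 before boosting and 9.0 after applying `weights = {3: 3.0}` to both vectors.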

Re: DTW distance measure and K-medioids, Hierarchical clustering

2015-01-10 Thread Ted Dunning

On Sat, Jan 10, 2015 at 3:02 AM, Marko Dinic wrote: > For example, mean of two sinusoids while one of them is shifted by Pi is > 0. And that's definitely not a good centroid in my case. Well, if you think that phase shifts represent small distance proportional to phase difference then the mean

Re: DTW distance measure and K-medioids, Hierarchical clustering

2015-01-09 Thread Ted Dunning

the end I could take one signal from each cluster that is the most similar > with others in cluster (some kind of centroid/medioid). > > What do you think about this approach and about the scalability? > > I would highly appreciate your answer, thanks. > > On Thu 08 Jan 201

Re: DTW distance measure and K-medioids, Hierarchical clustering

2015-01-08 Thread Ted Dunning

On Thu, Jan 8, 2015 at 7:00 AM, Marko Dinic wrote: > 1) Is there an implementation of DTW (Dynamic Time Warping) in Mahout that > could be used as a distance measure for clustering? > No. > > 2) Why isn't there an implementation of K-mediods in Mahout? I'm guessing > that it could not be imple

Re: consistency of StaticWordValueEncoder

2015-01-07 Thread Ted Dunning

On Wed, Jan 7, 2015 at 2:20 PM, chirag lakhani wrote: > In the Mahout in Action book I got the impression that the term "memo" will > seed the random number generator and I wanted to confirm that means I will > have consistency if I deploy this vectorizer in both my Hadoop environment > as well a

Re: spark-itemsimilarity out of memory problem

2014-12-23 Thread Ted Dunning

On Tue, Dec 23, 2014 at 9:16 AM, Pat Ferrel wrote: > > To use the hadoop mapreduce version (Ted’s suggestion) you’ll loose the > cross-cooccurrence indicators and you’ll have to translate your IDs into > Mahout IDs. This means mapping user and item IDs from your values into > non-negative integer

Re: spark-itemsimilarity out of memory problem

2014-12-23 Thread Ted Dunning

On Tue, Dec 23, 2014 at 7:39 AM, AlShater, Hani wrote: > @Ted, It is 3 nodes small cluster for POC. Spark executer is given 2g and > yarn is configured accordingly. I am trying to avoid spark memory caching. > Have you tried the map-reduce version?

Re: spark-itemsimilarity out of memory problem

2014-12-22 Thread Ted Dunning

Can you say what kind of cluster you have? How many machines? How much memory? How much memory is given to Spark? On Sun, Dec 21, 2014 at 11:44 PM, AlShater, Hani wrote: > Hi All, > > I am trying to use spark-itemsimilarity on 160M user interactions dataset. > The job launches and running su

Re: Question about choice of a recommender

2014-12-16 Thread Ted Dunning

How much data are you going to be collecting? How many users and how many presentations per user? Are you saying that the product for each video are completely fixed? Does the same product appear for more than one video? Do users interact with products outside of the narrow confines that you ha

Re: Collaborative filtering item-based in mahout - without isolating users

2014-12-11 Thread Ted Dunning

Natalia, It sounds like you are starting from the assumption that ratings are being done. This can happen, but in production recommendation settings, ratings is typically a very low value input because the meaning of a rating is very complex and because so few users actually do ratings unless for

Re: User based recommender

2014-12-05 Thread Ted Dunning

ry etc) > > Maybe location,sales per item(similarity might lead to knowledge of people > who share same purchasing patterns) etc. > > > On Wed, Dec 3, 2014 at 5:28 PM, Ted Dunning wrote: > > > On Wed, Dec 3, 2014 at 6:22 AM, Yash Patel > > wrote: > > > >

Re: Process UnStructured Data in Mahout for Clustering

2014-12-05 Thread Ted Dunning

On Thu, Dec 4, 2014 at 5:38 AM, Shahid Shaikh wrote: > i see the problem is with the way data is written What exactly do you mean by this?

Re: User based recommender

2014-12-04 Thread Ted Dunning

On Wed, Dec 3, 2014 at 6:22 AM, Yash Patel wrote: > I have multiple different columns such as category,shipping location,item > price,online user, etc. > > How can i use all these different columns and improve recommendation > quality(ie.calculate more precise similarity between users by use of >

Re: DBSCAN implementation in Mahout

2014-11-30 Thread Ted Dunning

> parallel threads. > > Thus the scale up is almost 'n'. I think scalability should not be an > issue for a Map Reduce implementation. > > Chirag Nagpal > University of Pune, India > www.chiragnagpal.com > > From: Ted Dun

Re: DBSCAN implementation in Mahout

2014-11-30 Thread Ted Dunning

On Sat, Nov 29, 2014 at 8:31 PM, 3316 Chirag Nagpal < chiragnagpal_12...@aitpune.edu.in> wrote: > Since Density based clustering algorithms, are being utilised extensively, > especially by the GIS research groups, it is a bit sad that there isn't a > Map Reduce implementation available.. > > I thi

Re: Bi-Factorization vs Tri-Factorization for recommender systems

2014-11-24 Thread Ted Dunning

There is no inherent mathematical difference, but there may be some pretty significant practical differences. Using the three matrix form (X = USV') puts the normalization constants into a place where you can control them a bit easier. This can be useful if you want *both* user and item vectors t

Re: Mahout 0.7 ALS Recommender: java.lang.Exception: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable

2014-11-24 Thread Ted Dunning

The error message that you got indicated that some input was textual and needed to be an integer. Is there a chance that the type of some of your input is incorrect in your sequence files? On Mon, Nov 24, 2014 at 3:47 PM, Ashok Harnal wrote: > Thanks for reply. I did not compile mahout. Mahou

Re: Re: why rbm was removed from mahout?

2014-11-09 Thread Ted Dunning

Check out H2O. http://0xdata.com/ On Mon, Nov 10, 2014 at 1:38 AM, zhonghong...@yy.com wrote: > So is there any scalable rbms available ? > I'm going to implement a recommender based on it. > > From: Ted Dunning > Date: 2014-11-10 15:34 > To: user@mahout.apache.org >

Re: why rbm was removed from mahout?

2014-11-09 Thread Ted Dunning

The algorithm wasn't particularly scalable. Nobody was around to support it. Nobody complained about the many warnings that it would be removed, nor the deprecation. Nor even the removal. On Mon, Nov 10, 2014 at 1:20 AM, zhonghong...@yy.com wrote: > Can anyone tell me why the Restricted Bol

Re: Why do most algorithms use sequencefile as input and output?

2014-11-04 Thread Ted Dunning

uld be storaged in > vector(dense or sparse) format ,so a conversion step > needs to be doned before algorithms deal with data. Is that right? > > 2014-11-04 23:56 GMT+08:00 Ted Dunning : > >> What should the input be? >> >> >> >>> On Tue, Nov 4,

Re: Why do most algorithms use sequencefile as input and output?

2014-11-04 Thread Ted Dunning

What should the input be? On Tue, Nov 4, 2014 at 12:28 AM, Lee S wrote: > Hi all: > I'm wondering why the input and output of most algorithm like > kmeans,naivebayes are all sequencefiles. One more step of conversion need > to be done if we want the algorithm works.And > I think the step is

Re: using Mahout to classify customer service and sales emails?

2014-10-25 Thread Ted Dunning

be the same process > from scratch or can it be done incrementally? > > Best, > Mahesh.B. > > > On Thu, Oct 23, 2014 at 1:13 AM, Ted Dunning > wrote: > > > Yes. Mahout can do this. > > > > Pro: MapR classifiers are pretty easy to integrate because of a

Re: Mahout Vs Spark

2014-10-23 Thread Ted Dunning

The Python API > uses the standard CPython implementation, and can call into existing C > libraries for Python such as NumPy. > > > > On Thu, Oct 23, 2014 at 1:11 PM, Ted Dunning > wrote: > > > Hmmm > > > > I don't think that the array formats use

Re: Mahout Vs Spark

2014-10-23 Thread Ted Dunning

shu Prasad wrote: > actually spark is available in python also, so users of spark are having an > upper hand over users of traditional users of mahout. This is applicable to > all the libraries of python (including numpy). > > On Wed, Oct 22, 2014 at 3:54 AM, Ted Dunning > wrote
