There was a variant of Cholesky decomposition in Mahout at one time not so
long ago. I would guess that it is still there.
It is difficult to make a truly distributed version of QR decomposition,
but for the purposes of the randomized SVD in Mahout, it wasn't actually
necessary to have a true QR.
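
One well-known way to get a QR-like factorization without a true distributed QR is the Cholesky-based trick: form the small Gram matrix A'A (a sum over rows, so it distributes easily), take its Cholesky factor, and recover Q row by row. The sketch below is plain Python on a tiny in-memory matrix; that this is the same variant Mahout used is an assumption, and the function names are made up for illustration.

```python
import math

def gram(a):
    """Return A^T A for a tall matrix given as a list of rows."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)]
            for i in range(n)]

def cholesky(g):
    """Lower-triangular L with L L^T = g (g must be positive definite)."""
    n = len(g)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(g[i][i] - s)
            else:
                l[i][j] = (g[i][j] - s) / l[j][j]
    return l

def cholesky_qr(a):
    """Return (Q, R) with A = Q R, where R = L^T and Q = A R^{-1}."""
    l = cholesky(gram(a))
    n = len(l)
    q = []
    for row in a:
        # Forward substitution solves L q_row = a_row, one row at a time,
        # so each row of Q depends only on the matching row of A.
        q_row = []
        for i in range(n):
            s = sum(l[i][k] * q_row[k] for k in range(i))
            q_row.append((row[i] - s) / l[i][i])
        q.append(q_row)
    r = [[l[j][i] for j in range(n)] for i in range(n)]  # transpose of L
    return q, r
```

The price of this shortcut is numerical: squaring A into A'A squares the condition number, which is tolerable for a randomized SVD but not a substitute for a careful QR in general.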
ffic driven
> from external sources.
>
> Thanks for the detailed hints - now it's time to see what comes out of
> this.
>
> Johannes
>
> On Sun, Nov 12, 2017 at 7:52 AM, Ted Dunning
> wrote:
>
> > Events have the natural good quality that having a cold star
talk from trevor grant but I'm really eager to attack
> this after years of batch :)
>
> Thanks for your thoughts, I am happy I can rule something out given the
> domain (poisson llr). Luckily the domain I'm working on is event
> recommendations, so there is a natural de
yield “hot in
> Greece”
>
I think that this is a good approach.
>
> Ted’s “Christmas video” tag is what I was calling a business rule and can
> be added to either of the above techniques.
>
But the (not) hotness feature might help with automating this.
>
> On Nov 11, 2
So ... there are a few different threads here.
1) LLR but with time. Quite possible, but not really what Johannes is
talking about, I think. See http://bit.ly/poisson-llr for a quick
discussion.
2) time varying recommendation. As Johannes notes, this can make use of
windowed counts. The problem i
It is common with large numerical codes that things run faster in memory on
just a few cores if the communication required outweighs the parallel
speedup.
The issue is that memory bandwidth falls short of arithmetic throughput by a
very wide margin. If you just have to move stuff into the CPU and m
On Sat, May 6, 2017 at 2:43 PM, Scott C. Cote wrote:
> Will you be wearing “one of those t-shirts” on Monday in Houston :) ?
>
Not likely.
It is in the archive.
nologies used it back in the 90s, however they used a
> >very
> >> > specific red one, and this isn't a deal breaker for me.
> >> >
> >> > Other thoughts:
> >> > Based on the tattoo I saw- one could make an Enso using old mahout
> >col
I haven't been active enough to feel good about an out and out -1.
Put me as -0
On Thu, Apr 27, 2017 at 3:54 PM, Pat Ferrel wrote:
> Fair enough, I think Trevor feels the same.
>
> The blue man can continue, all it takes is a -1
>
>
> On Apr 27, 2017, at 3:50 PM, Te
or opinion is welcome input) or
> would you like to discontinue the contest. If the latter, -1 now.
>
>
> On Apr 27, 2017, at 3:42 PM, Ted Dunning wrote:
>
> I thought that none of the proposals were worth continuing with.
>
>
>
> On Thu, Apr 27, 2017 at 3:36 PM, Pat
I thought that none of the proposals were worth continuing with.
On Thu, Apr 27, 2017 at 3:36 PM, Pat Ferrel wrote:
> Yes, -1 means you hate them all or think the designers are not worth
> paying. We have to pay to continue, I’ll foot the bill (donations
> appreciated) but don’t want to unles
nts to be indexed by Solr has fairly large content in it and
> 100+ users searching within it(once the solr search tool goes live).
> Kindly guide me on the integration steps for mahout with Solr(with respect
> all the stats mentioned).
>
> Thanks and Regards,
> Arun
>
> On 2
to use the LAN path for configurations and
> index.I can use the larger document base.
>
> Thanks and Regards,
> Arun
>
> On 2 April 2017 at 07:00, Ted Dunning wrote:
>
> > On Sat, Apr 1, 2017 at 6:21 PM, arun abraham
> > wrote:
> >
> > > As
On Sat, Apr 1, 2017 at 6:21 PM, arun abraham
wrote:
> As a first step I am trying to recommend min of two documents(As my
> Solr document index is ~100 docs).
>
This is kind of weird.
Can you say why you have so very few documents?
There may be something special going on that will make this w
On Fri, Mar 24, 2017 at 8:27 AM, Pat Ferrel wrote:
> maybe we should drop the name Mahout altogether.
I have been told that there is a cool secondary interpretation of Mahout as
well.
I think that the Hebrew word is pronounced roughly like Mahout.
מַהוּת
The cool thing is that this word mean
From my perspective, the state of the art of machine learning is with
systems like Tensorflow and dl4j. If you can deal with the limits of a
non-clustered GPU system, then Theano and Caffe are very useful. Keras
papers over the difference between different back-ends nicely.
Tensorflow and Theano c
This actually sounds like a very small problem.
My guess is that there are bad settings for the interaction and frequency
cuts.
On Thu, Jun 23, 2016 at 11:07 AM, Pat Ferrel wrote:
> In addition to increasing downsampling there are some other things to
> note. The original OOM was caused by th
On Sat, Jun 4, 2016 at 10:14 AM, forme book wrote:
> The Lucene side already has this implementation by default; what I
> struggle to understand is the advantage of having lucene.vector in
> mahout when Lucene offers that feature out of the box?
>
> Maybe I'm missing something big b
It just means that there is an association. Causation is much more
difficult to ascertain.
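
For reference, the LLR score is computed from a 2x2 table of counts and only measures how far those counts are from what independence would predict. The sketch below uses the usual entropy formulation in plain Python; the function names and the example counts are illustrative only.

```python
import math

def x_log_x(x):
    """x * ln(x), with the convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: both events together, k12: first without second,
    k21: second without first, k22: neither.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))
```

A table such as `llr(10, 10, 10, 10)` scores zero because the two events look independent, while `llr(100, 0, 0, 100)` scores high. Neither result says which event, if either, causes the other.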
On Wed, May 4, 2016 at 6:06 AM, Nikaash Puri wrote:
> Hi,
>
> Just wanted to clarify a small doubt. On running LLR with primary
> indicator as view and secondary indicator as purchase. Say, one line of t
Mahout is considerably better at sparse operations and optimizations than
dense ones.
Beyond that, I would expect that you would do better with traditional math
libraries.
And, are you really trying to invert a matrix? The common maxim is that
this implies an error in your method because inversio
On Thu, Apr 28, 2016 at 10:54 AM, Prakash Poudyal
wrote:
> Actually, I need to use fuzzy clustering to cluster the sentence in my
> research. I found fuzzy k clustering algorithm in Apache Mahout, thus, I
> am trying to use it for my purpose.
>
That's great.
But that code is no longer supporte
The project has moved in various ways since MiA was first published, but
just covering Samsara leaves a lot of recommendation code that needs to be
covered. There is room for another book.
On Thu, Feb 25, 2016 at 9:32 AM, Suneel Marthi wrote:
> The Mahout project has diverged from 'Mahout in
On Thu, Feb 25, 2016 at 6:52 AM, wrote:
> Thank you for your answer
> What other tools you advise me to use?
> Do you recommend Rhadoop?
>
Try h2o instead. Good R interface. Decent model building.
See here:
https://ssc.io/pdf/rec11-schelter.pdf
On Fri, Feb 19, 2016 at 3:16 AM, Lee S wrote:
> Hi:
>Does anybody know which paper the mr algorithm is based on?
>
Did you want textual similarity?
Or semantic similarity?
The actual semantics of a message can be opaque from the content, but clear
from the usage.
On Sun, Feb 14, 2016 at 5:29 AM, Charles Earl wrote:
> David,
> LDA or LSI can work quite nicely for similarity (YMMV of course depending
> on
On Sun, Nov 29, 2015 at 9:36 PM, Niklas Ekvall
wrote:
> My conclusion is that recommenditembased in Mahout works better for ratings
> than binary data, what are your conclusions?
>
Still operator error somewhere. Binary data works much better as a real
recommender.
There are a few problems that you have.
1) user-based recommendation is often slower than item-based (sometimes
MUCH slower). This can make a 2-10x difference in practice
2) pre-computing recommendations is usually much less efficient than
computing them on the fly (because typically few users w
On Tue, Nov 3, 2015 at 3:20 PM, Pat Ferrel wrote:
> For the strict out there we did not directly isolate the two actions,
> which is work remaining so some of the lift might be due to just having
> more data but it’s a really good first step because more data doesn't
> always translate to better
No. Not entirely surprising, but it is *really* nice to get some public
results on this.
The treatment of the negatives as a separate cross term instead of just
lumping them together is a very significant difference.
On Tue, Nov 3, 2015 at 3:42 PM, Peter Jaumann
wrote:
> Fascinating!!! Not too
>
>
> On Monday, October 5, 2015 2:25 PM, Ted Dunning <
> ted.dunn...@gmail.com> wrote:
>
>
> That isn't enough detail.
>
> How do you mean to compute degrees of freedom? Why do you need the inverse
> to do this?
>
> Where did you get this al
ore than interested to extend to complex double, when the
> solver is ready for double data type. thanks, canal
>
>
> On Monday, October 5, 2015 2:02 PM, Ted Dunning <
> ted.dunn...@gmail.com> wrote:
>
>
> On Sun, Oct 4, 2015 at 10:32 PM, go canal
> wrote:
>
On Sun, Oct 4, 2015 at 10:32 PM, go canal wrote:
> in fact i need to support both double and complex double for either
> distributed memory based or out-of-core.
Ahh...
Well Mahout doesn't support complex anything. So this isn't going to help
you.
Jaumann <
> peter.jauma...@gmail.com> wrote:
>
>
> This should be done with a matrix solver indeed!!!
>
>
>
> On Oct 4, 2015 11:53 AM, "Ted Dunning" wrote:
> >
> >
> > It is almost certain that starting with an inversion is a serious e
version of a very large matrix. will have to revert back to scalapack or MR
> based solutions I guess.
> thanks, canal
>
>
> On Saturday, October 3, 2015 11:31 PM, Ted Dunning
> wrote:
>
>
> I doubt seriously that Samsara will support matrix inversion per se.
I doubt seriously that Samsara will support matrix inversion per se. The
problem is
a) it densifies sparse matrices
b) it is much more costly than solving a linear system
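
To illustrate point (b), here is a minimal sketch of solving A x = b directly by Gaussian elimination with partial pivoting, without ever forming the inverse. It is plain Python on small dense lists, not Samsara code, and the function name is hypothetical.

```python
def solve(a, b):
    """Solve a x = b for a square matrix a (list of rows) and vector b."""
    n = len(a)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x
```

Computing the full inverse amounts to solving n such systems (one per column of the identity), which is why a single solve is the cheaper route when all you need is x.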
Samsara is roughly memory based, but different back-ends will try to spill
to disk if necessary. It is likely that the resul
On Tue, Sep 22, 2015 at 5:51 PM, Ankit Goel wrote:
> What I wanted to do was modify the clustering algorithm, in hopes of
> experimenting with different versions of it. I'm not much hung up on the MR
> part of things, rather the clustering algo itself.
>
Have at it.
All yours.
> Secondly or a
On Mon, Sep 21, 2015 at 4:44 PM, Ankit Goel wrote:
> If one wanted to modify the kmeans algorithm given with the mahout package,
> how would/should one go about doing it?
>
If you want to modify the old map reduce code, please go right ahead. The
project members will not be maintaining that cod
My own feeling is that the right answer is to look at average squared
distance on your training data and on held out data.
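
A rough sketch of that measurement, assuming the centroids have already been fitted on the training data by whatever k-means implementation you use. The function name and the toy points are illustrative only.

```python
def avg_squared_distance(points, centroids):
    """Mean squared Euclidean distance from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((pi - ci) ** 2 for pi, ci in zip(p, c))
                     for c in centroids)
    return total / len(points)

# Toy data: centroids fitted elsewhere, then scored on both sets.
centroids = [(0.0, 0.0), (10.0, 10.0)]
train = [(0.0, 1.0), (10.0, 9.0)]
held_out = [(0.0, 2.0), (10.0, 8.0)]
train_score = avg_squared_distance(train, centroids)
held_out_score = avg_squared_distance(held_out, centroids)
```

Repeating this for several values of k and watching the gap between the two scores is the comparison being described.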
As long as these values are nearly the same, you likely have a smaller (or
equal) than optimal value of k. When the average squared distance is
significantly less on the trai
Seems like a simple format translation.
Why not just reformat the input file?
On Tue, Aug 18, 2015 at 7:42 PM, Zhou Jiang wrote:
> Hi All,
>
> The Default Random Forests MapReduce works with UCI glass data.
>
> ID f1 f2 f3 … fn L
>
> Is there a way to make it work with image data in libsvm fo
Yes. That will solve a matrix problem.
But it won't handle complex values.
Check out JBlas for an in-core implementation that handles complex double
matrices.
For an out-of-core version of QR, perhaps Dmitriy can turn up his writeup on
the subject of block diagonal QR.
On Mon, Aug 10, 2015 at 9:
> in
> > > the articles are from different news sources but are about the exact
> same
> > > thing. Intuitively it seems that these articles would get grouped
> > > together. Any suggestions how I should go about that? So far I'm using
> > > nutch to crawl, solr to
The most central point in a cluster is often referred to as a medoid
(similar to median, but multi-dimensional).
The Mahout code does not compute medoids. In general, they are difficult
to compute and implementing a full k-medoid clustering algorithm even more
so.
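
For a small cluster that fits in memory, a brute-force medoid is easy to write. This sketch (plain Python, hypothetical function name) compares every pair of points, which also shows why the cost grows quadratically and why it does not scale the way a mean does.

```python
import math

def medoid(points):
    """Return the member of points with the smallest total distance to the rest.

    Brute force: O(n^2) distance evaluations.
    """
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (100.0, 0.0)]
center = medoid(cluster)  # (2.0, 0.0); the mean would be pulled out to (21.2, 0.0)
```

Unlike the mean, the medoid is always an actual data point, and any distance function can be swapped in for `math.dist`.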
On Mon, Jul 20, 2015 at 6:25
James,
This isn't an answer to your last question ...
You have an excellent summary there. The only thing that you may have
missed is that using cooccurrence/search-based recommendations allows you
to improve results precisely because it gets you out of the business of
tweaking algorithms and in
The standard approach is to re-run the off-line learning.
It is possible, though not yet supported in Mahout tools, to do real-time
updates.
See here for some details:
https://www.mapr.com/resources/videos/fully-real-time-recommendation-%E2%80%93-ted-dunning-sf-data-mining
On Fri, Jun 19
The streaming k-means works by building a sketch of the data which is then
used to do real clustering.
It might be that this sketch would be acceptable to do k-medoids, but that
is definitely not guaranteed.
Similarly, it might be possible to build a medoid sketch instead of a mean
based sketch,
Mahout is deprecating pretty much all of the classic MapReduce
implementations in any case in favor of algorithms based fundamentally on a
new linear algebra system known as Mahout-Samsara.
On Fri, May 29, 2015 at 10:52 PM, Punit Naik wrote:
> Hello all users
>
> I just wanted to know if Mahou
Actually, this is probably done more easily using a simple matrix
multiplication. The reason for not using recommendation code for this is
that your problem is entirely dense.
How exactly you should go about this is a different question. Up to tens
of thousands of stars, you can probably do this
> > andrew.mussel...@gmail.com> wrote:
> > >
> > >> After checking the binary tarball and zip, and running through all the
> > >> examples on an EMR cluster, I am good with this release.
> > >>
> > >> +1 (binding)
> > >>
> > >
Ah... forgot this.
+1 (binding)
On Fri, Apr 10, 2015 at 11:14 PM, Ted Dunning wrote:
>
> I downloaded and tested the signatures and check-sums on {binary,source} x
> {zip,tar} + pom. All were correct.
>
> One thing that I worry a little about is that the name of the artifact
>
I downloaded and tested the signatures and check-sums on {binary,source} x
{zip,tar} + pom. All were correct.
One thing that I worry a little about is that the name of the artifact
doesn't include "apache". Not sure that is a hard requirement, but it
seems a good thing to do.
On Fri, Apr 10,
Are you sure that the problem is writing the results? It seems to me that
the real problem is the use of a user-based recommender.
For such a small data set, for instance, a search-based recommender will be
able to make recommendations in less than a millisecond with multiple
recommendations poss
For practical recommendation systems, ratings are almost irrelevant.
Ratings were prominent in the original academic work on recommendations
largely because with the early research systems, users had no recordable
interactions with content other than ratings. The Taste component of
Mahout was writ
Lanczos may be more accurate than SSVD, but if you use a power step or
three, this difference goes away as well.
The best way to select k is actually to pick a value k_max larger than you
expect to need and then pick random vectors instead of singular vectors.
To evaluate how many singular vectors
ents' I should look for.
>
> On Fri, Mar 27, 2015 at 2:45 AM, Ted Dunning
> wrote:
>
> > Also, if you can include linking information between documents, you
> should
> > be able to substantially improve accuracy. Same goes for behavioral data
> > like browsin
DFS in text format.
> Destination IP address is not implicit; in fact it's in the second row and
> is a server.
> Kindly suggest how i can do the kmeans clustering wrt timestamp or is
> there a better way?
> Regards,Raghuveer
>
>
>
> On Thursday, March 26, 2015 6:34 AM, Ted Dunning
Also, if you can include linking information between documents, you should
be able to substantially improve accuracy. Same goes for behavioral data
like browsing history.
On Thu, Mar 26, 2015 at 6:10 AM, Hersheeta Chandankar <
hersheetachandan...@gmail.com> wrote:
> Thank you so much Chirag an
er ideas and how can i do it using JAVA code It would be really helpful
> if you can show me a sample for this issue. Kindly suggest.
>
> Thanks,
> Raghuveer
>
> On Tuesday, February 17, 2015 12:24 AM, Ted Dunning <
> ted.dunn...@gmail.com> wrote:
>
>
>
> Please
Glad to help.
You can help us by reporting your results when you get them.
We look forward to that!
On Tue, Mar 10, 2015 at 4:22 AM, Efi Koulouri wrote:
> Things got clearer with your help!
>
> Thank you very much
>
> On 9 March 2015 at 01:50, Ted Dunning wrote:
>
> >
the search engine approach is very interesting but in my case I
> think that building the recommender using the java classes is more
> appropriate as I need to use both approaches (post filtering,pre
> filtering). Am I right ?
>
> On 8 March 2015 at 16:08, Ted Dunning wrote:
>
> > The
By far the easiest way to build a recommender (especially for production)
is to use the search engine approach (what Pat was recommending).
Post filtering can be done using the search engine far more easily than
using Java classes.
On Sat, Mar 7, 2015 at 8:44 AM, Pat Ferrel wrote:
> Ooops a s
On Sat, Mar 7, 2015 at 3:05 AM, Tevfik Aytekin
wrote:
> There can be two solutions:
> 1. There should be a parameter n, which determines the minimum number
> of common ratings needed to compute a similarity otherwise the system
> should return NaN.
> 2. The similarity should be computed using all
The terms main and secondary are a bit confusing.
The easiest definition is that cooccurrence analyzes the record of actions you
want to recommend. Cross occurrence tries to transfer from one behavior to
another.
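
In matrix terms, and as a rough sketch only: if A holds the action you want to recommend (say purchases) and B holds some other behavior (say views), cooccurrence counts come from A'A and cross-occurrence counts from A'B. The tiny matrices below are invented for illustration, and a real system would follow this with LLR filtering rather than use raw counts.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]

# Rows are users, columns are items; 1 means the user took the action.
purchases = [[1, 0],
             [1, 1],
             [0, 1]]
views = [[1, 1, 0],
         [1, 0, 1],
         [0, 1, 1]]

# Cooccurrence: purchased items versus purchased items.
cooccurrence = matmul(transpose(purchases), purchases)
# Cross-occurrence: purchased items versus viewed items.
cross_occurrence = matmul(transpose(purchases), views)
```

Each row of `cross_occurrence` says which viewed items tend to go with a given purchased item, which is how evidence from one behavior gets transferred to another.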
In practice, it has been common to conflate many behaviors into one precisely
On Mon, Feb 16, 2015 at 1:25 AM, Eugenio Tacchini <
eugenio.tacch...@gmail.com> wrote:
> Yes, I need to implement a lookup function, I was wondering which is the
> easiest way, since I am not a Java programmer and I've started using Mahout
> since a few days ago.
>
Without Java programming, there
We haven't had anyone volunteer as a mentor this year as far as I know.
On Sun, Feb 15, 2015 at 12:36 PM, Prasad Priyadarshana Fernando <
bpp...@gmail.com> wrote:
> Hi,
>
> I am interested in doing a project on recommender system framework for GSOC
> 2015. Can somebody tell me whether Apache is
On Sat, Feb 14, 2015 at 6:05 AM, Eugenio Tacchini <
eugenio.tacch...@gmail.com> wrote:
> Hi Pat, I don't understand why it is not a Mahout problem, my goal is to
> evaluate (RMSE) the output of a user based algorithm comparing different
> user similarity measures, Mahout already has everything I n
On Fri, Feb 13, 2015 at 11:11 AM, Eugenio Tacchini <
eugenio.tacch...@gmail.com> wrote:
> Is there anyone who can give me some hints about this task?
>
Another way to look at this is to try to wedge this into the item
similarity code.
There are hooks available in the map-reduce version of item s
On Fri, Feb 13, 2015 at 9:37 AM, Eugenio Tacchini <
eugenio.tacch...@gmail.com> wrote:
> If I need to use a classical user-based technique, however, the only
> alternative is the Taste-oriented code, am I right?
>
Right.
> Still, I can't see how
> to perform a prediction for a a user/item coupl
That is a really old paper that basically pre-dates all of the recent
important work in neural networks.
You should look for works on Rectified Linear Units (ReLU), drop-out
regularization, parameter servers (downpour sgd) and deep learning.
Map-reduce as you have used it will not produce interes
I would go so far as to say that all of the old Taste-oriented code is
strongly deprecated. The indicator-based approach that Pat refers to is
the best way forward.
On Thu, Feb 12, 2015 at 8:29 AM, Pat Ferrel wrote:
> The new cooccurrence recommender that works with a search engine has
> sever
Juanjo,
Using the Taste components, it will be almost impossible to get really high
performance. For that, using the itemsimilarity program to feed a search
index is the best alternative.
A Scala version of the itemsimilarity program is available and
could be called fairly easily as a
On Thu, Jan 15, 2015 at 1:47 PM, wrote:
> So, to summarize, my idea of K-medoids with DTW as a distance measure
> (implemented as a two phase mapreduce) doesn't sound like an unrealistic
> idea?
>
Sounds like a fine idea.
> I'm mostly afraid not to get in the situation that the algorithm needs
On Thu, Jan 15, 2015 at 3:50 AM, Marko Dinic
wrote:
>
>
> Thank you for your answer. Maybe I made a wrong picture about my data when
> giving sinusoid as an example, my time series are not periodic.
I merely continued with that example because it illustrates how an
alternative metric makes all
to
> > fit it in what's already implemented in Mahout (for clustering), but
> > it's not so obvious to me.
> >
> > I'm open to suggestions, I'm still new to all of this.
> >
> > Thanks,
> > Marko
> >
> > On Sat 10
The old Taste code is not the state of the art. User-based recommenders
built on that will be slow.
On Thu, Jan 15, 2015 at 7:10 AM, Juanjo Ramos wrote:
> Hi David,
> You implement your custom algorithm and create your own class that
> implements the UserSimilarity interface.
>
> When you the
On Thu, Jan 15, 2015 at 5:23 AM, Miguel Angel Martin junquera <
mianmarjun.mailingl...@gmail.com> wrote:
> My question is:..
> Is it better to scale up these dimensions directly in the tf-idf
> sequence final mix file using this correction factors OR first do scale
> up in each tf-vectors
Have you considered implementing this using something like Spark? That could
be much easier than raw map-reduce.
On Wed, Jan 14, 2015 at 10:06 PM, unmesha sreeveni
wrote:
> In KNN like algorithm we need to load model Data into cache for predicting
> the records.
>
> Here is the example for KNN.
>
>
>
The easiest way is to scale those dimensions up.
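
As a minimal sketch of what that scaling looks like, assuming dense vectors as Python lists and a hypothetical per-dimension weight list: multiplying a dimension by w multiplies its contribution to squared Euclidean distance by w squared, so k-means pays more attention to it.

```python
def scale_dimensions(vector, weights):
    """Multiply each dimension of vector by its weight."""
    return [v * w for v, w in zip(vector, weights)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

weights = [1.0, 1.0, 3.0]  # emphasize the third dimension
doc_a = scale_dimensions([0.2, 0.5, 1.0], weights)
doc_b = scale_dimensions([0.2, 0.5, 0.0], weights)
emphasized = squared_distance(doc_a, doc_b)  # 9 times the unweighted value
```

The same weights have to be applied to every vector before clustering, and again to any new vector that is later assigned to a cluster.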
On Wed, Jan 14, 2015 at 2:41 AM, Miguel Angel Martin junquera <
mianmarjun.mailingl...@gmail.com> wrote:
> hi all,
>
>
> I am clustering using kmeans several text documents from distinct sources
> and I have generated the sparse vectors of each
On Sat, Jan 10, 2015 at 3:02 AM, Marko Dinic
wrote:
> For example, mean of two sinusoids while one of them is shifted by Pi is
> 0. And that's definitely not a good centroid in my case.
Well, if you think that phase shifts represent small distance proportional
to phase difference then the mean
the end I could take one signal from each cluster that is the most similar
> with others in cluster (some kind of centroid/medoid).
>
> What do you think about this approach and about the scalability?
>
> I would highly appreciate your answer, thanks.
>
> On Thu 08 Jan 201
On Thu, Jan 8, 2015 at 7:00 AM, Marko Dinic
wrote:
> 1) Is there an implementation of DTW (Dynamic Time Warping) in Mahout that
> could be used as a distance measure for clustering?
>
No.
>
> 2) Why isn't there an implementation of K-medoids in Mahout? I'm guessing
> that it could not be imple
On Wed, Jan 7, 2015 at 2:20 PM, chirag lakhani
wrote:
> In the Mahout in Action book I got the impression that the term "memo" will
> seed the random number generator and I wanted to confirm that means I will
> have consistency if I deploy this vectorizer in both my Hadoop environment
> as well a
On Tue, Dec 23, 2014 at 9:16 AM, Pat Ferrel wrote:
>
> To use the hadoop mapreduce version (Ted’s suggestion) you’ll lose the
> cross-cooccurrence indicators and you’ll have to translate your IDs into
> Mahout IDs. This means mapping user and item IDs from your values into
> non-negative integer
On Tue, Dec 23, 2014 at 7:39 AM, AlShater, Hani wrote:
> @Ted, It is a small 3-node cluster for a POC. Spark executor is given 2g and
> yarn is configured accordingly. I am trying to avoid spark memory caching.
>
Have you tried the map-reduce version?
Can you say what kind of cluster you have?
How many machines? How much memory? How much memory is given to Spark?
On Sun, Dec 21, 2014 at 11:44 PM, AlShater, Hani wrote:
> Hi All,
>
> I am trying to use spark-itemsimilarity on 160M user interactions dataset.
> The job launches and running su
How much data are you going to be collecting? How many users and how many
presentations per user?
Are you saying that the products for each video are completely fixed? Does
the same product appear for more than one video?
Do users interact with products outside of the narrow confines that you
ha
Natalia,
It sounds like you are starting from the assumption that ratings are being
done.
This can happen, but in production recommendation settings, ratings are
typically a very low-value input because the meaning of a rating is very
complex and because so few users actually do ratings unless for
ry etc)
>
> Maybe location,sales per item(similarity might lead to knowledge of people
> who share same purchasing patterns) etc.
>
>
> On Wed, Dec 3, 2014 at 5:28 PM, Ted Dunning wrote:
>
> > On Wed, Dec 3, 2014 at 6:22 AM, Yash Patel
> > wrote:
> >
> >
On Thu, Dec 4, 2014 at 5:38 AM, Shahid Shaikh
wrote:
> i see the problem is with the way data is written
What exactly do you mean by this?
On Wed, Dec 3, 2014 at 6:22 AM, Yash Patel wrote:
> I have multiple different columns such as category,shipping location,item
> price,online user, etc.
>
> How can i use all these different columns and improve recommendation
> quality(ie.calculate more precise similarity between users by use of
>
> parallel threads.
>
> Thus the scale up is almost 'n'. I think scalability should not be an
> issue for a Map Reduce implementation.
>
> Chirag Nagpal
> University of Pune, India
> www.chiragnagpal.com
>
> From: Ted Dun
On Sat, Nov 29, 2014 at 8:31 PM, 3316 Chirag Nagpal <
chiragnagpal_12...@aitpune.edu.in> wrote:
> Since Density based clustering algorithms, are being utilised extensively,
> especially by the GIS research groups, it is a bit sad that there isn't a
> Map Reduce implementation available..
>
> I thi
There is no inherent mathematical difference, but there may be some pretty
significant practical differences.
Using the three matrix form (X = USV') puts the normalization constants
into a place where you can control them a bit easier. This can be useful
if you want *both* user and item vectors t
The error message that you got indicated that some input was textual and
needed to be an integer.
Is there a chance that the type of some of your input is incorrect in your
sequence files?
On Mon, Nov 24, 2014 at 3:47 PM, Ashok Harnal wrote:
> Thanks for reply. I did not compile mahout. Mahou
Check out H2O.
http://0xdata.com/
On Mon, Nov 10, 2014 at 1:38 AM, zhonghong...@yy.com
wrote:
> So is there any scalable rbms available ?
> I'm going to implement a recommender based on it.
>
> From: Ted Dunning
> Date: 2014-11-10 15:34
> To: user@mahout.apache.org
>
The algorithm wasn't particularly scalable. Nobody was around to support
it. Nobody complained about the many warnings that it would be removed,
nor the deprecation. Nor even the removal.
On Mon, Nov 10, 2014 at 1:20 AM, zhonghong...@yy.com
wrote:
> Can anyone tell me why the Restricted Bol
uld be stored in
> vector(dense or sparse) format, so a conversion step
> needs to be done before algorithms deal with data. Is that right?
>
> 2014-11-04 23:56 GMT+08:00 Ted Dunning :
>
>> What should the input be?
>>
>>
>>
>>> On Tue, Nov 4,
What should the input be?
On Tue, Nov 4, 2014 at 12:28 AM, Lee S wrote:
> Hi all:
> I'm wondering why the input and output of most algorithms like
> kmeans, naivebayes are all sequencefiles. One more conversion step needs
> to be done if we want the algorithm to work. And
> I think the step is
be the same process
> from scratch or can it be done incrementally?
>
> Best,
> Mahesh.B.
>
>
> On Thu, Oct 23, 2014 at 1:13 AM, Ted Dunning
> wrote:
>
> > Yes. Mahout can do this.
> >
> > Pro: MapR classifiers are pretty easy to integrate because of a
The Python API
> uses the standard CPython implementation, and can call into existing C
> libraries for Python such as NumPy.
>
>
>
> On Thu, Oct 23, 2014 at 1:11 PM, Ted Dunning
> wrote:
>
> > Hmmm
> >
> > I don't think that the array formats use
shu Prasad
wrote:
> actually spark is available in python also, so users of spark are having an
> upper hand over users of traditional users of mahout. This is applicable to
> all the libraries of python (including numpy).
>
> On Wed, Oct 22, 2014 at 3:54 AM, Ted Dunning
> wrote