Hi Folks!
I am trying to implement a Spark job to calculate the similarity between the
products in my database, using only their names and descriptions.
I would like to use TF-IDF to represent the text data and cosine similarity to
compute all pairwise similarities.
My goal is, after the job completes, to get all similarities a
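The message is cut off above, but the math it describes can be sketched in plain Python (outside Spark) to make the intended computation concrete. This is a minimal sketch of TF-IDF weighting plus cosine similarity; the sample `products` list and function names are illustrative, not from the original job:

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Compute TF-IDF vectors (dicts of term -> weight) for a list of
    tokenized documents, using raw term frequency and log(N / df)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency per term
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

products = [
    "red running shoes".split(),
    "blue running shoes".split(),
    "stainless steel knife".split(),
]
vecs = tf_idf_vectors(products)
```

In the Spark version the same two steps would run over an RDD of tokenized name + description strings; the all-pairs step is the expensive part and usually needs blocking or approximate methods at scale.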
Hi Folks!
I'm running a Spark job on a cluster with 9 slaves and 1 master (250 GB RAM,
32 cores, and 1 TB of storage each).
This job generates 1.2 TB of data in an RDD with 1200 partitions.
When I call saveAsTextFile("hdfs://..."), Spark creates 1200 files named
"part-000*" in the HDFS output folder. H
Hi Folks!
I'm running a Python Spark job on a cluster with 1 master and 10 slaves
(64 GB RAM and 32 cores per machine).
This job reads a 1.2 TB file with 1,128,201,847 lines from HDFS and
calls the KMeans method as follows:
# SLAVE CODE - Reading features from HDFS
def get_features_from
2014-11-18 16:18 GMT-02:00 Sean Owen:
> My guess is you're asking for all cores of all machines but the driver
> needs at least one core, so one executor is unable to find a machine to fit
> on.
> On Nov 18, 2014 7:04 PM, "Alan Prando" wrote:
>
>> Hi Folks!
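Sean's point is that if the application requests every core on every machine, nothing is left for the driver, so one executor cannot be placed. A hedged configuration sketch of capping the request in standalone mode (the numbers assume the 10-slave, 32-core cluster from the question; they are not from the thread):

```python
from pyspark import SparkConf, SparkContext

# 10 machines x 32 cores = 320 cores total. Request fewer than all of
# them so the driver keeps a core and every executor can be scheduled.
conf = (SparkConf()
        .setAppName("kmeans-job")
        .set("spark.cores.max", "310")        # standalone total-core cap
        .set("spark.executor.memory", "48g")) # illustrative value

# On the cluster:
# sc = SparkContext(conf=conf)
```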
Hi Folks!
I'm running Spark on a YARN cluster installed with Cloudera Manager Express.
The cluster has 1 master and 3 slaves, each machine with 32 cores and 64 GB of
RAM.
My Spark job works fine; however, it seems that only 2 of the 3 slaves
are doing work (htop shows 2 slaves at 100% on all 32 cores, a
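When only some NodeManagers receive work, the usual suspects are the requested executor count and per-executor resources. A hedged sketch of explicitly asking YARN for one executor per slave (the numbers assume the 3-slave, 32-core, 64 GB machines above and must fit within the YARN container limits Cloudera Manager configured):

```python
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("yarn-client")              # YARN mode in Spark 1.x
        .set("spark.executor.instances", "3")  # one executor per slave
        .set("spark.executor.cores", "32")
        .set("spark.executor.memory", "48g"))  # leave room for OS/overhead

# On the cluster:
# sc = SparkContext(conf=conf)
```

If an executor's memory request (plus overhead) exceeds `yarn.nodemanager.resource.memory-mb`, YARN simply places fewer executors than asked, which would also explain an idle slave.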
Hi all,
I'm trying to read an HBase table using an example from GitHub (
https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_inputformat.py);
however, I have two qualifiers in one column family.
Ex.:
ROW                COLUMN+CELL
row1               column=f1:1, timestamp=1401883411986, value=valu
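In the linked example, each row comes back from `newAPIHadoopRDD` as a single string even when a column family has several qualifiers; later revisions of that example's converter join the cells with newlines so they can be split back out with `flatMapValues`. A minimal sketch of just the splitting step, using a hardcoded stand-in for one converter result (the JSON-per-cell format and its keys are assumptions based on the updated example, not verified against your converter):

```python
import json

# Stand-in for one (row key, value) pair as a newline-joined string:
# one JSON document per cell, covering both qualifiers of family f1.
row_key = "row1"
raw_value = "\n".join([
    json.dumps({"columnFamily": "f1", "qualifier": "1", "value": "value1"}),
    json.dumps({"columnFamily": "f1", "qualifier": "2", "value": "value2"}),
])

# Equivalent of:
#   rdd.flatMapValues(lambda v: v.split("\n")).mapValues(json.loads)
cells = [json.loads(s) for s in raw_value.split("\n")]
qualifiers = [c["qualifier"] for c in cells]
```

With the stock 2014-era converter that returns only one value per row, the converter itself has to be extended; the splitting trick only works once all cells survive the conversion.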