The difference is that in 0.5 the job used a reduce-side join to join the
feature vectors and the ratings, which is scalable but very slow.

We changed this to a broadcast join in later versions, which can be
executed as a single map-only job. However, each of the feature
matrices has to fit into the mapper's memory for that to work.
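The core idea of that broadcast (map-side) join can be sketched in plain Java, outside Hadoop: the small side (a feature matrix) is loaded entirely into a hash map, and the large side (the ratings) is streamed through and joined with one lookup per record, so no shuffle or reducer is needed. The record type and the values below are made up purely for illustration:

```java
import java.util.*;

public class BroadcastJoinSketch {
    // Hypothetical rating record (not a Mahout class).
    record Rating(int itemId, int userId, double value) {}

    public static void main(String[] args) {
        // "Small" side: item feature vectors, assumed to fit in memory --
        // the same constraint the mapper has in the broadcast join.
        Map<Integer, double[]> itemFeatures = Map.of(
            1, new double[] {0.1, 0.2},
            2, new double[] {0.3, 0.4});

        // "Large" side: ratings streamed record by record,
        // the way a mapper would see its input split.
        List<Rating> ratings = List.of(
            new Rating(1, 100, 5.0),
            new Rating(2, 100, 3.0));

        // Map-only join: one hash lookup per rating, no shuffle phase.
        for (Rating r : ratings) {
            double[] features = itemFeatures.get(r.itemId());
            System.out.println(r.userId() + "," + r.itemId() + "," + features[0]);
        }
    }
}
```

The reduce-side variant would instead tag and shuffle both datasets by item id, which is why it scales beyond memory but is much slower.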

Can you provide some numbers about the volume of your data and the
cluster you run the job on?

/s

On 27.02.2013 11:22, Razon, Oren wrote:
> Yes I'm sure.
> We used some code of ours that executes the specific 
> ParallelALSFactorizationJob. 
> The same execution worked for Mahout 0.5 but not for 0.6 / 0.7.
> 
> Is there anything different in the way this job is activated?
> 
> -----Original Message-----
> From: Sebastian Schelter [mailto:[email protected]] 
> Sent: Wednesday, February 27, 2013 11:56
> To: [email protected]
> Subject: Re: Using ALS job to extract the decomposed matrices
> 
> Hello Razon,
> 
> this is a strange bug that should not happen. It seems that some of the
> vectors supplied to the solver are null. Are you sure that there are no
> exceptions prior to this one?
> 
> Best,
> Sebastian
> 
> On 27.02.2013 09:53, Razon, Oren wrote:
>> Hi there,
>> I'm using Hadoop-core 0.20.3 and I want to use Mahout's ALS algorithm.
>> My purpose is to run the ALS model and extract the decomposed matrices for 
>> further usage in my application (I want to create 2 different csv files: 
>> [UserId, latentFeatureId, Value] and [ItemId, latentFeatureId, Value]).
>> I found that in Mahout 0.6/0.7 there is the ALSUtils class, which can help 
>> me extract this info.
>> However, when I try to execute a simple MovieLens example by running the 
>> "ParallelALS..." job, I get the error below. I find it strange because the 
>> exact same job works fine on Mahout 0.5. Any thoughts? What differences 
>> between the versions (relevant to ALS) could cause this?
>>
>> In addition, if I end up using Mahout 0.5, can you advise me how I can 
>> extract the decomposed matrices' data so I can build my CSVs?
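Once the user and item feature matrices are available in memory, flattening them into the [id, latentFeatureId, value] CSV layout Oren describes is straightforward. A minimal sketch follows; note that in a real run the map would be populated by reading the job's output with Hadoop's SequenceFile API (the factorization job writes its feature matrices as sequence files, which is not shown here), and the values below are placeholders:

```java
import java.util.*;

public class FeatureMatrixToCsv {
    // Flatten id -> featureVector into "id,latentFeatureId,value" lines.
    // In a real run, the map would be filled by reading the factorization
    // job's sequence-file output via Hadoop's SequenceFile.Reader (omitted).
    static List<String> toCsvLines(Map<Integer, double[]> features) {
        List<String> lines = new ArrayList<>();
        // TreeMap gives a deterministic, id-sorted CSV.
        for (Map.Entry<Integer, double[]> e : new TreeMap<>(features).entrySet()) {
            double[] v = e.getValue();
            for (int f = 0; f < v.length; f++) {
                lines.add(e.getKey() + "," + f + "," + v[f]);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder user feature matrix: two latent features per user.
        Map<Integer, double[]> userFeatures = Map.of(
            1, new double[] {0.5, 1.5},
            2, new double[] {2.0, 0.25});
        toCsvLines(userFeatures).forEach(System.out::println);
    }
}
```

The same flattening applies unchanged to the item feature matrix, producing the second CSV.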
>>
>> Thanks,
>> Oren
>>
>> 13/02/26 04:31:27 INFO als.ParallelALSFactorizationJob: Recomputing U 
>> (iteration 0/1)
>> 13/02/26 04:31:34 INFO input.FileInputFormat: Total input paths to process : 
>> 1
>> 13/02/26 04:31:34 INFO mapred.JobClient: Running job: job_201302081356_0543
>> 13/02/26 04:31:35 INFO mapred.JobClient:  map 0% reduce 0%
>> 13/02/26 04:31:50 INFO mapred.JobClient:  map 1% reduce 0%
>> 13/02/26 04:31:53 INFO mapred.JobClient: Task Id : 
>> attempt_201302081356_0543_m_000000_0, Status : FAILED
>> java.lang.NullPointerException
>>       at 
>> org.apache.mahout.math.als.AlternatingLeastSquaresSolver.createMiIi(AlternatingLeastSquaresSolver.java:73)
>>       at 
>> org.apache.mahout.math.als.AlternatingLeastSquaresSolver.solve(AlternatingLeastSquaresSolver.java:45)
>>       at 
>> org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob$SolveExplicitFeedbackMapper.map(ParallelALSFacto
>>       at 
>> org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob$SolveExplicitFeedbackMapper.map(ParallelALSFacto
>>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>       at 
>> org.apache.hadoop.mapred.MapTask.runNewMapper_aroundBody4(MapTask.java:813)
>>       at org.apache.hadoop.mapred.MapTask$AjcClosure5.run(MapTask.java:1)
>>       at 
>> org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
>>       at 
>> com.intel.bigdata.management.agent.HadoopTaskAspect.doPhaseCall(HadoopTaskAspect.java:166)
>>       at 
>> com.intel.bigdata.management.agent.HadoopTaskAspect.ajc$inlineAccessMethod$com_intel_bigdata_management_agent_
>>
>> ---------------------------------------------------------------------
>> Intel Electronics Ltd.
>>
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.
>>
> 
