Hi Roberto,
I once ran some experiments on a dataset with 3196 transactions and 289154813
frequent itemsets. FPGrowth finished the computation within 10 minutes. I can give it
a try if you could share the artificial dataset.
From: roberto.pagli...@asos.com
To: m2linc...@outlook.com
CC: user@spark.apache.org
Hi Lin,
From 1e-5 and below it crashes for me. I also developed my own program in C++
(single machine, no Spark) and I was able to compute all itemsets, that is, with
support = 0.
The stack overflow definitely occurs when computing the frequent itemsets, before
association rule mining even starts. If you want, I
The attached image only shows thread states, and the WAITING threads need not be
the issue. We need to take thread stack traces and identify which area of the code
the threads are spending most of their time in.
Use jstack -l <pid> or kill -3 <pid>, where pid is the process id of the
executor process. Take the jstack stack trace f
You can use CrossValidator/TrainValidationSplit with ParamGridBuilder
and an Evaluator to empirically choose the model hyperparameters (e.g.,
numFeatures), per the following:
http://spark.apache.org/docs/latest/ml-guide.html#example-model-selection-via-cross-validation
http://spark.apache.org/docs/
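For reference, here is a minimal Scala sketch along the lines of the model-selection
example in the guide. The "training" DataFrame (with "text" and "label" columns) and
the grid values are placeholders, not anything from the original thread:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Simple text-classification pipeline whose hyperparameters we want to tune.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10)
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

// Candidate values for numFeatures and the regularization parameter.
val paramGrid = new ParamGridBuilder()
  .addGrid(hashingTF.numFeatures, Array(1000, 10000, 100000))
  .addGrid(lr.regParam, Array(0.01, 0.1))
  .build()

// CrossValidator fits every combination and keeps the one with the best metric.
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

val cvModel = cv.fit(training)  // "training" is assumed to exist with "text"/"label" columns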
Looks like you're not registering the input param correctly.
Below are examples from the Spark Java source that show how to build a
custom transformer. Note that a Model is a Transformer.
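The examples referred to above did not make it into this digest, so here is a rough
Scala sketch of the pattern instead; the transformer and column names are made up for
illustration:

import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.upper
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical transformer that upper-cases a string column; the point is how the
// input column is declared and registered as a Param.
class UpperCaseTransformer(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("upperCase"))

  // Declare the param on the class and expose a getter/setter so the framework
  // (copy, explainParams, ParamGridBuilder, ...) can discover and set it.
  final val inputCol: Param[String] =
    new Param[String](this, "inputCol", "input column name")
  def setInputCol(value: String): this.type = set(inputCol, value)
  def getInputCol: String = $(inputCol)

  override def transform(dataset: DataFrame): DataFrame =
    dataset.withColumn($(inputCol) + "_upper", upper(dataset($(inputCol))))

  override def transformSchema(schema: StructType): StructType =
    StructType(schema.fields :+ StructField($(inputCol) + "_upper", StringType))

  override def copy(extra: ParamMap): UpperCaseTransformer = defaultCopy(extra)
}

// Usage: new UpperCaseTransformer().setInputCol("name").transform(df)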
Also, that chimpler/wordpress/naive bayes example is a bit dated. I tried
to implement it a while ago, but
Thanks. Repartitioning works now.
Thread closed :)
--
There is a lot of interesting info about this API here:
https://issues.apache.org/jira/browse/SPARK-5388
I got that from a comment thread on the last link in your PR. Thanks for
bringing this up! I knew you could check status via REST per
http://spark.apache.org/docs/latest/monitoring.html#res
Happy new year.
I would like to solicit community feedback on using Spark’s hidden gateway REST API
(standalone cluster mode) for application submission.
We already use status checking and cancellation in our ansible scripts.
I also opened a ticket to make this API public
(https://issues.apache.org/ji
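For anyone interested, status checking and cancellation against that gateway are plain
HTTP calls to the /v1/submissions endpoints described in SPARK-5388. A rough Scala
sketch; the master host, the default REST port 6066, and the submission id below are
placeholders:

import scala.io.Source

val master = "http://spark-master.example.com:6066"    // placeholder host
val submissionId = "driver-20160101000000-0000"        // placeholder id from the create call

// Status check is a GET against the JSON status endpoint.
val statusJson = Source.fromURL(s"$master/v1/submissions/status/$submissionId").mkString
println(statusJson)

// Cancellation is a POST to the kill endpoint.
val killUrl = new java.net.URL(s"$master/v1/submissions/kill/$submissionId")
val conn = killUrl.openConnection().asInstanceOf[java.net.HttpURLConnection]
conn.setRequestMethod("POST")
println(s"kill returned HTTP ${conn.getResponseCode}")
conn.disconnect()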
Thanks Jerry, it works!
I really appreciate your help.
Thank you,
Konstantin Kudryavtsev
On Fri, Jan 1, 2016 at 4:35 PM, Jerry Lam wrote:
> Hi Kostiantyn,
>
> You should be able to use spark.conf to specify s3a keys.
>
> I don't remember exactly but you can add hadoop properties by prefixing
> spa
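In case it helps anyone else hitting this: the truncated hint above presumably refers to
the spark.hadoop.* prefix, which Spark copies into the Hadoop Configuration. A sketch
under that assumption, with placeholder keys and bucket (the hadoop-aws jar must be on
the classpath for s3a):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("s3a-example")
  // Any "spark.hadoop."-prefixed property ends up in the Hadoop Configuration.
  .set("spark.hadoop.fs.s3a.access.key", "<YOUR_ACCESS_KEY>")   // placeholder
  .set("spark.hadoop.fs.s3a.secret.key", "<YOUR_SECRET_KEY>")   // placeholder

val sc = new SparkContext(conf)
val lines = sc.textFile("s3a://my-bucket/some/path/*.json")     // hypothetical bucket/path
println(lines.count())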
Hi,
A few suggestions:
1. Try the "memory and disk" storage level (StorageLevel.MEMORY_AND_DISK) to rule out
a heap memory error.
2. Try copying the JSON source file to the local filesystem and reading it from there
(i.e. without HDFS) to verify a minimal working setup (see the sketch after these
suggestions).
3. The lzo-related error looks like a library issue.
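A quick Scala sketch of suggestions 1 and 2; the local path is illustrative and an
existing SparkContext named sc is assumed:

import org.apache.spark.sql.SQLContext
import org.apache.spark.storage.StorageLevel

val sqlContext = new SQLContext(sc)

// Suggestion 2: read the JSON file from the local filesystem instead of HDFS
// to verify a minimal working pipeline.
val df = sqlContext.read.json("file:///tmp/sample.json")

// Suggestion 1: persist with both memory and disk so partitions that do not fit
// in the heap spill to disk instead of failing.
df.persist(StorageLevel.MEMORY_AND_DISK)
println(df.count())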
On Saturday
Hi Roberto,
What minimum support threshold did you set? Could you check in which stage you ran
into the StackOverflow exception?
Thanks.
From: roberto.pagli...@asos.com
To: yblia...@gmail.com
CC: user@spark.apache.org
Subject: Re: frequent itemsets
Date: Sat, 2 Jan 2016 12:01:31 +
Hi Yanbo,
I am trying to understand how state in Spark Streaming works in general. If
I run this example program twice, will the second run see state from the
first run?
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala
It s
Hi Yanbo,
Unfortunately, I cannot share the data. I am using the code in the tutorial
https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html
Did you ever try running it when there are hundreds of millions of co-purchases of
at least two products?
I suspect AR does not handle that ve
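For context, the tutorial code boils down to roughly the following Scala; an existing
SparkContext sc and a space-separated transactions file are assumed, and the minSupport
value is illustrative:

import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

// One basket per line, items separated by spaces (path is illustrative).
val transactions: RDD[Array[String]] =
  sc.textFile("file:///tmp/transactions.txt").map(_.trim.split(' '))

val fpg = new FPGrowth()
  .setMinSupport(0.2)     // lowering this is what blows up the number of itemsets
  .setNumPartitions(10)

val model = fpg.run(transactions)

model.freqItemsets.collect().foreach { itemset =>
  println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
}

// Association rules are generated afterwards, from the frequent itemsets.
model.generateAssociationRules(0.8).collect().foreach { rule =>
  println(rule.antecedent.mkString("[", ",", "]") + " => " +
    rule.consequent.mkString("[", ",", "]") + ", " + rule.confidence)
}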
Hi Roberto,
Could you share your code snippet so that others can help diagnose your
problem?
2016-01-02 7:51 GMT+08:00 Roberto Pagliari :
> When using the frequent itemsets APIs, I’m running into stackOverflow
> exception whenever there are too many combinations to deal with and/or too
> many
Hi Tomasz,
The GMM is bound to its peer Java GMM object, so it needs a reference to the
SparkContext.
Some of the MLlib (not ML) models are simple objects, such as KMeansModel,
LinearRegressionModel, etc., but others hold a reference to the SparkContext. The
latter ones and their corresponding member functions should not be called in
OK. What should the table be? Suppose I have a bunch of parquet files, do I
just specify the directory as the table?
On Fri, Jan 1, 2016 at 11:32 PM, UMESH CHAUDHARY
wrote:
> Ok, so what's wrong with using:
>
> var df=HiveContext.sql("Select * from table where id = ")
> //filtered data frame
> df.
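If the parquet directory has not been saved as a Hive table already, one way is to load
it and register a temporary table first; a sketch with an illustrative path and table
name, assuming an existing SparkContext sc:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Load every parquet file under the directory into one DataFrame.
val parquetDF = hiveContext.read.parquet("hdfs:///data/events/")

// Register it under a name so it can be referenced from SQL.
parquetDF.registerTempTable("events")

val filtered = hiveContext.sql("SELECT * FROM events WHERE id = 42")
filtered.show()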