Re: PySpark with OpenCV causes python worker to crash

2015-05-30 Thread Sam Stoelinga
Thanks for the advice! The following line causes Spark to crash: kp, descriptors = sift.detectAndCompute(gray, None). But I do need this line to be executed, and the code does not crash when running outside of Spark while passing the same parameters. You're saying maybe the bytes from the sequencefil

Re: Batch aggregation by sliding window + join

2015-05-30 Thread Igor Berman
Yes, I see now. In the case of 3 days it's indeed possible; however, if I want to hold a 30-day (or even bigger) block aggregation it will be a bit slow. For the sake of the history: I've found several directions in which I can improve shuffling (from the video https://www.youtube.com/watch?v=Wg2boMqLjCg), e.g.
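One standard way to keep a large window cheap is to hold per-day partial aggregates and reuse the previous window's total, so each new day costs one add and one subtract instead of re-aggregating all 30 days. A minimal pure-Python sketch of the idea (in a Spark job the per-day blocks would live in a persisted dataset; this is illustrative, not the poster's code):

```python
def rolling_sums(daily_totals, window):
    """Rolling window sums that reuse the previous window's total:
    each step adds the day entering the window and subtracts the day
    leaving it, instead of re-summing all `window` days."""
    total = sum(daily_totals[:window])
    out = [total]
    for i in range(window, len(daily_totals)):
        total += daily_totals[i] - daily_totals[i - window]
        out.append(total)
    return out
```

For a 30-day window this turns each daily update into O(1) combine work per key, at the cost of keeping the per-day blocks around.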

Re: PySpark with OpenCV causes python worker to crash

2015-05-30 Thread Sam Stoelinga
I've verified that the issue lies within Spark running the OpenCV code and not within the sequence file's BytesWritable formatting. This is code that reproduces the failure without using the sequence file as input at all, running the same function with the same input on Spark but f
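The isolation pattern described here is worth spelling out: run the identical function on the identical bytes both locally and through Spark, bypassing the sequence file entirely. A sketch with hypothetical stand-in names (the Spark call is shown as a comment since it needs a live SparkContext):

```python
# Stand-in for the OpenCV/SIFT step; any deterministic function works
# for demonstrating the isolation pattern.
def process(raw):
    return len(raw)


data = [b"img1-bytes", b"img2-bytes"]
local = [process(x) for x in data]  # runs outside Spark

# Inside a Spark job, the same input would go through the workers instead:
#   spark_result = sc.parallelize(data).map(process).collect()
# If the Spark run crashes while `local` succeeds on the same bytes, the
# fault is in the worker environment (e.g., native OpenCV libraries), not
# in the sequence-file / BytesWritable decoding.
print(local)
```

This cleanly separates "the data is malformed" from "the worker environment is broken", which is exactly the distinction the thread is after.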

Re: [Streaming] Configure executor logging on Mesos

2015-05-30 Thread andy petrella
Hello, I'm currently exploring DCOS for the Spark Notebook, and while looking at the Spark configuration I found something interesting which actually converges with what we've discovered: https://github.com/mesosphere/universe/blob/master/repo/packages/S/spark/0/marathon.json So the logging is

Re: [Streaming] Configure executor logging on Mesos

2015-05-30 Thread Tim Chen
So it sounds like some generic downloadable-URIs support could solve this problem: Mesos automatically places the files in your sandbox, and you can refer to them there. If so, please file a JIRA; this is a pretty simple fix on the Spark side. Tim On Sat, May 30, 2015 at 7:34 AM, andy petrella wrote: > Hello,
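For reference, later Spark releases document a setting along exactly these lines: spark.mesos.uris, a comma-separated list of URIs that Mesos fetches into each executor's sandbox. Whether it exists in the version under discussion is precisely what the proposed JIRA would settle, so treat this fragment as illustrative of the shape of the fix, not as a confirmed option for Spark 1.3/1.4:

```properties
# Illustrative: ship a logging config into every executor sandbox via the
# Mesos fetcher, then point log4j at the fetched file.
spark.mesos.uris                 http://example.com/conf/log4j.properties
spark.executor.extraJavaOptions  -Dlog4j.configuration=log4j.properties
```

The URL and file name above are placeholders; the point is that anything listed in the URIs setting lands in the sandbox working directory, where relative paths in JVM options can reach it.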

Re: spark-sql errors

2015-05-30 Thread Sanjay Subramanian
Any ideas, guys? How can I solve this? From: Sanjay Subramanian To: user Sent: Friday, May 29, 2015 5:29 PM Subject: spark-sql errors https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/6SqGuYemnbc

Why is my performance on local really slow?

2015-05-30 Thread Tal
Hi, following my previous post, I have been trying to find the best way to intersect an RDD of Longs (ids) with an RDD of (id, value) pairs, such that I end up with just the values of the ids from th
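Two common shapes for this intersection in Spark are a key join and, when the id set is small enough to fit in memory, a broadcast filter (which avoids shuffling the large side). The semantics, shown on plain Python collections with the PySpark equivalents as comments (the RDD names are hypothetical):

```python
# Intersect an id set with (id, value) pairs, keeping only matching values.
ids = {1, 3, 5}
pairs = [(1, "a"), (2, "b"), (3, "c"), (5, "d")]

values = [v for k, v in pairs if k in ids]

# PySpark equivalents, assuming ids_rdd and pairs_rdd:
#   via join (shuffles both sides):
#     ids_rdd.map(lambda i: (i, None)).join(pairs_rdd).map(lambda kv: kv[1][1])
#   via broadcast filter (no shuffle on pairs_rdd, if the ids fit in memory):
#     b = sc.broadcast(set(ids_rdd.collect()))
#     pairs_rdd.filter(lambda kv: kv[0] in b.value).map(lambda kv: kv[1])
print(values)
```

On a local run, the join path in particular is a frequent cause of slowness, since it shuffles both RDDs even when one side is tiny; the broadcast variant sidesteps that.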

import CSV file using read.csv

2015-05-30 Thread sherine ahmed
Hi all, I need to read a CSV file using the read.csv function in igraph from Python, but I don't know where to put the data to be read by igraph, and I don't know the exact syntax for importing a CSV file into igraph, so I would appreciate any help. -- View this message in context: http://apache-spark-u
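One common pattern, assuming python-igraph and an edge-list CSV with one source,target pair per row, is to read tuples with the stdlib csv module and hand them to igraph. The CSV part is shown runnable below (with an inline string standing in for the file); the igraph call is left as a comment since it needs python-igraph installed:

```python
import csv
import io

# An edge-list CSV: one "source,target" row per edge. Inline here for the
# demo; normally you would use open("edges.csv", newline="") instead.
csv_text = "a,b\nb,c\na,c\n"
edges = [tuple(row[:2]) for row in csv.reader(io.StringIO(csv_text)) if row]

# With python-igraph installed, these tuples build a graph directly:
#   from igraph import Graph
#   g = Graph.TupleList(edges, directed=False)
print(edges)
```

The file can live anywhere your Python process can read; igraph has no special data directory, so an absolute path or a path relative to the working directory in open() is all that "where to put the data" amounts to.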

Re: Spark 1.3.0 -> 1.3.1 produces java.lang.NoSuchFieldError: NO_FILTER

2015-05-30 Thread ogoh
I had the same issue on AWS EMR with Spark 1.3.1.e (the AWS version) installed with the '-h' parameter (a bootstrap-action parameter for Spark). I don't see the problem with Spark 1.3.1.e when not passing that parameter. I am not sure about your env. Thanks, -- View this message in context: http://apache-s

Re: Spark 1.3.0 -> 1.3.1 produces java.lang.NoSuchFieldError: NO_FILTER

2015-05-30 Thread Yin Huai
Looks like your program somehow picked up an older version of Parquet (Spark 1.3.1 uses Parquet 1.6.0rc3, and it seems the NO_FILTER field was introduced in 1.6.0rc2). Is it possible for you to check the Parquet lib version in your classpath? Thanks, Yin On Sat, May 30, 2015 at 2:44 PM, ogoh wrote: >
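A quick way to act on this suggestion is to scan the classpath for Parquet jars and read the version out of the file names. A minimal sketch; the helper name is made up, and on a real cluster you would feed it the driver's actual classpath (e.g., the value of $CLASSPATH or a jars-directory listing):

```python
import os


def parquet_jars(classpath):
    """Return classpath entries that look like Parquet jars, so their
    versions can be compared against the one Spark 1.3.1 expects
    (1.6.0rc3, which has the NO_FILTER field)."""
    return [entry for entry in classpath.split(os.pathsep)
            if "parquet" in os.path.basename(entry).lower()]


example = os.pathsep.join([
    "/opt/spark/lib/spark-assembly.jar",
    "/opt/extra/parquet-column-1.5.0.jar",  # an older Parquet like this would lack NO_FILTER
])
print(parquet_jars(example))
```

If an older parquet-* jar shows up ahead of Spark's own, that shadowing is consistent with the NoSuchFieldError in the subject line.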

Re: MLlib: how to get the best model with only the most significant explanatory variables in LogisticRegressionWithLBFGS or LogisticRegressionWithSGD ?

2015-05-30 Thread Joseph Bradley
Spark 1.4 should be available next month, but I'm not sure about the exact date. Your interpretation of high lambda is reasonable. "High" lambda is really data-dependent. "lambda" is the same as the "regParam" in Spark, available in all recent Spark versions. On Fri, May 29, 2015 at 5:35 AM, méla
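To make "high lambda" concrete: the L2 penalty contributes a lambda * w term to the gradient, so a larger regParam shrinks every weight harder at each step, pushing less informative features toward zero. A tiny sketch of just that component; the helper below is illustrative, while the actual MLlib entry point is along the lines of LogisticRegressionWithLBFGS.train(points, regParam=0.1) in PySpark:

```python
def l2_step(w, lam, lr=0.1):
    """The L2-regularization component of one gradient step on a weight:
    w <- w - lr * lam * w. A larger lam means stronger shrinkage."""
    return w - lr * lam * w


print(l2_step(1.0, 0.0))  # lambda = 0: weight untouched -> 1.0
print(l2_step(1.0, 1.0))  # lambda = 1: weight shrinks   -> 0.9
```

This is also why the right lambda is data-dependent, as noted above: the useful amount of shrinkage depends on how strong the signal in the features is relative to the noise.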

Re: How to get the best performance with LogisticRegressionWithSGD?

2015-05-30 Thread Joseph Bradley
This is really getting into an understanding of how optimization and GLMs work. I'd recommend reading some intro ML or stats literature on how Generalized Linear Models are estimated, as well as how convex optimization is used in ML. There are some free online texts as well as MOOCs which have go

Re: MLlib: how to get the best model with only the most significant explanatory variables in LogisticRegressionWithLBFGS or LogisticRegressionWithSGD ?

2015-05-30 Thread ayan guha
I hope they will come up with 1.4 before Spark Summit in mid-June. On 31 May 2015 10:07, "Joseph Bradley" wrote: > Spark 1.4 should be available next month, but I'm not sure about the exact > date. > Your interpretation of high lambda is reasonable. "High" lambda is really > data-dependent. > "lam

Re: Question regarding spark data partition and coalesce. Need info on my use case.

2015-05-30 Thread firemonk9
I had a similar requirement, and I came up with a small algorithm to determine the number of partitions based on cluster size and input data. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Question-regarding-spark-data-partition-and-coalesce-Need-info-on-my-us
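The poster doesn't share the algorithm itself, so here is one plausible heuristic of that shape: enough partitions to keep every core busy a few times over, but never so few that any partition exceeds a target size. The names and constants below are assumptions, not the poster's code:

```python
def num_partitions(input_bytes, total_cores,
                   tasks_per_core=2, target_bytes=128 * 1024 * 1024):
    """Pick a partition count from cluster size and input size:
    at least `tasks_per_core` tasks per core (so the cluster stays busy),
    and at least enough partitions that none exceeds ~target_bytes."""
    by_cores = tasks_per_core * total_cores
    by_size = -(-input_bytes // target_bytes)  # ceiling division
    return max(by_cores, by_size)


print(num_partitions(1 << 30, 8))   # 1 GiB on 8 cores  -> core count dominates
print(num_partitions(10 << 30, 4))  # 10 GiB on 4 cores -> input size dominates
```

The 128 MiB target mirrors a typical HDFS block size; either term can dominate depending on whether the job is bound by parallelism or by per-partition memory.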

Re: MLlib: how to get the best model with only the most significant explanatory variables in LogisticRegressionWithLBFGS or LogisticRegressionWithSGD ?

2015-05-30 Thread DB Tsai
Alternatively, I will give a talk about LOR and LIR with elastic-net implementation, and the interpretation of those models, at Spark Summit. https://spark-summit.org/2015/events/large-scale-lasso-and-elastic-net-regularized-generalized-linear-models/ You may attend or watch online. Sincerely, DB Ts