Re: MLib : Non Linear Optimization

2016-10-04 Thread nsareen
I'm not getting any support in this group; is the question not valid? I need someone to reply to this question. We have a huge dependency on SAS which we want to eliminate, and we want to know if Spark can help. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLib-

Re: MLib : Non Linear Optimization

2016-09-07 Thread nsareen
Any answer to this question, group? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLib-Non-Linear-Optimization-tp27645p27676.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

MLib : Non Linear Optimization

2016-09-01 Thread nsareen
I'm part of a Predictive Analytics marketing platform. We do a lot of optimizations (non-linear), currently using SAS / Lindo routines. I was going through Spark's MLlib documentation & found it supports Linear Optimization; was wondering if it also supports Non Linear Optimization & if not, are
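For what it's worth (this is my own suggestion, not something the thread confirms): MLlib itself exposes mostly linear and convex models, but Breeze, the numerical library MLlib builds on, ships general-purpose nonlinear solvers such as LBFGS. A minimal sketch minimizing the (non-linear) Rosenbrock function, assuming the breeze dependency is on the classpath:

```scala
import breeze.linalg.DenseVector
import breeze.optimize.{DiffFunction, LBFGS}

// f(x, y) = (1 - x)^2 + 100 (y - x^2)^2, minimum at (1, 1)
val f = new DiffFunction[DenseVector[Double]] {
  def calculate(v: DenseVector[Double]): (Double, DenseVector[Double]) = {
    val (x, y) = (v(0), v(1))
    val value = math.pow(1 - x, 2) + 100 * math.pow(y - x * x, 2)
    val grad = DenseVector(
      -2 * (1 - x) - 400 * x * (y - x * x), // df/dx
      200 * (y - x * x)                     // df/dy
    )
    (value, grad)
  }
}

val lbfgs = new LBFGS[DenseVector[Double]](maxIter = 100, m = 7)
val solution = lbfgs.minimize(f, DenseVector(-1.0, 1.0)) // converges near (1, 1)
```

LBFGS handles smooth unconstrained problems; constrained or non-smooth objectives would need a different solver, which is likely why the thread compares against SAS / Lindo.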

input size too large | Performance issues with Spark

2015-03-28 Thread nsareen
Hi All, I'm facing performance issues with my Spark implementation, and while briefly investigating the WebUI logs, I noticed that my RDD size is 55 GB, the Shuffle Write is 10 GB & the Input Size is 200 GB. The application is a web application which does predictive analytics, so we keep most of our data in memo
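One common lever for a memory-heavy workload like this (a sketch under assumptions; the thread doesn't show the actual code, and bigRdd is a hypothetical name) is serialized caching plus Kryo, which shrinks the in-memory footprint at some CPU cost:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf()
  .setAppName("predictive-analytics") // hypothetical app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)

val bigRdd = sc.textFile("/data/input") // hypothetical 200 GB input path

// MEMORY_ONLY_SER keeps each partition as one serialized byte array
// instead of deserialized Java objects, often a large space saving.
bigRdd.persist(StorageLevel.MEMORY_ONLY_SER)
```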

Re: Does filter on an RDD scan every data item ?

2014-12-15 Thread nsareen
Thanks! Shall try it out. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-filter-on-an-RDD-scan-every-data-item-tp20170p20683.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Does filter on an RDD scan every data item ?

2014-12-07 Thread nsareen
@Sowen, I would appreciate it if you could explain how Spark SQL would help in my scenario. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-filter-on-an-RDD-scan-every-data-item-tp20170p20571.html Sent from the Apache Spark User List mailing list archive at

Re: Does filter on an RDD scan every data item ?

2014-12-05 Thread nsareen
Any thoughts on how Spark SQL could help in our scenario? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-filter-on-an-RDD-scan-every-data-item-tp20170p20465.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Does filter on an RDD scan every data item ?

2014-12-04 Thread nsareen
I'm not sure sample is what I was looking for. As mentioned in another post above, this is what I'm looking for: 1) My RDD contains this structure: Tuple2. 2) Each CustomTuple is a combination of string IDs, e.g. CustomTuple.dimensionOne="AE232323", CustomTuple.dimensionTwo="BE232323", CustomTupl

Re: Does filter on an RDD scan every data item ?

2014-12-04 Thread nsareen
Thanks for the reply! To be honest, I was expecting Spark to have some sort of indexing for keys, which would help it locate the keys efficiently. I wasn't using Spark SQL here, but if it helps perform this efficiently, I can try it out. Can you please elaborate on how it will be helpful in this sc

Re: Calling spark from a java web application.

2014-12-02 Thread nsareen
We have a web application which talks to a Spark server. This is how we have done the integration: 1) In Tomcat's classpath, add the Spark distribution jar so the Spark code is available at runtime (for you it would be Jetty). 2) In the web application project, add the Spark distribution jar in
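The setup described above usually pairs with a single long-lived SparkContext shared across web requests, since a JVM can only hold one context. A minimal sketch (the master URL, jar path, and object name are hypothetical, not from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Created lazily once at web-app startup (e.g. from a ServletContextListener)
// and reused by every request handler; a SparkContext is not per-request.
object SparkHolder {
  lazy val sc: SparkContext = {
    val conf = new SparkConf()
      .setAppName("webapp-driver")
      .setMaster("spark://spark-master:7077")    // hypothetical cluster master
      .setJars(Seq("/path/to/app-assembly.jar")) // ship app classes to executors
    new SparkContext(conf)
  }
}
```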

Does filter on an RDD scan every data item ?

2014-12-02 Thread nsareen
Hi, I wanted some clarity on the functioning of the filter function of an RDD. 1) Does the filter function scan every element saved in the RDD? If my RDD represents 10 million rows, and I want to work on only 1000 of them, is there an efficient way of filtering the subset without having to scan every elem
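For the record (an assumption about the intended answer, not a quote from the thread): filter always evaluates its predicate on every element, but a pair RDD with a known partitioner can answer key lookups by scanning a single partition. A sketch, assuming an existing SparkContext sc:

```scala
import org.apache.spark.HashPartitioner

// Partition the pair RDD by key so Spark knows where each key lives.
val pairs = sc.parallelize(1 to 10000000).map(i => (i, s"row-$i"))
val indexed = pairs.partitionBy(new HashPartitioner(100)).cache()

// lookup() uses the partitioner to visit only the one matching partition.
val rows: Seq[String] = indexed.lookup(42)

// A plain filter, by contrast, runs the predicate over all 10 million elements.
val viaFilter = indexed.filter { case (k, _) => k == 42 }.values.collect()
```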

RDD Action require data from Another RDD

2014-11-20 Thread nsareen
Hi, We have a requirement where we have two data sets represented by RDDs, RDDA & RDDB. For performing an aggregation operation on RDDA, the action would need RDDB's subset of data; wanted to understand if there is a best practice for doing this? Don't even know how this will be possible as of
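Two standard patterns fit this shape (my own sketch under assumptions; rddA, rddB, and the filter predicate are hypothetical stand-ins for the thread's data sets):

```scala
// Pattern 1: if RDDB's subset is small, collect it to the driver and
// broadcast it, so RDDA's tasks can read it without a shuffle.
val subsetB = rddB.filter { case (k, _) => k.startsWith("AE") }.collectAsMap()
val bcastB = sc.broadcast(subsetB)
val weighted = rddA.map { case (k, v) =>
  (k, v * bcastB.value.getOrElse(k, 1.0)) // consult RDDB's data per element
}.reduceByKey(_ + _)

// Pattern 2: if both sides are large, key both RDDs the same way and join;
// Spark shuffles matching keys to the same partition.
val joined = rddA.join(rddB)
```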

Saving RDD into DB & then Reading back from DB

2014-11-12 Thread nsareen
Hi All, I know that Spark has integration with the Cassandra DB. Can an RDD be persisted into the DB and read back into the same state on server boot? If yes, are there any examples which would demonstrate how it's done? We have a requirement where we are currently saving a snapshot of many rows in
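With the DataStax spark-cassandra-connector this round trip is direct; a sketch assuming the connector is on the classpath and the keyspace, table, and column names (my_keyspace, snapshots, id, value) are hypothetical:

```scala
import com.datastax.spark.connector._

// Write the RDD out; the table must already exist with matching columns.
val snapshot = sc.parallelize(Seq(("row-1", 1.5), ("row-2", 2.5)))
snapshot.saveToCassandra("my_keyspace", "snapshots", SomeColumns("id", "value"))

// On server boot, rebuild an equivalent RDD from the same table.
val restored = sc.cassandraTable[(String, Double)]("my_keyspace", "snapshots")
```

Note the restored RDD has the same contents but not the same partitioning or lineage as the original; it is a fresh RDD backed by the table.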

Re: Efficient Key Structure in pairRDD

2014-11-11 Thread nsareen
Spark Dev / Users, help in this regard would be appreciated; we are kind of stuck at this point. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Efficient-Key-Structure-in-pairRDD-tp18461p18557.html Sent from the Apache Spark User List mailing list archive

Efficient Key Structure in pairRDD

2014-11-09 Thread nsareen
Hi, We are trying to adopt Spark for our application. We have an analytical application which stores data in Star Schemas (SQL Server). All the cubes are loaded into a Key/Value structure and saved in Trove (an in-memory collection). Here the key is a short array where each short number represents
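One pitfall with short-array keys (my own observation, not from the thread): JVM arrays compare by reference, so two arrays with identical contents hash to different buckets, which silently breaks reduceByKey, join, and lookup. Wrapping the dimensions in a value type fixes this; DimKey, rowsRdd, and the row fields below are hypothetical names:

```scala
// Vector[Short] has structural equality; the case class derives
// content-based equals/hashCode, so it is safe as an RDD key.
case class DimKey(dims: Vector[Short])

case class Row(dims: Array[Short], measure: Double) // hypothetical cube row

val rowsRdd = sc.parallelize(Seq(
  Row(Array[Short](1, 2), 10.0),
  Row(Array[Short](1, 2), 5.0)
))
val totals = rowsRdd
  .map(row => (DimKey(row.dims.toVector), row.measure))
  .reduceByKey(_ + _) // the two rows now aggregate under one key
```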

Re: How to trace/debug serialization?

2014-11-06 Thread nsareen
Will this work even with Kryo Serialization? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-trace-debug-serialization-tp18230p18319.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Task size variation while using Range Vs List

2014-11-06 Thread nsareen
Thanks for the response!! Will try to see the behaviour with cache(). -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Task-size-variation-while-using-Range-Vs-List-tp18243p18318.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to trace/debug serialization?

2014-11-05 Thread nsareen
From what I've observed, there are no debug logs while serialization takes place. You can see the source code if you want; the TaskSetManager class has some functions for serialization. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-trace-debug-serializ
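One technique worth adding here (my own suggestion, not from the thread): when the default Java serializer throws a NotSerializableException, the JVM's extended debug flag prints the object-graph path that dragged the offending object into the closure. A sketch of wiring it in through SparkConf:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions",
       "-Dsun.io.serialization.extendedDebugInfo=true")
  .set("spark.executor.extraJavaOptions",
       "-Dsun.io.serialization.extendedDebugInfo=true")
```

This only instruments Java serialization; it does not trace Kryo, which answers the follow-up question in this thread.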

Task size variation while using Range Vs List

2014-11-05 Thread nsareen
I noticed a behaviour where, if I'm using val temp = sc.parallelize(1 to 10); temp.collect, the task size will be small, let's say 1120 bytes. But if I change this to a for loop: import scala.collection.mutable.ArrayBuffer; val data = new ArrayBuffer[Integer](); for(i <-
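A plausible explanation (an assumption based on how Spark slices parallelized collections, not something stated in the thread): a Range is described by just its start, end, and step, so its slices ship as a few integers, while an ArrayBuffer's slices carry every element into the task:

```scala
import scala.collection.mutable.ArrayBuffer

// Each partition of a Range is itself a Range: tasks stay tiny
// regardless of how many numbers the range covers.
val fromRange = sc.parallelize(1 to 1000000)

// Each partition of an ArrayBuffer materializes its elements,
// so task size grows linearly with the data.
val buf = ArrayBuffer.range(1, 1000001)
val fromBuffer = sc.parallelize(buf)
```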

Task Size Increases when using loops

2014-10-29 Thread nsareen
Hi, I'm new to Spark, and am facing a peculiar problem. I'm writing a simple Java driver program where I'm creating a Key/Value data structure and collecting the pairs once created. The problem I'm facing is that when I increase the iterations of a for loop which creates the ArrayList of Long values wh

Re: Spark Concepts

2014-10-15 Thread nsareen
Anybody with good hands-on experience with Spark, please do reply. It would help us a lot!! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Concepts-tp16477p16536.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark in cluster and errors

2014-10-15 Thread nsareen
Did you manage to solve this issue? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-in-cluster-and-errors-tp16249p16479.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Spark Concepts

2014-10-15 Thread nsareen
Hi, I'm pretty new to both Big Data & Spark. I've just started POC work on Spark, and my team and I are evaluating it with other in-memory computing tools such as GridGain, BigMemory, Aerospike & some others too, specifically to solve two sets of problems. 1) Data Storage: Our current application ru