Re: Linear regression + Janino Exception

2016-11-20 Thread janardhan shetty
Seems like this is associated with: https://issues.apache.org/jira/browse/SPARK-16845 On Sun, Nov 20, 2016 at 6:09 PM, janardhan shetty wrote: > Hi, > > I am trying to execute the Linear regression algorithm on Spark 2.0.2 and > hitting the below error when I am fitting my training

Linear regression + Janino Exception

2016-11-20 Thread janardhan shetty
Hi, I am trying to execute the Linear regression algorithm on Spark 2.0.2 and hitting the below error when I am fitting my training set: val lrModel = lr.fit(train) It happened on 2.0.0 as well. Any resolution steps are appreciated. *Error Snippet: * 16/11/20 18:03:45 *ERROR CodeGenerator: failed t

Re: Usage of mllib api in ml

2016-11-20 Thread janardhan shetty
use BinaryClassificationEvaluator, and it should be very >> straightforward to switch to MulticlassClassificationEvaluator. >> >> Thanks >> Yanbo >> >> On Sat, Nov 19, 2016 at 9:03 AM, janardhan shetty > > wrote: >> >>> Hi, >>> >>&

Usage of mllib api in ml

2016-11-19 Thread janardhan shetty
Hi, I am trying to use the evaluation metrics offered by mllib's MulticlassMetrics in the ml DataFrame setting. Are there any examples of how to use it?
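The thread above asks how to reuse mllib's MulticlassMetrics against ml DataFrame output. As a rough illustration, the quantities that class computes reduce to simple aggregations over (prediction, label) pairs; the helper names below are hypothetical, and in a real job the pairs would come from the prediction and label columns of the fitted model's output.

```scala
// Plain-Scala sketch of the metrics mllib's MulticlassMetrics derives
// from (prediction, label) pairs; helper names are hypothetical.
def accuracy(pairs: Seq[(Double, Double)]): Double =
  pairs.count { case (p, l) => p == l }.toDouble / pairs.size

def precisionFor(label: Double, pairs: Seq[(Double, Double)]): Double = {
  // precision = correct predictions of `label` / all predictions of `label`
  val predicted = pairs.filter(_._1 == label)
  if (predicted.isEmpty) 0.0
  else predicted.count { case (p, l) => p == l }.toDouble / predicted.size
}
```

In Spark itself the usual 2.0-era pattern is to extract the pairs and hand them to the RDD-based class, roughly `new MulticlassMetrics(predictions.select("prediction", "label").rdd.map(r => (r.getDouble(0), r.getDouble(1))))`.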

Re: Log-loss for multiclass classification

2016-11-16 Thread janardhan shetty
I am sure some work might be in the pipeline, as it is a standard evaluation criterion. Any thoughts or links? On Nov 15, 2016 11:15 AM, "janardhan shetty" wrote: > Hi, > > Best practice for a multi-class classification technique is to evaluate the > model by *log-loss*. >

Log-loss for multiclass classification

2016-11-15 Thread janardhan shetty
Hi, Best practice for a multi-class classification technique is to evaluate the model by *log-loss*. Is there any JIRA or work going on to implement the same in *MulticlassClassificationEvaluator*? Currently it supports the following: (supports "f1" (default), "weightedPrecision", "weightedRecall", "a
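For reference, the log-loss metric the thread asks about is just the mean negative log probability assigned to the true class. A minimal plain-Scala sketch of that formula (not the evaluator API, whose supported metrics are listed above; clipping with `eps` avoids log(0)):

```scala
// Mean negative log-probability of the true class, clipped to avoid log(0).
// `probs(i)` is the predicted class-probability vector for example i,
// `labels(i)` the index of its true class.
def logLoss(probs: Seq[Array[Double]], labels: Seq[Int], eps: Double = 1e-15): Double = {
  val terms = probs.zip(labels).map { case (p, y) =>
    -math.log(math.min(math.max(p(y), eps), 1 - eps))
  }
  terms.sum / terms.size
}
```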

Re: Convert SparseVector column to DenseVector column

2016-11-14 Thread janardhan shetty
> (0.2, Vectors.sparse(16, Array(0, 3), Array(0.1, 0.3.toDF("a", "b") > df.select(toSV($"b")) > > // maropu > > > On Mon, Nov 14, 2016 at 1:20 PM, janardhan shetty > wrote: > >> Hi, >> >> Is there an

Convert SparseVector column to DenseVector column

2016-11-13 Thread janardhan shetty
Hi, Is there any easy way of converting a DataFrame column from SparseVector to DenseVector using the import org.apache.spark.ml.linalg.DenseVector API? Spark ML 2.0
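A sketch of the conversion being asked for, assuming the usual (size, indices, values) sparse layout. In Spark ML 2.0 the same effect is typically a one-line UDF over `org.apache.spark.ml.linalg.Vector`, e.g. `udf((v: Vector) => v.toDense)`; the plain-Scala version below just shows the expansion:

```scala
// Expand a (size, indices, values) sparse representation into a dense array;
// this mirrors what Vector.toDense does in Spark ML.
def toDense(size: Int, indices: Array[Int], values: Array[Double]): Array[Double] = {
  val dense = Array.fill(size)(0.0)
  indices.zip(values).foreach { case (i, v) => dense(i) = v }
  dense
}
```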

Re: Spark ML : One hot Encoding for multiple columns

2016-11-13 Thread janardhan shetty
the columns. > > > > > On Wed, Aug 17, 2016 at 10:59 AM, janardhan shetty > wrote: > >> I had already tried this way : >> >> scala> val featureCols = Array("category","newone") >> featureCols: Array[String] = Array(category, newone)

Re: Deep learning libraries for scala

2016-10-19 Thread janardhan shetty
lgorithms in this instance unless you want to start > developing algorithms from grounds up ( and in which case you might not > require any libraries at all). > > On Sat, Oct 1, 2016 at 3:30 AM, janardhan shetty > wrote: > >> Hi, >> >> Are there any good libraries which can be used for scala deep learning >> models ? >> How can we integrate tensorflow with scala ML ? >> > > >

Re: Deep learning libraries for scala

2016-10-05 Thread janardhan shetty
Any help from the experts regarding this is appreciated On Oct 3, 2016 1:45 PM, "janardhan shetty" wrote: > Thanks Ben. The current spark ML package has feed forward multilayer > perceptron algorithm as well and just wondering how different is your > implementation ? > ht

Re: Deep learning libraries for scala

2016-10-03 Thread janardhan shetty
se, let me know if you have any > comment or questions. > > > Hope this helps. > > Cheers, > Ben > > On Oct 3, 2016, at 12:05 PM, janardhan shetty > wrote: > > Any leads in this regard ? > > On Sat, Oct 1, 2016 at 1:48 PM, janardhan shetty > wrote: > >>

Re: Deep learning libraries for scala

2016-10-03 Thread janardhan shetty
Any leads in this regard? On Sat, Oct 1, 2016 at 1:48 PM, janardhan shetty wrote: > Apparently there are no neural network implementations in tensorframes > which we can use, right? Or am I missing something here. > > I would like to apply neural networks for an NLP setting is t

Re: Deep learning libraries for scala

2016-10-01 Thread janardhan shetty
< suresh.thalam...@gmail.com> wrote: > Tensor frames > > https://spark-packages.org/package/databricks/tensorframes > > Hope that helps > -suresh > > On Sep 30, 2016, at 8:00 PM, janardhan shetty > wrote: > > Looking for scala dataframes in particular ? >

Re: Deep learning libraries for scala

2016-09-30 Thread janardhan shetty
Looking for scala dataframes in particular ? On Fri, Sep 30, 2016 at 7:46 PM, Gavin Yue wrote: > Skymind you could try. It is java > > I never test though. > > > On Sep 30, 2016, at 7:30 PM, janardhan shetty > wrote: > > > > Hi, > > > > Are there a

Re: Spark ML Decision Trees Algorithm

2016-09-30 Thread janardhan shetty
ructing and pruning them for over 30 > years. I think it's rather a question for a historian at this point. > > On Fri, Sep 30, 2016 at 5:08 PM, janardhan shetty > wrote: > >> Read this explanation but wondering if this algorithm has the base from a >> research p

Deep learning libraries for scala

2016-09-30 Thread janardhan shetty
Hi, Are there any good libraries which can be used for scala deep learning models ? How can we integrate tensorflow with scala ML ?

Re: Spark ML Decision Trees Algorithm

2016-09-30 Thread janardhan shetty
e.html > > Thanks, > Kevin > > On Fri, Sep 30, 2016 at 1:14 AM, janardhan shetty > wrote: > >> Hi, >> >> Any help here is appreciated .. >> >> On Wed, Sep 28, 2016 at 11:34 AM, janardhan shetty < >> janardhan...@gmail.com> wrote: >> &

Re: Spark ML Decision Trees Algorithm

2016-09-29 Thread janardhan shetty
Hi, Any help here is appreciated .. On Wed, Sep 28, 2016 at 11:34 AM, janardhan shetty wrote: > Is there a reference to the research paper which is implemented in spark > 2.0 ? > > On Wed, Sep 28, 2016 at 9:52 AM, janardhan shetty > wrote: > >> Which algorithm is use

Re: Spark ML Decision Trees Algorithm

2016-09-28 Thread janardhan shetty
Is there a reference to the research paper which is implemented in Spark 2.0? On Wed, Sep 28, 2016 at 9:52 AM, janardhan shetty wrote: > Which algorithm is used under the covers while doing decision trees for > Spark? > for example: scikit-learn (python) uses an optimised version of

Spark ML Decision Trees Algorithm

2016-09-28 Thread janardhan shetty
Which algorithm is used under the covers while doing decision trees for Spark? For example: scikit-learn (Python) uses an optimised version of the CART algorithm.
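Spark ML's trees are CART-style: each node greedily minimizes an impurity measure (Gini by default for classification), with continuous features binned so the search distributes. A plain-Scala sketch of that criterion on a single feature (names hypothetical, single-machine only):

```scala
// Gini impurity of a label multiset.
def gini(labels: Seq[Int]): Double = {
  val n = labels.size.toDouble
  1.0 - labels.groupBy(identity).values.map(g => math.pow(g.size / n, 2)).sum
}

// Greedy CART-style search: the threshold on one feature that minimizes
// the size-weighted impurity of the two child nodes.
def bestThreshold(feature: Seq[Double], labels: Seq[Int]): Double = {
  val candidates = feature.distinct.sorted
  candidates.minBy { t =>
    val (left, right) = feature.zip(labels).partition(_._1 <= t)
    val n = labels.size.toDouble
    (left.size / n) * gini(left.map(_._2)) + (right.size / n) * gini(right.map(_._2))
  }
}
```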

Re: SPARK-10835 in 2.0

2016-09-20 Thread janardhan shetty
Hi Sean, Any suggestions for workaround as of now? On Sep 20, 2016 7:46 AM, "janardhan shetty" wrote: > Thanks Sean. > On Sep 20, 2016 7:45 AM, "Sean Owen" wrote: > >> Ah, I think that this was supposed to be changed with SPARK-9062. Let >> me se

Re: SPARK-10835 in 2.0

2016-09-20 Thread janardhan shetty
Thanks Sean. On Sep 20, 2016 7:45 AM, "Sean Owen" wrote: > Ah, I think that this was supposed to be changed with SPARK-9062. Let > me see about reopening 10835 and addressing it. > > On Tue, Sep 20, 2016 at 3:24 PM, janardhan shetty > wrote: > > Is this a bug? &

Re: SPARK-10835 in 2.0

2016-09-20 Thread janardhan shetty
Is this a bug? On Sep 19, 2016 10:10 PM, "janardhan shetty" wrote: > Hi, > > I am hitting this issue. https://issues.apache.org/jira/browse/SPARK-10835 > . > > Issue seems to be resolved but resurfacing in 2.0 ML. Any workaround is > appreciated ? > > Note:

SPARK-10835 in 2.0

2016-09-19 Thread janardhan shetty
Hi, I am hitting this issue: https://issues.apache.org/jira/browse/SPARK-10835. The issue seems to be resolved but is resurfacing in 2.0 ML. Any workaround is appreciated. Note: the Pipeline has NGram before Word2Vec. Error: val word2Vec = new Word2Vec().setInputCol("wordsGrams").setOutputCol("features")

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-19 Thread janardhan shetty
om.google.protobuf" % "protobuf-java" % "2.6.1", > "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models", > "org.scalatest" %% "scalatest" % "2.2.6" % &qu

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
un, Sep 18, 2016 at 2:21 PM, Sujit Pal wrote: > Hi Janardhan, > > Maybe try removing the string "test" from this line in your build.sbt? > IIRC, this restricts the models JAR to be called from a test. > > "edu.stanford.nlp" % "stanford-corenlp" % &quo

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
glish-left3words-distsim.tagger" as class path, filename or URL at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:485) at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:765) On Sun, Sep 18, 2016 at 12:27 PM, janardhan she

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
Using: spark-shell --packages databricks:spark-corenlp:0.2.0-s_2.11 On Sun, Sep 18, 2016 at 12:26 PM, janardhan shetty wrote: > Hi Jacek, > > Thanks for your response. This is the code I am trying to execute > > import org.apache.spark.sql.funct

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
astering Apache Spark 2.0 http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Sun, Sep 18, 2016 at 8:01 PM, janardhan shetty > wrote: > > Hi, > > > > I am trying to use lemmatization as a transformer and added belwo to the

Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
Hi, I am trying to use lemmatization as a transformer and added the below to the build.sbt "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0", "com.google.protobuf" % "protobuf-java" % "2.6.1", "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier "models", "org.scalatest"
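The follow-up in this thread points at the `% "test"` scope on the models artifact as the likely culprit: scoped to test, the CoreNLP models jar never reaches the runtime classpath. A hedged corrected fragment (versions as in the original message):

```scala
// build.sbt sketch -- drop the `% "test"` scope so the CoreNLP models jar
// is on the runtime classpath (per the suggestion later in this thread).
libraryDependencies ++= Seq(
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
  "com.google.protobuf" % "protobuf-java" % "2.6.1"
)
```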

Re: LDA spark ML visualization

2016-09-13 Thread janardhan shetty
Any help is appreciated to proceed with this problem. On Sep 12, 2016 11:45 AM, "janardhan shetty" wrote: > Hi, > > I am trying to visualize the LDA model developed in spark scala (2.0 ML) > in LDAvis. > > Are there any links to convert the spark model parameters to

LDA spark ML visualization

2016-09-12 Thread janardhan shetty
Hi, I am trying to visualize the LDA model developed in Spark Scala (2.0 ML) in LDAvis. Are there any links to convert the Spark model parameters to the following 5 params to visualize? 1. φ, the K × W matrix containing the estimated probability mass function over the W terms in the vocabulary f

Re: Spark transformations

2016-09-12 Thread janardhan shetty
column. So far no great solution. > > Sorry I don't have any answers, but wanted to chime in that I am also a > bit stuck on similar issues. Hope we can find a workable solution soon. > Cheers, > Thunder > > > > On Tue, Sep 6, 2016 at 1:32 PM janardhan shetty > wro

Re: Using spark package XGBoost

2016-09-08 Thread janardhan shetty
Tried to implement spark package in 2.0 https://spark-packages.org/package/rotationsymmetry/sparkxgboost but it is throwing the error: error: not found: type SparkXGBoostClassifier On Tue, Sep 6, 2016 at 11:26 AM, janardhan shetty wrote: > Is this merged to Spark ML ? If so which vers

Difference between UDF and Transformer in Spark ML

2016-09-06 Thread janardhan shetty
Apart from creating a new column, what are the other differences between a Transformer and a UDF in Spark ML?

Re: Spark ML 2.1.0 new features

2016-09-06 Thread janardhan shetty
; > https://medium.com/@jaceklaskowski/ > Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Tue, Sep 6, 2016 at 10:27 PM, janardhan shetty > wrote: > > Any links ? > > > > On Mon, Sep 5,

Re: Spark transformations

2016-09-06 Thread janardhan shetty
orward checking* how can we get this information ? We have visibility into single element and not the entire column. On Sun, Sep 4, 2016 at 9:30 AM, janardhan shetty wrote: > In scala Spark ML Dataframes. > > On Sun, Sep 4, 2016 at 9:16 AM, Somasundaram Sekar tigeranalytics.com> w

Re: Spark ML 2.1.0 new features

2016-09-06 Thread janardhan shetty
Any links ? On Mon, Sep 5, 2016 at 1:50 PM, janardhan shetty wrote: > Is there any documentation or links on the new features which we can > expect for Spark ML 2.1.0 release ? >

Re: Using spark package XGBoost

2016-09-06 Thread janardhan shetty
gt;> 2.10) [1] so you need to build the project yourself and uber-jar it >>> (using sbt-assembly plugin). >>> >>> [1] https://spark-packages.org/package/rotationsymmetry/sparkxgboost >>> >>> Pozdrawiam, >>> Jacek Laskowski >>> >&

Spark ML 2.1.0 new features

2016-09-05 Thread janardhan shetty
Is there any documentation or links on the new features which we can expect for Spark ML 2.1.0 release ?

Re: Spark transformations

2016-09-04 Thread janardhan shetty
In scala Spark ML Dataframes. On Sun, Sep 4, 2016 at 9:16 AM, Somasundaram Sekar < somasundar.se...@tigeranalytics.com> wrote: > Can you try this > > https://www.linkedin.com/pulse/hive-functions-udfudaf- > udtf-examples-gaurav-singh > > On 4 Sep 2016 9:38 pm, "jana

Spark transformations

2016-09-04 Thread janardhan shetty
Hi, Is there any chance that we can send entire multiple columns to a UDF and generate a new column for Spark ML? I see a similar approach in VectorAssembler but am not able to use a few classes/traits like HasInputCols, HasOutputCol, DefaultParamsWritable since they are private. Any leads/examples are
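One common pattern for the question above: write the per-row logic as an ordinary Scala function, then wrap it with `org.apache.spark.sql.functions.udf` and pass several columns to it. The function and column names below are hypothetical:

```scala
// Per-row logic combining several input fields. In Spark this would be
// wrapped as `val combineUdf = udf(combine _)` and applied with
// df.withColumn("combined", combineUdf($"col1", $"col2", $"weight")).
def combine(a: String, b: String, weight: Double): String =
  f"${a}_${b}:$weight%.1f"
```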

Re: Combining multiple models in Spark-ML 2.0

2016-08-23 Thread janardhan shetty
Any methods to achieve this? On Aug 22, 2016 3:40 PM, "janardhan shetty" wrote: > Hi, > > Are there any pointers, links on stacking multiple models in spark > dataframes ?. What strategies can be employed if we need to combine greater > than 2 models ? >

Combining multiple models in Spark-ML 2.0

2016-08-22 Thread janardhan shetty
Hi, Are there any pointers or links on stacking multiple models in Spark DataFrames? What strategies can be employed if we need to combine more than 2 models?

Re: Vector size mismatch in logistic regression - Spark ML 2.0

2016-08-22 Thread janardhan shetty
://lists.apache.org/ > thread.html/a7e06426fd958665985d2c4218ea2f9bf9ba136ddefe83e1ad6f1727@% > 3Cuser.spark.apache.org%3E for some details). > > > > On Mon, 22 Aug 2016 at 03:20 janardhan shetty > wrote: > >> Thanks Krishna for your response. >> Features in the training set has more cat

Re: Vector size mismatch in logistic regression - Spark ML 2.0

2016-08-21 Thread janardhan shetty
29,471, then the X Matrix is not right. >> 2. It is also probable that the size of the test-data is something >>else. If so, check the data pipeline. >>3. If you print the count() of the various vectors, I think you can >>find the error. >> >> C

Vector size mismatch in logistic regression - Spark ML 2.0

2016-08-21 Thread janardhan shetty
Hi, I have built a logistic regression model using the training dataset. When I am predicting on a test dataset, it is throwing the below error of size mismatch. Steps done: 1. String indexers on categorical features. 2. One-hot encoding on these indexed features. Any help is appreciated to resolv

Re: SPARK MLLib - How to tie back Model.predict output to original data?

2016-08-18 Thread janardhan shetty
There is a spark-ts package developed by Sandy which has an RDD version. Not sure about the DataFrame roadmap. http://sryza.github.io/spark-timeseries/0.3.0/index.html On Aug 18, 2016 12:42 AM, "ayan guha" wrote: > Thanks a lot. I resolved it using a UDF. > > Qs: does spark support any time series

Re: Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread janardhan shetty
ps://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder, > I see that it still accepts one column at a time. > > On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty > wrote: > >> 2.0: >> >> One hot encoding currently accepts single input column is there a way to >> include multiple columns ? >> > >

Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread janardhan shetty
2.0: One-hot encoding currently accepts a single input column; is there a way to include multiple columns?
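OneHotEncoder is indeed single-column in 2.0; the usual workaround is one encoder stage per column, all chained in a Pipeline. The encoding itself is simple, as this plain-Scala sketch of one already string-indexed value shows:

```scala
// One-hot encode a single string-indexed category value; Spark's
// OneHotEncoder does this per column, so encoding several columns means
// one encoder stage per column chained in a Pipeline.
def oneHot(index: Int, numCategories: Int): Array[Double] =
  Array.tabulate(numCategories)(i => if (i == index) 1.0 else 0.0)
```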

Re: Using spark package XGBoost

2016-08-14 Thread janardhan shetty
Any leads on how to achieve this? On Aug 12, 2016 6:33 PM, "janardhan shetty" wrote: > I tried using *sparkxgboost package *in build.sbt file but it failed. > Spark 2.0 > Scala 2.11.8 > > Error: > [warn] http://dl.bintray.com/spark-packages/maven/ > rotationsym

Re: Using spark package XGBoost

2016-08-12 Thread janardhan shetty
; => MergeStrategy.first case "application.conf" => MergeStrategy.concat case "unwanted.txt"=> MergeStrategy.discard case x => val oldStrategy = (assemblyMergeStrategy in assembly).value oldStrategy(x) } On Fri, Aug 12, 2016 at 3:35 PM, janardhan shetty wrote: > Is there a dataframe version of XGBoost in spark-ml ?. > Has anyone used sparkxgboost package ? >

Using spark package XGBoost

2016-08-12 Thread janardhan shetty
Is there a DataFrame version of XGBoost in spark-ml? Has anyone used the sparkxgboost package?

Re: Symbol HasInputCol is inaccessible from this place

2016-08-08 Thread janardhan shetty
Can some experts shed light on this one? Still facing issues with extends HasInputCol and DefaultParamsWritable On Mon, Aug 8, 2016 at 9:56 AM, janardhan shetty wrote: > you mean is it deprecated ? > > On Mon, Aug 8, 2016 at 5:02 AM, Strange, Nick > wrote: > >> What po

Re: Symbol HasInputCol is inaccessible from this place

2016-08-08 Thread janardhan shetty
you mean is it deprecated ? On Mon, Aug 8, 2016 at 5:02 AM, Strange, Nick wrote: > What possible reason do they have to think its fragmentation? > > > > *From:* janardhan shetty [mailto:janardhan...@gmail.com] > *Sent:* Saturday, August 06, 2016 2:01 PM > *To:* Ted Yu &g

Re: [Spark1.6] Or (||) operator not working in DataFrame

2016-08-07 Thread janardhan shetty
Can you try 'or' keyword instead? On Aug 7, 2016 7:43 AM, "Divya Gehlot" wrote: > Hi, > I have use case where I need to use or[||] operator in filter condition. > It seems its not working its taking the condition before the operator and > ignoring the other filter condition after or operator. > A

Re: Symbol HasInputCol is inaccessible from this place

2016-08-06 Thread janardhan shetty
2016 at 1:18 PM, janardhan shetty > wrote: > >> Version : 2.0.0-preview >> >> import org.apache.spark.ml.param._ >> import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol} >> >> >> class CustomTransformer(override val uid: String)

Re: Symbol HasInputCol is inaccessible from this place

2016-08-06 Thread janardhan shetty
Any thoughts or suggestions on this error? On Thu, Aug 4, 2016 at 1:18 PM, janardhan shetty wrote: > Version : 2.0.0-preview > > import org.apache.spark.ml.param._ > import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol} > > > class CustomTransformer(ove

Re: Generating unique id for a column in Row without breaking into RDD and joining back

2016-08-05 Thread janardhan shetty
Mike, Any suggestions on doing it for consecutive IDs? On Aug 5, 2016 9:08 AM, "Tony Lane" wrote: > Mike. > > I have figured how to do this . Thanks for the suggestion. It works > great. I am trying to figure out the performance impact of this. > > thanks again > > > On Fri, Aug 5, 2016 at 9

Symbol HasInputCol is inaccessible from this place

2016-08-04 Thread janardhan shetty
Version : 2.0.0-preview import org.apache.spark.ml.param._ import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol} class CustomTransformer(override val uid: String) extends Transformer with HasInputCol with HasOutputCol with DefaultParamsWritable import org.apache.spark.ml.param.share

Re: describe function limit of columns

2016-08-02 Thread janardhan shetty
If you are referring to limiting the # of columns, you can select the columns and describe. df.select("col1", "col2").describe().show() On Tue, Aug 2, 2016 at 6:39 AM, pseudo oduesp wrote: > Hi > in spark 1.5.0 i used the describe function with more than 100 columns . > someone can tell me if any limi

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

2016-08-01 Thread janardhan shetty
What is the difference between UnaryTransformer and Transformer classes. In which scenarios should we use one or the other ? On Sun, Jul 31, 2016 at 8:27 PM, janardhan shetty wrote: > Developing in scala but any help with difference between UnaryTransformer > (Is this experimental still

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

2016-07-31 Thread janardhan shetty
loped a simple ML estimator (in Java) that implements > conditional Markov model for sequence labelling in Vitk toolkit. You > can check it out here: > > > https://github.com/phuonglh/vn.vitk/blob/master/src/main/java/vn/vitk/tag/CMM.java > > Phuong Le-Hong > > On Fri,

Re: Writing custom Transformers and Estimators like Tokenizer in spark ML

2016-07-28 Thread janardhan shetty
https://lucidworks.com/blog/2016/04/13/spark-solr-lucenetextanalyzer/>. > > -- > Steve > www.lucidworks.com > > > On Jul 27, 2016, at 1:31 PM, janardhan shetty > wrote: > > > > 1. Any links or blogs to develop custom transformers ? ex: Tokenizer > > > > 2. Any links or blogs to develop custom estimators ? ex: any ml algorithm > >

Re: ORC v/s Parquet for Spark 2.0

2016-07-27 Thread janardhan shetty
ransitive dependencies. yikes >>> >>> On Jul 26, 2016 5:09 AM, "Jörn Franke" wrote: >>> >>>> I think both are very similar, but with slightly different goals. While >>>> they work transparently for each Hadoop application you need to en

Writing custom Transformers and Estimators like Tokenizer in spark ML

2016-07-27 Thread janardhan shetty
1. Any links or blogs to develop *custom* transformers ? ex: Tokenizer 2. Any links or blogs to develop *custom* estimators ? ex: any ml algorithm

Re: Maintaining order of pair rdd

2016-07-26 Thread janardhan shetty
n do this > val reduced = myRDD.reduceByKey((first, second) => first ++ second) > > val sorted = reduced.sortBy(tpl => tpl._1) > > hth > > > > On Tue, Jul 26, 2016 at 3:31 AM, janardhan shetty > wrote: > >> groupBy is a shuffle operation and index is alr

Re: ORC v/s Parquet for Spark 2.0

2016-07-25 Thread janardhan shetty
uld choose Parquet > 5) AFAIK, Parquet has its metadata at the end of the file (correct me if > something has changed) . It means that Parquet file must be completely read > & put into RAM. If there is no enough RAM or file somehow is corrupted --> > problems arise > > On Tue,

Re: Maintaining order of pair rdd

2016-07-25 Thread janardhan shetty
Basically , a groupBy reduces your structure to (anyone correct me if i m > wrong) a RDD[(key,val)], which you can see as a tuple.so you could use > sortWith (or sortBy, cannot remember which one) (tpl=> tpl._1) > hth > > On Mon, Jul 25, 2016 at 1:21 AM, janardhan shetty > wr

ORC v/s Parquet for Spark 2.0

2016-07-25 Thread janardhan shetty
Just wondering about the advantages and disadvantages of converting data into ORC or Parquet. In the documentation of Spark there are numerous examples of the Parquet format. Any strong reasons to choose Parquet over the ORC file format? Also: current data compression is bzip2 http://stackoverflow.com/questions/32

Re: Bzip2 to Parquet format

2016-07-25 Thread janardhan shetty
uet file. > > Reference for SQLContext / createDataFrame: > http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.SQLContext > > > > On Jul 24, 2016, at 5:34 PM, janardhan shetty > wrote: > > We have data in Bz2 compression format. Any l

Bzip2 to Parquet format

2016-07-24 Thread janardhan shetty
We have data in the Bz2 compression format. Any links in Spark to convert into Parquet, and also performance benchmarks and study materials?

K-means Evaluation metrics

2016-07-24 Thread janardhan shetty
Hi, I was trying to evaluate k-means clustering predictions since the exact cluster numbers were provided beforehand for each data point. Just tried Error = Predicted cluster number - Given number as a brute-force method. What are the evaluation metrics available in Spark for k-means clustering
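One built-in option in Spark 2.0 is `KMeansModel.computeCost`, which reports the within-set sum of squared errors (WSSSE). A plain-Scala sketch of that metric (hypothetical helper names):

```scala
// Squared Euclidean distance between two points.
def squaredDist(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

// WSSSE: each point contributes its squared distance to the nearest center,
// which is what KMeansModel.computeCost reports in Spark ML.
def wssse(points: Seq[Array[Double]], centers: Seq[Array[Double]]): Double =
  points.map(p => centers.map(c => squaredDist(p, c)).min).sum
```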

Re: Maintaining order of pair rdd

2016-07-24 Thread janardhan shetty
)]):T = { > if (lst.isEmpty): /// return your comparison > else { > val splits = lst.splitAt(5) > // do sometjhing about it using splits._1 > iterate(splits._2) >} > > will this help? or am i still missing something? > > kr > > &g

Frequent Item Pattern Spark ML Dataframes

2016-07-24 Thread janardhan shetty
Is there any implementation of FPGrowth and association rules on Spark DataFrames? We have it for RDDs, but any pointers for DataFrames?

Re: Maintaining order of pair rdd

2016-07-24 Thread janardhan shetty
. Similarly next 5 elements in that order until the end of number of elements. Let me know if this helps On Sun, Jul 24, 2016 at 7:45 AM, Marco Mistroni wrote: > Apologies I misinterpreted could you post two use cases? > Kr > > On 24 Jul 2016 3:41 pm, "janardhan shetty&qu

Re: Maintaining order of pair rdd

2016-07-24 Thread janardhan shetty
Marco, Thanks for the response. It is indexed order and not ascending or descending order. On Jul 24, 2016 7:37 AM, "Marco Mistroni" wrote: > Use map values to transform to an rdd where values are sorted? > Hth > > On 24 Jul 2016 6:23 am, "janardhan shetty" wro

Locality sensitive hashing

2016-07-24 Thread janardhan shetty
I was looking into implementing locality-sensitive hashing on DataFrames. Any pointers for reference?
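A common starting point is sign-random-projection (random hyperplane) LSH for cosine similarity: each signature bit is the sign of a dot product with a random hyperplane, so similar vectors collide with high probability. A seeded plain-Scala sketch (names hypothetical); on a DataFrame the same function could be wrapped in a UDF:

```scala
// Sign-random-projection LSH signature. Each bit is the sign of the dot
// product with a Gaussian random hyperplane; the fixed seed makes the
// signature deterministic across calls.
def lshSignature(v: Array[Double], numBits: Int, seed: Long = 42L): Seq[Int] = {
  val rnd = new scala.util.Random(seed)
  (0 until numBits).map { _ =>
    val plane = Array.fill(v.length)(rnd.nextGaussian())
    if (v.zip(plane).map { case (a, b) => a * b }.sum >= 0) 1 else 0
  }
}
```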

Maintaining order of pair rdd

2016-07-23 Thread janardhan shetty
I have a (key, value) pair RDD where the value is an array of Ints. I need to maintain the order of the values in order to execute downstream modifications. How do we maintain the order of values? Ex: rdd = (id1,[5,2,3,15], Id2,[9,4,2,5]) Follow-up question: how do we compare between one element in rdd
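For the grouping part of the follow-up question, plain Scala's `grouped` preserves element order, so per-key logic like "take each run of 5 values in order" can be written directly and run inside `mapValues` on the pair RDD, which keeps each value array intact. A hypothetical sketch:

```scala
// Walk the values in their original order, in consecutive groups of
// `groupSize`, keeping the first element of each group. Inside Spark this
// function would be applied per key via rdd.mapValues(...).
def firstOfEachGroup(values: Seq[Int], groupSize: Int = 5): Seq[Int] =
  values.grouped(groupSize).map(_.head).toSeq
```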

Re: Unresolved dependencies while creating spark application Jar

2016-07-22 Thread janardhan shetty
t.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Fri, Jul 22, 2016 at 4:23 PM, janardhan shetty > wrote: > > Changed to sbt.0.14.3 and it gave : > > > > [info] Packaging > > > /Users/jshetty/sparkApplica

Re: Unresolved dependencies while creating spark application Jar

2016-07-22 Thread janardhan shetty
need to create assembly.sbt file inside project directory if so what will the contents of it be for this config ? On Fri, Jul 22, 2016 at 5:42 AM, janardhan shetty wrote: > Is scala version also the culprit? 2.10 and 2.11.8 > > Also Can you give the steps to create sbt package command

Re: Unresolved dependencies while creating spark application Jar

2016-07-22 Thread janardhan shetty
ttps://medium.com/@jaceklaskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > > On Fri, Jul 22, 2016 at 2:08 PM, janardhan shetty > wrote: > > Hi, > > > > I was setting up my development environ

Unresolved dependencies while creating spark application Jar

2016-07-22 Thread janardhan shetty
Hi, I was setting up my development environment. Local Mac laptop setup, IntelliJ IDEA 14 CE, Scala, sbt (not Maven). Error: $ sbt package [warn] :: [warn] :: UNRESOLVED DEPENDENCIES :: [warn] :::
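For the unresolved-dependency errors in this thread, a minimal build.sbt along these lines usually resolves cleanly; the versions are illustrative, and the key points are matching the Scala binary version to the Spark artifacts and marking Spark itself `% "provided"` when building an assembly jar:

```scala
// build.sbt sketch for a Spark application; versions are illustrative.
// %% appends the Scala binary version (_2.11 here), which must match
// the version Spark was built against.
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-sql"   % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.0.0" % "provided"
)
```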