Re: best spark spatial lib?

2017-10-10 Thread Ram Sriharsha
Why can't you do this in Magellan? Can you post a sample query that you are trying to run that has spatial and logical operators combined? Maybe I am not understanding the issue properly. Ram On Tue, Oct 10, 2017 at 2:21 AM, Imran Rajjad wrote: > I need to have a location column inside my Datafr

Re: cannot cast to double from spark row

2017-09-14 Thread Ram Sriharsha
1. row.getAs[Double](Constants.Datapoint.Latitude) > > 2. row.getAs[String](Constants.Datapoint.Latitude).toDouble > > I don't want to use row.getDouble(0) as the position of the column in the file keeps on > changing. > > Thanks, > Asmath > -- Ram Sriharsha Product Manager, Apache
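
For reference, a minimal sketch of the by-name access discussed above, assuming a DataFrame df whose latitude column is stored as a string (the column name here is illustrative):

    val latitudes = df.rdd.map { row =>
      // Look the column up by name so a change in column order does not break the job;
      // use getAs[Double] directly if the column is already stored as a double.
      row.getAs[String]("latitude").toDouble
    }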

Re: MLlib OneVsRest causing intermittent exceptions

2016-01-26 Thread Ram Sriharsha
. but if it cannot for some reason, we can have a check in OneVsRest that doesn't train that classifier On Tue, Jan 26, 2016 at 4:33 PM, Ram Sriharsha wrote: > Hey David > > In your scenario, OneVsRest is training a classifier for 1 vs not 1... and > the input dataset for fit (or t

Re: MLlib OneVsRest causing intermittent exceptions

2016-01-26 Thread Ram Sriharsha
m split > <https://gist.github.com/junglebarry/6073aa474d89f3322063>. Only > exceptions in 2/3 of cases, due to randomness. > > If these look good as test cases, I'll take a look at filing JIRAs and > getting patches tomorrow morning. It's late here! > > T

Re: MLlib OneVsRest causing intermittent exceptions

2016-01-26 Thread Ram Sriharsha
to be thrown in the case where the training dataset is missing the rare class. Could you reproduce this in a simple snippet of code that we can quickly test on the shell? On Tue, Jan 26, 2016 at 3:02 PM, Ram Sriharsha wrote: > Hey David, Yeah absolutely!, feel free to create a JIRA and attach your
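
A hedged sketch of the kind of shell snippet being asked for here; the dataset and column names are assumptions, not taken from the original thread:

    import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}

    // `data` is assumed to be a DataFrame with "label" and "features" columns
    // in which one label value is very rare.
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)
    val ovr = new OneVsRest().setClassifier(new LogisticRegression().setMaxIter(10))
    // If the random split happens to drop the rare label from `train`,
    // fitting can hit the intermittent exception discussed in this thread.
    val ovrModel = ovr.fit(train)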

Re: MLlib OneVsRest causing intermittent exceptions

2016-01-26 Thread Ram Sriharsha
point (in `transform`) and attaches it to the column. This way, I'd hope >> that even once TrainValidationSplit returns a subset dataframe - which >> may not contain all labels - the metadata on the column should still >> contain all labels. >> >> Does my use of Strin
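
A minimal sketch of the approach the quoted message describes: fit a StringIndexer on the full dataset so the label column's metadata carries every label, even if a later split drops some of them (column and variable names are assumptions):

    import org.apache.spark.ml.feature.StringIndexer

    val indexer = new StringIndexer()
      .setInputCol("category")   // original string label column (name assumed)
      .setOutputCol("label")     // indexed label column carrying ML metadata
    // Fitting on the full dataset records all labels in the column metadata,
    // so a downstream TrainValidationSplit still sees the complete label set.
    val indexed = indexer.fit(fullData).transform(fullData)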

Re: MLlib OneVsRest causing intermittent exceptions

2016-01-25 Thread Ram Sriharsha
y to look into patching the code, but I first wanted to confirm > that the problem was real, and that I wasn't somehow misunderstanding how I > should be using OneVsRest. > > Any guidance would be appreciated - I'm new to the list. > > Many thanks, > David > -- Ra

Re: XML Parsing

2015-07-19 Thread Ram Sriharsha
You would need to write an XmlInputFormat that can parse XML into lines based on start/end tags. Mahout has an XmlInputFormat implementation you should be able to import: https://github.com/apache/mahout/blob/master/integration/src/main/java/org/apache/mahout/text/wikipedia/XmlInputFormat.java Onc
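
A rough sketch of wiring Mahout's XmlInputFormat into Spark; the configuration key names, the <page> tags, and the input path are assumptions for illustration, not taken from this thread:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.mahout.text.wikipedia.XmlInputFormat

    val conf = new Configuration(sc.hadoopConfiguration)
    conf.set("xmlinput.start", "<page>")  // start tag of each record (key name assumed)
    conf.set("xmlinput.end", "</page>")   // end tag of each record
    val xmlRecords = sc
      .newAPIHadoopFile("hdfs:///data/pages.xml",
        classOf[XmlInputFormat], classOf[LongWritable], classOf[Text], conf)
      .map { case (_, text) => text.toString }  // one XML fragment per record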

Re: Examples of flatMap in dataFrame

2015-06-07 Thread Ram Sriharsha
Hi You are looking for the explode method (in the DataFrame API starting in 1.3, I believe): https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L1002 Ram On Sun, Jun 7, 2015 at 9:22 PM, Dimp Bhat wrote: > Hi, > I'm trying to write a custom transform
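
A small example of the 1.3-era DataFrame.explode referenced above (the column names are illustrative):

    // Assuming `df` has a string column "sentence"; each row becomes one row per word.
    val exploded = df.explode("sentence", "word") { s: String => s.split(" ").toSeq }
    exploded.select("word").show()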

Re: Embedding your own transformer in Spark.ml Pipleline

2015-06-02 Thread Ram Sriharsha
Hi We are in the process of adding examples for feature transformations ( https://issues.apache.org/jira/browse/SPARK-7546) and this should be available shortly on Spark Master. In the meanwhile, the best place to start would be to look at how the Tokenizer works here: https://github.com/apache/sp
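
A bare-bones custom transformer modeled on how Tokenizer is written (a sketch assuming the Spark 1.4-era spark.ml API; the class name and tokenization logic are illustrative):

    import org.apache.spark.ml.UnaryTransformer
    import org.apache.spark.ml.param.ParamMap
    import org.apache.spark.ml.util.Identifiable
    import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

    // Splits a string column into lowercase tokens, like a simplified Tokenizer.
    class MyTokenizer(override val uid: String)
      extends UnaryTransformer[String, Seq[String], MyTokenizer] {

      def this() = this(Identifiable.randomUID("myTok"))

      override protected def createTransformFunc: String => Seq[String] =
        _.toLowerCase.split("\\s+").toSeq

      override protected def outputDataType: DataType = ArrayType(StringType, true)

      override def copy(extra: ParamMap): MyTokenizer = defaultCopy(extra)
    }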

Re: Doubts about SparkSQL

2015-05-23 Thread Ram Sriharsha
Yes it does ... you can try out the following example (the People dataset that comes with Spark). There is an inner query that filters on age and an outer query that filters on name. The physical plan applies a single composite filter on name and age, as you can see below: sqlContext.sql("select * f
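
The query in the archived message is cut off; a stand-in along the same lines against the example people table might look like this (the query text is a reconstruction, not the original):

    // Inner query filters on age, outer query filters on name.
    val result = sqlContext.sql(
      """SELECT name, age FROM
           (SELECT name, age FROM people WHERE age > 21) tmp
         WHERE name = 'Andy'""")
    // explain() should show the two predicates collapsed into one composite filter.
    result.explain()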

Re: Query a Dataframe in rdd.map()

2015-05-21 Thread Ram Sriharsha
21, 2015 at 10:54 AM, Ram Sriharsha wrote: > Your original code snippet seems incomplete and there isn't enough > information to figure out what problem you actually ran into > > from your original code snippet there is an rdd variable which is well > defined and a df variable

Re: Query a Dataframe in rdd.map()

2015-05-21 Thread Ram Sriharsha
Your original code snippet seems incomplete and there isn't enough information to figure out what problem you actually ran into. From your original code snippet, there is an rdd variable which is well defined and a df variable that is not defined in the snippet of code you sent. One way to make thi
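
The reply is truncated here, but a common way around referencing a DataFrame inside rdd.map (not necessarily the fix suggested in the full message) is to collect and broadcast a small lookup instead, for example:

    // The thread's `df` and `rdd` are reused here; assume rdd is an RDD[String] of keys
    // and df has string columns "key" and "value" (all names illustrative).
    val lookup = df.select("key", "value")
      .map(r => (r.getString(0), r.getString(1)))
      .collectAsMap()
    val lookupBc = sc.broadcast(lookup)
    val enriched = rdd.map(k => (k, lookupBc.value.getOrElse(k, "unknown")))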

Re: DataFrame Column Alias problem

2015-05-21 Thread Ram Sriharsha
df.groupBy($"col1").agg(count($"col1").as("c")).show On Thu, May 21, 2015 at 3:09 AM, SLiZn Liu wrote: > Hi Spark Users Group, > > I’m doing groupby operations on my DataFrame *df* as following, to get > count for each value of col1: > > > df.groupBy("col1").agg("col1" -> "count").show // I don'

Re: Decision tree: categorical variables

2015-05-19 Thread Ram Sriharsha
Hi Keerthi As Xiangrui mentioned in the reply, the categorical variables are assumed to be encoded as integers between 0 and k - 1, where k is the value you are passing in the category info map. So you will need to handle this during parsing (your columns 3 and 6 need to be converted into ints in
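
A sketch of what that encoding looks like when handed to MLlib's decision tree; the category counts and the other training parameters below are assumptions for illustration:

    import org.apache.spark.mllib.tree.DecisionTree

    // Columns 3 and 6 have been parsed into integer codes 0..k-1 beforehand.
    // Here we assume column 3 has 4 categories and column 6 has 3.
    val categoricalFeaturesInfo = Map(3 -> 4, 6 -> 3)
    // trainClassifier(input, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins)
    val model = DecisionTree.trainClassifier(
      trainingData, 2, categoricalFeaturesInfo, "gini", 5, 32)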

Re: InferredSchema Example in Spark-SQL

2015-05-17 Thread Ram Sriharsha
Int)).toDF() >> >> >> On Sun, May 17, 2015 at 5:41 PM, Cheng, Hao wrote: >> >>> Typo? Should be .toDF(), not .toRD() >>> >>> >>> >>> *From:* Ram Sriharsha [mailto:sriharsha@gmail.com] >>> *Sent:* Monday, May

Re: InferredSchema Example in Spark-SQL

2015-05-17 Thread Ram Sriharsha
You mean toDF()? (toDF converts the RDD to a DataFrame, in this case inferring the schema from the case class.) On Sun, May 17, 2015 at 5:07 PM, Rajdeep Dua wrote: > Hi All, > Was trying the Inferred Schema spark example > http://spark.apache.org/docs/latest/sql-programming-guide.html#overview > >
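
Roughly the relevant fragment from the programming guide's inferred-schema example (the file path is the one shipped with the Spark examples):

    case class Person(name: String, age: Int)

    import sqlContext.implicits._   // brings in the rdd.toDF() conversion
    val people = sc.textFile("examples/src/main/resources/people.txt")
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))
      .toDF()                       // schema inferred from the Person case class
    people.registerTempTable("people")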

Re: Getting the best parameter set back from CrossValidatorModel

2015-05-16 Thread Ram Sriharsha
Hi Justin The CrossValidatorExample here https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/CrossValidatorExample.scala is a good example of how to set up an ML Pipeline for extracting a model with the best parameter set. You set up the pipeline as in
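
Once the cross-validated model comes back, the best parameter set can be pulled off bestModel along these lines (a sketch; the cv and training names follow the linked example, and how you inspect stages depends on your pipeline):

    // `cv` is the CrossValidator configured with the pipeline and parameter grid.
    val cvModel = cv.fit(training)
    val bestPipeline = cvModel.bestModel.asInstanceOf[org.apache.spark.ml.PipelineModel]
    // Print the resolved parameters of each fitted stage of the winning model.
    bestPipeline.stages.foreach(stage => println(stage.extractParamMap()))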

Re: Using sc.HadoopConfiguration in Python

2015-05-14 Thread Ram Sriharsha
Jo > > Thanks for the reply, but _jsc does not have anything to pass hadoop > configs. Can you illustrate your answer a bit more? TIA... > > On Wed, May 13, 2015 at 12:08 AM, Ram Sriharsha > wrote: > >> yes, the SparkContext in the Python API has a reference to the

Re: Using sc.HadoopConfiguration in Python

2015-05-12 Thread Ram Sriharsha
Yes, the SparkContext in the Python API has a reference to the JavaSparkContext (jsc) https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.SparkContext through which you can access the Hadoop configuration. On Tue, May 12, 2015 at 6:39 AM, ayan guha wrote: > Hi > > I found this m