RE: Spark Hive Rejection

2016-10-03 Thread Mostafa Alaa Mohamed
No. Sometimes, when you have a table with an int column and you insert a string into that column, the job will fail. Best Regards, Mostafa Alaa Mohamed, Technical Expert Big Data, M: +971506450787 Email: mohamedamost...@etisalat.ae From: Michael Segel [mailto:ms

Re: Prototype Implementation of Hierarchical Clustering on Spark

2016-10-03 Thread pcandido
Hello, Could you tell me how you applied MapReduce to Bisecting K-Means? I know how the classical BKM works, but how did you parallelize the processing? Are all leaf nodes divided at the same time? If not, how? If so, how do you handle the last nodes? Dividing every leaf node per iteration, you always

Re: access spark thrift server from another spark session

2016-10-03 Thread ayan guha
I do not think you can see temp tables from another application, such as Thrift. You need to save the tables in Hive, and then they will be visible through Thrift. Thrift uses the Hive metastore; however, temp tables do not make it to the central metastore until saved. On 4 Oct 2016 11:44, "Takeshi Yamamuro" wrote: >

Re: access spark thrift server from another spark session

2016-10-03 Thread Takeshi Yamamuro
-dev +user Hi, Have you tried sharing a session via `spark.sql.hive.thriftServer.singleSession`? // maropu On Tue, Oct 4, 2016 at 6:10 AM, Herman Yu wrote: > > I built a Spark DataFrame/Dataset on top of several Hive tables, and then > registered the DataFrame/Dataset as temporary tables, as well as
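For context, a sketch of how that option might be passed when launching the Thrift server; this is an illustrative config fragment, and the launch-script path assumes a standard Spark distribution:

```shell
# Start the Spark Thrift server with session sharing enabled, so JDBC
# clients share the session that registered the temporary tables.
$SPARK_HOME/sbin/start-thriftserver.sh \
  --conf spark.sql.hive.thriftServer.singleSession=true
```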

Executor Lost error

2016-10-03 Thread Punit Naik
Hi All, I am trying to run a program on a large dataset (~1 TB). I have already tested the code on a small amount of data and it works fine, but what I noticed is that the job fails if the input is large. It was giving me errors about Akka actor disassociation, which I fixed by increasing th

Re: ML - MulticlassClassificationEvaluator How to get metrics for each class

2016-10-03 Thread Nirav Patel
I think it's via the MulticlassMetrics class. Just found it. Thanks On Mon, Oct 3, 2016 at 3:31 PM, Nirav Patel wrote: > I see that in the scikit-learn library, if you specify 'None' or nothing for the 'average' > parameter, it returns metrics for each class. How to get this in the ML > library? > http://scikit-

ML - MulticlassClassificationEvaluator How to get metrics for each class

2016-10-03 Thread Nirav Patel
I see that in the scikit-learn library, if you specify 'None' or nothing for the 'average' parameter, it returns metrics for each class. How do I get this in the ML library? http://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html The current weighted metrics do help to see the overall picture b
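In Spark this ends up going through the `MulticlassMetrics` class (as the follow-up reply notes). The per-class computation is essentially the following; this is a plain-Python sketch on made-up (prediction, label) pairs, illustrating the math rather than the Spark API:

```python
def per_class_metrics(pairs):
    """Per-class precision and recall from (prediction, label) pairs,
    mirroring what MulticlassMetrics.precision(label) / recall(label) return."""
    labels = {p for p, _ in pairs} | {a for _, a in pairs}
    out = {}
    for l in labels:
        tp = sum(1 for p, a in pairs if p == l and a == l)
        fp = sum(1 for p, a in pairs if p == l and a != l)
        fn = sum(1 for p, a in pairs if p != l and a == l)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        out[l] = (precision, recall)
    return out

# (prediction, label): class 0 predicted three times (once wrongly), class 1 once
pairs = [(0, 0), (0, 0), (1, 1), (0, 1)]
print(per_class_metrics(pairs))  # class 0: precision 2/3, recall 1.0; class 1: precision 1.0, recall 0.5
```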

MulticlassClassificationEvaluator how weighted precision and weighted recall calculated

2016-10-03 Thread Nirav Patel
For example, with 3 classes would it be? weightedPrecision = (TP1 * w1 + TP2 * w2 + TP3 * w3) / [(TP1 * w1 + TP2 * w2 + TP3 * w3) + (FP1 * w1 + FP2 * w2 + FP3 * w3)] where TP1..TP3 are the TP for each class, and w1, w2, w3 are the weights for each class based on their distribution in the sample data? And similar for recall
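For what it's worth, Spark's `MulticlassMetrics` is documented to compute weightedPrecision as a weighted average of the per-class precisions (weights being class frequencies among the true labels), not as one pooled ratio as guessed above. A plain-Python sketch of that definition, on made-up (prediction, label) pairs:

```python
def weighted_precision_recall(pairs):
    """weightedPrecision = sum_c w_c * precision_c, and likewise for recall,
    where w_c is the fraction of samples whose true label is c."""
    labels = {a for _, a in pairs}
    n = len(pairs)
    wp = wr = 0.0
    for c in labels:
        tp = sum(1 for p, a in pairs if p == c and a == c)
        fp = sum(1 for p, a in pairs if p == c and a != c)
        fn = sum(1 for p, a in pairs if p != c and a == c)
        w = (tp + fn) / n  # class frequency among the true labels
        if tp + fp:
            wp += w * tp / (tp + fp)
        if tp + fn:
            wr += w * tp / (tp + fn)
    return wp, wr

pairs = [(0, 0), (0, 0), (1, 1), (0, 1)]
print(weighted_precision_recall(pairs))  # roughly (0.833, 0.75)
```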

Re: Deep learning libraries for scala

2016-10-03 Thread janardhan shetty
Thanks Ben. The current Spark ML package has a feed-forward multilayer perceptron algorithm as well, and I am just wondering how your implementation differs? https://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier On Mon, Oct 3, 2016 at 1:40 PM, Benjam

Re: Deep learning libraries for scala

2016-10-03 Thread Benjamin Kim
I got this email a while back in regards to this. Dear Spark users and developers, I have released version 1.0.0 of scalable-deeplearning package. This package is based on the implementation of artificial neural networks in Spark ML. It is intended for new Spark deep learning features that wer

Re: Deep learning libraries for scala

2016-10-03 Thread janardhan shetty
Any leads in this regard? On Sat, Oct 1, 2016 at 1:48 PM, janardhan shetty wrote: > Apparently there are no neural network implementations in tensorframes > which we can use, right? Or am I missing something here. > > I would like to apply neural networks in an NLP setting; are there any >

Data Format for Running Collaborative Filtering in Spark MLlib

2016-10-03 Thread Baktaawar
Hi, I am working on building a recommender system on learning-content data. My data format is a user-item matrix of views, similar to the one below NS

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
Hi Benjamin, How stable is Kudu? Is it production-ready? Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * http://talebzadehmich.wordp

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Benjamin Kim
If you’re interested, here is the link to the development page for Kudu. It has the Spark code snippets using DataFrames. http://kudu.apache.org/docs/developing.html Cheers, Ben > On Oct 3, 2016, at 9:56 AM, ayan guha wrote: > > That sounds inter

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread ayan guha
That sounds interesting; I would love to learn more about it. Mitch: looks good. Lastly, I would suggest you consider whether you really need multiple column families. On 4 Oct 2016 02:57, "Benjamin Kim" wrote: > Lately, I’ve been experimenting with Kudu. It has been a much better > experience than with

Pros and cons of using different persistence layers for Spark

2016-10-03 Thread Ashok Kumar
What are the pros and cons of using different persistence layers for Spark, such as S3, Cassandra, and HDFS? Thanks

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Benjamin Kim
Lately, I’ve been experimenting with Kudu. It has been a much better experience than with HBase. Using it is much simpler, even from spark-shell. spark-shell --packages org.apache.kudu:kudu-spark_2.10:1.0.0 It’s like going back to rudimentary DB systems where tables have just a primary key and

Re: statistical theory behind estimating the number of total tasks in GroupedSumEvaluator.scala

2016-10-03 Thread Sean Owen
FWIW I think there are in any event several small problems with these classes; I'm tracking it here and have a change almost ready: https://issues.apache.org/jira/browse/SPARK-17768 On Mon, Oct 3, 2016 at 9:39 AM, Sean Owen wrote: > +Matei for question about the source of this bit of code > > Th

Document listing spark sql aggregate functions

2016-10-03 Thread Ashish Tadose
Hi Team, Is there a documentation page which lists all the aggregate functions supported in the Spark SQL query language, the same way the DataFrame aggregate functions are listed below? https://spark.apache.org/docs/1.6.2/api/scala/index.html#org.apache.spark.sql.functions$ I was looking for spark sql query

Spark_Jdbc_Hive

2016-10-03 Thread Ajay Chander
Hi Everyone, First of all, let me explain what I am trying to do, and I apologize for writing a lengthy mail. 1) Programmatically connect to a remote, secured (Kerberized) Hadoop cluster (CDH 5.7) from my local machine. - Once connected, I want to read the data from a remote Hive table into Spark

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
with ticker+date I can create something like the below for the row key: TSCO_1-Apr-08 or TSCO1-Apr-08, if I understood you correctly Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread ayan guha
Hi, Looks like you are saving to new.csv but still loading tsco.csv? It's definitely the header. Suggestion: ticker+date as the row key has the following benefits: 1. Using ticker+date as the row key will enable you to hold multiple tickers in this single HBase table (think composite primary key). 2. Using dat

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
Hi Ayan, Sounds like the row key has to be unique, much like a primary key in an RDBMS. This is what I download as a CSV for a stock from Google Finance: Date Open High Low Close Volume 27-Sep-16 177.4 177.75 172.5 177.75 24117196 So what I do is add the stock and ticker myself to the end of the row via

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread ayan guha
I am not well versed with importtsv, but you can create a CSV file using a simple Spark program to make the first column ticker+tradedate. I remember doing similar manipulation in Pig to create a row-key format. On 3 Oct 2016 20:40, "Mich Talebzadeh" wrote: > Thanks Ayan, > > How do you specify ti
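The preprocessing ayan describes (putting ticker+tradedate in the first field, so ImportTsv can map it to HBASE_ROW_KEY) might look like the following plain-Python sketch; the column order is assumed from the ImportTsv command quoted elsewhere in the thread:

```python
import csv

def add_row_key(lines, ticker_col=0, date_col=1, sep="_"):
    """Prepend a composite ticker+tradedate key as the first CSV column."""
    out = []
    for row in csv.reader(lines):
        key = row[ticker_col] + sep + row[date_col]
        out.append([key] + row)
    return out

rows = add_row_key(["TSCO,1-Apr-08,177.4,177.75,172.5,177.75,24117196"])
print(rows[0][0])  # TSCO_1-Apr-08
```

In Spark the same per-line transformation would be applied with a map over the input RDD or DataFrame before writing the CSV back out.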

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
Thanks Ayan, How do you specify ticker+tradedate as the row key in the HBase ImportTsv below? org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns="HBASE_ROW_KEY, stock_daily:ticker, stock_daily:tradedate, stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stoc

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread ayan guha
Hi Mitch, It is more to do with HBase than Spark. The row key can be anything, yes, but essentially what you are doing is inserting and updating the Tesco PLC row. Given your schema, ticker+tradedate seems to be a good row key. On 3 Oct 2016 18:25, "Mich Talebzadeh" wrote: > thanks again. > > I added that jar

Re: statistical theory behind estimating the number of total tasks in GroupedSumEvaluator.scala

2016-10-03 Thread Sean Owen
+Matei for question about the source of this bit of code That's a good question; I remember wondering about this once upon a time. First, GroupedSumEvaluator and GroupedMeanEvaluator look like dead code at this point. GroupedCountEvaluator is still used. MeanEvaluator is a better example, becaus

unsubscribe

2016-10-03 Thread asukhenko
unsubscribe

Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error

2016-10-03 Thread Mich Talebzadeh
Thanks again. I added that jar file to the classpath and that part worked. I was using spark-shell, so I will have to use spark-submit for it to be able to interact with the MapReduce job. BTW, when I use the command-line utility ImportTsv to load a file into HBase with the following table format, descri

Re: Filtering in SparkR

2016-10-03 Thread Deepak Sharma
Hi Yogesh, You can try registering these 2 DFs as temporary tables and then executing the SQL query: df1.registerTempTable("df1") df2.registerTempTable("df2") val rs = sqlContext.sql("SELECT a.* FROM df1 a, df2 b where a.id != b.id") Thanks Deepak On Mon, Oct 3, 2016 at 12:38 PM, Yogesh Vyas wrote:

Filtering in SparkR

2016-10-03 Thread Yogesh Vyas
Hi, I have two SparkDataFrames, df1 and df2. Their schemas are as follows: df1=>SparkDataFrame[id:double, c1:string, c2:string] df2=>SparkDataFrame[id:double, c3:string, c4:string] I want to filter out rows from df1 where df1$id does not match df2$id. I tried some expressions: filter(df1,!(df1$id
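The filter being asked for (keep rows of df1 whose id appears nowhere in df2) is anti-join semantics rather than a pairwise `!=` comparison. A plain-Python sketch of the intended behaviour, with made-up rows, purely to pin down the semantics:

```python
def anti_join(df1, df2, key="id"):
    """Rows of df1 whose key value does not appear in any row of df2
    (left anti-join semantics)."""
    df2_keys = {row[key] for row in df2}
    return [row for row in df1 if row[key] not in df2_keys]

df1 = [{"id": 1.0, "c1": "a"}, {"id": 2.0, "c1": "b"}]
df2 = [{"id": 2.0, "c3": "x"}]
print(anti_join(df1, df2))  # [{'id': 1.0, 'c1': 'a'}]
```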