alternatives for long to longwritable typecasting in spark sql

2017-01-30 Thread Alex
Hi Guys, please let me know if there are any other ways to typecast, as the code below is throwing an error: unable to typecast java.lang.Long to LongWritable (and the same for Double and for Text) in spark-sql. The piece of code below is from a Hive UDF which I am trying to run in spark-sql: public Object get(Object name) {

Do both pieces of code below do the same thing? I had to refactor the code to fit in spark-sql.

2017-01-30 Thread Alex
public Object get(Object name) { int pos = getPos((String) name); if (pos < 0) return null; String f = "string"; Object obj = list.get(pos); Object result = null; if (obj == null)
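Not from the thread, but a minimal Scala sketch of the idea in question: wrapping the plain boxed values that Spark SQL hands back into Hadoop writables before returning them. The original UDF is Java, and Hive's own writables live in org.apache.hadoop.hive.serde2.io, so treat this only as an illustration of the pattern, not a drop-in fix.

import org.apache.hadoop.io.{DoubleWritable, LongWritable, Text}

// wrap plain boxed values into writables; anything else is passed through untouched
def toWritable(value: Any): Any = value match {
  case l: java.lang.Long   => new LongWritable(l)
  case d: java.lang.Double => new DoubleWritable(d)
  case s: String           => new Text(s)
  case other               => other
}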

Spark 2.1.0 and Shapeless

2017-01-30 Thread Timothy Chan
I'm using a library, https://github.com/guardian/scanamo, that uses shapeless 2.3.2. What are my options if I want to use this with Spark 2.1.0? Based on this: http://apache-spark-developers-list.1001551.n3.nabble.com/shapeless-in-spark-2-1-0-tt20392.html I'm guessing I would have to release my o
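One option that tends to come up for this kind of dependency clash (a sketch, not confirmed against scanamo): shade shapeless inside your assembly jar so your copy cannot collide with the version on Spark's classpath. This assumes an sbt build with the sbt-assembly plugin; the renamed package prefix is arbitrary.

// build.sbt (requires the sbt-assembly plugin)
assemblyShadeRules in assembly := Seq(
  // rewrite the shapeless 2.3.2 classes under a private prefix so they cannot
  // clash with the shapeless version Spark ships
  ShadeRule.rename("shapeless.**" -> "com.example.shaded.shapeless.@1").inAll
)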

RDD unpersisted still showing in my Storage tab UI

2017-01-30 Thread Saulo Ricci
Hi, I have a Spark Streaming application, and at the end of each batch I call unpersist on the batch's RDD. But I've noticed the RDDs for all past batches are still showing in my Spark UI's Storage tab. Shouldn't I expect to never see those RDDs again in the Storage tab?
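For reference, a minimal sketch of explicitly caching and dropping each batch RDD inside foreachRDD; the DStream name is an assumption, and whether old entries still linger in the Storage tab after this is exactly the question of the thread.

stream.foreachRDD { rdd =>
  rdd.cache()
  // ... run the actions for this batch ...
  rdd.unpersist(blocking = true)  // blocking, so the blocks are dropped before the batch finishes
}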

Re: Tableau BI on Spark SQL

2017-01-30 Thread Todd Nist
Hi Mich, You could look at http://www.exasol.com/. It works very well with Tableau without the need to extract the data. Also, in V6 it has virtual schemas, which would allow you to access data in Spark, Hive, Oracle, or other sources. It may be outside of what you are looking for, but it works well

Cosine Similarity Implementation in Spark

2017-01-30 Thread Manish Tripathi
I have a data frame which has two columns (id, vector (tf-idf)). The first column is the id of the document, while the second column is a Vector of tf-idf values. I want to use DIMSUM for cosine similarity, but unfortunately I have Spark 1.x and it looks like these methods are implemented only in
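A brute-force sketch (not DIMSUM, and not from the thread) that works on Spark 1.x: normalise each tf-idf vector and take pairwise dot products via cartesian. Converting the data frame to an RDD of (id, vector) pairs is assumed, and this is only reasonable for small document counts.

import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// rows: (document id, tf-idf vector); vectors are assumed non-zero
def cosinePairs(rows: RDD[(Long, Vector)]): RDD[((Long, Long), Double)] = {
  val normed = rows.mapValues { v =>
    val arr = v.toArray
    val norm = math.sqrt(arr.map(x => x * x).sum)
    arr.map(_ / norm)
  }
  normed.cartesian(normed)
    .filter { case ((i, _), (j, _)) => i < j }  // each unordered pair once
    .map { case ((i, a), (j, b)) =>
      ((i, j), a.zip(b).map { case (x, y) => x * y }.sum)  // dot product of unit vectors = cosine
    }
}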

graphframes stateful motif

2017-01-30 Thread geoHeil
Starting out with GraphFrames, I would like to understand stateful motifs better. There is a nice example in the documentation. How can I explicitly return the counts? How could it be extended to count the friends of each vertex with age > 30, and the percentage friendsGreater30 / allFriends
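A sketch of one way to get explicit counts out of a motif search, assuming a GraphFrame g with an age vertex attribute as in the documentation example; the column names and the age threshold are illustrative.

import org.apache.spark.sql.functions._

val friends = g.find("(a)-[e]->(b)")  // every (vertex, friend) pair matched by the motif
val perVertex = friends
  .groupBy(col("a.id").as("id"))
  .agg(
    count("*").as("allFriends"),
    sum(when(col("b.age") > 30, 1).otherwise(0)).as("friendsGreater30"))
  .withColumn("pctGreater30", col("friendsGreater30") / col("allFriends"))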

Re: Examples in graphx

2017-01-30 Thread Ankur Srivastava
The one issue with using Neo4j is that you need to persist the whole graph on one single machine, i.e. you cannot shard the graph. I am not sure what the size of your graph is, but if it is huge, one way to shard could be to use the component id. You can generate component ids by running ConnectedComponents
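For reference, a minimal GraphX sketch of generating the component ids mentioned above, assuming an existing Graph[VD, ED] called graph.

import org.apache.spark.graphx.Graph

// componentOfVertex holds (vertexId, componentId) pairs; the component id can then
// serve as a shard key when persisting subgraphs to separate stores
val componentOfVertex = graph.connectedComponents().vertices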

Re: mapWithState question

2017-01-30 Thread Cody Koeninger
Keep an eye on https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging although it'll likely be a while On Mon, Jan 30, 2017 at 3:41 PM, Tathagata Das wrote: > If you care about the semantics of those writes to Kafka, then you should be > awa

Re: mapWithState question

2017-01-30 Thread shyla deshpande
Thanks. Appreciate your input. On Mon, Jan 30, 2017 at 1:41 PM, Tathagata Das wrote: > If you care about the semantics of those writes to Kafka, then you should > be aware of two things. > 1. There are no transactional writes to Kafka. > 2. So, when tasks get reexecuted due to any failure, your

Re: kafka structured streaming source refuses to read

2017-01-30 Thread Michael Armbrust
Thanks for following up! I've linked the relevant tickets to SPARK-18057 and I targeted it for Spark 2.2. On Sat, Jan 28, 2017 at 10:15 AM, Koert Kuipers wrote: > there was also already an existing spark ticket for this: > SPARK-18779

Re: Tableau BI on Spark SQL

2017-01-30 Thread Jörn Franke
With a lot of data (TB) it is not that good, hence the extraction; otherwise you have to wait every time you do a drag and drop. With the extracts it is better. > On 30 Jan 2017, at 22:59, Mich Talebzadeh wrote: > > Thanks Jorn, > > So Tableau uses its own in-memory representation as I guessed.

Re: Tableau BI on Spark SQL

2017-01-30 Thread Mich Talebzadeh
Thanks Jorn, So Tableau uses its own in-memory representation, as I guessed. Now the question is how the performance is when accessing data in Oracle tables? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Tableau BI on Spark SQL

2017-01-30 Thread Jörn Franke
Depending on the size of the data, I recommend scheduling a regular extract in Tableau. Tableau then converts it to an internal in-memory representation outside of Spark (it can also exist on disk if memory is too small) and uses that within Tableau. Accessing the database directly is not so

Re: mapWithState question

2017-01-30 Thread Tathagata Das
If you care about the semantics of those writes to Kafka, then you should be aware of two things. 1. There are no transactional writes to Kafka. 2. So, when tasks get reexecuted due to any failure, your mapping function will also be reexecuted, and the writes to kafka can happen multiple times. So

Tableau BI on Spark SQL

2017-01-30 Thread Mich Talebzadeh
Hi, Has anyone tried using Tableau on Spark SQL? Specifically, how does Tableau handle the in-memory capabilities of Spark? As I understand it, Tableau uses its own proprietary SQL against, say, Oracle. That is well established. So for each product Tableau will try to use its own version of SQL against that

Saving parquet file in Spark giving error when Encryption at Rest is implemented

2017-01-30 Thread morfious902002
We are using spark 1.6.1 on a CDH 5.5 cluster. The job worked fine with Kerberos but when we implemented Encryption at Rest we ran into the following issue:- Df.write().mode(SaveMode.Append).partitionBy("Partition").parquet(path); I have already tried setting these values with no success :-

Re: Dynamic resource allocation to Spark on Mesos

2017-01-30 Thread Michael Gummelt
On Mon, Jan 30, 2017 at 9:47 AM, Ji Yan wrote: > Tasks begin scheduling as soon as the first executor comes up > > > Thanks all for the clarification. Is this the default behavior of Spark on > Mesos today? I think this is what we are looking for because sometimes a > job can take up lots of reso

Pyspark 2.1.0 weird behavior with repartition

2017-01-30 Thread Blaž Šnuderl
I am loading a simple text file using pyspark. Repartitioning it seems to produce garbage data. I got these results using Spark 2.1 prebuilt for Hadoop 2.7, using the pyspark shell. >>> sc.textFile("outc").collect() [u'a', u'b', u'c', u'd', u'e', u'f', u'g', u'h', u'i', u'j', u'k', u'l'] >>> sc.textFil
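One way to see exactly what each partition holds after the repartition (a diagnostic sketch, not a fix; the file name is taken from the post and the partition count is arbitrary):

val rdd = sc.textFile("outc")
// glom() turns each partition into an array, so the printed lines show
// which records ended up in which partition after the shuffle
rdd.repartition(4).glom().collect().foreach(p => println(p.mkString(",")))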

Re: help!!!----issue with spark-sql type cast from long to longwritable

2017-01-30 Thread Alex
Hi All, If I modify the code as below, the Hive UDF works in spark-sql but it gives different results. Please let me know the difference between the two code snippets below. 1) public Object get(Object name) { int pos = getPos((String) name); if (pos < 0) return null; Str

Re: mapWithState question

2017-01-30 Thread shyla deshpande
Hello TD, your suggestion works great, thanks. I have one more question: I need to write to Kafka from within the mapWithState function. Just wanted to check if this is a bad pattern in any way. Thank you. On Sat, Jan 28, 2017 at 9:14 AM, shyla deshpande wrote: > Thats a great idea. I will try
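A commonly used alternative (a sketch, not necessarily what was suggested in the thread): keep the mapWithState function free of side effects and do the Kafka writes afterwards, one producer per partition, in foreachRDD. The stream name, topic, and broker address are illustrative assumptions.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

stateDStream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // one producer per partition, created on the executor so nothing is serialized
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")  // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    records.foreach(r => producer.send(new ProducerRecord[String, String]("out-topic", r.toString)))
    producer.close()
  }
}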

Re: Dynamic resource allocation to Spark on Mesos

2017-01-30 Thread Ji Yan
> > Tasks begin scheduling as soon as the first executor comes up Thanks all for the clarification. Is this the default behavior of Spark on Mesos today? I think this is what we are looking for because sometimes a job can take up lots of resources and later jobs could not get all the resources th
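Dynamic allocation is not the default; a sketch of the settings usually involved, under the assumption that the Mesos external shuffle service is running on the agents. The executor cap is purely illustrative.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")       // requires the external shuffle service on each agent
  .set("spark.dynamicAllocation.maxExecutors", "20")  // illustrative cap so one job cannot hold the whole cluster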

Re: help!!!----issue with spark-sql type cast from long to longwritable

2017-01-30 Thread Alex
How to debug Hive UDFs?! On Jan 24, 2017 5:29 PM, "Sirisha Cheruvu" wrote: > Hi Team, > > I am trying to keep the below code in the get method and calling that get method in > another Hive UDF, > and running the Hive UDF using the HiveContext.sql procedure.. > > > switch (f) { > case "double" : return (

Re: how to compare two avro format hive tables

2017-01-30 Thread Deepak Sharma
You can use spark-testing-base's RDD comparators. Create two different dataframes from these two Hive tables, convert them to RDDs, and use spark-testing-base's compareRDD. Here is an example of RDD comparison: https://github.com/holdenk/spark-testing-base/wiki/RDDComparisons On Mon, Jan 30, 2017 at 9:
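If pulling in spark-testing-base is not an option, a plain subtract-based sketch does a similar job without the library. The table names are placeholders, rows are compared by their string form (so column order matters), and duplicate-row multiplicity is ignored.

val a = sqlContext.sql("select * from db.table_a").rdd.map(_.toString)
val b = sqlContext.sql("select * from db.table_b").rdd.map(_.toString)
// rows present in one table but not in the other; both counts being 0 is a quick equality check
println(s"only in A: ${a.subtract(b).count()}, only in B: ${b.subtract(a).count()}")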

how to compare two avro format hive tables

2017-01-30 Thread Alex
Hi Team, how can I compare two Avro-format Hive tables to check whether they contain the same data? If I query them with limit 5, they give different results.

Re: userClassPathFirst=true prevents SparkContext to be initialized

2017-01-30 Thread Koert Kuipers
I don't know why this is happening, but I have given up on userClassPathFirst=true. I have seen many weird errors with it and consider it broken. On Jan 30, 2017 05:24, "Roberto Coluccio" wrote: Hello folks, I'm trying to work around an issue with some dependencies by trying to specify at spark-submi

Reason behind mapping of StringType with CLOB nullType

2017-01-30 Thread Amiya Mishra
Hi, I am new to spark-sql. I see the following mapping in JdbcUtils.scala, at line number 125: case StringType => Option(JdbcType("TEXT", java.sql.Types.CLOB)) This says StringType will map to the JDBC database type "TEXT" with a JDBC null type of CLOB, which internally takes 2005
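If the default TEXT/CLOB mapping is a problem for your target database, one documented extension point is registering a custom JdbcDialect. A sketch follows; the URL prefix and the VARCHAR length are assumptions for illustration only.

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

object MyDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:postgresql")  // assumed target DB
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("VARCHAR(4000)", java.sql.Types.VARCHAR))  // instead of TEXT/CLOB
    case _          => None  // fall back to the defaults for everything else
  }
}

// register before any DataFrame JDBC writes so the mapping takes effect
JdbcDialects.registerDialect(MyDialect)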

Re: help!!!----issue with spark-sql type cast from long to longwritable

2017-01-30 Thread Alex
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error: java.lang.Double cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable] Getting this error while running the Hive UDF on Spark, but the UDF works perfectly fine in Hive.. public Object get(Object name) {

Re: DAG Visualization option is missing on Spark Web UI

2017-01-30 Thread Md. Rezaul Karim
Hi Mark, That worked for me! Thanks a million. Regards, _ *Md. Rezaul Karim*, BSc, MSc PhD Researcher, INSIGHT Centre for Data Analytics National University of Ireland, Galway IDA Business Park, Dangan, Galway, Ireland Web: http://www.reza-analytics.eu/index.html <

Pruning decision tree in Spark

2017-01-30 Thread Md. Rezaul Karim
Hi there, Say I have a deep tree that needs to be pruned to create an optimal tree. For example, in R this can be done using the rpart/prune function. Is it possible to prune a Spark MLlib/ML-based decision tree while performing a classification or regression task? Regards, _
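As far as I know there is no rpart-style post-pruning in MLlib/ML; a sketch of the usual substitute is to cross-validate over tree-size parameters so the selected tree is effectively pre-pruned. The training DataFrame (with "label" and "features" columns) and the parameter grids are illustrative assumptions.

import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val dt = new DecisionTreeClassifier()
// tune the parameters that control tree size instead of pruning after the fact
val grid = new ParamGridBuilder()
  .addGrid(dt.maxDepth, Array(3, 5, 7, 10))
  .addGrid(dt.minInfoGain, Array(0.0, 0.01, 0.05))
  .build()
val cv = new CrossValidator()
  .setEstimator(dt)
  .setEvaluator(new MulticlassClassificationEvaluator())
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)
val model = cv.fit(training)  // best tree-size settings chosen by cross-validation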

userClassPathFirst=true prevents SparkContext to be initialized

2017-01-30 Thread Roberto Coluccio
Hello folks, I'm trying to work around an issue with some dependencies by specifying at spark-submit time that I want my (user) classpath to be resolved and taken into account first (ahead of the jars received through the system classpath, which is /data/cloudera/parcels/CDH/jars/). In orde
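For reference, the two properties usually involved (they can equally be passed as --conf at spark-submit time); this is a sketch only, and as Koert notes earlier in this digest, results with them can be erratic.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.driver.userClassPathFirst", "true")    // user jars win on the driver
  .set("spark.executor.userClassPathFirst", "true")  // and on the executors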

ML model to associate search terms with objects and reranking them

2017-01-30 Thread dilip.srid...@unvired.com
Dear Spark ML Community, Is there an ML model to associate 'search terms' with objects (articles, etc.)? I have considered PIO Text Classification and Universal Recommendation variants, but these mainly help categorise or find related items and do not allow associating a 'search term' with an object

Re: Having multiple spark context

2017-01-30 Thread Rohit Verma
Two ways: 1. There is experimental support for this; read https://issues.apache.org/jira/browse/SPARK-2243. I'm afraid you might need to build Spark from source. 2. Use middleware: deploy two apps separately, communicating with your app over messaging/REST. Regards Rohit On Jan 30, 2017

RE: Having multiple spark context

2017-01-30 Thread jasbir.sing
Is there any way in which my application can connect to multiple Spark Clusters? Or is communication between Spark clusters possible? Regards, Jasbir From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Monday, January 30, 2017 1:33 PM To: vincent gromakowski Cc: Rohit Verma ; user@spa

Re: Having multiple spark context

2017-01-30 Thread Mich Talebzadeh
In general, in a single JVM (which is basically running in local mode) you have only one SparkContext. However, you can stop the current SparkContext with sc.stop(). HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
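A minimal sketch of what "stop the current SparkContext" looks like in practice; the app name and master are placeholders, and only one active SparkContext per JVM is supported.

import org.apache.spark.{SparkConf, SparkContext}

sc.stop()  // release the existing context first
val sc2 = new SparkContext(new SparkConf().setAppName("second-app").setMaster("local[*]"))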