Re: How to include book title at "Books" section on Spark website

2016-10-09 Thread Sean Owen
I can add it, just send me the info once it's available. On Sat, Oct 8, 2016 at 7:45 PM Karim, Md. Rezaul < rezaul.ka...@insight-centre.org> wrote: > Hi, > > I am writing a book on machine learning using Spark, which is going to be > published soon. > > Could anyone tell me how to include the tit

Re: Spark SQL Thriftserver with HBase

2016-10-09 Thread Jörn Franke
Cloudera 5.8 ships a very old version of Hive without Tez, but Mich has already provided a good alternative. However, you should check whether it contains a recent version of HBase and Phoenix. That being said, I wonder what dataflow, data model, and analysis you plan to do. Maybe there are

Convert hive sql to spark sql

2016-10-09 Thread Sree Eedupuganti
Hi users, I need to test the performance of this query in Hive and Spark. Can anyone convert this SQL to Spark SQL? Here is the SQL:

SELECT split(DTD.TRAN_RMKS,'/')[0] AS TRAB_RMK1,
       split(DTD.TRAN_RMKS,'/')[1] AS ATM_ID,
       DTD.ACID,
       G.FORACID,
       DTD.REF_NUM,
       DTD.TRAN_ID,
       DTD.TRAN_DATE,
       DTD.VALUE_DATE

Re: Convert hive sql to spark sql

2016-10-09 Thread ayan guha
Have you tried it in Spark? It should work as it is. On Sun, Oct 9, 2016 at 7:55 PM, Sree Eedupuganti wrote: > Hi users i need to test the performance of the query in hive and spark. > Can any one convert these sql to spark sql. Here is the sql. > > > SELECT split(DTD.TRAN_RMKS,'/')[0] AS TRA

Re: How to include book title at "Books" section on Spark website

2016-10-09 Thread Karim, Md. Rezaul
Hi Owen, Thanks so much for the quick response. The book is already available online as an Alpha. It would be great and appreciated if you could add the title to the Spark website. Here is the relevant information about the book: *Title:* Large Scale Machine Learning with Spark *Author:* Md. Rezau

Re: Spark SQL Thriftserver with HBase

2016-10-09 Thread Jörn Franke
Please also keep in mind that Tableau Server can store data in-memory and refresh the in-memory data only when needed. This means you can import data from any source and let your users work only on the in-memory data in Tableau Server. On Sun, Oct 9, 2016 at 9:22 AM, Jörn Franke

Re: Convert hive sql to spark sql

2016-10-09 Thread Mich Talebzadeh
Ayan is correct. In Spark < 2 you can do:

  val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

  scala> var sqltext =
       | """
       | select count(1) from prices
       | """
  sqltext: String = "
  select count(1) from prices
  "

  scala> HiveContext.sql(sqltext).show
  +--------+
  |count(1)|
  +--------+
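For completeness, in Spark 2.x the same query goes through SparkSession, which subsumes HiveContext. A minimal sketch, assuming a Hive metastore containing the prices table from the example above:

  import org.apache.spark.sql.SparkSession

  // enableHiveSupport() makes Hive metastore tables visible to spark.sql
  val spark = SparkSession.builder()
    .appName("hive-query")
    .enableHiveSupport()
    .getOrCreate()

  spark.sql("select count(1) from prices").show()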

java: see logging output in in UI

2016-10-09 Thread Miro Karpis
Hi, I would like to see my debug/error output logged in the Spark web UI. The problem is that in my current setup, running the master locally (and connecting my PC as a worker node), I can see the output in my console (my debug info) but not in stderr. I have tried different setups, Logger, RootLogge
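The per-executor stderr link in the web UI shows whatever log4j writes to System.err, so the usual fix is to route your loggers there in conf/log4j.properties. A minimal sketch, assuming the log4j 1.x that shipped with Spark at the time; the com.mycompany logger name is a placeholder:

  # conf/log4j.properties
  log4j.rootCategory=INFO, console
  log4j.appender.console=org.apache.log4j.ConsoleAppender
  log4j.appender.console.target=System.err
  log4j.appender.console.layout=org.apache.log4j.PatternLayout
  log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
  # raise verbosity for your own package only
  log4j.logger.com.mycompany=DEBUG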

Re: Kafka 0.10 integ offset commit

2016-10-09 Thread Srikanth
I'll probably add this behavior. It's a good balance between not having to rely on another external system just for offset management and reducing duplicates. I was more worried about the underlying framework using the consumer in parallel. Will watch out for ConcurrentModificationException. BTW, the commitQue

Re: Spark SQL Thriftserver with HBase

2016-10-09 Thread Benjamin Kim
Thanks for all the suggestions. It would seem you guys are right about the Tableau side of things. The reports don’t need to be real-time, and they won’t be directly feeding off of the main DMP HBase data. Instead, it’ll be batched to Parquet or Kudu/Impala or even PostgreSQL. I originally thou

Re: Kafka 0.10 integ offset commit

2016-10-09 Thread Cody Koeninger
That's cool, just be aware that all you're affecting is the time between commits, not overall correctness. Good call on the iterator not draining the queue, I'll fix that. On Sun, Oct 9, 2016 at 12:22 PM, Srikanth wrote: > I'll probably add this behavior. It's a good balance between not having t
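For reference, the commit path being discussed is the commitAsync API of the 0.10 direct stream. A minimal sketch of the documented pattern, assuming a spark-shell session with sc in scope; the broker address, group id, and topic are placeholders:

  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka010._

  val ssc = new StreamingContext(sc, Seconds(5))
  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "localhost:9092",
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "example-group",
    "enable.auto.commit" -> (false: java.lang.Boolean))

  val stream = KafkaUtils.createDirectStream[String, String](
    ssc, LocationStrategies.PreferConsistent,
    ConsumerStrategies.Subscribe[String, String](Seq("example-topic"), kafkaParams))

  stream.foreachRDD { rdd =>
    // capture offset ranges before any shuffle repartitions the RDD
    val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    // ... process rdd here ...
    // enqueues the offsets; the actual commit happens on a later batch,
    // which is why this affects time between commits, not correctness
    stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
  }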

when does a Row object have a schema

2016-10-09 Thread Koert Kuipers
The Spark SQL Row trait has a schema that is null by default. When the schema is null, operations that rely on fieldIndex, such as getAs[T](fieldName: String): T, do not work. I noticed that when I convert a DataFrame to RDD[Row], the Row objects do have schemas. Can I rely on this? When can I b
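A minimal sketch of the behavior in question, assuming a spark-shell session; the column names are illustrative:

  import org.apache.spark.sql.Row

  // a Row built by hand carries no schema, so name-based access fails
  val bare = Row(1, "alice")
  // bare.getAs[String]("name")  // throws: fieldIndex on a Row without schema

  // a Row pulled out of a DataFrame carries that DataFrame's schema
  import spark.implicits._
  val df = Seq((1, "alice")).toDF("id", "name")
  df.rdd.first().getAs[String]("name")  // "alice"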

Re: Spark SQL is slower when DataFrame is cache in Memory

2016-10-09 Thread Chin Wei Low
Hi Ishizaki san, Thanks for the reply. So, when I pre-cache the DataFrame, the cache is used during job execution. There are actually 3 events:

1. call res.collect
2. job started
3. job completed

I am concerned about the longer time taken between the 1st and 2nd events. It seems like the
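One way to isolate that gap is to force the cache with a cheap action first, so the timed collect measures only the cached read plus job-submission overhead. A minimal sketch; res and df stand in for the cached DataFrame from the discussion:

  val res = df.cache()
  res.count()  // materializes the cache up front

  val t0 = System.nanoTime()
  res.collect()
  println(s"collect took ${(System.nanoTime() - t0) / 1e6} ms")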

This Exception has been really hard to trace

2016-10-09 Thread kant kodali
I tried spanBy, but it looks like a strange error happens no matter which way I try it, like the one described here for the Java solution: http://qaoverflow.com/question/how-to-use-spanby-in-java/ java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$Serializ

Re: This Exception has been really hard to trace

2016-10-09 Thread Reynold Xin
You should probably check with DataStax, who build the Cassandra connector for Spark. On Sun, Oct 9, 2016 at 8:13 PM, kant kodali wrote: > > I tried SpanBy but look like there is a strange error that happening no > matter which way I try. Like the one here described for Java solution. > > http:/

Re: This Exception has been really hard to trace

2016-10-09 Thread kant kodali
Hi Reynold, Actually, I did that well before posting my question here. Thanks, kant On Sun, Oct 9, 2016 8:48 PM, Reynold Xin r...@databricks.com wrote: You should probably check with DataStax who build the Cassandra connector for Spark. On Sun, Oct 9, 2016 at 8:13 PM, kant kodali wrote:

Re: when does a Row object have a schema

2016-10-09 Thread Divya Gehlot
A value of a row can be accessed through both generic access by ordinal, which incurs boxing overhead for primitives, and native primitive access. An example of generic access by ordinal:

  import org.apache.spark.sql._
  val row = Row(1, true, "a string", null)
  // row: Row = [1,true,a

SPARK-17845 - window function frame boundary API

2016-10-09 Thread Reynold Xin
Hi all, I tried to use the window function DataFrame API this weekend and found it awkward to use, especially with respect to specifying frame boundaries. I wrote down some options here and am curious about your thoughts. If you have suggestions on the API beyond what's already listed in the JIRA ticket
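For context, this is roughly what frame boundaries looked like in the DataFrame API at the time: raw Long arguments, with Long.MinValue and Long.MaxValue standing in for UNBOUNDED PRECEDING/FOLLOWING. A minimal sketch, assuming a spark-shell session; the column names are illustrative:

  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions._
  import spark.implicits._

  val df = Seq(("eng", 100L), ("eng", 200L), ("ops", 50L)).toDF("dept", "salary")

  // running sum from the start of the partition up to the current row
  val w = Window.partitionBy("dept").orderBy("salary")
    .rowsBetween(Long.MinValue, 0)

  df.withColumn("running_sum", sum("salary").over(w)).show()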

Re: SPARK-17845 - window function frame boundary API

2016-10-09 Thread ayan guha
Hi Reynold, Thanks for asking. I am from the SQL world and use Spark SQL with analytical functions pretty heavily. IMHO, Window.rowsBetween() as a function name looks fine. What I would propose would be: Window.rowsBetween(startFrom=UNBOUNDED, endTo=CURRENT_ROW, preceding=0, following=0) startFrom, en

Re: Map with state keys serialization

2016-10-09 Thread Shixiong(Ryan) Zhu
You can use Kryo; StateMap also implements KryoSerializable, which Kryo supports. On Fri, Oct 7, 2016 at 11:39 AM, Joey Echeverria wrote: > Looking at the source code for StateMap[1], which is used by > JavaPairDStream#mapWithState(), it looks like state keys are > serialized using an ObjectOutp
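A minimal sketch of switching a job to Kryo; MyKey is a placeholder for the state-key class:

  import org.apache.spark.SparkConf

  case class MyKey(id: Long)  // placeholder key type

  val conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // optional: registered classes serialize without their full class name
    .registerKryoClasses(Array(classOf[MyKey]))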

Re: This Exception has been really hard to trace

2016-10-09 Thread Shixiong(Ryan) Zhu
It seems the runtime Spark is different from the one you compiled against. You should mark the Spark components as "provided". See https://issues.apache.org/jira/browse/SPARK-9219 On Sun, Oct 9, 2016 at 8:13 PM, kant kodali wrote: > > I tried SpanBy but look like there is a strange error that happening no > matt
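A minimal build.sbt sketch of the "provided" fix; the artifact versions are assumptions and should be matched to your cluster:

  libraryDependencies ++= Seq(
    // "provided": compile against these, but use the cluster's Spark jars at runtime
    "org.apache.spark" %% "spark-core" % "2.0.1" % "provided",
    "org.apache.spark" %% "spark-sql"  % "2.0.1" % "provided",
    "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0-M3"
  )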