How do I read data in dockerized kafka from a spark streaming application

2017-01-06 Thread shyla deshpande
My Kafka runs in a Docker container. How do I read Kafka data in my Spark Streaming app? I also need to write data from Spark Streaming to a Cassandra database, which is in a Docker container as well. I appreciate any help. Thanks.
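
A minimal sketch of one way to wire this up, assuming Spark 2.x with the spark-streaming-kafka-0-10 and spark-cassandra-connector artifacts on the classpath, and assuming the containers publish their ports to the host (e.g. docker run -p 9092:9092 and -p 9042:9042) with Kafka's advertised.listeners set to an address the driver and executors can reach. Topic, keyspace, table, and column names are placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._
    import com.datastax.spark.connector.SomeColumns
    import com.datastax.spark.connector.streaming._

    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1")  // host-mapped Cassandra port
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",  // host-mapped Kafka port
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "demo-group",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist (key, value) pairs into a pre-created Cassandra table.
    stream.map(r => (r.key, r.value))
      .saveToCassandra("demo_ks", "events", SomeColumns("key", "value"))

    ssc.start()
    ssc.awaitTermination()

The usual gotcha with dockerized Kafka is the broker advertising its internal container hostname; pointing advertised.listeners at the host-reachable address fixes it.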

Re: Approach: Incremental data load from HBASE

2017-01-06 Thread Chetan Khatri
Hi Ayan, by incremental load from HBase I mean: a weekly batch job takes rows from an HBase table and dumps them to Hive. The next time the job runs, it should pick up only the newly arrived rows, the same as an incremental load from an RDBMS to Hive with Sqoop using a command like the one below: sqoop job --create myssb1 -
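
One way to implement that weekly delta (a sketch, not from the thread): a time-range scan so only cells written since the last run are read. The table name and bookmark handling here are hypothetical:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat

    val lastRunMillis = 1483228800000L  // hypothetical bookmark persisted by the previous run
    val hconf = HBaseConfiguration.create()
    hconf.set(TableInputFormat.INPUT_TABLE, "events")  // hypothetical table
    hconf.set(TableInputFormat.SCAN_TIMERANGE_START, lastRunMillis.toString)
    hconf.set(TableInputFormat.SCAN_TIMERANGE_END, System.currentTimeMillis.toString)

    // Only rows with cells written inside the window come back.
    val delta = sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

From there, convert the RDD to a DataFrame and append it to the Hive table as usual.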

CompileException with Maps in Spark 2.1.0

2017-01-06 Thread Nils Grabbert
Hi all, the following code will run with Spark 2.0.2 but not with Spark 2.1.0:

    case class Data(id: Int, param: Map[String, InnerData])
    case class InnerData(name: String, value: Int)
    import spark.implicits._
    val e = Data(1, Map("key" -> InnerData("name", 123)))
    val data = Seq(e)
    val d=
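
For context: the truncated final line presumably materializes the Seq as a Dataset, which is what triggers the codegen failure on 2.1.0. A hedged completion (assumed, not from the message), runnable in spark-shell where case classes defined at the prompt work with the implicits:

    val d = data.toDS()  // on 2.1.0 the codegen CompileException surfaces here
    d.show()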

Re: Approach: Incremental data load from HBASE

2017-01-06 Thread ayan guha
IMHO you should not "think" HBase in RDBMS terms, but you can use ColumnFilters to filter out new records. On Fri, Jan 6, 2017 at 7:22 PM, Chetan Khatri wrote: > Hi Ayan, > > By incremental load from HBase I mean: a weekly batch job takes > rows from an HBase table and dumps them to Hive. The
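
A sketch of the ColumnFilter idea, assuming a cell cf:updated_at that stores an epoch-millis string (all names hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Scan
    import org.apache.hadoop.hbase.filter.{CompareFilter, SingleColumnValueFilter}
    import org.apache.hadoop.hbase.mapreduce.{TableInputFormat, TableMapReduceUtil}
    import org.apache.hadoop.hbase.util.Bytes

    val lastRunMillis = 1483228800000L  // hypothetical bookmark
    val scan = new Scan()
    // Keep only rows whose cf:updated_at is greater than the bookmark.
    scan.setFilter(new SingleColumnValueFilter(
      Bytes.toBytes("cf"), Bytes.toBytes("updated_at"),
      CompareFilter.CompareOp.GREATER,
      Bytes.toBytes(lastRunMillis.toString)))

    val hconf = HBaseConfiguration.create()
    hconf.set(TableInputFormat.INPUT_TABLE, "events")
    hconf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan))

Note the comparison is lexicographic on the raw bytes, so zero-padded strings (or Bytes.toBytes on the long itself) are the safer encoding.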

Re: Approach: Incremental data load from HBASE

2017-01-06 Thread Chetan Khatri
Ayan, thanks. Correct, I am not thinking in RDBMS terms; I am wearing NoSQL glasses! On Fri, Jan 6, 2017 at 3:23 PM, ayan guha wrote: > IMHO you should not "think" HBase in RDBMS terms, but you can use > ColumnFilters to filter out new records > > On Fri, Jan 6, 2017 at 7:22 PM, Chetan Khatri > wr

Kafka 0.8 + Spark 2.0 Partition Issue

2017-01-06 Thread Raghu Vadapalli
My Spark 2.0 + Kafka 0.8 streaming job fails with a partition leader-set exception. When I check the Kafka topic partition, it is indeed in error, with Leader = -1 and an empty ISR. I did a lot of googling, and everything points to either restarting or deleting the topic. To do any of those

Re: Kafka 0.8 + Spark 2.0 Partition Issue

2017-01-06 Thread Cody Koeninger
Kafka is designed to only allow reads from leaders. You need to fix this at the Kafka level, not the Spark level. On Fri, Jan 6, 2017 at 7:33 AM, Raghu Vadapalli wrote: > > My Spark 2.0 + Kafka 0.8 streaming job fails with a partition leader-set > exception. When I check the Kafka topic, the p
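
To see which partitions report no leader before fixing them broker-side, here is a sketch against the 0.8 SimpleConsumer metadata API; broker host, port, and topic name are placeholders:

    import kafka.javaapi.TopicMetadataRequest
    import kafka.javaapi.consumer.SimpleConsumer
    import scala.collection.JavaConverters._

    val consumer = new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "leader-check")
    val resp = consumer.send(new TopicMetadataRequest(List("mytopic").asJava))
    for {
      tm <- resp.topicsMetadata.asScala
      pm <- tm.partitionsMetadata.asScala
    } println(s"partition=${pm.partitionId} leader=${Option(pm.leader).map(_.host).getOrElse("NONE (-1)")}")
    consumer.close()

The recovery itself (preferred-leader election, unclean leader election, or bringing the dead broker back) happens on the Kafka side, as Cody says.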

Re: Spark Read from Google store and save in AWS s3

2017-01-06 Thread Steve Loughran
On 5 Jan 2017, at 20:07, Manohar Reddy <manohar.re...@happiestminds.com> wrote: Hi Steve, Thanks for the reply; below is the follow-up help I need from you. Do you mean we can set up two native filesystems on a single SparkContext, so that based on the URL prefixes (gs://bucket/path and dest s3a
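
What is being confirmed here, in sketch form: a single SparkContext resolves filesystems per URL scheme, so with both the GCS connector and hadoop-aws on the classpath one job can read gs:// and write s3a://. Bucket names and credential wiring below are assumptions:

    val hc = spark.sparkContext.hadoopConfiguration
    // Assumed credential wiring; any supported mechanism works.
    hc.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    hc.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
    hc.set("google.cloud.auth.service.account.json.keyfile", "/path/to/key.json")

    val df = spark.read.parquet("gs://source-bucket/path")  // hypothetical buckets
    df.write.parquet("s3a://dest-bucket/path")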

Re: Spark GraphFrame ConnectedComponents

2017-01-06 Thread Steve Loughran
On 5 Jan 2017, at 21:10, Ankur Srivastava <ankur.srivast...@gmail.com> wrote: Yes I did try it out, and it chooses the local file system even though my checkpoint location starts with s3n:// I am not sure how I can make it load the S3FileSystem. set fs.default.name to s3n://whatever , or, in spar
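
A sketch of Steve's two options (bucket name hypothetical); GraphFrames' connected components checkpoints through the directory registered on the SparkContext:

    // Option 1: fully qualified checkpoint URI
    spark.sparkContext.setCheckpointDir("s3n://my-bucket/checkpoints")

    // Option 2: make S3 the default filesystem so bare paths resolve to it
    spark.sparkContext.hadoopConfiguration.set("fs.default.name", "s3n://my-bucket")

    val result = graphFrame.connectedComponents.run()  // graphFrame assumed built earlier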

Spark SQL 1.6.3 ORDER BY and partitions

2017-01-06 Thread Joseph Naegele
I have two separate but similar issues that I've narrowed down to a pretty good level of detail. I'm using Spark 1.6.3, particularly Spark SQL. I'm concerned with a single dataset for now, although the details apply to other, larger datasets. I'll call it "table". It's around 160 M records, ave
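
For background on the subject line: in Spark SQL 1.6.x a global ORDER BY range-partitions the data, and the post-shuffle partition count comes from spark.sql.shuffle.partitions (default 200), not from the input partitioning. A sketch, with the sort column and output path hypothetical:

    // 1.6.x API: tune the shuffle width for a ~160M-row sort
    sqlContext.setConf("spark.sql.shuffle.partitions", "400")
    val sorted = sqlContext.sql("SELECT * FROM table ORDER BY key_col")
    sorted.write.parquet("/tmp/sorted")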