Re: SparkAppHandle.Listener.infoChanged behaviour

2017-06-05 Thread Mohammad Tariq
On Mon, Jun 5, 2017 at 7:24 AM, Marcelo Vanzin wrote: > On Sat, Jun 3, 2017 at 7:16 PM, Mohammad Tariq wrote: > > I am having a bit of difficulty in understanding the exact behaviour of > > SparkAppHandle.Listener.infoCha

SparkAppHandle.Listener.infoChanged behaviour

2017-06-03 Thread Mohammad Tariq
Dear fellow Spark users, I am having a bit of difficulty in understanding the exact behaviour of *SparkAppHandle.Listener.infoChanged(SparkAppHandle handle)* method. The documentation says : *Callback for changes in any information that is not the handle's state.* What exactly is meant by *any i
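In practice the two callbacks split along these lines: stateChanged fires on lifecycle transitions, while infoChanged fires when other handle data changes, most visibly the application ID becoming available. A minimal illustrative listener, assuming nothing beyond the launcher API (the class name is hypothetical):

```java
import org.apache.spark.launcher.SparkAppHandle;

// Illustrative listener (hypothetical class name). stateChanged fires on
// lifecycle transitions; infoChanged fires for other handle data, such as
// the application ID becoming available.
public class LoggingListener implements SparkAppHandle.Listener {
    @Override
    public void stateChanged(SparkAppHandle handle) {
        System.out.println("state -> " + handle.getState());
    }

    @Override
    public void infoChanged(SparkAppHandle handle) {
        System.out.println("info changed, appId = " + handle.getAppId());
    }
}
```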

Application not found in RM

2017-04-17 Thread Mohammad Tariq
Dear fellow Spark users, *Use case :* I have written a small java client which launches multiple Spark jobs through *SparkLauncher* and captures jobs' metrics during the course of the execution. *Issue :* Sometimes the client fails saying - *Caused by: org.apache.hadoop.ipc.RemoteException(org.ap

Intermittent issue while running Spark job through SparkLauncher

2017-03-25 Thread Mohammad Tariq
Dear fellow Spark users, I have a multithreaded Java program which launches multiple Spark jobs in parallel through *SparkLauncher* API. It also monitors these Spark jobs and keeps on updating information like job start/end time, current state, tracking url etc in an audit table. To get these info
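A client like the one described can be sketched as follows; this is a hedged, minimal version in which the jar path, main class and master are placeholders and the audit-table update is reduced to a comment:

```java
import java.util.concurrent.CountDownLatch;
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

// Minimal sketch: launch one job via startApplication() and block until it
// reaches a final state. Jar path and main class are placeholders.
public class LaunchAndMonitor {
    public static void main(String[] args) throws Exception {
        final CountDownLatch done = new CountDownLatch(1);
        SparkAppHandle handle = new SparkLauncher()
                .setAppResource("/path/to/app.jar")   // placeholder
                .setMainClass("com.example.MyJob")    // placeholder
                .setMaster("yarn")
                .startApplication(new SparkAppHandle.Listener() {
                    @Override
                    public void stateChanged(SparkAppHandle h) {
                        // In the real client this would update the audit table.
                        if (h.getState().isFinal()) {
                            done.countDown();
                        }
                    }
                    @Override
                    public void infoChanged(SparkAppHandle h) { }
                });
        done.await();
        System.out.println("Finished in state " + handle.getState());
    }
}
```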

Re: using spark to load a data warehouse in real time

2017-02-28 Thread Mohammad Tariq
On Wed, Mar 1, 2017 at 12:27 AM, Mohammad Ta

Re: using spark to load a data warehouse in real time

2017-02-28 Thread Mohammad Tariq
Hi Adaryl, You could definitely load data into a warehouse via Spark's JDBC support through DataFrames. Could you please explain your use case a bit more? That'll help us in answering your query better.
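For reference, the JDBC write mentioned above might look roughly like this; the URL, table name and credentials are placeholders, and `df` stands for an existing DataFrame:

```java
import java.util.Properties;
import org.apache.spark.sql.SaveMode;

// Hedged sketch: append an existing DataFrame `df` to a warehouse table
// over JDBC. URL, table and credentials are placeholders.
Properties props = new Properties();
props.setProperty("user", "etl_user");      // placeholder
props.setProperty("password", "secret");    // placeholder
df.write().mode(SaveMode.Append)
  .jdbc("jdbc:postgresql://warehouse-host:5432/dw", "fact_events", props);
```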

Re: Need guidelines in Spark Streaming and Kafka integration

2016-11-16 Thread Mohammad Tariq
Hi Karim, Are you looking for something specific? Some information about your use case would be really helpful in order to answer your question. On Wednesday, November 16, 2016, Karim, Md. Rezaul < rezaul.ka...@insight-centre.org> wrote: > Hi All, > > I am completely new with Kafka. I was wonder

Re: Correct SparkLauncher usage

2016-11-10 Thread Mohammad Tariq
unit tests: > https://github.com/apache/spark/blob/a8ea4da8d04c1ed621a96668118f20739145edd2/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala#L164 > > On Thu, Nov 10, 2016 at 3:00 PM, Mohammad Tariq > wrote: > >> All I want to do is submit a job,

Re: Correct SparkLauncher usage

2016-11-10 Thread Mohammad Tariq
On Fri, Nov 11, 2016 at 4:27 AM, Mohammad Tariq wrote: > Yeah, that definitely makes sense. I

Re: Correct SparkLauncher usage

2016-11-10 Thread Mohammad Tariq
Nov 11, 2016 at 4:19 AM, Marcelo Vanzin wrote: > On Thu, Nov 10, 2016 at 2:43 PM, Mohammad Tariq > wrote: > > @Override > > public void stateChanged(SparkAppHandle handle) { > > System.out.println("Spark App Id [" + handle.getAppId() + "]. S

Re: Correct SparkLauncher usage

2016-11-10 Thread Mohammad Tariq
at 5:16 AM, Marcelo Vanzin wrote: > Then you need to look at your logs to figure out why the child app is not > working. "startApplication" will by default redirect the child's output to > the parent's logs. > > On Mon, Nov 7, 2016 at 3:42 PM, Mohammad

Re: Correct SparkLauncher usage

2016-11-07 Thread Mohammad Tariq
On Tue, Nov 8, 2016 at 5:06 AM, Marcelo Vanzin wrote: > On Mon, Nov 7, 2016 at 3:29 PM, Mohammad Tariq wrote: > > I have been trying to use S

Correct SparkLauncher usage

2016-11-07 Thread Mohammad Tariq
Dear fellow Spark users, I have been trying to use *SparkLauncher.startApplication()* to launch a Spark app from within Java code, but have been unable to do so. However, the same piece of code works if I use *SparkLauncher.launch()*. Here are the corresponding code snippets: *SparkAppHandle handle = ne
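For comparison, the two entry points differ mainly in what they hand back; a hedged sketch with placeholder paths:

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

// launch() returns the raw child Process, which the caller must monitor.
Process proc = new SparkLauncher()
        .setAppResource("/path/to/app.jar")   // placeholder
        .setMainClass("com.example.MyJob")    // placeholder
        .launch();
proc.waitFor();

// startApplication() returns a handle with state callbacks, and by default
// redirects the child's output into the parent's logs.
SparkAppHandle handle = new SparkLauncher()
        .setAppResource("/path/to/app.jar")   // placeholder
        .setMainClass("com.example.MyJob")    // placeholder
        .startApplication();
```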

Re: [Error:] viewing Web UI on EMR cluster

2016-09-12 Thread Mohammad Tariq
Hi Divya, Do you have inbound connections enabled on port 50070 of your NN machine? Also, it's a good idea to have the public DNS in your /etc/hosts for proper name resolution.

Re: Is spark-1.6.1-bin-2.6.0 compatible with hive-1.1.0-cdh5.7.1

2016-07-28 Thread Mohammad Tariq
On 28 July 2016 at 12:45, Mohammad Tariq wrote: >> Could anyone please help me with this? I have been using the same version >> of Spark with CDH-5.4.5

Is spark-1.6.1-bin-2.6.0 compatible with hive-1.1.0-cdh5.7.1

2016-07-28 Thread Mohammad Tariq
Could anyone please help me with this? I have been using the same version of Spark with CDH-5.4.5 successfully so far. However after a recent CDH upgrade I'm not able to run the same Spark SQL module against hive-1.1.0-cdh5.7.1. When I try to run my program Spark tries to connect to local derby Hi

Re: Recommended way to push data into HBase through Spark streaming

2016-06-16 Thread Mohammad Tariq
Forgot to add, I'm on HBase 1.0.0-cdh5.4.5, so can't use HBaseContext. And spark version is 1.6.1 On Thu, Jun 16, 2016 at 10:12 PM, Mohammad Tariq wrote: > Hi group, > > I have a

Recommended way to push data into HBase through Spark streaming

2016-06-16 Thread Mohammad Tariq
Hi group, I have a streaming job which reads data from Kafka, performs some computation and pushes the result into HBase. Actually the results are pushed into 3 different HBase tables. So I was wondering what could be the best way to achieve this. Since each executor will open its own HBase conne
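One common answer to the connection question is to open a connection per partition rather than per record; a hedged sketch against the HBase 1.x client API, with the table name and Put-building logic as placeholders and `dstream` standing for the JavaDStream produced from Kafka:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

// Hedged sketch: one HBase connection per partition, reused for all
// records in that partition instead of one connection per record.
dstream.foreachRDD(rdd -> {
    rdd.foreachPartition(records -> {
        Connection conn =
            ConnectionFactory.createConnection(HBaseConfiguration.create());
        Table table = conn.getTable(TableName.valueOf("table1"));  // placeholder
        while (records.hasNext()) {
            Object rec = records.next();
            // table.put(buildPut(rec));  // hypothetical helper
        }
        table.close();
        conn.close();
    });
});
```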

Re: Running Spark in Standalone or local modes

2016-06-11 Thread Mohammad Tariq
Hi Ashok, In local mode all the processes run inside a single JVM, whereas in standalone mode we have separate master and worker processes running in their own JVMs. To quickly test your code from within your IDE you could probably use the local mode. However, to get a real feel of how Spark oper
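The difference shows up directly in the master URL; a hedged sketch:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Local mode: driver, master and executors all run in this JVM,
// here with 4 worker threads -- handy for testing from an IDE.
SparkConf conf = new SparkConf().setAppName("LocalTest").setMaster("local[4]");
// Standalone mode would instead point at a running master daemon:
// conf.setMaster("spark://master-host:7077");   // placeholder host
JavaSparkContext sc = new JavaSparkContext(conf);
```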

DataFrame.foreach(scala.Function1) example

2016-06-10 Thread Mohammad Tariq
Dear fellow Spark users, Could someone please point me to any example showcasing the usage of *DataFrame.foreach(scala.Function1)* in *Java*? *Problem statement :* I am reading data from a Kafka topic, and for each RDD in the DStream I am creating a DataFrame in order to perform some operations. A
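For Spark 1.x, one workable (if verbose) approach in Java is to extend scala.runtime.AbstractFunction1 and make it Serializable so the closure can ship to executors; a hedged sketch, with `df` standing for an existing DataFrame:

```java
import java.io.Serializable;
import org.apache.spark.sql.Row;
import scala.runtime.AbstractFunction1;
import scala.runtime.BoxedUnit;

// Hedged sketch: DataFrame.foreach expects a scala.Function1<Row, BoxedUnit>.
// Serializable is required because the function is shipped to executors.
class PrintRow extends AbstractFunction1<Row, BoxedUnit> implements Serializable {
    @Override
    public BoxedUnit apply(Row row) {
        System.out.println(row);
        return BoxedUnit.UNIT;
    }
}

// Usage, assuming `df` is an existing DataFrame:
// df.foreach(new PrintRow());
```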

Re: Spark streaming readind avro from kafka

2016-06-01 Thread Mohammad Tariq
Hi Neeraj, You might find Kafka-Direct useful. BTW, are you using something like Confluent for your Kafka setup? If that's the case you might leverage the Schema Registry to get hold of the associated schema without additional effor

Recommended way to close resources in a Spark streaming application

2016-05-31 Thread Mohammad Tariq
Dear fellow Spark users, I have a streaming app which is reading data from Kafka, doing some computations and storing the results into HBase. Since I am new to Spark streaming I feel that there could still be scope of making my app better. To begin with, I was wondering what's the best way to fre

Re: Sorting the dataframe

2016-03-04 Thread Mohammad Tariq
You could try DataFrame.sort() to sort your data based on a column. On Fri, Mar 4, 2016 at 1:48 PM, Angel Angel wrote: > hello sir, > > i want to sort the following table as per the *count* > > value count
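For the descending-by-count case in the quoted question, a hedged one-liner, with `df` standing for the questioner's DataFrame:

```java
import org.apache.spark.sql.DataFrame;

// Hedged sketch: sort descending on the "count" column.
DataFrame sorted = df.sort(df.col("count").desc());
```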

Re: Does DataFrame.collect() maintain the underlying schema?

2016-03-02 Thread Mohammad Tariq
wait for comments from the gurus. On Thu, Mar 3, 2016 at 5:35 AM, Mohammad Tariq wrote: > Cool. Here is how it goes... > > I am reading Avro objects from a Kafka topic as a DStream, co

Re: Does DataFrame.collect() maintain the underlying schema?

2016-03-02 Thread Mohammad Tariq
> > Can you tell in brief what kind of operation you have to do? I can try > helping you out with that. > In general, if you are trying to use any group operations you can use > window operations. > > On Wed, Mar 2, 2016 at 6:40 PM, Mohammad Tariq wrote: > >> Hi

Re: Does DataFrame.collect() maintain the underlying schema?

2016-03-02 Thread Mohammad Tariq
. > > On Wed, Mar 2, 2016 at 6:21 PM, Mohammad Tariq wrote: > >> Hi list, >> >> *Scenario :* >> I am creating a DStream by reading an Avro object from a Kafka topic and >> then converting it into a DataFrame to perform some operations on the data. >> I c

Does DataFrame.collect() maintain the underlying schema?

2016-03-02 Thread Mohammad Tariq
Hi list, *Scenario :* I am creating a DStream by reading an Avro object from a Kafka topic and then converting it into a DataFrame to perform some operations on the data. I call DataFrame.collect() and perform the intended operation on each Row of Array[Row] returned by DataFrame.collect(). *Prob

Re: [Spark 1.5.2]: Iterate through Dataframe columns and put it in map

2016-03-02 Thread Mohammad Tariq
Hi Divya, You could call the *collect()* method provided by the DataFrame API. This will give you an *Array[Row]*. You could then iterate over this array and create your map. Something like this : val mapOfVals = scala.collection.mutable.Map[String,String]() var rows = DataFrame.collect() rows.foreach(r
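The same idea in Java, hedged (the column positions are placeholders and `df` is an existing DataFrame):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.Row;

// Hedged sketch: collect to the driver and build a map from two columns.
// Column indices 0 and 1 are placeholders for the key and value columns.
Map<String, String> mapOfVals = new HashMap<String, String>();
for (Row r : df.collect()) {
    mapOfVals.put(r.getString(0), r.getString(1));
}
```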

Re: select * from mytable where column1 in (select max(column1) from mytable)

2016-02-25 Thread Mohammad Tariq
Spark doesn't support subqueries in the WHERE clause, IIRC. It supports subqueries only in the FROM clause as of now. See this ticket for more on this. On Fri, F
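Until WHERE-clause subqueries are supported, the query in the subject can usually be rewritten as a join against the aggregated maximum; a hedged sketch, assuming `sqlContext` and a registered `mytable`:

```java
import org.apache.spark.sql.DataFrame;

// Hedged rewrite: join against the single-row MAX instead of using a
// WHERE-clause subquery.
DataFrame result = sqlContext.sql(
    "SELECT t.* FROM mytable t " +
    "JOIN (SELECT MAX(column1) AS m FROM mytable) x ON t.column1 = x.m");
```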

Re: Access fields by name/index from Avro data read from Kafka through Spark Streaming

2016-02-25 Thread Mohammad Tariq
Not sure though if it's the best way to achieve this. On Fri, Feb 26, 2016 at 5:21 AM, Shixiong(Ryan) Zhu wrote: > You can use `DStream.map` to transform objects to anything you want. > >

Re: Spark SQL support for sub-queries

2016-02-25 Thread Mohammad Tariq
AFAIK, this isn't supported yet. A ticket is in progress though. On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh < mich.talebza...@cloudtechnologypartners.c

Access fields by name/index from Avro data read from Kafka through Spark Streaming

2016-02-25 Thread Mohammad Tariq
Hi group, I have just started working with the Confluent platform and Spark Streaming, and was wondering if it is possible to access individual fields from an Avro object read from a Kafka topic through Spark Streaming. As per its default behaviour *KafkaUtils.createDirectStream[Object, Object, KafkaA

Spark with proxy

2015-09-08 Thread Mohammad Tariq
Hi friends, Is it possible to interact with Amazon S3 using Spark via a proxy? This is what I have been doing : SparkConf conf = new SparkConf().setAppName("MyApp").setMaster("local"); JavaSparkContext sparkContext = new JavaSparkContext(conf); Configuration hadoopConf = sparkCont
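If the s3a connector is available (it ships with newer Hadoop versions), it exposes proxy settings on the Hadoop configuration; a hedged sketch with placeholder host and port:

```java
import org.apache.hadoop.conf.Configuration;

// Hedged sketch: s3a proxy settings (availability depends on the Hadoop
// version in use). Host and port values are placeholders.
Configuration hadoopConf = sparkContext.hadoopConfiguration();
hadoopConf.set("fs.s3a.proxy.host", "proxy.example.com");  // placeholder
hadoopConf.set("fs.s3a.proxy.port", "8080");               // placeholder
```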

Re: DataFrame insertIntoJDBC parallelism while writing data into a DB table

2015-06-16 Thread Mohammad Tariq
I would really appreciate if someone could help me with this. On Monday, June 15, 2015, Mohammad Tariq wrote: > Hello list, > > The method *insertIntoJDBC(url: String, table: String, overwrite: > Boolean)* provided by Spark DataFrame allows us to copy a DataFrame into > a JDBC DB

DataFrame insertIntoJDBC parallelism while writing data into a DB table

2015-06-15 Thread Mohammad Tariq
Hello list, The method *insertIntoJDBC(url: String, table: String, overwrite: Boolean)* provided by Spark DataFrame allows us to copy a DataFrame into a JDBC DB table. Similar functionality is provided by the *createJDBCTable(url: String, table: String, allowExisting: Boolean) *method. But if you
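For reference, a hedged usage sketch of the method described above, with a placeholder URL and table and `df` standing for an existing DataFrame:

```java
// Hedged sketch of the Spark 1.3-era API: copy `df` into a JDBC table
// without overwriting it. URL and table name are placeholders.
df.insertIntoJDBC("jdbc:mysql://db-host:3306/mydb", "target_table", false);
```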

Transactional guarantee while saving DataFrame into a DB

2015-06-02 Thread Mohammad Tariq
Hi list, With the help of Spark DataFrame API we can save a DataFrame into a database table through insertIntoJDBC() call. However, I could not find any info about how it handles the transactional guarantee. What if my program gets killed during the processing? Would it end up in partial load? Is

Re: Forbidden : Error Code: 403

2015-05-18 Thread Mohammad Tariq
On Sun, May 17, 2015 at 8:51 PM, Akhil Das wrote: > I think you can try this way also: > > DataFrame df = > sqlContext.load("s3n://ACCESS-KEY:SECRET-KEY@bucket-name/file.avro", > "com.databricks.spark.avro&

Re: Forbidden : Error Code: 403

2015-05-15 Thread Mohammad Tariq
Thanks for the suggestion Steve. I'll try that out. Read the long story last night while struggling with this :). I made sure that I don't have any '/' in my key. On Saturday, May 16, 2015, Steve Loughran wrote: > > > On 15 May 2015, at 21:20, Mohammad Tariq &g

Re: Forbidden : Error Code: 403

2015-05-15 Thread Mohammad Tariq
m bucket-name without > using Spark ? > > Seems like permission issue. > > Cheers > > > > On May 15, 2015, at 5:09 AM, Mohammad Tariq > wrote: > > Hello list, > > *Scenario : *I am trying to read an Avro file stored in S3 and create a > DataFrame out of i

Forbidden : Error Code: 403

2015-05-15 Thread Mohammad Tariq
Hello list, *Scenario : *I am trying to read an Avro file stored in S3 and create a DataFrame out of it using *Spark-Avro* library, but unable to do so. This is the code which I am using : public class S3DataFrame { public static void main(String[] args
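For context, the 1.3-era load call with the spark-avro package looks roughly like this (bucket and path are placeholders):

```java
import org.apache.spark.sql.DataFrame;

// Hedged sketch: load an Avro file from S3 via the spark-avro data source.
// Bucket name and path are placeholders; credentials would normally come
// from the Hadoop configuration rather than being embedded in the URL.
DataFrame df = sqlContext.load("s3n://bucket-name/path/file.avro",
        "com.databricks.spark.avro");
```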

NullPointerException while creating DataFrame from an S3 Avro Object

2015-05-13 Thread Mohammad Tariq
Hi List, I have just started using Spark and trying to create DataFrame from an Avro file stored in Amazon S3. I am using *Spark-Avro* library for this. The code which I'm using is shown below. Nothing fancy, just the basic prototype as shown on the Spark