Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ashish Soni
When you say the driver is running on Mesos, can you explain how you are doing that? > On Mar 10, 2016, at 4:44 PM, Eran Chinthaka Withana > wrote: > > Yanling, I'm already running the driver on Mesos (through Docker). FYI, I'm > running this in cluster mode with MesosClusterDispatcher. > > Mac (c
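
A minimal sketch of what running the driver on Mesos through the MesosClusterDispatcher typically looks like: the dispatcher is started once against the Mesos master, and spark-submit targets it in cluster mode. Host names, ports, and the jar URL below are placeholders, and the jar must be reachable from inside the cluster (e.g. over HTTP or HDFS).

    # start the dispatcher once; it registers as a framework with Mesos
    ./sbin/start-mesos-dispatcher.sh --master mesos://mesos-master:5050

    # submit the driver so it runs inside the cluster
    ./bin/spark-submit \
      --master mesos://dispatcher-host:7077 \
      --deploy-mode cluster \
      --class org.apache.spark.examples.SparkPi \
      http://repo-host/jars/spark-examples.jar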

Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ashish Soni
ark home? > > Tim > > > On Mar 10, 2016, at 3:11 AM, Ashish Soni wrote: > > You need to install Spark on each Mesos slave, and then, when starting the > container, set the workdir to your Spark home so that it can find the Spark > classes. > > Ashish > > On Mar 1

Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ashish Soni
You need to install Spark on each Mesos slave, and then, when starting the container, set the workdir to your Spark home so that it can find the Spark classes. Ashish > On Mar 10, 2016, at 5:22 AM, Guillaume Eynard Bontemps > wrote: > > For an answer to my question see this: > http://stackoverflow.co
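
A short sketch of the suggestion above, assuming Spark is installed under /opt/spark on each slave (the path, image name, and command are placeholders): mount the host's Spark install into the container and make it the working directory with docker's -w flag.

    docker run -v /opt/spark:/opt/spark -w /opt/spark my-spark-image <command>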

Looking for Collaborator - Boston ( Spark Training )

2016-03-05 Thread Ashish Soni
Hi All, I am developing a detailed, highly technical course on Spark (beyond word count) and am looking for a partner. Let me know if anyone is interested. Ashish

Re: Spark 1.5 on Mesos

2016-03-04 Thread Ashish Soni
> chroot. > > Can you try mounting in a volume from the host when you launch the slave > for your slave's workdir? > docker run -v /tmp/mesos/slave:/tmp/mesos/slave mesos_image mesos-slave > --work_dir=/tmp/mesos/slave .... > > Tim > > On Thu, Mar 3, 2016 at 4:

Re: Spark 1.5 on Mesos

2016-03-03 Thread Ashish Soni
hed by the cluster > dispatcher, that shows you the spark-submit command it eventually ran? > > > Tim > > > > On Wed, Mar 2, 2016 at 5:42 PM, Ashish Soni wrote: > >> See below and Attached the Dockerfile to build the spark image ( >> between i just upgraded

Re: Spark 1.5 on Mesos

2016-03-02 Thread Ashish Soni
>> >> On Wed, Mar 2, 2016 at 2:28 PM, Charles Allen < >> charles.al...@metamarkets.com> wrote: >> >>> Re: Spark on Mesos Warning regarding disk space: >>> https://issues.apache.org/jira/browse/SPARK-12330 >>> >>> That's a s

Re: Spark 1.5 on Mesos

2016-03-02 Thread Ashish Soni
Vairavelu < vsathishkuma...@gmail.com> wrote: > Try passing the jar using the --jars option > > On Wed, Mar 2, 2016 at 10:17 AM Ashish Soni wrote: > >> I made some progress, but now I am stuck at this point. Please help, as it >> looks like I am close to getting it working >

Re: Spark 1.5 on Mesos

2016-03-02 Thread Ashish Soni
what the problem is? > > Tim > > On Mar 1, 2016, at 8:05 AM, Ashish Soni wrote: > > Not sure what the issue is, but I am getting the error below when I try to run > the Spark Pi example > > Blacklisting Mesos slave value: "5345asdasdasdkas234234asdasdasdasd&

Spark Submit - Convert to Marathon REST API

2016-03-01 Thread Ashish Soni
Hi All, Can someone please help me translate the spark-submit invocation below into a Marathon JSON request? docker run -it --rm -e SPARK_MASTER="mesos://10.0.2.15:5050" -e SPARK_IMAGE="spark_driver:latest" spark_driver:latest /opt/spark/bin/spark-submit --name "PI Example" --class org.apache.spark.exam
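
A rough sketch of an equivalent Marathon app definition (POSTed to /v2/apps). The id and resource numbers are invented, and the truncated class name is completed as an assumption based on the "PI Example" app name; Marathon runs cmd inside the container, so the spark-submit line carries over mostly unchanged.

    {
      "id": "/spark-pi-driver",
      "cpus": 1,
      "mem": 1024,
      "instances": 1,
      "container": {
        "type": "DOCKER",
        "docker": { "image": "spark_driver:latest", "network": "HOST" }
      },
      "env": {
        "SPARK_MASTER": "mesos://10.0.2.15:5050",
        "SPARK_IMAGE": "spark_driver:latest"
      },
      "cmd": "/opt/spark/bin/spark-submit --name 'PI Example' --class org.apache.spark.examples.SparkPi ..."
    }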

Re: Spark 1.5 on Mesos

2016-03-01 Thread Ashish Soni
Check your Mesos UI to see if the Spark application shows up in the > Frameworks tab > > On Mon, Feb 29, 2016 at 12:23 PM Ashish Soni > wrote: > >> What is the best practice? I have everything running as Docker containers >> on a single host (Mesos and Marathon also as Docker containe

Re: Spark 1.5 on Mesos

2016-02-29 Thread Ashish Soni
) > and Mesos will automatically launch docker containers for you. > > Tim > > On Mon, Feb 29, 2016 at 7:36 AM, Ashish Soni > wrote: > >> Yes, I read that, but there are not many details there. >> >> Is it true that we need to have Spark installed on each Mesos docker >>

Re: Spark 1.5 on Mesos

2016-02-29 Thread Ashish Soni
source, what problems were you running into? > > Tim > > On Fri, Feb 26, 2016 at 11:06 AM, Yin Yang wrote: > >> Have you read this ? >> https://spark.apache.org/docs/latest/running-on-mesos.html >> >> On Fri, Feb 26, 2016 at 11:03 AM, Ashish Soni >>

Spark 1.5 on Mesos

2016-02-26 Thread Ashish Soni
Hi All, Is there any proper documentation on how to run Spark on Mesos? I have been trying for the last few days and am not able to make it work. Please help. Ashish
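
For reference, a minimal client-mode submission against a Mesos master, following the running-on-mesos documentation; the host names, tarball location, and jar path are placeholders. Executors fetch the Spark distribution from spark.executor.uri.

    ./bin/spark-submit \
      --master mesos://mesos-master:5050 \
      --conf spark.executor.uri=hdfs://namenode/dist/spark-1.5.0-bin-hadoop2.6.tgz \
      --class org.apache.spark.examples.SparkPi \
      /path/to/spark-examples.jar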

Communication between two spark streaming Job

2016-02-19 Thread Ashish Soni
Hi, Is there any way we can communicate across two different Spark Streaming jobs? The scenario: we have two Spark Streaming jobs, one to process metadata and one to process the actual data (which needs that metadata). So if someone updates the metadata, we need to update the cache maintained

SPARK-9559

2016-02-18 Thread Ashish Soni
Hi All, Just wanted to know if there is any workaround or resolution for the issue below in standalone mode: https://issues.apache.org/jira/browse/SPARK-9559 Ashish

Separate Log4j.xml for Spark and Application JAR (Application vs Spark)

2016-02-12 Thread Ashish Soni
Hi All, To the best of my understanding, we can have only one log4j configuration for both Spark and the application, since whichever comes first on the classpath takes precedence. Is there any way we can keep one in the application and one in the Spark conf folder? Is that possible? Thanks
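
One commonly used workaround, sketched here with a hypothetical file name: ship an application-specific log4j file with --files and point the executor JVMs at it via extraJavaOptions, leaving Spark's own conf/log4j.properties in place. (In client mode the driver needs a local path rather than the shipped copy.)

    spark-submit \
      --files log4j-app.xml \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-app.xml" \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-app.xml" \
      --class com.example.Main app.jar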

Re: Spark Submit

2016-02-12 Thread Ashish Soni
> spark-submit --conf "spark.executor.memory=512m" --conf > "spark.executor.extraJavaOptions=x" --conf "Dlog4j.configuration=log4j.xml" > > Sent from Samsung Mobile. > > -------- Original message -------- > From: Ted Yu > Date: 12/02/2016

Spark Submit

2016-02-12 Thread Ashish Soni
Hi All, How do I pass multiple configuration parameters with spark-submit? Please help; I am trying as below: spark-submit --conf "spark.executor.memory=512m spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.xml" Thanks,
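
As the reply in this thread shows, each property needs its own --conf flag; a single --conf takes exactly one key=value pair. A corrected sketch (class and jar names are placeholders):

    spark-submit \
      --conf "spark.executor.memory=512m" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.xml" \
      --class com.example.Main app.jar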

Example of onEnvironmentUpdate Listener

2016-02-08 Thread Ashish Soni
Are there any examples of how to implement the onEnvironmentUpdate method for a custom listener? Thanks,
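
A minimal sketch, assuming Spark 1.x, where org.apache.spark.JavaSparkListener provides no-op defaults for every callback so only the method of interest needs overriding:

    import org.apache.spark.JavaSparkListener;
    import org.apache.spark.scheduler.SparkListenerEnvironmentUpdate;

    public class EnvUpdateListener extends JavaSparkListener {
        @Override
        public void onEnvironmentUpdate(SparkListenerEnvironmentUpdate update) {
            // environmentDetails() groups JVM, Spark, and classpath properties as a Scala Map
            System.out.println("Environment updated: " + update.environmentDetails());
        }
    }

    // registration on the underlying SparkContext:
    // jsc.sc().addSparkListener(new EnvUpdateListener());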

Dynamically Change Log Level Spark Streaming

2016-02-08 Thread Ashish Soni
Hi All, How do I change the log level for a running Spark Streaming job? Any help will be appreciated. Thanks,
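
There is no single switch for an already-running job, but a sketch of the usual two-part approach (assuming Spark 1.4+, Java 8, and the bundled log4j): setLogLevel covers the driver, and a throwaway job applies the change on the executor JVMs it reaches.

    import java.util.Arrays;
    import org.apache.log4j.Level;
    import org.apache.log4j.LogManager;

    // driver side
    jssc.sparkContext().setLogLevel("WARN");

    // executor side: run a small job so each partition's JVM applies the change
    jssc.sparkContext().parallelize(Arrays.asList(1, 2, 3, 4), 4)
        .foreachPartition(it -> LogManager.getRootLogger().setLevel(Level.WARN));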

Redirect Spark Logs to Kafka

2016-02-01 Thread Ashish Soni
Hi All, Please let me know how we can redirect Spark's log files, or tell Spark to log to a Kafka queue instead of to files. Ashish
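
Spark logs through log4j, so one option is to route the root logger to a Kafka appender in conf/log4j.properties. A sketch, assuming the kafka-log4j-appender artifact is on every node's classpath; broker and topic names are placeholders:

    log4j.rootLogger=INFO, kafka
    log4j.appender.kafka=org.apache.kafka.log4jappender.KafkaLog4jAppender
    log4j.appender.kafka.brokerList=broker1:9092
    log4j.appender.kafka.topic=spark-logs
    log4j.appender.kafka.layout=org.apache.log4j.PatternLayout
    log4j.appender.kafka.layout.ConversionPattern=%d %p %c - %m%n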

Re: Determine Topic MetaData Spark Streaming Job

2016-01-25 Thread Ashish Soni
> In that case, use the simpler constructor that takes the kafka config and > the topics. Let it figure out the offsets (it will contact kafka and > request the partitions for the topics provided) > > KafkaUtils.createDirectStream[...](ssc, kafkaConfig, topics) > > -kr, Gerard > &

Determine Topic MetaData Spark Streaming Job

2016-01-25 Thread Ashish Soni
Hi All, What is the best way to tell a Spark Streaming job the number of partitions for a given topic: should that be provided as a parameter or command-line argument, or should we connect to Kafka in the driver program and query it? Map fromOffsets = new HashMap(); fromOffsets.put(new TopicAndPa
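
Following the reply above, a sketch of the simpler overload that skips explicit offsets entirely (Spark 1.x Kafka direct API; broker and topic names are placeholders). Spark contacts Kafka and discovers the partitions itself.

    import java.util.*;
    import kafka.serializer.StringDecoder;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    Map<String, String> kafkaParams = new HashMap<>();
    kafkaParams.put("metadata.broker.list", "broker1:9092");
    Set<String> topics = new HashSet<>(Arrays.asList("events"));

    JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
        jssc, String.class, String.class,
        StringDecoder.class, StringDecoder.class,
        kafkaParams, topics);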

How to Change the Number of Cores Assigned to a Submitted Job

2016-01-12 Thread Ashish Soni
Hi, I see strange behavior when creating a standalone Spark container using Docker. Not sure why, but by default it assigns 4 cores to the first job submitted, and then all the other jobs are in the waiting state. Please suggest if there is a setting to change this; I tried --executor-cores 1 but it
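
For context: a standalone master gives an application every available core by default, which starves later submissions. Capping each application is usually done with spark.cores.max (a sketch; the values are illustrative):

    spark-submit --conf spark.cores.max=1 ...
    # or, equivalently, on standalone and Mesos:
    spark-submit --total-executor-cores 1 ...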

Re: question on making multiple external calls within each partition

2015-10-05 Thread Ashish Soni
Need more details, but you might want to filter the data first (create multiple RDDs) and then process. > On Oct 5, 2015, at 8:35 PM, Chen Song wrote: > > We have a use case with the following design in Spark Streaming. > > Within each batch, > * data is read and partitioned by some key > * fo

Re: DStream Transformation to save JSON in Cassandra 2.1

2015-10-05 Thread Ashish Soni
Try this: you can use dstream.map to convert it to a JavaDStream carrying only the data you are interested in, probably returning a POJO of your JSON, and then call foreachRDD and inside it call the line below: javaFunctions(rdd).writerBuilder("keyspace", "table", mapToRow(Class.class)).saveToCassandra(); On Mo

Re: Spark Streaming Log4j Inside Eclipse

2015-09-29 Thread Ashish Soni
gLevel to set the log level in your > codes. > > Best Regards, > Shixiong Zhu > > 2015-09-28 22:55 GMT+08:00 Ashish Soni : > >> I am not running it using spark-submit; I am running it locally inside >> the Eclipse IDE. How do I set this using Java code? >> >> Ashish

Re: Spark Streaming Log4j Inside Eclipse

2015-09-28 Thread Ashish Soni
38/how-to-override-sparks-log4j-properties-per-driver > > From: Ashish Soni > Date: Monday, September 28, 2015 at 5:18 PM > To: user > Subject: Spark Streaming Log4j Inside Eclipse > > I need to turn off the verbose logging of Spark Streaming code when I am > running

Spark Streaming Log4j Inside Eclipse

2015-09-28 Thread Ashish Soni
Hi All, I need to turn off the verbose logging of Spark Streaming code when I am running inside Eclipse. I tried creating a log4j.properties file and placed it inside /src/main/resources, but I do not see it having any effect. Please help, as I am not sure what else needs to be done to change the log at D
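
When the classpath route misbehaves inside the IDE, a programmatic fallback is often the quickest fix: a sketch using the log4j API directly, run before the streaming context is created.

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    // silence Spark's internals before creating the context
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN);
    Logger.getLogger("org.apache.spark.streaming").setLevel(Level.ERROR);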

Spark Streaming and Kafka MultiNode Setup - Data Locality

2015-09-21 Thread Ashish Soni
Hi All, Just wanted to find out if there are benefits to installing Kafka brokers and Spark nodes on the same machines. Is it possible for Spark to pull data from Kafka locally, i.e. when the broker or partition is on the same machine? Thanks, Ashish

Spark Cassandra Filtering

2015-09-16 Thread Ashish Soni
Hi, How can I pass a dynamic value to the filter function below instead of a hardcoded one? I have an existing RDD and would like to use its data for the filter, so instead of doing .where("name=?","Anna") I want to do .where("name=?",someobject.value). Please help. JavaRDD rdd3 = javaFunctions(sc
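
The where clause already takes bind parameters, so any runtime value works. A sketch with the spark-cassandra-connector Java API; the keyspace, table, User class, and accessor are placeholders:

    import static com.datastax.spark.connector.japi.CassandraJavaUtil.*;

    String name = someObject.getValue();  // hypothetical accessor for the dynamic value
    JavaRDD<User> rows = javaFunctions(sc)
        .cassandraTable("ks", "users", mapRowTo(User.class))
        .where("name = ?", name);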

Dynamic Workflow Execution using Spark

2015-09-15 Thread Ashish Soni
Hi All, Are there any frameworks that can be used to execute workflows within Spark? Or is it possible to use the ML Pipeline for workflow execution without doing ML? Thanks, Ashish

Re: FlatMap Explanation

2015-09-03 Thread Ashish Soni
> > > > Flat map concatenates the results, so you get > > 1,2,3, 2,3, 3,3 > > You should get the same with any Scala collection > > Cheers > > *From:* Ashish Soni [mailto:asoni.le...@gmail.com] > *Sent:* Thursday, Se

FlatMap Explanation

2015-09-02 Thread Ashish Soni
Hi, Can someone please explain the output of the flatMap below? Input RDD: {1, 2, 3, 3} rdd.flatMap(x => x.to(3)) Output: {1, 2, 3, 2, 3, 3, 3} I am not able to understand how the output came out as above. Thanks,
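
To spell it out: x.to(3) builds the range from x up to 3, and flatMap concatenates the per-element results, so 1 → (1,2,3), 2 → (2,3), 3 → (3), 3 → (3). The same thing in the Java API (a sketch; Spark 1.x, where FlatMapFunction returns an Iterable):

    import java.util.*;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.FlatMapFunction;

    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 3));
    JavaRDD<Integer> out = rdd.flatMap(new FlatMapFunction<Integer, Integer>() {
        @Override
        public Iterable<Integer> call(Integer x) {
            List<Integer> range = new ArrayList<>();
            for (int i = x; i <= 3; i++) range.add(i);  // Scala's x.to(3)
            return range;
        }
    });
    // out contains [1, 2, 3, 2, 3, 3, 3]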

Java Streaming Context - File Stream use

2015-08-10 Thread Ashish Soni
Please help, as I am not sure what is incorrect with the code below; it gives me a compilation error in Eclipse: SparkConf sparkConf = new SparkConf().setMaster("local[4]").setAppName("JavaDirectKafkaWordCount"); JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Duratio
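
For comparison, a complete construction that compiles (a sketch; the most likely culprit in the truncated code is a missing Durations import or batch-interval argument). The watched directory is a placeholder.

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    SparkConf sparkConf = new SparkConf()
        .setMaster("local[4]").setAppName("JavaDirectKafkaWordCount");
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(10));
    JavaDStream<String> lines = jssc.textFileStream("/tmp/input");  // watches for new files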

spark.files.userClassPathFirst=true Return Error - Please help

2015-07-22 Thread Ashish Soni
Hi All, I am getting the error below when I use the --conf spark.files.userClassPathFirst=true parameter: Job aborted due to stage failure: Task 3 in stage 0.0 failed 4 times, most recent failure: Lost task 3.3 in stage 0.0 (TID 32, 10.200.37.161): java.lang.ClassCastException: cannot assign instance

Class Loading Issue - Spark Assembly and Application Provided

2015-07-21 Thread Ashish Soni
Hi All, I am having a class-loading issue: the Spark assembly uses Google Guice internally, and one of the JARs I am using depends on sisu-guice-3.1.0-no_aop.jar. How do I make my classes load first, so that this doesn't result in an error, and tell Spark to load its assembly afterwards? Ashish

XML Parsing

2015-07-19 Thread Ashish Soni
Hi All, I have an XML file with the same tag repeated multiple times, as below. Please suggest the best way to process this data inside Spark: how can I extract each opening and closing tag and process them, or how can I combine multiple lines into a single line? ... .. .. Thanks,
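
One approach that sidesteps line-boundary problems, assuming each file fits in memory: read whole files and split on the repeated element. A sketch; the <record> tag is invented, since the sample XML did not survive, and real code would use a proper XML parser.

    import java.util.*;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.FlatMapFunction;
    import scala.Tuple2;

    // each element of wholeTextFiles is (path, entire file contents)
    JavaRDD<String> records = sc.wholeTextFiles("/data/xml")
        .flatMap(new FlatMapFunction<Tuple2<String, String>, String>() {
            @Override
            public Iterable<String> call(Tuple2<String, String> file) {
                List<String> recs = new ArrayList<>();
                for (String part : file._2().split("(?=<record>)")) {  // keep the delimiter
                    if (part.contains("</record>")) recs.add(part.trim());
                }
                return recs;
            }
        });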

Broadcast on Interval (e.g. every 10 min)

2015-07-16 Thread Ashish Soni
Hi All, How can I broadcast a data change to all the executors every 10 minutes, or every minute? Ashish
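
Broadcast variables are immutable, so the usual pattern, sketched below, is to rebroadcast periodically from the driver (e.g. inside foreachRDD in a streaming job). loadReferenceData() is a hypothetical helper, and the reference must be kept somewhere the task closures can see the latest copy (e.g. a holder object).

    // driver-side state
    Broadcast<Map<String, String>> ref = jsc.broadcast(loadReferenceData());
    long lastRefresh = System.currentTimeMillis();

    // run inside foreachRDD: refresh every 10 minutes, then use ref.value() in tasks
    if (System.currentTimeMillis() - lastRefresh > 10 * 60 * 1000) {
        ref.unpersist();                           // drop the stale copy on executors
        ref = jsc.broadcast(loadReferenceData());  // ship the fresh data
        lastRefresh = System.currentTimeMillis();
    }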

How Will Spark Execute the Code Below - Driver and Executors

2015-07-06 Thread Ashish Soni
Hi All, It would be a great help if someone could explain which portion of the code below gets executed on the driver and which portion on the executors. I have to load data from 10 tables and then use that data in various manipulations, and I am using Spark SQL f
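
As a rule of thumb (a generic sketch, since the original code is cut off): building the query plan happens on the driver, and only the function bodies passed into transformations run on the executors.

    DataFrame df = sqlContext.read().format("jdbc").options(options).load();  // driver plans this
    DataFrame filtered = df.filter("amount > 100");                           // driver builds the plan
    // the lambda below is serialized and executed on executors, partition by partition
    JavaRDD<String> names = filtered.javaRDD().map(row -> row.getString(0));
    names.collect();  // action: executors compute, results return to the driver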

Spark SQL and Streaming - How to execute JDBC Query only once

2015-07-02 Thread Ashish Soni
Hi All, I have a stream of events coming in, and I want to fetch some additional data from the database based on the values in the incoming data. For example, the incoming data contains loginName, email address, city. Now, for each login name, I need to go to an Oracle database and get the userId from the
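
If the lookup table is small and changes rarely, one standard answer is to run the JDBC query once on the driver and broadcast the result, instead of querying per record or per batch. A sketch; the column order, the getLoginName() accessor, and the events stream are invented:

    // driver: load once and broadcast
    DataFrame users = sqlContext.read().format("jdbc").options(options).load();
    Map<String, Long> loginToId = new HashMap<>();
    for (Row r : users.collectAsList()) {
        loginToId.put(r.getString(0), r.getLong(1));  // assumes (login_name, user_id) columns
    }
    final Broadcast<Map<String, Long>> lookup = jsc.broadcast(loginToId);

    // executors: pure in-memory lookup inside the stream
    events.map(e -> lookup.value().get(e.getLoginName()));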

Re: DataFrame Filter Inside Another Data Frame Map

2015-07-01 Thread Ashish Soni
dd inside another rdd.map function... > An RDD object is not serializable. Whatever objects you use inside a map > function should be serializable, as they get transferred to executor nodes. > On Jul 2, 2015 6:13 AM, "Ashish Soni" wrote: > >> Hi All, >> >> I am not

DataFrame Find/Filter Based on Input - Inside Map function

2015-07-01 Thread Ashish Soni
Hi All, I have a DataFrame created as below: options.put("dbtable", "(select * from user) as account"); DataFrame accountRdd = sqlContext.read().format("jdbc").options(options).load(); and I have another RDD that contains the login name, and I want to find the userId from the DF above and r

DataFrame Filter Inside Another Data Frame Map

2015-07-01 Thread Ashish Soni
Hi All, I am not sure what is wrong with the code below, as it gives the error below when I access the DataFrame inside the map, but it works outside: JavaRDD rdd2 = rdd.map(new Function() { @Override public Charge call(Charge ch) throws Exception { * DataFrame df = accountR
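
As the reply explains, a DataFrame, like an RDD, lives on the driver and cannot be referenced inside a task. Two standard rewrites, sketched with guessed column names: collect-and-broadcast when the lookup table is small (as in the JDBC thread above), or express the per-record lookup as a join:

    DataFrame charges = sqlContext.createDataFrame(rdd, Charge.class);
    DataFrame enriched = charges.join(accountRdd,
        charges.col("loginName").equalTo(accountRdd.col("loginName")));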

Broadcast Multiple DataFrames (JDBC Tables)

2015-07-01 Thread Ashish Soni
Hi, I need to load 10 tables in memory and have them available to all the workers. Please let me know the best way to broadcast them, since sc.broadcast(df) allows only one at a time. Thanks,

Convert CSV lines to List of Objects

2015-07-01 Thread Ashish Soni
Hi, How can I use the map function in Java to convert all the lines of a CSV file into a list of objects? Can someone please help... JavaRDD<List<String>> rdd = sc.textFile("data.csv").map(new Function<String, List<String>>() { @Override public List<String> call(String s) { } }); Thanks,
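
A filled-in sketch that maps each CSV line to a POJO, so downstream code gets typed objects; the Person class and its two-column layout are assumptions:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;

    JavaRDD<Person> people = sc.textFile("data.csv").map(new Function<String, Person>() {
        @Override
        public Person call(String line) {
            String[] f = line.split(",");  // naive split; quoted fields need a real CSV parser
            return new Person(f[0], Integer.parseInt(f[1].trim()));
        }
    });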

Load Multiple DB Table - Spark SQL

2015-06-29 Thread Ashish Soni
Hi All, What is the best possible way to load multiple database tables using Spark SQL? Map options = new HashMap<>(); options.put("driver", MYSQLDR); options.put("url", MYSQL_CN_URL); options.put("dbtable", "(select * from courses)"); *Can I add multiple tables to the options map? options.put("dbtable1"
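
The options map takes a single dbtable, so the standard pattern is one read per table; each load returns an independent DataFrame that can be registered or joined. A sketch, with a hypothetical second table:

    Map<String, String> base = new HashMap<>();
    base.put("driver", MYSQLDR);
    base.put("url", MYSQL_CN_URL);

    Map<String, String> courseOpts = new HashMap<>(base);
    courseOpts.put("dbtable", "(select * from courses) as courses");
    DataFrame courses = sqlContext.read().format("jdbc").options(courseOpts).load();

    Map<String, String> userOpts = new HashMap<>(base);
    userOpts.put("dbtable", "(select * from users) as users");
    DataFrame users = sqlContext.read().format("jdbc").options(userOpts).load();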

Spark-Submit / Spark-Shell Error Standalone cluster

2015-06-27 Thread Ashish Soni
Not sure what the issue is, but when I run spark-submit or spark-shell I get the error below: /usr/bin/spark-class: line 24: /usr/bin/load-spark-env.sh: No such file or directory Can someone please help? Thanks,

Re: Kafka Direct Stream - Custom Serialization and Deserialization

2015-06-26 Thread Ashish Soni
main/java/org/apache/spark/examples/streaming/JavaDirectKafkaWordCount.java> > for you to start. > > Thanks > Best Regards > > On Fri, Jun 26, 2015 at 6:09 PM, Ashish Soni > wrote: > >> Hi, >> >> If I have the data format below, how can I use k

Kafka Direct Stream - Custom Serialization and Deserialization

2015-06-26 Thread Ashish Soni
Hi, If I have the data format below, how can I use a Kafka direct stream to deserialize it? I am not able to understand all the parameters I need to pass; can someone explain what the arguments will be, as I am not clear about this: JavaPairInputDStream<K, V> org.apache.spark.streaming.kafk
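
The five class arguments are the key/value types and their Kafka decoders. A sketch of plugging in a custom decoder; MyEvent and its parse method are hypothetical, and the VerifiableProperties constructor is required because Kafka instantiates decoders reflectively:

    import kafka.serializer.Decoder;
    import kafka.serializer.StringDecoder;
    import kafka.utils.VerifiableProperties;

    public class MyEventDecoder implements Decoder<MyEvent> {
        public MyEventDecoder(VerifiableProperties props) { }
        @Override
        public MyEvent fromBytes(byte[] bytes) {
            return MyEvent.parse(new String(bytes));  // hypothetical parsing
        }
    }

    JavaPairInputDStream<String, MyEvent> stream = KafkaUtils.createDirectStream(
        jssc,
        String.class, MyEvent.class,                 // key and value types
        StringDecoder.class, MyEventDecoder.class,   // matching decoders
        kafkaParams, topics);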

WorkFlow Processing - Spark

2015-06-24 Thread Ashish Soni
Hi All, We are looking to use Spark as our stream-processing framework, and it would be helpful if experts could weigh in on whether we made the right choice, given the requirement below: given a stream of data, we need to take each event through multiple stages (pipeline processing), and in those stages the customer will defi

How Spark Executes Chained vs Non-Chained Statements

2015-06-23 Thread Ashish Soni
Hi All, What is the difference between the two forms below, in terms of execution on a cluster with one or more worker nodes? rdd.map(...).map(...)...map(..) vs val rdd1 = rdd.map(...) val rdd2 = rdd1.map(...) val rdd3 = rdd2.map(...) Thanks, Ashish

Spark and HDFS ( Worker and Data Nodes Combination )

2015-06-22 Thread Ashish Soni
Hi All, What is the best way to install a Spark cluster alongside a Hadoop cluster? Any recommendation for the deployment topology below would be a great help. *Also, is it necessary to put the Spark workers on the DataNodes, so that when they read blocks from HDFS the data is local to the server/worker, or

Spark 1.4 History Server - HDP 2.2

2015-06-20 Thread Ashish Soni
Can anyone help? I am getting the error below when I try to start the History Server. I do not see any org.apache.spark.deploy.yarn.history package inside the assembly jar, and I am not sure how to get it: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.history.YarnHistoryProvider Thanks, As

Spark on Yarn - How to configure

2015-06-19 Thread Ashish Soni
Can someone please let me know what I need to configure to have Spark run on YARN? There is a lot of documentation, but none of it says how and which files need to be changed. Say I have 4 nodes for Spark: SparkMaster, SparkSlave1, SparkSlave2, SparkSlave3. Now, on which node which
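
For orientation: with YARN there are no Spark master/slave daemons to set up; any machine holding the Hadoop client configuration can submit, and YARN launches the executors on the cluster. A minimal sketch with Spark 1.x syntax; paths and sizes are illustrative.

    # point Spark at the YARN/HDFS configs
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    ./bin/spark-submit --master yarn-cluster \
      --num-executors 4 --executor-memory 2g \
      --class com.example.Main app.jar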

Re: Spark 1.4 on HortonWork HDP 2.2

2015-06-19 Thread Ashish Soni
> On Fri, Jun 19, 2015 at 10:22 PM, Ashish Soni > wrote: > >> Hi, >> >> Is anyone able to install Spark 1.4 on HDP 2.2? Please let me know how >> I can do the same. >> >> Ashish >> > > > > -- > Best Regards, > Ayan Guha >

Re: RE: Spark or Storm

2015-06-19 Thread Ashish Soni
ics? > > > -- > bit1...@163.com > > > *From:* Haopu Wang > *Date:* 2015-06-19 18:47 > *To:* Enno Shioji ; Tathagata Das > *CC:* prajod.vettiyat...@wipro.com; Cody Koeninger ; > bit1...@163.com; Jordan Pilat ; Will Briggs > ; Ashish

Spark 1.4 on HortonWork HDP 2.2

2015-06-19 Thread Ashish Soni
Hi, Is anyone able to install Spark 1.4 on HDP 2.2? Please let me know how I can do the same. Ashish

Re: Spark or Storm

2015-06-17 Thread Ashish Soni
ment: Also, you can do some processing with >>>> Kinesis. If all you need to do is straightforward transformation and you >>>> are reading from Kinesis to begin with, it might be an easier option to >>>> just do the transformation in Kinesis >>>> >>&

Re: Spark or Storm

2015-06-17 Thread Ashish Soni
you must somehow make the update >> operation idempotent. Replacing the entire state is the easiest way to do >> it, but it's obviously expensive. >> >> The alternative is to do something similar to what Storm does. At that >> point, you'll have to ask tho

Re: Spark or Storm

2015-06-17 Thread Ashish Soni
ther you have to implement something yourself, or you can use Storm > Trident (or the transactional low-level API). > > On Wed, Jun 17, 2015 at 1:26 PM, Ashish Soni > wrote: > >> My use case is below >> >> We are going to receive a lot of events as a stream (basically a Kafka S

Twitter Heron: Stream Processing at Scale - Does Spark Address all the issues

2015-06-17 Thread Ashish Soni
Hi Sparkers, https://dl.acm.org/citation.cfm?id=2742788 Twitter recently released a paper on Heron as a replacement for Apache Storm, and I would like to know whether Apache Spark currently suffers from the same issues they outline. Any input/thoughts will be helpful. Thanks, Ashish

Re: Spark or Storm

2015-06-17 Thread Ashish Soni
My use case is below. We are going to receive a lot of events as a stream (basically a Kafka stream) and then we need to process and compute. Consider that you have a phone contract with ATT, and every call / SMS / data usage you make is an event; the system then needs to calculate your bill on a real-time basis, so