Re: UnknownhostException : home

2015-01-19 Thread Ashish
20 more >> >> >> I couldn't trace the cause of this exception. Any help in this regard? >> >> Thanks

Re: Does Spark automatically run different stages concurrently when possible?

2015-01-19 Thread Ashish
-- thanks ashish Blog: http://www.ashishpaliwal.com/blog My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: Does Spark automatically run different stages concurrently when possible?

2015-01-20 Thread Ashish
e-calculated (and persisted) after (4) is calculated. > > On Tue, Jan 20, 2015 at 3:38 AM, Ashish wrote: >> Sean, >> >> A related question. When to persist the RDD after step 2 or after Step >> 3 (nothing would happen before step 3 I assume)? >> >> On Mon, J

Re: UnknownhostException : home

2015-01-19 Thread Ashish
ala) > Caused by: java.net.UnknownHostException: home > ... 20 more > > > I couldn't trace the cause of this exception. Any help in this regard? > > Thanks -- thanks ashish Blog: http://www.ashishpaliwal.com/blog My Photo Galleries: http://www.pbase.com/ashishpaliwal -

Spark SQL v MemSQL/Voltdb

2015-05-28 Thread Ashish Mukherjee
Hello, I was wondering if there is any documented comparison of Spark SQL with in-memory SQL databases like MemSQL/VoltDB. MemSQL etc. also allow queries to be run in a clustered environment. What is the major differentiation? Regards, Ashish

Re: Spark SQL v MemSQL/Voltdb

2015-05-28 Thread Ashish Mukherjee
to a clustered scenario, which is the right engine at various degrees of scale? Regards, Ashish On Fri, May 29, 2015 at 6:57 AM, Mohit Jaggi wrote: > I have used VoltDB and Spark. The use cases for the two are quite > different. VoltDB is intended for transactions and also supports quer

RDD staleness

2015-05-31 Thread Ashish Mukherjee
Hello, Since RDDs are created from data from Hive tables or HDFS, how do we ensure they are invalidated when the source data is updated? Regards, Ashish
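The short answer usually given on this list is that an RDD (especially once cached) is an immutable snapshot of the source at the time it was read, so a refresh means unpersisting and re-reading from Hive/HDFS. A minimal pure-Java analogue of that staleness, with plain collections standing in for the source table and the cached RDD (this is only an illustration, not Spark API):

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotDemo {
    // "Caching an RDD": take an immutable snapshot of the source as it is now.
    static List<String> snapshot(List<String> source) {
        return List.copyOf(source);
    }

    public static void main(String[] args) {
        List<String> sourceTable = new ArrayList<>(List.of("row1", "row2"));
        List<String> cached = snapshot(sourceTable);

        // Source data changes after the snapshot was taken...
        sourceTable.add("row3");

        // ...the snapshot does NOT see it; it must be rebuilt to refresh
        // (in Spark: unpersist() and re-read from the source).
        System.out.println(cached.size());      // 2
        System.out.println(sourceTable.size()); // 3
    }
}
```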

spark streaming - checkpointing - looking at old application directory and failure to start streaming context

2015-06-10 Thread Ashish Nigam
ala:145) at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:561) at Any idea on how to fix this issue? Thanks Ashish

Re: spark streaming - checkpointing - looking at old application directory and failure to start streaming context

2015-06-10 Thread Ashish Nigam
Jun 10, 2015 at 9:18 AM, Akhil Das wrote: > Delete the checkpoint directory, you might have modified your driver > program. > > Thanks > Best Regards > > On Wed, Jun 10, 2015 at 9:44 PM, Ashish Nigam > wrote: > >> Hi, >> If checkpoint data is already pres

Re: spark streaming - checkpointing - looking at old application directory and failure to start streaming context

2015-06-11 Thread Ashish Nigam
Any idea why this happens? On Wed, Jun 10, 2015 at 9:28 AM, Ashish Nigam wrote: > BTW, I am using spark streaming 1.2.0 version. > > On Wed, Jun 10, 2015 at 9:26 AM, Ashish Nigam > wrote: > >> I did not change driver program. I just shutdown the context and again >>

Re: Spark or Storm

2015-06-17 Thread Ashish Soni
we will be writing that >> piece in Java pojo. >> All env is on aws. Hbase is on a long running EMR and kinesis on a >> separate cluster. >> TIA. >> Best >> Ayan >> On 17 Jun 2015 12:13, "Will Briggs" wrote: >> >> The progr

Twitter Heron: Stream Processing at Scale - Does Spark Address all the issues

2015-06-17 Thread Ashish Soni
Hi Sparkers, https://dl.acm.org/citation.cfm?id=2742788 Twitter recently released a paper on Heron as a replacement for Apache Storm, and I would like to know whether Apache Spark currently suffers from the same issues they have outlined. Any input / thought will be helpful. Thanks, Ashish

Re: Spark or Storm

2015-06-17 Thread Ashish Soni
As per my best understanding, Spark Streaming offers exactly-once processing. Is this achieved only through updateStateByKey, or is there another way to do the same? Ashish On Wed, Jun 17, 2015 at 8:48 AM, Enno Shioji wrote: > In that case I assume you need exactly once semantics. There
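updateStateByKey folds each batch's new values for a key into per-key running state, once per batch, which is what makes stateful counts consistent across replays of a batch. A hedged pure-Java sketch of that per-key update step, with plain maps standing in for the DStream state so it runs without Spark (the method name updateState is invented for illustration):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StateDemo {
    // Sketch of an updateStateByKey-style step: fold this batch's values
    // for each key into the running state (here: a running count/sum).
    static Map<String, Integer> updateState(Map<String, Integer> state,
                                            Map<String, List<Integer>> batch) {
        Map<String, Integer> next = new HashMap<>(state);
        batch.forEach((key, values) -> {
            int sum = values.stream().mapToInt(Integer::intValue).sum();
            next.merge(key, sum, Integer::sum); // combine with previous state
        });
        return next;
    }

    public static void main(String[] args) {
        Map<String, Integer> s1 = updateState(Map.of(), Map.of("a", List.of(1, 2)));
        Map<String, Integer> s2 = updateState(s1, Map.of("a", List.of(3), "b", List.of(5)));
        System.out.println(s2.get("a") + " " + s2.get("b")); // 6 5
    }
}
```

The exactly-once part comes from the state update being applied once per batch under checkpointing; writing results to external systems still needs idempotent or transactional sinks.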

Re: Spark or Storm

2015-06-17 Thread Ashish Soni
A stream can also be processed in micro-batches, which is the main idea behind Spark Streaming, so what is the difference? Ashish On Wed, Jun 17, 2015 at 9:04 AM, Enno Shioji wrote: > PS just to elaborate on my first sentence, the reason Spark (not > streaming) can offer exactl

Re: Spark or Storm

2015-06-17 Thread Ashish Soni
ment: Also, you can do some processing with >>>> Kinesis. If all you need to do is straight forward transformation and you >>>> are reading from Kinesis to begin with, it might be an easier option to >>>> just do the transformation in Kinesis >>>> >>&

Spark 1.4 on HortonWork HDP 2.2

2015-06-19 Thread Ashish Soni
Hi, Is anyone able to install Spark 1.4 on HDP 2.2? Please let me know how I can do the same. Ashish

Re: RE: Spark or Storm

2015-06-19 Thread Ashish Soni
evaluating the framework and does not have enough time to validate all the use cases, but has to rely on the documentation. Ashish On Fri, Jun 19, 2015 at 7:10 AM, bit1...@163.com wrote: > > I think your observation is correct, you have to take care of these > replayed data at your end,eg,each mess

Re: Spark 1.4 on HortonWork HDP 2.2

2015-06-19 Thread Ashish Soni
I do not know where to start, as Spark 1.2 comes bundled with HDP 2.2, but I want to use 1.4 and do not know how to upgrade to it. Ashish On Fri, Jun 19, 2015 at 8:26 AM, ayan guha wrote: > what problem are you facing? are you trying to build it yurself or > gettingpre-built version? >

Spark on Yarn - How to configure

2015-06-19 Thread Ashish Soni
which files need to be changed to make sure my master node is SparkMaster and slave nodes are 1, 2, 3, and how to tell / configure Yarn. Ashish

Spark 1.4 History Server - HDP 2.2

2015-06-20 Thread Ashish Soni
, Ashish

Spark and HDFS ( Worker and Data Nodes Combination )

2015-06-22 Thread Ashish Soni
Hi All, What is the best way to install a Spark cluster alongside a Hadoop cluster? Any recommendation for the below deployment topology will be a great help. *Also, is it necessary to put the Spark Worker on the DataNodes, as when it reads a block from HDFS it will be local to the Server / Worker or

How Spark Execute chaining vs no chaining statements

2015-06-23 Thread Ashish Soni
Hi All, What is the difference between the below in terms of execution on a cluster with 1 or more worker nodes? rdd.map(...).map(...)...map(..) vs val rdd1 = rdd.map(...) val rdd2 = rdd1.map(...) val rdd3 = rdd2.map(...) Thanks, Ashish
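The usual answer: the two forms are identical, because map is a lazy transformation; Spark only records lineage until an action runs, and pipelines the three maps into a single stage either way. A pure-Java analogue using java.util.stream, chosen only because it is runnable without Spark; the laziness argument carries over:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ChainingDemo {
    // Chained form: one pipeline, nothing materialized between the maps.
    static List<Integer> chained(List<Integer> data) {
        return data.stream().map(x -> x + 1).map(x -> x * 2).map(x -> x - 3)
                   .collect(Collectors.toList());
    }

    // Named-intermediate form: each variable is just another lazy view;
    // nothing is computed until the terminal collect() -- the same way
    // rdd1/rdd2/rdd3 are only lineage entries until an action runs.
    static List<Integer> stepwise(List<Integer> data) {
        Stream<Integer> s1 = data.stream().map(x -> x + 1);
        Stream<Integer> s2 = s1.map(x -> x * 2);
        Stream<Integer> s3 = s2.map(x -> x - 3);
        return s3.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3);
        System.out.println(chained(data));   // [1, 3, 5]
        System.out.println(stepwise(data));  // [1, 3, 5]
    }
}
```

The intermediate variables only matter in Spark if you persist() one of them for reuse.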

WorkFlow Processing - Spark

2015-06-24 Thread Ashish Soni
define their own logic, like custom code which we need to load inside a driver program ... Any idea of the best way to do this ... Ashish

Kafka Direct Stream - Custom Serialization and Deserilization

2015-06-26 Thread Ashish Soni
Hi, If I have the below data format, how can I use the Kafka direct stream to de-serialize it? I am not able to understand all the parameters I need to pass. Can someone explain what the arguments will be, as I am not clear about this: JavaPairInputDStream<K, V> org.apache.spark.streaming.kafk

Re: Kafka Direct Stream - Custom Serialization and Deserilization

2015-06-26 Thread Ashish Soni
My question is why there are two similar parameters, String.class and StringDecoder.class; what is the difference between them? Ashish On Fri, Jun 26, 2015 at 8:53 AM, Akhil Das wrote: > JavaPairInputDStream messages = > KafkaUtils.createDirectStream( > jssc, >
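The two class tokens play different roles: String.class names the value type V that comes out of the stream, while StringDecoder.class names the class Kafka instantiates to turn the raw bytes into that V. A hedged pure-Java sketch of that pattern; the Decoder interface here is a stand-in for kafka.serializer.Decoder, not the real one, and decode is an invented helper:

```java
import java.nio.charset.StandardCharsets;

public class DecoderDemo {
    // Stand-in for kafka.serializer.Decoder<T>: turns raw bytes into a T.
    interface Decoder<T> { T fromBytes(byte[] bytes); }

    static class StringDecoder implements Decoder<String> {
        public String fromBytes(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // valueClass names the result type (erased at runtime, hence the token);
    // decoderClass names HOW to produce that type from bytes.
    static <V, D extends Decoder<V>> V decode(Class<V> valueClass, Class<D> decoderClass, byte[] raw) {
        try {
            D decoder = decoderClass.getDeclaredConstructor().newInstance();
            return valueClass.cast(decoder.fromBytes(raw));
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(String.class, StringDecoder.class, raw)); // hello
    }
}
```

This is why createDirectStream takes both: a custom value type needs both its class token and a matching decoder class.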

spark streaming job fails to restart after checkpointing due to DStream initialization errors

2015-06-26 Thread Ashish Nigam
resolve this issue? Thanks Ashish

Re: spark streaming job fails to restart after checkpointing due to DStream initialization errors

2015-06-26 Thread Ashish Nigam
Make sure you're following the docs regarding setting up a streaming > checkpoint. > > Post your code if you can't get it figured out. > > On Fri, Jun 26, 2015 at 3:45 PM, Ashish Nigam > wrote: > >> I bring up spark streaming job that uses Kafka as input s

Spark-Submit / Spark-Shell Error Standalone cluster

2015-06-27 Thread Ashish Soni
Not sure what the issue is, but when I run spark-submit or spark-shell I am getting the below error: /usr/bin/spark-class: line 24: /usr/bin/load-spark-env.sh: No such file or directory Can someone please help. Thanks,

Load Multiple DB Table - Spark SQL

2015-06-29 Thread Ashish Soni
bles to the options map: options.put("dbtable1", "(select * from test1)"); options.put("dbtable2", "(select * from test2)"); DataFrame jdbcDF = sqlContext.load("jdbc", options); Thanks, Ashish

Convert CSV lines to List of Objects

2015-07-01 Thread Ashish Soni
Hi, How can I use a map function in Java to convert all the lines of a CSV file into a list of objects? Can someone please help... JavaRDD<List<String>> rdd = sc.textFile("data.csv").map(new Function<String, List<String>>() { @Override public List<String> call(String s) { } }); Thanks,
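A minimal sketch of the parsing body that would go inside the Function's call(String s) method, written here against plain Java collections so it runs without Spark (no CSV quoting or escaping handled; parseLine is an invented helper name):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvDemo {
    // The body the call(String s) method needs: split one CSV line
    // into its trimmed fields.
    static List<String> parseLine(String line) {
        return Arrays.stream(line.split(","))
                     .map(String::trim)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for sc.textFile("data.csv"): a list of raw lines.
        List<String> lines = List.of("1, alice, 30", "2, bob, 25");
        List<List<String>> rows = lines.stream()
                                       .map(CsvDemo::parseLine)
                                       .collect(Collectors.toList());
        System.out.println(rows); // [[1, alice, 30], [2, bob, 25]]
    }
}
```

In the Spark version, the same parseLine logic is returned from call(String s); mapping each field list further into a domain object is then one more map.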

BroadCast Multiple DataFrame ( JDBC Tables )

2015-07-01 Thread Ashish Soni
Hi, I need to load 10 tables in memory and have them available to all the workers. Please let me know the best way to broadcast them; sc.broadcast(df) allows only one. Thanks,

DataFrame Filter Inside Another Data Frame Map

2015-07-01 Thread Ashish Soni
Hi All, I am not sure what is wrong with the below code, as it gives the below error when I access the DataFrame inside the map, but it works outside: JavaRDD<Charge> rdd2 = rdd.map(new Function<Charge, Charge>() { @Override public Charge call(Charge ch) throws Exception { DataFrame df = accountR

DataFrame Find/Filter Based on Input - Inside Map function

2015-07-01 Thread Ashish Soni
Hi All, I have a DataFrame created as below: options.put("dbtable", "(select * from user) as account"); DataFrame accountRdd = sqlContext.read().format("jdbc").options(options).load(); and I have another RDD which contains the login name, and I want to find the userid from the above DF RDD and r

Re: DataFrame Filter Inside Another Data Frame Map

2015-07-01 Thread Ashish Soni
Thanks. So if I load some static data from the database and then need to use that in my map function to filter records, what will be the best way to do it? Ashish On Wed, Jul 1, 2015 at 10:45 PM, Raghavendra Pandey < raghavendra.pan...@gmail.com> wrote: > You cannot refer to one r
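The usual pattern in the replies: collect the static reference data to the driver once as a plain map (or broadcast it if it is large), then reference only that map inside the per-record function, so no DataFrame or RDD is captured by the closure. A pure-Java analogue, with a java.util.Map standing in for the collected DataFrame (resolve is an invented helper name):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LookupDemo {
    // The per-record function only touches a plain, serializable map --
    // the DataFrame itself never appears inside the closure.
    static List<Long> resolve(Map<String, Long> userIds, List<String> logins) {
        return logins.stream()
                     .map(userIds::get)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // In Spark: built once on the driver from df.collect(),
        // then sc.broadcast(...) for the workers.
        Map<String, Long> userIds = Map.of("alice", 1L, "bob", 2L);
        List<String> logins = List.of("alice", "bob", "alice");
        System.out.println(resolve(userIds, logins)); // [1, 2, 1]
    }
}
```

For large reference tables, a join between two DataFrames is the alternative to collecting.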

Spark SQL and Streaming - How to execute JDBC Query only once

2015-07-02 Thread Ashish Soni
Hi All, I have a stream of events coming in, and I want to fetch some additional data from the database based on the values in the incoming data. For example, below is the data coming in: loginName Email address city Now for each login name I need to go to the Oracle database and get the userId from the

How Will Spark Execute below Code - Driver and Executors

2015-07-06 Thread Ashish Soni
Hi All, If someone can help me understand which portion of the below code gets executed on the driver and which portion will be executed on the executors, it would be a great help. I have to load data from 10 tables and then use that data in various manipulations, and I am using SPARK SQL f

JVM is not ready after 10 seconds.

2015-07-06 Thread Ashish Dutt
s:port number") I get the following error. > sc=sparkR.init(master="spark://10.229.200.250:7377") Launching java with spark-submit command C:\spark-1.4.0\bin/bin/spark-submit.cmd sparkr-shell C:\Users\ASHISH~1\AppData\Local\Temp\Rtmp82kCxH\backend_port4281739d85 Error

Re: JVM is not ready after 10 seconds

2015-07-06 Thread Ashish Dutt
Hello Shivaram, Thank you for your response. Being a novice at this stage, can you also tell me how to configure or set the execute permission for the spark-submit file? Thank you for your time. Sincerely, Ashish Dutt On Tue, Jul 7, 2015 at 9:21 AM, Shivaram Venkataraman < sh

Re: JVM is not ready after 10 seconds

2015-07-06 Thread Ashish Dutt
# spark.driver.memory 5g # spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three" Sincerely, Ashish Dutt On Tue, Jul 7, 2015 at 9:30 AM, Ashish Dutt wrote: > Hello Shivaram, > Thank you for your response. Being a novice at this st

How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
the master? Thanks, Ashish

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
Thank you Ayan for your response. But I have just realised that Spark is configured to be a history server. Please, can somebody suggest how I can convert the Spark history server to be a Master server? Thank you Sincerely, Ashish Dutt On Wed, Jul 8, 2015 at 12:28 PM, ayan guha wrote

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
g4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 15/07/08 11:28:35 INFO SecurityManager: Changing view acls to: Ashish Dutt 15/07/08 11:28:35 INFO Securit

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
. All I want for now is how to connect my laptop to the spark cluster machine using either pyspark or SparkR. (I have python 2.7) On my laptop I am using winutils in place of hadoop and have spark 1.4 installed Thank you Sincerely, Ashish Dutt PhD Candidate Department of Information Systems University

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Ashish Dutt
Thanks for your reply Akhil. How do you multithread it? Sincerely, Ashish Dutt On Wed, Jul 8, 2015 at 3:29 PM, Akhil Das wrote: > Whats the point of creating them in parallel? You can multi-thread it run > it in parallel though. > > Thanks > Best Regards > > On Wed, J

How to upgrade Spark version in CDH 5.4

2015-07-08 Thread Ashish Dutt
bee--7dc6__section_zd5_1yz_l4> but I do not see any thing relevant Any suggestions directing to a solution are welcome. Thanks, Ashish

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Ashish Dutt
Thank you Akhil for the link. Sincerely, Ashish Dutt PhD Candidate Department of Information Systems University of Malaya, Lembah Pantai, 50603 Kuala Lumpur, Malaysia On Wed, Jul 8, 2015 at 3:43 PM, Akhil Das wrote: > Have a look > http://alvinalexander.com/scala/how-to-create-java-

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

2015-07-08 Thread Ashish Dutt
and hence not much help to me. I am able to launch ipython on localhost but cannot get it to work on the cluster Sincerely, Ashish Dutt On Wed, Jul 8, 2015 at 5:49 PM, sooraj wrote: > That turned out to be a silly data type mistake. At one point in the > iterative call, I was passing an i

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

2015-07-08 Thread Ashish Dutt
python.org/gist/fperez/6384491/00-Setup-IPython-PySpark.ipynb> Thanks, Ashish On Wed, Jul 8, 2015 at 5:49 PM, sooraj wrote: > That turned out to be a silly data type mistake. At one point in the > iterative call, I was passing an integer value for the parameter 'alpha' of > t

Re: Getting started with spark-scala developemnt in eclipse.

2015-07-08 Thread Ashish Dutt
Hello Prateek, I started with getting the pre built binaries so as to skip the hassle of building them from scratch. I am not familiar with scala so can't comment on it. I have documented my experiences on my blog www.edumine.wordpress.com Perhaps it might be useful to you. On 08-Jul-2015 9:39 PM,

Connecting to nodes on cluster

2015-07-08 Thread Ashish Dutt
connect to the nodes I am using SSH. Question: Would it be better if I work directly on the nodes rather than trying to connect my laptop to them? Question 2: If yes, then can you suggest any Python and R IDE that I can install on the nodes to make it work? Thanks for your help Sincerely, Ashish

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

2015-07-08 Thread Ashish Dutt
Hello Sooraj, Thank you for your response. It indeed gives me a ray of hope now. Can you please suggest any good tutorials for installing and working with an IPython notebook server on the node. Thank you Ashish On 08-Jul-2015 6:16 PM, "sooraj" wrote: > > Hi Ashish, > > I am ru

Re: Connecting to nodes on cluster

2015-07-08 Thread Ashish Dutt
The error is JVM has not responded after 10 seconds. On 08-Jul-2015 10:54 PM, "ayan guha" wrote: > What's the error you are getting? > On 9 Jul 2015 00:01, "Ashish Dutt" wrote: > >> Hi, >> >> We have a cluster with 4 nodes. The cluster uses CD

Re: PySpark without PySpark

2015-07-08 Thread Ashish Dutt
N", MAVEN_HOME="D:\MAVEN\BIN", PYTHON_HOME="C:\PYTHON27\", SBT_HOME="C:\SBT\" Sincerely, Ashish Dutt PhD Candidate Department of Information Systems University of Malaya, Lembah Pantai, 50603 Kuala Lumpur, Malaysia On Thu, Jul 9, 2015 at 4:56 AM, Sujit Pal wrote: >

DLL load failed: %1 is not a valid win32 application on invoking pyspark

2015-07-08 Thread Ashish Dutt
your help. Sincerely, Ashish Dutt

Re: PySpark without PySpark

2015-07-08 Thread Ashish Dutt
written something wrong here. Cannot seem to figure out, what is it? Thank you for your help Sincerely, Ashish Dutt On Thu, Jul 9, 2015 at 11:53 AM, Sujit Pal wrote: > Hi Ashish, > > >> Nice post. > Agreed, kudos to the author of the post, Benjamin Benfort of District Labs. >

Re: Connecting to nodes on cluster

2015-07-09 Thread Ashish Dutt
Hello Akhil, Thanks for the response. I will have to figure this out. Sincerely, Ashish On Thu, Jul 9, 2015 at 3:40 PM, Akhil Das wrote: > On Wed, Jul 8, 2015 at 7:31 PM, Ashish Dutt > wrote: > >> Hi, >> >> We have a cluster with 4 nodes. The cluster uses CDH 5.4

Re: SparkR Error in sparkR.init(master=“local”) in RStudio

2015-07-13 Thread Ashish Dutt
n Windows environment? What I mean is how to setup .libPaths()? where is it in windows environment Thanks for your help Sincerely, Ashish Dutt On Mon, Jul 13, 2015 at 3:48 PM, Sun, Rui wrote: > Hi, Kachau, > > If you are using SparkR with RStudio, have you followed the guideli

Re: Data Processing speed SQL Vs SPARK

2015-07-13 Thread Ashish Mukherjee
MySQL and PgSQL scale to millions. Spark or any distributed/clustered computing environment would be inefficient for the kind of data size you mention. That's because of coordination of processes, moving data around etc. On Mon, Jul 13, 2015 at 5:34 PM, Sandeep Giri wrote: > Even for 2L records

Re: Is it possible to change the default port number 7077 for spark?

2015-07-13 Thread Ashish Dutt
Hello Arun, Thank you for the descriptive response. And thank you for providing the sample file too. It certainly is a great help. Sincerely, Ashish On Mon, Jul 13, 2015 at 10:30 PM, Arun Verma wrote: > > PFA sample file > > On Mon, Jul 13, 2015 at 7:37 PM, Arun Verma >

BroadCast on Interval ( eg every 10 min )

2015-07-16 Thread Ashish Soni
Hi All, How can I broadcast a data change to all the executors every 10 min or 1 min? Ashish

XML Parsing

2015-07-19 Thread Ashish Soni
Hi All, I have an XML file with the same tag repeated multiple times, as below. Please suggest the best way to process this data inside Spark: how can I extract each opening and closing tag and process them, or how can I combine multiple lines into a single line ... .. .. Thanks,

Re: Spark 1.5 on Mesos

2016-03-02 Thread Ashish Soni
what the problem is? > > Tim > > On Mar 1, 2016, at 8:05 AM, Ashish Soni wrote: > > Not sure what is the issue but i am getting below error when i try to run > spark PI example > > Blacklisting Mesos slave value: "5345asdasdasdkas234234asdasdasdasd&

Re: Spark 1.5 on Mesos

2016-03-02 Thread Ashish Soni
I have had no luck, and I would like to ask the Spark committers: will this ever be designed to run on Mesos? A Spark app as a Docker container is not working at all on Mesos; if anyone would like the code, I can send it over to have a look. Ashish On Wed, Mar 2, 2016 at 12:23 PM, Sathish Kumaran

Re: Spark 1.5 on Mesos

2016-03-02 Thread Ashish Soni
On Wed, Mar 2, 2016 at 5:49 PM, Charles Allen wrote: > @Tim yes, this is asking about 1.5 though > > On Wed, Mar 2, 2016 at 2:35 PM Tim Chen wrote: > >> Hi Charles, >> >> I thought that's fixed with your patch in latest master now right? >> >> A

Re: Spark 1.5 on Mesos

2016-03-03 Thread Ashish Soni
hed by the cluster > dispatcher, that shows you the spark-submit command it eventually ran? > > > Tim > > > > On Wed, Mar 2, 2016 at 5:42 PM, Ashish Soni wrote: > >> See below and Attached the Dockerfile to build the spark image ( >> between i just upgraded

Re: Spark 1.5 on Mesos

2016-03-04 Thread Ashish Soni
> chroot. > > Can you try mounting in a volume from the host when you launch the slave > for your slave's workdir? > docker run -v /tmp/mesos/slave:/tmp/mesos/slave mesos_image mesos-slave > --work_dir=/tmp/mesos/slave .... > > Tim > > On Thu, Mar 3, 2016 at 4:

Looking for Collaborator - Boston ( Spark Training )

2016-03-05 Thread Ashish Soni
Hi All, I am developing a detailed, highly technical course on Spark (beyond word count) and am looking for a partner; let me know if anyone is interested. Ashish

Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ashish Soni
You need to install Spark on each Mesos slave, and then while starting the container set the workdir to your Spark home so that it can find the Spark classes. Ashish > On Mar 10, 2016, at 5:22 AM, Guillaume Eynard Bontemps > wrote: > > For an answer to my question see t

Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ashish Soni
Hi Tim, Can you please share your Dockerfiles and configuration, as it will help a lot; I am planning to publish a blog post on the same. Ashish On Thu, Mar 10, 2016 at 10:34 AM, Timothy Chen wrote: > No you don't need to install spark on each slave, we have been running > th

Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ashish Soni
When you say the driver is running on Mesos, can you explain how you are doing that? > On Mar 10, 2016, at 4:44 PM, Eran Chinthaka Withana > wrote: > > Yanling I'm already running the driver on mesos (through docker). FYI, I'm > running this on cluster mode with MesosClusterDispatcher. > > Mac (c

Spark for Log Analytics

2016-03-31 Thread ashish rawat
Kafka for the complex use cases, while logstash filters can be used for the simpler use cases. I was wondering if someone has already done this evaluation and could provide me some pointers on how/if to create this pipeline with Spark. Regards, Ashish

Re: Spark for Log Analytics

2016-03-31 Thread ashish rawat
ache/Nginx/Mongo etc) to Kafka, what could be the ideal strategy? Regards, Ashish On Thu, Mar 31, 2016 at 5:16 PM, Chris Fregly wrote: > oh, and I forgot to mention Kafka Streams which has been heavily talked > about the last few days at Strata here in San Jose. > > Streams can

Re: sqlCtx.read.parquet yields lots of small tasks

2016-05-07 Thread Ashish Dubey
How big is your file and can you also share the code snippet On Saturday, May 7, 2016, Johnny W. wrote: > hi spark-user, > > I am using Spark 1.6.0. When I call sqlCtx.read.parquet to create a > dataframe from a parquet data source with a single parquet file, it yields > a stage with lots of sma

Re: How to verify if spark is using kryo serializer for shuffle

2016-05-07 Thread Ashish Dubey
your driver heap size and application structure ( num of stages and tasks ) Ashish On Saturday, May 7, 2016, Nirav Patel wrote: > Right but this logs from spark driver and spark driver seems to use Akka. > > ERROR [sparkDriver-akka.actor.default-dispatcher-17] > akka.actor.Act

Re: Parse Json in Spark

2016-05-08 Thread Ashish Dubey
This limit is due to the underlying InputFormat implementation. You can always write your own InputFormat and then use Spark's newAPIHadoopFile API to pass your InputFormat class path. You will have to place the jar file in the /lib location on all the nodes. Ashish On Sun, May 8, 2016 at 4:02 PM

Re: BlockManager crashing applications

2016-05-08 Thread Ashish Dubey
Brandon, how much memory are you giving to your executors - did you check if there were dead executors in your application logs.. Most likely you require higher memory for executors.. Ashish On Sun, May 8, 2016 at 1:01 PM, Brandon White wrote: > Hello all, > > I am running a Spark ap

Re: BlockManager crashing applications

2016-05-08 Thread Ashish Dubey
gt; On May 8, 2016 5:55 PM, "Ashish Dubey" wrote: > > Brandon, > > how much memory are you giving to your executors - did you check if there > were dead executors in your application logs.. Most likely you require > higher memory for executors.. > > Ashish > >

Re: sqlCtx.read.parquet yields lots of small tasks

2016-05-08 Thread Ashish Dubey
l.parquet.filterPushdown: true > spark.sql.parquet.mergeSchema: true > > Thanks, > J. > > On Sat, May 7, 2016 at 4:20 PM, Ashish Dubey wrote: > >> How big is your file and can you also share the code snippet >> >> >> On Saturday, May 7, 2016, Johnny W. wro

Re: Joining a RDD to a Dataframe

2016-05-08 Thread Ashish Dubey
Is there any reason you don't want to convert this? I don't think a join between an RDD and a DataFrame is supported. On Sat, May 7, 2016 at 11:41 PM, Cyril Scetbon wrote: > Hi, > > I have a RDD built during a spark streaming job and I'd like to join it to > a DataFrame (E/S input) to enrich it. > It seems that I

Discover SparkUI port for spark streaming job running in cluster mode

2015-12-14 Thread Ashish Nigam
:50571 INFO util.Utils: Successfully started service 'SparkUI' on port 50571. INFO ui.SparkUI: Started SparkUI at http://xxx:50571 Is there any way to know about the UI port automatically using some API? Thanks Ashish

Deployment and performance related queries for Spark and Cassandra

2015-12-21 Thread Ashish Gadkari
any performance-related parameters in Spark, Cassandra, or Solr which will reduce the job time. Any help to increase the performance will be appreciated. Thanks -- Ashish Gadkari

How to change the no of cores assigned for a Submitted Job

2016-01-12 Thread Ashish Soni
Hi, I see strange behavior when creating a standalone Spark container using Docker. Not sure why, but by default it assigns 4 cores to the first job submitted, and then all the other jobs are in a wait state. Please suggest if there is a setting to change this; I tried --executor-cores 1 but it

Determine Topic MetaData Spark Streaming Job

2016-01-25 Thread Ashish Soni
TopicAndPartition(driverArgs.inputTopic, 0), 0L); Thanks, Ashish

Re: Determine Topic MetaData Spark Streaming Job

2016-01-25 Thread Ashish Soni
, what is the correct approach. Ashish On Mon, Jan 25, 2016 at 11:38 AM, Gerard Maas wrote: > What are you trying to achieve? > > Looks like you want to provide offsets but you're not managing them > and I'm assuming you're using the direct stream approach. > &

Redirect Spark Logs to Kafka

2016-02-01 Thread Ashish Soni
Hi All, Please let me know how we can redirect Spark log files, or tell Spark to log to a Kafka queue instead of files. Ashish

Dynamically Change Log Level Spark Streaming

2016-02-08 Thread Ashish Soni
Hi All, How do I change the log level for a running Spark Streaming job? Any help will be appreciated. Thanks,

Example of onEnvironmentUpdate Listener

2016-02-08 Thread Ashish Soni
Are there any examples of how to implement the onEnvironmentUpdate method for a custom listener? Thanks,

Spark Submit

2016-02-12 Thread Ashish Soni
Hi All, How do I pass multiple configuration parameters with spark-submit? Please help; I am trying as below spark-submit --conf "spark.executor.memory=512m spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.xml" Thanks,

Re: Spark Submit

2016-02-12 Thread Ashish Soni
; spark-submit --conf "spark.executor.memory=512m" --conf > "spark.executor.extraJavaOptions=x" --conf "Dlog4j.configuration=log4j.xml" > > Sent from Samsung Mobile. > > > Original message ---- > From: Ted Yu > Date:12/02/2016

Seperate Log4j.xml for Spark and Application JAR ( Application vs Spark )

2016-02-12 Thread Ashish Soni
Hi All, As per my best understanding, we can have only one log4j configuration for both Spark and the application, as whichever comes first in the classpath takes precedence. Is there any way we can keep one in the application and one in the spark conf folder? Is it possible? Thanks

SPARK-9559

2016-02-18 Thread Ashish Soni
Hi All , Just wanted to know if there is any work around or resolution for below issue in Stand alone mode https://issues.apache.org/jira/browse/SPARK-9559 Ashish

Communication between two spark streaming Job

2016-02-19 Thread Ashish Soni
maintained in the second job so that it can make use of the new metadata. Please help. Ashish

Spark 1.5 on Mesos

2016-02-26 Thread Ashish Soni
Hi All, Is there any proper documentation on how to run Spark on Mesos? I have been trying for the last few days and am not able to make it work. Please help. Ashish

Re: Spark 1.5 on Mesos

2016-02-29 Thread Ashish Soni
Yes, I read that, and there are not many details there. Is it true that we need to have Spark installed on each Mesos Docker container (master and slave)? ... Ashish On Fri, Feb 26, 2016 at 2:14 PM, Tim Chen wrote: > https://spark.apache.org/docs/latest/running-on-mesos.html should be the > best

Re: Spark 1.5 on Mesos

2016-02-29 Thread Ashish Soni
) > and Mesos will automatically launch docker containers for you. > > Tim > > On Mon, Feb 29, 2016 at 7:36 AM, Ashish Soni > wrote: > >> Yes i read that and not much details here. >> >> Is it true that we need to have spark installed on each mesos docker >>

Re: Spark 1.5 on Mesos

2016-03-01 Thread Ashish Soni
Check your Mesos UI if you see Spark application in the > Frameworks tab > > On Mon, Feb 29, 2016 at 12:23 PM Ashish Soni > wrote: > >> What is the Best practice , I have everything running as docker container >> in single host ( mesos and marathon also as docker containe

Spark Submit using Convert to Marthon REST API

2016-03-01 Thread Ashish Soni
Hi All, Can someone please help me translate the below spark-submit to a Marathon JSON request: docker run -it --rm -e SPARK_MASTER="mesos://10.0.2.15:5050" -e SPARK_IMAGE="spark_driver:latest" spark_driver:latest /opt/spark/bin/spark-submit --name "PI Example" --class org.apache.spark.exam

Spark Streaming Log4j Inside Eclipse

2015-09-28 Thread Ashish Soni
DEBUG or WARN Ashish

Re: Spark Streaming Log4j Inside Eclipse

2015-09-28 Thread Ashish Soni
I am not running it using spark-submit; I am running locally inside the Eclipse IDE. How do I set this using Java code? Ashish On Mon, Sep 28, 2015 at 10:42 AM, Adrian Tanase wrote: > You also need to provide it as parameter to spark submit > > http://stackoverflow.com/questions/288404
