Unsubscribe

2022-07-28 Thread Ashish
Unsubscribe. Sent from my iPhone.

Problem of how to retrieve file from HDFS

2019-10-08 Thread Ashish Mittal
te().save("hdfs://localhost:9000/user/hadoop/inpit/data/history.csv"); This code successfully stores the CSV file, but I don't know how to retrieve the CSV file from HDFS. Please help me. Thanks & Regards, Ashish Mittal
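
A minimal sketch of reading the file back, assuming Spark 2.x and the same path used in the post:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("ReadCsv").getOrCreate()
    // spark.read.csv is the inverse of the write().save() call above for CSV data
    val df = spark.read
      .option("header", "true")
      .csv("hdfs://localhost:9000/user/hadoop/inpit/data/history.csv")
    df.show()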

Re: Spark Streaming to REST API

2017-12-21 Thread ashish rawat
Sorry for not making it explicit. We are using Spark Streaming as the streaming solution, and I was wondering if it is a common pattern to do per-tuple Redis reads/writes and write to a REST API through Spark Streaming. Regards, Ashish On Fri, Dec 22, 2017 at 4:00 AM, Gourav Sengupta wrote: >

Spark Streaming to REST API

2017-12-21 Thread ashish rawat
into redis. Also, we need to write the final output to a system through a REST API (the system doesn't provide any other mechanism to write). Is it a common pattern to read/write to a db per tuple? Also, are there any connectors to write to REST endpoints? Regards, Ashish
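
A hedged sketch of the usual pattern here: do the external I/O per partition rather than per tuple, so expensive clients are created once per partition. The endpoint URL and payload shape are placeholders:

    import java.net.{HttpURLConnection, URL}

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // create heavy clients (HTTP pool, Redis connection) once per partition, not per tuple
        records.foreach { rec =>
          val conn = new URL("http://example.com/ingest").openConnection()
            .asInstanceOf[HttpURLConnection]
          conn.setRequestMethod("POST")
          conn.setDoOutput(true)
          conn.getOutputStream.write(rec.toString.getBytes("UTF-8"))
          conn.getResponseCode   // triggers the call; inspect for retries/error handling
          conn.disconnect()
        }
      }
    }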

Re: NLTK with Spark Streaming

2017-12-01 Thread ashish rawat
r Hakobian, Ph.D. Staff Data Scientist Rally Health nicholas.hakob...@rallyhealth.com On Sun, Nov 26, 2017 at 8:19 AM, ashish rawat wrote: > Thanks Holden and Chetan. > > Holden - Have you tried it out, do you know the right way to do it? > Chetan - yes, if we use a Java NLP library, i

Re: NLTK with Spark Streaming

2017-11-26 Thread ashish rawat
e: > >> So it’s certainly doable (it’s not super easy mind you), but until the >> arrow udf release goes out it will be rather slow. >> >> On Sun, Nov 26, 2017 at 8:01 AM ashish rawat wrote: >> >>> Hi, >>> >>> Has someone tried running N

NLTK with Spark Streaming

2017-11-25 Thread ashish rawat
nd data science flexibility. Regards, Ashish

Re: Spark based Data Warehouse

2017-11-17 Thread ashish rawat
Thanks, everyone, for the suggestions. Do any of you take care of automatic scale-up and scale-down of your underlying Spark clusters on AWS? On Nov 14, 2017 10:46 AM, "lucas.g...@gmail.com" wrote: Hi Ashish, bear in mind that EMR has some additional tooling available that smoothes out some S

Re: Spark based Data Warehouse

2017-11-13 Thread ashish rawat
d, or Livy? We run genie as a job server for the prod cluster, so users have to submit their queries through the genie. For better resource utilization, we rely on Yarn dynamic allocation to balance the load of multiple jobs/queries in Spark. Hope this helps. On Sat, Nov 11, 2017 at 11:21 PM as
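
For reference, a hedged sketch of the dynamic-allocation settings the reply refers to (values are illustrative; the external shuffle service must be enabled on YARN for this to work):

    spark-submit \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.shuffle.service.enabled=true \
      --conf spark.dynamicAllocation.minExecutors=2 \
      --conf spark.dynamicAllocation.maxExecutors=50 \
      ...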

Re: Spark based Data Warehouse

2017-11-13 Thread ashish rawat
one user fires a big query, then would that choke all other queries in the cluster? Regards, Ashish On Mon, Nov 13, 2017 at 3:10 AM, Patrick Alwell wrote: > Alcon, > > > > You can most certainly do this. I’ve done benchmarking with Spark SQL and > the TPCDS queries using S3

Re: Spark based Data Warehouse

2017-11-12 Thread ashish rawat
, I might be wrong, but not all functionality of Spark spills to disk. So it still doesn't provide DB-like reliability in execution. In the case of DBs, queries get slow but they don't fail or go out of memory, specifically in concurrent-user scenarios. Regards, Ashish On Nov 12, 20

Spark based Data Warehouse

2017-11-11 Thread ashish rawat
? Considering Spark still does not provide spill to disk, in many scenarios, are there frequent query failures when executing concurrent queries 4. Are there any open source implementations, which provide something similar? Regards, Ashish

Re: Azure Event Hub with Pyspark

2017-04-20 Thread Ashish Singh
Hi, You can try https://github.com/hdinsight/spark-eventhubs : it is an Event Hub receiver for Spark Streaming. We are using it, but I guess only a Scala version is available. Thanks, Ashish Singh On Fri, Apr 21, 2017 at 9:19 AM, ayan guha wrote: >

Document listing spark sql aggregate functions

2016-10-03 Thread Ashish Tadose
the docs and missed it. Thanks, Ashish
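
For reference, a minimal sketch of where those aggregates live in the API (org.apache.spark.sql.functions; the table and column names are placeholders):

    import org.apache.spark.sql.functions._

    df.groupBy("dept")
      .agg(count("*"), avg("salary"), max("salary"))
      .show()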

Spark 2.0 issue

2016-09-29 Thread Ashish Shrowty
created a JIRA too .. SPARK-17709 <https://issues.apache.org/jira/browse/SPARK-17709> Any help appreciated! Thanks, Ashish

Spark can't connect to secure phoenix

2016-09-16 Thread Ashish Gupta
Hi All, I am running a Spark program on a secured cluster which creates a SqlContext for creating a dataframe over a Phoenix table. When I run my program in local mode with the --master option set to local[2], my program works completely fine; however, when I try to run the same program with the master option set t

Returning DataFrame as Scala method return type

2016-09-08 Thread Ashish Tadose
F to the driver will cause all data to get passed to the driver code, or will it return just a pointer to the DF? Thanks, Ashish
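
A minimal sketch of the answer, assuming Spark 2.x naming (the method name and path are placeholders): a DataFrame is a lazy query plan, so returning it moves no data; only actions do.

    import org.apache.spark.sql.{DataFrame, SparkSession}

    def loadEvents(spark: SparkSession, path: String): DataFrame =
      spark.read.parquet(path)                  // returns a plan; nothing is read yet

    val df = loadEvents(spark, "/data/events")  // still just a reference to the plan
    df.count()                                  // only an action materializes results on the driver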

Logstash to collect Spark logs

2016-05-20 Thread Ashish Kumar Singh
We are trying to collect Spark logs using Logstash for parsing app logs and collecting useful info. We can read the NodeManager logs but are unable to read Spark application logs using Logstash. Current setup for Spark logs and Logstash: 1- Spark runs on Yarn. 2- Using log4j socketAppenders to wr

Spark log collection via Logstash

2016-05-19 Thread Ashish Kumar Singh
-${user.dir} 4-Logstash input:

    input {
      log4j {
        mode => "server"
        host => "0.0.0.0"
        port => 4560
        type => "log4j"
      }
    }

Any help on reading Spark logs via Logstash will be appreciated. Also, is there a better way to collect Spark logs via Logstash? Thanks, Ashish
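
For context, a hedged sketch of the log4j 1.x side that would feed that input (the post mentions socketAppenders; the host is a placeholder and the port must match the Logstash config):

    log4j.rootLogger=INFO, logstash
    log4j.appender.logstash=org.apache.log4j.net.SocketAppender
    log4j.appender.logstash.RemoteHost=logstash-host
    log4j.appender.logstash.Port=4560
    log4j.appender.logstash.ReconnectionDelay=10000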

Re: Joining a RDD to a Dataframe

2016-05-08 Thread Ashish Dubey
Is there any reason you don't want to convert it? I don't think a join between an RDD and a DataFrame is supported. On Sat, May 7, 2016 at 11:41 PM, Cyril Scetbon wrote: > Hi, > > I have a RDD built during a spark streaming job and I'd like to join it to > a DataFrame (E/S input) to enrich it. > It seems that I
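
A minimal sketch of the conversion being suggested, assuming the Spark 1.6-era API from the thread (the Event case class and join key are placeholders):

    import sqlContext.implicits._

    case class Event(id: String, payload: String)

    // lift the RDD into a DataFrame, then a DataFrame-to-DataFrame join works
    val rddDF = rdd.map { case (id, payload) => Event(id, payload) }.toDF()
    val enriched = rddDF.join(esDF, Seq("id"))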

Re: sqlCtx.read.parquet yields lots of small tasks

2016-05-08 Thread Ashish Dubey
l.parquet.filterPushdown: true > spark.sql.parquet.mergeSchema: true > > Thanks, > J. > > On Sat, May 7, 2016 at 4:20 PM, Ashish Dubey wrote: > >> How big is your file and can you also share the code snippet >> >> >> On Saturday, May 7, 2016, Johnny W. wro

Re: BlockManager crashing applications

2016-05-08 Thread Ashish Dubey
> On May 8, 2016 5:55 PM, "Ashish Dubey" wrote: > > Brandon, > > how much memory are you giving to your executors - did you check if there > were dead executors in your application logs.. Most likely you require > higher memory for executors.. > > Ashish > >

Re: BlockManager crashing applications

2016-05-08 Thread Ashish Dubey
Brandon, how much memory are you giving to your executors? Did you check if there were dead executors in your application logs? Most likely you require higher memory for executors. Ashish On Sun, May 8, 2016 at 1:01 PM, Brandon White wrote: > Hello all, > > I am running a Spark ap
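
For reference, a hedged sketch of how executor memory is typically raised (values are illustrative; the overhead setting applies when running on YARN):

    spark-submit \
      --executor-memory 4g \
      --conf spark.yarn.executor.memoryOverhead=512 \
      ...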

Re: Parse Json in Spark

2016-05-08 Thread Ashish Dubey
This limit is due to the underlying inputFormat implementation. You can always write your own inputFormat and then use Spark's newAPIHadoopFile API to pass your inputFormat class path. You will have to place the jar file in the /lib location on all the nodes.. Ashish On Sun, May 8, 2016 at 4:02 PM
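
A hedged sketch of the call shape (newAPIHadoopFile is the real API; com.example.JsonInputFormat is a hypothetical custom InputFormat emitting LongWritable/Text pairs):

    import org.apache.hadoop.io.{LongWritable, Text}

    val records = sc.newAPIHadoopFile(
      "hdfs:///data/big.json",
      classOf[com.example.JsonInputFormat],   // hypothetical custom InputFormat
      classOf[LongWritable],
      classOf[Text])
    val lines = records.map(_._2.toString)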

Re: How to verify if spark is using kryo serializer for shuffle

2016-05-07 Thread Ashish Dubey
your driver heap size and application structure ( num of stages and tasks ) Ashish On Saturday, May 7, 2016, Nirav Patel wrote: > Right but this logs from spark driver and spark driver seems to use Akka. > > ERROR [sparkDriver-akka.actor.default-dispatcher-17] > akka.actor.Act
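
For reference, a minimal sketch of enabling Kryo for shuffles and one way to verify it is actually used (registrationRequired makes unregistered classes fail loudly instead of silently falling back):

    spark-submit \
      --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
      --conf spark.kryo.registrationRequired=true \
      ...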

Re: sqlCtx.read.parquet yields lots of small tasks

2016-05-07 Thread Ashish Dubey
How big is your file, and can you also share the code snippet? On Saturday, May 7, 2016, Johnny W. wrote: > hi spark-user, > > I am using Spark 1.6.0. When I call sqlCtx.read.parquet to create a > dataframe from a parquet data source with a single parquet file, it yields > a stage with lots of sma

Re: Spark for Log Analytics

2016-03-31 Thread ashish rawat
ache/Nginx/Mongo etc) to Kafka, what could be the ideal strategy? Regards, Ashish On Thu, Mar 31, 2016 at 5:16 PM, Chris Fregly wrote: > oh, and I forgot to mention Kafka Streams which has been heavily talked > about the last few days at Strata here in San Jose. > > Streams can

Spark for Log Analytics

2016-03-31 Thread ashish rawat
Kafka for the complex use cases, while logstash filters can be used for the simpler use cases. I was wondering if someone has already done this evaluation and could provide me some pointers on how/if to create this pipeline with Spark. Regards, Ashish

Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ashish Soni
When you say the driver is running on Mesos, can you explain how you are doing that? > On Mar 10, 2016, at 4:44 PM, Eran Chinthaka Withana > wrote: > > Yanling I'm already running the driver on mesos (through docker). FYI, I'm > running this on cluster mode with MesosClusterDispatcher. > > Mac (c

Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ashish Soni
Hi Tim, Can you please share your Dockerfiles and configuration, as it will help a lot; I am planning to publish a blog post on the same. Ashish On Thu, Mar 10, 2016 at 10:34 AM, Timothy Chen wrote: > No you don't need to install spark on each slave, we have been running > th

Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ashish Soni
You need to install Spark on each Mesos slave and then, while starting the container, set the workdir to your Spark home so that it can find the Spark classes. Ashish > On Mar 10, 2016, at 5:22 AM, Guillaume Eynard Bontemps > wrote: > > For an answer to my question see t

Looking for Collaborator - Boston ( Spark Training )

2016-03-05 Thread Ashish Soni
Hi All, I am developing a detailed, highly technical course on Spark (beyond word count) and am looking for a partner; let me know if anyone is interested. Ashish

Re: Spark 1.5 on Mesos

2016-03-04 Thread Ashish Soni
> chroot. > > Can you try mounting in a volume from the host when you launch the slave > for your slave's workdir? > docker run -v /tmp/mesos/slave:/tmp/mesos/slave mesos_image mesos-slave > --work_dir=/tmp/mesos/slave .... > > Tim > > On Thu, Mar 3, 2016 at 4:

Re: Spark 1.5 on Mesos

2016-03-03 Thread Ashish Soni
hed by the cluster > dispatcher, that shows you the spark-submit command it eventually ran? > > > Tim > > > > On Wed, Mar 2, 2016 at 5:42 PM, Ashish Soni wrote: > >> See below and Attached the Dockerfile to build the spark image ( >> between i just upgraded

Re: Spark 1.5 on Mesos

2016-03-02 Thread Ashish Soni
On Wed, Mar 2, 2016 at 5:49 PM, Charles Allen wrote: > @Tim yes, this is asking about 1.5 though > > On Wed, Mar 2, 2016 at 2:35 PM Tim Chen wrote: > >> Hi Charles, >> >> I thought that's fixed with your patch in latest master now right? >> >> A

Re: Spark 1.5 on Mesos

2016-03-02 Thread Ashish Soni
I have had no luck, and I would like to ask the Spark committers: will this ever be designed to run on Mesos? A Spark app as a Docker container is not working at all on Mesos; if anyone would like the code, I can send it over to have a look. Ashish On Wed, Mar 2, 2016 at 12:23 PM, Sathish Kumaran

Re: Spark 1.5 on Mesos

2016-03-02 Thread Ashish Soni
what the problem is? > > Tim > > On Mar 1, 2016, at 8:05 AM, Ashish Soni wrote: > > Not sure what is the issue but i am getting below error when i try to run > spark PI example > > Blacklisting Mesos slave value: "5345asdasdasdkas234234asdasdasdasd&

Spark Submit using Convert to Marathon REST API

2016-03-01 Thread Ashish Soni
Hi All, Can someone please help me translate the below spark-submit into a Marathon JSON request? docker run -it --rm -e SPARK_MASTER="mesos://10.0.2.15:5050" -e SPARK_IMAGE="spark_driver:latest" spark_driver:latest /opt/spark/bin/spark-submit --name "PI Example" --class org.apache.spark.exam
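
A hedged sketch of what the equivalent Marathon app definition could look like (field names are from Marathon's app API; the truncated --class argument is left elided, and the resource values are illustrative):

    {
      "id": "spark-pi",
      "cmd": "/opt/spark/bin/spark-submit --name 'PI Example' --class org.apache.spark.exam...",
      "container": {
        "type": "DOCKER",
        "docker": { "image": "spark_driver:latest" }
      },
      "env": {
        "SPARK_MASTER": "mesos://10.0.2.15:5050",
        "SPARK_IMAGE": "spark_driver:latest"
      },
      "cpus": 1,
      "mem": 1024,
      "instances": 1
    }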

Re: Spark 1.5 on Mesos

2016-03-01 Thread Ashish Soni
Check your Mesos UI if you see Spark application in the > Frameworks tab > > On Mon, Feb 29, 2016 at 12:23 PM Ashish Soni > wrote: > >> What is the Best practice , I have everything running as docker container >> in single host ( mesos and marathon also as docker containe

Re: Spark 1.5 on Mesos

2016-02-29 Thread Ashish Soni
) > and Mesos will automatically launch docker containers for you. > > Tim > > On Mon, Feb 29, 2016 at 7:36 AM, Ashish Soni > wrote: > >> Yes i read that and not much details here. >> >> Is it true that we need to have spark installed on each mesos docker >>

Re: Spark 1.5 on Mesos

2016-02-29 Thread Ashish Soni
Yes, I read that, and there are not many details there. Is it true that we need to have Spark installed on each Mesos Docker container (master and slave)? Ashish On Fri, Feb 26, 2016 at 2:14 PM, Tim Chen wrote: > https://spark.apache.org/docs/latest/running-on-mesos.html should be the > best

Spark 1.5 on Mesos

2016-02-26 Thread Ashish Soni
Hi All, Is there any proper documentation on how to run Spark on Mesos? I have been trying for the last few days and am not able to make it work. Please help. Ashish

Communication between two Spark Streaming jobs

2016-02-19 Thread Ashish Soni
maintained in the second job so that it can make use of the new metadata. Please help. Ashish

SPARK-9559

2016-02-18 Thread Ashish Soni
Hi All, Just wanted to know if there is any workaround or resolution for the below issue in standalone mode: https://issues.apache.org/jira/browse/SPARK-9559 Ashish

Separate Log4j.xml for Spark and Application JAR ( Application vs Spark )

2016-02-12 Thread Ashish Soni
Hi All, As per my best understanding, we can have only one log4j configuration for both Spark and the application, as whichever comes first in the classpath takes precedence. Is there any way we can keep one in the application and one in the Spark conf folder? Is it possible? Thanks
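
A hedged sketch of one common approach (ship a dedicated config with the job and point the driver and executor JVMs at it explicitly; the file name is a placeholder):

    spark-submit \
      --files app-log4j.xml \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=app-log4j.xml" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=app-log4j.xml" \
      ...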

Re: Spark Submit

2016-02-12 Thread Ashish Soni
; spark-submit --conf "spark.executor.memory=512m" --conf > "spark.executor.extraJavaOptions=x" --conf "Dlog4j.configuration=log4j.xml" > > Sent from Samsung Mobile. > > > Original message ---- > From: Ted Yu > Date:12/02/2016

Spark Submit

2016-02-12 Thread Ashish Soni
Hi All, How do I pass multiple configuration parameters with spark-submit? Please help; I am trying as below: spark-submit --conf "spark.executor.memory=512m spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.xml" Thanks,
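
A minimal corrected sketch: each property gets its own --conf flag instead of being packed into one quoted string (the application class and jar are placeholders):

    spark-submit \
      --conf "spark.executor.memory=512m" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.xml" \
      --class com.example.Main app.jar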

Example of onEnvironmentUpdate Listener

2016-02-08 Thread Ashish Soni
Are there any examples of how to implement the onEnvironmentUpdate method for a custom listener? Thanks,
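
A minimal sketch of such a listener (SparkListener and onEnvironmentUpdate are the real API; what the body does with the details is illustrative):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerEnvironmentUpdate}

    class EnvListener extends SparkListener {
      override def onEnvironmentUpdate(update: SparkListenerEnvironmentUpdate): Unit = {
        // environmentDetails maps a section name to its (key, value) pairs
        update.environmentDetails.getOrElse("Spark Properties", Seq.empty)
          .foreach { case (k, v) => println(s"$k = $v") }
      }
    }

    // register it: sc.addSparkListener(new EnvListener)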

Dynamically Change Log Level Spark Streaming

2016-02-08 Thread Ashish Soni
Hi All, How do I change the log level for a running Spark Streaming job? Any help will be appreciated. Thanks,
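
A hedged sketch of one way to flip levels from inside the job at runtime (plain log4j 1.x API, which Spark of that era used):

    import org.apache.log4j.{Level, LogManager}

    LogManager.getRootLogger.setLevel(Level.WARN)
    LogManager.getLogger("org.apache.spark").setLevel(Level.ERROR)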

Redirect Spark Logs to Kafka

2016-02-01 Thread Ashish Soni
Hi All, Please let me know how we can redirect Spark log files, or tell Spark to log to a Kafka queue instead of files. Ashish
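
A hedged sketch using Kafka's log4j appender (the appender class ships with Kafka and its name varies by version; the topic and broker are placeholders):

    log4j.rootLogger=INFO, kafka
    log4j.appender.kafka=kafka.producer.KafkaLog4jAppender
    log4j.appender.kafka.topic=spark-logs
    log4j.appender.kafka.brokerList=broker1:9092
    log4j.appender.kafka.layout=org.apache.log4j.PatternLayout
    log4j.appender.kafka.layout.ConversionPattern=%d %p %c - %m%n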

Re: Determine Topic MetaData Spark Streaming Job

2016-01-25 Thread Ashish Soni
, what is the correct approach. Ashish On Mon, Jan 25, 2016 at 11:38 AM, Gerard Maas wrote: > What are you trying to achieve? > > Looks like you want to provide offsets but you're not managing them > and I'm assuming you're using the direct stream approach. > &

Determine Topic MetaData Spark Streaming Job

2016-01-25 Thread Ashish Soni
TopicAndPartition(driverArgs.inputTopic, 0), 0L); Thanks, Ashish
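
For context, a hedged sketch of the direct-stream shape this thread is about (the Spark 1.x Kafka 0.8 API; the topic name and offsets mirror the snippet):

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val fromOffsets = Map(TopicAndPartition("inputTopic", 0) -> 0L)
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets,
      (m: MessageAndMetadata[String, String]) => (m.key, m.message))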

How to change the no of cores assigned for a Submitted Job

2016-01-12 Thread Ashish Soni
Hi, I see strange behavior when creating a standalone Spark container using Docker. Not sure why, but by default it assigns 4 cores to the first job submitted, and then all the other jobs are in a wait state. Please suggest if there is a setting to change this; I tried --executor-cores 1 but it
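
A hedged sketch of the usual fix: in standalone mode an application grabs every available core unless capped, so the per-application cap is spark.cores.max (in addition to per-executor cores):

    spark-submit \
      --conf spark.cores.max=2 \
      --executor-cores 1 \
      ...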

Deployment and performance related queries for Spark and Cassandra

2015-12-21 Thread Ashish Gadkari
any performance-related parameters in Spark, Cassandra, or Solr which will reduce the job time. Any help to increase the performance will be appreciated. Thanks -- Ashish Gadkari

Discover SparkUI port for spark streaming job running in cluster mode

2015-12-14 Thread Ashish Nigam
:50571 INFO util.Utils: Successfully started service 'SparkUI' on port 50571. INFO ui.SparkUI: Started SparkUI at http://xxx:50571 Is there any way to know about the UI port automatically using some API? Thanks Ashish
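
A hedged note: on Spark 1.x there was no public API for this and parsing that log line was the usual workaround, but from Spark 2.0 the context exposes the URL directly:

    // Spark 2.0+
    val ui: Option[String] = sc.uiWebUrl   // e.g. Some("http://host:4040")
    ui.foreach(println)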

Re: Save GraphX to disk

2015-11-20 Thread Ashish Rawat
Hi Todd, Could you please provide an example of doing this? Mazerunner seems to be doing something similar with Neo4j, but it goes via HDFS and updates only the graph properties. Is there a direct way to do this with Neo4j or Titan? Regards, Ashish From: SLiZn Liu mailto:sliznmail...@gmail.com

Re: Spark 1.5.1+Hadoop2.6 .. unable to write to S3 (HADOOP-12420)

2015-10-22 Thread Ashish Shrowty
Thanks Steve. I built it from source. On Thu, Oct 22, 2015 at 4:01 PM Steve Loughran wrote: > > > On 22 Oct 2015, at 15:12, Ashish Shrowty > wrote: > > > > I understand that there is some incompatibility with the API between > Hadoop > > 2.6/2.7 and Am

Spark 1.5.1+Hadoop2.6 .. unable to write to S3 (HADOOP-12420)

2015-10-22 Thread Ashish Shrowty
://issues.apache.org/jira/browse/HADOOP-12420) My question is - what are people doing today to access S3? I am unable to find an older JAR of the AWS SDK to test with. Thanks, Ashish

Re: question on make multiple external calls within each partition

2015-10-05 Thread Ashish Soni
Need more details, but you might want to filter the data first (create multiple RDDs) and then process them. > On Oct 5, 2015, at 8:35 PM, Chen Song wrote: > > We have a use case with the following design in Spark Streaming. > > Within each batch, > * data is read and partitioned by some key > * fo

Re: DStream Transformation to save JSON in Cassandra 2.1

2015-10-05 Thread Ashish Soni
Try this: you can use dstream.map to convert it to a JavaDStream with only the data you are interested in, probably returning a POJO of your JSON, and then call foreachRDD and inside that call the line below: javaFunctions(rdd).writerBuilder("table", "keyspace", mapToRow(Class.class)).saveToCassandra(); On Mo
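
A hedged Scala sketch of the same flow with the spark-cassandra-connector (the case class, parser, and keyspace/table names are placeholders):

    import com.datastax.spark.connector._

    case class Person(id: Int, name: String)

    dstream
      .map(json => parsePerson(json))   // parsePerson: hypothetical JSON -> Person parser
      .foreachRDD(rdd => rdd.saveToCassandra("my_keyspace", "people"))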

Re: automatic start of streaming job on failure on YARN

2015-10-02 Thread Ashish Rangole
Are you running the job in yarn cluster mode? On Oct 1, 2015 6:30 AM, "Jeetendra Gangele" wrote: > We've a streaming application running on yarn and we would like to ensure > that is up running 24/7. > > Is there a way to tell yarn to automatically restart a specific > application on failure? > >
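
For reference, a hedged sketch of the settings usually involved when YARN should restart a failed streaming driver (cluster mode is required so YARN owns the driver; availability of these keys depends on the Spark version, and the values are illustrative):

    spark-submit --master yarn --deploy-mode cluster \
      --conf spark.yarn.maxAppAttempts=4 \
      --conf spark.yarn.am.attemptFailuresValidityInterval=1h \
      ...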

Re: Spark Streaming Log4j Inside Eclipse

2015-09-29 Thread Ashish Soni
I am using the Java streaming context, and it doesn't have the method setLogLevel; I have also tried passing the VM argument in Eclipse and it doesn't work. JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2)); Ashish On Tue, Sep 29, 2015 at 7:23 AM, Adrian Tanase wrote
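
A hedged sketch of the workaround: the streaming context itself has no setLogLevel, but the underlying SparkContext does from Spark 1.4 on (in Java the same call is reachable via jssc.sparkContext()):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sparkConf, Seconds(2))
    ssc.sparkContext.setLogLevel("WARN")   // quiets Spark's own logging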

Re: Spark Streaming Log4j Inside Eclipse

2015-09-28 Thread Ashish Soni
I am not running it using spark-submit; I am running it locally inside the Eclipse IDE. How do I set this using Java code? Ashish On Mon, Sep 28, 2015 at 10:42 AM, Adrian Tanase wrote: > You also need to provide it as parameter to spark submit > > http://stackoverflow.com/questions/288404

Spark Streaming Log4j Inside Eclipse

2015-09-28 Thread Ashish Soni
DEBUG or WARN Ashish

Spark Streaming and Kafka MultiNode Setup - Data Locality

2015-09-21 Thread Ashish Soni
Hi All, Just wanted to find out if there are any benefits to installing Kafka brokers and Spark nodes on the same machines. Is it possible for Spark to pull data from Kafka locally, i.e., when the broker or partition is on the same machine? Thanks, Ashish

Spark Cassandra Filtering

2015-09-16 Thread Ashish Soni
Hi, How can I pass a dynamic value to the filter in the function below instead of hardcoding it? I have an existing RDD and I would like to use data in it for the filter, so instead of doing .where("name=?","Anna") I want to do .where("name=?",someobject.value). Please help. JavaRDD rdd3 = javaFunctions(sc
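
A minimal sketch of the point being asked (the connector's where(cql, values*) takes ordinary bind values, so any in-scope variable works; someObject and the keyspace/table names are placeholders):

    import com.datastax.spark.connector._

    val name: String = someObject.value          // any runtime value
    val rdd3 = sc.cassandraTable("my_keyspace", "people")
      .where("name = ?", name)                   // bound exactly like the hardcoded literal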

Dynamic Workflow Execution using Spark

2015-09-15 Thread Ashish Soni
Hi All, Are there any frameworks that can be used to execute workflows within Spark? Or is it possible to use an ML Pipeline for workflow execution without doing ML? Thanks, Ashish

Re: ArrayIndexOutOfBoundsException when using repartitionAndSortWithinPartitions()

2015-09-10 Thread Ashish Shenoy
Yup thanks Ted. My getPartition() method had a bug where a signed int was being moduloed with the number of partitions. Fixed that. Thanks, Ashish On Thu, Sep 10, 2015 at 10:44 AM, Ted Yu wrote: > Here is snippet of ExternalSorter.scala where ArrayIndexOutOfBoundsException > was
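
A minimal sketch of the classic fix (Java/Scala % keeps the dividend's sign, so hashCode % n can be negative and index out of bounds; the usual guard follows):

    import org.apache.spark.Partitioner

    class KeyPartitioner(override val numPartitions: Int) extends Partitioner {
      override def getPartition(key: Any): Int = {
        val h = key.hashCode % numPartitions
        if (h < 0) h + numPartitions else h   // shift negative remainders into [0, n)
      }
    }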

Re: ArrayIndexOutOfBoundsException when using repartitionAndSortWithinPartitions()

2015-09-10 Thread Ashish Shenoy
} else { return 1; } } } ... Thanks, Ashish On Wed, Sep 9, 2015 at 5:13 PM, Ted Yu wrote: > Which release of Spark are you using ? > > Can you show skeleton of your partitioner and comparator ? > > Thanks > > > > On Sep 9, 2015, at 4:45 PM, Ashish Shenoy &g

ArrayIndexOutOfBoundsException when using repartitionAndSortWithinPartitions()

2015-09-09 Thread Ashish Shenoy
;s code and not my application code. Can you pls point out what I am doing wrong ? Thanks, Ashish

Re: hadoop2.6.0 + spark1.4.1 + python2.7.10

2015-09-09 Thread Ashish Dutt
Dear Sasha, What I did was that I installed the parcels on all the nodes of the cluster. Typically the location was /opt/cloudera/parcels/CDH5.4.2-1.cdh5.4.2.p0.2 Hope this helps you. With regards, Ashish On Tue, Sep 8, 2015 at 10:18 PM, Sasha Kacanski wrote: > Hi Ashish, > Thanks f

Re: hadoop2.6.0 + spark1.4.1 + python2.7.10

2015-09-07 Thread Ashish Dutt
m snippets on each worker, it works too. I am not sure if this will help or not for your use-case. Sincerely, Ashish On Mon, Sep 7, 2015 at 11:04 PM, Sasha Kacanski wrote: > Thanks Ashish, > nice blog but does not cover my issue. Actually I have pycharm running and > loading pyspar

Re: hadoop2.6.0 + spark1.4.1 + python2.7.10

2015-09-06 Thread Ashish Dutt
flow <http://stackoverflow.com/search?q=no+module+named+pyspark> website Sincerely, Ashish Dutt On Mon, Sep 7, 2015 at 7:17 AM, Sasha Kacanski wrote: > Hi, > I am successfully running python app via pyCharm in local mode > setMaster("local[*]") > > When I turn on Spa

Re: FlatMap Explanation

2015-09-03 Thread Ashish Soni
Thanks a lot everyone. Very Helpful. Ashish On Thu, Sep 3, 2015 at 2:19 AM, Zalzberg, Idan (Agoda) < idan.zalzb...@agoda.com> wrote: > Hi, > > Yes, I can explain > > > > 1 to 3 -> 1,2,3 > > 2 to 3- > 2,3 > > 3 to 3 -> 3 > > 3 to 3 -> 3 &

FlatMap Explanation

2015-09-02 Thread Ashish Soni
Hi, Can someone please explain the output of flatMap on the RDD data below? {1, 2, 3, 3} rdd.flatMap(x => x.to(3)) gives the output below: {1, 2, 3, 2, 3, 3, 3} I am not able to understand how that output was produced. Thanks,
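
A worked sketch of why that comes out (x.to(3) builds the inclusive range from x up to 3, and flatMap concatenates the per-element ranges):

    val rdd = sc.parallelize(Seq(1, 2, 3, 3))
    rdd.flatMap(x => x.to(3)).collect()
    // 1 -> 1,2,3   2 -> 2,3   3 -> 3   3 -> 3
    // result: Array(1, 2, 3, 2, 3, 3, 3)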

Re: Spark shell and StackOverFlowError

2015-08-31 Thread Ashish Shrowty
Yes .. I am closing the stream. Not sure what you meant by "bq. and then create rdd"? -Ashish On Mon, Aug 31, 2015 at 1:02 PM Ted Yu wrote: > I am not familiar with your code. > > bq. and then create the rdd > > I assume you call ObjectOutputStream.close() prior to t

Re: Spark shell and StackOverFlowError

2015-08-31 Thread Ashish Shrowty
PM Ted Yu wrote: > Ashish: > Can you post the complete stack trace for NotSerializableException ? > > Cheers > > On Mon, Aug 31, 2015 at 8:49 AM, Ashish Shrowty > wrote: > >> bcItemsIdx is just a broadcast variable constructed out of >> Array[(String)] .. it

Re: Spark shell and StackOverFlowError

2015-08-31 Thread Ashish Shrowty
re serializing a stream somewhere. I'd look at what's inside > bcItemsIdx as that is not shown here. > > On Mon, Aug 31, 2015 at 3:34 PM, Ashish Shrowty > wrote: > > Sean, > > > > Thanks for your comments. What I was really trying to do was to > transform a

Re: Spark shell and StackOverFlowError

2015-08-30 Thread Ashish Shrowty
Do you think I should create a JIRA? On Sun, Aug 30, 2015 at 12:56 PM Ted Yu wrote: > I got StackOverFlowError as well :-( > > On Sun, Aug 30, 2015 at 9:47 AM, Ashish Shrowty > wrote: > >> Yep .. I tried that too earlier. Doesn't make a difference. Are you able &g

Re: Spark shell and StackOverFlowError

2015-08-30 Thread Ashish Shrowty
guide.html#broadcast-variables > > Cheers > > On Sun, Aug 30, 2015 at 8:54 AM, Ashish Shrowty > wrote: > >> @Sean - Agree that there is no action, but I still get the >> stackoverflowerror, its very weird >> >> @Ted - Variable a is just an int - val a = 10 .

Re: Driver running out of memory - caused by many tasks?

2015-08-27 Thread Ashish Rangole
I suggest taking a heap dump of the driver process using jmap. Then open that dump in a tool like VisualVM to see which object(s) are taking up heap space. It is easy to do. We did this and found out that in our case it was the data structure that stores info about stages, jobs and tasks. There can be
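
A minimal sketch of the commands involved (PID is the driver JVM's process id; the dump file name is a placeholder):

    jmap -dump:live,format=b,file=driver-heap.hprof <PID>
    # then open driver-heap.hprof in VisualVM (or Eclipse MAT)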

Re: Worker Machine running out of disk for Long running Streaming process

2015-08-22 Thread Ashish Rangole
Interesting. TD, can you please throw some light on why this is and point to the relevant code in Spark repo. It will help in a better understanding of things that can affect a long running streaming job. On Aug 21, 2015 1:44 PM, "Tathagata Das" wrote: > Could you periodically (say every 10 mins

Java Streaming Context - File Stream use

2015-08-10 Thread Ashish Soni
Please help, as I am not sure what is incorrect with the below code; it gives me a compilation error in Eclipse: SparkConf sparkConf = new SparkConf().setMaster("local[4]").setAppName("JavaDirectKafkaWordCount"); JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Duratio
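
For context, a hedged Scala sketch of the file-stream pattern this thread is about (textFileStream watches a directory for newly arriving files; the path is a placeholder):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sparkConf, Seconds(2))
    val lines = ssc.textFileStream("hdfs:///watched/dir")
    lines.print()
    ssc.start()
    ssc.awaitTermination()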

How to connect to remote HDFS programmatically to retrieve data, analyse it and then write the data back to HDFS?

2015-08-05 Thread Ashish Dutt
y.java:214) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) at py4j.GatewayConnection.run(GatewayConnection.java:207) at java.lang.Thread.run(Thread.java:745) Traceback (most recent

PySpark in Pycharm- unable to connect to remote server

2015-08-05 Thread Ashish Dutt
:214) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) at py4j.GatewayConnection.run(GatewayConnection.java:207) at java.lang.Thread.run(Thread.java:745) Traceback (most recent call l

spark.files.userClassPathFirst=true Return Error - Please help

2015-07-22 Thread Ashish Soni
Hi All, I am getting the below error when I use the --conf spark.files.userClassPathFirst=true parameter: Job aborted due to stage failure: Task 3 in stage 0.0 failed 4 times, most recent failure: Lost task 3.3 in stage 0.0 (TID 32, 10.200.37.161): java.lang.ClassCastException: cannot assign instance

Class Loading Issue - Spark Assembly and Application Provided

2015-07-21 Thread Ashish Soni
Hi All, I am having a class-loading issue: the Spark assembly uses Google Guice internally, and one of the jars I am using uses sisu-guice-3.1.0-no_aop.jar. How do I load my class first so that it doesn't result in an error, and tell Spark to load its assembly later on? Ashish
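
A hedged sketch of the knobs usually tried for this (they make the application's jars win classloader conflicts; note the adjacent thread above shows they can themselves cause ClassCastExceptions, so treat them as experimental):

    spark-submit \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      ...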

XML Parsing

2015-07-19 Thread Ashish Soni
Hi All, I have an XML file with the same tag repeated multiple times, as below. Please suggest what would be the best way to process this data inside Spark: ... How can I extract each opening and closing tag and process them, or how can I combine multiple lines into a single line? ... .. .. Thanks,
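
A hedged sketch of one simple approach when individual files fit in memory (read whole files, then split on the repeated element with scala.xml; "record" is a placeholder tag name):

    val files = sc.wholeTextFiles("hdfs:///data/xml/*")
    val records = files.flatMap { case (_, content) =>
      // one string per repeated element, each processable independently
      (scala.xml.XML.loadString(content) \\ "record").map(_.toString)
    }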

BroadCast on Interval ( eg every 10 min )

2015-07-16 Thread Ashish Soni
Hi All, How can I broadcast a data change to all the executors every 10 min or 1 min? Ashish
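
A hedged sketch of the usual workaround (broadcast variables are immutable, so the driver periodically unpersists and re-creates one; loadMetadata() is a placeholder for whatever refreshes the data):

    var meta = sc.broadcast(loadMetadata())

    // run this on the driver on whatever cadence fits, e.g. once per streaming batch:
    meta.unpersist()                       // drop stale copies on the executors
    meta = sc.broadcast(loadMetadata())    // tasks must re-read the new reference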

Re: Is it possible to change the default port number 7077 for spark?

2015-07-13 Thread Ashish Dutt
Hello Arun, Thank you for the descriptive response. And thank you for providing the sample file too. It certainly is a great help. Sincerely, Ashish On Mon, Jul 13, 2015 at 10:30 PM, Arun Verma wrote: > > PFA sample file > > On Mon, Jul 13, 2015 at 7:37 PM, Arun Verma >

Re: Data Processing speed SQL Vs SPARK

2015-07-13 Thread Ashish Mukherjee
MySQL and PgSQL scale to millions of records. Spark, or any distributed/clustered computing environment, would be inefficient for the kind of data size you mention. That's because of the coordination of processes, moving data around, etc. On Mon, Jul 13, 2015 at 5:34 PM, Sandeep Giri wrote: > Even for 2L records

Re: SparkR Error in sparkR.init(master=“local”) in RStudio

2015-07-13 Thread Ashish Dutt
n Windows environment? What I mean is how to setup .libPaths()? where is it in windows environment Thanks for your help Sincerely, Ashish Dutt On Mon, Jul 13, 2015 at 3:48 PM, Sun, Rui wrote: > Hi, Kachau, > > If you are using SparkR with RStudio, have you followed the guideli

Re: Connecting to nodes on cluster

2015-07-09 Thread Ashish Dutt
Hello Akhil, Thanks for the response. I will have to figure this out. Sincerely, Ashish On Thu, Jul 9, 2015 at 3:40 PM, Akhil Das wrote: > On Wed, Jul 8, 2015 at 7:31 PM, Ashish Dutt > wrote: > >> Hi, >> >> We have a cluster with 4 nodes. The cluster uses CDH 5.4

Re: PySpark without PySpark

2015-07-08 Thread Ashish Dutt
written something wrong here. Cannot seem to figure out, what is it? Thank you for your help Sincerely, Ashish Dutt On Thu, Jul 9, 2015 at 11:53 AM, Sujit Pal wrote: > Hi Ashish, > > >> Nice post. > Agreed, kudos to the author of the post, Benjamin Benfort of District Labs. >

DLL load failed: %1 is not a valid win32 application on invoking pyspark

2015-07-08 Thread Ashish Dutt
your help. Sincerely, Ashish Dutt

Re: PySpark without PySpark

2015-07-08 Thread Ashish Dutt
N", MAVEN_HOME="D:\MAVEN\BIN", PYTHON_HOME="C:\PYTHON27\", SBT_HOME="C:\SBT\" Sincerely, Ashish Dutt PhD Candidate Department of Information Systems University of Malaya, Lembah Pantai, 50603 Kuala Lumpur, Malaysia On Thu, Jul 9, 2015 at 4:56 AM, Sujit Pal wrote: >

Re: Connecting to nodes on cluster

2015-07-08 Thread Ashish Dutt
The error is JVM has not responded after 10 seconds. On 08-Jul-2015 10:54 PM, "ayan guha" wrote: > What's the error you are getting? > On 9 Jul 2015 00:01, "Ashish Dutt" wrote: > >> Hi, >> >> We have a cluster with 4 nodes. The cluster uses CD

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

2015-07-08 Thread Ashish Dutt
Hello Sooraj, Thank you for your response. It indeed gives me a ray of hope now. Can you please suggest any good tutorials for installing and working with an IPython notebook server on the node? Thank you Ashish On 08-Jul-2015 6:16 PM, "sooraj" wrote: > > Hi Ashish, > > I am ru

Connecting to nodes on cluster

2015-07-08 Thread Ashish Dutt
connect to the nodes I am using SSH. Question: Would it be better if I work directly on the nodes rather than trying to connect my laptop to them? Question 2: If yes, then can you suggest any Python and R IDEs that I can install on the nodes to make it work? Thanks for your help Sincerely, Ashish

Re: Getting started with spark-scala developemnt in eclipse.

2015-07-08 Thread Ashish Dutt
Hello Prateek, I started by getting the pre-built binaries so as to skip the hassle of building them from scratch. I am not familiar with Scala so can't comment on it. I have documented my experiences on my blog www.edumine.wordpress.com Perhaps it might be useful to you. On 08-Jul-2015 9:39 PM,
