Try putting files with different file names and see if the stream is able to
detect them.
On 25-Apr-2015 3:02 am, "Yang Lei [via Apache Spark User List]" <
ml-node+s1001560n22650...@n3.nabble.com> wrote:
> I hit the same issue "as if the directory has no files at all" when
> running the sample "exa
Hi,
Yes, Spark automatically removes old RDDs from the cache when you make new
ones. Unpersist forces it to remove them right away.
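As a rough sketch of the explicit route (the input path is illustrative, and sc is assumed to be an existing SparkContext):
// cache the RDD, use it, then free the cached blocks immediately
val cached = sc.textFile("hdfs:///some/input").cache()
cached.count()       // materializes the RDD in the cache
cached.unpersist()   // evicts it right away instead of waiting for automatic cleanup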
On Thu, Apr 23, 2015 at 9:28 AM, Jeffery [via Apache Spark User List] <
ml-node+s1001560n22618...@n3.nabble.com> wrote:
> Hi, Dear Spark Users/Devs:
>
> In a method
Hi,
This is because your logger setting is set to OFF. Just add the following
lines to your code; this should probably resolve the issue.
Imports that are needed:
import org.apache.log4j.Logger
import org.apache.log4j.Level
Add the two lines to your code:
Logger.getLogger("org").setLevel(Level.INFO)   // or whichever level you want to see
Logger.getLogger("akka").setLevel(Level.INFO)
It depends. If the data on which the calculation is to be done is very
large, then caching it with MEMORY_AND_DISK is useful. Even in this case,
MEMORY_AND_DISK is worthwhile only if the computation on the RDD is
expensive. If the computation is very cheap, then even for large data sets
MEMORY_ONLY can be used.
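For illustration, a minimal sketch of choosing between the two storage levels (the path and the transformation are hypothetical, sc is an existing SparkContext):
import org.apache.spark.storage.StorageLevel

val data = sc.textFile("hdfs:///large/dataset")
val expensive = data.map(line => line.split(",").length)   // stand-in for a costly computation
expensive.persist(StorageLevel.MEMORY_AND_DISK)            // spills partitions to disk when memory runs out
// expensive.persist(StorageLevel.MEMORY_ONLY)             // enough when the data fits in memory
expensive.count()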
> from spark streaming application point of view we need to set any
> properties ,please help me
>
>
> Thanks Prannoy..
>
>
Streaming takes only new files into consideration. Add the file after
starting the job.
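A minimal sketch of that behaviour, assuming a Spark Streaming job over an illustrative directory; only files moved into the directory after ssc.start() are picked up:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("FileStreamTest")
val ssc = new StreamingContext(conf, Seconds(10))
val lines = ssc.textFileStream("hdfs:///streaming/input")   // illustrative directory
lines.print()
ssc.start()              // files added to the directory from this point on are processed
ssc.awaitTermination()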
On Thu, Mar 12, 2015 at 2:26 PM, CH.KMVPRASAD [via Apache Spark User List] <
ml-node+s1001560n2201...@n3.nabble.com> wrote:
> yes !
> for testing purpose i defined single file in the specified directory
>
Are the files already present in HDFS before you are starting your
application ?
On Thu, Mar 12, 2015 at 11:11 AM, CH.KMVPRASAD [via Apache Spark User List]
wrote:
> Hi am successfully executed sparkPi example on yarn mode but i cant able
> to read files from hdfs in my streaming application usi
Hi,
To keep processing the older files as well, you can use fileStream instead of
textFileStream. It has a parameter that tells it to also look for files that
are already present.
For deleting the processed files, one way is to get the list of all files in
the dStream. This can be done by using the foreachRDD API of the dStream.
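A hedged sketch of both points, assuming ssc is an existing StreamingContext and the directory is illustrative:
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val stream = ssc.fileStream[LongWritable, Text, TextInputFormat](
  "hdfs:///streaming/input",     // illustrative directory
  (path: Path) => true,          // accept every file name
  newFilesOnly = false)          // also process files already present at start-up

stream.foreachRDD { rdd =>
  // each batch's RDD arrives here; cleanup of already-processed files
  // could be driven from this hook
  println(s"records in this batch: ${rdd.count()}")
}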
Hi,
You can use the FileUtil.copyMerge API and point it at the folder where
saveAsTextFile has saved the part files.
Suppose your directory is /a/b/c/.
Use FileUtil.copyMerge(FileSystem of source, a/b/c, FileSystem of
destination, path to the merged file, say a/b/c.txt, true (to delete the
source after merging), the Hadoop Configuration, and null for the optional
separator string).
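A minimal sketch of that call, reusing the /a/b/c paths from above (the Configuration and FileSystem handles are assumptions about your setup):
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

val hadoopConf = new Configuration()
val fs = FileSystem.get(hadoopConf)
// merge the part files under /a/b/c into the single file /a/b/c.txt;
// true deletes the source directory, null means no separator is appended between files
FileUtil.copyMerge(fs, new Path("/a/b/c"), fs, new Path("/a/b/c.txt"),
  true, hadoopConf, null)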
Hi,
You can take the schema line in another RDD and then do a union of the two
RDDs.
List<String> schemaList = new ArrayList<>();
schemaList.add("xyz");   // where "xyz" is your schema line
JavaRDD<String> schemaRDD = sc.parallelize(schemaList);   // where sc is your JavaSparkContext
JavaRDD<String> newRDD = schemaRDD.union(yourRDD);        // yourRDD holds the data rows
Hi,
Before saving the RDD, collect it and print its contents. Probably it is a
null value.
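A quick sketch of that check, assuming rdd is the RDD being saved (collect() is only safe for small test data):
val contents = rdd.collect()                       // pulls everything to the driver
println(s"number of records: ${contents.length}")
contents.take(10).foreach(println)                 // peek at the first few records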
Thanks.
On Sat, Jan 3, 2015 at 5:37 PM, Pankaj Narang [via Apache Spark User List] <
ml-node+s1001560n20953...@n3.nabble.com> wrote:
> If you can paste the code here I can certainly he
e cloudera manager itself.
Thanks.
On Mon, Jan 12, 2015 at 9:51 PM, NingjunWang [via Apache Spark User List] <
ml-node+s1001560n21105...@n3.nabble.com> wrote:
> Prannoy
>
>
>
> I tried this r.saveAsTextFile("home/cloudera/tmp/out1"), it return
> without error. But
What path are you giving in saveAsTextFile? Can you show the whole
line?
On Tue, Jan 13, 2015 at 11:42 AM, shekhar [via Apache Spark User List] <
ml-node+s1001560n21112...@n3.nabble.com> wrote:
> I still i having this issue with rdd.saveAsTextFile() method.
>
>
> thanks,
> Shekhar reddy
>
Have you tried simply giving the path where you want to save the file?
For instance, in your case just do
r.saveAsTextFile("home/cloudera/tmp/out1")
Don't use file://
This will create a folder with the name out1. saveAsTextFile always writes by
making a directory; it does not write data into a single file.
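To illustrate, assuming r is the RDD from this thread, the call produces a directory of part files (exact names depend on the number of partitions):
r.saveAsTextFile("home/cloudera/tmp/out1")
// out1/ is a directory containing, roughly:
//   out1/_SUCCESS
//   out1/part-00000
//   out1/part-00001   (one part file per partition)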
Set the port using
spconf.set("spark.ui.port", "4041");   // 4041 stands for any free port
where spconf is your Spark configuration object.
On Sun, Jan 11, 2015 at 2:08 PM, YaoPau [via Apache Spark User List] <
ml-node+s1001560n21083...@n3.nabble.com> wrote:
> I have multiple Spark Streaming jobs running all da
Hi,
You can access your logs in the /spark_home_directory/logs/ directory.
cat the files there and you will get the logs.
Thanks.
On Thu, Dec 4, 2014 at 2:27 PM, FFeng [via Apache Spark User List] <
ml-node+s1001560n20344...@n3.nabble.com> wrote:
> I have wrote data to spark log.
> I get it t
Hi,
Add the jars to the external libraries of your related project.
Right click on the package or class -> Build Path -> Configure Build Path ->
Java Build Path -> select the Libraries tab -> Add External Library ->
browse to com.xxx.yyy.zzz._ -> OK
Clean and build your project, most probably you will b
Hi,
Try using
sc.newAPIHadoopFile("<path to your Avro file>",
    AvroSequenceFileInputFormat.class, AvroKey.class, AvroValue.class,
    yourConfiguration)   // your Hadoop Configuration object
You will get the Avro related classes by importing org.apache.avro.*
Thanks.
On Tue, Dec 2, 2014 at 9:23 PM, leaviva [via Apache Spark User List] <
ml-node+s10015
Hi,
BindException comes when two processes are using the same port. In your
spark configuration just set ("spark.ui.port","x"),
to some other port. x can be any number say 12345. BindException will
not break your job in either case. Just to fix it change the port number.
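For example, a hedged sketch of setting the port up front (the application name and port number are placeholders):
import org.apache.spark.SparkConf

val sparkConf = new SparkConf()
  .setAppName("MyStreamingJob")     // placeholder name
  .set("spark.ui.port", "12345")    // any free port avoids the clash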
Thanks.
On Fri,
Hi,
The configuration you provide is just to access the HDFS when you give an
HDFS path. When you provide a HDFS path with the HDFS nameservice, like in
your case hmaster155:9000 it goes inside the HDFS to look for the file. For
accessing local file just give the local path of the file. Go to the
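A rough sketch of the two path styles, reusing the hmaster155:9000 nameservice from this thread (the file paths themselves are illustrative):
val fromHdfs  = sc.textFile("hdfs://hmaster155:9000/user/data/input.txt")   // resolved inside HDFS
val fromLocal = sc.textFile("file:///home/user/data/input.txt")             // read from the local filesystem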
Hi Naveen,
I don't think this is possible. If you are setting the master with your
cluster details, you cannot execute any job from your local machine. You
have to execute the jobs inside your YARN machine so that SparkConf is able
to connect with all the provided details.
If this is not the case s
Hi,
Parallel processing of XML files may be an issue due to the tags in the XML
file. The XML file has to be intact, because while parsing it matches the
start and end entities, and if the file is distributed in parts to the
workers, a worker may or may not find the start and end tags within the same
worker, which will g
Hi,
Spark runs locally at a lower speed than on a cluster. Cluster machines
usually have a higher configuration, and the tasks are distributed across
workers in order to get a faster result. So you will always find a
difference in speed between running locally and running on a cluster. Try
running
Hi,
You can also set the cores in the Spark application itself:
http://spark.apache.org/docs/1.0.1/spark-standalone.html
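For instance, a minimal sketch using the spark.cores.max setting from the standalone docs (the value 4 is only an example):
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("MyApp")              // placeholder name
  .set("spark.cores.max", "4")      // cap on the total cores the application takes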
On Wed, Nov 19, 2014 at 6:11 AM, Pat Ferrel-2 [via Apache Spark User List] <
ml-node+s1001560n19238...@n3.nabble.com> wrote:
> OK hacking the start-slave.sh did it
>
> On No
Hi ,
You can use the FileUtil.copyMerge API and point it at the folder where
saveAsTextFile has saved the part files.
Suppose your directory is /a/b/c/.
Use FileUtil.copyMerge(FileSystem of source, a/b/c, FileSystem of
destination, path to the merged file, say a/b/c.txt, true (to delete the
Hi Saj,
What is the size of the input data that you are putting on the stream?
Have you tried running the same application with a different set of data?
It's weird that the streaming stops exactly after 2 hours. Try running the
same application with data of different sizes to see if it ha
Hi Niko,
Have you tried running it while keeping the wordCounts.print()? Possibly the
import of the package org.apache.spark.streaming._ is missing, so during
sbt package it is unable to locate the saveAsTextFile API.
Go to
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/