Try putting files with different file names and see if the stream is able to
detect them.
On 25-Apr-2015 3:02 am, "Yang Lei [via Apache Spark User List]" <
ml-node+s1001560n22650...@n3.nabble.com> wrote:
> I hit the same issue "as if the directory has no files at all" when
> running the sample "exa
Hi,
Yes, Spark automatically removes old RDDs from the cache when you make new
ones. Unpersist forces it to remove them right away.
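As a rough sketch of the explicit route (the input path is illustrative, and sc is assumed to be an existing SparkContext):
// cache the RDD, use it, then free the cached blocks immediately
val cached = sc.textFile("hdfs:///some/input").cache()
cached.count()       // materializes the RDD in the cache
cached.unpersist()   // evicts it right away instead of waiting for automatic cleanup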
On Thu, Apr 23, 2015 at 9:28 AM, Jeffery [via Apache Spark User List] <
ml-node+s1001560n22618...@n3.nabble.com> wrote:
> Hi, Dear Spark Users/Devs:
>
> In a method
Hi,
This is because your logger setting is set to OFF. Just add the following
lines to your code; this should probably resolve the issue.
Imports that are needed:
import org.apache.log4j.Logger
import org.apache.log4j.Level
Add the two lines to your code:
Logger.getLogger("org").setLevel(Level.INFO)   // or whichever level you want to see
Logger.getLogger("akka").setLevel(Level.INFO)
It depends. If the data on which the calculation is to be done is very
large, then caching it with MEMORY_AND_DISK is useful. Even in this case,
MEMORY_AND_DISK is worthwhile only if the computation on the RDD is
expensive. If the computation is very cheap, then even for large data sets
MEMORY_ONLY can be used.
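For illustration, a minimal sketch of choosing between the two storage levels (the path and the transformation are hypothetical, sc is an existing SparkContext):
import org.apache.spark.storage.StorageLevel

val data = sc.textFile("hdfs:///large/dataset")
val expensive = data.map(line => line.split(",").length)   // stand-in for a costly computation
expensive.persist(StorageLevel.MEMORY_AND_DISK)            // spills partitions to disk when memory runs out
// expensive.persist(StorageLevel.MEMORY_ONLY)             // enough when the data fits in memory
expensive.count()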
> from spark streaming application point of view we need to set any
> properties ,please help me
>
>
> Thanks Prannoy..
>
>
Streaming takes only new files into consideration. Add the file after
starting the job.
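A minimal sketch of that behaviour, assuming a Spark Streaming job over an illustrative directory; only files moved into the directory after ssc.start() are picked up:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("FileStreamTest")
val ssc = new StreamingContext(conf, Seconds(10))
val lines = ssc.textFileStream("hdfs:///streaming/input")   // illustrative directory
lines.print()
ssc.start()              // files added to the directory from this point on are processed
ssc.awaitTermination()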
On Thu, Mar 12, 2015 at 2:26 PM, CH.KMVPRASAD [via Apache Spark User List] <
ml-node+s1001560n2201...@n3.nabble.com> wrote:
> yes !
> for testing purpose i defined single file in the specified directory
>
Are the files already present in HDFS before you are starting your
application ?
On Thu, Mar 12, 2015 at 11:11 AM, CH.KMVPRASAD [via Apache Spark User List]
wrote:
> Hi am successfully executed sparkPi example on yarn mode but i cant able
> to read files from hdfs in my streaming application usi
Hi,
To keep processing the older files as well, you can use fileStream instead of
textFileStream. It has a parameter that tells it to also look for files that
are already present.
For deleting the processed files, one way is to get the list of all files in
the dStream. This can be done by using the foreachRDD API of the dStream.
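A hedged sketch of both points, assuming ssc is an existing StreamingContext and the directory is illustrative:
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val stream = ssc.fileStream[LongWritable, Text, TextInputFormat](
  "hdfs:///streaming/input",     // illustrative directory
  (path: Path) => true,          // accept every file name
  newFilesOnly = false)          // also process files already present at start-up

stream.foreachRDD { rdd =>
  // each batch's RDD arrives here; cleanup of already-processed files
  // could be driven from this hook
  println(s"records in this batch: ${rdd.count()}")
}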
Hi,
You can use the FileUtil.copyMerge API and point it at the folder where
saveAsTextFile has saved the part files.
Suppose your directory is /a/b/c/.
Use FileUtil.copyMerge(FileSystem of source, a/b/c, FileSystem of
destination, path to the merged file, say a/b/c.txt, true (to delete the
source after merging), the Hadoop Configuration, and null for the optional
separator string).
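A minimal sketch of that call, reusing the /a/b/c paths from above (the Configuration and FileSystem handles are assumptions about your setup):
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

val hadoopConf = new Configuration()
val fs = FileSystem.get(hadoopConf)
// merge the part files under /a/b/c into the single file /a/b/c.txt;
// true deletes the source directory, null means no separator is appended between files
FileUtil.copyMerge(fs, new Path("/a/b/c"), fs, new Path("/a/b/c.txt"),
  true, hadoopConf, null)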
Hi,
You can take the schema line in another RDD and then do a union of the two
RDDs.
List<String> schemaList = new ArrayList<>();
schemaList.add("xyz");   // where "xyz" is your schema line
JavaRDD<String> schemaRDD = sc.parallelize(schemaList);   // where sc is your JavaSparkContext
JavaRDD<String> newRDD = schemaRDD.union(yourRDD);        // yourRDD holds the data rows
Hi,
Before saving the RDD, collect it and print its contents. Probably it is a
null value.
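A quick sketch of that check, assuming rdd is the RDD being saved (collect() is only safe for small test data):
val contents = rdd.collect()                       // pulls everything to the driver
println(s"number of records: ${contents.length}")
contents.take(10).foreach(println)                 // peek at the first few records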
Thanks.
On Sat, Jan 3, 2015 at 5:37 PM, Pankaj Narang [via Apache Spark User List] <
ml-node+s1001560n20953...@n3.nabble.com> wrote:
> If you can paste the code here I can certainly he
e cloudera manager itself.
Thanks.
On Mon, Jan 12, 2015 at 9:51 PM, NingjunWang [via Apache Spark User List] <
ml-node+s1001560n21105...@n3.nabble.com> wrote:
> Prannoy
>
>
>
> I tried this r.saveAsTextFile("home/cloudera/tmp/out1"), it return
> without error. But
What path are you giving in saveAsTextFile? Can you show the whole
line?
On Tue, Jan 13, 2015 at 11:42 AM, shekhar [via Apache Spark User List] <
ml-node+s1001560n21112...@n3.nabble.com> wrote:
> I still i having this issue with rdd.saveAsTextFile() method.
>
>
> thanks,
> Shekhar reddy
>
Have you tried simply giving the path where you want to save the file?
For instance, in your case just do
r.saveAsTextFile("home/cloudera/tmp/out1")
Don't use file://
This will create a folder with the name out1. saveAsTextFile always writes by
making a directory; it does not write data into a single file.
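To illustrate, assuming r is the RDD from this thread, the call produces a directory of part files (exact names depend on the number of partitions):
r.saveAsTextFile("home/cloudera/tmp/out1")
// out1/ is a directory containing, roughly:
//   out1/_SUCCESS
//   out1/part-00000
//   out1/part-00001   (one part file per partition)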
Set the port using
spconf.set("spark.ui.port", "4041");   // 4041 stands for any free port
where spconf is your Spark configuration object.
On Sun, Jan 11, 2015 at 2:08 PM, YaoPau [via Apache Spark User List] <
ml-node+s1001560n21083...@n3.nabble.com> wrote:
> I have multiple Spark Streaming jobs running all da
Hi,
You can access your logs in the /spark_home_directory/logs/ directory.
cat the files there and you will get the logs.
Thanks.
On Thu, Dec 4, 2014 at 2:27 PM, FFeng [via Apache Spark User List] <
ml-node+s1001560n20344...@n3.nabble.com> wrote:
> I have wrote data to spark log.
> I get it t
Hi,
Add the jars to the external libraries of your related project.
Right click on the package or class -> Build Path -> Configure Build Path ->
Java Build Path -> select the Libraries tab -> Add External Library ->
browse to com.xxx.yyy.zzz._ -> OK
Clean and build your project, most probably you will b
Hi,
Try using
sc.newAPIHadoopFile("<path to your Avro file>",
    AvroSequenceFileInputFormat.class, AvroKey.class, AvroValue.class,
    yourConfiguration)   // your Hadoop Configuration object
You will get the Avro related classes by importing org.apache.avro.*
Thanks.
On Tue, Dec 2, 2014 at 9:23 PM, leaviva [via Apache Spark User List] <
ml-node+s10015
Hi,
BindException comes when two processes are using the same port. In your
spark configuration just set ("spark.ui.port","x"),
to some other port. x can be any number say 12345. BindException will
not break your job in either case. Just to fix it change the port number.
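For example, a hedged sketch of setting the port up front (the application name and port number are placeholders):
import org.apache.spark.SparkConf

val sparkConf = new SparkConf()
  .setAppName("MyStreamingJob")     // placeholder name
  .set("spark.ui.port", "12345")    // any free port avoids the clash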
Thanks.
On Fri,
Hi,
The configuration you provide is just to access the HDFS when you give an
HDFS path. When you provide a HDFS path with the HDFS nameservice, like in
your case hmaster155:9000 it goes inside the HDFS to look for the file. For
accessing local file just give the local path of the file. Go to the
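A rough sketch of the two path styles, reusing the hmaster155:9000 nameservice from this thread (the file paths themselves are illustrative):
val fromHdfs  = sc.textFile("hdfs://hmaster155:9000/user/data/input.txt")   // resolved inside HDFS
val fromLocal = sc.textFile("file:///home/user/data/input.txt")             // read from the local filesystem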
Hi Naveen,
I don't think this is possible. If you are setting the master with your
cluster details, you cannot execute any job from your local machine. You
have to execute the jobs inside your YARN machine so that SparkConf is able
to connect with all the provided details.
If this is not the case s
Hi,
Parallel processing of XML files may be an issue due to the tags in the XML
file. The XML file has to be intact, because while parsing it matches the
start and end entities, and if the file is distributed in parts to the
workers, a worker may or may not find the start and end tags within the same
worker, which will g
Hi,
Spark runs locally at a lower speed than on a cluster. Cluster machines
usually have a higher configuration, and the tasks are distributed across
workers in order to get a faster result. So you will always find a
difference in speed between running locally and running on a cluster. Try
running
Hi,
You can also set the cores in the Spark application itself:
http://spark.apache.org/docs/1.0.1/spark-standalone.html
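For instance, a minimal sketch using the spark.cores.max setting from the standalone docs (the value 4 is only an example):
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("MyApp")              // placeholder name
  .set("spark.cores.max", "4")      // cap on the total cores the application takes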
On Wed, Nov 19, 2014 at 6:11 AM, Pat Ferrel-2 [via Apache Spark User List] <
ml-node+s1001560n19238...@n3.nabble.com> wrote:
> OK hacking the start-slave.sh did it
>
> On No
Hi ,
You can use the FileUtil.copyMerge API and point it at the folder where
saveAsTextFile has saved the part files.
Suppose your directory is /a/b/c/.
Use FileUtil.copyMerge(FileSystem of source, a/b/c, FileSystem of
destination, path to the merged file, say a/b/c.txt, true (to delete the
Hi Saj,
What is the size of the input data that you are putting on the stream?
Have you tried running the same application with a different set of data?
It's weird that the streaming stops exactly after 2 hours. Try running the
same application with data of different sizes to see if it ha
Hi Niko,
Have you tried running it while keeping the wordCounts.print()? Possibly the
import of the package org.apache.spark.streaming._ is missing, so during
sbt package it is unable to locate the saveAsTextFile API.
Go to
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/