Try
ssc.textFileStream("hdfs:///localhost:8020/user/data/*")
with three slashes after "hdfs:".
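If that still fails, another thing to try is pointing textFileStream at the directory itself, without the trailing "*". In the trace quoted below, the error comes out of FileSystem.listStatus, which lists a directory and does not expand wildcards. A minimal sketch, assuming the Spark 1.x streaming API from the original message and that new files are moved into /user/data after the job has started (the object wrapper and the geLines name are only for illustration). It also keeps the result of filter, since filter returns a new DStream rather than changing lines in place:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HdfsWordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Assumption: monitor the directory itself instead of a glob pattern.
    val lines = ssc.textFileStream("hdfs://localhost:8020/user/data")

    // filter returns a new DStream, so print the filtered stream,
    // not the original one.
    val geLines = lines.filter(line => line.contains("GE"))
    geLines.print()

    ssc.start()
    // Block the driver so the streaming job keeps running.
    ssc.awaitTermination()
  }
}
```

Every new file that lands in the directory should then be picked up in the next 2-second batch, so multiple files are covered without a wildcard.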
Thx
tri
-----Original Message-----
From: Benjamin Cuthbert [mailto:[email protected]]
Sent: Monday, December 01, 2014 4:41 PM
To: [email protected]
Subject: hdfs streaming context
All,
Is it possible to stream on HDFS directory and listen for multiple files?
I have tried the following
val sparkConf = new SparkConf().setAppName("HdfsWordCount")
val ssc = new StreamingContext(sparkConf, Seconds(2))
val lines = ssc.textFileStream("hdfs://localhost:8020/user/data/*")
lines.filter(line => line.contains("GE"))
lines.print()
ssc.start()
But I get
14/12/01 21:35:42 ERROR JobScheduler: Error generating jobs for time 1417469742000 ms
java.io.FileNotFoundException: File hdfs://localhost:8020/user/data/* does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:408)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1416)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1456)
at org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:107)
at org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:75)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected] For additional
commands, e-mail: [email protected]