Hi Tathagata, I am very new to Spark Streaming and have not used the pipe() function yet.
I have written a Spark Streaming program (Java API) which receives data from Kafka and, for now, simply prints the messages:

    JavaStreamingContext ssc = new JavaStreamingContext(args[0], "SparkStreamExample",
            new Duration(1000), System.getenv("SPARK_HOME"),
            JavaStreamingContext.jarOfClass(SparkStreamExample.class));
    JavaPairDStream<String, String> messages =
            KafkaUtils.createStream(ssc, args[1], args[2], topicMap);
    messages.print();
    ssc.start();
    ssc.awaitTermination();

There is another simple Spark program (Python API) which does some data cleaning and saves the result to HDFS:

    import sys
    from pyspark import SparkContext

    if __name__ == "__main__":
        if len(sys.argv) < 3:
            print >> sys.stderr, "Usage: <master> <file>"
            exit(-1)
        # Instead of reading from HDFS, I want this program to read from the Java Spark Streaming process
        sc = SparkContext(sys.argv[1], "TextCleanUp")
        lines = sc.textFile(sys.argv[2])
        cleanText = lines.map(cleanFunction).filter(lambda x: len(x) > 0)
        cleanText.saveAsTextFile("hdfs://<IP>/user/root/cleanout")

What I want is for the Python program to read its data from the standard output of the Java Spark Streaming program. From what I have read, pipe() is the function to use for this, but I cannot work out how to apply it. Could you please provide an example of how to use Spark's pipe() function in this context? I have put my rough attempt so far in a postscript below.

Thanks in advance.

Regards,
Gaurav
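
P.S. To show what I have managed to piece together from the docs: my understanding is that pipe() is an RDD method which runs an external command and streams the partition's records through that command's stdin/stdout, so on the Java side it would have to be called inside transform() on each micro-batch. The sketch below is only my guess; the python binary location and the script path /path/to/cleanup.py are placeholders, and cleanup.py would be a plain Python script (not a Spark job) that reads one record per line from stdin, cleans it, and writes the cleaned line to stdout. Please correct me if this is not how pipe() is meant to be used.

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.streaming.api.java.JavaDStream;

    // For every micro-batch RDD produced by the Kafka stream, pipe the message
    // values through an external script via its stdin/stdout.
    JavaDStream<String> cleaned = messages.transform(
        new Function<JavaPairRDD<String, String>, JavaRDD<String>>() {
            @Override
            public JavaRDD<String> call(JavaPairRDD<String, String> rdd) throws Exception {
                // pipe() launches the command for each partition; every element is
                // written to the process's stdin as a line, and every line of its
                // stdout becomes an element of the resulting RDD of Strings.
                return rdd.values().pipe("/usr/bin/python /path/to/cleanup.py");
            }
        });

    // The cleaned records could then be handled inside the streaming job itself,
    // e.g. printed for testing, instead of going through a separate batch program.
    cleaned.print();

Is something along these lines what you had in mind, or is there a better way to connect the two programs?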