Re: How to get and parse whole xml file in HDFS by Spark Streaming

2015-06-22, Yong Feng
Thanks, Akhil. I will give it a try and then get back to you.

Yong

Re: How to get and parse whole xml file in HDFS by Spark Streaming

2015-06-22, Akhil Das
Like this?

// Scala StreamingContext API: key, value and InputFormat classes are type parameters
val rawXmls = ssc.fileStream[LongWritable, Text, XmlInputFormat](path)

Thanks
Best Regards

Re: How to get and parse whole xml file in HDFS by Spark Streaming

2015-06-22, Yong Feng
Thanks a lot, Akhil.

I saw this mail thread before, but I still do not understand how to use Mahout's XmlInputFormat in Spark Streaming (I am not a Spark Streaming expert yet ;-)). Can you show me some sample code to explain it?

Thanks in advance,
Yong

Re: How to get and parse whole xml file in HDFS by Spark Streaming

2015-06-22, Akhil Das
You can use fileStream for that; look at Mahout's XmlInputFormat. It should give you the full XML object as one record (as opposed to an XM
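For readers unfamiliar with the idea, here is a minimal pure-Scala sketch of what "the full XML object as one record" means: scan the text for a start tag and emit everything up to the matching end tag as a single record. This is a hypothetical helper (`XmlRecordSplitter.splitRecords`) written for illustration, not Mahout's code or API. As far as I know, Mahout's XmlInputFormat reads its tags from the Hadoop configuration keys `xmlinput.start` and `xmlinput.end`; treat those names as an assumption to check against the Mahout version you use. If each file is one XML document, setting the start and end tags to the root element should make each whole file a single record, assuming the root tag does not also appear inside the document.

```scala
// Hypothetical helper (not Mahout's API): illustrates the start-tag/end-tag
// record splitting that an XML input format performs on a file's contents.
object XmlRecordSplitter {
  /** Returns every substring that begins with `startTag` and ends with `endTag`
    * (both inclusive). Assumes records are not nested inside one another and
    * that `startTag` is matched literally, as a plain substring. */
  def splitRecords(text: String, startTag: String, endTag: String): Vector[String] = {
    val records = Vector.newBuilder[String]
    var from = text.indexOf(startTag)
    while (from >= 0) {
      val end = text.indexOf(endTag, from + startTag.length)
      if (end < 0) {
        from = -1 // unterminated record: stop scanning
      } else {
        val stop = end + endTag.length
        records += text.substring(from, stop)
        from = text.indexOf(startTag, stop)
      }
    }
    records.result()
  }

  def main(args: Array[String]): Unit = {
    val data = "<batch><record><id>1</id></record><record><id>2</id></record></batch>"
    // Prints each <record>...</record> element on its own line
    splitRecords(data, "<record>", "</record>").foreach(println)
  }
}
```

Because the tag is matched as a literal substring, a start tag like `<order` would also match `<orders`; using the complete tag (for example `<record>`) avoids that.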

Fwd: How to get and parse whole xml file in HDFS by Spark Streaming

2015-06-21, Yong Feng
Hi Spark Experts,

I have a customer who wants to monitor incoming data files (in XML format), analyze them, and then put the analyzed data into a DB. The size of each file is about 30MB (or even less in future). Spark Streaming seems promising. After learning Spark Streaming and also googl
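Once each XML document (or element) arrives as one record, the "analyze" step needs to parse it. A minimal sketch, assuming each record's `Text` value has been converted to a `String`, uses the JDK's built-in DOM parser to pull out the text of a given element before writing it to a DB. The helper name `XmlRecordParser.textOf` and the sample element names are hypothetical; how it is wired into the stream (for example, a `map` over the DStream's values) and the DB write are not shown and are left as assumptions.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical helper: parses one XML record with the JDK DOM parser
// and returns the text content of every element named `tag`.
object XmlRecordParser {
  def textOf(xml: String, tag: String): Seq[String] = {
    // A fresh DocumentBuilder per call, since DocumentBuilder is not thread-safe
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    val nodes = doc.getElementsByTagName(tag)
    (0 until nodes.getLength).map(i => nodes.item(i).getTextContent)
  }

  def main(args: Array[String]): Unit = {
    val record = "<record><id>1</id><amount>9.5</amount></record>"
    println(textOf(record, "amount"))
  }
}
```

DOM loads the whole document into memory; for files around 30MB that is usually workable, but a streaming parser (StAX or SAX) would be the more memory-friendly choice if files grow.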