How to do dispatching in Streaming?

2015-04-12 Thread Jianshi Huang
Hi, I have a Kafka topic that contains dozens of different types of messages, and for each type I need to create a DStream. Currently I have to filter the Kafka stream over and over, which is very inefficient. So what's the best way to do dispatching in Spark Streaming? (one DStream -> …
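
A minimal sketch of one common mitigation, assuming the Spark 1.x receiver-based Kafka API: read the topic once, cache each batch, and derive one filtered DStream per type. The ZooKeeper address, group id, topic name, and the extractType scheme are all placeholders, not from the original thread.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.dstream.DStream
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("dispatch-sketch")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Hypothetical scheme: each message carries its type before a '|' delimiter.
    def extractType(msg: String): String = msg.takeWhile(_ != '|')

    val raw: DStream[String] =
      KafkaUtils.createStream(ssc, "zk:2181", "group", Map("mixed-topic" -> 1)).map(_._2)
    raw.cache() // each batch RDD is materialized once and reused by every filter below

    val messageTypes = Seq("typeA", "typeB", "typeC")
    val byType: Map[String, DStream[String]] =
      messageTypes.map(t => t -> raw.filter(m => extractType(m) == t)).toMap

Caching only avoids recomputing each batch from the receiver; the N filters still make N passes over the cached data, so this reduces the cost rather than giving true single-pass dispatch.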

counters in spark

2015-04-12 Thread Grandl Robert
Hi guys, I was trying to figure out some counters in Spark related to the amount of CPU or memory used (in some metric) by a task/stage/job, but I could not find any. Is there any such counter available? Thank you, Robert
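
Spark does expose per-task time and byte counters through the listener API, though not a direct CPU-percentage or resident-memory figure in 1.x. A sketch, assuming a running driver program:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    val sc = new SparkContext(new SparkConf().setAppName("metrics-sketch"))

    // Print a few TaskMetrics fields as each task finishes.
    sc.addSparkListener(new SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val m = taskEnd.taskMetrics
        if (m != null) {
          println(s"stage ${taskEnd.stageId}: runTime=${m.executorRunTime} ms, " +
            s"gcTime=${m.jvmGCTime} ms, resultSize=${m.resultSize} bytes")
        }
      }
    })

The same numbers are also visible per stage in the web UI and can be exported through the metrics sink configuration.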

Re: regarding ZipWithIndex

2015-04-12 Thread Ted Yu
Please also take a look at ZippedWithIndexRDDPartition, which is 72 lines long. You can create your own version which extends RDD[(Long, T)]. Cheers On Sun, Apr 12, 2015 at 1:29 PM, Ted Yu wrote: > bq. will return something like JavaPairRDD > > The long component of the pair fits your description …
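
If writing a full RDD subclass is more than needed, the same shape, an RDD[(Long, T)] with the index first, can be sketched with mapPartitionsWithIndex. This mirrors what ZippedWithIndexRDD does internally (count per partition, then offset) but is not its actual code:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    val sc = new SparkContext(new SparkConf().setAppName("index-first-sketch"))
    val rdd = sc.parallelize(Seq("a", "b", "c", "d"), numSlices = 2)

    // Count elements per partition, then derive each partition's starting offset.
    val counts = rdd
      .mapPartitionsWithIndex { (i, it) => Iterator((i, it.size)) }
      .collect().sortBy(_._1).map(_._2)
    val offsets = counts.scanLeft(0L)(_ + _)

    // Emit (globalIndex, element) pairs, index first.
    val indexed: RDD[(Long, String)] = rdd.mapPartitionsWithIndex { (i, it) =>
      it.zipWithIndex.map { case (v, j) => (offsets(i) + j, v) }
    }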

Re: regarding ZipWithIndex

2015-04-12 Thread Ted Yu
bq. will return something like JavaPairRDD The long component of the pair fits your description of index. What other requirement does ZipWithIndex not provide you? Cheers On Sun, Apr 12, 2015 at 1:16 PM, Jeetendra Gangele wrote: > Hi All I have an RDD JavaRDD and I want to convert it to > JavaPairRDD …

regarding ZipWithIndex

2015-04-12 Thread Jeetendra Gangele
Hi All, I have a JavaRDD and I want to convert it to a JavaPairRDD keyed by an index. The index should be unique and should maintain order: the first object should get 1, the second 2, and so on. I tried using zipWithIndex but it returns a JavaPairRDD<T, Long>, with the index second. I wanted to use this RDD for l…
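
zipWithIndex does provide a unique, order-preserving index; it just puts it second and starts at 0. A small Scala sketch of the swap (the Java API's mapToPair is analogous):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    val sc = new SparkContext(new SparkConf().setAppName("zip-sketch"))
    val rdd = sc.parallelize(Seq("a", "b", "c"))

    // zipWithIndex yields (element, 0-based index); swap and shift for a 1-based (index, element).
    val indexed: RDD[(Long, String)] = rdd.zipWithIndex().map { case (v, i) => (i + 1, v) }
    // => (1,"a"), (2,"b"), (3,"c"), preserving order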

Re: function to convert to pair

2015-04-12 Thread Jeetendra Gangele
I have to create some kind of index from my JavaRDD; it should be something like a JavaPairRDD, but zipWithIndex gives the pair the other way around (element first, index second). Later I need to use this RDD for a join, so it looks like it won't work for me. On 9 April 2015 at 04:17, Ted Yu wrote: > Please take a look at zipWithIndex() of RDD. > > Cheers > > On …
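
The swapped pair is still usable for a join once both sides are keyed the same way. A sketch, assuming an existing SparkContext sc:

    // Key both RDDs by the same Long index, then join.
    val left  = sc.parallelize(Seq("a", "b")).zipWithIndex().map(_.swap)  // RDD[(Long, String)]
    val right = sc.parallelize(Seq(10, 20)).zipWithIndex().map(_.swap)   // RDD[(Long, Int)]
    val joined = left.join(right)                                        // RDD[(Long, (String, Int))]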

RE: How to use Joda Time with Spark SQL?

2015-04-12 Thread Wang, Daoyuan
Actually, I did a little investigation on Joda Time when I was working on SPARK-4987 for Timestamp ser-de in Parquet format. I think Joda offers an interface to get a Java object from a Joda time object natively. For example, to transform a java.util.Date (parent of java.sql.Date and java.sql.Timestamp) …
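
A quick sketch of those native conversions, using only well-known Joda Time calls:

    import java.sql.Timestamp
    import org.joda.time.DateTime

    val ts = new Timestamp(System.currentTimeMillis())
    val jodaDt = new DateTime(ts.getTime)      // java.sql.Timestamp -> Joda DateTime (epoch millis)
    val back = new Timestamp(jodaDt.getMillis) // and back again
    val asJavaDate = jodaDt.toDate             // Joda DateTime -> java.util.Date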

Re: How to use Joda Time with Spark SQL?

2015-04-12 Thread Cheng Lian
These common UDTs can always be wrapped in libraries and published to spark-packages http://spark-packages.org/ :-) Cheng On 4/12/15 3:00 PM, Justin Yip wrote: Cheng, this is great info. I have a follow-up question. There are a few very common data types (e.g. Joda DateTime) that are not directly …
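
A sketch of what such a library class might look like, storing a Joda DateTime as epoch millis; UserDefinedType was a @DeveloperApi in Spark 1.x, so the exact signatures may differ across versions:

    import org.apache.spark.sql.types._
    import org.joda.time.DateTime

    class JodaDateTimeUDT extends UserDefinedType[DateTime] {
      override def sqlType: DataType = LongType
      override def serialize(obj: Any): Any = obj match {
        case dt: DateTime => dt.getMillis
      }
      override def deserialize(datum: Any): DateTime = datum match {
        case millis: Long => new DateTime(millis)
      }
      override def userClass: Class[DateTime] = classOf[DateTime]
    }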

Re: Spark TeraSort source request

2015-04-12 Thread Ewan Higgs
Hi all. The code is linked from my repo (https://github.com/ehiggs/spark-terasort): "This is an example Spark program for running TeraSort benchmarks. It is based on work from Reynold Xin's branch, but it is not the same TeraSort program that curren…

Re: How to use Joda Time with Spark SQL?

2015-04-12 Thread Justin Yip
Cheng, this is great info. I have a follow-up question. There are a few very common data types (e.g. Joda DateTime) that are not directly supported by Spark SQL. Do you know if there are any plans to accommodate some common data types in Spark SQL? They don't need to be first-class datatypes, but …