Hi Mich, I created the Kafka DStream with the following Java code:
sparkConf = new SparkConf().setAppName(this.getClass().getSimpleName() + ", topic: " + topics);
jssc = new JavaStreamingContext(sparkConf, Durations.seconds(batchInterval));
HashSet<String> topicsSet = new HashSet<>(Arrays.asList(topics.split(",")));
HashMap<String, String> kafkaParams = new HashMap<>();
kafkaParams.put("metadata.broker.list", brokers);
dStream = KafkaUtils.createDirectStream(jssc, byte[].class, byte[].class,
    DefaultDecoder.class, DefaultDecoder.class, kafkaParams, topicsSet);

Do you know if there is a way to do sampling for a stream when creating it?

Thanks,
Samuel

On Mon, May 16, 2016 at 12:54 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi Samuel,
>
> How do you create your RDD based on a Kafka direct stream?
>
> Do you have your code snippet?
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 15 May 2016 at 23:24, Samuel Zhou <zhou...@gmail.com> wrote:
>
>> Hi,
>>
>> I was trying to use a filter to sample a Kafka direct stream. The
>> filter function keeps 1 message out of 10 (hashcode % 10 == 0), but the
>> number of input events for each batch didn't shrink to 10% of the
>> original traffic. So I want to ask whether there is any way to shrink
>> the batch size with a sampling function, to save the traffic from Kafka?
>>
>> Thanks!
>> Samuel
>>
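For reference, here is a minimal, self-contained sketch of the hashcode-based filter described in the quoted message above. The broker list, topic name, and batch interval are placeholder values, not taken from this thread, and note that filter() runs after the records have already been fetched from Kafka, so it cuts downstream processing but not the traffic pulled from the brokers.

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import kafka.serializer.DefaultDecoder;

public class SampledKafkaStream {
    public static void main(String[] args) throws Exception {
        // Placeholder values; substitute your own brokers, topics, and interval.
        String brokers = "localhost:9092";
        String topics = "events";
        int batchInterval = 10;

        SparkConf sparkConf = new SparkConf().setAppName("SampledKafkaStream, topic: " + topics);
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(batchInterval));

        HashSet<String> topicsSet = new HashSet<>(Arrays.asList(topics.split(",")));
        HashMap<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", brokers);

        JavaPairInputDStream<byte[], byte[]> dStream = KafkaUtils.createDirectStream(
                jssc, byte[].class, byte[].class,
                DefaultDecoder.class, DefaultDecoder.class, kafkaParams, topicsSet);

        // 1-in-10 hashcode-based sample, as described in the quoted message.
        // Arrays.hashCode gives a content-based hash of the byte[] payload,
        // and Math.floorMod avoids negative remainders. This keeps roughly
        // 10% of records for downstream processing, but every record is
        // still fetched from Kafka before the filter is applied.
        JavaPairDStream<byte[], byte[]> sampled =
                dStream.filter(record -> Math.floorMod(Arrays.hashCode(record._2()), 10) == 0);

        // Print the per-batch count of sampled records, just to see the effect.
        sampled.count().print();

        jssc.start();
        jssc.awaitTermination();
    }
}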