Hi Mich, I created the Kafka DStream with the following Java code:
sparkConf = new SparkConf().setAppName(this.getClass().getSimpleName() + ", topic: " + topics);
jssc = new JavaStreamingContext(sparkConf, Durations.seconds(batchInterval));
HashSet<String> topicsSet = new HashSet<>(Arrays.asList(topics.split(",")));
HashMap<String, String> kafkaParams = new HashMap<>();
kafkaParams.put("metadata.broker.list", brokers);
dStream = KafkaUtils.createDirectStream(jssc, byte[].class, byte[].class,
    DefaultDecoder.class, DefaultDecoder.class, kafkaParams, topicsSet);

Do you know if there is a way to do sampling for a stream when creating it?

Thanks,
Samuel

On Mon, May 16, 2016 at 12:54 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi Samuel,
>
> How do you create your RDD based on a Kafka direct stream?
>
> Do you have your code snippet?
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 15 May 2016 at 23:24, Samuel Zhou <zhou...@gmail.com> wrote:
>
>> Hi,
>>
>> I was trying to use a filter to sample a Kafka direct stream. The
>> filter function keeps 1 message out of 10 (hashcode % 10 == 0), but the
>> number of input events for each batch didn't shrink to 10% of the
>> original traffic. So I want to ask whether there is any way to shrink
>> the batch size with a sampling function, to save the traffic from Kafka?
>>
>> Thanks!
>> Samuel
>>
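For reference, here is a minimal, self-contained sketch of the hashcode-based filter described in the quoted message above. The broker list, topic name, and batch interval are placeholder values, not taken from this thread, and note that filter() runs after the records have already been fetched from Kafka, so it cuts downstream processing but not the traffic pulled from the brokers.

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import kafka.serializer.DefaultDecoder;

public class SampledKafkaStream {
    public static void main(String[] args) throws Exception {
        // Placeholder values; substitute your own brokers, topics, and interval.
        String brokers = "localhost:9092";
        String topics = "events";
        int batchInterval = 10;

        SparkConf sparkConf = new SparkConf().setAppName("SampledKafkaStream, topic: " + topics);
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(batchInterval));

        HashSet<String> topicsSet = new HashSet<>(Arrays.asList(topics.split(",")));
        HashMap<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", brokers);

        JavaPairInputDStream<byte[], byte[]> dStream = KafkaUtils.createDirectStream(
                jssc, byte[].class, byte[].class,
                DefaultDecoder.class, DefaultDecoder.class, kafkaParams, topicsSet);

        // 1-in-10 hashcode-based sample, as described in the quoted message.
        // Arrays.hashCode gives a content-based hash of the byte[] payload,
        // and Math.floorMod avoids negative remainders. This keeps roughly
        // 10% of records for downstream processing, but every record is
        // still fetched from Kafka before the filter is applied.
        JavaPairDStream<byte[], byte[]> sampled =
                dStream.filter(record -> Math.floorMod(Arrays.hashCode(record._2()), 10) == 0);

        // Print the per-batch count of sampled records, just to see the effect.
        sampled.count().print();

        jssc.start();
        jssc.awaitTermination();
    }
}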