Yeah, if you're reporting issues, please be clear as to whether
backpressure is enabled, and whether maxRatePerPartition is set.

I expect that there is something wrong with backpressure, see e.g.
https://issues.apache.org/jira/browse/SPARK-18371

On Wed, Nov 16, 2016 at 5:05 PM, bo yang <bobyan...@gmail.com> wrote:
> I hit similar issue with Spark Streaming. The batch size seemed a little
> random. Sometime it was large with many Kafka messages inside same batch,
> sometimes it was very small with just a few messages. Is it possible that
> was caused by the backpressure implementation in Spark Streaming?
>
> On Wed, Nov 16, 2016 at 4:22 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Moved to user list.
>>
>> I'm not really clear on what you're trying to accomplish (why put the
>> csv file through Kafka instead of reading it directly with spark?)
>>
>> auto.offset.reset=largest just means that when starting the job
>> without any defined offsets, it will start at the highest (most
>> recent) available offsets.  That's probably not what you want if
>> you've already loaded csv lines into kafka.
>>
>> On Wed, Nov 16, 2016 at 2:45 PM, Hoang Bao Thien <hbthien0...@gmail.com>
>> wrote:
>> > Hi all,
>> >
>> > I would like to ask a question related to the size of Kafka stream. I
>> > want
>> > to put data (e.g., file *.csv) to Kafka then use Spark streaming to get
>> > the
>> > output from Kafka and then save to Hive by using SparkSQL. The file csv
>> > is
>> > about 100MB with ~250K messages/rows (Each row has about 10 fields of
>> > integer). I see that Spark Streaming first received two
>> > partitions/batches,
>> > the first is of 60K messages and the second is of 50K msgs. But from the
>> > third batch, Spark just received 200 messages for each batch (or
>> > partition).
>> > I think that this problem is coming from Kafka or some configuration in
>> > Spark. I already tried to configure with the setting
>> > "auto.offset.reset=largest", but every batch only gets 200 messages.
>> >
>> > Could you please tell me how to fix this problem?
>> > Thank you so much.
>> >
>> > Best regards,
>> > Alex
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Reply via email to