I've written an application to read content from a kafka topic with 1.7
billion entries, deserialize the protobuf-serialized entries, and insert
them into hbase. Currently the environment I'm running in is Spark 1.2.
With 8 executors and 2 cores, and 2 jobs, I'm only getting between
0 and 2500 writes / second.
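(For context, a bare-bones version of that kind of pipeline on Spark 1.2 looks
roughly like the sketch below, using the receiver-based createStream since the
direct stream discussed later in the thread only arrives in 1.3. Topic, table,
column family, zookeeper quorum and group id are placeholders, not the actual
app's values.)

import kafka.serializer.DefaultDecoder
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(new SparkConf().setAppName("kafka-to-hbase"), Seconds(5))

// Receiver-based stream; keys and values arrive as raw bytes
val stream = KafkaUtils.createStream[Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder](
  ssc,
  Map("zookeeper.connect" -> "zk1:2181", "group.id" -> "hbase-loader"),
  Map("events" -> 1),
  StorageLevel.MEMORY_AND_DISK_SER)

stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // one HBase connection per partition, not per record
    val table = new HTable(HBaseConfiguration.create(), "events")
    records.foreach { case (key, bytes) =>
      // the real app would parse the protobuf here; this sketch just stores
      // the raw bytes, assuming the Kafka key is usable as a row key
      val put = new Put(key)
      put.add("d".getBytes, "payload".getBytes, bytes)
      table.put(put)
    }
    table.close()
  }
}

ssc.start()
ssc.awaitTermination()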
me extent.
>
> David Krieg | Enterprise Software Engineer
> Early Warning
> Direct: 480.426.2171 | Fax: 480.483.4628 | Mobile: 859.227.6173
>
>
> -Original Message-
> From: Colin Kincaid Williams [mailto:disc...@uw.edu]
> Sent: Monday, May 02, 2016 10:55 AM
> Are you on spark 1.2, or is upgrading possible? The
> kafka direct stream is available starting with 1.3. If you're stuck
> on 1.2, I believe there have been some attempts to backport it, search
> the mailing list archives.
>
> On Mon, May 2, 2016 at 12:54 PM, Colin Kincaid Williams
> wrote:
>> I've written an application to get content from a kafka topic with
> Really though, I'd try to start with spark 1.6 and direct streams, or
> even just kafkacat, as a baseline.
>
>
>
> On Mon, May 2, 2016 at 7:01 PM, Colin Kincaid Williams wrote:
>> Hello again. I searched for "backport kafka" in the list archives but
tributing across partitions evenly).
>
> On Tue, May 3, 2016 at 1:44 PM, Colin Kincaid Williams wrote:
>> Thanks again Cody. Regarding the details: 66 kafka partitions on 3
>> kafka servers, likely 8-core systems with 10 disks each. Maybe the
>> issue with the receiver was the large n
-5.3.0-1.cdh5.3.0.p0.30/lib/hbase/lib/*
\
/home/colin.williams/kafka-hbase.jar "FromTable" "ToTable"
"broker1:9092,broker2:9092"
On Tue, May 3, 2016 at 8:20 PM, Colin Kincaid Williams wrote:
> Thanks Cody, I can see that the partitions are well distributed...
> Then
> Dr Mich Talebzadeh
>
>
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
>
> On 18 June 2016 at 20:40, Colin Kincaid Williams wrote:
>>
>> I updated my app to Spark 1
I'm attaching a picture from the streaming UI.
On Sat, Jun 18, 2016 at 7:59 PM, Colin Kincaid Williams wrote:
> There are 25 nodes in the spark cluster.
>
> On Sat, Jun 18, 2016 at 7:53 PM, Mich Talebzadeh
> wrote:
>> how many nodes are in your cluster?
>>
>>
c?
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
> If your batch interval is shorter than your processing time of
> 1.16 seconds, you're always going to be falling behind. That would
> explain why you've built up an hour of scheduling delay after eight
> hours of running.
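> (To put numbers on it: assuming for the arithmetic a 1 second batch
> interval against that 1.16 second processing time, 8 hours means 28,800
> batches generated but only about 28,800 / 1.16 ≈ 24,800 processed, so the
> batch currently running arrived roughly 4,000 seconds, about an hour,
> earlier. The same buildup happens for any interval shorter than the
> processing time.)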
>
> On Sat, Jun 18, 2016 at 4:40 PM, Colin Kincaid Williams
> wrote:
>> Hi Mich again,
looking for advice regarding # Kafka Topic Partitions / Streaming
Duration / maxRatePerPartition / any other spark settings or code
changes that I should make to try to get a better consumption rate.
Thanks for all the help so far, this is the first Spark application I
have written.
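For reference, the knobs in question sit roughly here when using the direct
stream; this is only a sketch with placeholder values (topic name, rate cap,
batch duration), not the app's actual settings. The 66 partitions and the
broker list come from earlier in the thread.

import kafka.serializer.DefaultDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf()
  .setAppName("kafka-to-hbase")
  // cap each Kafka partition at N records per second (direct stream only)
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")

// batch (streaming) duration: with 66 partitions capped at 1000 msg/s each,
// a 5 second batch holds at most 66 * 1000 * 5 = 330,000 records
val ssc = new StreamingContext(conf, Seconds(5))

// direct stream: no receivers, one Spark partition per Kafka partition
val stream = KafkaUtils.createDirectStream[Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder](
  ssc,
  Map("metadata.broker.list" -> "broker1:9092,broker2:9092"),
  Set("events"))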
On Mon, Jun 2
> Take HBase out of the equation and just measure what your read
> performance is by doing something like
>
> createDirectStream(...).foreach(_.println)
>
> not take() or print()
>
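Spelled out, that baseline looks something like the following, with a direct
stream such as the one sketched earlier; unlike take() or print(), it forces
every record in every batch to be fetched and deserialized.

stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // consume everything but write nothing; swap println for a counter
    // if stdout noise is a problem
    records.foreach(record => println(record))
  }
}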
> On Tue, Jun 21, 2016 at 3:19 PM, Colin Kincaid Williams
> wrote:
>> @Cody I was able to bring my processing ti
Is it possible my issues were related to running on the Spark
1.5.2 cluster. Also is the missing event count in the completed
batches a bug? Should I file an issue?
On Tue, Jun 21, 2016 at 9:04 PM, Colin Kincaid Williams wrote:
> Thanks @Cody, I will try that out. In the interim, I tried to validate
> my
Streaming UI tab showing empty events and very different metrics than on 1.5.2
On Thu, Jun 23, 2016 at 5:06 AM, Colin Kincaid Williams wrote:
> After a bit of effort I moved from a Spark cluster running 1.5.2, to a
> Yarn cluster running 1.6.1 jars. I'm still setting the maxRPP. The
I launch around 30-60 of these jobs defined like start-job.sh in the
background from a wrapper script. I wait about 30 seconds between launches,
then the wrapper monitors yarn to determine when to launch more. There is a
limit defined at around 60 jobs, but even if I set it to 30, I run out of
memory.
I'm using the spark shell v1.6.1. I have a classpath conflict, where I
have an external library (not OSS either :( , so I can't rebuild it)
that uses httpclient-4.5.2.jar. I use spark-shell --jars
file:/path/to/httpclient-4.5.2.jar
However spark is using httpclient-4.3 internally. Then when I try to
use
My bad, gothos on IRC pointed me to the docs:
http://jhz.name/2016/01/10/spark-classpath.html
Thanks Gothos!
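For anyone who finds this later: the usual fix for that kind of conflict, and
presumably what the post above walks through, is to prepend the newer jar to
both the driver and executor classpaths rather than only shipping it with
--jars, e.g. (path is a placeholder):

spark-shell \
  --conf spark.driver.extraClassPath=/path/to/httpclient-4.5.2.jar \
  --conf spark.executor.extraClassPath=/path/to/httpclient-4.5.2.jar

extraClassPath entries go in front of Spark's own classpath, so
httpclient-4.5.2 is picked up ahead of the 4.3 that Spark bundles.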
On Fri, Sep 9, 2016 at 9:23 PM, Colin Kincaid Williams wrote:
> I'm using the spark shell v1.6.1. I have a classpath conflict, where I
> have an external library ( not
Hi,
I have been trying to get my yarn logs to display in the spark
history-server or yarn history-server. I can see the log information with:
yarn logs -applicationId application_1424740955620_0009
15/02/23 22:15:14 INFO client.ConfiguredRMFailoverProxyProvider: Failing
over to us3sm2hbqa04r07-comp-pr
; /opt/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
>
>
> It may be slightly different for you if the resource manager and the
> history server are not on the same machine.
>
> Hope it will work for you as well!
> Christophe.
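For anyone hitting the same issue: the settings that usually have to line up
for aggregated container logs to appear in a history server are roughly these,
in yarn-site.xml (hostname is a placeholder, 19888 is the default
JobHistoryServer web port):

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log.server.url</name>
  <value>http://historyserver-host:19888/jobhistory/logs</value>
</property>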
>
> On 24/02/2015 06:31, Colin Kincaid Williams wrote:
he info in one place.
>
> On Tue, Feb 24, 2015 at 12:36 PM, Colin Kincaid Williams
> wrote:
>
>> Looks like in my tired state, I didn't mention spark the whole time.
>> However, it might be implied by the application log above. Spark log
>> aggregation appears to b