"1 userid data" is ambiguous though (user-input data? stream? shard?),
since a kinesis worker fetch data from shards that the worker has an
ownership of, IIUC user-input data in a shard are transferred into an
assigned worker as long as you get no failure.
// maropu
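To make the ownership model concrete, here is a minimal sketch of a KCL 1.x
record processor (the same IRecordProcessor interface that Spark's
KinesisRecordProcessor implements); the KCL invokes each instance only with
records from the single shard whose lease its worker holds:

import java.util.{List => JList}
import scala.collection.JavaConverters._
import com.amazonaws.services.kinesis.clientlibrary.interfaces.{IRecordProcessor, IRecordProcessorCheckpointer}
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownReason
import com.amazonaws.services.kinesis.model.Record

class PerShardProcessor extends IRecordProcessor {
  private var shardId: String = _

  // Called once, when this worker acquires the shard's lease.
  override def initialize(shardId: String): Unit = {
    this.shardId = shardId
  }

  // Every record here belongs to `shardId`; no other worker sees this
  // shard's data while the lease is held.
  override def processRecords(records: JList[Record],
                              checkpointer: IRecordProcessorCheckpointer): Unit = {
    records.asScala.foreach(record => () /* handle one user-input record */)
  }

  // Called when the lease is lost or the shard is closed.
  override def shutdown(checkpointer: IRecordProcessorCheckpointer,
                        reason: ShutdownReason): Unit = ()
}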
On Mon, Nov 21, 2016 at 1:59 PM, Shushant Arora wrote:
Hi
Thanks.
I have a doubt about the Spark Streaming Kinesis consumer. Say I have a batch
time of 500 ms and the Kinesis stream is partitioned on userid (uniformly
distributed). But since IdleTimeBetweenReadsInMillis is set to 1000 ms, the
Spark receiver nodes will fetch the data at an interval of 1 second and store
it in
It seems it is not a good design to frequently restart workers within a minute,
because their initialization and shutdown take much time, as you said
(e.g., interconnection overheads with DynamoDB and graceful shutdown).
Anyway, since this is a kind of question about the AWS Kinesis library,
you'd better ask it in the AWS forums.
1. No, I want to implement a low-level consumer on a Kinesis stream,
so I need to stop the worker once it has read the latest sequence number sent
by the driver.
2. What is the cost of frequently registering and deregistering a worker node?
Is it that when the worker's shutdown is called, it will terminate the run
method but the lease coordinator
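For what it's worth, a hedged sketch of that stop-after-catch-up idea with the
raw KCL 1.x API (the wrapper and the catch-up signal are hypothetical);
Worker.shutdown() halts record fetching and lease renewal, and that lease
churn in DynamoDB is part of the cost of frequent register/deregister cycles:

import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessorFactory
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.{KinesisClientLibConfiguration, Worker}

// Hypothetical wrapper: run a worker and stop it once the processors report
// they have passed the sequence number the driver asked for.
def runUntilCaughtUp(factory: IRecordProcessorFactory,
                     config: KinesisClientLibConfiguration,
                     caughtUp: () => Boolean): Unit = {
  val worker = new Worker(factory, config) // KCL 1.x constructor
  val thread = new Thread(worker)          // Worker implements Runnable; run() blocks
  thread.start()
  while (!caughtUp()) Thread.sleep(100)    // poll the (hypothetical) catch-up signal
  worker.shutdown()                        // stops fetching and lease renewal; run() returns
  thread.join()
}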
Is "aws kinesis get-shard-iterator --shard-iterator-type LATEST" not enough
for your usecase?
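In code, a rough sketch of that approach with the AWS Java SDK v1 (default
credential chain and a single page of shards assumed; the stream name is a
placeholder). A LATEST iterator points just past the newest record, so the
first records fetched through it carry the newest sequence numbers as data
arrives:

import scala.collection.JavaConverters._
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder
import com.amazonaws.services.kinesis.model.GetRecordsRequest

val client = AmazonKinesisClientBuilder.defaultClient()
val streamName = "my-stream" // placeholder

// describeStream returns one page; a real tool would paginate with ExclusiveStartShardId.
val shards = client.describeStream(streamName).getStreamDescription.getShards.asScala
for (shard <- shards) {
  val iterator = client
    .getShardIterator(streamName, shard.getShardId, "LATEST")
    .getShardIterator
  // Empty until records arrive past the iterator; keep polling via getNextShardIterator otherwise.
  val records = client.getRecords(new GetRecordsRequest().withShardIterator(iterator)).getRecords.asScala
  records.lastOption.foreach(r => println(s"${shard.getShardId}: ${r.getSequenceNumber}"))
}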
On Mon, Nov 14, 2016 at 10:23 PM, Shushant Arora wrote:
> Thanks!
> Is there a way to get the latest sequence number of all shards of a
> kinesis stream?
>
>
>
> On Mon, Nov 14, 2016 at 5:43 PM, Takeshi Yamamuro wrote:
Thanks!
Is there a way to get the latest sequence number of all shards of a kinesis
stream?
On Mon, Nov 14, 2016 at 5:43 PM, Takeshi Yamamuro wrote:
> Hi,
>
> The time interval can be controlled by `IdleTimeBetweenReadsInMillis` in
> KinesisClientLibConfiguration; however, it is not configurable in the
> current implementation.
Hi,
The time interval can be controlled by `IdleTimeBetweenReadsInMillis`
in KinesisClientLibConfiguration; however, it is not configurable in the
current implementation.
The details can be found in:
https://github.com/apache/spark/blob/master/external/kinesis-asl/src/main/scala/org/apache/spark/str
I'm not familiar with the Kafka implementation, but a Kinesis receiver
runs in a thread on the executors.
You can set any value for the interval, but frequent checkpoints cause
excess load on DynamoDB.
See:
http://spark.apache.org/docs/latest/streaming-kinesis-integration.html#kinesis-checkpointing
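For comparison, outside Spark the raw KCL does expose that knob; a minimal
sketch (KCL 1.x; the application name, stream name, and worker id are
placeholders):

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration

val kclConfig = new KinesisClientLibConfiguration(
    "my-app",    // application name; also names the DynamoDB lease table
    "my-stream",
    new DefaultAWSCredentialsProviderChain(),
    "worker-1")
  .withIdleTimeBetweenReadsInMillis(500) // default is 1000 ms; Spark does not pass this through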
Hi
By receiver I meant the Spark Streaming receiver architecture - meaning worker
nodes are different from receiver nodes. Is there no direct/low-level consumer
for Kinesis in Spark Streaming, like Kafka's?
Is there any limitation on the checkpoint interval - a minimum of 1 second in
Spark Streaming with Kinesis?
I'm not exactly sure which receiver you meant, but if you mean the
"KinesisReceiver" implementation, yes.
Also, we currently cannot disable the interval checkpoints.
On Tue, Oct 25, 2016 at 11:53 AM, Shushant Arora wrote:
> Thanks!
>
> Are Kinesis streams receiver-based only? Is there a non-receiver-based
> consumer for Kinesis?
Thanks!
Are Kinesis streams receiver-based only? Is there a non-receiver-based
consumer for Kinesis?
And instead of a fixed checkpoint interval, can I disable auto checkpointing
and say: once my worker has processed the data up to the last record of
mapPartition, now checkpoint the sequence number using
Hi,
The only thing you can do for Kinesis checkpoints is tune their interval:
https://github.com/apache/spark/blob/master/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala#L68
Whether data loss occurs or not depends on the storage level you set;
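Both of those levers sit in that createStream overload; a minimal sketch with
placeholder app, stream, endpoint, and region values:

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

def kinesisStream(ssc: StreamingContext) =
  KinesisUtils.createStream(
    ssc,
    "my-app",                                 // Kinesis app name, also the DynamoDB lease table
    "my-stream",
    "https://kinesis.us-east-1.amazonaws.com",
    "us-east-1",
    InitialPositionInStream.LATEST,
    Seconds(10),                              // checkpoint interval: how often sequence numbers go to DynamoDB
    StorageLevel.MEMORY_AND_DISK_2)           // replicated storage guards against loss of received blocks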
Yes, it's against master: https://github.com/apache/spark/pull/10256
I'll push the KCL version bump after my local tests finish.
On Fri, Dec 11, 2015 at 10:42 AM, Nick Pentreath wrote:
> Is that PR against master branch?
>
> S3 read comes from Hadoop / jets3t afaik
>
> —
> Sent from Mailbox
Is that PR against master branch?
S3 read comes from Hadoop / jets3t afaik
—
Sent from Mailbox
On Fri, Dec 11, 2015 at 5:38 PM, Brian London wrote:
> That's good news. I've got a PR in to bump the SDK version to 1.10.40 and
> the KCL to 1.6.1, which I'm running tests on locally now.
> Is the AWS SDK not used for reading/writing from S3, or do we get that for
> free from the Hadoop dependencies?
That's good news. I've got a PR in to bump the SDK version to 1.10.40 and the
KCL to 1.6.1, which I'm running tests on locally now.
Is the AWS SDK not used for reading/writing from S3, or do we get that for
free from the Hadoop dependencies?
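If it helps anyone hitting the mismatch, one way to keep the pair aligned in
an application build is to pin both artifacts; a hedged sbt sketch using the
versions from this thread (whether these exact versions suit your build is an
assumption):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.5.2",
  "com.amazonaws"    %  "amazon-kinesis-client"       % "1.6.1",  // KCL, matching the PR
  "com.amazonaws"    %  "aws-java-sdk"                % "1.10.40" // SDK, matching the PR
)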
On Fri, Dec 11, 2015 at 5:07 AM, Nick Pentreath wrote:
> cc'ing dev list
cc'ing dev list
Ok, looks like when the KCL version was updated in
https://github.com/apache/spark/pull/8957, the AWS SDK version was not,
probably leading to a dependency conflict, though as Burak mentions it's hard
to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally
and on my
Yeah, also the integration tests need to be specifically run - I would have
thought the contributor would have run those tests and also tested the change
themselves using live Kinesis :(
—
Sent from Mailbox
On Fri, Dec 11, 2015 at 6:18 AM, Burak Yavuz wrote:
> I don't think the Kinesis tests specifically ran when that was merged into
> 1.5.2 :(
I don't think the Kinesis tests specifically ran when that was merged into
1.5.2 :(
https://github.com/apache/spark/pull/8957
https://github.com/apache/spark/commit/883bd8fccf83aae7a2a847c9a6ca129fac86e6a3
AFAIK pom changes don't trigger the Kinesis tests.
Burak
On Thu, Dec 10, 2015 at 8:09 PM,
Yup, it also works for me on the master branch, as I've been testing the
DynamoDB Streams integration. In fact, it works with the latest KCL 1.6.1
also, which I was using.
So the KCL version does seem like it could be the issue - somewhere along the
line an exception must be getting swallowed. Though the tests should
Yes, it worked in the 1.6 branch as of commit
db5165246f2888537dd0f3d4c5a515875c7358ed. That makes it much less serious
of an issue, although it would be nice to know what the root cause is to
avoid a regression.
On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz wrote:
> I've noticed this happening when there were some dependency conflicts, and
> it is super hard to debug.
I've noticed this happening when there were some dependency conflicts, and
it is super hard to debug.
It seems that the KinesisClientLibrary version in Spark 1.5.2 is 1.3.0, but
it is 1.2.1 in Spark 1.5.1.
I feel like that seems to be the problem...
Brian, did you verify that it works with the 1.6 branch?
Nick's symptoms sound identical to mine. I should mention that I just
pulled the latest version from GitHub and it seems to be working there. To
reproduce:
1. Download Spark 1.5.2 from http://spark.apache.org/downloads.html
2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
Hi Nick,
Just to be sure: don't you see a ClassCastException in the log?
Thanks,
Regards
JB
On 12/10/2015 07:56 PM, Nick Pentreath wrote:
Could you provide an example / test case and more detail on the issue
you're facing?
I've just tested a simple program reading from a dev Kinesis stream
I haven't tried this myself yet, but this sounds relevant:
https://github.com/apache/spark/pull/2535
Will be giving this a try today or so, will report back.
On Wednesday, October 29, 2014, Harold Nguyen wrote:
> Hi again,
>
> After getting through several dependencies, I finally got to this
> non-dependency type error:
Hi again,
After getting through several dependencies, I finally got to this
non-dependency type error:
Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
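That constructor overload only exists in newer httpclient releases, so this
usually means an older httpclient won on the classpath over the one the AWS
SDK needs. A hedged sbt sketch of forcing a single version (4.3.6 is an
assumption; match it to your AWS SDK):

// Force one httpclient so the AWS SDK finds the constructor it was compiled against.
dependencyOverrides += "org.apache.httpcomponents" % "httpclient" % "4.3.6"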