[jira] [Created] (KUDU-2329) Random RPC timeout errors when inserting rows in a Kudu table

2018-03-02 Thread JIRA
Héctor Gutiérrez created KUDU-2329:
--

 Summary: Random RPC timeout errors when inserting rows in a Kudu 
table
 Key: KUDU-2329
 URL: https://issues.apache.org/jira/browse/KUDU-2329
 Project: Kudu
  Issue Type: Bug
  Components: rpc, server
Affects Versions: 1.5.0
Reporter: Héctor Gutiérrez


When executing inserts into a Kudu table, we are experiencing errors at random 
times. The first time we found one of these errors was during a bulk update of 
a Kudu table via Spark (in Scala):

{{kuduContext.updateRows(dataFrame, "table_name")}}
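
For reference, a bulk update of this kind is typically wired up as in the sketch below. This is only a sketch: the application name, master address, input path and table name are made-up placeholders, not values taken from this report.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.kudu.spark.kudu.KuduContext

object KuduBulkUpdate {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kudu-bulk-update")          // placeholder app name
      .getOrCreate()

    // Placeholder master address; the KuduContext drives all writes to Kudu.
    val kuduContext = new KuduContext("kudu-master-1:7051", spark.sparkContext)

    // The DataFrame must match the Kudu table schema, including the primary
    // key columns of the rows being updated.
    val dataFrame = spark.read.parquet("/tmp/updates.parquet")

    // The call from the report: rows are written per partition through a
    // Kudu session, so tablet-server slowness surfaces as write RPC timeouts.
    kuduContext.updateRows(dataFrame, "table_name")

    spark.stop()
  }
}
{code}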

The error message in Spark was the following:

{{java.lang.RuntimeException: failed to write 579 rows from DataFrame to Kudu; 
sample errors: Timed out: can not complete before timeout: Batch{operations=6, 
tablet="cd1e33fce0114c9bbd9c14e2559e7591" [0x000F, 0x0010), 
ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write, 
tablet=cd1e33fce0114c9bbd9c14e2559e7591, attempt=3, 
DeadlineTracker(timeout=3, elapsed=30090), Traces: [0ms] sending RPC to 
server 6f273933b4d5498e87aadfb99b054a21, [10011ms] received from server 
6f273933b4d5498e87aadfb99b054a21 response Network error: [peer 
6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
channel, [10011ms] delaying RPC due to Network error: [peer 
6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
channel, [10033ms] sending RPC to server 6f273933b4d5498e87aadfb99b054a21, 
[20050ms] received from server 6f273933b4d5498e87aadfb99b054a21 response 
Network error: [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read 
timeout; closing the channel, [20050ms] delaying RPC due to Network error: 
[peer 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
channel, [20072ms] sending RPC to server 6f273933b4d5498e87aadfb99b054a21, 
[30090ms] received from server 6f273933b4d5498e87aadfb99b054a21 response 
Network error: [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read 
timeout; closing the channel, [30090ms] delaying RPC due to Network error: 
[peer 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
channel)}}}

(+ 4 more errors similar to this one in the error message)

We first thought it was actually a problem with our Spark code, but when we 
tried to execute a simple "INSERT INTO" query from the impala shell into a Kudu 
table, we got the following error:

{{[.] > insert into test_kudu values (282, 
'hola');}}
{{ Query: insert into test_kudu values (282, 'hola')}}
{{ Query submitted at: ..}}
{{ Query progress can be monitored at: }}
{{ WARNINGS: Kudu error(s) reported, first error: Timed out: Failed to write 
batch of 1 ops to tablet 9c295e90811e483a9550bfd75abcf666 after 1 attempt(s): 
Failed to write to server: 071bcafbb1644678a697c474662047b7 
(.:7050): Write RPC to :7050 timed 
out after 179.949s (SENT)}}

{{Error in Kudu table 'impala:kudu_db.test_kudu': Timed out: Failed to write 
batch of 1 ops to tablet 9c295e90811e483a9550bfd75abcf666 after 1 attempt(s): 
Failed to write to server: 071bcafbb1644678a697c474662047b7 
(...:7050): Write RPC to ..:7050 
timed out after 179.949s (SENT)}}

To make things even more confusing, despite getting this error in the impala 
shell, after a while (and not immediately), the inserted rows ended up in the 
table, so somehow they were actually inserted.

We also tried tweaking the Kudu timeout configuration values that we had 
previously set, but it didn't solve anything and the problem kept appearing.
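
For context, the client-side timeout knobs usually involved are sketched below; the master address and the values shown are placeholders, not the configuration that was actually used here.

{code:scala}
import org.apache.kudu.client.KuduClient

object KuduTimeoutSettings {
  def main(args: Array[String]): Unit = {
    // Placeholder master address and example values only.
    val client = new KuduClient.KuduClientBuilder("kudu-master-1:7051")
      // Overall deadline for a write operation, across retries.
      .defaultOperationTimeoutMs(60000)
      // Socket read timeout; a likely source of the "encountered a read
      // timeout; closing the channel" messages seen in the trace above.
      .defaultSocketReadTimeoutMs(30000)
      .build()

    val session = client.newSession()
    // Flush/apply timeout for this session.
    session.setTimeoutMillis(60000)

    // ... apply Insert/Update operations via session.apply(...) ...

    session.close()
    client.close()
  }
}
{code}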

Furthermore, we don't always get these errors; they only appear at random 
times. For example, right now we are only getting errors from the update we run 
in the Spark code, but we are not experiencing issues when working from the 
impala shell.

After everything we have tried, we are fairly certain that this is a bug in 
Kudu, although we find it strange that it is undocumented, and it is certainly 
hard to reproduce.





[jira] [Updated] (KUDU-2329) Random RPC timeout errors when inserting rows in a Kudu table

2018-03-02 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/KUDU-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Héctor Gutiérrez updated KUDU-2329:
---
Description: 
When executing inserts into a Kudu table, we are experiencing errors at random 
times. The first time we found one of these errors was during a bulk update of 
a Kudu table via Spark (in Scala):

{{kuduContext.updateRows(dataFrame, "table_name")}}

The error message in Spark was the following:

{{java.lang.RuntimeException: failed to write 579 rows from DataFrame to Kudu; 
sample errors: Timed out: can not complete before timeout: Batch{operations=6, 
tablet="cd1e33fce0114c9bbd9c14e2559e7591" [0x000F, 0x0010), 
ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write, 
tablet=cd1e33fce0114c9bbd9c14e2559e7591, attempt=3, 
DeadlineTracker(timeout=3, elapsed=30090), Traces: [0ms] sending RPC to 
server 6f273933b4d5498e87aadfb99b054a21, [10011ms] received from server 
6f273933b4d5498e87aadfb99b054a21 response Network error: [peer 
6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
channel, [10011ms] delaying RPC due to Network error: [peer 
6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
channel, [10033ms] sending RPC to server 6f273933b4d5498e87aadfb99b054a21, 
[20050ms] received from server 6f273933b4d5498e87aadfb99b054a21 response 
Network error: [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read 
timeout; closing the channel, [20050ms] delaying RPC due to Network error: 
[peer 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
channel, [20072ms] sending RPC to server 6f273933b4d5498e87aadfb99b054a21, 
[30090ms] received from server 6f273933b4d5498e87aadfb99b054a21 response 
Network error: [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read 
timeout; closing the channel, [30090ms] delaying RPC due to Network error: 
[peer 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
channel)}}}

(+ 4 more errors similar to this one in the error message)

We first thought it was actually a problem with our Spark code, but when we 
tried to execute a simple "INSERT INTO" query from the impala shell into a Kudu 
table, we got the following error:

{{[.] > insert into test_kudu values (282, 
'hola');}}
{{ Query: insert into test_kudu values (282, 'hola')}}
{{ Query submitted at: ..}}
{{ Query progress can be monitored at: }}
{{ WARNINGS: Kudu error(s) reported, first error: Timed out: Failed to write 
batch of 1 ops to tablet 9c295e90811e483a9550bfd75abcf666 after 1 attempt(s): 
Failed to write to server: 071bcafbb1644678a697c474662047b7 
(.:7050): Write RPC to :7050 timed 
out after 179.949s (SENT)}}

{{Error in Kudu table 'impala:kudu_db.test_kudu': Timed out: Failed to write 
batch of 1 ops to tablet 9c295e90811e483a9550bfd75abcf666 after 1 attempt(s): 
Failed to write to server: 071bcafbb1644678a697c474662047b7 
(...:7050): Write RPC to ..:7050 
timed out after 179.949s (SENT)}}

To make things even more confusing, despite getting this error in the impala 
shell, after a while (and not immediately), the inserted rows ended up in the 
table, so somehow they were actually inserted.

We also tried tweaking the Kudu timeout configuration values that we had 
previously set, but it didn't solve anything and the problem kept appearing.

Furthermore, we don't always get these errors; they only appear at random 
times. For example, right now we are only getting errors from the update we run 
in the Spark code, but we are not experiencing issues when working from the 
impala shell.

After everything we have tried, we are fairly certain that this is a bug in 
Kudu, although we find it strange that it is undocumented, and it is certainly 
hard to reproduce.

  was:
When executing inserts into a Kudu table, we are experiencing errors at random 
times. The first time we found one of these errors was during a bulk update of 
a Kudu table via Spark (in Scala):

{{kuduContext.updateRows(dataFrame, "table_name")}}

The error message in Spark was the following:

{{java.lang.RuntimeException: failed to write 579 rows from DataFrame to Kudu; 
sample errors: Timed out: can not complete before timeout: Batch{operations=6, 
tablet="cd1e33fce0114c9bbd9c14e2559e7591" [0x000F, 0x0010), 
ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write, 
tablet=cd1e33fce0114c9bbd9c14e2559e7591, attempt=3, 
DeadlineTracker(timeout=3, elapsed=30090), Traces: [0ms] sending RPC to 
server 6f273933b4d5498e87aadfb99b054a21, [10011ms] received from server 
6f273933b4d5498e87aadfb99b054a21 response Network error: [peer 
6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
channel, [10011ms] delaying RPC due 

[jira] [Updated] (KUDU-2319) Follower masters cannot accept authn tokens for verification

2018-03-02 Thread Alexey Serbin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2319:

   Resolution: Fixed
Fix Version/s: 1.7.0
   Status: Resolved  (was: In Review)

Fixed in 7e54a17ec63f3d2a01c21aa53fc8f94adf7f1909

> Follower masters cannot accept authn tokens for verification
> 
>
> Key: KUDU-2319
> URL: https://issues.apache.org/jira/browse/KUDU-2319
> Project: Kudu
>  Issue Type: Bug
>  Components: master, security
>Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.4.1, 1.6.0, 1.7.0
>Reporter: Alexey Serbin
>Assignee: Alexey Serbin
>Priority: Major
> Fix For: 1.7.0
>
>
> In case of multi-master setup, the follower masters which haven't been 
> leaders yet, cannot accept authn tokens for verification because they don't 
> have public parts of TSKs in their TokenVerifier.
> A small integration test posted as a WIP patch illustrates that:
>   http://gerrit.cloudera.org:8080/9373





[jira] [Commented] (KUDU-1945) Support generation of surrogate primary keys (or tables with no PK)

2018-03-02 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384166#comment-16384166
 ] 

Todd Lipcon commented on KUDU-1945:
---

[~granthenke] I noticed you assigned this to yourself. Would be good to publish 
some sort of short design doc or blurb on your intended approach if you're 
working on it. I had a few thoughts as well.

> Support generation of surrogate primary keys (or tables with no PK)
> ---
>
> Key: KUDU-1945
> URL: https://issues.apache.org/jira/browse/KUDU-1945
> Project: Kudu
>  Issue Type: New Feature
>  Components: client, master, tablet
>Reporter: Todd Lipcon
>Assignee: Grant Henke
>Priority: Major
>
> Many use cases have data where there is no "natural" primary key. For 
> example, a web log use case mostly cares about partitioning and not about 
> precise sorting by timestamp, and timestamps themselves are not necessarily 
> unique. Rather than forcing users to come up with their own surrogate primary 
> keys, Kudu should support some kind of "auto_increment" equivalent which 
> generates primary keys on insertion. Alternatively, Kudu could support tables 
> which are partitioned but not internally sorted.
> The advantages would be:
> - Kudu can pick primary keys on insertion to guarantee that there is no 
> compaction required on the table (eg always assign a new key higher than any 
> existing key in the local tablet). This can improve write throughput 
> substantially, especially compared to naive PK generation schemes that a user 
> might pick such as UUID, which would generate a uniform random-insert 
> workload (worst case for performance)
> - Make Kudu easier to use for such use cases (no extra client code necessary)
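
To make the workload contrast concrete, below is a rough sketch of the user-side workaround available today with the Java client, using a hypothetical master address, table and columns: a client-generated UUID key per row, i.e. exactly the uniform random-insert pattern this issue would like to avoid by letting Kudu assign monotonically increasing surrogate keys.

{code:scala}
import java.util.UUID
import org.apache.kudu.client.KuduClient

object UuidSurrogateKey {
  def main(args: Array[String]): Unit = {
    // Hypothetical cluster and schema: table "web_log" with a STRING "id"
    // primary key and a STRING "payload" column.
    val client  = new KuduClient.KuduClientBuilder("kudu-master-1:7051").build()
    val table   = client.openTable("web_log")
    val session = client.newSession()

    // Client-generated UUID key: values land uniformly across the key space,
    // so every tablet receives random inserts, which is the worst case for
    // compaction and write throughput described in this issue.
    val insert = table.newInsert()
    val row    = insert.getRow
    row.addString("id", UUID.randomUUID().toString)
    row.addString("payload", "example row")
    session.apply(insert)

    session.close()
    client.close()
  }
}
{code}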





[jira] [Commented] (KUDU-2329) Random RPC timeout errors when inserting rows in a Kudu table

2018-03-02 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384170#comment-16384170
 ] 

Todd Lipcon commented on KUDU-2329:
---

Hi Hector. While I agree timeouts are not intended and thus might be considered 
a "bug", we need to have a more specific bug report here to be tracked as a 
JIRA. Would you mind sending your description to the Kudu user mailing list so 
we can help you try to troubleshoot? If we can root cause it to a more specific 
issue we can track it as a JIRA, but as is, it's not scoped down enough to 
start working on it via JIRA.

> Random RPC timeout errors when inserting rows in a Kudu table
> -
>
> Key: KUDU-2329
> URL: https://issues.apache.org/jira/browse/KUDU-2329
> Project: Kudu
>  Issue Type: Bug
>  Components: rpc, server
>Affects Versions: 1.5.0
>Reporter: Héctor Gutiérrez
>Priority: Major
>
> When executing inserts into a Kudu table, we are experiencing errors at 
> random times. The first time we found one of these errors was during a bulk 
> update of a Kudu table via Spark (in Scala):
> {{kuduContext.updateRows(dataFrame, "table_name")}}
> The error message in Spark was the following:
> {{java.lang.RuntimeException: failed to write 579 rows from DataFrame to 
> Kudu; sample errors: Timed out: can not complete before timeout: Batch
> {operations=6, tablet="cd1e33fce0114c9bbd9c14e2559e7591" [0x000F, 
> 0x0010), ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write, 
> tablet=cd1e33fce0114c9bbd9c14e2559e7591, attempt=3, 
> DeadlineTracker(timeout=3, elapsed=30090), Traces: [0ms] sending RPC to 
> server 6f273933b4d5498e87aadfb99b054a21, [10011ms] received from server 
> 6f273933b4d5498e87aadfb99b054a21 response Network error: [peer 
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
> channel, [10011ms] delaying RPC due to Network error: [peer 
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
> channel, [10033ms] sending RPC to server 6f273933b4d5498e87aadfb99b054a21, 
> [20050ms] received from server 6f273933b4d5498e87aadfb99b054a21 response 
> Network error: [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read 
> timeout; closing the channel, [20050ms] delaying RPC due to Network error: 
> [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing 
> the channel, [20072ms] sending RPC to server 
> 6f273933b4d5498e87aadfb99b054a21, [30090ms] received from server 
> 6f273933b4d5498e87aadfb99b054a21 response Network error: [peer 
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
> channel, [30090ms] delaying RPC due to Network error: [peer 
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
> channel)}
> }}
> (+ 4 more errors similar to this one in the error message)
> We first thought it was actually a problem with our Spark code, but when we 
> tried to execute a simple "INSERT INTO" query from the impala shell into a 
> Kudu table, we got the following error:
> {{[.] > insert into test_kudu values (282, 
> 'hola');}}
> {{ Query: insert into test_kudu values (282, 'hola')}}
> {{ Query submitted at: ..}}
> {{ Query progress can be monitored at: }}
> {{ WARNINGS: Kudu error(s) reported, first error: Timed out: Failed to 
> write batch of 1 ops to tablet 9c295e90811e483a9550bfd75abcf666 after 1 
> attempt(s): Failed to write to server: 071bcafbb1644678a697c474662047b7 
> (.:7050): Write RPC to :7050 
> timed out after 179.949s (SENT)}}
> {{Error in Kudu table 'impala:kudu_db.test_kudu': Timed out: Failed to write 
> batch of 1 ops to tablet 9c295e90811e483a9550bfd75abcf666 after 1 attempt(s): 
> Failed to write to server: 071bcafbb1644678a697c474662047b7 
> (...:7050): Write RPC to ..:7050 
> timed out after 179.949s (SENT)}}
> To make things even more confusing, despite getting this error in the impala 
> shell, after a while (and not immediately), the inserted rows ended up in the 
> table, so somehow they were actually inserted.
> We also tried tweaking the Kudu timeout configuration values that we had 
> previously set, but it didn't solve anything and the problem kept appearing.
> Furthermore, we don't always get these errors, they only appear at random 
> times. For example, right now we're just getting errors in that update we 
> have in the Spark code, but we are not experiencing issues when working from 
> the impala shell.
> After all that we have tried, we are pretty certain that this is a bug in 
> Kudu, although we think it is a bit strange that it is undocumented and 
> certainly it's hard to reproduce.

[jira] [Resolved] (KUDU-2329) Random RPC timeout errors when inserting rows in a Kudu table

2018-03-02 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-2329.
---
   Resolution: Incomplete
Fix Version/s: n/a

> Random RPC timeout errors when inserting rows in a Kudu table
> -
>
> Key: KUDU-2329
> URL: https://issues.apache.org/jira/browse/KUDU-2329
> Project: Kudu
>  Issue Type: Bug
>  Components: rpc, server
>Affects Versions: 1.5.0
>Reporter: Héctor Gutiérrez
>Priority: Major
> Fix For: n/a
>
>
> When executing inserts into a Kudu table, we are experiencing errors at 
> random times. The first time we found one of these errors was during a bulk 
> update of a Kudu table via Spark (in Scala):
> {{kuduContext.updateRows(dataFrame, "table_name")}}
> The error message in Spark was the following:
> {{java.lang.RuntimeException: failed to write 579 rows from DataFrame to 
> Kudu; sample errors: Timed out: can not complete before timeout: Batch
> {operations=6, tablet="cd1e33fce0114c9bbd9c14e2559e7591" [0x000F, 
> 0x0010), ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write, 
> tablet=cd1e33fce0114c9bbd9c14e2559e7591, attempt=3, 
> DeadlineTracker(timeout=3, elapsed=30090), Traces: [0ms] sending RPC to 
> server 6f273933b4d5498e87aadfb99b054a21, [10011ms] received from server 
> 6f273933b4d5498e87aadfb99b054a21 response Network error: [peer 
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
> channel, [10011ms] delaying RPC due to Network error: [peer 
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
> channel, [10033ms] sending RPC to server 6f273933b4d5498e87aadfb99b054a21, 
> [20050ms] received from server 6f273933b4d5498e87aadfb99b054a21 response 
> Network error: [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read 
> timeout; closing the channel, [20050ms] delaying RPC due to Network error: 
> [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing 
> the channel, [20072ms] sending RPC to server 
> 6f273933b4d5498e87aadfb99b054a21, [30090ms] received from server 
> 6f273933b4d5498e87aadfb99b054a21 response Network error: [peer 
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
> channel, [30090ms] delaying RPC due to Network error: [peer 
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
> channel)}
> }}
> (+ 4 more errors similar to this one in the error message)
> We first thought it was actually a problem with our Spark code, but when we 
> tried to execute a simple "INSERT INTO" query from the impala shell into a 
> Kudu table, we got the following error:
> {{[.] > insert into test_kudu values (282, 
> 'hola');}}
> {{ Query: insert into test_kudu values (282, 'hola')}}
> {{ Query submitted at: ..}}
> {{ Query progress can be monitored at: }}
> {{ WARNINGS: Kudu error(s) reported, first error: Timed out: Failed to 
> write batch of 1 ops to tablet 9c295e90811e483a9550bfd75abcf666 after 1 
> attempt(s): Failed to write to server: 071bcafbb1644678a697c474662047b7 
> (.:7050): Write RPC to :7050 
> timed out after 179.949s (SENT)}}
> {{Error in Kudu table 'impala:kudu_db.test_kudu': Timed out: Failed to write 
> batch of 1 ops to tablet 9c295e90811e483a9550bfd75abcf666 after 1 attempt(s): 
> Failed to write to server: 071bcafbb1644678a697c474662047b7 
> (...:7050): Write RPC to ..:7050 
> timed out after 179.949s (SENT)}}
> To make things even more confusing, despite getting this error in the impala 
> shell, after a while (and not immediately), the inserted rows ended up in the 
> table, so somehow they were actually inserted.
> We also tried tweaking the Kudu timeout configuration values that we had 
> previously set, but it didn't solve anything and the problem kept appearing.
> Furthermore, we don't always get these errors, they only appear at random 
> times. For example, right now we're just getting errors in that update we 
> have in the Spark code, but we are not experiencing issues when working from 
> the impala shell.
> After all that we have tried, we are pretty certain that this is a bug in 
> Kudu, although we think it is a bit strange that it is undocumented and 
> certainly it's hard to reproduce.





[jira] [Commented] (KUDU-1945) Support generation of surrogate primary keys (or tables with no PK)

2018-03-02 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384183#comment-16384183
 ] 

Grant Henke commented on KUDU-1945:
---

Yeah, I had just started thinking about this and had some ideas I wanted to put 
in a doc. I wanted to assign it to myself to make sure I did that before I 
forgot about it. 

> Support generation of surrogate primary keys (or tables with no PK)
> ---
>
> Key: KUDU-1945
> URL: https://issues.apache.org/jira/browse/KUDU-1945
> Project: Kudu
>  Issue Type: New Feature
>  Components: client, master, tablet
>Reporter: Todd Lipcon
>Assignee: Grant Henke
>Priority: Major
>
> Many use cases have data where there is no "natural" primary key. For 
> example, a web log use case mostly cares about partitioning and not about 
> precise sorting by timestamp, and timestamps themselves are not necessarily 
> unique. Rather than forcing users to come up with their own surrogate primary 
> keys, Kudu should support some kind of "auto_increment" equivalent which 
> generates primary keys on insertion. Alternatively, Kudu could support tables 
> which are partitioned but not internally sorted.
> The advantages would be:
> - Kudu can pick primary keys on insertion to guarantee that there is no 
> compaction required on the table (eg always assign a new key higher than any 
> existing key in the local tablet). This can improve write throughput 
> substantially, especially compared to naive PK generation schemes that a user 
> might pick such as UUID, which would generate a uniform random-insert 
> workload (worst case for performance)
> - Make Kudu easier to use for such use cases (no extra client code necessary)


