bulk data load
Hi, I have to add a column and its value to existing data. I tried a batch statement but got a warning from Cassandra: WARN [Native-Transport-Requests:1423504] 2014-10-22 16:49:11,426 BatchStatement.java (line 223) Batch of prepared statements for [keyspace.table] is of size 26600, exceeding specified threshold of 5120 by 21480. After those messages I could not write to Cassandra either. How should I proceed? Thanks Koray Sariteke
Re: bulk data load
You should split your batch statements into smaller batches, say 100 operations per batch (or fewer if you keep getting those errors). You can also increase batch_size_warn_threshold_in_kb in your cassandra.yaml a bit; I'm using 20 KB in my cluster. You can read more in the relevant JIRA: https://issues.apache.org/jira/browse/CASSANDRA-6487 On Thu, Oct 30, 2014 at 11:36 AM, koray mehmet wrote: > Hi, > > I have to add a column and value to current data. I tried with batch > statement but got warning at cassandra as > > WARN [Native-Transport-Requests:1423504] 2014-10-22 16:49:11,426 > BatchStatement.java (line 223) Batch of prepared statements for > [keyspace.table] is of size 26600, exceeding specified threshold of 5120 by > 21480. > > After that messages, could not write to cassandra also. > > How should i go on? > > Thanks > Koray Sariteke >
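Below is a minimal sketch of this client-side splitting using the DataStax Python driver; the contact point, keyspace, table, and column names are placeholders for whatever schema is actually being backfilled:

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, BatchType

BATCH_SIZE = 100  # operations per batch; lower it if the size warning persists

cluster = Cluster(["127.0.0.1"])          # placeholder contact point
session = cluster.connect("my_keyspace")  # placeholder keyspace

# Placeholder statement: backfill a newly added column on existing rows.
prepared = session.prepare("UPDATE my_table SET new_col = ? WHERE id = ?")

def backfill(rows):
    """rows is an iterable of (row_id, new_value) pairs."""
    batch = BatchStatement(batch_type=BatchType.UNLOGGED)
    pending = 0
    for row_id, new_value in rows:
        batch.add(prepared, (new_value, row_id))
        pending += 1
        if pending == BATCH_SIZE:
            session.execute(batch)
            batch = BatchStatement(batch_type=BatchType.UNLOGGED)
            pending = 0
    if pending:
        session.execute(batch)  # flush the last partial batch

Since these rows presumably live in many different partitions, unlogged batches (or plain individual asynchronous writes) also put far less pressure on the coordinator than one big logged batch.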
"Did not get positive replies from all endpoints" error on incremental repair
I'm having problems running nodetool repair -inc -par -pr on my 2.1.1 cluster due to a "Did not get positive replies from all endpoints" error. Here's an example output: root@db08-3:~# nodetool repair -par -inc -pr [2014-10-30 10:33:02,396] Nothing to repair for keyspace 'system' [2014-10-30 10:33:02,420] Starting repair command #10, repairing 256 ranges for keyspace profiles (seq=false, full=false) [2014-10-30 10:33:17,240] Repair failed with error Did not get positive replies from all endpoints. [2014-10-30 10:33:17,263] Starting repair command #11, repairing 256 ranges for keyspace OpsCenter (seq=false, full=false) [2014-10-30 10:33:32,242] Repair failed with error Did not get positive replies from all endpoints. [2014-10-30 10:33:32,249] Starting repair command #12, repairing 256 ranges for keyspace system_traces (seq=false, full=false) [2014-10-30 10:33:44,243] Repair failed with error Did not get positive replies from all endpoints. The local system log shows that the repair commands got started, but it seems that they immediately get cancelled due to that error, which, by the way, does not show up in the Cassandra log. I tried monitoring all logs from all machines in case another machine would show some useful error, but so far I haven't found anything. Any ideas where this error comes from? - Garo
Re: "Did not get positive replies from all endpoints" error on incremental repair
It appears to come from the ActiveRepairService.prepareForRepair portion of the Code. Are you sure all nodes are reachable from the node you are initiating repair on, at the same time? Any Node up/down/died messages? Rahul Neelakantan > On Oct 30, 2014, at 6:37 AM, Juho Mäkinen wrote: > > I'm having problems running nodetool repair -inc -par -pr on my 2.1.1 cluster > due to "Did not get positive replies from all endpoints" error. > > Here's an example output: > root@db08-3:~# nodetool repair -par -inc -pr > > [2014-10-30 10:33:02,396] Nothing to repair for keyspace 'system' > [2014-10-30 10:33:02,420] Starting repair command #10, repairing 256 ranges > for keyspace profiles (seq=false, full=false) > [2014-10-30 10:33:17,240] Repair failed with error Did not get positive > replies from all endpoints. > [2014-10-30 10:33:17,263] Starting repair command #11, repairing 256 ranges > for keyspace OpsCenter (seq=false, full=false) > [2014-10-30 10:33:32,242] Repair failed with error Did not get positive > replies from all endpoints. > [2014-10-30 10:33:32,249] Starting repair command #12, repairing 256 ranges > for keyspace system_traces (seq=false, full=false) > [2014-10-30 10:33:44,243] Repair failed with error Did not get positive > replies from all endpoints. > > The local system log shows that the repair commands got started, but it seems > that they immediately get cancelled due to that error, which btw can't be > seen in the cassandra log. > > I tried monitoring all logs from all machines in case another machine would > show up with some useful error, but so far I haven't found nothing. > > Any ideas where this error comes from? > > - Garo >
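As a quick way to check that, a rough sketch along these lines (it just shells out to nodetool, so run it on a node where nodetool is on the PATH) flags any peer that is not reporting Up/Normal before the repair is started:

import subprocess

# nodetool status prefixes each node line with a two-letter state code:
# U/D for Up/Down followed by N/L/J/M for Normal/Leaving/Joining/Moving.
out = subprocess.check_output(["nodetool", "status"], universal_newlines=True)

bad_states = ("DN", "DL", "DJ", "DM", "UL", "UJ", "UM")
problem_nodes = [line for line in out.splitlines() if line[:2] in bad_states]

if problem_nodes:
    print("Nodes that are down or not in Normal state:")
    print("\n".join(problem_nodes))
else:
    print("All nodes report Up/Normal")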
Re: "Did not get positive replies from all endpoints" error on incremental repair
No, the cluster seems to be performing just fine. It seems that the prepareForRepair callback() could be easily modified to print which node(s) are unable to respond, so that the debugging effort could be focused better. This of course doesn't help this case as it's not trivial to add the log lines and to roll it out to the entire cluster. The cluster is relatively young, containing only 450GB with RF=3 spread over nine nodes and I'm still practicing how to run incremental repairs on the cluster when I stumbled on this issue. On Thu, Oct 30, 2014 at 12:52 PM, Rahul Neelakantan wrote: > It appears to come from the ActiveRepairService.prepareForRepair portion > of the Code. > > Are you sure all nodes are reachable from the node you are initiating > repair on, at the same time? > > Any Node up/down/died messages? > > Rahul Neelakantan > > > On Oct 30, 2014, at 6:37 AM, Juho Mäkinen > wrote: > > > > I'm having problems running nodetool repair -inc -par -pr on my 2.1.1 > cluster due to "Did not get positive replies from all endpoints" error. > > > > Here's an example output: > > root@db08-3:~# nodetool repair -par -inc -pr > > [2014-10-30 10:33:02,396] Nothing to repair for keyspace 'system' > > [2014-10-30 10:33:02,420] Starting repair command #10, repairing 256 > ranges for keyspace profiles (seq=false, full=false) > > [2014-10-30 10:33:17,240] Repair failed with error Did not get positive > replies from all endpoints. > > [2014-10-30 10:33:17,263] Starting repair command #11, repairing 256 > ranges for keyspace OpsCenter (seq=false, full=false) > > [2014-10-30 10:33:32,242] Repair failed with error Did not get positive > replies from all endpoints. > > [2014-10-30 10:33:32,249] Starting repair command #12, repairing 256 > ranges for keyspace system_traces (seq=false, full=false) > > [2014-10-30 10:33:44,243] Repair failed with error Did not get positive > replies from all endpoints. > > > > The local system log shows that the repair commands got started, but it > seems that they immediately get cancelled due to that error, which btw > can't be seen in the cassandra log. > > > > I tried monitoring all logs from all machines in case another machine > would show up with some useful error, but so far I haven't found nothing. > > > > Any ideas where this error comes from? > > > > - Garo > > >
Re: OOM at Bootstrap Time
I will give adding the logging a shot. I've tried some experiments and I have no clue what could be happening anymore: I tried setting all nodes to a streamthroughput of 1 except one, to see if somehow it was getting overloaded by too many streams coming in at once; nope. I went through the source at ColumnFamilyStore.java:856 where the huge burst of "Enqueuing flush..." occurs, and it's clearly at the moment memtables get converted to SSTables on disk. So I started the bootstrap process and, using a bash script, triggered a 'nodetool flush' every minute during the process. At first it seemed to work, but again, after what seems to be a locally-triggered cue, the burst returned (many, many thousands of Enqueuing flush...). But through my previous experiment, I am fairly certain it's not a question of volume of data coming in (throughput), or the number of SSTables being streamed (dealing with at most 150 files per node). Does anyone know if such Enqueuing bursts are normal during bootstrap? I'd like to be able to say "it's because my nodes are underpowered", but at the moment, I'm leaning towards a bug of some kind. (A rough sketch of the throttling and flush-loop setup I used appears after the quoted thread below.) On Wed, Oct 29, 2014 at 3:05 PM, DuyHai Doan wrote: > Some ideas: > > 1) Put on DEBUG log on the joining node to see what is going on in details > with the stream with 1500 files > > 2) Check the stream ID to see whether it's a new stream or an old one > pending > > > > On Wed, Oct 29, 2014 at 2:21 AM, Maxime wrote: > >> Doan, thanks for the tip, I just read about it this morning, just waiting >> for the new version to pop up on the debian datastax repo. >> >> Michael, I do believe you are correct in the general running of the >> cluster and I've reset everything. >> >> So it took me a while to reply, I finally got the SSTables down, as seen >> in the OpsCenter graphs. I'm stumped however because when I bootstrap the >> new node, I still see very large number of files being streamed (~1500 for >> some nodes) and the bootstrap process is failing exactly as it did before, >> in a flury of "Enqueuing flush of ..." >> >> Any ideas? I'm reaching the end of what I know I can do, OpsCenter says >> around 32 SStables per CF, but still streaming tons of "files". :-/ >> >> >> On Mon, Oct 27, 2014 at 1:12 PM, DuyHai Doan >> wrote: >> >>> "Tombstones will be a very important issue for me since the dataset is >>> very much a rolling dataset using TTLs heavily." >>> >>> --> You can try the new DateTiered compaction strategy ( >>> https://issues.apache.org/jira/browse/CASSANDRA-6602) released on 2.1.1 >>> if you have a time series data model to eliminate tombstones >>> >>> On Mon, Oct 27, 2014 at 5:47 PM, Laing, Michael < >>> michael.la...@nytimes.com> wrote: Again, from our experience w 2.0.x: Revert to the defaults - you are manually setting heap way too high IMHO. On our small nodes we tried LCS - way too much compaction - switch all CFs to STCS. We do a major rolling compaction on our small nodes weekly during less busy hours - works great. Be sure you have enough disk. We never explicitly delete and only use ttls or truncation. You can set GC to 0 in that case, so tombstones are more readily expunged. There are a couple threads in the list that discuss this... also normal rolling repair becomes optional, reducing load (still repair if something unusual happens tho...). In your current situation, you need to kickstart compaction - are there any CFs you can truncate at least temporarily? Then try compacting a small CF, then another, etc. Hopefully you can get enough headroom to add a node.
ml On Sun, Oct 26, 2014 at 6:24 PM, Maxime wrote: > Hmm, thanks for the reading. > > I initially followed some (perhaps too old) maintenance scripts, which > included weekly 'nodetool compact'. Is there a way for me to undo the > damage? Tombstones will be a very important issue for me since the dataset > is very much a rolling dataset using TTLs heavily. > > On Sun, Oct 26, 2014 at 6:04 PM, DuyHai Doan > wrote: > >> "Should doing a major compaction on those nodes lead to a restructuration >> of the SSTables?" --> Beware of the major compaction on SizeTiered, it >> will >> create 2 giant SSTables and the expired/outdated/tombstone columns in >> this >> big file will be never cleaned since the SSTable will never get a chance >> to >> be compacted again >> >> Essentially to reduce the fragmentation of small SSTables you can >> stay with SizeTiered compaction and play around with compaction >> properties >> (the thresholds) to make C* group a bunch of files each time it compacts >> so >> that the file number shrinks to a reasonable count >> >> Since you're using C* 2.1 and anti-compaction has been introduced, I >> hesitate advising you to use Leveled compaction as a work-a
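Here is a rough sketch of the throttling-plus-periodic-flush setup described in the message above (the sketch referenced there). The host list is a placeholder, and running nodetool against remote hosts with -h assumes JMX is reachable on those machines:

import subprocess
import time

# Placeholder addresses for the nodes already in the ring.
existing_nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Throttle outbound streaming on every existing node to 1 MB/s.
for host in existing_nodes:
    subprocess.check_call(["nodetool", "-h", host, "setstreamthroughput", "1"])

# On the joining node itself: flush memtables once a minute while the
# bootstrap runs, to try to keep on-heap memtable pressure down.
while True:
    subprocess.check_call(["nodetool", "flush"])
    time.sleep(60)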
Re: Commissioning failure
On Wed, Oct 29, 2014 at 10:39 PM, Aravindan T wrote: > What could be the reasons for the stream error other than SSTABLE > corruption? > There are plenty of reasons streams fail. The Cassandra team is aware of how painful this makes things and is working on them. Be sure that a firewall is not dropping long-running connections. =Rob http://twitter.com/rcolidba
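One rough way to rule out the basic part of the firewall question is to check that the inter-node storage port (7000 by default, 7001 with internode SSL) is open from this node to each peer; note this only tests reachability, not whether long-idle connections get silently dropped, which is usually addressed by lowering the OS TCP keepalive interval below the firewall's idle timeout. The peer list below is a placeholder:

import socket

peers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # placeholder addresses
STORAGE_PORT = 7000  # default storage_port in cassandra.yaml

for host in peers:
    try:
        conn = socket.create_connection((host, STORAGE_PORT), timeout=5)
        conn.close()
        print("%s: port %d reachable" % (host, STORAGE_PORT))
    except OSError as exc:
        print("%s: NOT reachable on port %d (%s)" % (host, STORAGE_PORT, exc))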
Re: OOM at Bootstrap Time
I've been trying to go through the logs but I can't say I understand very well the details: INFO [SlabPoolCleaner] 2014-10-30 19:20:18,446 ColumnFamilyStore.java:856 - Enqueuing flush of loc: 7977119 (1%) on-heap, 0 (0%) off-heap DEBUG [SharedPool-Worker-22] 2014-10-30 19:20:18,446 AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row 2c95cbbb61fb8ec3bd06d70058bfa236ccad5195e48fd00c056f7e1e3fdd4368 in ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815026000 !63072000,]) DEBUG [SharedPool-Worker-6] 2014-10-30 19:20:18,446 AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row 41fc260427a88d2f084971702fdcb32756e0731c6042f93e9761e03db5197990 in ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815333000 !63072000,]) DEBUG [SharedPool-Worker-25] 2014-10-30 19:20:18,446 AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row 2e8c4dab33faade0a4fc265e4126e43dc2e58fb72830f73d7e9b8e836101d413 in ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815335000 !63072000,]) DEBUG [SharedPool-Worker-26] 2014-10-30 19:20:18,446 AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row 245bec68c5820364a72db093d5c9899b631e692006881c98f0abf4da5fbff4cd in ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815344000 !63072000,]) DEBUG [SharedPool-Worker-20] 2014-10-30 19:20:18,446 AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row ea8dfb47177bd40f46aac4fe41d3cfea3316cf35451ace0825f46b6e0fa9e3ef in ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815262000 !63072000,]) This is a sample of Enqueuing flush events in the storm. On Thu, Oct 30, 2014 at 12:20 PM, Maxime wrote: > I will give a shot adding the logging. > > I've tried some experiments and I have no clue what could be happening > anymore: > > I tried setting all nodes to a streamthroughput of 1 except 1, to see if > somehow it was getting overloaded by too many streams coming in at once, > nope. > I went through the source at ColumnFamilyStore.java:856 where the huge > burst of "Enqueuing flush..." occurs, and it's clearly at the moment > memtables get converted to SSTables on disk. So I started the bootstrap > process and using a bash script trigerred a 'nodetool flush' every minute > during the processes. At first it seemed to work, but again after what > seems to be a locally-trigered cue, the burst (many many thousands of > Enqueuing flush...). But through my previous experiment, I am fairly > certain it's not a question of volume of data coming in (throughput), or > number of SSTables being streamed (dealing at max 150 files pr node). > > Does anyone know if such Enqueuing bursts are normal during bootstrap? I'd > like to be able to say "it's because my nodes are underpowered", but at the > moment, I'm leaning towards a bug of some kind. > > On Wed, Oct 29, 2014 at 3:05 PM, DuyHai Doan wrote: > >> Some ideas: >> >> 1) Put on DEBUG log on the joining node to see what is going on in >> details with the stream with 1500 files >> >> 2) Check the stream ID to see whether it's a new stream or an old one >> pending >> >> >> >> On Wed, Oct 29, 2014 at 2:21 AM, Maxime wrote: >> >>> Doan, thanks for the tip, I just read about it this morning, just >>> waiting for the new version to pop up on the debian datastax repo. >>> >>> Michael, I do believe you are correct in the general running of the >>> cluster and I've reset everything. 
>>> >>> So it took me a while to reply, I finally got the SSTables down, as seen >>> in the OpsCenter graphs. I'm stumped however because when I bootstrap the >>> new node, I still see very large number of files being streamed (~1500 for >>> some nodes) and the bootstrap process is failing exactly as it did before, >>> in a flury of "Enqueuing flush of ..." >>> >>> Any ideas? I'm reaching the end of what I know I can do, OpsCenter says >>> around 32 SStables per CF, but still streaming tons of "files". :-/ >>> >>> >>> On Mon, Oct 27, 2014 at 1:12 PM, DuyHai Doan >>> wrote: >>> "Tombstones will be a very important issue for me since the dataset is very much a rolling dataset using TTLs heavily." --> You can try the new DateTiered compaction strategy ( https://issues.apache.org/jira/browse/CASSANDRA-6602) released on 2.1.1 if you have a time series data model to eliminate tombstones On Mon, Oct 27, 2014 at 5:47 PM, Laing, Michael < michael.la...@nytimes.com> wrote: > Again, from our experience w 2.0.x: > > Revert to the defaults - you are manually setting heap way too high > IMHO. > > On our small nodes we tried LCS - way too much compaction - switch all > CFs to STCS. > > We do a major rolling compaction on our small nodes weekly during less > busy hours - works great. Be sure you have enough disk. > > We never explicitly delete and only use ttls or truncation. You can > set GC to 0 in that c
Re: OOM at Bootstrap Time
Well, the answer was Secondary indexes. I am guessing they were corrupted somehow. I dropped all of them, cleanup, and now nodes are bootstrapping fine. On Thu, Oct 30, 2014 at 3:50 PM, Maxime wrote: > I've been trying to go through the logs but I can't say I understand very > well the details: > > INFO [SlabPoolCleaner] 2014-10-30 19:20:18,446 ColumnFamilyStore.java:856 > - Enqueuing flush of loc: 7977119 (1%) on-heap, 0 (0%) off-heap > DEBUG [SharedPool-Worker-22] 2014-10-30 19:20:18,446 > AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row > 2c95cbbb61fb8ec3bd06d70058bfa236ccad5195e48fd00c056f7e1e3fdd4368 in > ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815026000 > !63072000,]) > DEBUG [SharedPool-Worker-6] 2014-10-30 19:20:18,446 > AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row > 41fc260427a88d2f084971702fdcb32756e0731c6042f93e9761e03db5197990 in > ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815333000 > !63072000,]) > DEBUG [SharedPool-Worker-25] 2014-10-30 19:20:18,446 > AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row > 2e8c4dab33faade0a4fc265e4126e43dc2e58fb72830f73d7e9b8e836101d413 in > ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815335000 > !63072000,]) > DEBUG [SharedPool-Worker-26] 2014-10-30 19:20:18,446 > AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row > 245bec68c5820364a72db093d5c9899b631e692006881c98f0abf4da5fbff4cd in > ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815344000 > !63072000,]) > DEBUG [SharedPool-Worker-20] 2014-10-30 19:20:18,446 > AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row > ea8dfb47177bd40f46aac4fe41d3cfea3316cf35451ace0825f46b6e0fa9e3ef in > ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815262000 > !63072000,]) > > This is a sample of Enqueuing flush events in the storm. > > On Thu, Oct 30, 2014 at 12:20 PM, Maxime wrote: > >> I will give a shot adding the logging. >> >> I've tried some experiments and I have no clue what could be happening >> anymore: >> >> I tried setting all nodes to a streamthroughput of 1 except 1, to see if >> somehow it was getting overloaded by too many streams coming in at once, >> nope. >> I went through the source at ColumnFamilyStore.java:856 where the huge >> burst of "Enqueuing flush..." occurs, and it's clearly at the moment >> memtables get converted to SSTables on disk. So I started the bootstrap >> process and using a bash script trigerred a 'nodetool flush' every minute >> during the processes. At first it seemed to work, but again after what >> seems to be a locally-trigered cue, the burst (many many thousands of >> Enqueuing flush...). But through my previous experiment, I am fairly >> certain it's not a question of volume of data coming in (throughput), or >> number of SSTables being streamed (dealing at max 150 files pr node). >> >> Does anyone know if such Enqueuing bursts are normal during bootstrap? >> I'd like to be able to say "it's because my nodes are underpowered", but at >> the moment, I'm leaning towards a bug of some kind. 
>> >> On Wed, Oct 29, 2014 at 3:05 PM, DuyHai Doan >> wrote: >> >>> Some ideas: >>> >>> 1) Put on DEBUG log on the joining node to see what is going on in >>> details with the stream with 1500 files >>> >>> 2) Check the stream ID to see whether it's a new stream or an old one >>> pending >>> >>> >>> >>> On Wed, Oct 29, 2014 at 2:21 AM, Maxime wrote: >>> Doan, thanks for the tip, I just read about it this morning, just waiting for the new version to pop up on the debian datastax repo. Michael, I do believe you are correct in the general running of the cluster and I've reset everything. So it took me a while to reply, I finally got the SSTables down, as seen in the OpsCenter graphs. I'm stumped however because when I bootstrap the new node, I still see very large number of files being streamed (~1500 for some nodes) and the bootstrap process is failing exactly as it did before, in a flury of "Enqueuing flush of ..." Any ideas? I'm reaching the end of what I know I can do, OpsCenter says around 32 SStables per CF, but still streaming tons of "files". :-/ On Mon, Oct 27, 2014 at 1:12 PM, DuyHai Doan wrote: > "Tombstones will be a very important issue for me since the dataset > is very much a rolling dataset using TTLs heavily." > > --> You can try the new DateTiered compaction strategy ( > https://issues.apache.org/jira/browse/CASSANDRA-6602) released on > 2.1.1 if you have a time series data model to eliminate tombstones > > On Mon, Oct 27, 2014 at 5:47 PM, Laing, Michael < > michael.la...@nytimes.com> wrote: > >> Again, from our experience w 2.0.x: >> >> Revert to the defaults - you are manually setting heap way too high >> IMHO. >> >>
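For reference, a hedged sketch of the fix described above (dropping the secondary indexes before bootstrapping, then recreating them afterwards), again using the Python driver. The keyspace name is a placeholder, the index and table names are taken from the log excerpt, and the indexed column is guessed from the index name, so it may differ in the real schema:

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # placeholder contact point
session = cluster.connect("my_keyspace")  # placeholder keyspace

# 1. Drop the secondary index before adding the new node.
session.execute("DROP INDEX IF EXISTS loc_id_idx")

# 2. Bootstrap the new node, then run 'nodetool cleanup' on the old ones.

# 3. Recreate the index afterwards; Cassandra rebuilds it in the background.
session.execute("CREATE INDEX IF NOT EXISTS loc_id_idx ON loc (loc_id)")

Dropping the index removes its on-disk data, which takes the per-row index maintenance seen in the log excerpt out of the bootstrap path; recreating it afterwards triggers a background rebuild from the existing data.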