bulk data load
Hi, I have to add a column and its value to existing data. I tried a batch statement but got a warning from Cassandra: WARN [Native-Transport-Requests:1423504] 2014-10-22 16:49:11,426 BatchStatement.java (line 223) Batch of prepared statements for [keyspace.table] is of size 26600, exceeding specified threshold of 5120 by 21480. After those messages I could not write to Cassandra either. How should I proceed? Thanks Koray Sariteke
Re: bulk data load
You should split your batch statements into smaller batches, say 100 operations per batch (or fewer if you keep getting those errors). You can also increase batch_size_warn_threshold_in_kb in your cassandra.yaml a bit; I'm using 20 KB in my cluster. You can read more in the relevant JIRA: https://issues.apache.org/jira/browse/CASSANDRA-6487 On Thu, Oct 30, 2014 at 11:36 AM, koray mehmet wrote: > Hi, > > I have to add a column and value to current data. I tried with batch > statement but got warning at cassandra as > > WARN [Native-Transport-Requests:1423504] 2014-10-22 16:49:11,426 > BatchStatement.java (line 223) Batch of prepared statements for > [keyspace.table] is of size 26600, exceeding specified threshold of 5120 by > 21480. > > After that messages, could not write to cassandra also. > > How should i go on? > > Thanks > Koray Sariteke >
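Below is a minimal sketch of this client-side splitting using the DataStax Python driver; the contact point, keyspace, table, and column names are placeholders for whatever schema is actually being backfilled:

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, BatchType

BATCH_SIZE = 100  # operations per batch; lower it if the size warning persists

cluster = Cluster(["127.0.0.1"])          # placeholder contact point
session = cluster.connect("my_keyspace")  # placeholder keyspace

# Placeholder statement: backfill a newly added column on existing rows.
prepared = session.prepare("UPDATE my_table SET new_col = ? WHERE id = ?")

def backfill(rows):
    """rows is an iterable of (row_id, new_value) pairs."""
    batch = BatchStatement(batch_type=BatchType.UNLOGGED)
    pending = 0
    for row_id, new_value in rows:
        batch.add(prepared, (new_value, row_id))
        pending += 1
        if pending == BATCH_SIZE:
            session.execute(batch)
            batch = BatchStatement(batch_type=BatchType.UNLOGGED)
            pending = 0
    if pending:
        session.execute(batch)  # flush the last partial batch

Since these rows presumably live in many different partitions, unlogged batches (or plain individual asynchronous writes) also put far less pressure on the coordinator than one big logged batch.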
"Did not get positive replies from all endpoints" error on incremental repair
I'm having problems running nodetool repair -inc -par -pr on my 2.1.1 cluster due to a "Did not get positive replies from all endpoints" error. Here's an example output: root@db08-3:~# nodetool repair -par -inc -pr [2014-10-30 10:33:02,396] Nothing to repair for keyspace 'system' [2014-10-30 10:33:02,420] Starting repair command #10, repairing 256 ranges for keyspace profiles (seq=false, full=false) [2014-10-30 10:33:17,240] Repair failed with error Did not get positive replies from all endpoints. [2014-10-30 10:33:17,263] Starting repair command #11, repairing 256 ranges for keyspace OpsCenter (seq=false, full=false) [2014-10-30 10:33:32,242] Repair failed with error Did not get positive replies from all endpoints. [2014-10-30 10:33:32,249] Starting repair command #12, repairing 256 ranges for keyspace system_traces (seq=false, full=false) [2014-10-30 10:33:44,243] Repair failed with error Did not get positive replies from all endpoints. The local system log shows that the repair commands got started, but it seems that they immediately get cancelled due to that error, which, by the way, does not show up in the Cassandra log. I tried monitoring all logs from all machines in case another machine would show some useful error, but so far I haven't found anything. Any ideas where this error comes from? - Garo
Re: "Did not get positive replies from all endpoints" error on incremental repair
It appears to come from the ActiveRepairService.prepareForRepair portion of the Code. Are you sure all nodes are reachable from the node you are initiating repair on, at the same time? Any Node up/down/died messages? Rahul Neelakantan > On Oct 30, 2014, at 6:37 AM, Juho Mäkinen wrote: > > I'm having problems running nodetool repair -inc -par -pr on my 2.1.1 cluster > due to "Did not get positive replies from all endpoints" error. > > Here's an example output: > root@db08-3:~# nodetool repair -par -inc -pr > > [2014-10-30 10:33:02,396] Nothing to repair for keyspace 'system' > [2014-10-30 10:33:02,420] Starting repair command #10, repairing 256 ranges > for keyspace profiles (seq=false, full=false) > [2014-10-30 10:33:17,240] Repair failed with error Did not get positive > replies from all endpoints. > [2014-10-30 10:33:17,263] Starting repair command #11, repairing 256 ranges > for keyspace OpsCenter (seq=false, full=false) > [2014-10-30 10:33:32,242] Repair failed with error Did not get positive > replies from all endpoints. > [2014-10-30 10:33:32,249] Starting repair command #12, repairing 256 ranges > for keyspace system_traces (seq=false, full=false) > [2014-10-30 10:33:44,243] Repair failed with error Did not get positive > replies from all endpoints. > > The local system log shows that the repair commands got started, but it seems > that they immediately get cancelled due to that error, which btw can't be > seen in the cassandra log. > > I tried monitoring all logs from all machines in case another machine would > show up with some useful error, but so far I haven't found nothing. > > Any ideas where this error comes from? > > - Garo >
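As a quick way to check that, a rough sketch along these lines (it just shells out to nodetool, so run it on a node where nodetool is on the PATH) flags any peer that is not reporting Up/Normal before the repair is started:

import subprocess

# nodetool status prefixes each node line with a two-letter state code:
# U/D for Up/Down followed by N/L/J/M for Normal/Leaving/Joining/Moving.
out = subprocess.check_output(["nodetool", "status"], universal_newlines=True)

bad_states = ("DN", "DL", "DJ", "DM", "UL", "UJ", "UM")
problem_nodes = [line for line in out.splitlines() if line[:2] in bad_states]

if problem_nodes:
    print("Nodes that are down or not in Normal state:")
    print("\n".join(problem_nodes))
else:
    print("All nodes report Up/Normal")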
Re: "Did not get positive replies from all endpoints" error on incremental repair
No, the cluster seems to be performing just fine. It seems that the prepareForRepair callback() could be easily modified to print which node(s) are unable to respond, so that the debugging effort could be focused better. This of course doesn't help this case as it's not trivial to add the log lines and to roll it out to the entire cluster. The cluster is relatively young, containing only 450GB with RF=3 spread over nine nodes and I'm still practicing how to run incremental repairs on the cluster when I stumbled on this issue. On Thu, Oct 30, 2014 at 12:52 PM, Rahul Neelakantan wrote: > It appears to come from the ActiveRepairService.prepareForRepair portion > of the Code. > > Are you sure all nodes are reachable from the node you are initiating > repair on, at the same time? > > Any Node up/down/died messages? > > Rahul Neelakantan > > > On Oct 30, 2014, at 6:37 AM, Juho Mäkinen > wrote: > > > > I'm having problems running nodetool repair -inc -par -pr on my 2.1.1 > cluster due to "Did not get positive replies from all endpoints" error. > > > > Here's an example output: > > root@db08-3:~# nodetool repair -par -inc -pr > > [2014-10-30 10:33:02,396] Nothing to repair for keyspace 'system' > > [2014-10-30 10:33:02,420] Starting repair command #10, repairing 256 > ranges for keyspace profiles (seq=false, full=false) > > [2014-10-30 10:33:17,240] Repair failed with error Did not get positive > replies from all endpoints. > > [2014-10-30 10:33:17,263] Starting repair command #11, repairing 256 > ranges for keyspace OpsCenter (seq=false, full=false) > > [2014-10-30 10:33:32,242] Repair failed with error Did not get positive > replies from all endpoints. > > [2014-10-30 10:33:32,249] Starting repair command #12, repairing 256 > ranges for keyspace system_traces (seq=false, full=false) > > [2014-10-30 10:33:44,243] Repair failed with error Did not get positive > replies from all endpoints. > > > > The local system log shows that the repair commands got started, but it > seems that they immediately get cancelled due to that error, which btw > can't be seen in the cassandra log. > > > > I tried monitoring all logs from all machines in case another machine > would show up with some useful error, but so far I haven't found nothing. > > > > Any ideas where this error comes from? > > > > - Garo > > >
Re: OOM at Bootstrap Time
I will give adding the logging a shot. I've tried some experiments and I have no clue what could be happening anymore: I tried setting all nodes to a streamthroughput of 1 except one, to see if somehow it was getting overloaded by too many streams coming in at once; nope. I went through the source at ColumnFamilyStore.java:856 where the huge burst of "Enqueuing flush..." occurs, and it's clearly at the moment memtables get converted to SSTables on disk. So I started the bootstrap process and, using a bash script, triggered a 'nodetool flush' every minute during the process. At first it seemed to work, but again, after what seems to be a locally-triggered cue, the burst returned (many, many thousands of Enqueuing flush...). But through my previous experiment, I am fairly certain it's not a question of volume of data coming in (throughput), or the number of SSTables being streamed (dealing with at most 150 files per node). Does anyone know if such Enqueuing bursts are normal during bootstrap? I'd like to be able to say "it's because my nodes are underpowered", but at the moment, I'm leaning towards a bug of some kind. (A rough sketch of the throttling and flush-loop setup I used appears after the quoted thread below.) On Wed, Oct 29, 2014 at 3:05 PM, DuyHai Doan wrote: > Some ideas: > > 1) Put on DEBUG log on the joining node to see what is going on in details > with the stream with 1500 files > > 2) Check the stream ID to see whether it's a new stream or an old one > pending > > > > On Wed, Oct 29, 2014 at 2:21 AM, Maxime wrote: > >> Doan, thanks for the tip, I just read about it this morning, just waiting >> for the new version to pop up on the debian datastax repo. >> >> Michael, I do believe you are correct in the general running of the >> cluster and I've reset everything. >> >> So it took me a while to reply, I finally got the SSTables down, as seen >> in the OpsCenter graphs. I'm stumped however because when I bootstrap the >> new node, I still see very large number of files being streamed (~1500 for >> some nodes) and the bootstrap process is failing exactly as it did before, >> in a flury of "Enqueuing flush of ..." >> >> Any ideas? I'm reaching the end of what I know I can do, OpsCenter says >> around 32 SStables per CF, but still streaming tons of "files". :-/ >> >> >> On Mon, Oct 27, 2014 at 1:12 PM, DuyHai Doan >> wrote: >> >>> "Tombstones will be a very important issue for me since the dataset is >>> very much a rolling dataset using TTLs heavily." >>> >>> --> You can try the new DateTiered compaction strategy ( >>> https://issues.apache.org/jira/browse/CASSANDRA-6602) released on 2.1.1 >>> if you have a time series data model to eliminate tombstones >>> >>> On Mon, Oct 27, 2014 at 5:47 PM, Laing, Michael < >>> michael.la...@nytimes.com> wrote: Again, from our experience w 2.0.x: Revert to the defaults - you are manually setting heap way too high IMHO. On our small nodes we tried LCS - way too much compaction - switch all CFs to STCS. We do a major rolling compaction on our small nodes weekly during less busy hours - works great. Be sure you have enough disk. We never explicitly delete and only use ttls or truncation. You can set GC to 0 in that case, so tombstones are more readily expunged. There are a couple threads in the list that discuss this... also normal rolling repair becomes optional, reducing load (still repair if something unusual happens tho...). In your current situation, you need to kickstart compaction - are there any CFs you can truncate at least temporarily? Then try compacting a small CF, then another, etc. Hopefully you can get enough headroom to add a node.
ml On Sun, Oct 26, 2014 at 6:24 PM, Maxime wrote: > Hmm, thanks for the reading. > > I initially followed some (perhaps too old) maintenance scripts, which > included weekly 'nodetool compact'. Is there a way for me to undo the > damage? Tombstones will be a very important issue for me since the dataset > is very much a rolling dataset using TTLs heavily. > > On Sun, Oct 26, 2014 at 6:04 PM, DuyHai Doan > wrote: > >> "Should doing a major compaction on those nodes lead to a restructuration >> of the SSTables?" --> Beware of the major compaction on SizeTiered, it >> will >> create 2 giant SSTables and the expired/outdated/tombstone columns in >> this >> big file will be never cleaned since the SSTable will never get a chance >> to >> be compacted again >> >> Essentially to reduce the fragmentation of small SSTables you can >> stay with SizeTiered compaction and play around with compaction >> properties >> (the thresholds) to make C* group a bunch of files each time it compacts >> so >> that the file number shrinks to a reasonable count >> >> Since you're using C* 2.1 and anti-compaction has been introduced, I >> hesitate advising you to use Leveled compaction as a work-a
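Here is a rough sketch of the throttling-plus-periodic-flush setup described in the message above (the sketch referenced there). The host list is a placeholder, and running nodetool against remote hosts with -h assumes JMX is reachable on those machines:

import subprocess
import time

# Placeholder addresses for the nodes already in the ring.
existing_nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Throttle outbound streaming on every existing node to 1 MB/s.
for host in existing_nodes:
    subprocess.check_call(["nodetool", "-h", host, "setstreamthroughput", "1"])

# On the joining node itself: flush memtables once a minute while the
# bootstrap runs, to try to keep on-heap memtable pressure down.
while True:
    subprocess.check_call(["nodetool", "flush"])
    time.sleep(60)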
Re: Commissioning failure
On Wed, Oct 29, 2014 at 10:39 PM, Aravindan T wrote: > What could be the reasons for the stream error other than SSTABLE > corruption? > There are plenty of reasons streams fail. The Cassandra team is aware of how painful this makes things and is working on them. Be sure that a firewall is not dropping long-running connections. =Rob http://twitter.com/rcolidba
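One rough way to rule out the basic part of the firewall question is to check that the inter-node storage port (7000 by default, 7001 with internode SSL) is open from this node to each peer; note this only tests reachability, not whether long-idle connections get silently dropped, which is usually addressed by lowering the OS TCP keepalive interval below the firewall's idle timeout. The peer list below is a placeholder:

import socket

peers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # placeholder addresses
STORAGE_PORT = 7000  # default storage_port in cassandra.yaml

for host in peers:
    try:
        conn = socket.create_connection((host, STORAGE_PORT), timeout=5)
        conn.close()
        print("%s: port %d reachable" % (host, STORAGE_PORT))
    except OSError as exc:
        print("%s: NOT reachable on port %d (%s)" % (host, STORAGE_PORT, exc))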
Re: OOM at Bootstrap Time
I've been trying to go through the logs but I can't say I understand very well the details: INFO [SlabPoolCleaner] 2014-10-30 19:20:18,446 ColumnFamilyStore.java:856 - Enqueuing flush of loc: 7977119 (1%) on-heap, 0 (0%) off-heap DEBUG [SharedPool-Worker-22] 2014-10-30 19:20:18,446 AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row 2c95cbbb61fb8ec3bd06d70058bfa236ccad5195e48fd00c056f7e1e3fdd4368 in ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815026000 !63072000,]) DEBUG [SharedPool-Worker-6] 2014-10-30 19:20:18,446 AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row 41fc260427a88d2f084971702fdcb32756e0731c6042f93e9761e03db5197990 in ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815333000 !63072000,]) DEBUG [SharedPool-Worker-25] 2014-10-30 19:20:18,446 AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row 2e8c4dab33faade0a4fc265e4126e43dc2e58fb72830f73d7e9b8e836101d413 in ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815335000 !63072000,]) DEBUG [SharedPool-Worker-26] 2014-10-30 19:20:18,446 AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row 245bec68c5820364a72db093d5c9899b631e692006881c98f0abf4da5fbff4cd in ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815344000 !63072000,]) DEBUG [SharedPool-Worker-20] 2014-10-30 19:20:18,446 AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row ea8dfb47177bd40f46aac4fe41d3cfea3316cf35451ace0825f46b6e0fa9e3ef in ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815262000 !63072000,]) This is a sample of Enqueuing flush events in the storm. On Thu, Oct 30, 2014 at 12:20 PM, Maxime wrote: > I will give a shot adding the logging. > > I've tried some experiments and I have no clue what could be happening > anymore: > > I tried setting all nodes to a streamthroughput of 1 except 1, to see if > somehow it was getting overloaded by too many streams coming in at once, > nope. > I went through the source at ColumnFamilyStore.java:856 where the huge > burst of "Enqueuing flush..." occurs, and it's clearly at the moment > memtables get converted to SSTables on disk. So I started the bootstrap > process and using a bash script trigerred a 'nodetool flush' every minute > during the processes. At first it seemed to work, but again after what > seems to be a locally-trigered cue, the burst (many many thousands of > Enqueuing flush...). But through my previous experiment, I am fairly > certain it's not a question of volume of data coming in (throughput), or > number of SSTables being streamed (dealing at max 150 files pr node). > > Does anyone know if such Enqueuing bursts are normal during bootstrap? I'd > like to be able to say "it's because my nodes are underpowered", but at the > moment, I'm leaning towards a bug of some kind. > > On Wed, Oct 29, 2014 at 3:05 PM, DuyHai Doan wrote: > >> Some ideas: >> >> 1) Put on DEBUG log on the joining node to see what is going on in >> details with the stream with 1500 files >> >> 2) Check the stream ID to see whether it's a new stream or an old one >> pending >> >> >> >> On Wed, Oct 29, 2014 at 2:21 AM, Maxime wrote: >> >>> Doan, thanks for the tip, I just read about it this morning, just >>> waiting for the new version to pop up on the debian datastax repo. >>> >>> Michael, I do believe you are correct in the general running of the >>> cluster and I've reset everything. 
>>> >>> So it took me a while to reply, I finally got the SSTables down, as seen >>> in the OpsCenter graphs. I'm stumped however because when I bootstrap the >>> new node, I still see very large number of files being streamed (~1500 for >>> some nodes) and the bootstrap process is failing exactly as it did before, >>> in a flury of "Enqueuing flush of ..." >>> >>> Any ideas? I'm reaching the end of what I know I can do, OpsCenter says >>> around 32 SStables per CF, but still streaming tons of "files". :-/ >>> >>> >>> On Mon, Oct 27, 2014 at 1:12 PM, DuyHai Doan >>> wrote: >>> "Tombstones will be a very important issue for me since the dataset is very much a rolling dataset using TTLs heavily." --> You can try the new DateTiered compaction strategy ( https://issues.apache.org/jira/browse/CASSANDRA-6602) released on 2.1.1 if you have a time series data model to eliminate tombstones On Mon, Oct 27, 2014 at 5:47 PM, Laing, Michael < michael.la...@nytimes.com> wrote: > Again, from our experience w 2.0.x: > > Revert to the defaults - you are manually setting heap way too high > IMHO. > > On our small nodes we tried LCS - way too much compaction - switch all > CFs to STCS. > > We do a major rolling compaction on our small nodes weekly during less > busy hours - works great. Be sure you have enough disk. > > We never explicitly delete and only use ttls or truncation. You can > set GC to 0 in that c
Re: OOM at Bootstrap Time
Well, the answer was Secondary indexes. I am guessing they were corrupted somehow. I dropped all of them, cleanup, and now nodes are bootstrapping fine. On Thu, Oct 30, 2014 at 3:50 PM, Maxime wrote: > I've been trying to go through the logs but I can't say I understand very > well the details: > > INFO [SlabPoolCleaner] 2014-10-30 19:20:18,446 ColumnFamilyStore.java:856 > - Enqueuing flush of loc: 7977119 (1%) on-heap, 0 (0%) off-heap > DEBUG [SharedPool-Worker-22] 2014-10-30 19:20:18,446 > AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row > 2c95cbbb61fb8ec3bd06d70058bfa236ccad5195e48fd00c056f7e1e3fdd4368 in > ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815026000 > !63072000,]) > DEBUG [SharedPool-Worker-6] 2014-10-30 19:20:18,446 > AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row > 41fc260427a88d2f084971702fdcb32756e0731c6042f93e9761e03db5197990 in > ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815333000 > !63072000,]) > DEBUG [SharedPool-Worker-25] 2014-10-30 19:20:18,446 > AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row > 2e8c4dab33faade0a4fc265e4126e43dc2e58fb72830f73d7e9b8e836101d413 in > ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815335000 > !63072000,]) > DEBUG [SharedPool-Worker-26] 2014-10-30 19:20:18,446 > AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row > 245bec68c5820364a72db093d5c9899b631e692006881c98f0abf4da5fbff4cd in > ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815344000 > !63072000,]) > DEBUG [SharedPool-Worker-20] 2014-10-30 19:20:18,446 > AbstractSimplePerColumnSecondaryIndex.java:124 - applying index row > ea8dfb47177bd40f46aac4fe41d3cfea3316cf35451ace0825f46b6e0fa9e3ef in > ColumnFamily(loc.loc_id_idx [66652e312e31332e3830:0:false:0@1414696815262000 > !63072000,]) > > This is a sample of Enqueuing flush events in the storm. > > On Thu, Oct 30, 2014 at 12:20 PM, Maxime wrote: > >> I will give a shot adding the logging. >> >> I've tried some experiments and I have no clue what could be happening >> anymore: >> >> I tried setting all nodes to a streamthroughput of 1 except 1, to see if >> somehow it was getting overloaded by too many streams coming in at once, >> nope. >> I went through the source at ColumnFamilyStore.java:856 where the huge >> burst of "Enqueuing flush..." occurs, and it's clearly at the moment >> memtables get converted to SSTables on disk. So I started the bootstrap >> process and using a bash script trigerred a 'nodetool flush' every minute >> during the processes. At first it seemed to work, but again after what >> seems to be a locally-trigered cue, the burst (many many thousands of >> Enqueuing flush...). But through my previous experiment, I am fairly >> certain it's not a question of volume of data coming in (throughput), or >> number of SSTables being streamed (dealing at max 150 files pr node). >> >> Does anyone know if such Enqueuing bursts are normal during bootstrap? >> I'd like to be able to say "it's because my nodes are underpowered", but at >> the moment, I'm leaning towards a bug of some kind. 
>> >> On Wed, Oct 29, 2014 at 3:05 PM, DuyHai Doan >> wrote: >> >>> Some ideas: >>> >>> 1) Put on DEBUG log on the joining node to see what is going on in >>> details with the stream with 1500 files >>> >>> 2) Check the stream ID to see whether it's a new stream or an old one >>> pending >>> >>> >>> >>> On Wed, Oct 29, 2014 at 2:21 AM, Maxime wrote: >>> Doan, thanks for the tip, I just read about it this morning, just waiting for the new version to pop up on the debian datastax repo. Michael, I do believe you are correct in the general running of the cluster and I've reset everything. So it took me a while to reply, I finally got the SSTables down, as seen in the OpsCenter graphs. I'm stumped however because when I bootstrap the new node, I still see very large number of files being streamed (~1500 for some nodes) and the bootstrap process is failing exactly as it did before, in a flury of "Enqueuing flush of ..." Any ideas? I'm reaching the end of what I know I can do, OpsCenter says around 32 SStables per CF, but still streaming tons of "files". :-/ On Mon, Oct 27, 2014 at 1:12 PM, DuyHai Doan wrote: > "Tombstones will be a very important issue for me since the dataset > is very much a rolling dataset using TTLs heavily." > > --> You can try the new DateTiered compaction strategy ( > https://issues.apache.org/jira/browse/CASSANDRA-6602) released on > 2.1.1 if you have a time series data model to eliminate tombstones > > On Mon, Oct 27, 2014 at 5:47 PM, Laing, Michael < > michael.la...@nytimes.com> wrote: > >> Again, from our experience w 2.0.x: >> >> Revert to the defaults - you are manually setting heap way too high >> IMHO. >> >>
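For reference, a hedged sketch of the fix described above (dropping the secondary indexes before bootstrapping, then recreating them afterwards), again using the Python driver. The keyspace name is a placeholder, the index and table names are taken from the log excerpt, and the indexed column is guessed from the index name, so it may differ in the real schema:

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # placeholder contact point
session = cluster.connect("my_keyspace")  # placeholder keyspace

# 1. Drop the secondary index before adding the new node.
session.execute("DROP INDEX IF EXISTS loc_id_idx")

# 2. Bootstrap the new node, then run 'nodetool cleanup' on the old ones.

# 3. Recreate the index afterwards; Cassandra rebuilds it in the background.
session.execute("CREATE INDEX IF NOT EXISTS loc_id_idx ON loc (loc_id)")

Dropping the index removes its on-disk data, which takes the per-row index maintenance seen in the log excerpt out of the bootstrap path; recreating it afterwards triggers a background rebuild from the existing data.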