Thanks a lot Ben, actually I managed to make it work by erasing the SimpleDB entries Priam uses to keep track of instances... I also pulled the last commit from the repo, not sure whether it helped or not.
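For the record, in case someone else hits the same problem, the cleanup can be scripted. A minimal sketch with the AWS CLI, assuming Priam kept its entries in its default "InstanceIdentity" SimpleDB domain (the item name below is made up, use whatever the select returns):

    # confirm which domains Priam actually created in your account
    aws sdb list-domains

    # inspect the instance entries Priam stored
    aws sdb select --select-expression "select * from InstanceIdentity"

    # delete a stale entry by its item name (example name, not a real one)
    aws sdb delete-attributes --domain-name InstanceIdentity --item-name "dmp_cluster_1808575601"

    # or, if the cluster holds no data yet, drop the whole domain
    aws sdb delete-domain --domain-name InstanceIdentity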
But your message made me curious about something... How do you add more Cassandra nodes on the fly? Just update the autoscale properties? (I put a rough CLI sketch of what I mean in a P.S. at the very end of this mail.) I saw instaclustr.com changes the instance type as the number of nodes increases (not sure why the price per instance also becomes higher in this case). I am guessing Priam uses the data backed up to S3 to restore a node's data on another instance, right?

[]s

2013/2/28 Ben Bromhead <b...@relational.io>

> Off the top of my head I would check to make sure the Autoscaling Group
> you created is restricted to a single Availability Zone. Also, Priam sets
> the number of EC2 instances it expects based on the maximum instance count
> you set on your scaling group (it did this last time I checked a few
> months ago; its behaviour may have changed).
>
> So I would make sure your desired, min and max instances for your scaling
> group are all the same, make sure your ASG is restricted to a single
> availability zone (e.g. us-east-1b) and then (if you are able to and there
> is no data in your cluster) delete all the SimpleDB entries Priam has
> created, and then also possibly clear out the cassandra data directory.
>
> Other than that, I see you've raised it as an issue on the Priam project
> page, so see what they say ;)
>
> Cheers
>
> Ben
>
> On Thu, Feb 28, 2013 at 3:40 AM, Marcelo Elias Del Valle <mvall...@gmail.com> wrote:
>
>> One additional important piece of info: I checked here and the seeds seem
>> really different on each node. The command
>> echo `curl http://127.0.0.1:8080/Priam/REST/v1/cassconfig/get_seeds`
>> returns ip2 on the first node and ip1,ip1 on the second node.
>> Any idea why? It's probably what is causing cassandra to die, right?
>>
>> 2013/2/27 Marcelo Elias Del Valle <mvall...@gmail.com>
>>
>>> Hello Ben, thanks for the willingness to help.
>>>
>>> 2013/2/27 Ben Bromhead <b...@instaclustr.com>
>>>>
>>>> Have you added the Priam java agent to Cassandra's JVM arguments (e.g.
>>>> -javaagent:$CASS_HOME/lib/priam-cass-extensions-1.1.15.jar) and does
>>>> the web container running Priam have permissions to write to the
>>>> cassandra config directory? Also, what do the Priam logs say?
>>>
>>> I put the Priam log of the first node below. Yes, I have added
>>> priam-cass-extensions to the java args, and Priam IS actually writing
>>> to the cassandra dir.
>>>
>>>> If you want to get up and running quickly with Cassandra, AWS and
>>>> Priam, check out www.instaclustr.com.
>>>> We deploy Cassandra under your AWS account and you have full root
>>>> access to the nodes if you want to explore and play around + there is
>>>> a free tier which is great for experimenting and trying Cassandra out.
>>>
>>> That sounded really great. I am not sure if it would apply to our case
>>> (will consider it though), but some partners would have a great benefit
>>> from it, for sure! I will send your link to them.
>>>
>>> What priam says:
>>>
>>> 2013-02-27 14:14:58.0614 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/public-hostname returns: ec2-174-129-59-107.compute-1.amazonaws.com
>>> 2013-02-27 14:14:58.0615 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/public-ipv4 returns: 174.129.59.107
>>> 2013-02-27 14:14:58.0618 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/instance-id returns: i-88b32bfb
>>> 2013-02-27 14:14:58.0618 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Calling URL API: http://169.254.169.254/latest/meta-data/instance-type returns: c1.medium
>>> 2013-02-27 14:14:59.0614 INFO pool-2-thread-1 com.netflix.priam.defaultimpl.PriamConfiguration REGION set to us-east-1, ASG Name set to dmp_cluster-useast1b
>>> 2013-02-27 14:14:59.0746 INFO pool-2-thread-1 com.netflix.priam.defaultimpl.PriamConfiguration appid used to fetch properties is: dmp_cluster
>>> 2013-02-27 14:14:59.0843 INFO pool-2-thread-1 org.quartz.simpl.SimpleThreadPool Job execution threads will use class loader of thread: pool-2-thread-1
>>> 2013-02-27 14:14:59.0861 INFO pool-2-thread-1 org.quartz.core.SchedulerSignalerImpl Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
>>> 2013-02-27 14:14:59.0862 INFO pool-2-thread-1 org.quartz.core.QuartzScheduler Quartz Scheduler v.1.7.3 created.
>>> 2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.simpl.RAMJobStore RAMJobStore initialized.
>>> 2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.impl.StdSchedulerFactory Quartz scheduler 'DefaultQuartzScheduler' initialized from default resource file in Quartz package: 'quartz.properties'
>>> 2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.impl.StdSchedulerFactory Quartz scheduler version: 1.7.3
>>> 2013-02-27 14:14:59.0864 INFO pool-2-thread-1 org.quartz.core.QuartzScheduler JobFactory set to: com.netflix.priam.scheduler.GuiceJobFactory@1b6a1c4
>>> 2013-02-27 14:15:00.0239 INFO pool-2-thread-1 com.netflix.priam.aws.AWSMembership Querying Amazon returned following instance in the ASG: us-east-1b --> i-8eb32bfd,i-88b32bfb
>>> 2013-02-27 14:15:01.0470 INFO Timer-0 org.quartz.utils.UpdateChecker New update(s) found: 1.8.5 [http://www.terracotta.org/kit/reflector?kitID=default&pageID=QuartzChangeLog]
>>> 2013-02-27 14:15:10.0925 INFO pool-2-thread-1 com.netflix.priam.identity.InstanceIdentity Found dead instances: i-d49a0da7
>>> 2013-02-27 14:15:11.0397 ERROR pool-2-thread-1 com.netflix.priam.aws.SDBInstanceFactory Conditional check failed. Attribute (instanceId) value exists
>>> 2013-02-27 14:15:11.0398 ERROR pool-2-thread-1 com.netflix.priam.utils.RetryableCallable Retry #1 for: Status Code: 409, AWS Service: AmazonSimpleDB, AWS Request ID: 96ca7ae5-f352-b13a-febd-8801d46fee83, AWS Error Code: ConditionalCheckFailed, AWS Error Message: Conditional check failed. Attribute (instanceId) value exists
>>> 2013-02-27 14:15:11.0686 INFO pool-2-thread-1 com.netflix.priam.aws.AWSMembership Querying Amazon returned following instance in the ASG: us-east-1b --> i-8eb32bfd,i-88b32bfb
>>> 2013-02-27 14:15:25.0258 INFO pool-2-thread-1 com.netflix.priam.identity.InstanceIdentity Found dead instances: i-d89a0dab
>>> 2013-02-27 14:15:25.0588 INFO pool-2-thread-1 com.netflix.priam.identity.InstanceIdentity Trying to grab slot 1808575601 with availability zone us-east-1b
>>> 2013-02-27 14:15:25.0732 INFO pool-2-thread-1 com.netflix.priam.identity.InstanceIdentity My token: 56713727820156410577229101240436610842
>>> 2013-02-27 14:15:25.0732 INFO pool-2-thread-1 org.quartz.core.QuartzScheduler Scheduler DefaultQuartzScheduler_$_NON_CLUSTERED started.
>>> 2013-02-27 14:15:25.0878 INFO pool-2-thread-1 org.apache.cassandra.db.HintedHandOffManager
>>> cluster_name: dmp_cluster
>>> initial_token: null
>>> hinted_handoff_enabled: true
>>> max_hint_window_in_ms: 8
>>> hinted_handoff_throttle_in_kb: 1024
>>> max_hints_delivery_threads: 2
>>> authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
>>> authorizer: org.apache.cassandra.auth.AllowAllAuthorizer
>>> partitioner: org.apache.cassandra.dht.RandomPartitioner
>>> data_file_directories:
>>> - /var/lib/cassandra/data
>>> commitlog_directory: /var/lib/cassandra/commitlog
>>> disk_failure_policy: stop
>>> key_cache_size_in_mb: null
>>> key_cache_save_period: 14400
>>> row_cache_size_in_mb: 0
>>> row_cache_save_period: 0
>>> row_cache_provider: SerializingCacheProvider
>>> saved_caches_directory: /var/lib/cassandra/saved_caches
>>> commitlog_sync: periodic
>>> commitlog_sync_period_in_ms: 10000
>>> commitlog_segment_size_in_mb: 32
>>> seed_provider:
>>> - class_name: com.netflix.priam.cassandra.extensions.NFSeedProvider
>>>   parameters:
>>>   - seeds: 127.0.0.1
>>> flush_largest_memtables_at: 0.75
>>> reduce_cache_sizes_at: 0.85
>>> reduce_cache_capacity_to: 0.6
>>> concurrent_reads: 32
>>> concurrent_writes: 32
>>> memtable_flush_queue_size: 4
>>> trickle_fsync: false
>>> trickle_fsync_interval_in_kb: 10240
>>> storage_port: 7000
>>> ssl_storage_port: 7001
>>> listen_address: null
>>> start_native_transport: false
>>> native_transport_port: 9042
>>> start_rpc: true
>>> rpc_address: null
>>> rpc_port: 9160
>>> rpc_keepalive: true
>>> rpc_server_type: sync
>>> thrift_framed_transport_size_in_mb: 15
>>> thrift_max_message_length_in_mb: 16
>>> incremental_backups: true
>>> snapshot_before_compaction: false
>>> auto_snapshot: true
>>> column_index_size_in_kb: 64
>>> in_memory_compaction_limit_in_mb: 128
>>> multithreaded_compaction: false
>>> compaction_throughput_mb_per_sec: 8
>>> compaction_preheat_key_cache: true
>>> read_request_timeout_in_ms: 10000
>>> range_request_timeout_in_ms: 10000
>>> write_request_timeout_in_ms: 10000
>>> truncate_request_timeout_in_ms: 60000
>>> request_timeout_in_ms: 10000
>>> cross_node_timeout: false
>>> endpoint_snitch: org.apache.cassandra.locator.Ec2Snitch
>>> dynamic_snitch_update_interval_in_ms: 100
>>> dynamic_snitch_reset_interval_in_ms: 600000
>>> dynamic_snitch_badness_threshold: 0.1
>>> request_scheduler: org.apache.cassandra.scheduler.NoScheduler
>>> index_interval: 128
>>> server_encryption_options:
>>>   internode_encryption: none
>>>   keystore: conf/.keystore
>>>   keystore_password: cassandra
>>>   truststore: conf/.truststore
>>>   truststore_password: cassandra
>>> client_encryption_options:
>>>   enabled: false
>>>   keystore: conf/.keystore
>>>   keystore_password: cassandra
>>> internode_compression: all
>>> inter_dc_tcp_nodelay: true
>>> auto_bootstrap: true
>>> memtable_total_space_in_mb: 1024
>>> stream_throughput_outbound_megabits_per_sec: 400
>>> num_tokens: 1
>>>
>>> 2013-02-27 14:15:25.0884 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Starting cassandra server ....Join ring=true
>>> 2013-02-27 14:15:25.0915 INFO pool-2-thread-1 com.netflix.priam.utils.SystemUtils Starting cassandra server ....
>>> 2013-02-27 14:15:30.0013 INFO http-bio-8080-exec-1 com.netflix.priam.aws.AWSMembership Query on ASG returning 3 instances
>>> 2013-02-27 14:15:31.0726 INFO http-bio-8080-exec-2 com.netflix.priam.aws.AWSMembership Query on ASG returning 3 instances
>>> 2013-02-27 14:15:37.0360 INFO DefaultQuartzScheduler_Worker-5 com.netflix.priam.aws.S3FileSystem Uploading to backup/us-east-1/dmp_cluster/56713727820156410577229101240436610842/201302271415/SST/system/local/system-local-ib-1-CompressionInfo.db with chunk size 10485760
>>>
>>> Best regards,
>>> --
>>> Marcelo Elias Del Valle
>>> http://mvalle.com - @mvallebr
>>
>> --
>> Marcelo Elias Del Valle
>> http://mvalle.com - @mvallebr
>
> --
> Ben Bromhead
>
> Co-founder
> relational.io | @benbromhead <https://twitter.com/BenBromhead> | ph: +61 415 936 359

--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
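P.S.: about my "just update the autoscale properties" question above, this is the kind of thing I meant, following Ben's advice to keep desired, min and max identical and to pin the group to a single AZ. Only a rough sketch with the AWS CLI; the sizes are made up, and the group name is the one from my logs:

    # keep desired = min = max, since Priam derives the node count it
    # expects from the scaling group's maximum size, and pin to one AZ
    aws autoscaling update-auto-scaling-group \
        --auto-scaling-group-name dmp_cluster-useast1b \
        --availability-zones us-east-1b \
        --min-size 3 --max-size 3 --desired-capacity 3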