Combining Cassandra with some SQL language
Hi there,

I'm currently busy with the technical design of a new project. Of course it will depend on your needs, but is it weird to combine Cassandra with a SQL database like MySQL?

In my use case it would be nice, because we have some tables/CFs with lots and lots of data that does not really have to be 100% consistent, but also some data that should always be consistent.

What do you think of this?

With kind regards,
Robin Verlangen
Re: Combining Cassandra with some SQL language
On Sun, Feb 26, 2012 at 1:06 PM, R. Verlangen wrote:
> I'm currently busy with the technical design of a new project. Of course it
> will depend on your needs, but is it weird to combine Cassandra with a SQL
> database like MySQL?
>
> In my use case it would be nice, because we have some tables/CFs with lots
> and lots of data that does not really have to be 100% consistent, but also
> some data that should always be consistent.
>
> What do you think of this?

It seems entirely reasonable to hybridise your stack to take advantage of the qualities of different data stores. The tradeoff is that your system will have more moving parts, increasing its learning curve, complicating provisioning, etc.

Where I work, we moved a lot of our domain out of MySQL into Cassandra, and are now porting select parts of the domain that change infrequently but require greater consistency back into MySQL. We are also using other forms of storage (Redis and S3).

--
Benjamin Hawkes-Lewis
Re: Combining Cassandra with some SQL language
I've been using a combination of MySQL and Cassandra for about a year now on a project that now serves about 20k users. We use Cassandra for storing large entities and MySQL to store metadata that allows us to do better ad hoc querying. It's worked quite well for us. During this time we have also been able to migrate some of our tables from MySQL to Cassandra when MySQL performance / capacity became a problem.

This may seem obvious, but if you're planning on creating a data model that spans multiple databases, make sure you encapsulate the logic to read/write/delete information in a good data model library, and only use that library to access your data (a sketch follows after this message). This is good practice anyway, but when you add the extra complication of multiple databases that may reference one another it's an absolute must.

On Sun, Feb 26, 2012 at 8:06 AM, R. Verlangen wrote:
> Hi there,
>
> I'm currently busy with the technical design of a new project. Of course it
> will depend on your needs, but is it weird to combine Cassandra with a SQL
> database like MySQL?
>
> In my use case it would be nice, because we have some tables/CFs with lots
> and lots of data that does not really have to be 100% consistent, but also
> some data that should always be consistent.
>
> What do you think of this?
>
> With kind regards,
> Robin Verlangen
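A minimal sketch of the single-entry-point library Adam describes; every name below is hypothetical, and the store-specific DAOs are left abstract so the facade stays independent of any particular client library:

    // HybridUserStore.java - the one place application code goes through to
    // reach data that is spread across Cassandra and MySQL.
    public final class HybridUserStore {

        // Hypothetical store-specific interfaces; implementations would wrap
        // a Cassandra client (e.g. Thrift/Hector) and JDBC respectively.
        public interface ProfileDao {
            String fetchProfile(String userId);
            void storeProfile(String userId, String profileJson);
            void deleteProfile(String userId);
        }

        public interface MetaDao {
            String fetchMeta(String userId);
            void deleteMeta(String userId);
        }

        private final ProfileDao cassandra; // large entities
        private final MetaDao mysql;        // ad hoc queryable metadata

        public HybridUserStore(ProfileDao cassandra, MetaDao mysql) {
            this.cassandra = cassandra;
            this.mysql = mysql;
        }

        public String getProfile(String userId) { return cassandra.fetchProfile(userId); }
        public String getMeta(String userId) { return mysql.fetchMeta(userId); }

        public void saveProfile(String userId, String profileJson) {
            cassandra.storeProfile(userId, profileJson);
        }

        // Cross-store deletes live in one place so the ordering rule is applied
        // consistently: drop the referencing MySQL row first, then the entity,
        // so a failure part-way through leaves no dangling references.
        public void deleteUser(String userId) {
            mysql.deleteMeta(userId);
            cassandra.deleteProfile(userId);
        }
    }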
Re: Frequency of Flushing in 1.0
> if a node goes down, it will take longer for commitlog replay.

Commit log replay time is insignificant; most time during node startup is wasted on index sampling. Index sampling here runs for about 15 minutes.
Re: Frequency of Flushing in 1.0
If you are doing planned maintenance you can flush first as well, ensuring that the commit logs will not be as large.

On Sun, Feb 26, 2012 at 10:09 AM, Radim Kolar wrote:
>> if a node goes down, it will take longer for commitlog replay.
>
> Commit log replay time is insignificant; most time during node startup is
> wasted on index sampling. Index sampling here runs for about 15 minutes.
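A sketch of that pre-maintenance flush as nodetool calls (the host name is a placeholder). drain goes a step further than flush by also stopping the node from accepting writes, which leaves an essentially empty commit log for the next startup:

    # flush all memtables to SSTables so commit log segments can be recycled
    nodetool -h node1.example.com flush

    # or, just before stopping the node: flush everything AND stop accepting
    # writes, so there is almost nothing to replay on restart
    nodetool -h node1.example.com drain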
Re: unidirectional communication/replication
All nodes in the cluster need two-way communication: nodes need gossip to talk to each other so they know who is alive.

If you need to dump a lot of data, consider the Hadoop integration: http://wiki.apache.org/cassandra/HadoopSupport It can run a bit faster than going through the thrift API. Copying sstables may be another option, depending on the data size (one outbound-only sketch follows below).

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/02/2012, at 3:21 AM, Alexandru Sicoe wrote:
> Hello everyone,
>
> I'm battling with this constraint that I have: I need to regularly ship
> timeseries data out of a Cassandra cluster that sits within an enclosed
> network.
>
> I tried selecting all the data within a certain time window, writing it to
> a file, and then copying the file out, but this hurts I/O performance
> because even for a small time window (say 5 mins) I am hitting more than a
> million rows.
>
> It would really help if I could use Cassandra to replicate the data outside
> automatically. The problem is they will only allow me outbound traffic out
> of the enclosed network (not inbound). Is there any way to configure the
> cluster, or have 2 data centers, in such a way that the data center (node
> or cluster) outside of the enclosed network only gets a replica of the
> data, without ever needing to communicate anything back?
>
> I appreciate the help,
> Alex
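If you go the sstable-copying route, one outbound-only sketch, assuming rsync over SSH is permitted out of the enclosed network; the keyspace name, destination host and paths are placeholders, and the snapshot path follows the pre-1.1 data directory layout:

    # on each node inside the network: a snapshot hard-links the current
    # SSTables into an immutable directory without interrupting the node
    nodetool -h localhost snapshot

    # push the files out; the transfer is initiated from inside the network,
    # so only outbound connectivity is needed
    rsync -av /var/lib/cassandra/data/MyKeyspace/snapshots/ \
        backup@external-host:/import/$(hostname)/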
Re: How to delete a range of columns using first N components of CompositeType Column?
it has been discussed a few times :)

https://issues.apache.org/jira/browse/CASSANDRA-494

A

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/02/2012, at 8:06 AM, Praveen Baratam wrote:
> Thank you Aaron for the clarification.
>
> Maybe this could be a feature that the Cassandra team should consider
> implementing. Instead of two network round trips, the logic could be
> consolidated on the server side, if a read before a range delete is
> unavoidable.
>
> On Fri, Feb 24, 2012 at 12:46 AM, aaron morton wrote:
>> Unfortunately you cannot use column ranges for delete operations.
>>
>> So while what you want to do is something like...
>>
>> Delete 'Jack:*:*'...'Jack:*:*' from Test where KEY = "friends";
>>
>> ...you cannot do it. You need to read and then delete by name.
>>
>> Cheers
>>
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 23/02/2012, at 8:08 PM, Praveen Baratam wrote:
>>> More precisely, let's say we have a CF with the following spec:
>>>
>>> create column family Test
>>>   with comparator = 'CompositeType(UTF8Type,UTF8Type,UTF8Type)'
>>>   and key_validation_class = 'UTF8Type'
>>>   and default_validation_class = 'UTF8Type';
>>>
>>> And I have columns such as:
>>>
>>> Jack:Name:First - Jackson
>>> Jack:Name:Last - Samuel
>>> Jack:Age - 50
>>>
>>> Now, to delete all columns related to Jack, as far as I can comprehend
>>> I need to use:
>>>
>>> Delete 'Jack:Name:First', 'Jack:Name:Last', 'Jack:Age' from Test
>>> where KEY = "friends";
>>>
>>> The problem is we do not usually know what metadata is associated with
>>> a user, as it may include timestamp-based columns, such as:
>>>
>>> Jack:1234567890:Location - Chicago
>>>
>>> Can something like
>>>
>>> Delete 'Jack' from Test where KEY = "friends";
>>>
>>> be done using the first N components of the CompositeType? Or should
>>> we read first and then delete?
>>>
>>> Thank you.
>>>
>>> On Thu, Feb 23, 2012 at 4:47 AM, Praveen Baratam wrote:
>>>> I am using CompositeType columns and it's very convenient to query for
>>>> a range of columns using the first N components, but how do I delete a
>>>> range of columns using the first N components of the CompositeType
>>>> column?
>>>>
>>>> In order to specify the exact column names to delete, I would have to
>>>> read first and then delete. Is there a better way?
Re: Server crashed due to "OutOfMemoryError: Java heap space"
> several compactions on few 200-300 GB SSTables

Sounds like some big files. Out of interest, how much data do you have per node? Also, do you have wide rows? You can check via nodetool cfstats.

In cases where OOM / GC is related to compaction, these are the steps I take first. It's heavy handed and will probably increase the IO load; once you stabilise, you should see if you can increase them again. In cassandra.yaml:

* set concurrent_compactors to 2 - this will reduce the number of concurrent compactions.
* if you have wide rows, reduce in_memory_compaction_limit_in_mb to 32 or lower.

(As you are on 0.8.X, also check that memtable_total_space_in_mb is enabled.) A fragment with these settings follows after the stack traces below.

Hope that helps.

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/02/2012, at 10:14 AM, Feng Qu wrote:
> Hello,
>
> We have a 6-node ring running 0.8.6 on RHEL 6.1. The first node also runs
> OpsCenter community. This node has crashed a few times recently with
> "OutOfMemoryError: Java heap space" while several compactions on a few
> 200-300 GB SSTables were running. We are using an 8 GB Java heap on a host
> with 96 GB RAM.
>
> I would appreciate help figuring out the root cause and solution.
>
> Feng Qu
>
> INFO [GossipTasks:1] 2012-02-22 13:15:59,135 Gossiper.java (line 697) InetAddress /10.89.74.67 is now dead.
> INFO [ScheduledTasks:1] 2012-02-22 13:16:12,114 StatusLogger.java (line 65) ReadStage 0 0 0
> ERROR [CompactionExecutor:10538] 2012-02-22 13:16:12,115 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[CompactionExecutor:10538,1,main]
> java.lang.OutOfMemoryError: Java heap space
>     at org.apache.cassandra.io.util.BufferedRandomAccessFile.(BufferedRandomAccessFile.java:123)
>     at org.apache.cassandra.io.sstable.SSTableScanner.(SSTableScanner.java:57)
>     at org.apache.cassandra.io.sstable.SSTableReader.getDirectScanner(SSTableReader.java:664)
>     at org.apache.cassandra.db.compaction.CompactionIterator.getCollatingIterator(CompactionIterator.java:92)
>     at org.apache.cassandra.db.compaction.CompactionIterator.(CompactionIterator.java:68)
>     at org.apache.cassandra.db.compaction.CompactionManager.doCompactionWithoutSizeEstimation(CompactionManager.java:553)
>     at org.apache.cassandra.db.compaction.CompactionManager.doCompaction(CompactionManager.java:507)
>     at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:142)
>     at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:108)
>     at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>     at java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> INFO [GossipTasks:1] 2012-02-22 13:16:12,115 Gossiper.java (line 697) InetAddress /10.2.128.55 is now dead.
> ERROR [Thread-734] 2012-02-22 13:16:48,189 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-734,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>     at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
>     at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
>     at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:490)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:136)
> ERROR [Thread-68450] 2012-02-22 13:16:48,189 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-68450,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>     at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
>     at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.ensureQueuedTaskHandled(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
>     at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:490)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:136)
> ERROR [Thread-731] 2012-02-22 13:16:48,189 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-731,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
>     at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThread
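A minimal cassandra.yaml fragment with the settings Aaron suggests above; the first two values are his suggested starting points, the last is illustrative (roughly one third of the 8 GB heap) and all should be revisited once the node is stable:

    # limit how many compactions run in parallel (default is one per core)
    concurrent_compactors: 2

    # rows larger than this are compacted via the slower two-pass path instead
    # of being held fully in memory; lower it if you have wide rows
    in_memory_compaction_limit_in_mb: 32

    # 0.8.x: make sure flushing is bounded by total memtable memory
    memtable_total_space_in_mb: 2700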
Re: Querying all keys in a column family
When you say "query 1 million records", in my mind I'm hearing "dump 1 million records to another system as a back-office job".

Hadoop will split the job over multiple nodes and will assign a task to read the range "owned" by each node. From memory it uses CL ONE (by default) for the read, so the node it is connected to is the only one involved in the read; also, the task can run on the node rather than off node. This does not magic up some new IO capacity though. It spreads the workload, so to add IO capacity, add nodes.

You could do something similar by reducing the CL level and querying through the thrift interface, then only asking a node for data in the key range it "owns".

If this does not help, the next step is to borrow from the ideas in DataStax Brisk (now DataStax Enterprise). Use the NetworkTopologyStrategy and two data centres, or a virtual data centre (http://wiki.apache.org/cassandra/HadoopSupport). One DC is for OLTP and the other for OLAP / export; the OLTP side will be able to run without interruption from the OLAP side. A sketch of such a keyspace follows after this message.

Another option is to use something like Kafka and fork the data stream: send it to Cassandra and the external system at the same time.

Hope that helps.

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/02/2012, at 2:21 PM, Martin Arrowsmith wrote:
> Hi Alexandru,
>
> Things got hectic and I put off the project until this weekend. I'm
> actually learning about Hadoop right now and how to implement it. I can
> respond to this thread when I have something running.
>
> In the meantime, I'd like to bump this email up and see if there are others
> who can provide some feedback. 1) Will Hadoop speed up the time to read all
> the rows? 2) Are there other options?
>
> My guess was that Hadoop could split up your jobs, so each node could
> handle a portion of the query. For instance, having 2 nodes would do the
> job twice as fast. That is my naive guess though, and could be far from the
> truth.
>
> Best wishes,
>
> Martin
>
> On Fri, Feb 24, 2012 at 5:29 AM, Alexandru Sicoe wrote:
>> Hi Aaron and Martin,
>>
>> Sorry about my previous reply, I thought you wanted to process only the
>> row keys in the CF.
>>
>> I have a similar issue as Martin because I see myself being forced to hit
>> more than a million rows with a query (I only get a few columns from every
>> row). Aaron, we've talked about this in another thread; basically I am
>> constrained to ship a window of data out of my online cluster to an
>> offline cluster. For this I need to read, for example, a 5 min window of
>> all the data I have. This simply accesses too many rows and I am hitting
>> the I/O limit on the nodes. As I understand it, for every row it will do 2
>> random disk seeks (I have no caches).
>>
>> My question is: what can I do to improve the performance of shipping
>> windows of data entirely out?
>>
>> Martin, did you use Hadoop as Aaron suggested? How did that work with
>> Cassandra? I don't understand how accessing 1 million rows through map
>> reduce jobs would be any faster.
>>
>> Cheers,
>> Alexandru
>>
>> On Tue, Feb 14, 2012 at 10:00 AM, aaron morton wrote:
>>> If you want to process 1 million rows use Hadoop with Hive or Pig. If
>>> you use Hadoop you are not doing things in real time.
>>>
>>> You may need to rephrase the problem.
>>>
>>> Cheers
>>>
>>> -
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 14/02/2012, at 11:00 AM, Martin Arrowsmith wrote:
>>>> Hi Experts,
>>>>
>>>> My program is such that it queries all keys on Cassandra.
>>>> I want to do this as quickly as possible, in order to get as close to
>>>> real time as possible.
>>>>
>>>> One solution I heard was to use the sstable2json tool and read the data
>>>> in as JSON. I understand that reading each row in Cassandra might take
>>>> longer.
>>>>
>>>> Are there any other ideas for doing this? Or can you confirm that
>>>> sstable2json is the way to go?
>>>>
>>>> Querying 100 rows in Cassandra the normal way is fast enough. I'd like
>>>> to query a million rows, do some calculations on them, and spit out the
>>>> result like it's real time.
>>>>
>>>> Thanks for any help you can give,
>>>>
>>>> Martin
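A sketch of the two-data-centre layout Aaron describes, as a 1.0-era cassandra-cli keyspace definition; the keyspace and data centre names here are illustrative and must match what your snitch reports:

    create keyspace Metrics
      with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
      and strategy_options = {OLTP : 2, OLAP : 1};

The OLTP data centre keeps serving live traffic while the single-replica OLAP data centre absorbs the Hadoop / export reads without competing for the same disks.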
Re: Frequency of Flushing in 1.0
Nathan Milford has a post about taking a node down:

http://blog.milford.io/2011/11/rolling-upgrades-for-cassandra/

The only thing I would do differently would be to turn off thrift first (the full sequence is sketched below).

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/02/2012, at 4:35 AM, Edward Capriolo wrote:
> If you are doing planned maintenance you can flush first as well,
> ensuring that the commit logs will not be as large.
>
> On Sun, Feb 26, 2012 at 10:09 AM, Radim Kolar wrote:
>>> if a node goes down, it will take longer for commitlog replay.
>>
>> Commit log replay time is insignificant; most time during node startup is
>> wasted on index sampling. Index sampling here runs for about 15 minutes.
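The take-down sequence with the thrift-first tweak applied, sketched as 1.0-era nodetool calls (host name is a placeholder):

    nodetool -h node1.example.com disablethrift   # stop serving client requests
    nodetool -h node1.example.com disablegossip   # announce the node as down to the ring
    nodetool -h node1.example.com drain           # flush memtables and stop accepting writes
    # ...stop the Cassandra process, do the maintenance, start it again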
how to cast traditional sql schema to nosql
Hi all,

I'm a newbie in NoSQL and can't understand how to create a NoSQL-style schema. First, I'll describe my problem: I need to store the results of tests. Each test consists of a list of parameters (if two tests have the same list of parameters, they belong to the same testcase), a tag or tags, and a test result. For example:

Test1:
  params:
    - user_role: admin
    - miss_captcha: true
    - test_name: login_test
    - locales: en,es,fr  -- as you can see, a parameter can be a list
  testcase: testcase_1_id  -- testcase id formed as md5 of params
  tags:
    - aaa_site_test
    - smoke
  result:
    - passed
    - some other result stuff (logs, error codes and so on)
  start_time: 1330287048 (timestamp)

Test2:
  params:
    - user_role: admin
    - miss_captcha: true
    - test_name: login_test
    - locales: en,es,fr
  testcase: testcase_1_id
  tags:
    - aaa_site_test
    - function_tests
  result:
    - failed
    - some other result stuff (logs, error codes and so on)
  start_time: 1330290648

Test3:
  params:
    - user_role: user
    - miss_captcha: true
    - test_name: change_password
    - locales: en
  testcase: testcase_2_id
  tags:
    - bbb_site_test
    - function_tests
  result:
    - failed
    - some other result stuff (logs, error codes and so on)
  start_time: 1330290648

So, above you can see 3 tests. The first two belong to the same testcase, but test 1 and test 2 are different test runs and have different tags; test 3 is one more testcase.

Usually I will need to execute the following queries:

1) Get the latest result for a specific tag or tags. For example: get the latest result for aaa_site. The result should be Test2's result, because test 1 and test 2 are the same testcase, but test 2 is newer.
2) Get the latest result for locale == es; the result is test 2.
3) Get the latest result for each testcase; the result is test 2 and test 3.
4) Get the history for testcase 1; the result is test 1 and test 2.

I created the following schema:

TestRuns:
  test run id (key) | test case id  | start_time | result id
  test_1_id         | testcase_1_id | 1330287048 | result_1
  test_2_id         | testcase_1_id | 1330290648 | result_2
  test_3_id         | testcase_2_id | 1330290648 | result_3

Result:
  result id | result_value | other stuff...
  result_1  | passed       | ...
  result_2  | failed       | ...
  result_3  | failed       | ...

ParamsAndTags (for tags I store $tag as tagParamName; the $ avoids a clash with a parameter actually named 'tag'):
  key (not used, but required by Cassandra) | test run id | tagParamName | value
  some key | test_1_id | $tag         | aaa_site_test
  some key | test_1_id | $tag         | smoke
  some key | test_1_id | user_role    | admin
  some key | test_1_id | miss_captcha | true
  some key | test_1_id | test_name    | login_test
  some key | test_1_id | locales      | en  -- list is split
  some key | test_1_id | locales      | es  -- list is split
  some key | test_1_id | locales      | fr  -- list is split
  and so on...

But this looks very heavy to query. To get the latest result for tag aaa_site_test with locale es, I need the following steps: fetch all rows from ParamsAndTags with tag aaa_site_test, then fetch all rows with param locale == es, then find the intersection of the two results. That gives me test run ids, but it is not the end: after that I have to fetch the test runs and keep only the latest results.

As you can see, for that simple query I have to make 3 queries to the DB and do a lot of work inside my application to merge results and filter the latest ones. I'm afraid it will work too slowly. Can someone advise a more NoSQL solution for this task?
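One common Cassandra answer to query 1 is to denormalise: write each result under every tag it carries, in a row keyed by the tag, with the timestamp leading the column name, so a single reversed slice of one row returns the newest results first. A sketch in cassandra-cli syntax; all names are illustrative, and the application still keeps only the first entry per testcase id when de-duplicating:

    create column family ResultsByTag
      with comparator = 'CompositeType(LongType,UTF8Type)'
      and key_validation_class = 'UTF8Type'
      and default_validation_class = 'UTF8Type';

    -- row key = tag, column name = (start_time, testcase id), value = result id:
    -- 'aaa_site_test' -> (1330287048, testcase_1_id) = result_1
    -- 'aaa_site_test' -> (1330290648, testcase_1_id) = result_2

A reversed slice on the 'aaa_site_test' row then yields the latest results without client-side joins; the same pattern with one row per parameter/value pair (e.g. a 'locales:es' row) covers query 2, and a row per testcase id covers queries 3 and 4.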
Re: Frequency of Flushing in 1.0
On Sun, Feb 26, 2012 at 12:18 PM, aaron morton wrote:
> Nathan Milford has a post about taking a node down:
>
> http://blog.milford.io/2011/11/rolling-upgrades-for-cassandra/
>
> The only thing I would do differently would be to turn off thrift first.
>
> Cheers

Isn't decommission meant to do the same thing as disablethrift and disablegossip?

> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 27/02/2012, at 4:35 AM, Edward Capriolo wrote:
>> If you are doing planned maintenance you can flush first as well,
>> ensuring that the commit logs will not be as large.
>>
>> On Sun, Feb 26, 2012 at 10:09 AM, Radim Kolar wrote:
>>>> if a node goes down, it will take longer for commitlog replay.
>>>
>>> Commit log replay time is insignificant; most time during node startup
>>> is wasted on index sampling. Index sampling here runs for about 15
>>> minutes.
CounterColumn java.lang.AssertionError: Wrong class type.
Using v1.0.7, we see many of the following errors. Any thoughts on why this is occurring? Thanks in advance.

-gary

ERROR [ReadRepairStage:9] 2012-02-24 18:31:28,623 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[ReadRepairStage:9,5,main]
java.lang.AssertionError: Wrong class type.
    at org.apache.cassandra.db.CounterColumn.diff(CounterColumn.java:112)
    at org.apache.cassandra.db.ColumnFamily.diff(ColumnFamily.java:230)
    at org.apache.cassandra.db.ColumnFamily.diff(ColumnFamily.java:309)
    at org.apache.cassandra.service.RowRepairResolver.scheduleRepairs(RowRepairResolver.java:117)
    at org.apache.cassandra.service.RowRepairResolver.resolve(RowRepairResolver.java:94)
    at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:54)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
ERROR [ReadRepairStage:9] 2012-02-24 18:31:28,625 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[ReadRepairStage:9,5,main]

--
From cassandra-cli "show schema", I think the relevant CF is:

create column family QOSCounters
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'CounterColumnType'
  and key_validation_class = 'UTF8Type'
  and rows_cached = 0.0
  and row_cache_save_period = 0
  and row_cache_keys_to_save = 2147483647
  and keys_cached = 20.0
  and key_cache_save_period = 14400
  and read_repair_chance = 1.0
  and gc_grace = 604800
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and row_cache_provider = 'SerializingCacheProvider'
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';
Re: Frequency of Flushing in 1.0
The challenge we face is that our commit log disk capacity is much, much smaller (under 10 GB in some cases) than the disk capacity for SSTables, so we cannot let the commit log data grow continuously. This is why we need to be able to tune the way the memtables are flushed.

From this link - http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management - it looks like commitlog_total_space_in_mb is the parameter that controls the rate at which memtables get flushed. It also seems memtable_total_space_in_mb is another setting to play with. We are planning to do some load testing with changes to these two settings, but can anyone confirm that I am headed in the right direction? Or any other pointers on this? (A fragment with both settings follows below.)

On Sun, Feb 26, 2012 at 5:26 PM, Mohit Anchlia wrote:
> On Sun, Feb 26, 2012 at 12:18 PM, aaron morton wrote:
>> Nathan Milford has a post about taking a node down:
>>
>> http://blog.milford.io/2011/11/rolling-upgrades-for-cassandra/
>>
>> The only thing I would do differently would be to turn off thrift first.
>>
>> Cheers
>
> Isn't decommission meant to do the same thing as disablethrift and
> disablegossip?
>
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 27/02/2012, at 4:35 AM, Edward Capriolo wrote:
>>> If you are doing planned maintenance you can flush first as well,
>>> ensuring that the commit logs will not be as large.
>>>
>>> On Sun, Feb 26, 2012 at 10:09 AM, Radim Kolar wrote:
>>>>> if a node goes down, it will take longer for commitlog replay.
>>>>
>>>> Commit log replay time is insignificant; most time during node startup
>>>> is wasted on index sampling. Index sampling here runs for about 15
>>>> minutes.
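A cassandra.yaml fragment matching that plan; the values are illustrative and should come out of the load testing described above, with the commit log total kept comfortably under the small commit log disk:

    # once the total commit log size passes this, the oldest dirty memtables
    # are flushed so their segments can be recycled
    commitlog_total_space_in_mb: 4096

    # flush the largest memtables once their combined size reaches this cap
    memtable_total_space_in_mb: 2048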
Cassandra 1.1 beta on Maven?
Hi,

I could not find Cassandra 1.1 jars in the Maven repo. Can a beta version be released?

Thanks,
Praveen
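For reference, release builds are published to Maven Central under the coordinates below; the beta version string is hypothetical until such an artifact actually appears there:

    <dependency>
        <groupId>org.apache.cassandra</groupId>
        <artifactId>cassandra-all</artifactId>
        <version>1.1.0-beta1</version>  <!-- hypothetical until published -->
    </dependency>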
Re: Frequency of Flushing in 1.0
>> if a node goes down, it will take longer for commitlog replay.
>
> Commit log replay time is insignificant; most time during node startup is
> wasted on index sampling. Index sampling here runs for about 15 minutes.

Depends entirely on your situation. If you have few keys and lots of writes, index sampling will be insignificant.

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
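The cost of index sampling is governed by index_interval in cassandra.yaml; a fragment with an illustrative value, the trade-off being slightly more seeking per key lookup:

    # sample one entry out of every index_interval entries in each SSTable's
    # primary index (default 128); larger values mean faster startup and a
    # smaller sample in memory, at the cost of marginally slower reads
    index_interval: 512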
RE: Combining Cassandra with some SQL language
Kundera (https://github.com/impetus-opensource/Kundera) - an open source, Apache-licensed Java ORM - allows polyglot persistence between RDBMS and NoSQL databases such as Cassandra, MongoDB, HBase etc., transparently to the business logic developer.

A note of caution: this does not mean that Cassandra data modeling can be bypassed. NoSQL entities still need to be modeled in such a way as to best use Cassandra's capabilities. Kundera can also take care of relationships between the entities in the RDBMS; transaction management is still pending, however.

Regards,
Sanjay

From: Adam Haney [mailto:adam.ha...@retickr.com]
Sent: Sunday, February 26, 2012 7:51 PM
To: user@cassandra.apache.org
Subject: Re: Combining Cassandra with some SQL language

I've been using a combination of MySQL and Cassandra for about a year now on a project that now serves about 20k users. We use Cassandra for storing large entities and MySQL to store metadata that allows us to do better ad hoc querying. It's worked quite well for us. During this time we have also been able to migrate some of our tables from MySQL to Cassandra when MySQL performance / capacity became a problem. This may seem obvious, but if you're planning on creating a data model that spans multiple databases, make sure you encapsulate the logic to read/write/delete information in a good data model library, and only use that library to access your data. This is good practice anyway, but when you add the extra complication of multiple databases that may reference one another it's an absolute must.

On Sun, Feb 26, 2012 at 8:06 AM, R. Verlangen wrote:

Hi there,

I'm currently busy with the technical design of a new project. Of course it will depend on your needs, but is it weird to combine Cassandra with a SQL database like MySQL?

In my use case it would be nice, because we have some tables/CFs with lots and lots of data that does not really have to be 100% consistent, but also some data that should always be consistent.

What do you think of this?

With kind regards,
Robin Verlangen
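Kundera consumes standard JPA annotations, so an entity is plain Java. A minimal sketch with illustrative names; persistence-unit wiring and Cassandra-specific configuration are omitted, so consult the Kundera docs for those:

    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Table;

    // Hypothetical entity mapped to a column family; Kundera routes it to
    // Cassandra or an RDBMS depending on the persistence unit it is bound to.
    @Entity
    @Table(name = "users")
    public class User {

        @Id
        private String userId;          // becomes the row key

        @Column(name = "first_name")
        private String firstName;

        @Column(name = "last_name")
        private String lastName;

        public String getUserId() { return userId; }
        public void setUserId(String id) { this.userId = id; }

        public String getFirstName() { return firstName; }
        public void setFirstName(String n) { this.firstName = n; }

        public String getLastName() { return lastName; }
        public void setLastName(String n) { this.lastName = n; }
    }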
Re: newer Cassandra + Hadoop = TimedOutException()
On Sun, Feb 26, 2012 at 04:25, Edward Capriolo wrote:
> Did you see the notes here?
> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting

I'm not sure what you mean by the notes? I'm already using the mapred.* settings suggested there:

mapred.max.tracker.failures 20
mapred.map.max.attempts 20
mapred.reduce.max.attempts 20

But I still see the timeouts that I didn't see with cassandra-all 0.8.7.

P.
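Beyond the mapred.* retry settings, the troubleshooting section also suggests shrinking how many rows each task pulls from Cassandra per call, and/or raising the server-side rpc timeout. A sketch with illustrative values; the batch size goes in the Hadoop job configuration:

    // in the Hadoop job setup: fewer rows per get_range_slices call makes it
    // less likely that any single call exceeds the server's rpc timeout
    org.apache.cassandra.hadoop.ConfigHelper.setRangeBatchSize(job.getConfiguration(), 1024);

and the timeout in cassandra.yaml on the nodes:

    # give slow range scans more time before the coordinator gives up
    rpc_timeout_in_ms: 30000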
Re: Combining Cassandra with some SQL language
Ok, thank you all for your opinions. Seems that I can continue without any extra db-model headaches ;-)

2012/2/27 Sanjay Sharma
> Kundera (https://github.com/impetus-opensource/Kundera) - an open source,
> Apache-licensed Java ORM - allows polyglot persistence between RDBMS and
> NoSQL databases such as Cassandra, MongoDB, HBase etc., transparently to
> the business logic developer.
>
> A note of caution: this does not mean that Cassandra data modeling can be
> bypassed. NoSQL entities still need to be modeled in such a way as to best
> use Cassandra's capabilities. Kundera can also take care of relationships
> between the entities in the RDBMS; transaction management is still
> pending, however.
>
> Regards,
> Sanjay
>
> ...