in the new CF and then deletes the original row. By doing
this, my disk space requirements (before replication) went from over
1.1TB/year to 305GB/year.
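A minimal in-memory sketch of the pattern described above: write a summary of a row into a second CF, then delete the original row. The dict-based "column families", the `rollup_and_delete` helper, the row key and the use of the mean as the summary are all hypothetical stand-ins, not Cassandra or Hector APIs. The percentage helper just redoes the arithmetic from the figures above, assuming 1TB = 1000GB.

```python
from statistics import mean

# Hypothetical stand-ins for two column families: row key -> {column name: value}.
raw_cf = {
    "host1_ifInOctets_20120809": {1344470400: 10, 1344470700: 20, 1344471000: 30},
}
rollup_cf = {}

def rollup_and_delete(row_key, summarize=mean):
    """Write a summary of one raw row into the rollup CF, then delete the raw row.

    The write happens before the delete, so a failure in between leaves the
    raw data in place instead of losing it.
    """
    columns = raw_cf[row_key]
    rollup_cf[row_key] = {"summary": summarize(list(columns.values())),
                          "count": len(columns)}
    del raw_cf[row_key]  # in Cassandra this would be a single row-level delete

def percent_saved(before_gb, after_gb):
    """Percentage reduction in disk usage."""
    return 100.0 * (before_gb - after_gb) / before_gb

rollup_and_delete("host1_ifInOctets_20120809")
print(rollup_cf)
print(round(percent_saved(1100, 305), 1))  # 72.3
```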
--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windo
t that information? do we scan through each node row as we will have
> row for each node?
>
> thanks
>
> -Aaron Turner wrote: -
> To: user@cassandra.apache.org
> From: Aaron Turner
> Date: 08/09/2012 07:38PM
> Subject: Re: Cassandra data model help
>
> On Thu,
Curious, but does Cassandra store the row key along with every
column/value pair on disk (pre-compaction) like HBase does? If so
(which makes the most sense), I assume that's something that is
optimized during compaction?
> That is, in the special case where you get an sstable file per column/value, you
> are correct, but normally, I guess most of us are storing more per key.
>
> Regards,
> Terje
>
> On 11 Aug 2012, at 10:34, Aaron Turner wrote:
>
>> Curious, but does cassandra store the row
table.
>
> See http://wiki.apache.org/cassandra/MemtableSSTable
>
> On Sat, Aug 11, 2012 at 11:03 AM, Aaron Turner wrote:
>> So how does that work? An sstable is for a single CF, but it can and
>> likely will have multiple rows. There is no read to write and as I
>> unde
ould
>> you select to serve HTTP requests to ensure you get:
>>
>> a) The best support from the cassandra community (e.g. timely updates
>> of drivers, better stability)
>> b) Optimal efficiency between webservers and cassandra cluster, in
>> terms of the pe
caching, pooling, etc) of Cassandra 1.X.
> Right now I know that the following clients exist:
>
> 1) Hector(Java)
> 2) Thrift (Java)
> 3) Kundera (Java)
>
>
> With Regards,
> Amit
er
or subsequent compaction activity? All the CFs I'll be writing to
use compression and leveled compaction.
Right now my Cassandra data store has about 4 months of data, and we
have 5 years of historical data (not sure yet how much we'll actually
load, but at minimum one year's worth).
. Most
people don't use them because of the rather poor performance
characteristics SCs have.
just under 4 months of data is less than 2GB! I'm pretty
thrilled.
ue, Aug 28, 2012 at 7:03 AM, Edward Capriolo wrote:
> You can consider adding -pr. When iterating through all your hosts
> like this. -pr means primary range, and will do less duplicated work.
>
> On Mon, Aug 27, 2012 at 8:05 PM, Aaron Turner wrote:
>> I use cron. On one box I j
ep.
> Secondly, what's the need for sleep 120?
Just to give the cluster a chance to settle down between repairs.
There's no real need for it; it's just there "because".
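The cron-driven loop discussed above might look roughly like this Python sketch. The host list and keyspace name are hypothetical, and the command line assumes the Cassandra 1.x form `nodetool -h <host> repair -pr <keyspace>`; check `nodetool help` on your version before relying on it. The pause mirrors the optional 120-second sleep.

```python
import subprocess
import time

# Hypothetical cluster layout; replace with your own hosts and keyspace.
HOSTS = ["cass1.example.com", "cass2.example.com", "cass3.example.com"]
KEYSPACE = "stats"

def repair_commands(hosts, keyspace):
    """One primary-range (-pr) repair per host, so each range is repaired once."""
    return [["nodetool", "-h", host, "repair", "-pr", keyspace] for host in hosts]

def run_repairs(hosts, keyspace, pause_seconds=120, dry_run=True):
    hosts = list(hosts)
    for i, cmd in enumerate(repair_commands(hosts, keyspace)):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failed repair
        if i < len(hosts) - 1:
            time.sleep(pause_seconds)        # let the cluster settle between repairs

if __name__ == "__main__":
    run_repairs(HOSTS, KEYSPACE, pause_seconds=0)
```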
or
reads & writes. I would strongly suggest 3 nodes per DC if you care
about consistent reads. Generally speaking, 3 nodes per DC is
considered the recommended minimum number of nodes for a production
system.
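The reasoning behind "3 per DC" is quorum arithmetic: QUORUM needs floor(RF/2) + 1 replicas to answer (LOCAL_QUORUM applies the same formula to the local DC's replication factor), so the number of down replicas it survives is RF minus that. A quick Python check, assuming each DC holds RF replicas:

```python
def quorum(rf: int) -> int:
    """Replicas that must answer a QUORUM request."""
    return rf // 2 + 1

def tolerated_failures(rf: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return rf - quorum(rf)

for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates {tolerated_failures(rf)} down")
# RF=1: quorum=1, tolerates 0 down
# RF=2: quorum=2, tolerates 0 down
# RF=3: quorum=2, tolerates 1 down
# RF=5: quorum=3, tolerates 2 down
```

With two nodes, QUORUM means both nodes, so a single outage blocks consistent reads; three is the smallest count that survives one failure.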
On Mon, Sep 10, 2012 at 10:17 PM, Morantus, James (PCLN-NW)
wrote:
> Hey folks,
>
>
>
> Can you recommend any tools to pull data from MySQL and pump it to
> Cassandra?
This: http://www.datastax.com/dev/blog/bulk-loading
e so that compactions take less space in the
> future meaning we can buy less nodes?
>
> Thanks,
> Dean
's freaking huge. From my conversations
with various developers, 5-10MB seems far more reasonable. I guess it
really depends on your usage patterns, but that seems excessive to me,
especially as sstables are promoted.
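To see what the sstable size changes, here is a rough Python estimate of how many sstables and levels leveled compaction ends up with for a given data volume. It assumes the leveled-compaction fanout of 10 (level N holds about 10^N sstables); the 300GB per node is a made-up example.

```python
import math

FANOUT = 10  # assumed: level N holds about FANOUT**N sstables

def sstable_count(data_mb: float, sstable_mb: float) -> int:
    return math.ceil(data_mb / sstable_mb)

def levels_needed(data_mb: float, sstable_mb: float) -> int:
    """Smallest number of levels (L1..Ln) whose combined capacity holds the data."""
    capacity, level = 0.0, 0
    while capacity < data_mb:
        level += 1
        capacity += (FANOUT ** level) * sstable_mb
    return level

data_mb = 300 * 1024  # hypothetical 300GB node
for size in (5, 10, 50, 512):
    print(f"{size:>4}MB sstables: {sstable_count(data_mb, size):>6} files, "
          f"{levels_needed(data_mb, size)} levels")
```

Larger sstables mean fewer files and levels, but each compaction rewrites far more data at once (an sstable promoted into the next level is merged with roughly ten overlapping sstables there), which is one reason smaller sizes tend to be suggested.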
of data. I'm thinking about repairing the rolling 48-hour CF more
often and reducing gc_grace so that compaction has a better
chance of removing stale data from disk.
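The constraint when shortening gc_grace is that every replica must receive a tombstone (via repair) before gc_grace expires; otherwise compaction can purge the tombstone on one replica while another still holds the data. A small Python check of a proposed schedule. 864000 seconds (10 days) is Cassandra's default gc_grace_seconds; the daily repair and 48-hour gc_grace values are only examples.

```python
DAY = 24 * 60 * 60
DEFAULT_GC_GRACE = 10 * DAY  # 864000 seconds, Cassandra's default gc_grace_seconds

def repair_schedule_is_safe(repair_interval_s: int, gc_grace_s: int,
                            max_repair_duration_s: int = 0) -> bool:
    """True if a tombstone always reaches every replica before gc_grace ends.

    Worst case: a delete lands just after a repair covered its range, so it is
    only guaranteed to propagate once the *next* repair finishes.
    """
    return repair_interval_s + max_repair_duration_s < gc_grace_s

print(repair_schedule_is_safe(7 * DAY, DEFAULT_GC_GRACE))                  # True
print(repair_schedule_is_safe(DAY, 2 * DAY, max_repair_duration_s=7200))   # True
print(repair_schedule_is_safe(2 * DAY, 2 * DAY))                           # False
```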
ble,
but even then I'd guesstimate 50MB is far more reasonable than 512MB.
-Aaron
> 2012/9/23 Aaron Turner
>>
>> On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин
>> wrote:
>> > If you think about space, use Leveled compaction! This won't only allow
>> >
On Tue, Sep 25, 2012 at 10:36 AM, Віталій Тимчишин wrote:
> See my comments inline
>
> 2012/9/25 Aaron Turner
>>
>> On Mon, Sep 24, 2012 at 10:02 AM, Віталій Тимчишин
>> wrote:
>> > Why so?
>> > What are pluses and minuses?
>> > As
Yeah, there isn't a hard limit on the number of CFs, but there
is overhead associated with each one, so I wouldn't consider your
design scalable. Generally speaking, hundreds are OK, but
thousands is pushing it.
On Thu, Sep 27, 2012 at 7:35 PM, Marcelo Elias Del Valle
wrote:
>
>
> 2012/9/27 Aaron Turner
>>
>> How strict are your security requirements? If it wasn't for that,
>> you'd be much better off storing data on a per-statistic basis then
>> per-dev
e applications
> which would be huge and then all the tables which is large, it just keeps
> growing. It is a very nice concept(all data in one location), though we
> will see how implementing it goes.
>
> How much overhead per column family in RAM? So far we have around 4000
>
meone has to already have an automated project for this, anyone
> know of one??
>
> Thanks,
> Dean
>
ated tool for running
> repairs every X days(this should really be an automated/schedulable
> thing)???
I use a cron job. It's a good idea to use the '-pr' flag, btw. Also,
you only need to run repair against CFs that actually have deletes.
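Restricting repair to the CFs that actually receive deletes could be expressed as below. Which CFs see deletes is something only you know, so the `cf_has_deletes` mapping and every name in it are hypothetical; the command assumes Cassandra 1.x `nodetool repair` accepting a keyspace followed by CF names, plus `-pr`.

```python
# Hypothetical: which column families in keyspace "stats" ever see deletes.
cf_has_deletes = {
    "raw_5min": True,       # rows deleted after being rolled up
    "rollup_hourly": False,
    "rolling_48h": True,
}

def targeted_repair_command(host, keyspace, cf_flags):
    """Build one nodetool repair command covering only CFs that receive deletes."""
    cfs = sorted(name for name, deletes in cf_flags.items() if deletes)
    if not cfs:
        return None  # nothing needs repairing
    return ["nodetool", "-h", host, "repair", "-pr", keyspace, *cfs]

print(targeted_repair_command("cass1.example.com", "stats", cf_has_deletes))
```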
h isn't replicated to all the nodes
for whatever reason, then the data can come back. Repair just
guarantees that all the nodes that should have gotten the tombstones got
them.
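A toy Python model of that failure mode: one replica misses the delete, the others purge their tombstone after gc_grace, and a later read that merges replicas brings the value back. It is a deliberate simplification (last-write-wins on integer timestamps, no hints or read-repair timing), not how Cassandra is implemented.

```python
TOMBSTONE = object()  # marker for a deleted column

def merge(versions):
    """Last-write-wins merge of (timestamp, value) pairs from several replicas."""
    present = [v for v in versions if v is not None]
    if not present:
        return None
    _ts, value = max(present, key=lambda tv: tv[0])
    return None if value is TOMBSTONE else value

# Three replicas each hold "x", written at t=1.
replicas = [{"col": (1, "x")} for _ in range(3)]

# The delete at t=2 reaches replicas 0 and 1 only; replica 2 was down.
for r in replicas[:2]:
    r["col"] = (2, TOMBSTONE)
print(merge([r.get("col") for r in replicas]))  # None: the tombstone wins

# gc_grace passes without a repair, and compaction purges the tombstones.
for r in replicas[:2]:
    del r["col"]
print(merge([r.get("col") for r in replicas]))  # x: the deleted value is back
```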
eparating my read-heavy from write-heavy CFs, because generally
speaking they benefit from different compaction methods. But don't go
crazy creating thousands of CFs either.
Hope that gives you some ideas to investigate further!
> leveled compaction will kill your performance. get patch from jira for
> > maximum sstable size per CF and force cassandra to make smaller tables,
> they
> > expire faster.
> >
>
>1) run a major compaction
>> >2) code up sstablesplit
>> >3) profit!
>> >
>> >This method incurs a management penalty if not automated, but is
>> >otherwise the most efficient way to deal with tombstones and obsolete
>> >data.. :D
>> >
>> sort of EOL issues you're referring to. Unfortunately previous
>> requests on this list for such a statement have gone unanswered.
>>
>> The non-official response is that various people run in production
>> with Java 7 and it seems to work. :
a Aleixo
>> Bachelor of Computer Science, UFG
>> Master's student in Computer Science, UFG
>> Programmer at LUPA
>>
ave two questions here:
> 1) What is the timestamp column used for?
> 2) How can I retrieve this timestamp column using Hector client?
>
> Thanks in advance!
>
>
> Renato M.
hat is this
> specifically for?
> Thanks again for the help!
>
>
> Renato M.
>
> 2013/1/15 Aaron Turner :
>> I don't think so. Usually you'd use either a Time-UUID or something
>> like epoch time as the column name to get a range of columns by time
>
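Because columns within a row are kept sorted by name, using epoch seconds (or a TimeUUID) as the column name turns "give me this time window" into one contiguous slice. A Python sketch of the idea, with a sorted list standing in for a row; with a real client you would pass the start and end names to a slice query instead. The row contents are made up.

```python
import bisect

# A row whose column names are epoch seconds (sorted, as Cassandra keeps them).
row = [(1358208000, 5), (1358208300, 7), (1358208600, 6), (1358208900, 9)]
names = [name for name, _ in row]

def column_slice(start, finish):
    """Columns with start <= name <= finish, like a slice over sorted column names."""
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, finish)
    return row[lo:hi]

print(column_slice(1358208300, 1358208600))  # [(1358208300, 7), (1358208600, 6)]
```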
ere
> http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/
>
> Thanks,
> Matt
>
>
ained.
at character '+'
Column names are Longs, hence the INT = INT + INT.
Ideas?
line 1:53 no viable alternative at character '+'
On Tue, Jul 12, 2011 at 5:35 PM, Jonathan Ellis wrote:
> Try quoting the column name.
>
> On Tue, Jul 12, 2011 at 5:30 PM, Aaron Turner wrote:
>> Using Cassandra 0.8.1 and cql 1.0.3 and following the syntax mentioned
>
ve at character '+'
Frankly, I'm about ready to open a ticket against 0.8.1 saying
CQL/Counter support does not work at all.
Or is there a trick that isn't documented in the ticket? I tried
reading the Java code referred to in ticket #2473, but I'm in over my
head.
On Tue, Jul
EY =
> '1_20110728_ifoutmulticastpkts';
> cqlsh>
> _
> [default@test] list counts;
> Using default limit of 100
> ---
> RowKey: 1_20110728_ifoutmulticastpkts
> => (counter=12, value=16)
> => (counter=1310367600,
> yes, but with regular columns, retry is OK, while counter is not.
I've heard that fixing this issue is "hard", which I've taken
to mean "don't expect a fix anytime soon". Is that accurate?
I'm beginning to have second thoughts that Cassandra is
on the client and try
again later, but that's not what timeout means. Without any means to
recover, I've actually lost a lot of the reliability I currently have
with my single PostgreSQL-server-backed data store.
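The underlying difference is idempotence. In this toy Python model a "timeout" is a write that was applied but whose acknowledgement was lost: retrying a regular column write is harmless, while retrying a counter increment double-counts. The `Store` class and its failure behaviour are illustrative assumptions, not Cassandra's actual failure path.

```python
class TimedOut(Exception):
    pass

class Store:
    """Toy replica: a write is applied, then (once) its ack is 'lost'."""
    def __init__(self):
        self.columns = {}
        self.counters = {}
        self.fail_next = True

    def _maybe_time_out(self):
        if self.fail_next:
            self.fail_next = False
            raise TimedOut()

    def set_column(self, name, value):
        self.columns[name] = value   # applied...
        self._maybe_time_out()       # ...but the client sees a timeout

    def add_counter(self, name, delta):
        self.counters[name] = self.counters.get(name, 0) + delta
        self._maybe_time_out()

def with_retry(op, *args):
    try:
        op(*args)
    except TimedOut:
        op(*args)  # naive retry: only correct if op is idempotent

s = Store()
with_retry(s.set_column, "ifInOctets", 16)
s.fail_next = True
with_retry(s.add_counter, "ifInOctets", 16)
print(s.columns["ifInOctets"], s.counters["ifInOctets"])  # 16 32
```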
Right now I'm trying to come up with a way that my distributed snmp
po
On Mon, Jul 25, 2011 at 11:24 AM, Sylvain Lebresne wrote:
> On Mon, Jul 25, 2011 at 7:35 PM, Aaron Turner wrote:
>> On Sun, Jul 24, 2011 at 3:36 PM, aaron morton
>> wrote:
>>> What's your use case ? There are people out there having good times with
use SuperColumns since
Cassandra has to read all the supercolumns anyway, so storing as JSON
requires less overhead.
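Storing what would have been a super column's subcolumns as one JSON-encoded column value might look like this in Python. The field names are hypothetical; the trade-off is that you always read and rewrite the whole blob, which is acceptable here because the super column would be read in full anyway.

```python
import json

def encode_subcolumns(subcolumns: dict) -> bytes:
    """Serialize what would have been super column subcolumns into one value."""
    return json.dumps(subcolumns, sort_keys=True, separators=(",", ":")).encode("utf-8")

def decode_subcolumns(value: bytes) -> dict:
    return json.loads(value.decode("utf-8"))

# Hypothetical: one regular column per interface instead of one super column.
value = encode_subcolumns({"ifInOctets": 1234, "ifOutOctets": 5678, "ifSpeed": 1000000000})
print(value)  # b'{"ifInOctets":1234,"ifOutOctets":5678,"ifSpeed":1000000000}'
```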
you have a personal blog and want us to include the link, let us know.
> Feedback is always welcome.
> Thanks!
> Hector Team.
>
Seems fine now.
2011/10/13 Patricio Echagüe :
> Hi Aaron. does it still happen ? We didn't set up any password on the page.
>
> On Tue, Oct 11, 2011 at 9:15 AM, Aaron Turner wrote:
>>
>> Just a FYI:
>>
>> http://hector-client.org is requesting a username/p
(since 2^16/8 = 8K)
Alternatively, you could store 16K columns per row (each column is a
/24) and each column would have 8 bytes. Off the top of my head I'm
not sure which would be faster, but the first solution would be more
disk space efficient. If you need to update your bitmasks regul
guration. I am learning on
> the job so to speak.
>
> Thank you kindly for any comments or pointers.
>
> # Cassandra store properties
> # keyspace=
> # name=
> # class=
> # qualifier=
> # family=
> # type=
> # cluster=
> # host=
>
> --
> Lewis
1. Basic SQL-like summary transforms (SUM, AVG, MIN, MAX) for both CQL and Thrift API clients
2. Native 64-bit unsigned datatype
3. Add support for matching column names via LIKE (% and _ wildcards) for ascii type
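Until something like this exists server-side, the same effect can be had client-side on a fetched column slice. A Python sketch, assuming ascii column names and numeric values; `like_to_regex` is a hypothetical helper giving `%` and `_` their SQL LIKE meanings, and the unsigned decode assumes values were written as 8-byte big-endian integers.

```python
import re
import struct

def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern (% = any run, _ = one char) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def summarize(columns: dict, pattern: str) -> dict:
    """SUM/AVG/MIN/MAX over the values of columns whose names match pattern."""
    rx = like_to_regex(pattern)
    values = [v for name, v in columns.items() if rx.fullmatch(name)]
    if not values:
        return {}
    return {"SUM": sum(values), "AVG": sum(values) / len(values),
            "MIN": min(values), "MAX": max(values)}

def decode_u64(raw: bytes) -> int:
    """Read an 8-byte big-endian value as an unsigned 64-bit integer."""
    return struct.unpack(">Q", raw)[0]

cols = {"eth0_in": 10, "eth1_in": 30, "eth0_out": 5}
print(summarize(cols, "%in"))  # {'SUM': 40, 'AVG': 20.0, 'MIN': 10, 'MAX': 30}
```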
ed to leave such utilities external. At its core was "get
> and put".
> Did I miss something in my reading of intent?
> -Sarah
>
> -Original Message-
> From: Aaron Turner [mailto:synfina...@gmail.com]
> Sent: Sunday, November 06, 2011 8:25 AM
> To: user@cass
Haven't found this in the docs yet, but is the TTL the number of
seconds in the future to expire? Unix epoch time to expire?
Something else?
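For what it's worth, a Cassandra TTL is a relative number of seconds counted from when the column is written, not an absolute epoch time. A Python sketch of the resulting arithmetic; the write time is just an example value.

```python
from datetime import datetime, timedelta, timezone

def expiry_time(written_at: datetime, ttl_seconds: int) -> datetime:
    """TTL is relative: the column expires ttl_seconds after it is written."""
    return written_at + timedelta(seconds=ttl_seconds)

def ttl_for_deadline(now: datetime, deadline: datetime) -> int:
    """Convert an absolute expiry time into the relative TTL to send with a write."""
    return max(1, int((deadline - now).total_seconds()))  # keep the TTL positive

written = datetime(2011, 11, 10, 12, 0, tzinfo=timezone.utc)
print(expiry_time(written, 48 * 3600))  # 2011-11-12 12:00:00+00:00
```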
ct Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
't efficient for the server to do, but
the client could do that. I really don't care too much about
performance since this is a debugging/diagnostics tool.
1, 2011 at 10:29 AM, Aaron Turner wrote:
> Lately I've been working on some data processing code in Cassandra and
> apparently I don't write bug-free code the very first time. :) Hence,
> while debugging, I often need to look at data in Cassandra to see what
> my code is do
ry real
> and serious performance loss, I'm working on a strategy of moving forward.
>
> If the tombstones do cause such problem, where should I be looking for
> performance bottlenecks?
> Is it disk, CPU or something else? Thing is, I don't see anything
> outstanding in
he right way to go?
>>>>> That is, the requirement is for a large data store, that can move with
>>>>> product changes and requirements swiftly.
>>>>> Given that in Cassandra one thinks hard about the queries, and then
>>>>> builds a model
n
the Rails side, not Hector/Cassandra which has been pretty rock solid
so far in my testing).
I basically wrote my own custom ORM on top of Hector. It's not AR
compliant or anything like that and pretty application specific.
Mostly it just tries to simplify the Hector API.
On Wed, Dec 7, 2011 at 3:59 PM, Christof Bornhoevd wrote:
> Hi All,
>
> I'm using Cassandra 1.0.3. Can I have (simple) Columns and SuperColumns
> within the same row of a SuperColumnFamily?
Nope. Personally, I avoid super columns altogether.
readed performance of Cassandra isn't anything to write home about.
Anyway, I'm not sure I would recommend JRuby+Hector if this is the
only reason you'd use JRuby over MRI, but if you might find the
plethora of Java libraries useful it's definitely worth looking into.
ng what parameters I
> could tweak to improve the performance.
>
Is your client multi-threaded? The single-threaded performance of
Cassandra isn't at all impressive; it really is designed for
dealing with a lot of simultaneous requests.
one tombstone for the row delete, rather
than 288 (one for each column deleted).
I don't use compression.
On Wed, May 2, 2012 at 8:22 AM, Tim Wintle wrote:
> On Tue, 2012-05-01 at 11:00 -0700, Aaron Turner wrote:
>> Tens or a few hundred MB per row seems reasonable. You could do
>> thousands/MB if you wanted to, but that can make things harder to
>> manage.
>
> thanks (Bot
he performance? Thanks!
Have you tried using more threads on the client side? Generally
speaking, when I need faster read/write performance I look for ways to
parallelize my requests and it scales pretty much linearly.
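A minimal Python sketch of parallelizing requests with a thread pool. `fetch_row` is a hypothetical stand-in that sleeps to mimic network latency; with a real client each worker would issue its own read, ideally over its own pooled connection.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_row(key: str) -> str:
    """Hypothetical stand-in for a single Cassandra read (~50ms of latency)."""
    time.sleep(0.05)
    return f"value-for-{key}"

def fetch_all(keys, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_row, keys))  # results keep the order of keys

keys = [f"user{i}" for i in range(32)]
for workers in (1, 8):
    start = time.perf_counter()
    fetch_all(keys, workers)
    print(f"{workers} worker(s): {time.perf_counter() - start:.2f}s")
```

Against a real cluster the near-linear speed-up holds only until the client, the network or the nodes saturate.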
for a given user/stat combination.
If I need to get multiple stats per user, I just use more threads on
the client side. I'm not using composite row keys (it's just
AsciiType), as that can lead to hotspots on disk. My timestamps are
also just plain Unix epochs, as that takes less space
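To put numbers on the space argument: a LongType column name holding epoch seconds is 8 bytes, while a TimeUUID is 16. A Python sketch of that layout with a plain ASCII row key per user/stat pair; the `user:stat` key format is a hypothetical example.

```python
import struct
import time
import uuid

def row_key(user: str, stat: str) -> bytes:
    """Plain ASCII row key, one row per user/stat combination (format is hypothetical)."""
    return f"{user}:{stat}".encode("ascii")

def epoch_column_name(ts: int) -> bytes:
    """Epoch seconds packed as an 8-byte big-endian signed long (LongType)."""
    return struct.pack(">q", ts)

epoch_name = epoch_column_name(int(time.time()))
timeuuid_name = uuid.uuid1().bytes  # a version-1 (time-based) UUID

print(row_key("alice", "logins"), len(epoch_name), len(timeuuid_name))  # b'alice:logins' 8 16
```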
debug this further to see what is causing this?
t to write a recipe. Several
> people added content to the first edition and it would be great to see
> that type of participation again.
>
> Thank you,
> Edward
on Hector/pycassa/etc. Of course, you still need to
write code around it, and if that's Java I'm not sure how much it
matters.
you care
about the facility and priority, then you'll need to somehow encode
that in the row/column name. Otherwise you'll have to filter out
records post-query. So for read performance, chances are you'll have
to insert the information multiple times depending on your search
par
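A Python sketch of writing one syslog record under several row keys so that each query pattern becomes a single-row read. The key formats, the one-row-per-day bucketing and the dict standing in for a CF are hypothetical; the point is only that each extra query dimension costs another copy of the write.

```python
from collections import defaultdict

# Hypothetical CF: row key -> {column name (epoch seconds): message}.
syslog_cf = defaultdict(dict)

def time_bucket(ts):
    return ts // 86400  # one row per day (an assumption)

def index_keys(host, facility, priority, day):
    """Every row key a record is written under, one per supported query."""
    return [
        f"{host}:{day}",
        f"{host}:{facility}:{day}",
        f"{host}:{facility}:{priority}:{day}",
    ]

def write_record(host, facility, priority, ts, message):
    day = time_bucket(ts)
    for key in index_keys(host, facility, priority, day):
        syslog_cf[key][ts] = message

write_record("web1", "auth", "err", 1338508800, "sshd: failed password")
print(sorted(syslog_cf))  # ['web1:15492', 'web1:auth:15492', 'web1:auth:err:15492']
```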
l it work? Possibly. What are the disadvantages? Well
it depends on a bunch of things you haven't told us.
build a 4TB disk
array, doesn't mean you can have a single Cassandra node with 4TB of
data. Typically, people around here seem to recommend ~400GB, but
that depends on hardware.
Honestly, for the price of a single computer you could test this
pretty easily. That's what I'd do.
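The back-of-the-envelope sizing behind that advice, as a Python sketch. The ~400GB-per-node figure is the rule of thumb quoted above; the 4TB data set and the replication factors are example inputs, and real sizing also needs headroom for compaction.

```python
import math

def nodes_needed(raw_data_gb: float, replication_factor: int,
                 per_node_gb: float = 400) -> int:
    """Nodes required so that every replica fits within per_node_gb of data."""
    return math.ceil(raw_data_gb * replication_factor / per_node_gb)

print(nodes_needed(4000, 3))  # 30
print(nodes_needed(4000, 1))  # 10
```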
ts are *cheap*. There's almost literally zero I/O
associated with a snapshot. Backing up all that data off the system
is a different story, but at least it's large sequential reads, which
are pretty well optimized.
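Snapshots are cheap because Cassandra takes them by hard-linking the existing, immutable sstable files into a snapshot directory, so no data is copied. The Python sketch below reproduces that mechanism in a temporary directory; the file and directory names are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    sstable = os.path.join(data_dir, "stats-Data.db")  # hypothetical sstable name
    with open(sstable, "wb") as f:
        f.write(b"\0" * 1024 * 1024)                    # 1MB of "data"

    snap_dir = os.path.join(data_dir, "snapshots", "before-upgrade")
    os.makedirs(snap_dir)
    snap_copy = os.path.join(snap_dir, "stats-Data.db")
    os.link(sstable, snap_copy)                         # hard link: no bytes copied

    same_file = os.stat(sstable).st_ino == os.stat(snap_copy).st_ino
    links = os.stat(sstable).st_nlink
    print(same_file, links)  # True 2
```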
t Thrift a binary protocol?
e new java driver is, but have not verified(I
> hope it is)
>
> Dean
>
> From: Aaron Turner mailto:synfina...@gmail.com>>
> Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" <
> user@cassandra.apache.org<mailto:user@cassandr
one socket is just
> as fast as 10/20…..I would love to know the truth/answer to that though.
>
> Later,
> Dean
Have you tried running your code in GDB to find which line is causing the
error? That would be what I'd do first.
Physical machines, unless you're running your cluster in the cloud (AWS/etc).
The reason is simple: look at how Cassandra scales and provides redundancy.
On Wed, Sep 11, 2013 at 4:40 PM, Shahab Yunus wrote:
> Thanks Aaron for the reply. Yes, VMs or the nodes will be in cloud if we
> don't go the physical route.
>
> " Look how Cassandra scales and provides redundancy. "
> But how does it differ for physical machines or VMs (in cloud.) Or after
> yo
xample). Cloud can work (Netflix
uses Cassandra on AWS), but your performance will be a lot more consistent
on physical hardware, and Cassandra, like all databases, likes lots of RAM
(although this can be offset somewhat with SSDs), which tends to be expensive
in the cloud.
ly
rollups. Perhaps there's even an open source project or two
implementing this sorta thing? I've found flewton
(https://github.com/flewton/flewton), which is possibly relevant, but
my Java skills are pretty non-existent so I'm having a hard time
figuring it out.
Thanks,
Aar
Note: it looks like the Perl API isn't being maintained well...
How's the Ruby API overall? Stable? Performance?
Thanks!
to 0.7.4 and ran scrub without
>>> > any error. Now 'list CF' in CLI does not return any data as followings:
>>> >
>>> > list User;
>>> > Using default limit of 100
>>> > Input length = 1
>>> >
>>> > I