Re: Upgrade to v3.11.3
Thanks a lot Anuj!

On Wed, Jan 16, 2019 at 4:56 PM Anuj Wadehra wrote:

> Hi Shalom,
>
> Just a suggestion: before upgrading to 3.11.3, make sure you are not
> impacted by any open critical defects, especially those related to RT
> which may cause data loss, e.g. 14861.
>
> Please find my responses below:
>
> 1. The upgrade process that I know of is from 2.0.14 to 2.1.x (higher
> than 2.1.9, I think) and then from 2.1.x to 3.x. Do I need to upgrade
> first to 3.0.x, or can I upgrade directly from 2.1.x to 3.11.3?
> Response: Yes, you can upgrade from 2.0.14 to the latest stable version
> of 2.1.x (2.1.9+ only) and then upgrade to 3.11.3.
>
> 2. Can I run upgradesstables on several nodes in parallel? Is it crucial
> to run it one node at a time?
> Response: Yes, you can run it in parallel.
>
> 3. When running upgradesstables on a node, does that node still serve
> writes and reads?
> Response: Yes.
>
> 4. Can I use OpenJDK 8 (instead of Oracle JDK) with C* 3.11.3?
> Response: We have not tried it, but it should be okay. See
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-13916
>
> 5. Is there a way to speed up the upgradesstables process (besides
> compaction_throughput)?
> Response: If clearing the pending compactions caused by rewriting
> sstables is a concern, you can also try increasing concurrent compactors.
>
> Disclaimer: The information provided in the above response is my personal
> opinion based on the best of my knowledge and experience. We do not take
> any responsibility and are not liable for any damage caused by actions
> taken based on the above information.
>
> Thanks
> Anuj
>
> On Wed, 16 Jan 2019 at 19:15, shalom sagges wrote:
> Hi All,
>
> I'm about to start a rolling upgrade process from version 2.0.14 to
> version 3.11.3. I have a few small questions:
>
> 1. The upgrade process that I know of is from 2.0.14 to 2.1.x (higher
> than 2.1.9, I think) and then from 2.1.x to 3.x. Do I need to upgrade
> first to 3.0.x, or can I upgrade directly from 2.1.x to 3.11.3?
> 2. Can I run upgradesstables on several nodes in parallel? Is it crucial
> to run it one node at a time?
> 3. When running upgradesstables on a node, does that node still serve
> writes and reads?
> 4. Can I use OpenJDK 8 (instead of Oracle JDK) with C* 3.11.3?
> 5. Is there a way to speed up the upgradesstables process (besides
> compaction_throughput)?
>
> Thanks!
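For anyone scripting the parallel upgradesstables step discussed above, here is a minimal sketch, assuming passwordless SSH to each node, nodetool on the remote PATH, and placeholder hostnames; it is an illustration, not a tested tool:

    #!/usr/bin/env python3
    # Hypothetical helper: kick off `nodetool upgradesstables` on several nodes at once.
    # Assumes passwordless SSH to each host and that nodetool is on the remote PATH.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    NODES = ["cass-node-01", "cass-node-02", "cass-node-03"]  # placeholder hostnames

    def upgrade_sstables(host):
        # Optionally unthrottle compaction first (0 = unlimited MB/s) to speed up the rewrite.
        subprocess.run(["ssh", host, "nodetool", "setcompactionthroughput", "0"])
        # Rewrite the sstables on this node to the current format; blocks until done.
        return subprocess.run(["ssh", host, "nodetool", "upgradesstables"]).returncode

    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        for host, rc in zip(NODES, pool.map(upgrade_sstables, NODES)):
            print(host, "ok" if rc == 0 else "exit code %d" % rc)

Since nodes keep serving reads and writes while upgradesstables runs (per the answers above), the practical limit on parallelism is how much extra compaction I/O the cluster can absorb.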
Partition key with 300K rows can it be queried and distributed using Spark
Hi,

Each partition key can hold up to 2 billion rows, although having such a huge data set under one partition key is an anti-pattern. In our case a partition holds only 300k rows, but when we query one particular key we get timeout exceptions. If I use Spark to fetch the 300k rows for a particular key, does that solve the timeout problem and distribute the data across the Spark nodes, or will it still throw timeout exceptions? Can you please help me with the best practice for retrieving the data for a key with 300k rows? Any help is highly appreciated.

Regards
Goutham.
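For reference, a minimal PySpark sketch of the kind of read being asked about, using the DataStax Spark Cassandra Connector; the contact point, keyspace, table, and column names are placeholders, and the connector package is assumed to be on the classpath (e.g. via --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.0):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = (SparkSession.builder
             .appName("read-one-wide-partition")
             .config("spark.cassandra.connection.host", "10.0.0.1")  # placeholder contact point
             .getOrCreate())

    # Load the hypothetical table and filter on the partition key;
    # the equality filter on the partition key is pushed down to Cassandra.
    df = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(keyspace="my_ks", table="my_table")    # hypothetical names
          .load()
          .filter(col("partition_id") == "some-key"))     # hypothetical partition key column

    print(df.count())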
Re: Partition key with 300K rows can it be queried and distributed using Spark
Not sure about Spark data distribution, but yes, Spark can be used to retrieve such data from Cassandra.

Regards,
Nitan
Cell: 510 449 9629
Re: Partition key with 300K rows can it be queried and distributed using Spark
The reason big rows are painful in Cassandra is that by default, we index it every 64kb. With 300k objects, it may or may not have a lot of those little index blocks/objects. How big is each row?

If you try to read it and it's very wide, you may see heap pressure / GC. If so, you could try changing the column index size from 64k to something larger (128k, 256k, etc) - small point reads will be more disk IO, but less heap pressure.
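For reference, the index granularity Jeff mentions is controlled per node in cassandra.yaml and takes effect after a restart; 256 below is only an example value, not a recommendation:

    # cassandra.yaml -- granularity (in KB) of the index of rows within a partition; default is 64
    column_index_size_in_kb: 256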
Re: Partition key with 300K rows can it be queried and distributed using Spark
Thanks Jeff, yes, we have 18 columns in total. But my question was: can Spark retrieve the data by partitioning the 300k rows across the Spark nodes?

--
Regards
Goutham Reddy
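On the distribution question: as far as I understand the connector, Spark partitions are derived from Cassandra token ranges, so all rows sharing one partition key land in a single Spark partition; spreading the follow-up work across executors takes an explicit repartition. A rough sketch continuing the PySpark read example earlier in this digest (the column name and parallelism value are made up):

    # df is the DataFrame from the earlier sketch, already filtered down to the
    # single wide partition key.
    spread = df.repartition(32)                             # arbitrary example parallelism
    # Subsequent transformations now run as up to 32 tasks instead of one.
    spread.groupBy("some_clustering_col").count().show()    # hypothetical column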