By the way - if you're going this route, see
https://github.com/datastax/spark-cassandra-connector

On Thu, Jul 16, 2015 at 2:40 PM Vetle Leinonen-Roeim <[email protected]>
wrote:

> You'll probably have to install it separately.
>
> On Thu, Jul 16, 2015 at 2:29 PM Jem Tucker <[email protected]> wrote:
>
>> Hi Vetle,
>>
>> As far as I am aware, IndexedRDD is persisted in the same way as regular
>> RDDs. Do you know whether Cassandra can be built into my application, or
>> does it have to be a standalone database that is installed separately?
>>
>> Thanks,
>>
>> Jem
>>
>> On Thu, Jul 16, 2015 at 12:59 PM Vetle Leinonen-Roeim <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> Not sure how IndexedRDD is persisted, but perhaps you're better off
>>> using a NoSQL database for lookups (for example Cassandra, with the
>>> Cassandra connector)? That should give you good lookup performance, but
>>> persisting a billion records sounds like something that will take some
>>> time in any case.
>>>
>>> Regards,
>>> Vetle
>>>
>>>
>>> On Thu, Jul 16, 2015 at 10:02 AM Jem Tucker <[email protected]>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have been using IndexedRDD as a large lookup table (1 billion records)
>>>> to join with small tables (1 million rows). The performance of IndexedRDD
>>>> is great until it has to be persisted to disk. Are there any alternatives
>>>> to IndexedRDD, or any changes to how I use it, that would improve
>>>> performance with big data volumes?
>>>>
>>>> Kindest Regards,
>>>>
>>>> Jem
>>>>
>>>
