You can certainly query over 4 TB of data with Spark.  However, you will
get an answer in minutes or hours, not in milliseconds or seconds.  OLTP
databases back web applications and typically return responses in
milliseconds.  Analytic databases operate on large data sets and return
responses in seconds, minutes, or hours.  For batch jobs over large data
sets, Spark can replace analytic databases such as Greenplum or Netezza.
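
For a rough sense of what that looks like in practice, here is a minimal
sketch against the Spark 1.x Scala API; the HDFS paths, the "events"
table, and the event_date column are hypothetical placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object BatchQuery {
      def main(args: Array[String]): Unit = {
        // Spark is only the compute engine; the data itself lives in
        // a storage system such as HDFS or S3.
        val sc = new SparkContext(new SparkConf().setAppName("batch-query"))
        val sqlContext = new SQLContext(sc)

        // Hypothetical multi-TB Parquet dataset on HDFS.
        val events = sqlContext.read.parquet("hdfs:///data/events")
        events.registerTempTable("events")

        // A typical analytic query: full scan plus aggregation.
        // Expect seconds to minutes on large input, not milliseconds.
        val daily = sqlContext.sql(
          "SELECT event_date, COUNT(*) AS n FROM events GROUP BY event_date")

        daily.write.parquet("hdfs:///output/daily_counts")
        sc.stop()
      }
    }

Even for a simple scan-and-aggregate job like this one, most of the time
goes into reading the data off disk across the cluster, which is why the
response time is measured in minutes rather than milliseconds.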



On Sat, Jul 11, 2015 at 8:53 AM, Roman Sokolov <ole...@gmail.com> wrote:

> Hello. I had the same question. What if I need to store 4-6 TB and run
> queries? I can't find any clue in the documentation.
> On 11.07.2015 at 03:28, "Mohammed Guller" <moham...@glassbeam.com> wrote:
>
>>  Hi Ravi,
>>
>> First, neither Spark nor Spark SQL is a database. Both are compute
>> engines, which need to be paired with a storage system. Second, they are
>> designed for processing large distributed datasets. If you have only
>> 100,000 records or even a million records, you don't need Spark. An RDBMS
>> will perform much better for that volume of data.
>>
>>
>>
>> Mohammed
>>
>>
>>
>> *From:* Ravisankar Mani [mailto:rrav...@gmail.com]
>> *Sent:* Friday, July 10, 2015 3:50 AM
>> *To:* user@spark.apache.org
>> *Subject:* Spark performance
>>
>>
>>
>> Hi everyone,
>>
>> I am planning to move from MS SQL Server to Spark. I am working with
>> around 50,000 to 100,000 (1 lakh) records.
>>
>> Spark performance is slow compared to MS SQL Server.
>>
>>
>>
>> Which is the better option (Spark or SQL Server) for storing and
>> retrieving around 50,000 to 100,000 records?
>>
>> regards,
>>
>> Ravi
>>
>>
>>
>

