On 7/12/23 02:44, lina wrote:
Dear all,
My computer only has 2 TB of data storage capacity, and I want 100 TB
to store and analyze data. I am thinking of adding 5 hard drives of
18 TB each and merging them into one volume. Or should I get a file
server? What is the best option for me, and what would the budget be?
Thanks so much for your advice, best, lina
On 7/12/23 04:48, lina wrote:
Currently I do not plan to keep the data; once the analysis is
finished, I can just remove it.
On 7/12/23 06:00, lina wrote:
I need to extract the data for downstream analysis. After that, the
data can be removed.
It is hard to provide recommendations without knowing your computer,
your network, your analysis, your quality metrics, or your budget.
I use ZFS. Given an x86_64/amd64 computer with Debian, sufficient HDD
bays, and sufficient HBA ports, yes, you could install 5 @ 18 TB HDDs
and merge them into one 90 TB ZFS pool. If your computer has 5 bays and
ports, this will be your lowest-cost solution, but it is unlikely to be
your "best" solution.
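The single-pool layout above would look roughly like this. This is a
minimal sketch assuming Debian with the zfsutils-linux package; the
pool name "tank" and the device names sdb through sdf are placeholders,
not recommendations. Note that a plain striped pool has no redundancy:
losing any one drive loses the whole pool.

```shell
# Create one ~90 TB striped pool from five 18 TB drives.
# Placeholder device names; prefer stable /dev/disk/by-id/ paths in practice.
sudo zpool create tank sdb sdc sdd sde sdf

# Verify the pool came up and all five drives are ONLINE.
zpool status tank
```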
ZFS likes memory; the more the better. (I use ECC memory.) For 90 TB,
I would consider filling all memory slots with the fastest and largest
modules that are supported.
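On Linux the ARC (ZFS's in-memory cache) is allowed to grow to roughly
half of RAM by default. If the analysis jobs need memory for
themselves, the ARC can be capped with a module parameter. A sketch,
assuming the OpenZFS kernel module on Debian; the 64 GiB figure is an
arbitrary example, not a recommendation:

```shell
# Cap the ARC at 64 GiB (value in bytes); applies at module load,
# so reboot or reload the zfs module afterwards.
echo 'options zfs zfs_arc_max=68719476736' | sudo tee /etc/modprobe.d/zfs.conf
```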
ZFS allows SSDs to be added as read cache devices (L2ARC) and/or
intent-log devices (SLOG) for synchronous writes. Done correctly,
either or both can improve performance at a fraction of the cost of
all-SSD storage.
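Adding such devices to an existing pool is a one-line operation each.
A sketch, assuming a pool named "tank" and placeholder NVMe device
names; note the log device only accelerates synchronous writes, and
mirroring it protects in-flight sync data:

```shell
# Add an SSD as L2ARC read cache (placeholder device name).
sudo zpool add tank cache nvme0n1

# Add a mirrored pair of SSDs as the intent log (SLOG) for sync writes.
sudo zpool add tank log mirror nvme1n1 nvme2n1
```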
If your analysis can make use of concurrent I/O, more drives of smaller
size each will improve performance. One or more external chassis may be
desirable:
6 @ 15 TB
9 @ 10 TB
10 @ 9 TB
15 @ 6 TB
18 @ 5 TB
30 @ 3 TB
45 @ 2 TB
90 @ 1 TB
And, smaller drives make RAID more feasible. E.g. 20 @ 6 TB arranged as
5 raidz1 virtual devices (vdevs) of 4 drives each would provide 90 TB
of storage, support 5 concurrent I/O operations, and tolerate 1 drive
failure per vdev, at an incremental cost of +33%. By comparison, 10 @
18 TB drives arranged as 5 mirror vdevs of 2 drives each would provide
90 TB of storage, support 5 concurrent I/O operations, and tolerate 1
drive failure per vdev, at an incremental cost of +100%. But, the
latter will resilver faster when you replace a failed drive (or a spare
activates).
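The capacity and overhead arithmetic above is easy to check. The
sketch below works through the 20 @ 6 TB raidz1 layout; the zpool
command in the comment uses placeholder device names:

```shell
# Layout sketch (placeholder /dev names; 20 x 6 TB as 5 raidz1 vdevs of 4):
#   zpool create tank \
#       raidz1 sdb sdc sdd sde   raidz1 sdf sdg sdh sdi \
#       raidz1 sdj sdk sdl sdm   raidz1 sdn sdo sdp sdq \
#       raidz1 sdr sds sdt sdu

# Each raidz1 vdev loses one drive's worth of capacity to parity.
drives=20 per_vdev=4 size_tb=6
vdevs=$((drives / per_vdev))
usable_tb=$((vdevs * (per_vdev - 1) * size_tb))
overhead_pct=$(( (drives - vdevs * (per_vdev - 1)) * 100 / (vdevs * (per_vdev - 1)) ))
echo "${usable_tb} TB usable, +${overhead_pct}% drive overhead"
```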
If your analysis can be partitioned across multiple threads and the
threads have independent memory and I/O patterns, putting the data onto
a file server (or NAS) would allow multiple computers to work together
and do the analysis in less time. You will want a fast connection
between the analysis computers and the storage server (e.g. 10+ Gbps
Ethernet), or alternatively a storage area network (SAN).
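If you go the file-server route with ZFS, exporting a dataset over NFS
is built in. A sketch, assuming a pool "tank", a dataset "data", and an
example client subnet; all names here are placeholders:

```shell
# On the storage server: create a dataset and share it over NFS,
# restricted to an example analysis subnet.
sudo zfs create tank/data
sudo zfs set sharenfs='rw=@192.168.1.0/24' tank/data

# On each analysis client (placeholder server name and mount point):
sudo mount -t nfs server:/tank/data /mnt/data
```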
David