> Details like OSS configuration, storage and network details would help

38 TByte Lustre filesystem, exported by 11 servers via Infiniband (64 OSTs)

> What happens when you write from a single client to a single OST? What figures 
> are you getting?

~280 MB/s. I see similar results when multiple processes (on the 
same node) write to a shared file.
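One thing worth ruling out for the flat shared-file numbers is the file's stripe count: if the shared file is striped over a single OST, every writer is bounded by that one OST's bandwidth. A quick check with lfs might look like the sketch below (the output path is the one from the IOR runs quoted further down; the stripe count of 4 is only an example, and exact option syntax can differ slightly on Lustre 1.6):

```shell
# Path taken from the IOR runs below.
F=/fastfs/gabriel/ss_64/km_ior.out

# How many OSTs is the shared file actually striped over?
# A stripe_count of 1 would cap all writers at one OST's bandwidth.
lfs getstripe "$F"

# Stripe files subsequently created in this directory across 4 OSTs
# (a count of -1 would use all available OSTs).
lfs setstripe -c 4 /fastfs/gabriel/ss_64
```

If the existing file shows a stripe count of 1, recreating it after restriping the directory would be the obvious next experiment.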

> What does your network look like?

I/O network: 4x InfiniBand with up to 1 GiB/s per link (192-port 
Voltaire ISR 9288 IB switch)
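To see whether the ~280 MB/s single-stream figure is limited by the client-to-OST path or by the OST backend itself, it can be cross-checked outside IOR with a plain dd run against a file pinned to one OST. A minimal sketch, assuming a hypothetical test directory on the same mount (oflag=direct bypasses the client page cache so the number reflects the network plus OST path, not memory):

```shell
# Hypothetical test directory on the Lustre mount; adjust to your setup.
TESTDIR=/fastfs/gabriel/ddtest
mkdir -p "$TESTDIR"

# Pin the test file to a single OST (stripe count 1) before writing it.
lfs setstripe -c 1 "$TESTDIR/onefile"

# Write 8 GiB in 32 MiB transfers with direct I/O, matching the
# IOR transfer size, so the two figures are comparable.
dd if=/dev/zero of="$TESTDIR/onefile" bs=32M count=256 oflag=direct
```

If dd reports roughly the same ~280 MB/s, the limit is below IOR (network or OST); if it is much higher, the shared-file locking path on the client side becomes the more likely suspect.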

Do let me know if I can provide more details.
Thanks,

On 02/10/2012 05:26 PM, Carlos Thomaz wrote:
> Hi Kshitij
>
> Difficult to point out without knowing your exact configuration. Details like 
> OSS configuration, storage, and network details would help.
>
> The best way to identify your bottleneck is to run other benchmarks to 
> understand your raw performance and then compare it to what you get on top of 
> Lustre. You could use xdd to analyze your OSS raw performance and 
> obdfilter-survey to understand how your Lustre backend is performing. However, 
> if your system is in production you won't be able to do that, since these 
> benchmarks are destructive.
>
> What happens when you write from a single client to a single OST? What figures 
> are you getting? What about reads?
>
> Send us more information so that we can try to help you figure out what's going on.
>
> Btw, what does your network look like?
>
> Rgds
> Carlos
>
> On Feb 10, 2012, at 4:28 PM, "Kshitij Mehta"<[email protected]>  wrote:
>
>> We have Lustre 1.6.7 configured with 64 OSTs.
>> I am testing performance using IOR, a file system benchmark.
>>
>> When I run IOR using MPI such that all processes write to a shared file,
>> performance does not scale. I tested with 1, 2, and 4 processes, and
>> performance remains constant at about 230 MB/s.
>>
>> When processes write to separate files, performance improves greatly,
>> reaching 475 MBps.
>>
>> Note that all processes are spawned on a single node.
>>
>> Here is the output:
>> Writing to a shared file:
>>
>>> Command line used: ./IOR -a POSIX -b 2g -e -t 32m -w -o
>>> /fastfs/gabriel/ss_64/km_ior.out
>>> Machine: Linux deimos102
>>>
>>> Summary:
>>>         api                = POSIX
>>>         test filename      = /fastfs/gabriel/ss_64/km_ior.out
>>>         access             = single-shared-file
>>>         ordering in a file = sequential offsets
>>>         ordering inter file= no tasks offsets
>>>         clients            = 4 (4 per node)
>>>         repetitions        = 1
>>>         xfersize           = 32 MiB
>>>         blocksize          = 2 GiB
>>>         aggregate filesize = 8 GiB
>>>
>>> Operation  Max (MiB)  Min (MiB)  Mean (MiB)   Std Dev  Max (OPs)  Min
>>> (OPs)  Mean (OPs)   Std Dev  Mean (s)
>>> ---------  ---------  ---------  ----------   -------  ---------
>>> ---------  ----------   -------  --------
>>> write         233.61     233.61      233.61      0.00       7.30
>>> 7.30        7.30      0.00  35.06771   EXCEL
>>>
>>> Max Write: 233.61 MiB/sec (244.95 MB/sec)
>> Writing to separate files:
>>
>>> Command line used: ./IOR -a POSIX -b 2g -e -t 32m -w -o
>>> /fastfs/gabriel/ss_64/km_ior.out -F
>>> Machine: Linux deimos102
>>>
>>> Summary:
>>>         api                = POSIX
>>>         test filename      = /fastfs/gabriel/ss_64/km_ior.out
>>>         access             = file-per-process
>>>         ordering in a file = sequential offsets
>>>         ordering inter file= no tasks offsets
>>>         clients            = 4 (4 per node)
>>>         repetitions        = 1
>>>         xfersize           = 32 MiB
>>>         blocksize          = 2 GiB
>>>         aggregate filesize = 8 GiB
>>>
>>> Operation  Max (MiB)  Min (MiB)  Mean (MiB)   Std Dev  Max (OPs)  Min
>>> (OPs)  Mean (OPs)   Std Dev  Mean (s)
>>> ---------  ---------  ---------  ----------   -------  ---------
>>> ---------  ----------   -------  --------
>>> write         475.95     475.95      475.95      0.00      14.87
>>> 14.87       14.87      0.00  17.21191   EXCEL
>>>
>>> Max Write: 475.95 MiB/sec (499.07 MB/sec)
>> I am trying to understand where the bottleneck is, when processes write
>> to a shared file.
>> Your help is appreciated.
>>
>> -- 
>> Kshitij Mehta
>> PhD candidate
>> Parallel Software Technologies Lab (pstl.cs.uh.edu)
>> Dept. of Computer Science
>> University of Houston
>> Houston, Texas, USA
>>

-- 
Kshitij Mehta
PhD Candidate
Parallel Software Technologies Lab (http://pstl.cs.uh.edu)
Dept. of Computer Science
University of Houston
Texas, USA

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
