SRCSGT | D | High Performance Global Parallel File System

FBO DAILY ISSUE OF MARCH 31, 2007 FBO #1951
SOURCES SOUGHT

D -- High Performance Global Parallel File System

Notice Date: 3/29/2007
Notice Type: Sources Sought
NAICS: 541511 — Custom Computer Programming Services
Contracting Office: Department of Energy, Lawrence Berkeley National Laboratory (DOE Contractor), Lawrence Berkeley, 1 Cyclotron Road MS: 937-200, Berkeley, CA, 94720, UNITED STATES
ZIP Code: 00000
Solicitation Number: FILESYSTEM2007
Response Due: 4/16/2007
Archive Date: 5/1/2007
Description: Lawrence Berkeley National Laboratory (LBNL) and NERSC, in support of the U.S. Department of Energy Office of Science, are seeking information on a high performance parallel global filesystem which will provide a unified namespace accessible in parallel from all NERSC production computing resources at near native cluster rates.

Current systems at NERSC include: 1) Cray XT4 system Linux/Catamount, 2) IBM Power 5 AIX cluster, 3) LNXI Opteron Infiniband Linux cluster, 4) SGI Altix IA64 NUMA Linux, 5) IBM Power 3 AIX cluster, and 6) IA32/x86_64 commodity Linux cluster. NERSC adds one or more systems a year, with a very large system arriving every three years. NERSC anticipates that all those systems will be openly competed and cannot limit possible future choices based on its global filesystem.

The access patterns at NERSC are varied, but many applications require highly concurrent I/O to large, single files. High aggregate rates of single streams are also needed, but are not the primary mechanism of use. I/O formats include MPI-IO, HDF5, NetCDF, and POSIX (both many processes writing to a single file and many processes writing to many files).

Please explain how the following requirements would be met and supported:

Functionality Requirements:

1. The filesystem must be deployable to all existing and future NERSC systems.
   1a. What kinds of systems are currently supported or planned to be supported (both hardware and OS)?
   1b. If direct support for these systems is not provided, explain how your filesystem can provide high speed, parallel access for them, what performance might be expected, and identify time frames of availability.
   1c. Explain how future NERSC systems of undefined architecture/OS would be supported.
2. The facility wide global file system must exist separate from all NERSC systems (i.e., it must be a stand-alone sub component of the NERSC Facility).
   2a. NERSC computational systems have different maintenance and upgrade schedules. The filesystem must be available (the entire name space accessible at expected performance) independent of the availability of any individual system. It must be possible to shut down any NERSC "client" system and have the file system remain available on the other systems.
   2b. The file system must be able to be administered and maintained independent of any individual client system.
3. The facility wide global file system must co-exist with local file systems on the NERSC systems. Each NERSC system usually has a set of file systems of various types (e.g., Lustre, GPFS, XFS, and NFS) that are local to that system and manage storage that is dedicated to that system. The global file system must not interfere with the operation of these local file systems. Note: the ultimate goal is that there be minimal local storage on any system, but how close NERSC is able to come to the ultimate goal is dependent on the performance and reliability of the facility wide global file system. Additionally, the existing systems already have local storage to varying degrees.
4. POSIX file system API and semantics:
   4a. NERSC users run a variety of codes that are written with the expectation of file system compliance with POSIX file system API and semantics.
   4b. If the filesystem provides extensions to POSIX, identify them and explain their purpose.
   4c. If the filesystem deviates from POSIX, identify how it does and what the implications are for applications.
   4d. Identify any relaxations of the POSIX standard that are allowed or required to improve performance. Are these site-configurable?
5. The global file system must be capable of being delivered via IP networks. Other, lower-level network protocols are required for specific systems.
6. Multiple independent instances of the filesystem must be accessible by the same client system.
7. The global file system must be expandable in both capacity and aggregate bandwidth. Describe the:
   7a. Ability to dynamically increase or reduce the size of an existing file system.
   7b. Ability to dynamically increase or reduce the number of files an existing file system can support (e.g., increase the number of i-nodes).
   7c. Ability to dynamically increase the I/O performance of an existing file system when additional hardware is added to the file system.
8. Parallel I/O.
   8a. Describe how you support parallel (concurrent) access to a single file from multiple client nodes.
   8b. Describe how you support parallel (concurrent) access to multiple files by multiple clients.
   8c. Describe how you support the ability to increase the single-file parallel I/O performance with the addition of new hardware.
   8d. Identify any extensions for MPI-IO that you provide.
9. No single point of failure. The global file system must be able to be configured with no single point of failure (hardware or software).
10. The global file system must be capable of running some subset of multiple versions of the file system software.
   10a. This is necessary in order to permit rolling upgrades so a simultaneous upgrade of all clients and/or servers is not required.
   10b. This is necessary in order to accommodate different system OS requirements and upgrade availability.
   10c. Explain what multi-version capabilities the filesystem is able to support and any limitations on the simultaneous use of multiple file system software versions.

Scalability and Performance Requirements:

11. File system must scale to support a large number of clients now and substantially more in the future.
   11a. NERSC now has approximately 10K client nodes, both single and multiple CPUs/cores, with approximately 30K CPUs/cores.
   11b. In the 2009 time frame, with the NERSC-6 system, NERSC is expected to grow to approximately 50K to 100K client nodes with multiple CPUs/cores per node.
   11c. Between now and 2009, the number of client nodes and CPUs is likely to grow substantially, but incrementally.
   11d. Explain what features your file system has which will allow it to operate at this scale and how you will support the necessary number of clients now and in the future.
   11e. Multi-petabyte (PB) filesystem capacity per filesystem instance. What is the current size limitation of the filesystem?
   11f. Explain any architectural limitations on the size to which a single instance of your file system can grow.
   11g. Indicate the maximum per file system capacities supported now and your roadmap for increased capacities.
12. File system must support individual files that are a significant fraction of the file system capacity. Explain any architectural limitations in the:
   12a. Number of individual files the filesystem can support. NERSC desires over a billion files/filesystem.
   12b. Number of files per directory that can be supported. NERSC desires over a billion/directory.
13. Filesystem must be capable of scalable aggregate streaming I/O performance.
   13a. NERSC currently needs 18 GB/s end user aggregate multi-node streaming I/O performance. This aggregate performance is across all file system instances.
   13b. In the 2009 time frame, with the NERSC-6 system, NERSC is expected to need approximately 60 GB/s aggregate streaming performance.
   13c. Between now and 2009, aggregate streaming performance will need to increase in relation to the growth in the number of client systems and nodes and their performance.
   13d. Explain how you would accomplish the necessary initial aggregate streaming I/O bandwidth and accommodate future bandwidth requirements.
   13e. Explain any architectural or current implementation limits on single file system and aggregate file system streaming I/O bandwidth.
14. Scalable meta-data operation rates (e.g., file creates). Explain what mechanisms your filesystem has to scale up the meta-data operation rates. Please address performance for meta-data operations on files in the same directory as well as different directories.
15. Please explain any hardware assumptions made in the above responses.
16. Please identify any required hardware or software that is needed or assumed for your file system.

Licensing and Support:

17. Describe how the filesystem is licensed and the costs for:
   17a. Individual CPU systems.
   17b. Multiple CPU systems.
   17c. Clusters of single and multiple CPU systems.
   17d. Clusters where only a few I/O nodes run the filesystem software and thousands of compute nodes forward I/O to the I/O nodes.
   17e. Is a site-wide license offered, and under what terms?
   17f. Is there a volume discount offered?
   17g. Is there a published price list? Where?
18. Describe the yearly support costs for the filesystem.
19. Describe licensing and support costs for the current NERSC systems:
   19a. Cray XT4 system Linux/Catamount: 9712 nodes total, 2 processors per node, 9672 compute nodes; 40 I/O nodes, and a 54 node test system.
   19b. IBM Power 5 AIX cluster: 122 nodes total, 8 processors per node, 111 compute nodes; 6 I/O nodes, and a 2 node test system.
   19c. LNXI Opteron Infiniband Linux cluster: 391 nodes, 2 processors per node, 356 compute nodes; 20 I/O nodes, and a 10 node test system.
   19d. SGI Altix IA64 NUMA Linux: 32 processors IA32/x86_64.
   19e. IBM Power 3 AIX cluster: 416 nodes total, 16 processors per node, 380 compute nodes; 20 I/O nodes, and a 4 node test system.
   19f. Commodity Linux cluster: 400 nodes total, 2 processors per node, 235 compute nodes; 102 I/O nodes, and a 10 node test system.
20. Describe what price and terms & conditions protection there would be for NERSC's future systems.
21. Signify your acceptance of the University of California Lawrence Berkeley National Laboratory General Provisions without exception or change.

Responses are due no later than Tuesday, April 16, 2007. Responses should not exceed 40 pages. Responses, as well as questions about the RFI, should be submitted electronically (as an email attachment in Microsoft Word or Adobe Acrobat format) to Lynn Rippe at lerippe@lbl.gov. Non-electronic submissions will not be considered. Based on responses to this RFI, NERSC may invite selected respondents to discuss their plans in more detail.
Place of Performance: Address: University of California, Ernest Orlando Lawrence Berkeley National Laboratory, 415 Thomas Berkley Way, Oakland, CA; Zip Code: 94612; Country: UNITED STATES
Record: SN01261866-W 20070331/070329220459 (fbodaily.com)
Source: FedBizOpps
