AWARD
99 -- BIBFRAME Scriptshifter Utility Enhancement
- Notice Date
- 6/17/2025 10:26:56 AM
- Notice Type
- Award Notice
- Contracting Office
- CONTRACTS SERVICES Washington DC 20540 USA
- ZIP Code
- 20540
- Solicitation Number
- 2025-LGC-0006
- Archive Date
- 06/28/2025
- Point of Contact
- Betsy Lewis-Matsuoka, Phone: 202-707-0170
- E-Mail Address
- bmatsuoka@loc.gov
- Small Business Set-Aside
- NONE No Set aside used
- Award Number
- LCLGD25P0012
- Award Date
- 06/13/2025
- Awardee
- Stefano Cossu Philadelphia PA 19143 USA
- Award Amount
- $112,000.00
- Description
- The Library of Congress (LC) collects information resources from around the world and makes them known and available to potential users via bibliographic records that describe the resources. The bibliographic descriptions contain typical bibliographic data such as titles, names of creators and other associated entities, publication information such as publisher and place of publication, and other information. In the publications themselves, this information appears in various scripts, including many non-Latin scripts such as Chinese, Korean, Cyrillic, Arabic, Greek, Hebrew, and over 30 additional scripts. The Library of Congress records descriptive information in the original script of the item being cataloged when technology allows, but it needs to transliterate certain data elements of the description into the Latin script for various processing components and, in some cases, to assist end users. When technology does not support a non-Latin script, LC staff must manually transliterate more of the descriptive information into the Latin script. The Library of Congress maintains transliteration tables for over 75 languages and scripts, the ALA-LC Romanization Tables: Transliteration Schemes for Non-Roman Scripts, and makes them available on its website: http://www.loc.gov/catdir/cpso/roman.html. These tables are jointly maintained by LC and the American Library Association (ALA) and are used by libraries in the United States and by many libraries outside it. The Library of Congress catalog contains several million bibliographic records for resources in non-Latin scripts, and the Library collects over 75,000 additional non-Latin resources each year. The Library of Congress therefore requires a utility that can transliterate between non-Latin and Latin scripts using the transliteration tables approved by LC and ALA.
A utility called Scriptshifter has been developed to transliterate 20+ scripts in the Balkan/Caucasian, Slavic, Turkic, and Chinese script families, as well as Korean, Greek, Arabic, and Hebrew. Scriptshifter needs continual enhancement to incorporate additional scripts and to improve its handling of very complex scripts in which the Library receives resources, such as Arabic, the Southeast Asian scripts, and several other Asian scripts such as Japanese. The software framework also needs continual updating to improve efficiency as technical possibilities and the transliteration tables change. The Contractor shall design, code, test, and document additions to Scriptshifter's capability to transliterate non-Latin data into the Latin alphabet according to the ALA-LC Romanization Tables and, where possible, to convert Latin-script transliterations back into non-Latin script. The work will focus on research and improvement for Indic and related scripts such as Devanagari and Brahmi; Southeast Asian scripts such as Thai, Laotian, Khmer, Burmese, and Tibetan; and Arabic-script languages such as Kurdish, Sindhi, Persian, Pushto, Urdu, and Mophah. In addition, the Contractor will research developing a Japanese transliteration tool and will refine the handling of other Asian scripts such as Korean and Chinese.
More specifically, the contractor shall:
  - Improve Persian transliteration.
  - Improve Thai (Roman-to-script only).
  - Improve transliteration of South Asian languages that the current software handles inaccurately.
  - Implement Lao.
  - Improve Tibetan: fix the tables in the current software or create a new table managed by Scriptshifter (SS).
  - Improve overall testing.
  - Research existing tools for transliterating Japanese and, if an appropriate tool is identified, implement script-to-Roman (S2R) transliteration only.
  - Research a better machine learning model for Arabic scripts, possibly usable with multiple languages.
  - Implement a better method to separate words.
  - Research possible solutions for other Arabic-script languages with less available data (Pushto, Urdu, etc.).
The utility must also carry out reverse transliteration, converting Latin transliterated strings into non-Latin strings, where feasible. The utility must remain adjustable as transliterations change. The contractor will review the software utility as a whole and make general improvements to the framework. One area of focus will be the Aksharamukha tool that has been incorporated for Asian scripts. Work will also address authentication and external use of the tool.
- Web Link
- SAM.gov Permalink (https://sam.gov/opp/50d58547968d48db86c5ee98a36ddbe7/view)
- Place of Performance
- Address: Washington, DC, USA
- Country: USA
- Record
- SN07479998-F 20250619/250617230045 (samdaily.us)
- Source
- SAM.gov Link to This Notice (may not be valid after Archive Date)