OLAC Record: Articulation Index

OLAC Record
oai:www.ldc.upenn.edu:LDC2005S22

Metadata

Title: Articulation Index

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Wright, Jonathan. Articulation Index LDC2005S22. Web Download. Philadelphia: Linguistic Data Consortium, 2005

Contributor: Wright, Jonathan

Date (W3CDTF): 2005

Date Issued (W3CDTF): 2005-09-15

Description: *Introduction* Articulation Index was developed by the Linguistic Data Consortium (LDC) and contains 34 hours of prompted and conversational English speech. The corpus was partly inspired by the work of Harvey Fletcher, who performed a number of perceptual experiments involving English syllables during the first half of the 20th century. His term "articulation index" meant something like a perceptual index of syllables (which were not necessarily words) and reflected how well listeners could correctly identify syllables in the presence of noise. This corpus was created to facilitate similar experiments and, potentially, new methods in speech recognition research. The basic concept was to record speakers pronouncing English syllables, some of which might be real words but most of which are nonsense syllables. The goal was to have each speaker say a set of 2,000 syllables common to all speakers, as well as a set of 20 syllables unique to that speaker. LDC has also released Articulation Index LSCP (LDC2015S12), which adds time alignment and different formats to a subset of this corpus.
*Data* This release contains recordings of 20 American English speakers (12 male, 8 female) saying 2,005 common syllables, 1,845 of which were spoken by all speakers, and 400 unique syllables (20 per speaker). Participants were prompted with an automatically generated sentence containing the target syllable, followed by an isolated pronunciation of that syllable. The data contains separate files for the whole recorded phrases and for the isolated syllables. The corpus also contains short conversations between participants. The extent of each folder is:
* Phrases: 22.8 hours
* Syllables: 10.2 hours
* Conversation: 1.2 hours
The recordings were made in a small, sound-treated anechoic room at LDC. The speakers wore two microphones: a Sennheiser 410 headset and a Nortel Liberator wireless phone headset. The Sennheiser's signal passed through a Symetrix 302 Dual Microphone Preamp, a Sony PCM-R300 DAT deck, and a Townshend Datlink to a Sun SPARCserver 20, where it was written to disk as 16 kHz, 16-bit PCM data. The Nortel's signal was transmitted to a wireless base station at a telephone that was connected via the network to LDC's telephone recording platform, where it was captured to disk as 8 kHz, 8-bit μ-law data. The speakers were prompted via a computer interface that displayed one prompt at a time; they advanced through the prompts by pressing a "next" button. Each recording session lasted approximately 15 minutes.
*Samples* For an example of the data in this corpus, please listen to this sample (WAV).
*Updates* None at this time.
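The telephone channel is stored as 8-bit μ-law rather than linear PCM, so it usually has to be expanded before it can be compared with the 16-bit Sennheiser channel. The following is a minimal sketch of the standard ITU-T G.711 μ-law expansion, assuming the input is a plain stream of G.711 μ-law bytes with any file header already stripped (this record does not describe the file layout). The function names are hypothetical and not part of any LDC tooling.

```python
def mulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a signed 16-bit-range linear sample."""
    BIAS = 0x84                 # 132, the bias added during G.711 mu-law encoding
    u = ~code & 0xFF            # mu-law codes are stored bit-inverted
    sign = u & 0x80             # top bit set means a negative sample
    exponent = (u >> 4) & 0x07  # 3-bit segment number
    mantissa = u & 0x0F         # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def decode_mulaw_bytes(data: bytes) -> list[int]:
    """Decode a headerless stream of mu-law bytes into linear samples (assumed layout)."""
    return [mulaw_to_linear(b) for b in data]


if __name__ == "__main__":
    # 0xFF and 0x7F are the two silence codes; 0x80 and 0x00 are the extremes.
    print(decode_mulaw_bytes(bytes([0xFF, 0x7F, 0x80, 0x00])))
```

The expanded samples fall in roughly ±32,124, so they can be placed in the same 16-bit range as the PCM channel, although the two channels still differ in sampling rate (8 kHz vs 16 kHz).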

Format: Sampling Rate: 16000, 8000

Sampling Format: pcm
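The two sampling configurations imply quite different raw data rates. A small sketch of the arithmetic follows, assuming uncompressed mono audio and ignoring file headers. Note that the description gives the 8 kHz telephone channel as 8-bit μ-law, even though only "pcm" is listed here. The helper names are hypothetical.

```python
def raw_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed audio payload rate in bytes per second, ignoring any file headers."""
    return sample_rate_hz * bits_per_sample * channels // 8


def raw_megabytes_per_hour(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Payload per hour of audio in decimal megabytes (10**6 bytes)."""
    return raw_bytes_per_second(sample_rate_hz, bits_per_sample, channels) * 3600 / 1e6


if __name__ == "__main__":
    print(raw_megabytes_per_hour(16000, 16))  # Sennheiser channel: 16 kHz, 16-bit PCM
    print(raw_megabytes_per_hour(8000, 8))    # telephone channel: 8 kHz, 8-bit mu-law
```

This gives 32,000 bytes/s (115.2 MB per hour) for the 16 kHz channel and 8,000 bytes/s (28.8 MB per hour) for the 8 kHz channel. Actual file sizes in the distribution may differ because of headers or any compression applied.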

Identifier: LDC2005S22

https://catalog.ldc.upenn.edu/LDC2005S22

ISBN: 1-58563-346-1

ISLRN: 513-688-150-766-0

DOI: 10.35111/qmyb-6884

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2005S22

Rights Holder: © 2005 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2005S22

DateStamp: 2022-01-20

GetRecord: OAI-PMH request for simple DC format
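The GetRecord links above are OAI-PMH requests. In the OAI-PMH protocol, a GetRecord request takes the parameters verb, identifier, and metadataPrefix. The following is a minimal sketch that builds such request URLs for this record. The base URL is a hypothetical placeholder, not the real OLAC endpoint, and "olac" is assumed to be the metadata prefix for the OLAC format ("oai_dc" is the simple Dublin Core format that OAI-PMH requires every repository to support).

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real OAI-PMH base URL of the provider.
BASE_URL = "https://example.org/oai"
RECORD_ID = "oai:www.ldc.upenn.edu:LDC2005S22"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL (verb, identifier, metadataPrefix)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # "olac" (assumed) for the OLAC format, "oai_dc" for simple Dublin Core.
    for prefix in ("olac", "oai_dc"):
        print(get_record_url(BASE_URL, RECORD_ID, prefix))
```

urlencode percent-encodes the colons in the OAI identifier, which keeps the query string valid.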

Search Info
Citation: Wright, Jonathan. 2005. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2005S22
Up-to-date as of: Wed Feb 26 18:31:22 EST 2025