OLAC Record: Articulation Index LSCP

OLAC Record
oai:www.ldc.upenn.edu:LDC2015S12

Metadata

Title: Articulation Index LSCP

Access Rights: Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining

Bibliographic Citation: Schatz, Thomas, et al. Articulation Index LSCP LDC2015S12. Web Download. Philadelphia: Linguistic Data Consortium, 2015

Contributor: Schatz, Thomas

Cao, Xuan-Nga

Kolesnikova, Anna

Bergvelt, Tomas

Wright, Jonathan

Dupoux, Emmanuel

Date (W3CDTF): 2015

Date Issued (W3CDTF): 2015-11-16

Description: *Introduction*

Articulation Index LSCP was developed by researchers at Laboratoire de Sciences Cognitives et Psycholinguistique (LSCP), Ecole Normale Supérieure. It revises and enhances a subset of Articulation Index (AIC) (LDC2005S22), a corpus of persons speaking English syllables. Changes include the addition of forced alignment to sound files, time alignment of syllable utterances and format conversions.

AIC consists of 20 American English speakers (12 males, 8 females) pronouncing syllables, some of which form actual words, but most of which are nonsense syllables. All possible Consonant-Vowel (CV) and Vowel-Consonant (VC) combinations were recorded for each speaker twice, once in isolation and once within a carrier sentence, for a total of 25768 recorded syllables.

*Data*

Articulation Index LSCP alters AIC in the following ways.

* Time alignments for the onset and offset of each word and syllable were generated through forced alignment with a standard HMM-GMM (Hidden Markov Model-Gaussian Mixture Model) ASR system.
* The time alignments for the beginning and end of the syllables (whether in isolation or within a carrier sentence) were manually adjusted. The time alignments for the other words in carrier sentences were not manually adjusted.
* The recordings of isolated syllables were cut according to the manual time alignments to remove the silent portions at the beginning and end, and the time alignments were altered to correspond to the cut recordings.
* The file naming scheme was slightly altered for compatibility with the Kaldi speech recognition toolkit.
* AIC contains a wide-band (16 kHz, 16-bit PCM) and a narrow-band (8 kHz, 8-bit u-law) version of the recordings, distributed in SPHERE format. The LSCP version contains the wide-band version only, distributed as wave files.

This release does not include certain AIC triphone recordings (CVC, CCV or VCC).

Audio data is presented as 16 kHz, 16-bit FLAC-compressed .wav files. The FLAC compression was added for distribution, and documentation may refer to the files as .wav files.

*Samples*

Please listen to this audio sample.

*Updates*

None at this time.
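The cutting of isolated syllables described in the Data notes can be sketched as follows. This is a minimal illustration, not the procedure the LSCP team used: the Alignment class, the trim_to_alignment function, and the representation of audio as a plain sequence of samples are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical time alignment for one syllable, in seconds."""
    label: str
    onset: float
    offset: float


def trim_to_alignment(samples, sample_rate, syllable):
    """Cut a recording to the syllable span and re-base its alignment.

    Returns the cut samples and a new Alignment whose times are
    relative to the start of the cut recording.
    """
    # Convert the onset/offset times to sample indices.
    start = round(syllable.onset * sample_rate)
    end = round(syllable.offset * sample_rate)
    if not 0 <= start < end <= len(samples):
        raise ValueError("alignment falls outside the recording")

    # Drop the leading and trailing silence.
    cut = samples[start:end]

    # After cutting, the syllable starts at time zero.
    shifted = Alignment(syllable.label, 0.0, (end - start) / sample_rate)
    return cut, shifted
```

For a one-second recording at 16 kHz with a syllable aligned from 0.25 s to 0.75 s, this keeps the middle 8000 samples and yields an alignment running from 0.0 s to 0.5 s.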

Extent: Corpus size: 796224 KB

Format: Sampling Rate: 16000

Sampling Format: pcm

Identifier: LDC2015S12

https://catalog.ldc.upenn.edu/LDC2015S12

ISBN: 1-58563-735-1

ISLRN: 607-221-014-735-8

DOI: 10.35111/rz6a-gd14

Language: English

Language (ISO639): eng

License: LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf

Medium: Distribution: Web Download

Publisher: Linguistic Data Consortium

Publisher (URI): https://www.ldc.upenn.edu

Relation (URI): https://catalog.ldc.upenn.edu/docs/LDC2015S12

Rights Holder: Portions © 2015 Tomas Bergvelt, Anna Kolesnikova, Xuan-Nga Cao, Thomas Schatz, Emmanuel Dupoux, © 2005, 2015 Trustees of the University of Pennsylvania

Type (DCMI): Sound

Type (OLAC): primary_text

OLAC Info

Archive: The LDC Corpus Catalog

Description: http://www.language-archives.org/archive/www.ldc.upenn.edu

GetRecord: OAI-PMH request for OLAC format

GetRecord: Pre-generated XML file

OAI Info

OaiIdentifier: oai:www.ldc.upenn.edu:LDC2015S12

DateStamp: 2020-11-30

GetRecord: OAI-PMH request for simple DC format
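
The GetRecord entries above correspond to standard OAI-PMH requests. The sketch below shows how such a request URL is typically assembled. The base URL is a placeholder, not the archive's actual endpoint, and the get_record_url helper is hypothetical. The prefix "oai_dc" is the OAI-PMH name for simple Dublin Core; "olac" is assumed here to be the prefix for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint: substitute the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL for one record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Simple Dublin Core version of this record.
dc_url = get_record_url(BASE_URL, "oai:www.ldc.upenn.edu:LDC2015S12", "oai_dc")

# OLAC version of this record (prefix name is an assumption).
olac_url = get_record_url(BASE_URL, "oai:www.ldc.upenn.edu:LDC2015S12", "olac")
```

The colons in the identifier are percent-encoded by urlencode, which keeps the query string valid.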

Search Info
Citation: Schatz, Thomas; Cao, Xuan-Nga; Kolesnikova, Anna; Bergvelt, Tomas; Wright, Jonathan; Dupoux, Emmanuel. 2015. Linguistic Data Consortium.
Terms: area_Europe country_GB dcmi_Sound iso639_eng olac_primary_text

http://www.language-archives.org/item.php/oai:www.ldc.upenn.edu:LDC2015S12
Up-to-date as of: Wed Feb 26 18:32:01 EST 2025
