OLAC Record
oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/ILC-1010

Metadata
Title:PAROLE reference corpus
Bibliographic Citation:http://hdl.handle.net/20.500.11752/ILC-1010
Creator:Marinelli, Rita
Biagini, Lisa
Bindi, Remo
Goggi, Sara
Monachini, Monica
Orsolini, Paola
Picchi, Eugenio
Rossi, Sergio
Calzolari, Nicoletta
Zampolli, Antonio
Date (W3CDTF):2024-07-19T13:19:26Z
Date Available:2024-07-19T13:19:26Z
Description:The PAROLE project (Preparatory Action for Linguistic Resources Organization for Language Engineering) has produced a set of harmonized corpora and lexicons for a large number of European languages. Each corpus, made up of 20 million words, was built as a reference corpus for Human Language Technology applications. It was designed to provide full information about a wide variety of text types in the language concerned, to represent contemporary language use, and to become the first nucleus of an electronic text library. The texts are stored in a common format that follows the standards recommended by the CES (Corpus Encoding Standard), according to flexibility and multifunctionality criteria. They cover a wide range of media and genres, selected in proportions intended to reflect their prominence in society, and are classified by medium, genre, topic and time of production. For more information, see also: Goggi, Sara, Lisa Biagini, Remo Bindi, and Sergio Rossi. 1997. ‘Italian Corpus Documentation - LE-PAROLE WP2.11’, October. https://zenodo.org/records/8167985. Marinelli, Rita, Lisa Biagini, Remo Bindi, Sara Goggi, Monica Monachini, Paola Orsolini, Eugenio Picchi, Sergio Rossi, Nicoletta Calzolari, and A. Zampolli. 1996. ‘The Italian “Parole” Corpus: An Overview’. Linguistica Computazionale Computational Linguistics in Pisa-Special Issue I (XVI/XVII, 1996/1997): 401–21. https://doi.org/10.1400/18167. https://www.ilc.cnr.it/wp-content/uploads/2022/05/Z224.pdf The corpus is annotated at the textual level, with some Named Entity annotation. A portion of this corpus was annotated with morpho-syntactic information and is available here: Sara Goggi, Remo Bindi, Lisa Biagini and Sergio Rossi. 1997. Corpus Parole (3 million words). ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", National Research Council, in Pisa. http://hdl.handle.net/20.500.11752/ILC-1001.
Identifier (URI):http://hdl.handle.net/20.500.11752/ILC-1010
Language:Italian
Language (ISO639):ita
Publisher:Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR)
Rights:Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
http://creativecommons.org/licenses/by-nc/4.0/
Subject:Corpus
Reference corpus
PAROLE project
SGML
Databases
Type:corpus
Type (DCMI):Text
Type (OLAC):primary_text

OLAC Info

Archive:  ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", National Research Council, in Pisa
Description:  http://www.language-archives.org/archive/dspace-clarin-it.ilc.cnr.it
GetRecord:  OAI-PMH request for OLAC format
GetRecord:  Pre-generated XML file

OAI Info

OaiIdentifier:  oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/ILC-1010
DateStamp:  2024-07-19
GetRecord:  OAI-PMH request for simple DC format

Search Info

Citation: Marinelli, Rita; Biagini, Lisa; Bindi, Remo; Goggi, Sara; Monachini, Monica; Orsolini, Paola; Picchi, Eugenio; Rossi, Sergio; Calzolari, Nicoletta; Zampolli, Antonio. 2024. Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR).
Terms: area_Europe country_IT dcmi_Text iso639_ita olac_primary_text


http://www.language-archives.org/item.php/oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/ILC-1010
Up-to-date as of: Tue Mar 4 8:44:07 EST 2025