
Valid character sets for supported scripts

Discover the valid character sets for supported non-Latin scripts in Connexion client.

Arabic, CJK, Cyrillic, Greek, and Hebrew

Character sets for these scripts are listed in MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media, Code Tables. These MARC-8 character sets are subsets of Unicode characters that are approved for use in MARC 21 cataloging.

Scripts defined by the MARC-8 character sets are supported for bibliographic records and for variant name headings in authority records.

The following list defines the scope of valid characters in the Connexion client for Arabic (including Persian), CJK, Cyrillic, Greek, and Hebrew scripts:

  • Basic Arabic = 33(hex) [ASCII graphic: 3]
  • Extended Arabic = 34(hex) [ASCII graphic: 4]
  • Chinese, Japanese, Korean (EACC) = 31(hex) [ASCII graphic: 1]
  • Basic Cyrillic = 4E(hex) [ASCII graphic: N]
  • Extended Cyrillic = 51(hex) [ASCII graphic: Q]
  • Basic Greek = 53(hex) [ASCII graphic: S]
  • Basic Hebrew = 32(hex) [ASCII graphic: 2]

     Note: In bibliographic records, the client inserts the notation (3, (4, $1, (N, (Q, (S, or (2, respectively, into field 066 to indicate which script(s) are used in a record. If multiple scripts are used, the notations are inserted individually, each in a separate subfield c. 
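As an illustration of how the hex values in the list relate to the field 066 notations in the note, here is a minimal Python sketch. The names MARC8_SETS, notation_066, and subfields_c are hypothetical helpers introduced for this example; they are not part of the Connexion client. The only data used is the list and note above.

```python
# Hex value of each MARC-8 set, copied from the list above.
# The dictionary and function names are illustrative assumptions.
MARC8_SETS = {
    "Basic Arabic": 0x33,
    "Extended Arabic": 0x34,
    "CJK (EACC)": 0x31,
    "Basic Cyrillic": 0x4E,
    "Extended Cyrillic": 0x51,
    "Basic Greek": 0x53,
    "Basic Hebrew": 0x32,
}


def notation_066(set_name):
    """Return the field 066 notation for one MARC-8 set."""
    # The ASCII graphic is simply the character at that hex value.
    graphic = chr(MARC8_SETS[set_name])
    # The note lists "$1" for CJK and a "(" prefix for every other set.
    prefix = "$" if set_name == "CJK (EACC)" else "("
    return prefix + graphic


def subfields_c(set_names):
    """One notation per script, each destined for its own subfield c."""
    return [notation_066(name) for name in set_names]
```

For example, subfields_c(["Basic Cyrillic", "Basic Hebrew"]) yields one value per script, matching the rule that each notation goes in a separate subfield c.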

Armenian, Bengali, Devanagari, Ethiopic, Syriac, Tamil, and Thai

These scripts are supported for bibliographic records only.

There are no defined MARC-8 character sets for Armenian, Bengali, Devanagari, Ethiopic, Syriac, Tamil, or Thai. The Connexion client also supports Cyrillic characters outside the MARC-8 character set. OCLC implemented the following script identification codes for these scripts based on ISO 15924 Code Lists.

The following list shows the ranges of UTF-8 Unicode characters that define valid characters for these scripts in the Connexion client:

  • Armn = Armenian (character range U+0530 to U+058F)
  • Beng = Bengali (character range U+0980 to U+09FF)
  • Cyrl = Cyrillic character set (outside the MARC-8 character set)
  • Deva = Devanagari (character range U+0900 to U+097F)
  • Ethi = Ethiopic (character ranges U+1200 to U+1399, U+2D80 to U+2DDF, U+AB00 to U+AB2F)
  • Syrc = Syriac (character range U+0700 to U+074F)
  • Taml = Tamil (character range U+0B80 to U+0BFF)
  • Thai = Thai (character range U+0E00 to U+0E7F)

 Note: The client inserts Armn, Beng, Cyrl, Deva, Ethi, Syrc, Taml, or Thai, respectively, in field 066 of a bibliographic record to indicate that the script is used.  If multiple scripts are used, the notations are inserted individually, each in a separate subfield c.
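The ranges in the list can be used to check which of these scripts a piece of text contains. The Python sketch below is only an illustration under stated assumptions: SCRIPT_RANGES and scripts_in are hypothetical names, the ranges are copied as-is from the list, and Cyrl is left out because the list gives no range for it. It does not describe how the client itself detects scripts.

```python
# Code point ranges copied from the list above, keyed by script code.
# Cyrl is omitted: the list does not give a range for it.
SCRIPT_RANGES = {
    "Armn": [(0x0530, 0x058F)],
    "Beng": [(0x0980, 0x09FF)],
    "Deva": [(0x0900, 0x097F)],
    "Ethi": [(0x1200, 0x1399), (0x2D80, 0x2DDF), (0xAB00, 0xAB2F)],
    "Syrc": [(0x0700, 0x074F)],
    "Taml": [(0x0B80, 0x0BFF)],
    "Thai": [(0x0E00, 0x0E7F)],
}


def scripts_in(text):
    """Return the script codes whose ranges contain a character of text."""
    found = []
    for code, ranges in SCRIPT_RANGES.items():
        # A script counts as present if any character falls in any of its ranges.
        if any(low <= ord(ch) <= high for ch in text for low, high in ranges):
            found.append(code)
    return found
```

Each code returned corresponds to one notation that would appear in its own subfield c of field 066.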

Limitations on using Armenian, Bengali, Cyrillic (outside the MARC-8 character set), Devanagari, Ethiopic, Syriac, Tamil, and Thai characters

  • To export or import records containing Armenian, Bengali, Cyrillic (outside the MARC-8 character set), Devanagari, Ethiopic, Syriac, Tamil, or Thai scripts, you must select the UTF-8 Unicode character set option:
    • To export: Tools > Options > Export; click Record Characteristics and select UTF-8 Unicode in the Character Set list under Bibliographic Records. 
    • To import: File > Import Records; click Record Characteristics and select UTF-8 Unicode in the Character Set list under Bibliographic Records. 

    Because Armenian, Bengali, Cyrillic (outside the MARC-8 character set), Devanagari, Ethiopic, Syriac, Tamil, and Thai scripts are not part of MARC-8 characters, you cannot export or import these scripts using the MARC-8 character set option.

    Because MARC-8 characters are part of UTF-8 Unicode, you can safely export or import Arabic, CJK, Cyrillic (within the MARC-8 character set), Greek, and Hebrew records using either the MARC-8 or the UTF-8 Unicode character set option.
  • Armenian, Bengali, Cyrillic (outside the MARC-8 character set), Devanagari, Ethiopic, Syriac, Tamil, and Thai scripts are not supported for variant name headings in authority records.
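The export and import rule in the first limitation can be sketched as a small decision function. This Python example assumes you already have the field 066 subfield c values of a record as a list of strings; UTF8_ONLY_CODES and allowed_character_sets are hypothetical names, not client settings or APIs.

```python
# Script codes that have no MARC-8 character set, per the limitations above.
UTF8_ONLY_CODES = {"Armn", "Beng", "Cyrl", "Deva", "Ethi", "Syrc", "Taml", "Thai"}


def allowed_character_sets(codes_066c):
    """Return the character set options usable for a record's 066 $c values."""
    # Any script outside MARC-8 forces the UTF-8 Unicode option.
    if UTF8_ONLY_CODES & set(codes_066c):
        return ["UTF-8 Unicode"]
    # Records limited to MARC-8 scripts can use either option.
    return ["MARC-8", "UTF-8 Unicode"]
```

A record with the values (N and Thai, for instance, can only be exported or imported with UTF-8 Unicode, while a record with only (3 and $1 can use either option.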

Invalid characters in Connexion client

Any characters that are not included in the above lists of defined characters or that cannot be inserted via Edit > Enter Diacritics (or Enter Diacritics button or <Ctrl><E>) are invalid in the client. To include non-Latin characters that you need but that are invalid in Connexion client, you can:

  • Enter the character in the record, export the record to your local system using Unicode export format, and then remove the character before processing the record in WorldCat.
    Or
  • Enter the name of the character within square brackets, using the Unicode standard name if available (e.g., enter [schwa]). For CJK characters, enter the reading of the character instead (e.g., enter [yin]).

    For reference, see the Unicode charts, which have a character name index.
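If you prefer to look up a character name programmatically rather than in the charts, Python's standard unicodedata module exposes the names from the Unicode Character Database. The sketch below is an assumption-laden illustration: bracketed_name is a hypothetical helper, and it returns the full Unicode name, which is longer than the short form shown in the example above (the full name for schwa is "latin small letter schwa"), so you may still want to shorten it by hand.

```python
import unicodedata


def bracketed_name(ch):
    """Return the Unicode name of ch in square brackets, or None if it has no name."""
    # unicodedata.name returns the default ("") when no name is defined,
    # for example for private-use code points.
    name = unicodedata.name(ch, "")
    return f"[{name.lower()}]" if name else None
```

This covers only the Unicode-name option; CJK readings such as [yin] are not available from unicodedata and still need to be entered manually.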

 Note: Z39.50 access to WorldCat records also supports MARC-8 and Unicode UTF-8 character sets. See Z39.50 Cataloging for information on non-Latin script support in Z39.50.

Multiple scripts in a single record are valid

Use as many supported non-Latin scripts as needed anywhere in a record, even within the same field.