Unicode

Unicode — Unicode and UTF-8 utility functions.

Synopsis

typedef             raptor_unichar;
int                 raptor_unicode_char_to_utf8         (raptor_unichar c,
                                                         unsigned char *output);
int                 raptor_utf8_to_unicode_char         (raptor_unichar *output,
                                                         unsigned char *input,
                                                         int length);
int                 raptor_unicode_is_xml11_namestartchar
                                                        (raptor_unichar c);
int                 raptor_unicode_is_xml10_namestartchar
                                                        (raptor_unichar c);
int                 raptor_unicode_is_xml11_namechar    (raptor_unichar c);
int                 raptor_unicode_is_xml10_namechar    (raptor_unichar c);
int                 raptor_utf8_check                   (unsigned char *string,
                                                         size_t length);

Description

Functions for converting between Unicode codepoints and UTF-8, which is the native internal string format of all the Redland libraries. Also includes checks for whether a Unicode character is legal in an XML Name under either the XML 1.0 or XML 1.1 rules.

Details

raptor_unichar

typedef unsigned long raptor_unichar;

raptor Unicode codepoint


raptor_unicode_char_to_utf8 ()

int                 raptor_unicode_char_to_utf8         (raptor_unichar c,
                                                         unsigned char *output);

Convert a Unicode character to UTF-8 encoding.

Based on librdf_unicode_char_to_utf8(), with no need to calculate the length since the encoded character is always copied into a buffer of sufficient size.

c :

Unicode character

output :

UTF-8 string buffer or NULL

Returns :

bytes encoded to output buffer or <0 on failure
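
To illustrate the encoding step, here is a minimal sketch of a UTF-8 encoder with the same shape as the signature above. It is not the raptor implementation: sketch_unichar, sketch_unicode_char_to_utf8() and sketch_encode_demo() are hypothetical names. Two behaviours are assumptions rather than documented facts: that a NULL output means "report the byte count only", and that the failure value is -1 for codepoints above U+10FFFF. The sketch does not reject surrogate codepoints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for raptor_unichar, for illustration only. */
typedef unsigned long sketch_unichar;

/* Encode c as UTF-8 into output, which must have room for 4 bytes.
 * ASSUMPTION: if output is NULL, only the byte count is returned.
 * ASSUMPTION: returns -1 when c is above U+10FFFF. */
int sketch_unicode_char_to_utf8(sketch_unichar c, unsigned char *output)
{
  int size;

  if (c < 0x80UL)
    size = 1;
  else if (c < 0x800UL)
    size = 2;
  else if (c < 0x10000UL)
    size = 3;
  else if (c < 0x110000UL)
    size = 4;
  else
    return -1;

  if (!output)
    return size;

  switch (size) {
    case 1:
      output[0] = (unsigned char)c;
      break;
    case 2:
      output[0] = (unsigned char)(0xC0 | (c >> 6));
      output[1] = (unsigned char)(0x80 | (c & 0x3F));
      break;
    case 3:
      output[0] = (unsigned char)(0xE0 | (c >> 12));
      output[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      output[2] = (unsigned char)(0x80 | (c & 0x3F));
      break;
    default:
      output[0] = (unsigned char)(0xF0 | (c >> 18));
      output[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
      output[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      output[3] = (unsigned char)(0x80 | (c & 0x3F));
      break;
  }
  return size;
}

/* Usage: U+20AC EURO SIGN encodes to the three bytes E2 82 AC. */
void sketch_encode_demo(void)
{
  unsigned char buf[4];
  int n = sketch_unicode_char_to_utf8(0x20ACUL, buf);

  assert(n == 3);
  assert(buf[0] == 0xE2 && buf[1] == 0x82 && buf[2] == 0xAC);
}
```

A 4-byte buffer is enough for any codepoint in the range U+0000 to U+10FFFF, which is why the caller can size the buffer without asking for the length first.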

raptor_utf8_to_unicode_char ()

int                 raptor_utf8_to_unicode_char         (raptor_unichar *output,
                                                         unsigned char *input,
                                                         int length);

Convert a UTF-8 encoded buffer to a Unicode character.

If output is NULL, the function calculates the number of bytes that would be used from the input buffer and does not perform the conversion.

output :

Pointer to the Unicode character or NULL

input :

UTF-8 string buffer

length :

buffer size

Returns :

Number of bytes used from the input buffer, or <0 on failure: -1 if the input buffer is too short or there is a length error, -2 for an overlong UTF-8 sequence, -3 for illegal code positions, -4 for a code outside the range U+0000 to U+10FFFF. In cases -2, -3 and -4 the decoded character is still stored in output.
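
The return codes above can be sketched with a small stand-alone decoder. This is a hedged illustration, not the raptor source: sketch_unichar, sketch_utf8_to_unicode_char() and sketch_decode_demo() are hypothetical names. The sketch assumes that "illegal code positions" means the surrogates U+D800 to U+DFFF plus U+FFFE and U+FFFF, and it returns -1 for a malformed lead or continuation byte; both are assumptions that the text above does not confirm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for raptor_unichar, for illustration only. */
typedef unsigned long sketch_unichar;

int sketch_utf8_to_unicode_char(sketch_unichar *output,
                                const unsigned char *input, int length)
{
  unsigned char in;
  sketch_unichar c;
  int size;
  int i;

  if (!input || length < 1)
    return -1;

  /* The lead byte determines the sequence length. */
  in = input[0];
  if ((in & 0x80) == 0x00)      { size = 1; c = in; }
  else if ((in & 0xE0) == 0xC0) { size = 2; c = in & 0x1F; }
  else if ((in & 0xF0) == 0xE0) { size = 3; c = in & 0x0F; }
  else if ((in & 0xF8) == 0xF0) { size = 4; c = in & 0x07; }
  else
    return -1;  /* ASSUMPTION: bad lead byte is reported as -1 */

  if (!output)
    return size;  /* length query only, no conversion */

  if (length < size)
    return -1;    /* input buffer too short */

  for (i = 1; i < size; i++) {
    if ((input[i] & 0xC0) != 0x80)
      return -1;  /* ASSUMPTION: bad continuation byte is reported as -1 */
    c = (c << 6) | (sketch_unichar)(input[i] & 0x3F);
  }

  /* Store first, so the error cases below still expose the value. */
  *output = c;

  if ((size == 2 && c < 0x80UL) ||
      (size == 3 && c < 0x800UL) ||
      (size == 4 && c < 0x10000UL))
    return -2;  /* overlong sequence */

  if ((c >= 0xD800UL && c <= 0xDFFFUL) || c == 0xFFFEUL || c == 0xFFFFUL)
    return -3;  /* ASSUMPTION: these are the illegal code positions */

  if (c > 0x10FFFFUL)
    return -4;  /* out of range */

  return size;
}

/* Usage: a length query, a successful decode, and two failure cases. */
void sketch_decode_demo(void)
{
  const unsigned char euro[] = { 0xE2, 0x82, 0xAC };  /* U+20AC */
  const unsigned char overlong[] = { 0xC0, 0x80 };    /* overlong U+0000 */
  sketch_unichar c = 0;

  assert(sketch_utf8_to_unicode_char(NULL, euro, 3) == 3);
  assert(sketch_utf8_to_unicode_char(&c, euro, 3) == 3 && c == 0x20ACUL);
  assert(sketch_utf8_to_unicode_char(&c, euro, 2) == -1);
  assert(sketch_utf8_to_unicode_char(&c, overlong, 2) == -2 && c == 0UL);
}
```

A caller walking a buffer would typically advance by the returned byte count on success and stop, or substitute a replacement character, on any negative return.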

raptor_unicode_is_xml11_namestartchar ()

int                 raptor_unicode_is_xml11_namestartchar
                                                        (raptor_unichar c);

Check if a Unicode character is legal to start an XML 1.1 Name.

The rules follow Namespaces in XML 1.1 REC 2004-02-04 (http://www.w3.org/TR/2004/REC-xml11-20040204/NT-NameStartChar), updating Extensible Markup Language (XML) 1.1 REC 2004-02-04 (http://www.w3.org/TR/2004/REC-xml11-20040204/) sec 2.3, [4a], excluding the ':'.

c :

Unicode character to check

Returns :

non-0 if legal
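
For reference, the XML 1.1 NameStartChar production with ':' removed can be written as a small range table. The following is a sketch under that reading of the specification, not the raptor implementation; sketch_unichar, struct sketch_range, sketch_is_xml11_namestartchar() and sketch_namestart_demo() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for raptor_unichar, for illustration only. */
typedef unsigned long sketch_unichar;

struct sketch_range {
  sketch_unichar lo;
  sketch_unichar hi;
};

/* XML 1.1 NameStartChar ranges, with ':' deliberately left out. */
static const struct sketch_range sketch_xml11_namestart[] = {
  { 'A', 'Z' },          { '_', '_' },          { 'a', 'z' },
  { 0xC0UL, 0xD6UL },    { 0xD8UL, 0xF6UL },    { 0xF8UL, 0x2FFUL },
  { 0x370UL, 0x37DUL },  { 0x37FUL, 0x1FFFUL }, { 0x200CUL, 0x200DUL },
  { 0x2070UL, 0x218FUL },{ 0x2C00UL, 0x2FEFUL },{ 0x3001UL, 0xD7FFUL },
  { 0xF900UL, 0xFDCFUL },{ 0xFDF0UL, 0xFFFDUL },{ 0x10000UL, 0xEFFFFUL }
};

/* Returns non-0 if c may start an XML 1.1 Name (colon excluded). */
int sketch_is_xml11_namestartchar(sketch_unichar c)
{
  size_t n = sizeof(sketch_xml11_namestart) / sizeof(sketch_xml11_namestart[0]);
  size_t i;

  for (i = 0; i < n; i++) {
    if (c >= sketch_xml11_namestart[i].lo && c <= sketch_xml11_namestart[i].hi)
      return 1;
  }
  return 0;
}

/* Usage: letters and '_' may start a name; digits and ':' may not. */
void sketch_namestart_demo(void)
{
  assert(sketch_is_xml11_namestartchar('r'));
  assert(sketch_is_xml11_namestartchar('_'));
  assert(!sketch_is_xml11_namestartchar('1'));
  assert(!sketch_is_xml11_namestartchar(':'));
}
```

A linear scan over 15 ranges is cheap enough for an illustration; a production check might test the ASCII cases first, since those dominate typical input.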

raptor_unicode_is_xml10_namestartchar ()

int                 raptor_unicode_is_xml10_namestartchar
                                                        (raptor_unichar c);

Check if a Unicode character is legal to start an XML 1.0 Name.

The rules follow Namespaces in XML REC 1999-01-14 (http://www.w3.org/TR/1999/REC-xml-names-19990114/NT-NCName), updating Extensible Markup Language (XML) 1.0 (Third Edition) REC 2004-02-04 (http://www.w3.org/TR/2004/REC-xml-20040204/), excluding the ':'.

c :

Unicode character to check

Returns :

non-0 if legal

raptor_unicode_is_xml11_namechar ()

int                 raptor_unicode_is_xml11_namechar    (raptor_unichar c);

Check if a Unicode codepoint is legal to continue an XML 1.1 Name.

The rules follow Namespaces in XML 1.1 REC 2004-02-04 (http://www.w3.org/TR/2004/REC-xml11-20040204/), updating Extensible Markup Language (XML) 1.1 REC 2004-02-04 (http://www.w3.org/TR/2004/REC-xml11-20040204/) sec 2.3, [4a], excluding the ':'.

c :

Unicode character

Returns :

non-0 if legal
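
In XML 1.1, a NameChar is any NameStartChar plus a short list of additional characters. The sketch below covers only that additional list, so a complete check would be "is a name start character, or is one of these". It is an illustration of the specification's production, not the raptor implementation; sketch_unichar, sketch_is_xml11_namechar_extra() and sketch_namechar_extra_demo() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical stand-in for raptor_unichar, for illustration only. */
typedef unsigned long sketch_unichar;

/* Returns non-0 if c is one of the characters that XML 1.1 allows
 * inside a Name in addition to the NameStartChar set:
 * "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040] */
int sketch_is_xml11_namechar_extra(sketch_unichar c)
{
  return c == '-' ||
         c == '.' ||
         (c >= '0' && c <= '9') ||
         c == 0xB7UL ||
         (c >= 0x300UL && c <= 0x36FUL) ||
         (c >= 0x203FUL && c <= 0x2040UL);
}

/* Usage: digits and '-' may continue a name; ':' is still excluded. */
void sketch_namechar_extra_demo(void)
{
  assert(sketch_is_xml11_namechar_extra('7'));
  assert(sketch_is_xml11_namechar_extra('-'));
  assert(!sketch_is_xml11_namechar_extra(':'));
}
```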

raptor_unicode_is_xml10_namechar ()

int                 raptor_unicode_is_xml10_namechar    (raptor_unichar c);

Check if a Unicode codepoint is legal to continue an XML 1.0 Name.

The rules follow Namespaces in XML REC 1999-01-14 (http://www.w3.org/TR/1999/REC-xml-names-19990114/NT-NCNameChar), updating Extensible Markup Language (XML) 1.0 (Third Edition) REC 2004-02-04 (http://www.w3.org/TR/2004/REC-xml-20040204/), excluding the ':'.

c :

Unicode character

Returns :

non-0 if legal

raptor_utf8_check ()

int                 raptor_utf8_check                   (unsigned char *string,
                                                         size_t length);

Check that a string is valid UTF-8.

string :

UTF-8 string

length :

length of string

Returns :

non-0 if the string is UTF-8
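
One way to picture such a check is a single pass over the bytes using the well-formed byte ranges from the Unicode Standard (Table 3-7). This is a sketch of that general technique, not the raptor implementation, which may accept or reject a slightly different set (for example U+FFFE and U+FFFF, or the empty string). sketch_utf8_check() and sketch_check_demo() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Returns non-0 if the length bytes at s are well-formed UTF-8.
 * ASSUMPTION: an empty string counts as valid. */
int sketch_utf8_check(const unsigned char *s, size_t length)
{
  size_t i = 0;

  while (i < length) {
    unsigned char b = s[i];
    unsigned char lo = 0x80;  /* allowed range of the second byte */
    unsigned char hi = 0xBF;
    size_t need;              /* number of continuation bytes */
    size_t k;

    if (b <= 0x7F) {
      i++;
      continue;
    }
    else if (b >= 0xC2 && b <= 0xDF) { need = 1; }
    else if (b == 0xE0)              { need = 2; lo = 0xA0; } /* no overlongs */
    else if (b == 0xED)              { need = 2; hi = 0x9F; } /* no surrogates */
    else if (b >= 0xE1 && b <= 0xEF) { need = 2; }
    else if (b == 0xF0)              { need = 3; lo = 0x90; } /* no overlongs */
    else if (b == 0xF4)              { need = 3; hi = 0x8F; } /* <= U+10FFFF */
    else if (b >= 0xF1 && b <= 0xF3) { need = 3; }
    else
      return 0;  /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */

    if (length - i - 1 < need)
      return 0;  /* truncated sequence */

    if (s[i + 1] < lo || s[i + 1] > hi)
      return 0;

    for (k = 2; k <= need; k++) {
      if ((s[i + k] & 0xC0) != 0x80)
        return 0;
    }

    i += need + 1;
  }
  return 1;
}

/* Usage: "caf" followed by U+00E9 is valid; a truncated sequence is not. */
void sketch_check_demo(void)
{
  const unsigned char good[] = { 'c', 'a', 'f', 0xC3, 0xA9 };
  const unsigned char truncated[] = { 0xE2, 0x82 };

  assert(sketch_utf8_check(good, sizeof(good)));
  assert(!sketch_utf8_check(truncated, sizeof(truncated)));
}
```

Restricting the second byte per lead byte rejects overlong forms, surrogates and values above U+10FFFF without decoding each codepoint.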