ezEngine
Release 25.03
Helper functions to work with Unicode.
#include <UnicodeUtils.h>
Classes | |
struct | UtfInserter |
[internal] Small helper class to append bytes to some arbitrary container. Used for Utf8 string building. | |
Static Public Member Functions | |
template<typename T > | |
static constexpr T * | GetMaxStringEnd () |
[internal] Returns the max string end pointer for the given type | |
static bool | IsASCII (ezUInt32 uiChar) |
Returns whether a character is a pure ASCII character (only the lowest 7 bits are used) | |
static bool | IsUtf8StartByte (char iByte) |
Checks whether the given byte is a start byte in a UTF-8 multi-byte sequence. | |
static bool | IsUtf8ContinuationByte (char iByte) |
Checks whether the given byte is a continuation byte (any byte after the start byte) in a UTF-8 multi-byte sequence. | |
static ezUInt32 | GetUtf8SequenceLength (char iFirstByte) |
Returns the length in bytes of a UTF-8 sequence, which is encoded in the first byte of the sequence. | |
static ezUInt32 | ConvertUtf8ToUtf32 (const char *pFirstChar) |
Converts the UTF-8 character that starts at pFirstChar into a UTF-32 character. | |
static ezUInt32 | GetSizeForCharacterInUtf8 (ezUInt32 uiCharacter) |
Computes how many bytes the character would require, if encoded in UTF-8. | |
static ezResult | MoveToNextUtf8 (const char *&ref_szUtf8, ezUInt32 uiNumCharacters=1) |
Moves the given string pointer ahead to the next Utf8 character sequence. | |
static ezResult | MoveToNextUtf8 (const char *&ref_szUtf8, const char *szUtf8End, ezUInt32 uiNumCharacters=1) |
Moves the given string pointer ahead to the next Utf8 character sequence. | |
static ezResult | MoveToPriorUtf8 (const char *&ref_szUtf8, const char *szUtf8Start, ezUInt32 uiNumCharacters=1) |
Moves the given string pointer backwards to the previous Utf8 character sequence. | |
static bool | IsValidUtf8 (const char *szString, const char *szStringEnd=GetMaxStringEnd< char >()) |
Returns false if the given string is not a completely valid Utf8 string. | |
static bool | SkipUtf8Bom (const char *&ref_szUtf8) |
If the given string starts with a Utf8 Bom, the pointer is incremented behind the Bom, and the function returns true. | |
static bool | SkipUtf16BomLE (const ezUInt16 *&ref_pUtf16) |
If the given string starts with a Utf16 little endian Bom, the pointer is incremented behind the Bom, and the function returns true. | |
static bool | SkipUtf16BomBE (const ezUInt16 *&ref_pUtf16) |
If the given string starts with a Utf16 big endian Bom, the pointer is incremented behind the Bom, and the function returns true. | |
template<typename ByteIterator > | |
static ezUInt32 | DecodeUtf8ToUtf32 (ByteIterator &ref_szUtf8Iterator) |
Decodes the next character from the given Utf8 sequence to Utf32 and increments the iterator as far as necessary. | |
template<typename UInt16Iterator > | |
static bool | IsUtf16Surrogate (UInt16Iterator &ref_szUtf16Iterator) |
Characters that cannot be represented in a single Utf16 code unit need to be split up into a surrogate pair (two code units) to form Unicode characters beyond the \uFFFF range. | |
template<typename UInt16Iterator > | |
static ezUInt32 | DecodeUtf16ToUtf32 (UInt16Iterator &ref_szUtf16Iterator) |
Decodes the next character from the given Utf16 sequence to Utf32 and increments the iterator as far as necessary. | |
template<typename WCharIterator > | |
static ezUInt32 | DecodeWCharToUtf32 (WCharIterator &ref_szWCharIterator) |
Decodes the next character from the given wchar_t sequence to Utf32 and increments the iterator as far as necessary. | |
template<typename ByteIterator > | |
static void | EncodeUtf32ToUtf8 (ezUInt32 uiUtf32, ByteIterator &ref_szUtf8Output) |
Encodes the given Utf32 character to Utf8 and writes as many bytes to the output iterator as necessary. | |
template<typename UInt16Iterator > | |
static void | EncodeUtf32ToUtf16 (ezUInt32 uiUtf32, UInt16Iterator &ref_szUtf16Output) |
Encodes the given Utf32 character to Utf16 and writes as many 16-bit code units to the output iterator as necessary. | |
template<typename WCharIterator > | |
static void | EncodeUtf32ToWChar (ezUInt32 uiUtf32, WCharIterator &ref_szWCharOutput) |
Encodes the given Utf32 character to wchar_t and writes as many wchar_t elements to the output iterator as necessary. | |
template<typename Container > | |
static bool | RepairNonUtf8Text (const char *pStartData, const char *pEndData, Container &out_result) |
Checks whether an array of chars is a valid Utf8 string. If not, it repairs the string, i.e. by either re-encoding characters or removing them. Writes the result to the desired container type (ezString or ezStringBuilder). | |
Static Public Attributes | |
static constexpr ezUInt16 | Utf16BomLE = 0xfeff |
Byte Order Mark for Little Endian Utf16 strings. | |
static constexpr ezUInt16 | Utf16BomBE = 0xfffe |
Byte Order Mark for Big Endian Utf16 strings. | |
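The byte classification, sequence length, and decoding functions listed above all rest on the UTF-8 bit layout. The following is a minimal standalone sketch of that layout, not the ezEngine implementation: the names `IsContinuationByte`, `IsStartByte`, `SequenceLength`, `SizeInUtf8`, and `DecodeOne` are hypothetical stand-ins, and `DecodeOne` assumes well-formed input.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins that illustrate the UTF-8 bit layout.

// A continuation byte has the bit pattern 10xxxxxx.
inline bool IsContinuationByte(char c)
{
  return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// The first byte of a multi-byte sequence has the bit pattern 11xxxxxx.
inline bool IsStartByte(char c)
{
  return (static_cast<unsigned char>(c) & 0xC0) == 0xC0;
}

// Sequence length as encoded in the first byte.
// Returns 0 for a byte that cannot begin a sequence.
inline std::uint32_t SequenceLength(char first)
{
  const unsigned char b = static_cast<unsigned char>(first);
  if ((b & 0x80) == 0x00) return 1; // 0xxxxxxx
  if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx
  if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx
  if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
  return 0;
}

// Number of bytes a code point needs when encoded in UTF-8.
inline std::uint32_t SizeInUtf8(std::uint32_t cp)
{
  if (cp <= 0x7F) return 1;
  if (cp <= 0x7FF) return 2;
  if (cp <= 0xFFFF) return 3;
  return 4;
}

// Decodes one sequence starting at p. Performs no validation:
// the input is assumed to be well-formed UTF-8.
inline std::uint32_t DecodeOne(const char* p)
{
  const unsigned char b = static_cast<unsigned char>(*p);
  const std::uint32_t len = SequenceLength(*p);
  if (len <= 1) return b; // ASCII (or an unexpected byte, returned as-is)

  // Payload bits of the start byte: 5, 4, or 3 bits for lengths 2, 3, 4.
  std::uint32_t cp = b & (0xFFu >> (len + 1));
  for (std::uint32_t i = 1; i < len; ++i)
    cp = (cp << 6) | (static_cast<unsigned char>(p[i]) & 0x3Fu);
  return cp;
}
```

For example, the three bytes `E2 82 AC` decode to U+20AC, and `SizeInUtf8(0x20AC)` reports the same three bytes in the other direction.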
MoveToNextUtf8 () | inline static |
Moves the given string pointer ahead to the next Utf8 character sequence.
The string may point to an invalid position (in between a character sequence). It must not already point to a zero terminator.
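One way to picture this forward step: move off the current byte, then skip any continuation bytes, which also works when the pointer starts in the middle of a sequence. The sketch below is a standalone illustration, not the ezEngine code; `AdvanceUtf8` and `IsContinuation` are hypothetical names, and it returns `bool` rather than `ezResult`. Unlike the documented precondition, the sketch reports failure when it is asked to step past the terminator.

```cpp
#include <cassert>
#include <cstdint>

// 10xxxxxx marks a continuation byte.
inline bool IsContinuation(char c)
{
  return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Hypothetical helper: advances p by `count` characters in a
// zero-terminated string. Returns false if the terminator is
// reached before `count` characters were skipped.
inline bool AdvanceUtf8(const char*& p, std::uint32_t count = 1)
{
  while (count > 0)
  {
    if (*p == '\0') return false; // nothing left to skip
    ++p;                          // step off the current byte
    while (IsContinuation(*p))    // skip the rest of the sequence
      ++p;
    --count;
  }
  return true;
}
```

Because the terminator is not a continuation byte, the inner loop always stops at the end of the string.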
MoveToNextUtf8 () | inline static |
Moves the given string pointer ahead to the next Utf8 character sequence.
The string may point to an invalid position (in between a character sequence). It must not already point to a zero terminator.
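With an explicit end pointer the same step can be bounded, which matters for string views that are not zero-terminated. A minimal sketch under that assumption follows; `AdvanceUtf8Bounded` is a hypothetical name, it returns `bool` rather than `ezResult`, and it is not taken from the ezEngine sources.

```cpp
#include <cassert>
#include <cstdint>

// 10xxxxxx marks a continuation byte.
inline bool IsContinuation(char c)
{
  return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Hypothetical helper: advances p by `count` characters without
// ever reading at or beyond `end`. Returns false if `end` is
// reached before `count` characters were skipped.
inline bool AdvanceUtf8Bounded(const char*& p, const char* end, std::uint32_t count = 1)
{
  while (count > 0)
  {
    if (p >= end) return false;
    ++p; // step off the current byte
    while (p < end && IsContinuation(*p))
      ++p; // skip the rest of the sequence, but stop at end
    --count;
  }
  return true;
}
```

If the range ends in the middle of a sequence, this sketch stops at `end` and still counts the truncated sequence as one character; a stricter variant could report that case as an error.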
MoveToPriorUtf8 () | inline static |
Moves the given string pointer backwards to the previous Utf8 character sequence.
The string may point to an invalid position (in between a character sequence), or even the \0 terminator, as long as there is a valid string before it (and the user knows when to stop).
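The backward step mirrors the forward one: step back one byte, then keep stepping back while the pointer rests on a continuation byte. The following standalone sketch illustrates this; `RetreatUtf8` is a hypothetical name, it returns `bool` rather than `ezResult`, and the start pointer serves as the "knows when to stop" bound mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// 10xxxxxx marks a continuation byte.
inline bool IsContinuation(char c)
{
  return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Hypothetical helper: moves p back by `count` characters, never
// going before `start`. Returns false if `start` is reached before
// `count` characters were skipped.
inline bool RetreatUtf8(const char*& p, const char* start, std::uint32_t count = 1)
{
  while (count > 0)
  {
    if (p <= start) return false;
    --p; // step onto the previous byte
    while (p > start && IsContinuation(*p))
      --p; // walk back to the first byte of that sequence
    --count;
  }
  return true;
}
```

Starting at the terminator works because the first decrement lands on the last byte of the string. Starting inside a sequence moves the pointer to the first byte of that same sequence.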
RepairNonUtf8Text () | static |
Checks whether an array of chars is a valid Utf8 string. If not, it repairs the string, i.e. by either re-encoding characters or removing them. Writes the result to the desired container type (ezString or ezStringBuilder).
Returns true if the text had to be repaired, false if it was already valid.
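The documentation does not state which repair strategy is used, so the following is only an illustration of the re-encoding option. It assumes that any byte which does not start a structurally well-formed sequence is a Latin-1 character and re-encodes it as two UTF-8 bytes. `WellFormedLength` and `RepairToUtf8` are hypothetical names, `std::string` stands in for the ezEngine container, and the check is structural only (it does not reject overlong encodings or surrogate code points).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Length of a structurally well-formed UTF-8 sequence at p,
// or 0 if the bytes at p do not form one within [p, end).
inline std::uint32_t WellFormedLength(const char* p, const char* end)
{
  const unsigned char b = static_cast<unsigned char>(*p);
  std::uint32_t len = 0;
  if (b < 0x80) len = 1;
  else if ((b & 0xE0) == 0xC0) len = 2;
  else if ((b & 0xF0) == 0xE0) len = 3;
  else if ((b & 0xF8) == 0xF0) len = 4;
  else return 0; // stray continuation byte or invalid start byte

  if (static_cast<std::uint32_t>(end - p) < len) return 0; // truncated
  for (std::uint32_t i = 1; i < len; ++i)
    if ((static_cast<unsigned char>(p[i]) & 0xC0) != 0x80) return 0;
  return len;
}

// Copies [begin, end) to out. Bytes that are not part of a
// well-formed sequence are treated as Latin-1 (an assumption of
// this sketch) and re-encoded. Returns true if anything was repaired.
inline bool RepairToUtf8(const char* begin, const char* end, std::string& out)
{
  bool repaired = false;
  out.clear();
  for (const char* p = begin; p < end;)
  {
    if (const std::uint32_t len = WellFormedLength(p, end))
    {
      out.append(p, len); // already valid, copy unchanged
      p += len;
    }
    else
    {
      const unsigned char b = static_cast<unsigned char>(*p++);
      out.push_back(static_cast<char>(0xC0 | (b >> 6)));   // 110000xx
      out.push_back(static_cast<char>(0x80 | (b & 0x3F))); // 10xxxxxx
      repaired = true;
    }
  }
  return repaired;
}
```

As in the description above, the return value is true only when the input had to be changed, so a caller can tell a clean file from one that was in a legacy encoding.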
SkipUtf16BomBE () | inline static |
If the given string starts with a Utf16 big endian Bom, the pointer is incremented behind the Bom, and the function returns true.
Otherwise the pointer is unchanged and false is returned.
SkipUtf16BomLE () | inline static |
If the given string starts with a Utf16 little endian Bom, the pointer is incremented behind the Bom, and the function returns true.
Otherwise the pointer is unchanged and false is returned.
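Both UTF-16 BOM checks follow the same pattern: compare the first 16-bit unit against the mark and advance only on a match. The sketch below is a standalone illustration that reuses the two constant values from the attribute list; `SkipBom16` and the `kUtf16Bom*` names are hypothetical and not part of ezEngine.

```cpp
#include <cassert>
#include <cstdint>

// Values as listed under "Static Public Attributes".
constexpr std::uint16_t kUtf16BomLE = 0xfeff;
constexpr std::uint16_t kUtf16BomBE = 0xfffe;

// Hypothetical helper: if the text starts with `bom`, moves p past it
// and returns true. Otherwise leaves p unchanged and returns false.
inline bool SkipBom16(const std::uint16_t*& p, std::uint16_t bom)
{
  if (*p != bom) return false;
  ++p;
  return true;
}
```

A caller that does not know the byte order in advance could try both marks in turn and byte-swap the remaining units when the unexpected one matches.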
SkipUtf8Bom () | inline static |
If the given string starts with a Utf8 Bom, the pointer is incremented behind the Bom, and the function returns true.
Otherwise the pointer is unchanged and false is returned.
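The UTF-8 BOM is the three-byte sequence `EF BB BF` (the encoding of U+FEFF). A minimal standalone sketch of the check, with the hypothetical name `SkipBom8`, assuming a zero-terminated input:

```cpp
#include <cassert>

// Hypothetical helper: if the zero-terminated string starts with the
// UTF-8 BOM (EF BB BF), moves p past it and returns true. Otherwise
// leaves p unchanged and returns false.
inline bool SkipBom8(const char*& p)
{
  const unsigned char* u = reinterpret_cast<const unsigned char*>(p);
  // Short-circuit evaluation stops at the first mismatch, so the
  // check never reads past the terminator of a short string.
  if (u[0] == 0xEF && u[1] == 0xBB && u[2] == 0xBF)
  {
    p += 3;
    return true;
  }
  return false;
}
```

Calling it once at the start of a loaded text file lets the rest of the parser ignore whether the file was saved with a BOM.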