ezEngine
Release 25.03
Takes text and splits it up into ezToken objects. The result can be used for easier parsing.
#include <Tokenizer.h>
Public Member Functions
ezTokenizer (ezAllocator *pAllocator=nullptr)
    Constructor.
void Tokenize (ezArrayPtr< const ezUInt8 > data, ezLogInterface *pLog, bool bCopyData=true)
    Clears any previous result and creates a new token stream for the given array.
const ezDeque< ezToken > & GetTokens () const
    Gives read access to the token stream.
ezDeque< ezToken > & GetTokens ()
    Gives read and write access to the token stream.
void GetAllTokens (ezDynamicArray< const ezToken * > &ref_tokens) const
    Returns an array with a copy of all tokens. Use this when using ezTokenParseUtils.
void GetAllLines (ezDynamicArray< const ezToken * > &ref_tokens) const
    Returns an array of all tokens. New line tokens are ignored.
ezResult GetNextLine (ezUInt32 &ref_uiFirstToken, ezHybridArray< const ezToken *, 32 > &ref_tokens) const
    Returns an array of tokens that represent the next line in the file.
ezResult GetNextLine (ezUInt32 &ref_uiFirstToken, ezHybridArray< ezToken *, 32 > &ref_tokens)
const ezArrayPtr< const ezUInt8 > GetTokenizedData () const
    Returns the internal copy of the tokenized data. Will be empty if Tokenize was called with 'bCopyData' set to 'false'.
void SetTreatHashSignAsLineComment (bool bHashSignIsLineComment)
    Enables treating lines that start with a '#' character as line comments.
Takes text and splits it up into ezToken objects. The result can be used for easier parsing.
The tokenizer is built to work on code that is similar to C. That means it tokenizes comments and strings as they are defined in the C language. A line break that directly follows a backslash is not treated as a line break.
White space is defined as spaces and tabs.
Identifiers are names that consist of alphanumerics and underscores.
Non-Identifiers are everything else. However, they currently never consist of more than a single character, i.e. '++' is tokenized as two consecutive non-Identifiers.
Parentheses, brackets and similar characters are not tokenized in any special way; they are all considered non-Identifiers.
The token stream will always end with an end-of-file token.
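The classification rules above can be illustrated with a small standalone sketch. It does not use ezEngine and is not the ezTokenizer implementation: `SketchTokenType`, `SketchToken` and `SketchTokenize` are hypothetical names, comments, strings and backslash line continuations are left out, and merging consecutive spaces and tabs into one whitespace token is an assumption of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token categories for this sketch; the real ezToken types are not listed in this section.
enum class SketchTokenType { Whitespace, Identifier, NonIdentifier, Newline, EndOfFile };

struct SketchToken
{
  SketchTokenType type;
  std::string text;
};

// Identifiers consist of alphanumerics and underscores.
inline bool IsIdentifierChar(char c)
{
  return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Splits 'input' following the rules described above (simplified, see lead-in).
inline std::vector<SketchToken> SketchTokenize(const std::string& input)
{
  std::vector<SketchToken> tokens;
  std::size_t i = 0;
  while (i < input.size())
  {
    const char c = input[i];
    std::size_t end = i + 1;
    // Default: a non-identifier, which is always exactly one character long.
    SketchTokenType type = SketchTokenType::NonIdentifier;

    if (c == ' ' || c == '\t')
    {
      // White space is spaces and tabs; runs are merged (assumption of this sketch).
      type = SketchTokenType::Whitespace;
      while (end < input.size() && (input[end] == ' ' || input[end] == '\t'))
        ++end;
    }
    else if (IsIdentifierChar(c))
    {
      type = SketchTokenType::Identifier;
      while (end < input.size() && IsIdentifierChar(input[end]))
        ++end;
    }
    else if (c == '\n')
    {
      type = SketchTokenType::Newline;
    }

    assert(end > i); // every iteration consumes at least one character
    tokens.push_back({type, input.substr(i, end - i)});
    i = end;
  }

  // The token stream always ends with an end-of-file token.
  tokens.push_back({SketchTokenType::EndOfFile, ""});
  return tokens;
}
```

Under these assumptions, the input `a++ b_1` yields an identifier, two separate non-identifiers for the two `+` characters, one whitespace token, another identifier and the closing end-of-file token.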
ezTokenizer::ezTokenizer (ezAllocator *pAllocator=nullptr)
Constructor.
Takes an additional optional allocator. If no allocator is given the default allocator will be used.
ezResult ezTokenizer::GetNextLine (ezUInt32 &ref_uiFirstToken, ezHybridArray< const ezToken *, 32 > &ref_tokens) const
Returns an array of tokens that represent the next line in the file.
Returns EZ_SUCCESS when there was more data to return, EZ_FAILURE if the end of the file was already reached. ref_uiFirstToken is the index of the token from which to start. It is updated automatically, so consecutive calls to GetNextLine() with the same ref_uiFirstToken variable return one line after the other.
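The cursor-by-reference contract described here can be sketched without ezEngine. Everything below is a hypothetical stand-in: `SketchStream`, `SketchGetNextLine` and `SketchCountLines` are illustrative names, a `bool` replaces ezResult, and leaving the newline entry out of the returned line is an assumption of this sketch, not documented behavior of ezTokenizer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a token stream: "\n" marks a newline token,
// and the last entry plays the role of the end-of-file token.
using SketchStream = std::vector<std::string>;

// Collects the tokens of the next line into 'lineTokens', advances 'firstToken'
// past that line, and returns false once the end of the stream was reached.
inline bool SketchGetNextLine(const SketchStream& stream, std::size_t& firstToken,
                              std::vector<const std::string*>& lineTokens)
{
  lineTokens.clear();
  assert(!stream.empty()); // a tokenized stream always ends with an end-of-file entry
  const std::size_t eof = stream.size() - 1;
  if (firstToken >= eof)
    return false;

  while (firstToken < eof && stream[firstToken] != "\n")
  {
    lineTokens.push_back(&stream[firstToken]);
    ++firstToken;
  }

  if (firstToken < eof)
    ++firstToken; // skip the newline so the next call starts on the following line

  return true;
}

// Walks the whole stream by passing the same cursor variable to every call.
inline std::size_t SketchCountLines(const SketchStream& stream)
{
  std::size_t cursor = 0;
  std::size_t lines = 0;
  std::vector<const std::string*> lineTokens;
  while (SketchGetNextLine(stream, cursor, lineTokens))
    ++lines;
  return lines;
}
```

The caller owns the cursor and never modifies it between calls, which is what lets a simple `while` loop visit each line exactly once.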
inline void ezTokenizer::SetTreatHashSignAsLineComment (bool bHashSignIsLineComment)
Enables treating lines that start with a '#' character as line comments.
Needs to be set before tokenization to take effect.
void ezTokenizer::Tokenize (ezArrayPtr< const ezUInt8 > data, ezLogInterface *pLog, bool bCopyData=true)
Clears any previous result and creates a new token stream for the given array.
data | The string data to be tokenized.
pLog | A log interface that will receive any tokenization errors.
bCopyData | If true, 'data' is copied into a member variable and tokenization runs on the copy, so the original data storage can be deallocated after this call. If false, tokenization references 'data' directly, so 'data' must outlive this instance.
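The lifetime trade-off behind bCopyData can be sketched with a small standalone class. This is not ezEngine code: `SketchDataHolder` and its members are hypothetical, and a raw pointer plus a count stands in for ezArrayPtr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical holder that mimics the documented bCopyData behavior.
class SketchDataHolder
{
public:
  void SetData(const std::uint8_t* pData, std::size_t count, bool bCopyData = true)
  {
    assert(pData != nullptr || count == 0);
    m_Copy.clear();

    if (bCopyData && count > 0)
    {
      // Own a private copy: the caller may free or modify its buffer afterwards.
      m_Copy.assign(pData, pData + count);
      m_pView = m_Copy.data();
    }
    else
    {
      // Reference the caller's buffer directly: it must outlive this object.
      m_pView = pData;
    }
    m_Count = count;
  }

  const std::uint8_t* View() const { return m_pView; }
  std::size_t Count() const { return m_Count; }

  // Size of the internal copy; stays 0 when the data is only referenced.
  std::size_t CopiedCount() const { return m_Copy.size(); }

private:
  std::vector<std::uint8_t> m_Copy;
  const std::uint8_t* m_pView = nullptr;
  std::size_t m_Count = 0;
};
```

In this sketch the copying path stays valid after the source buffer changes or goes away, while the referencing path avoids the copy at the cost of a lifetime requirement on the caller. The empty internal copy in the second case corresponds to the note that GetTokenizedData() returns an empty array when bCopyData is false.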