ezEngine  Release 25.03
ezTokenizer Class Reference

Takes text and splits it up into ezToken objects. The result can be used for easier parsing.

#include <Tokenizer.h>

Public Member Functions

 ezTokenizer (ezAllocator *pAllocator=nullptr)
 Constructor.
 
void Tokenize (ezArrayPtr< const ezUInt8 > data, ezLogInterface *pLog, bool bCopyData=true)
 Clears any previous result and creates a new token stream for the given array.
 
const ezDeque< ezToken > & GetTokens () const
 Gives read access to the token stream.
 
ezDeque< ezToken > & GetTokens ()
 Gives read and write access to the token stream.
 
void GetAllTokens (ezDynamicArray< const ezToken * > &ref_tokens) const
 Fills ref_tokens with pointers to all tokens in the stream. Use this when working with ezTokenParseUtils.
 
void GetAllLines (ezDynamicArray< const ezToken * > &ref_tokens) const
 Returns an array of all tokens. New line tokens are ignored.
 
ezResult GetNextLine (ezUInt32 &ref_uiFirstToken, ezHybridArray< const ezToken *, 32 > &ref_tokens) const
 Returns an array of tokens that represent the next line in the file.
 
ezResult GetNextLine (ezUInt32 &ref_uiFirstToken, ezHybridArray< ezToken *, 32 > &ref_tokens)
 
const ezArrayPtr< const ezUInt8 > GetTokenizedData () const
 Returns the internal copy of the tokenized data. Will be empty if Tokenize() was called with 'bCopyData' set to 'false'.
 
void SetTreatHashSignAsLineComment (bool bHashSignIsLineComment)
 Enables treating lines that start with # character as line comments.
 

Detailed Description

Takes text and splits it up into ezToken objects. The result can be used for easier parsing.

The tokenizer is built to work on code that is similar to C. That means it will tokenize comments and strings as they are defined in the C language. Also, a line break that directly follows a backslash is not treated as a line break.
White space is defined as spaces and tabs.
Identifiers are names that consist of alphanumerics and underscores.
Non-identifiers are everything else. However, they currently never consist of more than a single character, i.e. '++' is tokenized as two consecutive non-identifiers.
Parentheses etc. are not tokenized in any special way; they are all considered non-identifiers.

The token stream will always end with an end-of-file token.
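
The classification rules above can be illustrated with a small standalone sketch. It does not use ezTokenizer or ezToken; `SketchKind`, `SketchToken` and `sketchTokenize` are hypothetical names invented for this example, and comments, strings and backslash handling are left out on purpose. It only shows how white space runs, identifier runs and single-character non-identifiers could be separated, followed by a final end-of-file token.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch; these are not ezEngine's token types.
enum class SketchKind { Whitespace, Identifier, NonIdentifier, Newline, EndOfFile };

struct SketchToken
{
  SketchKind kind;
  std::string text;
};

// True for characters that may appear in an identifier (alphanumerics and underscores).
inline bool isIdentifierChar(char c)
{
  return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Splits 'input' into runs of spaces/tabs, runs of identifier characters,
// newline tokens and single-character non-identifiers.
inline std::vector<SketchToken> sketchTokenize(const std::string& input)
{
  std::vector<SketchToken> tokens;
  std::size_t i = 0;
  while (i < input.size())
  {
    const char c = input[i];
    std::size_t end = i + 1;
    SketchKind kind = SketchKind::NonIdentifier; // default: exactly one character

    if (c == ' ' || c == '\t')
    {
      kind = SketchKind::Whitespace;
      while (end < input.size() && (input[end] == ' ' || input[end] == '\t'))
        ++end;
    }
    else if (isIdentifierChar(c))
    {
      kind = SketchKind::Identifier;
      while (end < input.size() && isIdentifierChar(input[end]))
        ++end;
    }
    else if (c == '\n')
    {
      kind = SketchKind::Newline;
    }

    assert(end > i); // every iteration consumes at least one character
    tokens.push_back({kind, input.substr(i, end - i)});
    i = end;
  }

  tokens.push_back({SketchKind::EndOfFile, ""}); // the stream always ends with end-of-file
  return tokens;
}
```

With this sketch, the input `i++` yields one identifier, two separate `+` non-identifiers and the end-of-file token, which matches the '++' example in the description.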

Constructor & Destructor Documentation

◆ ezTokenizer()

ezTokenizer::ezTokenizer ( ezAllocator * pAllocator = nullptr)

Constructor.

Takes an optional allocator. If no allocator is given, the default allocator is used.

Member Function Documentation

◆ GetNextLine()

ezResult ezTokenizer::GetNextLine ( ezUInt32 &  ref_uiFirstToken,
ezHybridArray< const ezToken *, 32 > &  ref_tokens 
) const

Returns an array of tokens that represent the next line in the file.

Returns EZ_SUCCESS when there was more data to return, EZ_FAILURE if the end of the file was already reached. ref_uiFirstToken is the index of the token at which to start. It is updated automatically, so consecutive calls to GetNextLine() with the same ref_uiFirstToken variable return one line after the other.

Note
This function takes care of handling the 'backslash/newline' combination, as defined in the C language. That means all such sequences will be ignored. Therefore the tokens that are returned as one line might not contain all tokens that are actually in the stream. Also the tokens might have different line numbers, when two or more lines from the file are merged into one logical line.
Todo:
Theoretically, if the line ends with an identifier, and the next directly starts with one again,
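
The index-advancing behavior and the 'backslash/newline' merging described for GetNextLine() can be sketched without ezEngine as follows. Everything here is hypothetical: `SketchStream` models the token stream as plain strings, `"\n"` stands for a newline token, `"\\"` for a backslash token, and the last entry stands for the end-of-file token. Whether the real function includes the newline token in the returned line is not stated above; this sketch assumes it does not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a token stream; the last entry plays the role of end-of-file.
using SketchStream = std::vector<std::string>;

// Collects one logical line starting at 'firstToken' and advances 'firstToken' past it.
// Returns false once only the end-of-file entry is left.
inline bool sketchNextLine(const SketchStream& stream, std::size_t& firstToken,
                           std::vector<std::string>& line)
{
  line.clear();
  assert(!stream.empty()); // the stream is expected to end with an end-of-file entry
  const std::size_t eof = stream.size() - 1;
  if (firstToken >= eof)
    return false;

  while (firstToken < eof)
  {
    const std::string& tok = stream[firstToken];

    // A backslash directly followed by a newline joins two physical lines: skip both.
    if (tok == "\\" && firstToken + 1 < eof && stream[firstToken + 1] == "\n")
    {
      firstToken += 2;
      continue;
    }

    ++firstToken;
    if (tok == "\n")
      break; // end of the logical line (assumption: the newline itself is not returned)

    line.push_back(tok);
  }
  return true;
}
```

Calling `sketchNextLine` repeatedly with the same index variable walks the stream one logical line at a time, and a line continued with a backslash comes back as a single line without the backslash and newline entries.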

◆ SetTreatHashSignAsLineComment()

void ezTokenizer::SetTreatHashSignAsLineComment ( bool  bHashSignIsLineComment)
inline

Enables treating lines that start with # character as line comments.

Needs to be set before tokenization to take effect.
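
Why the setting has to be applied before tokenization can be shown with a standalone sketch. `SketchLineTokenizer` and `SketchLine` are hypothetical and are not part of ezEngine. The sketch assumes that the flag defaults to off and that only a '#' in the very first column counts; neither detail is stated above. The point is only that the flag is read while the result is built, so changing it afterwards does not alter an existing result.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-line result: the text and whether it was classified as a comment.
struct SketchLine
{
  std::string text;
  bool isComment;
};

// Hypothetical miniature of the "configure first, then tokenize" pattern.
class SketchLineTokenizer
{
public:
  void setTreatHashSignAsLineComment(bool enable) { m_hashIsLineComment = enable; }

  // Classifies every input line using the flag value at the time of the call.
  void tokenize(const std::vector<std::string>& input)
  {
    m_lines.clear();
    for (const std::string& text : input)
    {
      const bool isComment = m_hashIsLineComment && !text.empty() && text[0] == '#';
      m_lines.push_back({text, isComment});
    }
  }

  const std::vector<SketchLine>& lines() const { return m_lines; }

private:
  bool m_hashIsLineComment = false; // assumption: off by default
  std::vector<SketchLine> m_lines;
};
```

In this sketch, enabling the flag after `tokenize()` leaves the stored lines unchanged until `tokenize()` is called again.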

◆ Tokenize()

void ezTokenizer::Tokenize ( ezArrayPtr< const ezUInt8 >  data,
ezLogInterface *  pLog,
bool  bCopyData = true 
)

Clears any previous result and creates a new token stream for the given array.

Parameters
data       The string data to be tokenized.
pLog       A log interface that will receive any tokenization errors.
bCopyData  If set, 'data' will be copied into a member variable and tokenization is run on the copy, allowing for the original data storage to be deallocated after this call. If false, tokenization will reference 'data' directly and thus, 'data' must outlive this instance.
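
The lifetime consequence of 'bCopyData' can be sketched with a standalone class. `SketchSource` is hypothetical and only mimics the two storage modes described above: owning a private copy versus borrowing the caller's buffer. It is not how ezTokenizer is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical holder that either copies the input bytes or borrows them.
class SketchSource
{
public:
  void set(const std::uint8_t* data, std::size_t size, bool copyData)
  {
    m_copy.clear();
    m_owns = copyData;
    m_size = size;
    if (copyData)
    {
      m_copy.assign(data, data + size); // private copy: the caller may free 'data' afterwards
      m_borrowed = nullptr;
    }
    else
    {
      m_borrowed = data; // borrowed: the caller must keep 'data' alive
    }
  }

  // The bytes that would be tokenized: the copy if one was made, else the borrowed buffer.
  const std::uint8_t* data() const { return m_owns ? m_copy.data() : m_borrowed; }
  std::size_t size() const { return m_size; }

  // Mirrors the documented behavior of GetTokenizedData(): empty unless a copy was made.
  std::size_t copiedSize() const { return m_copy.size(); }

private:
  std::vector<std::uint8_t> m_copy;
  const std::uint8_t* m_borrowed = nullptr;
  std::size_t m_size = 0;
  bool m_owns = false;
};
```

With `copyData` set, the original buffer can be released right after the call. Without it, the holder points into the caller's buffer and reports no internal copy.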

The documentation for this class was generated from the following files: