InDesign SDK  20.5
UnicodeSavvyString Class Reference

#include <UnicodeSavvyString.h>

Inheritance diagram for UnicodeSavvyString:
PMString, WideString

Public Types

typedef int32 size_type
 
typedef std::ptrdiff_t difference_type
 
typedef UTF16TextChar code_value
 
typedef UTF32TextChar code_point
 
typedef code_value * code_value_iterator
 
typedef code_value const * const_code_value_iterator
 
typedef const UnicodeSavvyString & const_reference
 
typedef UTF16TextChar value_type
 

Public Member Functions

bool16 HasMultiWordUnicode () const
 
size_type CharCount () const
 
size_type NumUTF16TextChars () const
 
size_type capacity (void) const
 
void reserve (size_type newCapacity)
 
void resize (size_type newSize, code_value fill=code_value())
 
void clear ()
 
const UTF16TextChar * GrabUTF16Buffer (int32 *numUTF16s) const
 
int32 CodePointIndexToUTF16Index (int32 index) const
 
void Truncate (CharCounter count)
 
void Remove (int32 position, CharCounter count)
 
UTF32TextChar GetUTF32TextChar (int32 pos) const
 
const_code_value_iterator begin () const
 
const_code_value_iterator end () const
 

Protected Types

enum  { kMaxSmallString = 23 }
 

Protected Member Functions

 UnicodeSavvyString (adobe::move_from< UnicodeSavvyString > other)
 
 UnicodeSavvyString (UnicodeSavvyString &&other) noexcept
 
void move_from (UnicodeSavvyString &other) noexcept
 
template<class IteratorType >
 UnicodeSavvyString (IteratorType b, IteratorType e, size_type nCodePoints=0)
 
int32 CountChars () const
 
int32 CountCharsUtil (const UTF16TextChar *buffer, int32 bufferLength) const
 
void InsertGap (uint32 wordWiseIndex, size_type numberOfSpaces)
 
void RemoveGap (uint32 wordWiseIndex, size_type numberOfSpaces)
 
void InsertUTF32TextChar (UTF32TextChar c, int32 pos=0)
 
void InsertUTF16String (const UTF16TextChar *buf, int32 len, int32 position=0)
 
void AppendUTF32TextChar (UTF32TextChar c32)
 
void CopyFrom (const UnicodeSavvyString &other)
 
bool16 operator== (const UnicodeSavvyString &s) const
 
template<class IteratorType >
UnicodeSavvyString & assign (IteratorType b, IteratorType e, size_type nCodePoints=0)
 
UnicodeSavvyString & replace (size_type pos, size_type n1, code_value const *s, size_type n2)
 
UnicodeSavvyString & append (code_value const *s, size_type nCodeValues, size_type nCodePoints=0)
 
UTF32TextChar surro_GetUTF32TextChar (int32 pos) const
 
const UTF16TextChar * ConstBuffer () const
 
void insert_safe (code_value_iterator i, const_code_value_iterator sb, const_code_value_iterator se)
 
void erase_safe (code_value_iterator b, code_value_iterator e)
 
void replace_safe (code_value_iterator b, code_value_iterator e, const_code_value_iterator sb, const_code_value_iterator se)
 
template<class InputIterator >
void assign_impl (InputIterator b, InputIterator e, size_type nCodePoints, std::input_iterator_tag)
 
template<class FwdIterator >
void assign_impl (FwdIterator b, FwdIterator e, size_type nCodePoints, std::forward_iterator_tag)
 
bool16 UnicodeBufferIsValid () const
 
UTF16TextChar * GetBufferForWriting (size_type size)
 

Protected Attributes

StringStoragefStorage
 
UTF16TextChar fSmallStorage [kMaxSmallString+1]
 
size_type fUTF16BufferLength
 
size_type fNumChars
 

Friends

void swap (UnicodeSavvyString &lhs, UnicodeSavvyString &rhs) noexcept
 

Detailed Description

This is a base class that handles UTF16 code values. It is really important that the users of this class understand the distinction between a code value and a code point. A code point (or a Unicode character) can be stored as one or more code values. In the UTF16 encoding you can have one or two code values for each code point. When a character has two code values, those are called surrogates. Most of the functions of this class work on code values, not code points.
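
The distinction between code values and code points can be illustrated outside the SDK. The following is a minimal sketch using std::u16string from standard C++, not the class's actual implementation; the helper names is_high_surrogate, is_low_surrogate and count_code_points are hypothetical.

```cpp
#include <cstddef>
#include <string>

// True if u is a lead (high) surrogate, the first half of a surrogate pair.
inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// True if u is a trail (low) surrogate, the second half of a surrogate pair.
inline bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Counts code points in a UTF-16 string: a well-formed surrogate pair counts once,
// every other code value counts as one code point.
inline std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i, ++n) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() && is_low_surrogate(s[i + 1])) {
            ++i;  // skip the trail surrogate of the pair
        }
    }
    return n;
}
```

For the string u"a\U0001F600b", s.size() reports four code values while count_code_points(s) reports three code points, which mirrors the difference between NumUTF16TextChars() and CharCount().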

Constructor & Destructor Documentation

UnicodeSavvyString::UnicodeSavvyString (adobe::move_from< UnicodeSavvyString > other)
inline protected

Move constructor. Assumes ownership of the remote part of other.

template<class IteratorType >
UnicodeSavvyString::UnicodeSavvyString (IteratorType b,
IteratorType e,
size_type nCodePoints = 0 
)
inline protected

Constructs the string using a range of code values [b, e). The code values in the range need to be UTF16 encoded.

Parameters
b[IN] - beginning of the range.
e[IN] - end of the range (one past last one).
nCodePoints[IN, OPTIONAL] - number of code points in the range. This parameter can be used for optimization purposes, if the caller knows the number of code points represented in the range.

Member Function Documentation

UnicodeSavvyString& UnicodeSavvyString::append (code_value const * s,
size_type nCodeValues,
size_type nCodePoints = 0 
)
protected

Appends the code values from the C-array s at the end of the current string.

Parameters
s[IN] - C-array of code values that will be added to this string.
nCodeValues[IN] - number of code values to be added.
nCodePoints[IN, OPTIONAL] - number of code points that nCodeValues represent. This can be used for optimization purposes if the caller knows how many code points are added.
Returns
reference to this string.
template<class IteratorType >
UnicodeSavvyString & UnicodeSavvyString::assign (IteratorType b,
IteratorType e,
size_type nCodePoints = 0 
)
inline protected

Assigns to the string the code values in the specified range [b, e). The code values in the range need to be UTF16 encoded.

Parameters
b[IN] - beginning of the range.
e[IN] - end of the range (one past last one).
nCodePoints[IN, OPTIONAL] - number of code points in the range. This parameter can be used for optimization purposes, if the caller knows the number of code points represented in the range.
const_code_value_iterator UnicodeSavvyString::begin () const
inline

Returns a const iterator for the beginning of the storage of the string. The iterator works only over code values and it is agnostic of code points.

size_type UnicodeSavvyString::capacity (void ) const
inline

Retrieves the number of UTF16 code values that fit in the string without reallocating. A Unicode code value is not the same as a Unicode code point (Unicode character). Beware of Unicode code points that span two code values (surrogates)!

Returns
current capacity in code values that the string can hold.
size_type UnicodeSavvyString::CharCount () const
inline

Retrieves the number of code points stored in this string. The number of code points can be different from the number of code values if surrogates are present.

Returns
number of unicode code points.
void UnicodeSavvyString::clear ()

Erases the string making it empty. Capacity stays the same.

See Also
reserve, capacity
int32 UnicodeSavvyString::CodePointIndexToUTF16Index (int32 index) const
inline

Converts a code point index to a code value index inside the string.

Parameters
index[IN] - zero based index of the code point.
Returns
the code value index where the code point starts in the UTF16 buffer.
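
The mapping from a code point index to a code value index can be sketched as a linear walk over the buffer. This is an illustration in standard C++, not the SDK's implementation; the function name is hypothetical, and returning the string length for an out-of-range index is an assumption of this sketch, since the behavior for that case is not documented here.

```cpp
#include <cstddef>
#include <string>

// Returns the code value index at which the code point with the given zero-based
// index starts. Returns s.size() if cpIndex is at or past the end (an assumption
// of this sketch).
inline std::size_t code_point_to_code_value_index(const std::u16string& s,
                                                  std::size_t cpIndex) {
    std::size_t i = 0;
    while (i < s.size() && cpIndex > 0) {
        const bool lead = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        const bool pair = lead && i + 1 < s.size() &&
                          s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        i += pair ? 2 : 1;  // a surrogate pair occupies two code values
        --cpIndex;
    }
    return i;
}
```

The two indices diverge only after the first surrogate pair: in u"a\U0001F600b", code point 2 ('b') starts at code value 3.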
const_code_value_iterator UnicodeSavvyString::end () const
inline

Returns a const iterator for the end of the storage of the string. The iterator works only over code values and it is agnostic of code points.

UTF32TextChar UnicodeSavvyString::GetUTF32TextChar (int32 pos) const
inline

Retrieves the unicode code point at the specified position.

Parameters
pos[IN] - position (in code points) where the character is.
Returns
the unicode character.
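
Retrieving a code point by position involves two steps: walking to the position in code point units, then combining a surrogate pair into a single UTF-32 value if one starts there. The sketch below shows both steps in standard C++. The name code_point_at is hypothetical, and returning 0 for an out-of-range position and returning unpaired surrogates unchanged are assumptions of this sketch, not documented SDK behavior.

```cpp
#include <cstddef>
#include <string>

// Returns the code point at code point position pos.
// Assumptions of this sketch: out-of-range positions yield 0, and an unpaired
// surrogate is returned as-is.
inline char32_t code_point_at(const std::u16string& s, std::size_t pos) {
    std::size_t i = 0;
    for (;;) {
        if (i >= s.size()) return 0;
        char32_t cp = s[i];
        std::size_t len = 1;
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            // Standard UTF-16 decoding: 10 bits from the lead, 10 bits from the trail.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            len = 2;
        }
        if (pos == 0) return cp;
        --pos;
        i += len;
    }
}
```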
const UTF16TextChar * UnicodeSavvyString::GrabUTF16Buffer (int32 * numUTF16s) const
inline

Retrieves a pointer to a null-terminated, UTF16 encoded representation of the string. This function is analogous to std::string::c_str().

Parameters
numUTF16s[OUT, OPTIONAL] - if the pointer is not nil, on return the function sets the pointed-to value to the number of code values in the buffer.
Returns
a pointer to a null-terminated buffer of code values. This pointer can (and will) change after a non-const method is called on the string.
bool16 UnicodeSavvyString::HasMultiWordUnicode () const
inline

Checks if the string has surrogates.

Returns
true if the string has surrogate pairs.
void UnicodeSavvyString::move_from (UnicodeSavvyString & other)
inline protected noexcept

Moves the data from other into this, leaving other in a destructible state.

size_type UnicodeSavvyString::NumUTF16TextChars () const
inline

Retrieves the number of code values present in this string. The number of code points (characters) can be smaller than this number if surrogates are present.

bool16 UnicodeSavvyString::operator== (const UnicodeSavvyString & s) const
protected

Equality check for two strings.

Parameters
s[IN] - other string to compare with.
Returns
kTrue if the strings are equal, kFalse otherwise.
void UnicodeSavvyString::Remove (int32 position,
CharCounter count 
)

Removes the specified number of code points starting at position.

Parameters
position[IN] - index of code point from where the removal should start.
count[IN] - number of code points to remove. If the value of count is kMaxInt32 the function will remove all the code points after position.
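
Removal in code point units means both the start and the length have to be translated to code value offsets before anything is erased. A minimal sketch in standard C++ follows. The names next_code_point and remove_code_points are hypothetical, and a count that runs past the end simply removes the rest of the string, which stands in for the kMaxInt32 case described above.

```cpp
#include <cstddef>
#include <string>

// Advances i past one code point (two code values for a well-formed surrogate pair).
// Precondition: i < s.size().
inline std::size_t next_code_point(const std::u16string& s, std::size_t i) {
    const bool pair = s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < s.size() &&
                      s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
    return i + (pair ? 2 : 1);
}

// Removes count code points starting at code point index position.
// A count that runs past the end removes everything from position onward.
inline void remove_code_points(std::u16string& s, std::size_t position,
                               std::size_t count) {
    std::size_t b = 0;
    while (b < s.size() && position > 0) { b = next_code_point(s, b); --position; }
    std::size_t e = b;
    while (e < s.size() && count > 0) { e = next_code_point(s, e); --count; }
    s.erase(b, e - b);  // erase the code value range [b, e)
}
```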
UnicodeSavvyString& UnicodeSavvyString::replace (size_type pos,
size_type n1,
code_value const * s,
size_type n2 
)
protected

Replaces the code values in range [pos, pos + n1) with n2 code values from the C-array s. WARNING: This function operates on CODE VALUES only. It doesn't know anything about surrogate pairs. It is the caller's responsibility to make sure that the replacement leaves the string in a consistent state. The function grows the string if necessary to accommodate the replacement string.

Parameters
pos[IN] - index of the code value from where the replacement starts.
n1[IN] - number of code values to be replaced.
s[IN] - C-array of code values that will replace the existing code values in the string. Needs to have at least n2 code values in it.
n2[IN] - length of the replacement sequence.
Returns
reference to this string.
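
The warning about surrogate pairs can be demonstrated with std::u16string::replace, which takes the same (pos, n1, s, n2) arguments and is equally unaware of code points. The validator below is a hypothetical helper written for this illustration; it is not the SDK's UnicodeBufferIsValid().

```cpp
#include <cstddef>
#include <string>

// Returns true if every surrogate in s is part of a well-formed lead/trail pair.
inline bool is_well_formed_utf16(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            // A lead surrogate must be followed by a trail surrogate.
            if (i + 1 >= s.size() || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF) {
                return false;
            }
            ++i;  // skip the trail surrogate that was just validated
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return false;  // trail surrogate with no preceding lead
        }
    }
    return true;
}
```

Given s = u"a\U0001F600b", the call s.replace(1, 1, u"x", 1) overwrites only the lead surrogate and leaves an orphaned trail surrogate behind, so is_well_formed_utf16(s) becomes false. Replacing both code values of the pair, s.replace(1, 2, u"x", 1), keeps the string consistent.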
void UnicodeSavvyString::reserve (size_type newCapacity)

Reserves internal memory for at least newCapacity UTF16 code values. If newCapacity is smaller than the current capacity, the call is taken as a nonbinding request to shrink the capacity. The capacity is never reduced below the current number of code values in the string (a call to reserve() doesn't modify the number of code values in the string). Each reallocation invalidates all references, pointers and iterators, and it carries a cost, so a preemptive call to reserve() is useful to increase speed and to avoid invalidating references and iterators.

Parameters
newCapacity[IN] - the minimum capacity that the string should have.
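
The benefit of a preemptive reserve() can be sketched with std::u16string, whose reserve() and capacity() follow the same general contract. This is an analogy in standard C++, not SDK code, and the function name is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Appends n copies of ch after reserving room up front, and reports whether the
// buffer address stayed the same during the appends (no reallocation happened).
inline bool append_without_reallocation(std::u16string& s, std::size_t n,
                                        char16_t ch) {
    s.reserve(s.size() + n);  // at most one allocation, before the loop
    const char16_t* before = s.data();
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back(ch);  // stays within the reserved capacity
    }
    return s.data() == before;
}
```

Without the reserve() call, the loop may reallocate several times as the string grows, and any pointer or iterator obtained before the loop would have to be treated as invalid.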
void UnicodeSavvyString::resize (size_type newSize,
code_value fill = code_value() 
)

Changes the number of code values of *this to newSize. If newSize is bigger than current size, new code values initialized with the fill value are appended to the string. If fill parameter is not specified, the default constructor for code_value is used ('\0'). If newSize is smaller, code values are removed from the end of the string. Calling resize(0) has the same effect as clearing the string.

Parameters
newSize[IN] - the new size of the string.
fill[IN] - the fill value for new code values if size increases.
void UnicodeSavvyString::Truncate (CharCounter count)

Truncates the string so it contains the specified number of code points.

Parameters
count[IN] - the desired number of code points the string should contain.
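
Truncate() counts code points, whereas resize() counts code values, so the two can cut a string at different places. The sketch below illustrates the code point variant in standard C++; truncate_code_points is a hypothetical name, and the sketch assumes well-formed UTF-16 input.

```cpp
#include <cstddef>
#include <string>

// Truncates s to at most count code points without splitting a surrogate pair.
inline void truncate_code_points(std::u16string& s, std::size_t count) {
    std::size_t i = 0;
    while (i < s.size() && count > 0) {
        const bool pair = s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < s.size() &&
                          s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        i += pair ? 2 : 1;
        --count;
    }
    s.resize(i);  // i lies on a code point boundary for well-formed input
}
```

For u"a\U0001F600b", truncating to two code points keeps three code values ('a' plus the whole pair). A code value resize(2) on the same string would keep 'a' and a dangling lead surrogate.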

Friends And Related Function Documentation

void swap (UnicodeSavvyString & lhs,
UnicodeSavvyString & rhs 
)
friend

Swaps this object with another one. swap() should never throw. The swap idiom is used to efficiently exchange two objects. It is important to define swap() for your own data structures: classes that contain them as data members can then implement their own swap() and a correct assignment operator.

Parameters
rhs[IN/OUT] - the other object.
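
How a non-throwing swap() lets a containing class implement a correct assignment operator can be sketched with the copy-and-swap idiom. The Label class below is hypothetical and uses std::u16string as a stand-in for a swappable string member; it is an illustration of the idiom, not code from the SDK.

```cpp
#include <string>
#include <utility>

// Hypothetical client type that holds a swappable string as a data member.
class Label {
public:
    Label() = default;
    explicit Label(std::u16string text) : fText(std::move(text)) {}
    Label(const Label&) = default;
    Label(Label&&) noexcept = default;

    // Copy-and-swap: the by-value parameter is the copy (or the moved-from
    // source), and the non-throwing swap commits it. If making the copy throws,
    // *this is left untouched.
    Label& operator=(Label other) noexcept {
        swap(*this, other);
        return *this;
    }

    friend void swap(Label& lhs, Label& rhs) noexcept {
        using std::swap;             // allow ADL, fall back to std::swap
        swap(lhs.fText, rhs.fText);  // member-wise swap
    }

    const std::u16string& text() const { return fText; }

private:
    std::u16string fText;
};
```

Because the assignment operator is built entirely from the copy constructor and swap(), it needs no self-assignment check and offers the strong exception guarantee.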