1. Format Functions
The expression fmt(arg) returns a printable object that contains a copy of, or a reference to, arg, as well as format information that can be edited with the member functions listed in the tables below.
auto str = strf::to_string( +*strf::fmt(1.0).sci().fill(U'~') ^ 15 );
assert(str == "~~~~+1.e+00~~~~");
| Member function | Effect |
|---|---|
| | Aligns to the left (or to the right in right-to-left (RTL) scripts) |
| | Aligns to the right (or to the left in right-to-left (RTL) scripts) |
| | Center alignment |
| | Splits the content, as in |
| | Sets the fill character. |
| Member function | Effect |
|---|---|
| | Sets the precision. For integers and floating points. |
| | Uses the binary base. For integers only. |
| | Uses the octal base. For integers only. |
| | Uses the decimal base. For integers only. |
| | Uses the hexadecimal base. For integers and floating points. |
| | Prints base indication on integers. Prints decimal point on floating points. |
| | Prints |
| | Uses fixed notation. For floating points only. |
| | Uses scientific notation. For floating points only. |
| | Uses "general" notation. For floating points only. |
| Member function | Effect |
|---|---|
| | Prints the argument |
| Member function | Effect |
|---|---|
| | Sets string precision |
| | Transcodes the input string from the character encoding represented by |
| | Transcodes the input string from the character encoding represented by |
| | Equivalent to |
| | Equivalent to |
| | Transcodes the input string from the character encoding that corresponds to its character type, or just sanitizes it if that encoding is the same as the destination encoding. |
| | Transcodes the input string from the character encoding that corresponds to its character type, if it is not already the same as the destination encoding. |
The library also provides some global function templates that work as aliases for format functions:
Expression | Equivalent Expression |
---|---|
auto str = strf::to_string( +*strf::center(1.0, 9, U'~') );
assert(str == "~~~+1.~~~");
2. Destinations
Expression | Header |
---|---|
where:
- CharT is a character type.
- Traits is a CharTraits type.
- A is an Allocator type.
- char_ptr is a CharT* value, where CharT is a character type.
- end is a CharT* value, where CharT is a character type.
- count is a std::size_t value.
- streambuf_ptr is a std::basic_streambuf<CharT, Traits>*.
- streambuf_ref is a std::basic_streambuf<CharT, Traits>&.
- cfile is a FILE*.
- outbuff_ref is a basic_outbuff<CharT>&, where CharT is a character type.
- args... is an argument list of printable values.
strf::to(outbuff_ref) (args...)
Return type | |
Return value | |
Supports reserve | No |
See the list of types that derive from basic_outbuff.

Header file | |
Preconditions | |
Return type | |
Return value | a value |
Note | The termination character |
Supports reserve | No |
strf::to_basic_string <CharT, Traits(opt), A(opt)> ( args... )
Return type | std::basic_string<CharT, Traits, A> |
Supports reserve | Yes |
strf::to_string ( args... )
Return type | std::string |
Supports reserve | Yes |
strf::to_u8string ( args... )
Return type | std::u8string |
Supports reserve | Yes |
strf::to_u16string ( args... )
Return type | std::u16string |
Supports reserve | Yes |
strf::to_u32string ( args... )
Return type | std::u32string |
Supports reserve | Yes |
strf::to_wstring ( args... )
Return type | std::wstring |
Supports reserve | Yes |
Return type | |
Return value | A value |
Supports reserve | No |
to<CharT(opt)>(cfile) (args...)
- Effect: Successively calls std::fwrite(buffer, sizeof(CharT), /*...*/, cfile) until the whole content is written or until an error happens, where buffer is an internal array of CharT.
Return type | |
Return value | |
Supports reserve | No |
wto(cfile) (args...)
Header file | |
Return type | |
3. Derivates of basic_outbuff
The table below lists the concrete types that derive from the basic_outbuff<CharT> abstract class.
| Type | Description |
|---|---|
| | Writes C strings |
| | Discards content |
| | Appends to |
| | Creates |
| | Creates |
| | Writes to |
| | Writes to |
| | Writes to |
where:
- CharT is a character type.
- Traits is a CharTraits type.
- A is an Allocator type.
4. Tr-string
auto s = strf::to_string.tr("{} in hexadecimal is {}", x, strf::hex(x));
The tr-string is what other formatting libraries would call the format string, except that it does not specify any formatting. Its purpose is to enable your program to provide multilingual support by using translation tools like gettext.
Since the person who writes the string is often not the person who translates it, the tr-string syntax allows the insertion of comments.
| A '{' followed by | until | means |
|---|---|---|
| '-' | the next '}' | a comment |
| a digit | the next '}' | a positional argument reference |
| another '{' | the second '{' | an escaped '{' |
| any other character | the next '}' | a non-positional argument reference |
- Comments

    auto str = strf::to_string.tr
        ( "You can learn more about python{-the programming language, not the animal species} at {}"
        , "www.python.org" );
    assert(str == "You can learn more about python at www.python.org");

- Positional arguments

  Position zero refers to the first input argument. The characters after the digits are ignored, so they can also be used as comments.

    auto str = strf::to_string.tr("{1 a person} likes {0 a food type}.", "sandwich", "Paul");
    assert(str == "Paul likes sandwich.");

- Non-positional arguments

  The characters after the '{' are ignored as well.

    auto str = strf::to_string.tr("{a person} likes {a food type}.", "Paul", "sandwich");
    assert(str == "Paul likes sandwich.");

- Escapes

  Note that there is no way and no need to escape the '}' character, since it has special meaning only when it corresponds to a previous '{'.

    auto str = strf::to_string.tr("} {{x} {{{} {{{}}", "aaa", "bbb");
    assert(str == "} {x} {aaa {bbb}");
Tr-string error handling
When the argument associated with a "{" does not exist, the library does two things:
- It prints a replacement character "\uFFFD" (�), or "?" when the encoding can't represent it, where the missing argument would be printed.
- It calls the handle function on the facet object corresponding to the tr_error_notifier_c category, which, by default, does nothing.
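The tr-string rules above can be sketched in standard C++. This is only an illustration, not Strf's implementation: expand_tr is a hypothetical name, the arguments are assumed to be already rendered as strings, the replacement character is hard-coded as UTF-8, and two details are assumptions of this sketch: an unterminated '{' is treated as running to the end of the string, and positional references do not advance the counter used by non-positional ones.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, not part of Strf: expands a tr-string over arguments
// that were already converted to strings.
std::string expand_tr(const std::string& tr, const std::vector<std::string>& args)
{
    std::string out;
    std::size_t next_arg = 0;  // consumed by non-positional references only (assumption)
    std::size_t i = 0;
    while (i < tr.size()) {
        if (tr[i] != '{') {                            // ordinary text, including any '}'
            out += tr[i++];
            continue;
        }
        if (i + 1 < tr.size() && tr[i + 1] == '{') {   // "{{" is an escaped '{'
            out += '{';
            i += 2;
            continue;
        }
        // Everything up to the next '}' belongs to this placeholder.
        const std::size_t close = tr.find('}', i + 1);
        const std::size_t end = (close == std::string::npos) ? tr.size() : close;
        const std::string body = tr.substr(i + 1, end - (i + 1));
        i = (close == std::string::npos) ? tr.size() : close + 1;

        if (!body.empty() && body[0] == '-') {
            continue;                                  // "{-...}" is a comment
        }
        std::size_t index = 0;
        if (!body.empty() && std::isdigit(static_cast<unsigned char>(body[0]))) {
            // Positional reference: read the digits, ignore what follows them.
            for (std::size_t k = 0; k < body.size()
                     && std::isdigit(static_cast<unsigned char>(body[k])); ++k) {
                index = index * 10 + static_cast<std::size_t>(body[k] - '0');
            }
        } else {
            index = next_arg++;                        // non-positional reference
        }
        // A missing argument is replaced by U+FFFD (UTF-8 output assumed).
        out += (index < args.size()) ? args[index] : std::string("\xEF\xBF\xBD");
    }
    return out;
}
```

The sketch does not model the tr_error_notifier_c callback; a fuller version would invoke a user-supplied handler at the point where the replacement character is appended.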
5. Facet Categories
| Category | Constrainable | Description |
|---|---|---|
| width_calculator_c | Yes | Defines how the width is calculated |
| | Yes | Numeric punctuation for decimal base |
| | Yes | Numeric punctuation for hexadecimal base |
| | Yes | Numeric punctuation for octal base |
| | Yes | Numeric punctuation for binary base |
| | Yes | Letter case for printing numeric and boolean values |
| | No | The character encoding corresponding to character type |
| | Yes | Callback to notify character encoding nonconformities. |
| | Yes | Whether surrogates are treated as errors |
| tr_error_notifier_c | No | Callback to notify errors on the tr-string |
6. Numeric punctuation
The numpunct class template defines punctuation for integers, void* and floating points. It comprises the "thousands" separator, the decimal point and the grouping pattern.
The integer sequence passed to the constructor defines the grouping, starting from the least significant group. The last group is repeated, unless you add the -1 argument:
auto str1 = strf::to_string.with(strf::numpunct<10>(1, 2, 3))(1000000000000ll);
assert(str1 == "1,000,000,000,00,0");
auto str2 = strf::to_string.with(strf::numpunct<10>(1, 2, 3, -1))(1000000000000ll);
assert(str2 == "1000000,000,00,0");
This numpunct constructor has some preconditions:
- No more than six arguments can be passed.
- No argument can be greater than 30.
- No argument can be less than 1, unless it is the last argument and it is equal to -1.
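To make the grouping rule concrete, here is a minimal sketch in standard C++ that reproduces the two results shown earlier. It is not how Strf implements numpunct: group_digits is a hypothetical helper that works on a plain digit string, and it assumes the group sizes already satisfy the preconditions listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Strf. `groups` lists the group sizes
// starting from the least significant digits; the last size repeats unless
// the list ends with -1, which means "no further separators".
std::string group_digits(const std::string& digits,
                         const std::vector<int>& groups,
                         char separator = ',')
{
    std::string reversed;                               // built right to left
    std::size_t g = 0;                                  // index of the current group
    int remaining = groups.empty() ? -1 : groups[0];    // -1: unlimited group
    for (auto it = digits.rbegin(); it != digits.rend(); ++it) {
        if (remaining == 0) {                           // current group is full
            reversed += separator;
            if (g + 1 < groups.size()) {
                ++g;                                    // otherwise repeat the last group
            }
            remaining = groups[g];
        }
        reversed += *it;
        if (remaining > 0) {
            --remaining;
        }
    }
    return std::string(reversed.rbegin(), reversed.rend());
}
```

With an empty group list the function never inserts a separator, which mirrors the behaviour of a default-constructed numpunct.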
When default constructed, the numpunct has no grouping, i.e. the thousands separator is never printed.
The default thousands separator and decimal point are U',' and U'.', respectively. To change them, use the thousands_sep and decimal_point member functions:
auto my_punct = strf::numpunct<10>(3).thousands_sep(U'\'').decimal_point(U':');
auto str = strf::to_string.with(my_punct)(1000000.5);
assert(str == "1'000'000:5");
// or as lvalue:
auto my_punct2 = strf::numpunct<10>(3);
my_punct2.thousands_sep(U';');
my_punct2.decimal_point(U'^');
auto str2 = strf::to_string.with(my_punct2)(1000000.5);
assert(str2 == "1;000;000^5");
Numeric punctuation from locale
The header file <strf/locale.hpp> declares the locale_numpunct function, which returns a numpunct<10> object that reflects the current locale:
#include <cassert>
#include <clocale>
#include <strf/locale.hpp>
#include <strf/to_string.hpp>
void sample() {
    // setlocale returns a null pointer if the locale is not available
    if (std::setlocale(LC_NUMERIC, "de_DE")) {
        const auto punct_de = strf::locale_numpunct();
        auto str = strf::to_string.with(punct_de) (*strf::fixed(10000.5));
        assert(str == "10.000,5");
    }
}
7. Letter case
The lettercase facet affects the letter case used when printing numeric values. The default value is strf::lowercase.
namespace strf {
enum class lettercase { lower = /*...*/, mixed = /*...*/, upper = /*...*/ };
constexpr lettercase lowercase = lettercase::lower;
constexpr lettercase mixedcase = lettercase::mixed;
constexpr lettercase uppercase = lettercase::upper;
}
Value | Result examples |
---|---|
auto str_upper = strf::to_string.with(strf::uppercase)
    ( *strf::hex(0xabc), ' '
    , 1.0e+50, ' '
    , std::numeric_limits<double>::infinity() );
assert(str_upper == "0XABC 1E+50 INF");
auto str_mixed = strf::to_string.with(strf::mixedcase)
    ( *strf::hex(0xabc), ' '
    , 1.e+50, ' '
    , std::numeric_limits<double>::infinity() );
assert(str_mixed == "0xABC 1e+50 Inf");
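The difference between the three letter cases is easiest to see on a hexadecimal integer with base indication. The following is a small standard C++ sketch of that one case, assuming (as the asserts above suggest) that mixed case capitalizes the digits but not the prefix. The names lettercase_sketch and hex_with_prefix are hypothetical and not part of Strf.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the three letter-case values.
enum class lettercase_sketch { lower, mixed, upper };

// Formats `value` in hexadecimal with a base prefix:
//   lower -> lowercase digits, "0x" prefix
//   mixed -> uppercase digits, "0x" prefix
//   upper -> uppercase digits, "0X" prefix
std::string hex_with_prefix(unsigned long long value, lettercase_sketch lc)
{
    const char* digits = (lc == lettercase_sketch::lower)
                       ? "0123456789abcdef"
                       : "0123456789ABCDEF";
    std::string out;
    do {
        out.insert(out.begin(), digits[value & 0xF]);   // prepend next digit
        value >>= 4;
    } while (value != 0);
    out.insert(0, lc == lettercase_sketch::upper ? "0X" : "0x");
    return out;
}
```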
8. Character encodings
The class templates below are facets that specify the character encoding corresponding to the character type. Their instances are always empty classes. More encodings are expected to be included in future versions of the library.
namespace strf {
template <typename CharT> struct ascii;
template <typename CharT> struct iso_8859_1;
template <typename CharT> struct iso_8859_3;
template <typename CharT> struct iso_8859_15;
template <typename CharT> struct windows_1252;
template <typename CharT> struct utf8;
template <typename CharT> struct utf16;
template <typename CharT> struct utf32;
template <typename CharT>
using utf = /* utf8<CharT> , utf16<CharT> or utf32<CharT> */
/* , depending on sizeof(CharT) */;
} // namespace strf
auto s = strf::to_string
.with(strf::windows_1252<char>())
.with(strf::numpunct<10>{4, 3, 2}.thousands_sep(0x2022))
("one hundred billions = ", 100000000000ll);
// The character U+2022 is encoded as '\225' in Windows-1252
// Groups are 4, 3 and then 2 (repeated), counted from the right
assert(s == "one hundred billions = 1\22500\22500\225000\2250000");
Encoding conversion
Since the library knows the encoding corresponding to each character type, and knows how to convert from one to another, it is possible to mix input strings of different character types, though you need to use the function conv:
auto str = strf::to_string( "aaa-"
, strf::conv(u"bbb-")
, strf::conv(U"ccc-")
, strf::conv(L"ddd") );
auto str16 = strf::to_u16string( strf::conv("aaa-")
, u"bbb-"
, strf::conv(U"ccc-")
, strf::conv(L"ddd") );
assert(str == "aaa-bbb-ccc-ddd");
assert(str16 == u"aaa-bbb-ccc-ddd");
The conv function can also specify an alternative encoding for a specific input string argument:
auto str_utf8 = strf::to_u8string
( strf::conv("--\xA4--", strf::iso_8859_1<char>())
, strf::conv("--\xA4--", strf::iso_8859_15<char>()));
assert(str_utf8 == u8"--\u00A4----\u20AC--");
The sani function has the same effect as conv, except when the input encoding is the same as the output encoding. In this case sani causes the input to be sanitized, whereas conv does not:
auto str = strf::to_string
.with(strf::iso_8859_3<char>()) // the output encoding
( strf::conv("--\xff--") // not sanitized
, strf::conv("--\xff--", strf::iso_8859_3<char>()) // not sanitized ( same encoding )
, strf::conv("--\xff--", strf::utf8<char>()) // sanitized ( different encoding )
, strf::sani("--\xff--") // sanitized
    , strf::sani("--\xff--", strf::iso_8859_3<char>()) ); // sanitized
assert(str == "--\xff----\xff----?----?----?--");
The library replaces the invalid sequences by the replacement character �, if the destination encoding supports it. Otherwise, '?' is printed, as in the above code snippet.
An "invalid sequence" is any input that is non-conformant to the source encoding, or that is impossible to write, in a conformant way, in the destination encoding. But there is an optional exception for surrogate characters.
When the input is UTF-8, the library follows the practice recommended by the Unicode Standard for calculating how many replacement characters to print for each non-conformant input sequence (see "Best Practices for Using U+FFFD" in Chapter 3).
The library does not sanitize non-conformities when converting a single character, like punctuation characters or the fill character (they are in UTF-32). In this case the replacement character is only used when the destination encoding is not able to print the code point. For example, if you use (char32_t)0xFFFFFFF as the decimal point, then it will be printed as "\uFFFD" if the destination is UTF-8 or UTF-16, but if the destination is UTF-32, then the library just writes (char32_t)0xFFFFFFF verbatim.
Surrogates tolerance
There is one particular kind of nonconformity that you may sometimes want to permit, which is the invalid presence of surrogate characters. That is particularly common on Windows, where you may have an old file name, created at the time of Windows 95 (where wide strings were UCS-2), that contains some unpaired surrogates. If you then treat it as UTF-16 and convert it to UTF-8 and back to UTF-16, you get a different name.
So the library provides the surrogate_policy enumeration, which is a facet that enables you to turn off the surrogate sanitization.
namespace strf {
enum class surrogate_policy : bool { strict = false, lax = true };
}
When the value is surrogate_policy::strict, which is the default, if a UTF-16 input contains a high surrogate not followed by a low surrogate, or a low surrogate not following a high surrogate, that is considered invalid and is thus sanitized. When the value is surrogate_policy::lax, those situations are allowed.
std::u16string original {u'-', 0xD800 ,u'-', u'-', u'-'};
// convert to UTF-8
auto str_strict = strf::to_u8string(strf::conv(original));
auto str_lax =
strf::to_u8string .with(strf::surrogate_policy::lax) (strf::conv(original));
assert(str_strict == u8"-\uFFFD---"); // surrogate sanitized
assert(str_lax == (const char8_t*)"-\xED\xA0\x80---"); // surrogate allowed
// convert back to UTF-16
auto utf16_strict = strf::to_u16string(strf::conv(str_lax));
auto utf16_lax =
strf::to_u16string .with(strf::surrogate_policy::lax) (strf::conv(str_lax));
assert(utf16_strict == u"-\uFFFD\uFFFD\uFFFD---"); // surrogate sanitized
assert(utf16_lax == original); // surrogate preserved
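The strict and lax behaviours can be illustrated with a small UTF-16 to UTF-8 converter written in standard C++. This is a sketch under stated assumptions, not Strf's converter: surrogate_handling, append_utf8 and utf16_to_utf8 are hypothetical names, and the only nonconformity handled is the unpaired surrogate.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the surrogate policy values.
enum class surrogate_handling { strict, lax };

// Appends the UTF-8 encoding of `cp` without validating it
// (assumes cp <= 0x10FFFF).
void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

std::string utf16_to_utf8(const std::u16string& in, surrogate_handling policy)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t unit = in[i];
        const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        const bool is_low  = unit >= 0xDC00 && unit <= 0xDFFF;
        const bool next_is_low = i + 1 < in.size()
                              && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF;
        if (is_high && next_is_low) {
            // Valid surrogate pair: combine into one code point.
            const char32_t low = in[i + 1];
            append_utf8(out, 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            ++i;
        } else if ((is_high || is_low) && policy == surrogate_handling::strict) {
            append_utf8(out, 0xFFFD);   // unpaired surrogate is sanitized
        } else {
            append_utf8(out, unit);     // lax: unpaired surrogate passes through
        }
    }
    return out;
}
```

Under the lax setting the unpaired surrogate 0xD800 comes out as the three bytes ED A0 80, which is what allows a later conversion back to UTF-16 to restore the original code unit.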
Encoding error notifier function
The facet invalid_seq_notifier contains a function pointer that is called every time an invalid sequence is sanitized, unless it is null, which is the default.
The code below throws an exception if u16str contains any invalid sequence:
std::u16string u16str = /* ... */;
auto notifier_func = [](){
    throw std::system_error(std::make_error_code(std::errc::illegal_byte_sequence));
};
strf::invalid_seq_notifier notifier{ notifier_func };
auto str = strf::to_string.with(notifier)(strf::conv(u16str));
9. Width Calculation
The width_calculator_c facet category enables you to choose how the width of a string is calculated when using alignment formatting. You have five options:
- The fast_width facet assumes that the width of a string is equal to its size. This is the least accurate method, but it is the fastest and is also what other formatting libraries usually do. Therefore it is the default facet.

  Example:

    auto str = "15.00 \xE2\x82\xAC \x80"; // "15.00 € \x80"
    auto result = strf::to_string.with(strf::fast_width{}) ( strf::right(str, 12, '*') );
    assert(result == "*15.00 \xE2\x82\xAC \x80"); // width calculated as 11
- The width_as_fast_u32len facet evaluates the width of a string as the number of Unicode code points. However, unlike width_as_u32len, to gain performance, it assumes that the measured string is totally conformant to its encoding. Nonconformities do not cause undefined behaviour, but lead to incorrect values. For example, the width of a UTF-8 string may simply be calculated as the number of bytes that are not in the range [0x80, 0xBF], i.e., are not continuation bytes. So an extra continuation byte, which would be replaced by a "\uFFFD" during sanitization, is not counted.

  Example:

    auto str = "15.00 \xE2\x82\xAC \x80"; // "15.00 € \x80"
    auto result = strf::to_string .with(strf::width_as_fast_u32len{}) ( strf::right(str, 12, '*'));
    assert(result == "****15.00 \xE2\x82\xAC \x80"); // width calculated as 8
- The width_as_u32len facet also evaluates the width of a string as the number of Unicode code points. But each nonconformity to the encoding is counted as an extra code point (as if it were replaced by the replacement character �).

  Example:

    auto str = "15.00 \xE2\x82\xAC \x80"; // "15.00 € \x80"
    auto result = strf::to_string .with(strf::width_as_u32len{}) ( strf::right(str, 12, '*'));
    assert(result == "***15.00 \xE2\x82\xAC \x80"); // width calculated as 9
- The make_width_calculator function template takes a function object f as parameter and returns a facet object that calculates the width of the strings by converting them to UTF-32 (following the policy associated with invalid_seq_policy::replace) and then calling f to evaluate the width of each UTF-32 character. f shall take a char32_t parameter and return a width_t, which is a type that implements Q16.16 fixed-point arithmetic. This means that it can use non-integral values.

  Example:

    auto wfunc = [](char32_t ch) -> strf::width_t {
        using namespace strf::width_literal;
        static const strf::width_t roman_numerals_width [] = {
            0.5642_w, 1.1193_w, 1.6789_w, 1.8807_w,
            1.2982_w, 1.8853_w, 2.4954_w, 3.0046_w,
            1.8945_w, 1.3624_w, 1.9035_w, 2.4771_w,
            1.1789_w, 1.4495_w, 1.4128_w, 1.7294_w
        };
        if (ch < 0x2160 || ch > 0x216F) {
            return 1;
        }
        return roman_numerals_width[ch - 0x2160];
    };
    auto my_wcalc = strf::make_width_calculator(wfunc);
    auto str = u8"\u2163 + \u2167 = \u216B"; // "Ⅳ + Ⅷ = Ⅻ"
    auto result = strf::to_u8string.with(my_wcalc) (strf::right(str, 18, '.'));
    // width calculated as 13.3624, rounded to 13:
    assert(result == u8".....\u2163 + \u2167 = \u216B");
- The fifth option is to implement your own width calculator. This means creating a class that satisfies the WidthCalculator type requirements. There are two reasons why you may want to do that instead of using the previous options:
    - Accuracy: The previous methods are not able to take into account the presence of ligatures and digraphs.
    - Performance: The object returned by make_width_calculator converts the string to UTF-32 before calling the provided function object for each UTF-32 character. When you implement your own calculator, you can optimize it to directly measure strings that are encoded in a specific encoding.
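The first two strategies in the list above are simple enough to sketch in standard C++. The functions below are hypothetical illustrations, not Strf facets: width_by_size mirrors the "width equals size" assumption, width_by_lead_bytes counts every byte outside [0x80, 0xBF], and right_align shows how the chosen width changes the amount of fill.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Width taken as the number of code units (bytes, for a narrow string).
std::size_t width_by_size(const std::string& s)
{
    return s.size();
}

// Width taken as the number of bytes that are not UTF-8 continuation bytes.
// Assumes the input is conformant UTF-8; stray continuation bytes are
// silently left uncounted.
std::size_t width_by_lead_bytes(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if (byte < 0x80 || byte > 0xBF) {
            ++count;
        }
    }
    return count;
}

// Right-aligns `s` in `width` columns, given a precomputed content width.
std::string right_align(const std::string& s, std::size_t content_width,
                        std::size_t width, char fill)
{
    std::string out;
    if (content_width < width) {
        out.assign(width - content_width, fill);
    }
    return out + s;
}
```

For the string used in the examples above, the two functions return 11 and 8 respectively, so the same request for 12 columns yields one fill character in the first case and four in the second.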
The width calculation algorithm is applied
on the input, not the output string. Keep that in mind when
converting from one encoding to another using
|
10. Ranges
Without formatting
|
|
|
|
|
|
where
- range_obj is an object whose type is a Container type.
- begin and end are iterators.
- separator is a raw string of CharT, where CharT is the destination character type.
- func is a unary function object such that the type of the expression func(x) is printable, where x is an element of the range.
int arr[3] = { 11, 22, 33 };
auto str = strf::to_string(strf::range(arr));
assert(str == "112233");
str = strf::to_string(strf::separated_range(arr, ", "));
assert(str == "11, 22, 33");
auto op = [](auto x){ return strf::join('(', +strf::fmt(x * 10), ')'); };
str = strf::to_string(strf::separated_range(arr, ", ", op));
assert(str == "(+110), (+220), (+330)");
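Conceptually, a separated range with a function object behaves like a map followed by a join. The sketch below expresses that idea in standard C++ only. join_separated is a hypothetical helper that requires the function object to return something appendable to std::string, whereas the library accepts any printable type.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of Strf: applies `func` to each element and
// concatenates the results, placing `separator` between consecutive elements.
template <typename Range, typename Func>
std::string join_separated(const Range& range, const std::string& separator, Func func)
{
    std::string out;
    bool first = true;
    for (const auto& element : range) {
        if (!first) {
            out += separator;   // the separator is never printed before the first element
        }
        out += func(element);
        first = false;
    }
    return out;
}
```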
With formatting
|
|
|
|
|
|
Any format function applicable to the element type of the range can also be applied to the expression strf::fmt_range(/*...*/) or strf::fmt_separated_range(/*...*/). It causes the formatting to be applied to each element.
std::vector<int> vec = { 11, 22, 33 };
auto str1 = strf::to_string("[", +strf::fmt_separated_range(vec, " ;") > 6, "]");
assert(str1 == "[   +11 ;   +22 ;   +33]");
int array[] = { 250, 251, 252 };
auto str2 = strf::to_string
    ( "["
    , *strf::fmt_separated_range(array, " / ").fill('.').hex() > 6
    , "]");
assert(str2 == "[..0xfa / ..0xfb / ..0xfc]");
11. Joins
Simple joins
Joins enable you to group a set of input arguments as one:
auto str = strf::to_string.tr("Blah blah blah {}.", strf::join("abc", '/', 123));
assert(str == "Blah blah blah abc/123.");
They can be handy to create aliases:
struct date{ int day, month, year; };
auto as_yymmdd = [](date d) {
return strf::join( strf::dec(d.year % 100).p(2), '/'
, strf::dec(d.month).p(2), '/'
, strf::dec(d.day).p(2) );
};
date d {1, 1, 1999};
auto str = strf::to_string("The day was ", as_yymmdd(d), '.');
assert(str == "The day was 99/01/01.");
Aligned joins
You can apply any of the alignment format functions to the expression join(args...):
auto str = strf::to_string(strf::join("abc", "def", 123) > 15);
assert(str == "      abcdef123");
Or use any of the expressions below:
|
|
|
|
|
|
|
where:
- args... are the values to be printed.
- width is a value of type std::int16_t.
- alignment is a value of type text_alignment.
- ch is a value of type char32_t.
- split_pos is a value of type std::size_t.
auto str = strf::to_string(strf::join_split(15, U'.', 2)("abc", "def", 123));
assert(str == "abcdef......123");
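The effect of a split join can be pictured as "print the first split_pos arguments, then the fill, then the rest". The following standard C++ sketch reproduces the result above under that reading. join_split_sketch is a hypothetical name, the arguments are assumed to be already rendered as strings, and the width is assumed to equal the number of characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Strf: joins `parts`, inserting the fill
// characters after the first `split_pos` parts so the result spans `width`.
std::string join_split_sketch(const std::vector<std::string>& parts,
                              std::size_t width, char fill, std::size_t split_pos)
{
    std::string left;
    std::string right;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        (i < split_pos ? left : right) += parts[i];
    }
    const std::size_t content = left.size() + right.size();
    // No padding is added when the content already fills the width.
    const std::string padding(content < width ? width - content : 0, fill);
    return left + padding + right;
}
```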
12. Extending the library (to-do)
Adding output types
to-do
Adding printable types
to-do
Adding Facets
to-do
Adding character encodings
to-do