
Chapter 1. Boost.Stringify

Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

Table of Contents

Overview
A tour of the library
Why to use Boost.Stringify
Using it as a static library
Input types
Simple input types
Joins
Facets
Range
How to add a new destination type
How to add a new input type
Example 1: IPv4 address
Example 2: IPv6 address
Example 3: Base64
API Reference (todo)
Class template output_buffer
Concept OutputBuffer
Abstract class template printer
Customization point: function template make_printer
Class template value_with_format (to-do)
Concept Formatter
Customization point: function template fmt
Class template dispatcher
Tr-string
Exception types
Output Types
Class template facets_pack
Encoding conversion (To-do)
Width calculation (to-do)
Numeric punctuation
Benchmarks
Run-time performance
Compilation times and generated binary size

Boost.Stringify is a fast, highly extensible, locale-independent formatting library.

[Warning] Warning

This library is not part of the Boost C++ Libraries yet, and is still subject to change without preserving backwards compatibility.

Most of the code snippets in this documentation use to_string. This is not the std::to_string from the standard library, but one that you can think of as a generalization of it with variadic arguments. For example:

#include <boost/stringify.hpp> // This is the only header you need to include.

namespace strf = boost::stringify::v0; // Everything is inside this namespace.
                                       // v0 is an inline namespace.
void sample()
{
    int value = 255;
    auto s = strf::to_string(value, " in hexadecimal is ", strf::hex(value));
    BOOST_ASSERT(s == "255 in hexadecimal is ff");
}

You can see that this library uses format functions, like hex above, to specify formatting. Actually, hex is just syntactic sugar: hex(x) simply returns fmt(x).hex(), and fmt(x) returns an object that holds the value of x and has member functions that change the formatting. These format functions can be chained like this: strf::hex(255).p(4).fill(U'.') > 10.
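To make the chaining idea concrete, here is a minimal sketch in standard C++ only. It assumes nothing about the library's internals: int_with_format, my_fmt, my_hex and render are hypothetical names invented for this illustration, and the real implementation is certainly different.

```cpp
#include <cassert>
#include <string>

// Hypothetical illustration; these names are not part of Boost.Stringify.
struct int_with_format
{
    unsigned value;
    unsigned base = 10;
    int precision = 0;     // minimum number of digits
    int width = 0;         // minimum total width (right-aligned)
    char fill_char = ' ';

    // Each format function returns a modified copy, which makes chaining possible.
    int_with_format hex() const            { auto c = *this; c.base = 16;      return c; }
    int_with_format p(int n) const         { auto c = *this; c.precision = n;  return c; }
    int_with_format fill(char ch) const    { auto c = *this; c.fill_char = ch; return c; }
    int_with_format operator>(int w) const { auto c = *this; c.width = w;      return c; }
};

inline int_with_format my_fmt(unsigned v) { return int_with_format{v}; }
inline int_with_format my_hex(unsigned v) { return my_fmt(v).hex(); }  // sugar for my_fmt(v).hex()

// Turns the value and its accumulated format options into text.
inline std::string render(const int_with_format& f)
{
    static const char digits[] = "0123456789abcdef";
    std::string s;
    unsigned v = f.value;
    do { s.insert(s.begin(), digits[v % f.base]); v /= f.base; } while (v != 0);
    if (static_cast<int>(s.size()) < f.precision)
        s = std::string(f.precision - s.size(), '0') + s;
    if (static_cast<int>(s.size()) < f.width)
        s = std::string(f.width - s.size(), f.fill_char) + s;
    return s;
}

inline void chained_format_example()
{
    // Member calls bind tighter than operator>, so no extra parentheses are needed.
    assert(render(my_hex(255).p(4).fill('.') > 10) == "......00ff");
}
```

Returning a copy from every format function keeps the object cheap to pass around and lets the options be written in any order.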

Now, some other examples:

namespace strf = boost::stringify::v0;

// more formatting:  operator>(int width) : align to right
//                   operator~()          : show base
//                   p(int)               : set precision
auto s = strf::to_string( "---"
                        , ~strf::hex(255).p(4).fill(U'.') > 10
                        , "---" );
BOOST_ASSERT(s == "---....0x00ff---");

//
// ranges
//
int array[] = {20, 30, 40};
const char* separator = " / ";
s = strf::to_string( "--[", strf::range(array, separator), "]--");
BOOST_ASSERT(s == "--[20 / 30 / 40]--");

//
// range with formatting
//
s = strf::to_string( "--["
                   , ~strf::hex(strf::range(array, separator)).p(4)
                   , "]--");
BOOST_ASSERT(s == "--[0x0014 / 0x001e / 0x0028]--");

// or

s = strf::to_string( "--["
                   , ~strf::fmt_range(array, separator).hex().p(4)
                   , "]--");
BOOST_ASSERT(s == "--[0x0014 / 0x001e / 0x0028]--");

//
// join: align a group of arguments as one:
//
int value = 255;
s = strf::to_string( "---"
                   , strf::join_center(30, U'.')( value
                                                , " in hexadecimal is "
                                                , strf::hex(value) )
                   , "---" );
BOOST_ASSERT(s == "---...255 in hexadecimal is ff...---");


// joins can contain any type of argument, including ranges and other joins
s = strf::to_string( strf::join_right(30, U'.')
                       ( "{"
                       , strf::join_center(20)( "["
                                              , strf::range(array, ", ")
                                              , "]" )
                       , "}" ));
BOOST_ASSERT(s == "........{    [20, 30, 40]    }");

Another difference between this to_string and std::to_string is that it is not a function but an object (a constexpr one, in fact). It has member functions that are invoked as in the named parameter idiom, following this syntax:

The leading expression is the part of the syntax that varies according to the destination type. Hence, to_string is a leading expression.

     Destination type                             Leading expression
 1   std::string                                  to_string
 2   std::u8string (C++20)                        to_u8string (C++20)
 3   std::u16string                               to_u16string
 4   std::u32string                               to_u32string
 5   std::wstring                                 to_wstring
 6   std::basic_string<CharT, Traits, Alloc>      to_basic_string<CharT, Traits, Alloc>
 7   std::basic_string<CharT, Traits, Alloc>&     append(destination)
 8   std::basic_string<CharT, Traits, Alloc>&     assign(destination)
 9   CharT*                                       write(destination, limit)
10   CharT*                                       write(destination, end)
11   CharT[size]                                  write(destination)
12   FILE* (using narrow functions)               write<CharT>(destination, count_ptr)
13   FILE* (using wide functions)                 wwrite(destination, count_ptr)
14   std::basic_streambuf<CharT, Traits>&         write(destination, count_ptr)

Where:

  • CharT can be char, char16_t, char32_t or wchar_t
  • end is CharT*
  • limit is std::size_t. It determines the maximum number of characters that can be written, including the termination character.
  • count_ptr is optional. The number of successfully written characters is assigned to *count_ptr. Its type is std::streamsize* when destination is std::basic_streambuf <CharT, Traits> and std::size_t* otherwise.
  • 8) Alloc is optional. If omitted, it is std::allocator<CharT>
  • 8-11) Traits is optional. If omitted, it is std::char_traits<CharT>
  • 12) CharT is optional. If omitted, it is char
  • 12 and 13) std::fflush is not called

The following member functions only affect some output types (leading expressions 1 to 8, 15 and 16 in the previous table, or your own output types for which you define the reserve method).

If none of these functions is called, then no reservation is done.

namespace strf = boost::stringify::v0;  // v0 is an inline namespace

auto str = strf::to_string.reserve(5000)("blah", "blah");

BOOST_ASSERT(str == "blahblah");
BOOST_ASSERT(str.capacity() >= 5000);

The tr-string is what other formatting libraries would call the format string, with the difference that it does not specify any formatting. Its purpose is to enable the use of translation tools like gettext.

auto s = strf::to_string.tr("{} in hexadecimal is {}", x, strf::hex(x));

You can customize how the library handles parsing errors in the tr-string with the tr_invalid_arg enumeration.
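The separation between substitution and formatting can be sketched in a few lines of standard C++. This is an illustration under stated assumptions, not the library's parser: substitute is a hypothetical helper, it handles only the bare {} placeholder used in the example above, and leaving an unmatched placeholder untouched is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Boost.Stringify.
// Replaces each "{}" in tr with the next argument. The arguments are
// already formatted, so tr itself carries no formatting information.
inline std::string substitute(const std::string& tr,
                              const std::vector<std::string>& args)
{
    std::string out;
    std::size_t next_arg = 0;
    std::size_t pos = 0;
    while (pos < tr.size())
    {
        if (tr.compare(pos, 2, "{}") == 0 && next_arg < args.size())
        {
            out += args[next_arg++];   // consume one argument
            pos += 2;
        }
        else
        {
            out += tr[pos++];          // ordinary character (or unmatched "{}")
        }
    }
    return out;
}

inline void substitute_example()
{
    // "ff" was produced by a format function before substitution took place.
    assert(substitute("{} in hexadecimal is {}", {"255", "ff"})
           == "255 in hexadecimal is ff");
}
```

Because the placeholder says nothing about how an argument is printed, a translator can reword the string without touching any formatting decision.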

These are not the same as the facets handled by std::locale, but they are analogous. Keep in mind that this is a locale-independent library. So things are always printed as in the C-locale, unless you explicitly specify otherwise, which you do by passing facet objects to the facets function. For example, to customize numeric punctuation:

namespace strf = boost::stringify::v0;
constexpr int base = 10;
auto punct = strf::str_grouping<base>{"\4\3\2"}.thousands_sep(U'.');
auto s = strf::to_string
    .facets(punct)
    ("one hundred billions = ", 100000000000ll);

BOOST_ASSERT(s == "one hundred billions = 1.00.00.000.0000");

Every facet belongs to a facet category. Each facet category corresponds to a concept, i.e. a set of requirements that a class must satisfy. A class that satisfies such requirements is a facet of that category. Moreover, for each facet category there is a class whose name, by convention, has a "_c" suffix and is the name of the category.

For example, monotonic_grouping<10> and str_grouping<10> are both facets of the category numpunct_c<10>. Both have the same purpose: to customize numeric punctuation.

facet category        constrainable   what it controls                           where it is used
width_calculation_c   yes             how the width is calculated                joins and all conventional types
numpunct_c<10>        yes             numeric punctuation for decimal base       integers and floating-point numbers
numpunct_c<16>        yes             numeric punctuation for hexadecimal base   integers and floating-point numbers
numpunct_c<8>         yes             numeric punctuation for octal base         integer numbers
encoding_c            no              encoding
encoding_error_c      yes             encoding error handling
surrogate_policy_c    yes             surrogates allowance
tr_string_error_c     no              tr-string parsing error handling           in tr-strings

Constrained facets

With the constrain function template you can create constrained facets, which are facets that only apply to certain input types. Its template parameter is a type trait, that is, a class template with a constexpr member named value that is convertible to bool and tells whether a given input type is under the influence of the given facet:

namespace strf = boost::stringify::v0;

auto facet_obj = strf::constrain<std::is_signed>(strf::monotonic_grouping<10>{3});

auto s = strf::to_string.facets(facet_obj)(100000u, "  ", 100000);

BOOST_ASSERT(s == "100000  100,000");
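The mechanism behind such a filter can be sketched with a template template parameter. The code below is a standalone illustration, not the library's implementation: group_by_3 and print_constrained are hypothetical names, and the sketch assumes non-negative values and a fixed comma separator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical helper: inserts a comma before every group of three digits,
// counting from the right. Assumes the text holds digits only (no sign).
inline std::string group_by_3(std::string digits)
{
    for (std::size_t pos = digits.size(); pos > 3; )
    {
        pos -= 3;
        digits.insert(pos, 1, ',');
    }
    return digits;
}

// Filter is a trait such as std::is_signed: Filter<T>::value decides,
// at compile time, whether the grouping applies to the input type T.
template <template <class> class Filter, class T>
std::string print_constrained(T value)
{
    const std::string digits = std::to_string(value);
    if constexpr (Filter<T>::value)
        return group_by_3(digits);
    else
        return digits;
}

inline void constrained_example()
{
    assert(print_constrained<std::is_signed>(100000u) == "100000");   // unsigned: not affected
    assert(print_constrained<std::is_signed>(100000)  == "100,000");  // signed: grouped
}
```

Since the decision depends only on the type, it costs nothing at run time.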

The library provides some type traits of this kind:

trait            description
is_int_number    matches short, int, long, long long and the corresponding unsigned types
is_char          matches char, char8_t, wchar_t, char16_t, and char32_t
is_string        matches string inputs

Overriding facets

If two or more facet objects of the same category are passed to the facets function, and they apply to the same input type, then the last one wins:

namespace strf = boost::stringify::v0;

auto punct_dec_1 = strf::monotonic_grouping<10>{1};
auto punct_dec_2 = strf::monotonic_grouping<10>{2}.thousands_sep('.');
auto punct_dec_3 = strf::monotonic_grouping<10>{3}.thousands_sep('^');

// Below, punct_dec_3 overrides punct_dec_2, but only for signed types.
// punct_dec_2 overrides punct_dec_1 for all input types,
// hence the presence of punct_dec_1 below has no effect.

auto s = strf::to_string
    .facets( punct_dec_1
           , punct_dec_2
           , strf::constrain<std::is_signed>(punct_dec_3) )
    ( 100000, "  ", 100000u ) ;

BOOST_ASSERT(s == "100^000  10.00.00");
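The "last one wins" rule can be pictured as a scan over the arguments in which every match overwrites the previous one. The sketch below is only an analogy in standard C++17: grouping_facet, width_facet and select_facet are hypothetical, and it matches by exact type, whereas the library matches by facet category and also takes constraints into account.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for facets of two different categories.
struct grouping_facet { int group_size; char separator; };
struct width_facet    { bool count_code_points; };

// Returns the last candidate whose type is Facet, or fallback if there is none.
template <class Facet, class... Candidates>
Facet select_facet(Facet fallback, const Candidates&... candidates)
{
    Facet result = fallback;
    auto consider = [&result](const auto& candidate)
    {
        // Candidates of other types are ignored at compile time.
        if constexpr (std::is_same_v<std::decay_t<decltype(candidate)>, Facet>)
            result = candidate;   // a later match overwrites an earlier one
    };
    (consider(candidates), ...);  // visit the candidates from left to right
    return result;
}

inline void select_facet_example()
{
    const grouping_facet chosen = select_facet( grouping_facet{0, ','}
                                              , grouping_facet{1, ','}
                                              , width_facet{true}
                                              , grouping_facet{2, '.'} );
    assert(chosen.group_size == 2 && chosen.separator == '.');
}
```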

Did you notice that the thousand separator is specified as a char32_t instead of char? That means you can use any Unicode code point, as long as it is supported by the output encoding, which by default is UTF-8 for char. But you can specify other encodings with facets.

namespace strf = boost::stringify::v0;

// Writing in Windows-1252
auto s = strf::to_string
    .facets(strf::windows_1252<char>())
    .facets(strf::str_grouping<10>{"\4\3\2"}.thousands_sep(0x2022))
    ("one hundred billions = ", 100000000000ll);

// The character U+2022 is encoded as '\225' in Windows-1252
BOOST_ASSERT(s == "one hundred billions = 1\22500\22500\225000\2250000");

It is also possible to specify the encoding of each input string individually:

// Three input strings, each one in its own character set
namespace strf = boost::stringify::v0;
auto s = strf::to_u8string( strf::cv("\x80\xA4 -- ", strf::iso_8859_1<char>())
                          , strf::cv("\x80\xA4 -- ", strf::iso_8859_15<char>())
                          , strf::cv("\x80\xA4", strf::windows_1252<char>()) );

// The output by default is in UTF-8
BOOST_ASSERT(s == u8"\u0080\u00A4 -- \u0080\u20AC -- \u20AC\u00A4");

You can also convert strings of different character types:

namespace strf = boost::stringify::v0;
auto str = strf::to_string( strf::cv(u"aaa-")
                          , strf::cv(U"bbb-")
                          , strf::cv(L"ccc") );
BOOST_ASSERT(str ==  "aaa-bbb-ccc");

Furthermore, if you don't want to convert any encoding but just to sanitize an input string, you can still use the cv function, since it always implies sanitization, even when the input and output encodings are the same:

// sanitize UTF-8 input
namespace strf = boost::stringify::v0;
auto s = strf::to_u8string(strf::cv("a b c \xFF d e"));
BOOST_ASSERT(s == u8"a b c \uFFFD d e");
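For readers who want to see what sanitization involves, the following standalone sketch replaces each ill-formed UTF-8 subsequence with U+FFFD. It is not the library's code: sanitize_utf8 is a hypothetical name, the byte ranges follow the well-formed sequence table in Chapter 3 of the Unicode Standard, and surrogates are always rejected here, whereas the library lets you choose.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Boost.Stringify.
inline std::string sanitize_utf8(const std::string& in)
{
    const std::string replacement = "\xEF\xBF\xBD";  // U+FFFD encoded in UTF-8
    std::string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;                 // expected length; 0 means invalid lead byte
        unsigned char lo = 0x80, hi = 0xBF;  // valid range of the second byte
        if (lead <= 0x7F)                      len = 1;
        else if (lead >= 0xC2 && lead <= 0xDF) len = 2;
        else if (lead >= 0xE0 && lead <= 0xEF)
        {
            len = 3;
            if (lead == 0xE0) lo = 0xA0;     // rejects overlong forms
            if (lead == 0xED) hi = 0x9F;     // rejects surrogates
        }
        else if (lead >= 0xF0 && lead <= 0xF4)
        {
            len = 4;
            if (lead == 0xF0) lo = 0x90;     // rejects overlong forms
            if (lead == 0xF4) hi = 0x8F;     // rejects values above U+10FFFF
        }

        std::size_t good = (len == 0) ? 0 : 1;  // bytes accepted so far
        while (good != 0 && good < len && i + good < in.size())
        {
            const unsigned char b = static_cast<unsigned char>(in[i + good]);
            const unsigned char first = (good == 1) ? lo : 0x80;
            const unsigned char last  = (good == 1) ? hi : 0xBF;
            if (b < first || b > last) break;
            ++good;
        }

        if (len != 0 && good == len)
        {
            out.append(in, i, len);          // well-formed sequence: copy it
            i += len;
        }
        else
        {
            out += replacement;              // one U+FFFD per ill-formed subpart
            i += (good == 0) ? 1 : good;
        }
    }
    return out;
}

inline void sanitize_example()
{
    assert(sanitize_utf8("a b c \xFF d e") == "a b c \xEF\xBF\xBD d e");
}
```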

The library provides three ways to handle encoding errors. The default is to replace any invalid input sequence with the replacement character "�" (U+FFFD). When the input is UTF-8, the library follows the practice recommended by the Unicode Standard. [1] The second option is to omit the invalid sequence, and the third is to stop everything and signal an error, i.e., to return an error code or throw an exception. You can also choose whether surrogates are treated as errors or not (as explained here).

The current list of supported encodings is small, but is expected to grow:

function                                                    description
template<typename CharT> encoding<CharT> utf8();            UTF-8 encoding. Default for char and char8_t.
template<typename CharT> encoding<CharT> iso_8859_1();      ISO/IEC 8859-1 encoding
template<typename CharT> encoding<CharT> iso_8859_15();     ISO/IEC 8859-15 encoding
template<typename CharT> encoding<CharT> windows_1252();    Windows-1252 encoding
template<typename CharT> encoding<CharT> utf16();           UTF-16 encoding. Default for char16_t.
template<typename CharT> encoding<CharT> utf32();           UTF-32 encoding. Default for char32_t.
encoding<wchar_t> utfw();                                   UTF-16 if sizeof(wchar_t) == 2 and UTF-32 if sizeof(wchar_t) == 4. Default for wchar_t.

The facets of the numpunct_c<Base> category specify the decimal point, the group separator character, and the groups' size when printing numbers in the numeric base Base. The library currently provides two facets belonging to this category for you to choose. If all groups have the same size, then you should choose the monotonic_grouping<Base> facet since it is optimized for this situation:

namespace strf = boost::stringify::v0;
constexpr int base = 10;

auto str = strf::to_string
    .facets(strf::monotonic_grouping<base>{3}.thousands_sep(U'.'))
    (100000000000ll);

BOOST_ASSERT(str == "100.000.000.000");

Otherwise, you can use str_grouping<Base>:

namespace strf = boost::stringify::v0;
constexpr int base = 10;

auto punct = strf::str_grouping<base>{"\4\3\2"};
auto str = strf::to_string.facets(punct)(100000000000ll);
BOOST_ASSERT(str == "1,00,00,000,0000");
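How a grouping string such as "\4\3\2" maps to the output above can be sketched without the library. In the snippet below, apply_grouping is a hypothetical function; it assumes that each character of the grouping string is a group size, starting from the least significant group, and that the last size repeats, which is the reading consistent with the expected outputs in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Boost.Stringify.
// digits: the number as plain decimal digits, without sign.
// groups: group sizes from right to left; the last one repeats.
inline std::string apply_grouping(const std::string& digits,
                                  const std::string& groups,
                                  char separator)
{
    if (groups.empty()) return digits;
    std::string out;
    std::size_t end = digits.size();  // one past the last digit not yet emitted
    std::size_t g = 0;                // index of the current group size
    while (true)
    {
        const std::size_t size = static_cast<unsigned char>(groups[g]);
        if (size == 0 || size >= end)
        {
            out = digits.substr(0, end) + out;  // leftmost, possibly shorter, group
            break;
        }
        out = separator + digits.substr(end - size, size) + out;
        end -= size;
        if (g + 1 < groups.size()) ++g;         // otherwise keep repeating the last size
    }
    return out;
}

inline void grouping_example()
{
    assert(apply_grouping("100000000000", "\4\3\2", ',') == "1,00,00,000,0000");
    assert(apply_grouping("100000000000", "\3", '.') == "100.000.000.000");
}
```

The second call shows the uniform case, which is the one monotonic_grouping is meant for.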

Almost all formatting libraries provide a way to specify width and alignment. But they assume that the width of a string equals its size:

// Prints three padding spaces fewer than you'd like
printf("Full name: %80s\n", u8"Frédéric François Chopin");
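The three missing spaces come from the three two-byte characters in the name. A small standard C++ check makes the difference visible; count_code_points is a hypothetical helper that assumes well-formed UTF-8, and the name is spelled with byte escapes so that the result does not depend on the source file encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Counts code points in well-formed UTF-8 by skipping continuation bytes (10xxxxxx).
inline std::size_t count_code_points(const char* s)
{
    std::size_t n = 0;
    for (; *s != '\0'; ++s)
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++n;
    return n;
}

// "Frédéric François Chopin" in UTF-8. The literal is split after each escape
// so that a following letter is not read as part of the hex escape.
static const char chopin[] = "Fr\xC3\xA9" "d\xC3\xA9" "ric Fran\xC3\xA7" "ois Chopin";

inline void chopin_example()
{
    assert(std::strlen(chopin) == 27);        // what printf counts
    assert(count_code_points(chopin) == 24);  // what the reader sees
}
```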

Now, Boost.Stringify provides a facet ( width_calculator ) that enables you to choose more accurate algorithms. Three options are available:

  • Width as string size. This is the default:

    namespace strf = boost::stringify::v0;
    
    auto str = strf::to_u8string
        .facets(strf::width_as_len())
        (strf::right(u8"áéíóú", 12, U'.'));
    
    BOOST_ASSERT(str == u8"..áéíóú");
    
  • Width as the number of codepoints:

    namespace strf = boost::stringify::v0;
    
    auto str = strf::to_u8string
        .facets(strf::width_as_u32len())
        (strf::right(u8"áéíóú", 12, U'.'));
    
    BOOST_ASSERT(str == u8".......áéíóú");
    
  • You implement a function that calculates and sums the widths of the code points:

    auto my_width_calculator =
        [] (int limit, const char32_t* it, const char32_t* end)
    {
        int sum = 0;
        for (; sum < limit && it != end; ++it)
        {
            auto ch = *it;
            sum += ((0x2E80 <= ch && ch <= 0x9FFF) ? 2 : 1);
        }
        return sum;
    };
    
    auto str = strf::to_u8string
        .facets(strf::width_as(my_width_calculator))
        (strf::right(u8"今晩は", 10, U'.'));
    
    BOOST_ASSERT(str == u8"....今晩は");
    

    Unfortunately, you can't be 100% accurate. Even assuming the output is displayed in a monospace font, such font is unlikely to support wider characters so they are displayed instead in another font, with a width that you usually can't predict.
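The third option can be tried out without the library. The sketch below decodes UTF-8, applies the same width heuristic as the lambda above, and pads on the left. All three function names are hypothetical, the decoder assumes well-formed input, and the text is written with byte escapes (E4 BB 8A, E6 99 A9, E3 81 AF encode the three characters of the example).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes UTF-8 into code points. Assumption: the input is well-formed.
inline std::u32string decode_utf8(const std::string& s)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        const int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        char32_t cp = (extra == 0) ? lead : (lead & (0x3F >> extra));
        for (int k = 1; k <= extra && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Same heuristic as the lambda above: code points in [0x2E80, 0x9FFF] count as 2.
inline int display_width(const std::u32string& code_points)
{
    int sum = 0;
    for (char32_t ch : code_points)
        sum += (0x2E80 <= ch && ch <= 0x9FFF) ? 2 : 1;
    return sum;
}

// Right-aligns utf8 in a field of the given width, using the estimated display width.
inline std::string right_align(const std::string& utf8, int width, char fill)
{
    const int w = display_width(decode_utf8(utf8));
    std::string padding;
    if (w < width)
        padding.assign(static_cast<std::size_t>(width - w), fill);
    return padding + utf8;
}

inline void right_align_example()
{
    const std::string text = "\xE4\xBB\x8A\xE6\x99\xA9\xE3\x81\xAF";
    assert(display_width(decode_utf8(text)) == 6);
    assert(right_align(text, 10, '.') == "...." + text);
}
```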

If you have two or more facet values that you use often, you may consider storing them in a facets_pack object and reusing it instead of retyping them every time. The facets_pack has some similarities to std::tuple, and there is the pack function template that creates a facets_pack, analogous to how std::make_tuple creates a std::tuple:

template <typename ... Facets>
facets_pack<Facets...> pack(const Facets& ... facets);

Passing a facets_pack object to the facets member function is equivalent to passing all the facet objects it contains. For example, to_string.facets(f1, pack(f2, f3), f4)(args ...) is equivalent to to_string.facets(f1, f2, f3, f4)(args ...). Moreover, f1, f2, f3 and f4 don't have to be facet objects. Actually, anything that can be passed to the facets member function can be stored in a facets_pack, and vice versa.

[Note] Note

Facets are usually stored by copy in facets_pack. You can avoid this with std::reference_wrapper. However, all facets are designed to be fast to copy, with the possible exception of str_grouping, which internally contains a std::string.
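The equivalence between a nested pack and a flat argument list can be illustrated with std::tuple, following the analogy drawn above. This is only a sketch of the idea: as_tuple and flatten are hypothetical helpers, a std::tuple stands in for facets_pack, and only one level of nesting is flattened.

```cpp
#include <cassert>
#include <tuple>

// A single item becomes a one-element tuple...
template <class T>
auto as_tuple(const T& item) { return std::make_tuple(item); }

// ...while a tuple (standing in for a facets_pack) is used as it is.
template <class... Ts>
auto as_tuple(const std::tuple<Ts...>& pack) { return pack; }

// Concatenates everything into one flat tuple, preserving the order.
template <class... Args>
auto flatten(const Args&... args) { return std::tuple_cat(as_tuple(args)...); }

inline void flatten_example()
{
    // Corresponds to facets(f1, pack(f2, f3), f4) versus facets(f1, f2, f3, f4).
    assert(flatten(1, std::make_tuple(2, 3), 4) == std::make_tuple(1, 2, 3, 4));
}
```

Preserving the order matters because, when two facets of the same category apply, the last one wins.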

Another thing you can do is to create new leading expressions based on the existing ones:

namespace my { // my customizations

namespace strf = boost::stringify::v0;

const auto my_default_facets = strf::pack
    ( strf::monotonic_grouping<10>(3)
    , strf::monotonic_grouping<16>(4).thousands_sep(U'\'')
    , strf::width_as_u32len()
    , strf::surrogate_policy::lax
    , strf::encoding_error::stop );

const auto to_string = strf::to_string.facets(my_default_facets);

template <typename Str>
inline auto append(Str& str)
{
    return strf::append(str).facets(my_default_facets);
}

template <typename ... Args>
inline decltype(auto) write(Args&& ... args)
{
    return strf::write(std::forward<Args>(args)...).facets(my_default_facets);
}

} // namespace my

void using_my_customizations()
{
    namespace strf = boost::stringify::v0;

    int x = 100000000;
    auto str = my::to_string(x);
    BOOST_ASSERT(str == "100,000,000");

    my::append(str) (" in hexadecimal is ", ~strf::hex(x));
    BOOST_ASSERT(str == "100,000,000 in hexadecimal is 0x5f5'e100");

    char buff[500];
    my::write(buff)(x, " in hexadecimal is ", ~strf::hex(x));
    BOOST_ASSERT(str == buff);

    // Overriding numpunct_c<16> back to default:
    str = my::to_string
        .facets(strf::no_grouping<16>())
        (x, " in hexadecimal is ", ~strf::hex(x));
    BOOST_ASSERT(str == "100,000,000 in hexadecimal is 0x5f5e100");
}

Performance

See the benchmarks.


Extensibility

Boost.Stringify allows you not only to add input types but also output types. Qt users, for instance, might be interested in a to_string-equivalent that creates a QString instead of an std::string. Take a look at this implementation to see how easy this is. Or perhaps you might want to implement an alternative way to write to files based on lower level system functions instead of fprintf in order to get better performance.

On the other hand, if your interest is to add a new input type, know that you can also provide formatting options for this new type. There is at least one that is probably desirable, which is alignment (a.k.a. justification). This one is particularly easy to implement in Boost.Stringify ( and difficult in others ). See the examples.

And sometimes what you want is not exactly to add an input type, but to implement some kind of solution that generates textual content, for example a converter of binary data to Base64. If you implement it as an extension of Boost.Stringify, as demonstrated here, then it automatically gains support for the full range of output types available in the library.


Internationalization

Being locale-independent does not mean that you cannot customize numeric punctuation: you can do that with facets. And it is actually a good thing that you don't need to change the global locale to achieve that.

Also, Boost.Stringify is probably the most suitable library to be used with translation tools like gettext, for the following reasons:

Tr-string translation hints

The tr-string syntax allows you to insert comments intended to help the person who translates it.

Formatting decoupled from translation

Because formatting is not in the tr-string, but in the format functions, you can change the formatting without changing each translation of the tr-string.

Catch formatting errors at compile-time

You may not like the verbosity of format functions, which is a legitimate dislike. However, you should also dislike the error-proneness of format strings. Although mistakes in the format string can be caught at compile time in some cases (as warnings when using printf in some compilers, and as compilation errors notably in {fmt}), this doesn't work if the format string is returned by a function like gettext. Then printf has undefined behaviour, and other libraries may throw exceptions, which is still not an adequate solution. A run-time error is only acceptable when it's not the program's fault; otherwise it's a bug. Mistakes in format functions, on the other hand, always give you a compilation error. It's true that mistakes can also happen in the tr-string, but the probability is much lower. Besides, you can customize how such errors are handled.


Some capabilities not present in most of the other formatting libraries


Boost.Stringify is prepared to be used as a header-only library by default. But it can also be used as a static library, in which case the source files that use the library must define the macro BOOST_STRINGIFY_SEPARATE_COMPILATION. In case you don't want to use the included Boost.Build or CMake project, it will probably be easy for you to find some other way to build the library, since there is only one source file (build/stringify.cpp) to compile.



[1] Search for "Best Practices for Using U+FFFD" in Chapter 3 of Unicode Standard.

Last revised: June 03, 2019 at 22:20:25 GMT

