Serialization

Class Serialization Traits

Version
Implementation Level
Object Tracking
Export Key
Abstract
Type Information Implementation
Wrappers
Bitwise Serialization
Template Serialization Traits
Compile Time Warnings and Errors

Serialization of data depends on the type of the data. For example, for primitive types such as int, it wouldn't make sense to save a version number in the archive. Likewise, for a data type that is never serialized through a pointer, it would (almost) never make sense to track the address of objects saved to/loaded from the archive as it will never be saved/loaded more than once in any case. Details of serialization for a particular data type will vary depending on the type, the way it is used and specifications of the programmer.

One can alter the manner in which a particular data type is serialized by specifying one or more class serialization traits. It is not generally necessary for the programmer to explicitly assign traits to his classes as there are default values for all traits. If the default values are not appropriate they can be assigned by the programmer. A template is used to associate a typename with a constant. For example see version.hpp.

Version

The header file version.hpp includes the following code:


namespace boost { 
namespace serialization {
template<class T>
struct version
{
    BOOST_STATIC_CONSTANT(unsigned int, value = 0);
};
} // namespace serialization
} // namespace boost

For any class T, the default definition of boost::serialization::version<T>::value is 0. If we want to assign a value of 2 as the version for class my_class, we specialize the version template:


namespace boost { 
namespace serialization {
template<>
struct version<my_class>
{
    BOOST_STATIC_CONSTANT(unsigned int, value = 2);
};
} // namespace serialization
} // namespace boost

Now whenever the version number for class my_class is required, the value 2 will be returned rather than the default value of 0.

To diminish typing and enhance readability, a macro is defined so that instead of the above, we could write:


BOOST_CLASS_VERSION(my_class, 2)

which expands to the code above.
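The pattern can be tried without the library. The following is a minimal sketch, assuming a stand-in namespace `sketch` and a hypothetical macro `SKETCH_CLASS_VERSION` modelled on BOOST_CLASS_VERSION; neither name is part of Boost, and the real macro may expand differently.

```cpp
#include <cassert>

namespace sketch {

// Primary template: every type has version 0 unless specialized.
template<class T>
struct version
{
    static constexpr unsigned int value = 0;
};

} // namespace sketch

// Hypothetical convenience macro (not the Boost one): it generates
// the explicit specialization so the user does not have to type it.
#define SKETCH_CLASS_VERSION(T, N)                     \
    namespace sketch {                                 \
    template<>                                         \
    struct version<T>                                  \
    {                                                  \
        static constexpr unsigned int value = (N);     \
    };                                                 \
    }

struct my_class {};
struct other_class {};

SKETCH_CLASS_VERSION(my_class, 2)

// The specialization is picked up for my_class only.
static_assert(sketch::version<my_class>::value == 2, "specialized value");
static_assert(sketch::version<other_class>::value == 0, "default value");
```

Because the trait is an ordinary class template, the value is available at compile time and costs nothing at run time.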

Implementation Level

In the same manner as the above, the "level" of implementation of serialization is specified. The header file level.hpp defines the following.


// names for each level
enum level_type
{
    // Don't serialize this type. An attempt to do so should
    // invoke a compile time assertion.
    not_serializable = 0,
    // write/read this type directly to the archive. In this case
    // serialization code won't be called.  This is the default
    // case for fundamental types.  It presumes a member function or
    // template in the archive class that can handle this type.
    // there is no runtime overhead associated reading/writing
    // instances of this level
    primitive_type = 1,
    // Serialize the objects of this type using the objects "serialize"
    // function or template. This permits values to be written/read
    // to/from archives but includes no class or version information. 
    object_serializable = 2,
    ///////////////////////////////////////////////////////////////////
    // once an object is serialized at one of the above levels, the
    // corresponding archives cannot be read if the implementation level
    // for the archive object is changed.  
    ///////////////////////////////////////////////////////////////////
    // Add class information to the archive.  Class information includes
    // implementation level, class version and class name if available.
    object_class_info = 3,
};

Using a macro defined in level.hpp we can specify that my_class should be serialized along with its version number:


BOOST_CLASS_IMPLEMENTATION(my_class, boost::serialization::object_class_info)

If implementation level is not explicitly assigned, the system uses a default according to the following rules.

if the data type is volatile assign not_serializable
else if it's an enum or fundamental type assign primitive_type
else assign object_class_info
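The three rules above can be sketched with standard type traits. This is an illustration only: the function `default_level` and the namespace `sketch` are assumptions made for this example, and the library's own selection logic is written differently.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

enum level_type
{
    not_serializable = 0,
    primitive_type = 1,
    object_serializable = 2,
    object_class_info = 3
};

// Hypothetical helper that applies the default rules in order:
// volatile -> not_serializable, enum/fundamental -> primitive_type,
// anything else -> object_class_info.
template<class T>
constexpr level_type default_level()
{
    return std::is_volatile<T>::value
        ? not_serializable
        : (std::is_enum<T>::value || std::is_fundamental<T>::value)
            ? primitive_type
            : object_class_info;
}

} // namespace sketch

enum color { red, green };
struct my_class { int x; };

static_assert(sketch::default_level<volatile int>() == sketch::not_serializable, "volatile");
static_assert(sketch::default_level<int>() == sketch::primitive_type, "fundamental");
static_assert(sketch::default_level<color>() == sketch::primitive_type, "enum");
static_assert(sketch::default_level<my_class>() == sketch::object_class_info, "class");
```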

That is, for most user defined types, objects will be serialized along with class version information. This will permit one to maintain backward compatibility with archives which contain previous versions. However, with this ability comes a small runtime cost. For types whose definition will "never" change, efficiency can be gained by specifying object_serializable to override the default setting of object_class_info. For example, this has been done for the binary_object wrapper.

Object Tracking

Depending on the way a type is used, it may be necessary or convenient to track the address of objects saved and loaded. For example, this is generally necessary while serializing objects through a pointer in order to be sure that multiple identical objects are not created when an archive is loaded. This "tracking behavior" is controlled by the type trait defined in the header file tracking.hpp which defines the following:


// names for each tracking level
enum tracking_type
{
    // never track this type
    track_never = 0,
    // track objects of this type if the object is serialized through a 
    // pointer.
    track_selectively = 1,
    // always track this type
    track_always = 2
};

A corresponding macro is defined so that we can use:


BOOST_CLASS_TRACKING(my_class, boost::serialization::track_never)

Default tracking traits are:

For primitive types, track_never.
For pointers, track_never. That is, addresses of addresses are not tracked by default.
For all current serialization wrappers such as boost::serialization::nvp, track_never.
For all other types, track_selectively. That is, addresses of serialized objects are tracked if and only if one or more of the following is true:
- an object of this type is anywhere in the program serialized through a pointer.
- the class is explicitly "exported" - see below.
- the class is explicitly "registered" in the archive

The default behavior is almost always the most convenient one. However, there are a few cases where it would be desirable to override the default. One case is that of a virtual base class. In a diamond inheritance structure with a virtual base class, object tracking will prevent redundant save/load invocations. So here is one case where it might be convenient to override the default tracking trait. (Note: in a future version the default will be reimplemented to automatically track classes used as virtual bases). This situation is demonstrated by test_diamond.cpp included with the library.

Export Key

When serializing a derived class through a virtual base class pointer, two issues may arise.

1. The code in the derived class might never be explicitly referred to. Such code will never be instantiated. This is addressed by invoking BOOST_CLASS_EXPORT_IMPLEMENT(T) in the file which defines (implements) the class T. This ensures that code for the derived class T will be explicitly instantiated.
2. There needs to be some sort of identifier which can be used to select the code to be invoked when the object is loaded. Standard C++ does implement typeid() which can be used to return a unique string for the class. This is not entirely satisfactory for our purposes for the following reasons:
   - There is no guarantee that the string is the same across platforms. This would then fail to support portable archives.
   - In using code modules from various sources, classes may have to be wrapped in different namespaces in different programs.
   - There might be classes locally defined in different code modules that have the same name.
   - There might be classes with different names that we want to consider equivalent for purposes of serialization.
   So in the serialization library, this is addressed by invoking BOOST_CLASS_EXPORT_KEY2(my_class, "my_class_external_identifier") in the header file which declares the class. In a large majority of applications, the class name works just fine for the external identifier string, so the following shortcut is defined: BOOST_CLASS_EXPORT_KEY(my_class).

For programs which consist of only one module, that is, programs which do not use DLLs, one can specify BOOST_CLASS_EXPORT(my_class) or BOOST_CLASS_EXPORT_GUID(my_class, "my_class_external_identifier") in either the declaration header or definition. These macros expand to invocation of both of the macros described above. (GUID stands for Globally Unique IDentifier.)
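To see why an external identifier is needed, consider a minimal sketch of a key-to-factory registry. Everything here (`sketch::registry`, `sketch::export_key`, `sketch::create`) is hypothetical and written for illustration; it is not how the library implements export, only a model of the idea that a portable string selects the code to run at load time.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

namespace sketch {

struct base
{
    virtual ~base() = default;
    virtual std::string name() const = 0;
};

struct derived : base
{
    std::string name() const override { return "derived"; }
};

typedef std::function<std::unique_ptr<base>()> factory;

// One registry per program, created on first use.
inline std::map<std::string, factory> & registry()
{
    static std::map<std::string, factory> r;
    return r;
}

// Hypothetical stand-in for exporting a class under an external key.
template<class T>
bool export_key(const std::string & key)
{
    registry()[key] = [] { return std::unique_ptr<base>(new T); };
    return true;
}

// Loading side: the key read from the archive selects the factory.
// This sketch returns a null pointer for an unknown key.
inline std::unique_ptr<base> create(const std::string & key)
{
    std::map<std::string, factory>::const_iterator it = registry().find(key);
    if (it == registry().end())
        return std::unique_ptr<base>();
    return it->second();
}

// Registration runs during static initialization, even though no
// other code refers to derived explicitly.
static const bool derived_exported =
    export_key<derived>("my_class_external_identifier");

} // namespace sketch
```

Since the key is chosen by the programmer rather than produced by typeid(), it stays the same across compilers and platforms.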

(Elsewhere in this manual, the serialization of derived classes is addressed in detail.)

The header file export.hpp contains all macro definitions described here. The library will throw a runtime exception if

A type not explicitly referred to is not exported.
Serialization code for the same type is instantiated in more than one module (or DLL).

Abstract

When serializing an object through a pointer to its base class, the library needs to determine whether or not the base is abstract (i.e. has at least one pure virtual function). The library uses the type trait macro BOOST_IS_ABSTRACT(T) to do this. Not all compilers support this type trait and corresponding macro. To address this, the macro


BOOST_SERIALIZATION_ASSUME_ABSTRACT(T)

has been implemented to permit one to explicitly indicate that a specified type is in fact abstract. This will guarantee that BOOST_IS_ABSTRACT will return the correct value for all compilers.
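On compilers that provide the standard header <type_traits>, the property in question can be inspected directly. The sketch below uses std::is_abstract rather than the Boost macro, and the class names are invented for the example; it shows that a class is abstract only when it has a pure virtual function that is not overridden.

```cpp
#include <cassert>
#include <type_traits>

// Abstract: declares a pure virtual function.
struct shape
{
    virtual ~shape() = default;
    virtual double area() const = 0;
};

// Polymorphic but not abstract: its virtual function has a body.
struct widget
{
    virtual ~widget() = default;
    virtual void draw() const {}
};

// Concrete: overrides every pure virtual function of its base.
struct unit_square : shape
{
    double area() const override { return 1.0; }
};

static_assert(std::is_abstract<shape>::value, "shape is abstract");
static_assert(!std::is_abstract<widget>::value, "widget is not abstract");
static_assert(!std::is_abstract<unit_square>::value, "unit_square is concrete");
```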

Type Information Implementation

This last trait is also related to the serialization of objects through a base class pointer. The implementation of this facility requires the ability to determine at run time the true type of the object that a base class pointer points to. Different serialization systems do this in different ways. In our system, the default method is to use the function typeid(...) which is available in systems which support RTTI (Run Time Type Information). This will be satisfactory in almost all cases and most users of this library will lose nothing in skipping this section of the manual.

However, there are some cases where the default type determination system is not convenient. Some platforms might not support RTTI or it may have been disabled in order to speed execution or for some other reason. Some applications, e.g. runtime linking of plug-in modules, can't depend on C++ RTTI to determine the true derived class. RTTI only returns the correct type for polymorphic classes - classes with at least one virtual function. If any of these situations applies, one may substitute his own implementation of extended_type_info.
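The limitation regarding polymorphic classes can be observed with plain typeid, independent of the library. In this sketch the class and function names are made up for the example, and RTTI is assumed to be enabled.

```cpp
#include <cassert>
#include <typeinfo>

// Polymorphic hierarchy: typeid inspects the dynamic type.
struct poly_base { virtual ~poly_base() = default; };
struct poly_derived : poly_base {};

// Non-polymorphic hierarchy: typeid only sees the static type.
struct plain_base {};
struct plain_derived : plain_base {};

inline bool polymorphic_reports_derived()
{
    poly_derived d;
    poly_base & b = d;
    return typeid(b) == typeid(poly_derived);
}

inline bool plain_reports_derived()
{
    plain_derived d;
    plain_base & b = d;
    // For a non-polymorphic class this compares typeid(plain_base).
    return typeid(b) == typeid(plain_derived);
}
```

The first function returns true and the second returns false, which is why a base class without virtual functions cannot rely on the default typeid-based implementation to identify the derived type.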

The interface to facilities required to implement serialization is defined in extended_type_info.hpp. Default implementation of these facilities based on typeid(...) is defined in extended_type_info_typeid.hpp. An alternative implementation based on exported class identifiers is defined in extended_type_info_no_rtti.hpp.

By invoking the macro:


BOOST_CLASS_TYPE_INFO(
    my_class, 
    extended_type_info_no_rtti<my_class>
)

we can assign the type information implementation to each class on a case by case basis. There is no requirement that all classes in a program use the same implementation of extended_type_info. This supports the concept that serialization of each class is specified "once and for all" in a header file that can be included in any project without change.

This is illustrated by the test program test_no_rtti.cpp. Other implementations are possible and might be necessary for certain special cases.

Wrappers

Archives need to treat wrappers differently from other types since, for example, they usually are non-const objects while output archives require that any serialized object (with the exception of a wrapper) be const. The header file wrapper.hpp includes the following code:


namespace boost { 
namespace serialization {
template<class T>
struct is_wrapper
 : public mpl::false_
{};
} // namespace serialization
} // namespace boost

For any class T, the default definition of boost::serialization::is_wrapper<T>::value is thus false. If we want to declare that a class my_class is a wrapper, we specialize the is_wrapper template:


namespace boost { 
namespace serialization {
template<>
struct is_wrapper<my_class>
 : mpl::true_
{};
} // namespace serialization
} // namespace boost

To diminish typing and enhance readability, a macro is defined so that instead of the above, we could write:


BOOST_CLASS_IS_WRAPPER(my_class)

which expands to the code above.

Bitwise Serialization

Some simple classes could be serialized just by directly copying all bits of the class. This is, in particular, the case for POD data types containing no pointer members, and which are neither versioned nor tracked. Some archives, such as non-portable binary archives, can make use of this information to substantially speed up serialization. To indicate the possibility of bitwise serialization, the type trait defined in the header file is_bitwise_serializable.hpp is used:


namespace boost { namespace serialization {
    template<class T>
    struct is_bitwise_serializable
     : public is_arithmetic<T>
    {};
} }

This trait can be specialized for other classes. The specialization is made easy by the corresponding macro:


BOOST_IS_BITWISE_SERIALIZABLE(my_class)
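How an archive might exploit this trait can be sketched as follows. The namespace `sketch`, the struct `point` and the helper functions `save_bitwise` and `load_bitwise` are assumptions made for this example and are not part of the library; real binary archives are organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

namespace sketch {

// Default: only arithmetic types may be copied bit for bit.
template<class T>
struct is_bitwise_serializable : std::is_arithmetic<T> {};

// A POD type with no pointer members.
struct point { int x; int y; };

// Opt in, as the convenience macro would do for a user class.
template<>
struct is_bitwise_serializable<point> : std::true_type {};

// Append the raw bytes of t to the buffer.
template<class T>
void save_bitwise(std::vector<unsigned char> & out, const T & t)
{
    static_assert(is_bitwise_serializable<T>::value,
        "type is not marked bitwise serializable");
    const unsigned char * p = reinterpret_cast<const unsigned char *>(&t);
    out.insert(out.end(), p, p + sizeof(T));
}

// Rebuild a value from the raw bytes starting at offset.
template<class T>
T load_bitwise(const std::vector<unsigned char> & in, std::size_t offset)
{
    static_assert(is_bitwise_serializable<T>::value,
        "type is not marked bitwise serializable");
    T t;
    std::memcpy(&t, in.data() + offset, sizeof(T));
    return t;
}

} // namespace sketch
```

The resulting bytes depend on the platform's layout and byte order, which is why only non-portable archives can take this shortcut.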

Template Serialization Traits

In some instances it might be convenient to assign serialization traits to a whole group of classes at once. Consider the name-value pair wrapper


template<class T>
struct nvp : public std::pair<const char *, T *>
{
    ...
};

used by XML archives to associate a name with a data variable of type T. These data types are never tracked and never versioned. So one might want to specify:


BOOST_CLASS_IMPLEMENTATION(nvp<T>, boost::serialization::level_type::object_serializable)
BOOST_CLASS_TRACKING(nvp<T>, boost::serialization::track_never)

Examination of the definition of these macros reveals that they won't expand to sensible code when used with a template argument. So rather than using the convenience macros, use the original definitions:


template<class T>
struct implementation_level<nvp<T> >
{
    typedef mpl::integral_c_tag tag;
    typedef mpl::int_<object_serializable> type;
    BOOST_STATIC_CONSTANT(
        int,
        value = implementation_level::type::value
    );
};

// nvp objects are generally created on the stack and are never tracked
template<class T>
struct tracking_level<nvp<T> >
{
    typedef mpl::integral_c_tag tag;
    typedef mpl::int_<track_never> type;
    BOOST_STATIC_CONSTANT(
        int, 
        value = tracking_level::type::value
    );
};

to assign serialization traits to all classes generated by the template nvp<T>.

Note that it is only possible to use the above method to assign traits to templates when using compilers which correctly support Partial Template Specialization. One's first impulse might be to do something like:


#ifndef BOOST_NO_TEMPLATE_PARTIAL_SPECIALIZATION
template<class T>
struct implementation_level<nvp<T> >
{
   ... // see above
};

// nvp objects are generally created on the stack and are never tracked
template<class T>
struct tracking_level<nvp<T> >
{
   ... // see above
};
#endif

This can be problematic when one wants to make his code and archives portable to other platforms. It means the objects will be serialized differently depending on the platform used. This implies that objects saved from one platform won't be loaded properly on another. In other words, archives won't be portable.

This problem is addressed by creating another method of assigning serialization traits to user classes. This is illustrated by the serialization for a name-value pair.

Specifically, this entails deriving the template from a special class boost::serialization::traits which is specialized for a specific combination of serialization traits. When looking up the serialization traits, the library first checks to see if this class has been used as a base class. If so, the corresponding traits are used. Otherwise, the standard defaults are used. By deriving from a serialization traits class rather than relying upon Partial Template Specialization, one can apply serialization traits to a template and those traits will be the same across all known platforms.

The signature for the traits template is:


template<
    class T,       
    int Level, 
    int Tracking,
    unsigned int Version = 0,
    class ETII = BOOST_SERIALIZATION_DEFAULT_TYPE_INFO(T),
    class IsWrapper = mpl::false_
>
struct traits

and template parameters should be assigned according to the following table:

parameter	description	permitted values	default value
`T`	target class	class name	none
`Level`	implementation level	`not_serializable`, `primitive_type`, `object_serializable`, `object_class_info`	none
`Tracking`	tracking level	`track_never`, `track_selectively`, `track_always`	none
`Version`	class version	unsigned integer	`0`
`ETII`	`type_info` implementation	`extended_type_info_typeid`, `extended_type_info_no_rtti`	default `type_info` implementation
`IsWrapper`	is the type a wrapper?	`mpl::false_`, `mpl::true_`	`mpl::false_`
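The lookup order described above (first check for the traits base class, otherwise fall back to the defaults) can be modelled in a few lines. This is a simplified sketch under stated assumptions: the namespace `sketch`, the tag `basic_traits` as used here, and the helpers `from_base_level` and `from_base_tracking` are written for this example, the `traits` template is reduced to four parameters, and the library's actual lookup code differs.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

enum level_type { not_serializable, primitive_type, object_serializable, object_class_info };
enum tracking_type { track_never, track_selectively, track_always };

// Empty tag base: its presence marks a class that carries its own traits.
struct basic_traits {};

template<class T, int Level, int Tracking, unsigned int Version = 0>
struct traits : basic_traits
{
    static constexpr int level = Level;
    static constexpr int tracking = Tracking;
    static constexpr unsigned int version = Version;
};

// Providers of the trait value: either inherited or default.
template<class T> struct from_base_level    { static constexpr int value = T::level; };
template<class T> struct from_base_tracking { static constexpr int value = T::tracking; };
struct default_level    { static constexpr int value = object_class_info; };
struct default_tracking { static constexpr int value = track_selectively; };

// Lookup: use the inherited traits if T derives from basic_traits.
template<class T>
struct implementation_level
    : std::conditional<std::is_base_of<basic_traits, T>::value,
                       from_base_level<T>, default_level>::type {};

template<class T>
struct tracking_level
    : std::conditional<std::is_base_of<basic_traits, T>::value,
                       from_base_tracking<T>, default_tracking>::type {};

// Every instantiation of nvp<T> gets the same traits via its base.
template<class T>
struct nvp : traits<nvp<T>, object_serializable, track_never>
{
    const char * name;
    T * value;
};

struct plain {};

} // namespace sketch
```

With this arrangement no per-instantiation specialization is needed: the traits travel with the template through inheritance.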

Compile Time Warnings and Errors

Some serialization traits can conflict with other ones. Sometimes these conflicts will result in erroneous behavior (e.g. creation of archives which could not be read) and other times they represent a probable misconception on the part of the library user which could result in surprising behavior. To the extent possible, these conflicts are detected at compile time and errors (BOOST_STATIC_ASSERT) or warnings (BOOST_STATIC_WARNING) are generated. They are generated in a compiler dependent manner which should show a chain of instantiation to the point where the error/warning is detected. Without this capability, it would be very hard to track down errors or unexpected behavior in library usage. Here is a list of the conflicts trapped:

object_level - error

This error traps attempts to serialize types whose implementation level is set to not_serializable.

object_versioning - error

It's possible that for efficiency reasons, a type can be assigned a serialization level which doesn't include type information in the archive. This would preclude the assignment of a new version number to the type. This error traps attempts to assign a version number in this case. This has to be a user error.
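A compile time check of this kind can be sketched with static_assert. The names `versioning_ok` and `checked_traits` are hypothetical, and the exact condition the library tests is an assumption inferred from the description above: a non-zero version is accepted only when the level stores class information.

```cpp
#include <cassert>

namespace sketch {

enum level_type
{
    not_serializable = 0,
    primitive_type = 1,
    object_serializable = 2,
    object_class_info = 3
};

// A version number can only be recorded if class information is
// written to the archive, i.e. at level object_class_info.
constexpr bool versioning_ok(int level, unsigned int version)
{
    return version == 0 || level >= object_class_info;
}

// Hypothetical traits holder that rejects inconsistent combinations
// when it is instantiated.
template<int Level, unsigned int Version>
struct checked_traits
{
    static_assert(versioning_ok(Level, Version),
        "version assigned to a type that stores no class information");
    static constexpr int level = Level;
    static constexpr unsigned int version = Version;
};

} // namespace sketch
```

Instantiating `sketch::checked_traits<sketch::object_serializable, 2>` would stop compilation with the message above, while `sketch::checked_traits<sketch::object_class_info, 2>` is accepted.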

object_tracking - warning

The following code will display a message when compiled:

T t;
ar << t;

unless the tracking_level serialization trait is set to "track_never". The following will compile without problem:

const T t;
ar << t;

Likewise, the following code will trap at compile time:

T * t;
ar >> t;

if the tracking_level serialization trait is set to "track_never".

The following case illustrates the function of this message. It was originally used as an example in the mailing list by Peter Dimov.

class construct_from 
{ 
    ... 
}; 

int main(){ 
    ... 
    Y y; 
    construct_from x(y); 
    ar << x; 
}

Suppose that the above message is not displayed and the code is used as is.

This example compiles and executes fine. No tracking is done because construct_from has never been serialized through a pointer. Some time later, the next programmer (2) comes along and makes an enhancement. He wants the archive to be a sort of log.

int main(){
    ...
    Y y;
    construct_from x(y);
    ar << x;
    ...
    x.f(); // change x in some way
    ...
    ar << x;
}

Again no problem. He gets two copies in the archive, each one different. That is, he gets exactly what he expects and is naturally delighted.
Some time later, a third programmer (3) sees construct_from and says - oh cool, just what I need. He writes a function in a totally disjoint module. (The project is so big, he doesn't even realize the existence of the original usage.) He writes something like:

class K {
    shared_ptr<construct_from> z;
    template<class Archive>
    void serialize(Archive & ar, const unsigned version){
        ar << z;
    }
};

He builds and runs the program and tests his new functionality. It works great and he's delighted.
Things continue smoothly as before. A month goes by and it's discovered that loading the archives made in the last month (reading the log) doesn't work. The second log entry is always the same as the first. After a series of very long and increasingly acrimonious email exchanges, it's discovered that programmer (3) accidentally broke programmer (2)'s code. This is because by serializing via a pointer, the "log" object is now being tracked. This is because the default tracking behavior is "track_selectively". This means that class instances are tracked only if they are serialized through pointers anywhere in the program. Now multiple saves from the same address result in only the first one being written to the archive. Subsequent saves only add the address - even though the data might have been changed. When it comes time to load the data, all instances of the log record show the same data. In this way, the behavior of a functioning piece of code is changed due to the side effect of a change in an otherwise disjoint module. Worse yet, the data has been lost and cannot be recovered from the archives. People are really upset and disappointed with boost (at least the serialization system).
After a lot of investigation, it's discovered what the source of the problem is and class construct_from is marked "track_never" by including:

BOOST_CLASS_TRACKING(construct_from, track_never)

Now everything works again. Or - so it seems.
shared_ptr<construct_from> is not going to have a single raw pointer shared amongst the instances. Each loaded shared_ptr<construct_from> is going to have its own distinct raw pointer. This will break shared_ptr and cause a memory leak. Again, the cause of this problem is very far removed from the point of discovery. It could well be that the problem is not even discovered until after the archives are loaded. Now we not only have a difficult to find and fix program bug, but we have a bunch of invalid archives and lost data.
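The effect of tracking on the "log" in this story can be reproduced with a toy model. The class `toy_oarchive` below is entirely hypothetical and far simpler than a real archive; it only mimics the rule that, once a type is tracked, a second save from the same address writes a back-reference instead of the data.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

namespace sketch {

struct record { int payload; };

// Toy output archive. With tracking enabled, an address that has
// already been saved is written as "ref:<id>" rather than as data.
class toy_oarchive
{
public:
    explicit toy_oarchive(bool tracked) : tracked_(tracked) {}

    void save(const record & r)
    {
        if (tracked_) {
            std::map<const record *, std::size_t>::const_iterator it = seen_.find(&r);
            if (it != seen_.end()) {
                entries_.push_back("ref:" + std::to_string(it->second));
                return;
            }
            const std::size_t id = seen_.size();
            seen_[&r] = id;
        }
        entries_.push_back("data:" + std::to_string(r.payload));
    }

    const std::vector<std::string> & entries() const { return entries_; }

private:
    bool tracked_;
    std::map<const record *, std::size_t> seen_;
    std::vector<std::string> entries_;
};

} // namespace sketch
```

Saving a record, changing it, and saving it again yields two data entries when tracking is off, but a data entry followed by a reference when tracking is on, so the changed value never reaches the archive.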

Now consider what happens when the message is displayed:

Right away, the program traps at ar << x;
The programmer curses (another %^&*&* hoop to jump through). He's in a hurry (and who isn't) and would prefer not to const_cast - because it looks bad. So he'll just make the following change and move on.

Y y;
const construct_from x(y);
ar << x;

Things work fine and he moves on.
Now programmer (2) wants to make his change - and again another annoying const issue:

Y y;
const construct_from x(y);
...
x.f(); // change x in some way; compile error: f() is not const
...
ar << x;

He's mildly annoyed. Now he tries the following:
- He considers making f() const - but presumably that shifts the const error to somewhere else. And he doesn't want to fiddle with "his" code to work around a quirk in the serialization system.
- He removes the const from const construct_from above - damn, now he gets the trap. If he looks at the comment in the code where the BOOST_STATIC_ASSERT occurs, he'll do one of two things:
  1. This is just crazy. It's making my life needlessly difficult and flagging code that is just fine. So I'll fix this with a const_cast and fire off a complaint to the list and maybe they will fix it. In this case, the story branches off to the previous scenario.
  2. Oh, this trap is suggesting that the default serialization isn't really what I want. Of course in this particular program it doesn't matter. But then the code in the trap can't really evaluate code in other modules (which might not even be written yet). OK, I'll add the following to my construct_from.hpp to solve the problem: BOOST_CLASS_TRACKING(construct_from, track_never)
Now programmer (3) comes along and makes his change. The behavior of the original (and distant) module remains unchanged because the construct_from trait has been set to "track_never", so he should always get copies and the log should be what we expect.
But now he gets another trap - trying to save an object of a class marked "track_never" through a pointer. So he goes back to construct_from.hpp and comments out the BOOST_CLASS_TRACKING that was inserted. Now the second trap is avoided, but damn - the first trap is popping up again. Eventually, after some code restructuring, the differing requirements of serializing construct_from are reconciled.

Note that in this second scenario

all errors are trapped at compile time.
no invalid archives are created.
no data is lost.
no runtime errors occur.

It's true that these messages may sometimes flag code that is currently correct and that this may be annoying to some programmers. However, this example illustrates my view that these messages are useful and that any such annoyance is a small price to pay to avoid particularly vexing programming errors.

pointer_level - warning

This trap addresses the following situation when serializing a pointer:

A type doesn't save class information in the archive. That is, the serialization trait implementation level <= object_serializable.
Tracking for this type is set to "track_selectively". In this case, the indication that an object is tracked is not stored in the archive itself - see level == object_serializable. Since class information is not saved in the archive, the existence or absence of the operation ar << T * anywhere else in the program is used to infer that an object of this type should be tracked.
A problem arises when a program which reads an archive includes the operation ar >> T *, so that it presumes tracking information is included in the archive. When the program which created the archive doesn't include ar << T *, the archive doesn't contain tracking information and it will fail to load. The reverse situation could trigger a similar problem.
Though this situation is unlikely for several reasons, it is possible - hence this warning.

So if your program traps here, consider changing the tracking or implementation level traits - or not serializing via a pointer.

pointer_tracking - warning

Serializing an object of a type marked "track_never" through a pointer could result in creating more objects than were saved! There are cases in which a user might really want to do this so we leave it as a warning.

const_loading - error

One cannot load data into a "const" object unless it's a wrapper around some other non-const object.

© Copyright Robert Ramey 2002-2004 and Matthias Troyer 2006. Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)