Serialization: Special Considerations
Tracking objects by address can cause problems in programs where copies of different objects are saved from the same address.
template<class Archive>
void save(Archive & ar, const unsigned int version) const
{
    for(int i = 0; i < 10; ++i){
        A x = a[i];   // a non-const copy that lives at one stack address
        ar << x;
    }
}
In this case, the data to be saved exists on the stack. Each iteration
of the loop updates the value on the stack. So although the data changes
each iteration, the address of the data doesn't. If a is an array of
objects being tracked by memory address, the library will skip storing
every object after the first, because objects at the same address are
assumed to be the same object.
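The effect can be reproduced with a toy model. The sketch below does not use the serialization library: `ToyTrackingArchive`, `save_through_copy` and `save_in_place` are hypothetical names, and the tracking rule is reduced to "skip any address already seen".

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for an output archive that tracks objects by address.
struct ToyTrackingArchive {
    std::set<const void *> seen;   // addresses already written
    std::vector<int> stored;       // values actually written

    void save(const int & value) {
        // An address seen before is assumed to be the same object: skip it.
        if (seen.insert(&value).second)
            stored.push_back(value);
    }
};

// Copies each element into one local variable before saving it.
std::size_t save_through_copy(const std::vector<int> & a) {
    ToyTrackingArchive ar;
    int x = 0;                     // a single object, hence a single address
    for (std::size_t i = 0; i < a.size(); ++i) {
        x = a[i];
        ar.save(x);
    }
    return ar.stored.size();
}

// Saves each element in place, so every element has its own address.
std::size_t save_in_place(const std::vector<int> & a) {
    ToyTrackingArchive ar;
    for (std::size_t i = 0; i < a.size(); ++i)
        ar.save(a[i]);
    return ar.stored.size();
}
```

In this model, saving through the copy stores only the first element, while saving in place stores all of them.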
To help detect such cases, output archive operators expect to be passed const reference arguments.
Given this, the above code will invoke a compile time assertion. The obvious fix in this example is to use
template<class Archive>
void save(Archive & ar, const unsigned int version) const
{
    for(int i = 0; i < 10; ++i){
        ar << a[i];
    }
}
which will compile and run without problem.
The use of const by the output archive operators ensures that the process of
serialization doesn't change the state of the objects being serialized. An
attempt to do so would augment the concept of saving state with some sort of
non-obvious side effect. This would almost surely be a mistake and a likely
source of very subtle bugs.
Unfortunately, implementation issues currently prevent the detection of this kind of error when the data item is wrapped as a name-value pair.
A similar problem can occur when different objects are loaded to an address which is different from the final location:
template<class Archive>
void load(Archive & ar, const unsigned int version)
{
    for(int i = 0; i < 10; ++i){
        A x;
        ar >> x;
        m_set.insert(x);   // the set stores a copy at a different address
    }
}
In this case, the address of x is the one that is tracked rather than
the address of the new item added to the set. Left unaddressed, this will
break the features that depend on tracking, such as loading an object through
a pointer, and subtle bugs will be introduced into the program. This can be
addressed by altering the above code as follows:
template<class Archive>
void load(Archive & ar, const unsigned int version)
{
    for(int i = 0; i < 10; ++i){
        A x;
        ar >> x;
        std::pair<std::set<A>::const_iterator, bool> result;
        result = m_set.insert(x);
        // arguments: new address first, then the old address
        ar.reset_object_address(& (*result.first), &x);
    }
}
This will adjust the tracking information to reflect the final resting place of
the moved variable and thereby rectify the above problem.
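The bookkeeping involved can be pictured with a small stand-alone model. Everything in the sketch below is hypothetical (`ToyLoadTracker`, `load_into_set`, the integer object ids); it only mirrors the idea of replacing the tracked address of a temporary with the address of the element that ends up in the set.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

// Hypothetical stand-in for the address table an input archive keeps.
struct ToyLoadTracker {
    std::map<int, const void *> address_of;   // object id -> tracked address

    void track(int object_id, const void * address) {
        address_of[object_id] = address;
    }

    // Replace every entry that points at old_address with new_address.
    void reset_object_address(const void * new_address, const void * old_address) {
        std::map<int, const void *>::iterator it;
        for (it = address_of.begin(); it != address_of.end(); ++it)
            if (it->second == old_address)
                it->second = new_address;
    }
};

// Loads one value into a temporary, copies it into the set, then fixes the table.
// Returns true when the tracked address is the element's address in the set.
bool load_into_set(ToyLoadTracker & tracker, std::set<int> & s, int object_id, int value) {
    int x = value;                 // temporary the "archive" loaded into
    tracker.track(object_id, &x);
    std::pair<std::set<int>::const_iterator, bool> result = s.insert(x);
    tracker.reset_object_address(&(*result.first), &x);
    return tracker.address_of[object_id] == &(*result.first);
}
```

Because elements of a std::set do not move once inserted, the corrected address stays valid for later pointer lookups in this model.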
If it is known a priori that no pointer values are duplicated, overhead associated with object tracking can be eliminated by setting the object tracking class serialization trait appropriately.
By default, data types designated primitive by the Implementation Level
class serialization trait are never tracked. If it is desired to
track a shared primitive object through a pointer (e.g. a long
used as a reference count), it should be wrapped
in a class/struct so that it is an identifiable type.
The alternative of changing the implementation level of a long
would affect all longs serialized in the whole
program - probably not what one would intend.
It is possible that we may want to track addresses even though
the object is never serialized through a pointer. For example,
a virtual base class need be saved/loaded only once. By setting
this serialization trait to track_always, we can suppress
redundant save/load operations.
BOOST_CLASS_TRACKING(my_virtual_base_class, boost::serialization::track_always)
To implement this facility, one declares a helper object associated to the current archive that can be used to store contextual information relevant to the particular type serialization algorithm.
template<class T>
class shared_ptr
{
    ...
};

BOOST_SERIALIZATION_SPLIT_FREE(shared_ptr)

class shared_ptr_serialization_helper
{
    // table of previously loaded shared_ptr
    // lookup a shared_ptr from the object address
    shared_ptr<T> lookup(const T *);
    // insert a new shared_ptr
    void insert(const shared_ptr<T> &);
};

namespace boost {
namespace serialization {

template<class Archive>
void save(Archive & ar, const shared_ptr & x, const unsigned int /* version */)
{
    // save shared ptr
    ...
}

template<class Archive>
void load(Archive & ar, shared_ptr & x, const unsigned int /* version */)
{
    // get a unique identifier. Using a constant means that all shared pointers
    // are held in the same set. Thus we detect and handle multiple pointers to
    // the same value instances in the archive.
    const void * helper_instance_id = 0;

    shared_ptr_serialization_helper & hlp =
        ar.template get_helper<shared_ptr_serialization_helper>(helper_instance_id);

    // load shared pointer object
    ...

    // look up object in helper object
    shared_ptr<T> shared_object = hlp.lookup(...);

    // if found, return the one from the table

    // load the shared_ptr data
    shared_ptr<T> sp = ...

    // and add it to the table
    hlp.insert(sp);
    // implement shared_ptr_serialization_helper load algorithm with the aid of hlp
}

} // namespace serialization
} // namespace boost
get_helper<shared_ptr_serialization_helper>();
creates a helper object associated to the archive the first time it is invoked;
subsequent invocations return a reference to the object created in the first
place, so that hlp can effectively be used to store contextual information
persisting through the serialization of different shared_ptr objects on
the same archive.
Helpers may be created for saving and loading archives. The same program might have several different helpers or the same helper instantiated separately from different parts of the program. This is what makes the helper_instance_id necessary. In principle it could be any unique integer. In practice it seems easiest to use the address of the serialization function which contains it. The above example uses this technique.
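The "create on first use, return the same object afterwards" behaviour, keyed by helper type and instance id, can be modelled without the library. In the sketch below `ToyArchive` and `CountingHelper` are hypothetical, and the storage scheme is an assumption made for illustration rather than a description of how the archive classes store helpers.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical stand-in for an archive that owns per-archive helper objects.
class ToyArchive {
    typedef std::pair<std::type_index, const void *> key_type;
    std::map<key_type, std::shared_ptr<void> > helpers_;
public:
    // Creates the helper on first use; later calls with the same type and id
    // return a reference to that same object.
    template<class Helper>
    Helper & get_helper(const void * id = 0) {
        key_type key(std::type_index(typeid(Helper)), id);
        std::shared_ptr<void> & slot = helpers_[key];
        if (!slot)
            slot = std::make_shared<Helper>();
        return *static_cast<Helper *>(slot.get());
    }
};

// Example helper: counts how many objects have been seen on one archive.
struct CountingHelper {
    int count;
    CountingHelper() : count(0) {}
};
```

State written through one call is visible through the next call on the same archive, while a different id or a different archive yields a separate helper.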
boost::serialization::object_serializable.
Turning off tracking and class information serialization will result in pure template inline code that in principle could be optimised down to a simple stream write/read. Elimination of all serialization overhead in this manner comes at a cost. Once archives are released to users, the class serialization traits cannot be changed without invalidating the old archives. Including the class information in the archive assures us that they will be readable in the future even if the class definition is revised. A light weight structure such as a display pixel might be declared in a header like this:
#include <boost/serialization/serialization.hpp>
#include <boost/serialization/level.hpp>
#include <boost/serialization/tracking.hpp>

// a pixel is a light weight struct which is used in great numbers.
struct pixel
{
    unsigned char red, green, blue;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int /* version */){
        // operator& works for both saving and loading
        ar & red & green & blue;
    }
};
// eliminate serialization overhead at the cost of
// never being able to increase the version.
BOOST_CLASS_IMPLEMENTATION(pixel, boost::serialization::object_serializable)
// eliminate object tracking (even if serialized through a pointer)
// at the risk of a programming error creating duplicate objects.
BOOST_CLASS_TRACKING(pixel, boost::serialization::track_never)
wchar_t while other compilers reserve only 2 bytes.
So it's possible that a value could be written that couldn't be represented by the loading program. This is a
fairly obvious situation and easily handled by using the numeric types in <boost/cstdint.hpp>.
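As a rough illustration of the fixed-width approach, the sketch below writes a wide character through a 32-bit type so that the stored width does not depend on the compiler. It uses the standard `<cstdint>` header, which provides the same fixed-width type names; `encode_u32`, `decode_u32` and `encode_wide_char` are hypothetical helpers, and the little-endian byte order is an arbitrary choice made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Writes a 32-bit value as exactly four bytes, least significant byte first.
std::string encode_u32(std::uint32_t value) {
    std::string out;
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    return out;
}

// Reads back a value written by encode_u32.
std::uint32_t decode_u32(const std::string & in) {
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(static_cast<unsigned char>(in[i])) << (8 * i);
    return value;
}

// Stores a wide character through a type whose width is the same everywhere.
std::string encode_wide_char(wchar_t c) {
    return encode_u32(static_cast<std::uint32_t>(c));
}
```

A loading program that reserves only 2 bytes for wchar_t can then check the decoded value against its own range instead of misreading the stream.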
A special integral type is std::size_t, which is a typedef
of an integral type guaranteed to be large enough
to hold the size of any collection, but its actual size can differ depending
on the platform. The collection_size_type
wrapper exists to enable a portable serialization of collection sizes by an archive.
Recommended choices for a portable serialization of collection sizes are to
use either 64-bit or variable length integer representation.
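One possible variable length representation is a base-128 encoding, sketched below. This is an illustration of the general idea, not the format any particular archive class uses; `encode_size` and `decode_size` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encodes a size as a base-128 variable length integer:
// seven payload bits per byte, high bit set on every byte except the last.
std::vector<std::uint8_t> encode_size(std::size_t size) {
    std::uint64_t v = static_cast<std::uint64_t>(size);
    std::vector<std::uint8_t> out;
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
    return out;
}

// Decodes a value written by encode_size into a 64-bit integer.
std::uint64_t decode_size(const std::vector<std::uint8_t> & in) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < in.size(); ++i)
        v |= static_cast<std::uint64_t>(in[i] & 0x7F) << (7 * i);
    return v;
}
```

Decoding into a 64-bit integer lets the loading side detect a size that does not fit its own std::size_t before allocating anything.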
template<class T>
struct my_wrapper {
    template<class Archive>
    Archive & serialize ...
};
...
class my_class {
    wchar_t a;
    short unsigned b;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version){
        ar & my_wrapper<wchar_t>(a);
        ar & my_wrapper<short unsigned>(b);
    }
};
If my_wrapper uses default serialization
traits there could be a problem. With the default traits, each time a new type is
added to the archive, bookkeeping information is added. So in this example, the
archive would include such bookkeeping information for
my_wrapper<wchar_t> and for my_wrapper<unsigned short>.
Or would it? What about compilers that treat wchar_t as a
synonym for unsigned short?
In this case there is only one distinct type - not two. If archives are passed between
programs with compilers that differ in their treatment
of wchar_t, the load operation will fail
in a catastrophic way.
One remedy for this is to assign serialization traits to the template
my_wrapper such that class
information for instantiations of this template is never serialized. This
process is described above and
has been used for Name-Value Pairs.
Wrappers would typically be assigned such traits.
Another way to avoid this problem is to assign serialization traits
to all specializations of the template my_wrapper
for all primitive types so that class information is never saved. This is what has
been done for our implementation of serialization for STL collections.
ios::binary. If this is not done, the archive generated
will be unreadable.
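A minimal sketch of opening both the writing and the reading stream with the binary flag is shown below. It uses plain file streams rather than an archive class, and `round_trip_binary` is a hypothetical helper; the point is only that bytes such as carriage return, line feed and 0x1A survive unchanged when no text-mode translation is applied.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Writes bytes to a file and reads them back, opening both streams in binary mode.
std::string round_trip_binary(const std::string & path, const std::string & bytes) {
    {
        std::ofstream os(path.c_str(), std::ios::out | std::ios::binary);
        os.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    }   // the output stream is closed here, before the file is reopened
    std::ifstream is(path.c_str(), std::ios::in | std::ios::binary);
    std::string result((std::istreambuf_iterator<char>(is)),
                       std::istreambuf_iterator<char>());
    is.close();
    std::remove(path.c_str());   // clean up the scratch file
    return result;
}
```

On platforms whose text mode translates line endings, omitting the flag on either side would change the byte sequence.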
Unfortunately, no way has been found to detect this error before loading the archive. Debug builds will assert when this is detected so that may be helpful in catching this error.
BOOST_CLASS_EXPORT.
Export implies two things:
BOOST_CLASS_EXPORT in the same
source module that includes any of the archive class headers will
instantiate code required to serialize polymorphic pointers of
the indicated type for all those archive classes. If no
archive class headers are included, then no code will be instantiated.
Note that the implementation of this functionality requires
that the BOOST_CLASS_EXPORT
macro appear after the inclusion of any archive
class headers for which code is to be instantiated.
So, code that uses BOOST_CLASS_EXPORT
will look like the following:
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
... // other archives
#include <boost/serialization/export.hpp>
#include "a.hpp" // header declaration for class a
BOOST_CLASS_EXPORT(a)
... // other class headers and exports
This will be true regardless of whether the code is part
of a stand alone executable, a static library or
a dynamic or shared library.
Including BOOST_CLASS_EXPORT
in the "a.hpp" header itself, as one would do with
other serialization traits, will make it difficult
or impossible to follow the rule above regarding
inclusion of archive headers before BOOST_CLASS_EXPORT
is invoked. This can best be addressed by using BOOST_CLASS_EXPORT_KEY
in the header declarations and BOOST_CLASS_EXPORT_IMPLEMENT
in the class definition file.
This system has certain implications for placing code in static or shared
libraries. Placing BOOST_CLASS_EXPORT
in library code will have no effect unless archive class headers are
also included. So when building a library, one should include the headers
for all the archive classes which one anticipates using. Alternatively,
one can include headers for just the Polymorphic Archives.
Strictly speaking, export should not be necessary if all pointer serialization occurs through the most derived class. However, in order to detect what would be a catastrophic error, the library traps ALL serializations through a pointer to a polymorphic class which are not exported or otherwise registered. So, in practice, be prepared to register or export all classes with one or more virtual functions which are serialized through a pointer.
Note that the implementation of this functionality depends upon vendor specific extensions to the C++ language. So, there is no guaranteed portability of programs which use this facility. However, all C++ compilers which are tested with boost provide the required extensions. The library includes the extra declarations required by each of these compilers. It's reasonable to expect that future C++ compilers will support these extensions or something equivalent.
BOOST_CLASS_EXPORT_KEY in headers.
BOOST_CLASS_EXPORT_IMPLEMENT in definitions compiled in the library. For any particular type,
there should be only one file which contains BOOST_CLASS_EXPORT_IMPLEMENT
for that type. This ensures that only one copy
of serialization code will exist within the program. It avoids
wasted space and the possibility of having different
versions of the serialization code in the same program.
Including BOOST_CLASS_EXPORT_IMPLEMENT
in multiple files could result in a failure
to link due to duplicated symbols or the throwing
of a runtime exception.
demo_pimpl.cpp, demo_pimpl_A.cpp and demo_pimpl_A.hpp
where implementation of serialization is in a static library
completely separate from the main program.
test_dll_simple, and dll_a.cpp
where implementation of serialization is also completely separate
from the main program but the code is loaded at runtime. In this
example, this code is loaded automatically when the program which
uses it starts up, but it could just as well be loaded and unloaded
with an OS dependent API call.
Also included are test_dll_exported.cpp, and polymorphic_derived2.cpp
which are similar to the above but include tests of the export
and no_rtti facilities in the context of DLLs.
For best results, write your code to conform to the following guidelines:
inline code in classes used in DLLs.
This will generate duplicate code in the DLLs and mainline. This
needlessly duplicates code. Worse, it makes it possible for
different versions of the same code to exist simultaneously. This
type of error turns out to be excruciatingly difficult to debug.
Finally, it opens the possibility that a module being referred to
might be explicitly unloaded, which would (hopefully) result in
a runtime error. This is another bug that is not always
reproducible or easy to find. For class member templates use something like
template<class Archive>
void serialize(Archive & ar, const unsigned int version);
in the header, and
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/export.hpp>

template<class Archive>
void my_class::serialize(Archive & ar, const unsigned int version){
    ...
}

BOOST_CLASS_EXPORT_IMPLEMENT(my_class)

// explicit instantiation: one line per archive class
template void my_class::serialize<boost::archive::text_oarchive>(
    boost::archive::text_oarchive & ar, const unsigned int version);
template void my_class::serialize<boost::archive::text_iarchive>(
    boost::archive::text_iarchive & ar, const unsigned int version);
... // repeat for each archive class to be used.
in the implementation file. This will result in generation of all code
required in only one place. The library does not detect this type of error for you.
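The declaration/definition split and the explicit instantiation syntax can be tried in isolation. In the sketch below `toy_oarchive` and `toy_iarchive` are hypothetical stand-ins for real archive classes, and everything sits in one file for brevity; in practice the class declaration would live in the header and the rest in the implementation file.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for two archive classes.
struct toy_oarchive { std::string log; };
struct toy_iarchive { std::string log; };

// What the header would contain: a declaration only, no inline body.
class my_class {
public:
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version);
};

// What the implementation file would contain: the one definition...
template<class Archive>
void my_class::serialize(Archive & ar, const unsigned int /* version */) {
    ar.log += "my_class;";
}

// ...followed by one explicit instantiation per archive class to be used.
template void my_class::serialize<toy_oarchive>(toy_oarchive & ar, const unsigned int version);
template void my_class::serialize<toy_iarchive>(toy_iarchive & ar, const unsigned int version);
```

Code that sees only the declaration can call serialize for these two archive types and link against the single instantiated copy.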
dlopen in *nix or LoadLibrary in Windows). Try to arrange that they are unloaded in the reverse
sequence. This should guarantee that problems are avoided even if the
above guideline hasn't been followed.
extended_type_info for associating classes with external identifying strings (GUID)
and void_cast for casting between pointers of related types.
To complete the functionality of extended_type_info,
the ability to construct and destroy corresponding types has been
added. In order to use this functionality, one must specify
how each type is created. This should be done at the time
a class is exported. So, a more complete example of the code above would be:
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
... // other archives
#include "a.hpp" // header declaration for class a
// this class has a default constructor
BOOST_SERIALIZATION_FACTORY_0(a)
// as well as one that takes one integer argument
BOOST_SERIALIZATION_FACTORY_1(a, int)
// specify the GUID for this class
BOOST_CLASS_EXPORT(a)
... // other class headers and exports
With this in place, one can construct, serialize and destroy an object of a class
for which only the GUID and a base class are known.
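The general idea of constructing an object from nothing but an identifying string and a base class can be sketched with a plain lookup table. The code below is a toy model, not the mechanism behind the factory macros: `toy_factory_registry`, `make_a` and the `base` interface are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct base {
    virtual ~base() {}
    virtual std::string name() const = 0;
};

struct a : base {
    int value;
    explicit a(int v = 0) : value(v) {}
    std::string name() const { return "a"; }
};

// Hypothetical registry: maps an external identifying string to a function
// that constructs the corresponding type.
class toy_factory_registry {
    typedef base * (*factory_fn)();
    std::map<std::string, factory_fn> factories_;
public:
    void register_type(const std::string & guid, factory_fn fn) {
        factories_[guid] = fn;
    }

    // Returns a new object for a known GUID, or an empty pointer otherwise.
    std::unique_ptr<base> construct(const std::string & guid) const {
        std::map<std::string, factory_fn>::const_iterator it = factories_.find(guid);
        if (it == factories_.end())
            return std::unique_ptr<base>();
        return std::unique_ptr<base>(it->second());
    }
};

// Factory for class a using its default constructor.
base * make_a() { return new a(); }
```

The caller never names the derived class; destruction goes through the virtual destructor of the base.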
However, writing/reading different archives simultaneously
in different tasks is permitted, as each archive instance is (almost)
completely independent of any other archive instance. The only shared
information is some type tables, which have been implemented using a
lock-free thread-safe singleton
described elsewhere in this documentation.
This singleton implementation guarantees that all of this shared information is initialized when the code module which contains it is loaded. The serialization library takes care to ensure that these data structures are not subsequently modified. The only time there could be a problem would be if code is loaded/unloaded while another task is serializing data. This could only occur for types whose serialization is implemented in a dynamically loaded/unloaded DLL or shared library. So if the following is avoided:
array wrapper.
Serialization functions for data types containing contiguous arrays of homogeneous
types, such as std::vector, std::valarray or boost::multi_array,
should serialize them using an array wrapper to make use of
these optimizations.
Archive types that can provide optimized serialization for contiguous arrays of
homogeneous types should implement these by overloading the serialization of
the array wrapper, as is done for the binary archives.
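Why a dedicated overload for contiguous arrays pays off can be seen in a toy binary writer. The sketch below is an assumption-laden model, not the library's array wrapper: `toy_binary_oarchive`, `save` and `save_array` are hypothetical, and the only thing measured is the number of low-level write calls.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical binary output archive that counts low-level write calls.
struct toy_binary_oarchive {
    std::string bytes;
    std::size_t write_calls;
    toy_binary_oarchive() : write_calls(0) {}

    void write(const void * p, std::size_t n) {
        bytes.append(static_cast<const char *>(p), n);
        ++write_calls;
    }

    // Default path: one write per element.
    template<class T>
    void save(const T & t) { write(&t, sizeof(T)); }

    // Optimized path for a contiguous array: one write for the whole block.
    template<class T>
    void save_array(const T * first, std::size_t count) {
        write(first, count * sizeof(T));
    }
};
```

Both paths produce the same bytes for a std::vector of int, but the array path issues a single write regardless of the element count.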
© Copyright Robert Ramey 2002-2004. Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)