/* * Copyright (c) Likewise Software. All rights Reserved. * * This library is free software; you can redistribute it and/or modify it * under the terms of the GNU Lesser General Public License as published by * the Free Software Foundation; either version 2.1 of the license, or (at * your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser * General Public License for more details. You should have received a copy * of the GNU Lesser General Public License along with this program. If * not, see . * * LIKEWISE SOFTWARE MAKES THIS SOFTWARE AVAILABLE UNDER OTHER LICENSING * TERMS AS WELL. IF YOU HAVE ENTERED INTO A SEPARATE LICENSE AGREEMENT * WITH LIKEWISE SOFTWARE, THEN YOU MAY ELECT TO USE THE SOFTWARE UNDER THE * TERMS OF THAT SOFTWARE LICENSE AGREEMENT INSTEAD OF THE TERMS OF THE GNU * LESSER GENERAL PUBLIC LICENSE, NOTWITHSTANDING THE ABOVE NOTICE. IF YOU * HAVE QUESTIONS, OR WISH TO REQUEST A COPY OF THE ALTERNATE LICENSING * TERMS OFFERED BY LIKEWISE SOFTWARE, PLEASE CONTACT LIKEWISE SOFTWARE AT * license@likewisesoftware.com */ /* * Module Name: * * arch_rep.doxy * * Abstract: * * Architecture documentation * Data representation page * * Authors: Brian Koropoff (bkoropoff@likewisesoftware.com) * */ /** @page arch_rep Data Representation This portion of the architecture guide describes the octet stream form of data types understood by the LWMsg marshaller. All data encoding follow several common rules unless otherwise specified: - Type information is not encoded in the stream. - All multi-octet fields in the data stream are encoded in big-endian byte order. - Fields are not padded or aligned - Bits which do not have a specified meaning or value should be set to zero. @section arch_rep_core Core Data Types This section describes the core data types understood by the marshaller, and does not include extended types added by the association and connection abstractions, nor common type aliases which may be reduced to combinations of core types. @subsection arch_rep_int Integers Integers are arbitrary-length integral values (although the length must be a multiple of 8 bits). They are encoded starting with the most significant byte and ending with the least. Integers may be signed or unsigned. Signed integers are encoded using two's complement representation. @lwmsg_rep{Integer} @lwmsg_field{8, Most significant byte} @lwmsg_discon @lwmsg_field{8, Least significant byte} @lwmsg_endrep @subsection arch_rep_ptr Pointers Pointers represent a potentially-null reference to zero or more contiguous, homogenous elements of a particular type. If a pointer is not null, it must be unique -- that is, no two pointers in an encoded LWMsg octet stream can share a referent. There are three elements in the octet representation of a pointer: -# A flag indicating whether the pointer is null or not -# The length (number of elements) of the pointer referent -# The encoding of the elements of the referent The first byte of a pointer representation is a flag which indicates whether the pointer is null: - 0x00: the pointer is null - 0xff: the pointer is non-null Pointer types may be decorated with an attribute that requires them to be non-null, in which case the indicator byte is omitted entirely. The number of elements may be determined in three ways: -# As a static length -# As the value of an earlier field in the stream (correlated length) -# Implicitly through termination with a zero element If the first case, the length of the referent is well-known and is not encoded in the octet stream. In the second case, the length already appears previously in the stream and is not repeated. In the third case, the length is encoded explicitly as a 32-bit unsigned integer. In all three cases, the length specifies the number of elements, not the size in bytes. Finally, each element of the referent is encoded in order according to the rules of that type. In the case of a zero-terminated referent, the zero element is not encoded in the stream and is not counted in the transmitted length. The decoder implicitly adds it back. @lwmsg_rep{Pointer} @lwmsg_field{8, Indicator flag (omitted for non-nullable pointers)} @lwmsg_field{32, Length of referent (omitted for static or correlated length)} @lwmsg_field{w, Representation of 1st element} @lwmsg_field{w, Representation of 2nd element} @lwmsg_discon @lwmsg_field{w, Representation of nth element} @lwmsg_endrep @subsection arch_rep_arr Arrays Arrays share many characteristics with pointers but can never be null due to the fact that they are laid out contiguously in memory within their containing type. Because of this, their octet encoding is identical to that of a non-nullable pointer. Otherwise, arrays support the same set of length determination methods as pointers. @lwmsg_rep{Array} @lwmsg_field{32, Length of array (omitted for static or correlated length)} @lwmsg_field{w, Representation of 1st element} @lwmsg_field{w, Representation of 2nd element} @lwmsg_discon @lwmsg_field{w, Representation of nth element} @lwmsg_endrep Some encodings which are possible in theory are not allowed in practice because they cannot be decoded to a usable in-memory structure. In particular, an array with a variable length cannot occur in the middle of a structure or another array -- it must come at the end. This is known as a flexible array member. @subsection arch_rep_str Structures Structures are heterogeneous tuples of zero or more members, each of a specific type. Members in structures may be correlated: - The length of an array or pointer referent may be the value of an earlier field - The active arm of a union must be determined by the value of an earlier field The last member of a structure may optionally be an array with a non-static (variable) length. This is known as a flexible array member. A flexible array may not appear in any other position in a structure. A structure with a flexible array member must always be reached through a pointer with a static length of 1 -- that is, it may not be a direct member of another structure, of an array, or of a pointer referent with more than 1 element. The encoding of a structure is merely the encoding of its members in order. @lwmsg_rep{Structure} @lwmsg_field{w1, Representation of the 1st member} @lwmsg_field{w2, Representation of the 2nd member} @lwmsg_discon @lwmsg_field{wn, Representation of the nth member} @lwmsg_endrep @subsection arch_rep_union Unions Unions are a combination of one or more hetergeneous arms, only one of which is present for any given instance. Each arm is associated with a unique integer tag which identifies it. Every instance of a union must be correlated with an integer member of its containing structure. This integer is known as a discriminator and distinguises which arm of the union instance is active. Only the representation of this active arm is encoded in the octet stream. @lwmsg_rep{Union} @lwmsg_field{wa, Representation of the active arm} @lwmsg_endrep @section arch_rep_assoc Association Data Types The association abstraction extends the set of core marshaller types with primitives that are not meaningful outside the context of an association. However, they remain an integral component of the LWMsg stack and are included here for completeness. @subsection arch_rep_hand Handles Handles are opaque, persistent pointers which allow peers joined by an association to reference each other's objects without transmitting them. Handles are the recommended means of maintaining connection state. A handle's representation consists of its locality and handle ID. The locality is an 8-bit value which specifies the side of an association -- local or remote -- where the physical object represented by the handle resides. Alternatively, it may indicate that the handle is null. The handle ID is a 32-bit integer distinguishing the handle from all other possible active handles in the session. Handle IDs are arbitrarily assigned by the peer which first creates the handle. Both peers may by chance pick the same handle ID for handles they create; this is allowed because the locality of a handle is also taken into account when resolving the handle ID to an object in memory. @lwmsg_rep{Handle} @lwmsg_field{8, Locality} @lwmsg_field{32, Handle ID (omitted if locality is NULL)} @lwmsg_endrep The locality field has three legal values: - 0x00: The handle is null - 0x01: The handle is local from the perspective of the encoder - 0x02: The handle is remote from the perspective of the encoder @section arch_rep_conn Connection Data Types The connection abstraction builds on associations by adding additional primivite data types to exploit features of the underlying transport mechanism and operating system. @subsection arch_rep_fd File Descriptors The file descriptor type allows LWMsg applications communicating over UNIX domain sockets to exchange UNIX file descriptors between processes. Because the mechanism to achieve this involves passing special ancillary data to the kernel, the actual file descriptor is not encoded into the representation. Instead, an 8-bit flag is sent indicating whether the file descriptor was valid. @lwmsg_rep{File descriptor} @lwmsg_field{8, Validity flag} @lwmsg_endrep The flag field has two legal values: - 0x00: the file descriptor was invalid (-1) - 0xff: the file descriptor was valid and was transmitted as ancillary data **/