show header

Distribution

Introduction

The 10Duke SDK includes a generic distribution paradigm. Users of the SDK can create distributed applications and distributed systems to support large data volumes, large user counts, large and frequent transaction handling, intensive computation and sharing of resources in general.

A broad description and definition of what is a distributed system: A distributed system is one in which components located at networked computers communicate and coordinate their actions only by passing messages. [1]. In the 10Duke SDK the messages defining the interaction point between distributed components manifest as Distributable objects.

Distribution can take place on several levels and for different types of components including:

  • Storage
  • Data providing, e.g. using relational databases to access and manage data
  • User session data, e.g. allow end user requests to be randomly directed to one of several servers while being able to authenticate and validate session
  • Computation in general
  • Between logical components, e.g. between server instances servicing end users and reporting

Each of these identified categories for distribution can be configured and controlled separately.

Why and when is distribution needed

Some common cases where it is feasible to consider Distribution:

  • users are geographically spread over a large area, e.g. on different continents and the application allows users to interact with each other with requirements for (soft) real time message exchange. In this case it is likely that faster data distribution can be achieved between users by using the infrastructure of the system to distribute data locally closer to a user.
  • requirements for data security and resilience where traditional data back-up is not feasible
  • requirements to serve more end users than one server or one single cluster can handle
  • content delivery with requirements for responsiveness and delivery time.
  • layered architectures that process user requests, transactions and storage on several tiers. E.g. a node in a system accepts user uploads and caches received files locally and schedules putting file into storage tier asynchronously.
  • cases where a system is comprised of several logical subsystems. E.g. primary application, reporting data collection and computing reports

Distributed storage

The SDK includes a provider interface for using storage. The section for storage contains all relevant information starting with basics and covering advanced distributed storage scenarios.

Implementation for distributed storage is provided as extension modules for the SDK. Cloud storage providers can be integrated with ease and the base SDK itself includes a storage provider implementation for using Amazon S3. Users of the SDK are also able to create their own implementations of storage providers.

Distributed data

Distributing data is achieved from application development point of view by two patterns in the SDK:

  • Data provider
  • Command

Data providing is considered to decouple access to data from the actual implementation of how the data is accessed. The most common example and the primary case where data providing is used in the SDK is for making queries that result in object model instances and for persisting object model instances. The underlying implementation for backing up queries and persistence is commonly achieved by using relational databases or object databases (or both). Advanced implementations of DataProvider could use sharding techniques and even mix use of relation databases with so called no-sql databases.

Features of default data distribution:

  • Business objects are distributed across all participating nodes in a distributed system. Distribution of business objects usually takes place when they are persisted.
  • Each node participating in distribution has got a dedicated database instance / server reserved for it.
  • If two nodes were to share a physical database in this default case it would mean duplicate transactions and violation of unique key value constraints in many cases. Several nodes sharing the same physical database instance in a distributed system requires thus a different distribution approach (configuration) to inhibit duplicate transactions on database level.

Distribution of data can be encapsulated by:

  • an implementation of DataProvider

  • using commands to execute business logic processing (Command)

More to study: using commands to execute business logic processing

Computation in general

General distributed computation is best done using the Command patterns and it's implementation Command. Commands fit this purpose well and they do not pose any limitations to the processing and computation that can be implemented.

Reporting

Reporting usually means collecting and storing high volumes of data. The data may be produced in client applications, on server when accessing resources or any place imaginable in a distributed system. High volumes of data, which is usually not required by the actual application that reporting is for, is a good motivation for externalizing reporting to an other component in the system. This structure means that the primary application will produce what we call reporting events or reporting entries and submit those for processing and storage to the component responsible for handling reporting data. This organization moves load away from the primary application and frees it to serve users.

More to read about reporting...

Distribution in depth

Architecture

graphviz-G-5df3a7f086c47cd110b92bcf6f03da05ad0b0b8a.png

Special notes about the rules and formats used for distribution (protocol):

  1. Specification of the sequence of messages is left for applications to define (messages are business logical)
  2. Specification of the format of the data in the messages:
    • For distributable objects available in the SDK the data format is XML as defined by XmlSerializer.

    • Applications that define own distribution requests may use own data format and serialization techniques.

General usage

Technical sequence

  1. Acquire a DistributionRequestFactory (e.g. using DistributionRequestFactories)

  2. Request factory to create DistributionRequests objects based on the Distributable object you which to distribute. A factory implementation may inspect the type and use the Distributable object to determine what type of DistributionRequests objects to create.

  3. Invoke / run the DistributionRequests objects created by the factory (e.g. using ConcurrentExecutionManager acquired from ExecutionManagers

Required configuration / initialization

  1. See DistributionRequestFactories in API reference.

Relation business logics

The 10Duke SDK and user code includes code that can be classified as application processing / business logical processing (e.g. operating on an object model for instance). In most 10Duke SDK components business logic level implementation is encapsulated in Command implementation classes. When Command objects are invoked / executed using CommandScheduler they will automatically be distributed for execution on other nodes in the system (depending on configuration for DistributionRequestFactory using DistributionRequestFactories, see CommandScheduler for entry point to study configuration).

To summarize, when business logical processing is implemented by classes inheriting from Command and those classes are executed, it follows that the processing is automatically part of distribution.

Topology and structure

System nodes and their logical connections define the system topology and structure. The topology and node instances in a distributed system are not known to application level code using distribution. Runtime instances of DistributionRequestFactory and AbstractDistributionRequest implementation classes define the logical connections. The network of nodes may be different for various distribution cases. Two cases, which have separate configuration for distribution are distribution of commands and distribution of storage. Configuration of DistributionRequestFactory along with suitable end-points on servers fully hide the complexity of runtime connections of servers and topology of nodes.

Related implementation classes can be found in:

  • In package com.tenduke.services.globalpeergroup for command distribution (based on static configuration using HTTP(S) as protocol)
  • In package com.tenduke.services.storage for storage distribution (based on static configuration using HTTP(S) as protocol)

Data and objects

SDK distribution paradigm provides full location transparency for objects and data. Application level code deals with business logics and relevant processes instead of concerns with where data is stored and how data is distributed. Command, CommandScheduler and DataProvider provide the proper framework for dealing with data and distribution. Object identification relies on UUID as identifier type and is declared by SerializableObject.

Distributed invocation semantics

Invocation semantics deal with the reliability and quality of service from callers point of view (application level code using distribution). Let us start by naming the following basic invocation semantics:

  1. Maybe, which means that requested execution may happen on remote node(s)

  2. At-least-once, which means that requested execution will happen at least once remote node(s)

  3. At-most-once, which means that requested execution will happen once or not at all on remote node(s)

The default invocation semantics for distribution in the SDK is Maybe. Possible failure cases for the Maybe semantics are:

  • omission failures
  • crash failures

Concurrency

Distributed system and application design must consider the fact that execution is concurrent by nature. Concurrency is one of the primary benefits (and challenges) of distributed systems as it allows for scalability by enabling processing and handling of several tasks at the same time. It is recommended to apply system and application design that naturally allows for independence and decoupling between both components and logical level items. The SDK default is to allow absolute concurrency, i.e., the base implementation of the SDK's built-in distribution and components using distribution do not synchronize their operations globally by default.

Concurrent access

Generic challenge: there exists a possibility that several actors in client roles will attempt to accessed a shared resource at the same time AND the shared resource must be protected and maintained for consistency, allowing only one client to access it at a time. This use case requires a global mechanism to ensure only one client is accessing the shared resource. The global mechanism itself is logically based on distribution (e.g. a node has been elected to provide a global semaphore for instance). Alternatively business logic processing can be implemented so that it naturally protects against problems of concurrent access to shared resources AND does so without a global synchronization mechanism. Next, lets discuss a few scenarios and examples that outline ways to structure business logic processing to allow concurrent access without global synchronization.

Scenario 1: A shared resource in form of an object (e.g. an object model instance) is access simultaneously by several clients. Each client updates fields in their copy of the object independently of each other. After setting fields the clients request the object to be persisted (e.g. written to a database, see DataProvider). In this case it is possible that the separate clients have updated same fields of the object with conflicting values and when the object is persisted some client's data will be lost (or even worse the data in the object will cause inconsistency in the application).

An example of design and implementation for non-synchronized way to deal with this scenario: option in command CreateOrUpdateObject to only write changes. In many cases the "write changes only" option will reduce the probability for inconsistent writes to so low that there is no need for global synchronization of accessing the object. A few real life use cases to back this up:

Several users decide to view the same video in a video gallery. Each time a user clicks play on the video player the view count for the VideoEntry object is incremented. At the same time, the owner of the video edits the video object's title field (VideoEntry's displayName). For each view count increment and the title editing the application makes requests to persist the VideoEntry object using CreateOrUpdateObject and the "write changes only" flag set to true. The application design and usage of the business logic level to operate the VideoEntry object provide natural protection against losing data without any global synchronization: view count incrementing only updates VideoEntry consumptionCount field and title editing only updates VideoEntry's displayName field. By this organization it is not possible that the VideoEntry displayName would be lost due to concurrent view counting for the same object. The only risk is to loose a few user's view counts if increment processing is not sufficiently implemented.

Several users decide to view the same video in a video gallery. Each time a user clicks play on the video player the view count for the VideoEntry object is incremented. Keeping track of views for the VideoEntry object is done using ObjectConsumption. This will result in a new ObjectConsumption instance for each tracked view and there is only need to update the relation between the VideoEntry object and the new ObjectConsumption object (no concurrent access to VideoEntry).

Global clock (lack of)

Sending messages and coordinating actions between components in a distributed system often requires ordering of some kind. We will limit discussion to ordering by:

  • primarily structure of data (structure of messages).
  • time
  • logical clocks

Agreement and synchronization of time is non trivial in distributed systems. It is sufficient for most applications in the class of social media and networked web applications to synchronize system time using NTP for example. Due to the nature of networks, communication and physical limitations it is impossible to synchronize time in a distributed system exactly.

Distribution components in the SDK rely primarily on structure of data / structure of messages to ensure correct ordering in business logic processing. Structure of data means for example order of serialized objects in a message. The most relevant example is given by CommandScheduler and executing Command objects. Exploring a few real life use cases will explain further:

Example 1: A user uploads an image and adds title, description and tags. Title (displayName) and description are direct fields in the ImageEntry object but the image is tagged using Tag instances. It is evident that the ImageEntry instance must exist before it can be tagged. Then it follows that the command that creates the ImageEntry instance will have to execute before the command that tags the image. - Correct ordering in this case can simply be achieved by:

  • One instance of CreateOrUpdateObject command with objects set in the following order: first the ImageEntry and then the Tag objects.

  • N instances of CreateOrUpdateObject command with the first command instance creating the image entry and the following commands creating the tags. The correct ordering is achieved by wrapping the commands in the correct order in a Batch command.

Example 2: A client application tracks usage and submits report entries to the system (class instances of type AbstractReportEntry). The client application implements a fail safe to buffer the report entries locally for sake of recovering from transmission errors to server. However, the client application code has not included setting the report entry time stamp, which by default gets set on server side if it is undefined. This results in time stamps larger than the time when the event occurred in recovering and client re-transmitting the failed report entries. In reporting use cases where report entries form logical pairs for analysis, e.g. tracking client usage time with launch and exit event, a failure will occur for data where report entry that occurred before it's logical counter part will have an incorrect larger time stamp because it was set on server side when it was later re-transmitted. Similar errors would occur even if the entry time would be written by client and client - server time has not been sufficiently synchronized or the two do not have agreement / knowledge about time zones.

Independent failures

We define an independent failure to mean an error that causes processing to fail on one node while the same operation is successfully completed on other nodes. Independent failures may happen on different logical levels. The topmost logical level is node unavailability due to crash or network outage. The lowest logical level is represented by a detailed internal operation error while executing code on a node.

Commands

The default scheduling and execution of commands will omit distribution for commands that fail execution on the primary node (node that started the processing). On second level when the primary node has completed execution of a command successfully it will distribute the command for execution to nodes configured for distribution. At the distribution stage the system allows independent failures. Information for failures may be available in the corresponding CommandResult and OperationResult objects related to the distribution requests.

Business entities

In most cases it is critical that persisting an instance of an object model sample in a distributed system is consistent and successful on all relevant nodes. Reactive failure handling is usually required in the event of independent failures that cause the absence (or inconsistent state) of the object model instance on a node.

Storage

StorageProvider based use of storage fully encapsulates the handling of independent failures. A specific StorageProvider implementation is responsible to handle internally failure scenarios providing recovery, resilience, etc. security as required.

Reporting

The base reporting client and server implementation delivered with the 10Duke SDK allow configuration for buffering and error logging.

  • Error logging in the reporting client component includes the capability to store report entries that failed to submit to server locally on disk.
  • Error logging in the reporting server component includes the capability to store report entries that failed to bind using DataProvider locally on disk.

Failure handling

Implementation of and implementing failure handling is split into categories:

  1. Failure handling that is implemented within a component and can not be modified or altered by the caller (some higher level business logical context)
  2. Failure handling that is the responsibility of a higher level business logical context that uses a component that reports back a failure
  3. Decoupled and independent failure handling in retrospect, e.g., by inspection and analysis of system state

Commands

Failures arising from command execution and distribution usually requires reactive failure handling in the higher level calling context (category 2). In other cases the used Command class shall specifically document implementation that would imply category 1. Since Commands represent generic distributed processing category 3 failure handling may also be required.

Business entities

Failures in distributing object model instances usually leads to inconsistent system state directly visible to end users. The recommended way to handle failures is using categories 1 and / or 2. In other cases it is likely that there will be a clearly observable lead time in fixing the system state. A practical example to explain more: A user registers (a Profile is created) and posts a topic to a forum. The system is composed of three nodes of which one failed to write the user's Profile. The forum topic was successfully written on all nodes. The observable error when accessing the forum on the node that failed to write the Profile is a broken link to the user's profile page and potentially other missing information with the topic. This error would only be possible if writing a forum topic that relates to a Profile is possible with the broken reference (e.g. no foreign key constraint used in the schema that binds object models instances for Profile and Forum).

Storage

Failure handling for storage cases shall primarily be implemented in a StorageProvider. The calling context is usually responsible to handle a higher level business logical failure scenario of which failure in storage is just one part. Practically this means that the StorageProvider should return success return values only if storage was successful. If the StorageProvider returns failure code then the calling context must decide how to continue and what actions to apply.

Reporting

Failure handling in reporting is currently decoupled and independent. Practically this means:

  • (re)submitting report entries that the reporting client component has stored on disk
  • triggering (re)run of binding report entries to persistent storage on reporting server component

Heterogeneity

Deploying heterogenous systems may include the following aspects:

  • Different operating systems are used on participating nodes
  • Different combination components are used on participating nodes
  • Components are implemented using different base technologies, e.g., Java, .Net, PHP, Ruby, JavaScript, ActionScript, etc.

  • A variation of network technologies and protocols
  • etc.

The primary design and implementation in the 10Duke SDK that allows all these aspects manifests as:

  • Neutral external data formats for serialization of data, e.g., serialization of object models instances, commands, etc.
  • Default implementation provided for XML and JSON based serialization of data
  • Well encapsulated mechanism to make requests independent of network, transport of protocols used
  • Default implementation for HTTP(S) request model
  • Development and testing done for Linux and Microsoft Windows operating systems.
  • .Net, JavaScript and ActionScript libraries are available on request.

Openness

The 10Duke SDK design and implementation is based on open protocols and standards. Where open protocols and standards are not used openness is provided by:

  • allowing for extensions for all core concepts
  • default XML external data format is open.
  • StorageProvider implementations are documented and users may access storage with StorageProvider supplied by others

  • servlet client interfaces are documented to allow access from clients capable of HTTP(S) request model

Transparency

Transparency definitions and requirements exist to guide design and development of a distribution so that a system is perceived as an whole instead of explicit components or collections of components.

The ANSA Reference Manual (ANSA 1989) and the International Organization for Standardization's Reference Model for Open Distributed Processing (RM-ODP) ISO 1992 identify eight forms of transparency. In this section we will reflect the 10Duke SDK design and implementation against these standard definitions.

Access transparency enables local and remote resources accessed using identical operations. Examples of core concepts, patterns and interfaces provided by the 10Duke SDK to encapsulate distribution and provide access transparency:

Location transparency enables resources to be accessed without knowledge of their physical or network location. The 10Duke SDK also enables requests, transport and communication without knowledge of concrete underlying network topology or protocols.

Concurrency transparency enables several processes to operate concurrently using shared resources without interference between them.

The 10Duke SDK does not by default enforce locking or reservation of a resource before using or accessing it. The default is to allow full concurrency without any interference. The default distributed sequence for access is operating on local resource instance first and then replicate the operation by distributing processing to other nodes in the system. Distributed synchronization for globally protected access to a resource is left to be implemented by the SDK user.

Replication transparency enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers.

The default distribution mechanism enables multiple instances of resources by it's nature. Consider the example of deploying N nodes and each of those node would have their own dedicated database instance for storing object model objects. A business logical request to operate on some part of the object model would start on one node and be distributed to other nodes once the first node completes. Every node that replicates the same operation then operates on object queried from it's local dedicated database (and writes back objects to the local dedicated database).

Configuration for distribution, implementation freedom for DistributionRequestFactory and AbstractDistributionRequest classes allow for detailed control over replication and distribution logics of operations.

Failure transparency enables concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components.

The primary mechanism to communicate failures back to callers is the base class OperationResult and classes inherited from it. OperationResult defines a nested structure for child OperationResults. This pattern allows for detailed inspection of reported errors retaining full business logical context. It also fully conceal faults.

Mobility transparency (migration transparency in the standard) allows the movement of resources and clients within a system without affecting the operation of users and programs.

The 10Duke SDK supports developing applications and systems that can service users in a complete and consistent way independent of the node that the user is connected to. Example scenarios: A system with web servers on several continents is deployed. DNS is configured with GeoIP and will direct users to the geographically closest node relative to the user. Distribution is configured to replicate user data in real-time over all nodes in the system. Now, if the user travels or interacts with other users on other continents the user profile and system state will appear complete and consistent independent of the user's location and the node that serves the user.

Performance transparency allows the system to be reconfigured to improve performance as loads vary.

The 10Duke SDK supports and allows configuration to manage:

  • scheduling concurrent tasks
  • request timeouts and wait periods
  • response timeouts and completeness where responses from several nodes are waited for.
  • memory usage and limits

Scaling transparency allows the system and applications to expand in scale without change to the system structure or the application algorithms.

The 10Duke SDK supports and allows configuration to manage:

  • topology and tactics for distribution
  • re-structuring systems physically while maintaining logical structure
    • dynamically adding and removing nodes to a system to distribute load
    • separating reporting data collection from nodes that run primary application
    • separating reporting analysis from nodes that run primary application
    • separating media handling from nodes that run primary application
    • separating storage and content delivery from nodes that run primary application
  • implementation and configuration of data providing to use sharding and mixing different types of databases

References

  1. DISTRIBUTED SYSTEMS CONCEPTS AND DESIGN, George Coulouris, Jean Dollimore, Tim Kindberg, ISBN 978-0-321-26354-4
  2. Parallel and Distributed Simulation Systems, Richard M. Fujimoto, ISBN