What is Container Orchestration?

  • Container orchestration is the process of managing, deploying, and scaling containers, which are lightweight, portable, and encapsulated environments that contain application code, runtime, libraries, and dependencies. Container orchestration platforms automate the deployment, scaling, and management of containerized applications, allowing developers to focus on writing code rather than managing infrastructure. Key capabilities include:

    1. Container Lifecycle Management: Orchestration platforms manage the lifecycle of containers, including provisioning, deployment, scaling, and termination.
    2. Service Discovery: They facilitate the discovery of services by automatically routing traffic to containers, enabling seamless communication between services.
    3. Load Balancing: Orchestration platforms distribute incoming traffic across multiple instances of a service to ensure optimal performance and availability.
    4. Automatic Scaling: They automatically scale the number of container instances based on predefined metrics such as CPU usage, memory utilization, or incoming traffic.
    5. Self-healing: Orchestration platforms monitor the health of containers and automatically restart or replace failed containers to maintain application availability (see the control-loop sketch below).
    6. Resource Management: They allocate compute, memory, and storage resources to containers based on predefined resource constraints and requirements.

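    The self-healing and automatic scaling behaviour described above is typically implemented as a reconciliation ("control") loop that continuously compares the desired state with the observed state of the cluster and acts on the difference. The following is a minimal conceptual sketch of such a loop in Python; it is not tied to any particular orchestrator, and the metric names, thresholds, and observe_cluster stub are illustrative assumptions.

```python
import random
import time

DESIRED_REPLICAS = 3            # desired state declared by the operator
CPU_SCALE_UP_THRESHOLD = 0.80   # illustrative metric threshold for scaling out

def observe_cluster():
    """Stand-in for querying an orchestrator's API for live container state."""
    return {
        "healthy_replicas": random.randint(1, 4),
        "avg_cpu": random.uniform(0.2, 0.95),
    }

def reconcile(observed, desired_replicas):
    """Compare observed state with desired state and decide which actions to take."""
    actions = []
    # Self-healing: replace containers that have failed or disappeared.
    if observed["healthy_replicas"] < desired_replicas:
        missing = desired_replicas - observed["healthy_replicas"]
        actions.append(f"start {missing} replacement container(s)")
    # Automatic scaling: add capacity when a load metric crosses a threshold.
    if observed["avg_cpu"] > CPU_SCALE_UP_THRESHOLD:
        actions.append("scale out by 1 replica (high CPU)")
    return actions

if __name__ == "__main__":
    for _ in range(3):                      # a real control loop runs indefinitely
        state = observe_cluster()
        for action in reconcile(state, DESIRED_REPLICAS):
            print(action)
        time.sleep(1)
```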

  • Anypoint Platform offers several deployment options for your applications and APIs. These options cater to different needs and levels of control. Here is a breakdown:

    1. Runtime Plane:
    • CloudHub: A fully managed cloud service by MuleSoft for hosting and managing Mule applications. Offers easy scaling, high availability, and automated updates.
    • Anypoint Runtime Fabric: A container service for deploying Mule applications and API gateways on your own infrastructure (AWS, Azure, VMs, etc.). Provides more control and customization, but your team takes on the management responsibility.
    • On-premises Mule runtime engine instances: Deploy Mule applications directly on your own servers. Offers complete control but requires significant infrastructure management.

    2. Control Plane:

    • Cloud control planes: Hosted by MuleSoft in public clouds (US, EU, Government). Used to manage applications and APIs deployed in CloudHub or on-premises with Runtime Fabric.
    • Customer-hosted control plane (Anypoint Platform PCE): Install and manage the control plane software on your own infrastructure. Offers full control but requires significant expertise and resources.

    3. Hybrid deployment: Combine CloudHub for public-facing applications with on-premises Runtime Fabric for sensitive data or internal needs.

    Factors to consider when choosing between the above options:

    • Level of IT expertise and resources: CloudHub requires minimal management, while on-premises options require more effort.
    • Security and compliance requirements: CloudHub offers pre-built security features, while on-premises options offer greater control.
    • Application performance and scalability needs: CloudHub scales automatically, while on-premises options require manual scaling.

  • Mule provides four different event processing models:

    1. Synchronous flow
    2. Asynchronous flow
    3. Batch Job
    4. Scatter Gather

    The choice of model depends on the use case. In this article, we will explore different use cases and the appropriate event processing model for each.
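    As a rough illustration of the last model in the list, the sketch below fans the same request out to several targets in parallel and then aggregates the results, which is the essence of scatter-gather. The target functions are hypothetical placeholders; in a Mule application this behaviour is configured declaratively with the Scatter-Gather router rather than coded by hand.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical targets; in a real integration these would be connector calls.
def query_inventory(customer_id):
    return {"inventory": f"items for {customer_id}"}

def query_billing(customer_id):
    return {"billing": f"invoices for {customer_id}"}

def query_shipping(customer_id):
    return {"shipping": f"status for {customer_id}"}

def scatter_gather(customer_id):
    """Scatter: send the same event to every route in parallel. Gather: merge results."""
    routes = [query_inventory, query_billing, query_shipping]
    with ThreadPoolExecutor(max_workers=len(routes)) as pool:
        futures = [pool.submit(route, customer_id) for route in routes]
        aggregated = {}
        for future in futures:
            aggregated.update(future.result())  # blocks until each route completes
    return aggregated

print(scatter_gather("CUST-001"))
```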

    Use case 1 – A one-way (one-directional) synchronization integration between two enterprise systems: from an RDBMS database (such as MySQL) to a Salesforce CRM system.

    Requirements –

    1. Approximately one million rows of customer data are added to the database by another integration application.
    2. The Mule application must quickly forward these changed database rows to Salesforce.
    3. Only new or modified database entries should be processed.
    4. Records must be transformed to match the Salesforce data structures.
    5. Salesforce bulk operations should be used.
    6. Auditing and traceability of each record is critical.
    7. Acknowledgements must be sent to other endpoints and logs when records or files are received and when data is successfully written to the target system.

    Questions to ponder

    1. What are the options to read in changes from a source database?
      • Design
        • Which components can read from the database as soon as rows are changed?
          • The Database connector's On Table Row listener
          • A Scheduler combined with a Select query
        • How can changed records be written to Salesforce?
          • One record at a time
          • Can records be written in bulk to Salesforce, or must they be written one record at a time?
            • There are a million records; what will be the performance impact of writing one record at a time?
        • How to trigger the process?
        • Connection Management – manage, monitor and restart database connections
        • Types of errors and error handling
        • Processing latency
        • How fast, and how much, data can be written to the database?
        • How can data synchronization between the systems be limited to only new or modified records?
        • How is the database state confirmed as synchronized with Salesforce?
          • How does Salesforce signal to the Mule application whether records were successfully transferred?
          • Should transactions and rollback be used?
          • Should watermarks be used to synchronize writes to Salesforce, or should rollbacks be used? (See the watermark sketch after this list.)
          • Impact of duplicate records in Salesforce
          • Missing records in Salesforce
      • SLAs
        • Real time or near real time?
        • Restrictions on how fast data can be read
        • Should data be processed on a schedule at a fixed rate?
        • Average number of records per scheduling cycle
        • Highest number of rows to be processed during a scheduling cycle
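    One common answer to several of the questions above (reading only new or modified rows, limiting the synchronization to deltas, and writing in bulk) is a scheduled, watermark-based poll: select the rows modified since the last successful run and push them to Salesforce in batches. The sketch below is a simplified illustration using SQLite and a placeholder bulk-upsert function; the table, columns, Salesforce field names, and batch size are assumptions, and a real Mule flow would use the Scheduler, Database, and Salesforce connectors instead.

```python
import sqlite3
from datetime import datetime, timezone

BATCH_SIZE = 200  # illustrative; Salesforce bulk jobs accept much larger batches

def fetch_changed_rows(conn, watermark):
    """Select only rows created or modified after the stored watermark."""
    cur = conn.execute(
        "SELECT id, name, email, last_modified FROM customers "
        "WHERE last_modified > ? ORDER BY last_modified",
        (watermark,),
    )
    return cur.fetchall()

def to_salesforce_record(row):
    """Transform a database row into (assumed) Salesforce field names."""
    record_id, name, email, _ = row
    return {"External_Id__c": record_id, "Name": name, "Email": email}

def bulk_upsert_to_salesforce(records):
    """Placeholder for a Salesforce Bulk API upsert; here we only log the batch."""
    print(f"upserting batch of {len(records)} record(s)")

def sync_once(conn, watermark):
    """One scheduling cycle: read the delta, transform, write in batches, move the watermark."""
    rows = fetch_changed_rows(conn, watermark)
    for start in range(0, len(rows), BATCH_SIZE):
        batch = [to_salesforce_record(r) for r in rows[start:start + BATCH_SIZE]]
        bulk_upsert_to_salesforce(batch)
    # Advance the watermark only after the whole cycle has succeeded.
    return max((r[3] for r in rows), default=watermark)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id TEXT, name TEXT, email TEXT, last_modified TEXT)")
    conn.execute(
        "INSERT INTO customers VALUES ('1', 'Ada', 'ada@example.com', ?)",
        (datetime.now(timezone.utc).isoformat(),),
    )
    print("next watermark:", sync_once(conn, watermark="1970-01-01T00:00:00"))
```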

  • What is Change Data Capture (CDC)?

    Change Data Capture (CDC) is a process that tracks and identifies changes made to data in a database. It captures the “deltas,” or specific modifications, like insertions, updates, and deletions, to keep other systems and applications informed and up-to-date.

    • Function: CDC identifies and captures incremental changes to data and schemas from a source. This means it focuses on the new or updated information within a data source, rather than copying everything all the time.
    • Origin: Developed to support replication software in data warehousing. By capturing only the changes, CDC allows for faster and more efficient transfer of data to warehouses for analysis.
    • Benefits:
      • Efficiency: Less data to move translates to quicker transfer times.
      • Low Latency: Near real-time updates ensure data warehouses have the latest information.
      • Low Impact: Reduced data movement minimizes strain on production systems.
    • Applications: Enables data to be delivered to various users:
      • Operational Users: Up-to-date information for day-to-day operations.
      • Analytics Users: Latest data for analysis and insights.

    Key aspects of CDC:

    • Real-time or near real-time updates: Unlike traditional methods that involve periodic batch processing, CDC aims to deliver these changes in real-time or near real-time, enabling quicker reactions and decisions based on the latest data.
    • Data integration: CDC plays a crucial role in data integration, particularly when working with data warehouses or data lakes. It efficiently synchronizes changes across different systems, ensuring consistency and avoiding data inconsistencies.
    • ETL applications: CDC can be integrated with Extract, Transform, Load (ETL) processes. Instead of full data refreshes, CDC allows for extracting only the changed data (delta), improving efficiency and reducing processing time.

    There are a variety of strategies employed to monitor and relay changes in data. Four common strategies:

    1. Timestamps: A column indicating “last modified” or a similar timestamp is integrated into the data structure. Subsequent systems scrutinize this column to pinpoint records that have been altered since their last check, focusing solely on the changes. Although this strategy is straightforward to apply, it may overlook modifications that occurred prior to the inclusion of the timestamp.
    2. Trigger-based Approach: Database tables are outfitted with triggers that activate in response to specified data modification events (such as insertions, updates, and deletions). These triggers then capture essential details about the alteration and forward them to the intended system. This approach allows for tailor-made solutions and versatility but may elevate the database workload and intricacy (a minimal trigger sketch follows this list).
    3. Log-based: The ongoing surveillance of database transaction logs captures activities like insertions, updates, and deletions. These alterations are then reformatted to suit the requirements of the destination system. While this technique enhances precision, it necessitates extra computation and infrastructure.
    4. CDC Streaming: Some databases come equipped with innate capabilities for tracking data changes as an ongoing flow of data alterations. Downstream systems can tap into these streams either in real time or with minimal delay. This method is highly effective and scalable, yet its applicability may be confined to certain database types.

    Selecting the most fitting strategy hinges on various elements, including the database in use, the required granularity of detail, the necessity for real-time processing, and available resources. For example, the timestamp-based method is well-suited for straightforward situations, whereas log-based or CDC streaming might be the choice for more intricate data processes or when immediate data access is essential.
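    To make the trigger-based approach concrete, the sketch below uses SQLite (chosen only so the example is self-contained) to record every update to a customers table in a separate change table that a downstream system could poll or stream; the table and column names are illustrative, and similar triggers would be needed for inserts and deletes.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);

-- Change table that downstream systems read from.
CREATE TABLE customer_changes (
    change_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER,
    change_type TEXT,
    changed_at  TEXT DEFAULT (datetime('now'))
);

-- Trigger-based CDC: every update is captured at the moment it happens.
CREATE TRIGGER customers_update_cdc AFTER UPDATE ON customers
BEGIN
    INSERT INTO customer_changes (customer_id, change_type) VALUES (NEW.id, 'UPDATE');
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com')")
conn.execute("UPDATE customers SET email = 'ada@corp.example' WHERE id = 1")

# A downstream consumer reads only the captured changes instead of re-scanning the table.
for row in conn.execute("SELECT customer_id, change_type, changed_at FROM customer_changes"):
    print(row)
```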


    These strategies fall under three key enterprise patterns for implementing change data capture (CDC) in an effective and scalable manner:

    1. Push vs. Pull:

    • Push: The source database actively pushes updates to target systems whenever changes occur. This approach offers near real-time data synchronization but can be prone to data loss if target systems are unavailable. Messaging systems are often used to buffer and guarantee delivery.
    • Pull: Target systems periodically query the source database for changes. This is simpler to implement but introduces latency and requires careful handling of state management to avoid missing updates.

    2. Capture Techniques:

    • Log-based CDC: Changes are captured from database transaction logs, allowing for capturing all data modifications. It can be complex to implement and maintain, but it’s comprehensive.
    • Trigger-based CDC: Triggers are set up on tables to capture changes as they happen. This is simpler to implement than log-based CDC but may not capture all changes depending on the trigger configuration.
    • Timestamp-based CDC: A timestamp column is added to tables, and downstream systems query for changes since the last check. This is simple and efficient but may miss updates if timestamps are not frequently updated.

    3. Delivery Patterns:

    • Point-to-point: Data is directly streamed from the source to each individual target system. This is simple but can become unwieldy as the number of targets grows.
    • Event streaming: Changes are published to a central event stream, and target systems subscribe to the stream and process relevant updates. This is more scalable and flexible, allowing for dynamic addition and removal of target systems.

    Choosing the appropriate pattern depends on factors like data volume, latency tolerance, desired level of consistency, and the complexity of the target systems. Combining these patterns can create hybrid approaches tailored to specific use cases.
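    To illustrate the push-versus-pull distinction in runnable form, the sketch below contrasts a source that pushes each change into a buffering queue (so a temporarily unavailable consumer does not lose updates) with a target that pulls by asking for everything newer than its last-seen version. It is a generic in-memory illustration under assumed names, not a reference to any particular product.

```python
from collections import deque

# Push: the source emits changes as they happen; a buffer stands in for a messaging system.
class ChangeBuffer:
    def __init__(self):
        self._queue = deque()

    def push(self, change):
        self._queue.append(change)        # source pushes immediately on every change

    def drain(self):
        while self._queue:
            yield self._queue.popleft()   # consumer processes whenever it becomes available

# Pull: the target periodically queries the source for changes newer than its watermark.
class Source:
    def __init__(self, rows):
        self.rows = rows                  # each row is a (version, payload) pair

    def changes_since(self, version):
        return [row for row in self.rows if row[0] > version]

buffer = ChangeBuffer()
buffer.push({"id": 1, "op": "UPDATE"})
print("pushed changes:", list(buffer.drain()))

source = Source([(1, "a"), (2, "b"), (3, "c")])
print("pulled changes since version 1:", source.changes_since(1))
```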

  • UML, or Unified Modeling Language, is another key modeling language, but it focuses specifically on modeling systems and software. Unlike BPMN, which directly maps business processes, UML provides a flexible and standardized way to visualize and document the design of a system.

    Here’s a breakdown of what UML is and how it works:

    What it is:

    • A visual modeling language with defined symbols and rules for creating diagrams.
    • Not a programming language, but rather a tool for communication, documentation, and analysis.
    • Used for object-oriented systems but can be applied to other domains as well.

    Purpose:

    • To help system and software developers visualize, construct, and document software and system artifacts.
    • To enable communication and collaboration among different stakeholders (developers, analysts, architects).
    • To explore potential designs and validate system architecture.
    • To document systems for future reference and maintenance.

    Types of Diagrams:

    • Structure diagrams: Show the static structure of a system, including classes, relationships, and components.
    • Behavior diagrams: Show the dynamic behavior of a system, including interactions between objects and sequences of actions.
    • Interaction diagrams: A subset of behavior diagrams that focus on how objects interact with each other in specific scenarios (for example, sequence diagrams).
    • Implementation diagrams: Show the physical components of a system and how they are deployed (component and deployment diagrams).

    Benefits:

    • Improves communication and understanding between stakeholders.
    • Facilitates design exploration and decision-making.
    • Helps identify potential problems and errors early in the development process.
    • Provides a common language for documenting systems.

  • Integration design patterns are reusable solutions to common challenges faced when integrating different systems and applications. They provide best practices and architectural recommendations for building reliable, scalable, and efficient integrations. 

    Here are some key points about integration design patterns:

    Benefits:

    • Reduce development time and cost: By using proven solutions, you can avoid reinventing the wheel and focus on your specific integration needs.
    • Improve integration quality: Patterns help you build integrations that are robust, scalable, and maintainable.
    • Promote best practices: They capture the collective wisdom of experienced integration developers and architects.

    Types of integration design patterns: 

    • Message Construction Patterns: These deal with the content and structure of messages, like Message and Event Message.
    • Messaging patterns: These patterns deal with how messages are sent, received, and routed between systems. Examples include Publish-Subscribe, Point-to-Point, and Message Queue.
      • Message Channel Patterns: These define how messages are transported, like Point-to-Point (direct sender-receiver) or Queue (messages wait in line).
        • Point-to-Point (PTP): A direct, reliable delivery from one sender to one receiver. Ideal for guaranteed delivery of critical messages.
        • Message Queue: Messages are stored in a queue and processed by receivers in order. Ensures FIFO (First-In-First-Out) processing, good for tasks that need sequential execution.
      • Routing Patterns: These determine how messages reach their intended receiver, like Unicast (single target), Publish-Subscribe (many receivers), and Fanout (broadcasting).
        • Request-Reply: A sender sends a request and waits for a response from the receiver, like HTTP requests.
        • Publish-Subscribe (Pub/Sub): A sender publishes messages to a topic, and interested receivers subscribe to that topic to receive relevant messages. Efficient for broadcasting information to multiple parties (see the sketch after this list).
        • Fanout: A sender broadcasts messages to all subscribed receivers simultaneously. Useful for real-time notifications or event updates.
    • Data transformation patterns: These patterns focus on how data is transformed between different formats and representations. Examples include Canonical Data Model, Façade, and XSLT.
    • Integration infrastructure patterns: These patterns address the overall architecture of your integration solution. Examples include Enterprise Service Bus (ESB), API Gateway, and Integration Broker.
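    The Publish-Subscribe routing pattern mentioned above can be sketched in a few lines: publishers send to a named topic, and every subscriber registered for that topic receives its own copy of the message. This is a minimal in-memory illustration; a real integration would use a message broker rather than an in-process dictionary.

```python
from collections import defaultdict

class TopicBroker:
    """Minimal in-memory publish-subscribe broker."""
    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber of the topic receives its own copy of the message.
        for callback in self._subscribers[topic]:
            callback(message)

broker = TopicBroker()
broker.subscribe("orders", lambda m: print("billing service got:", m))
broker.subscribe("orders", lambda m: print("shipping service got:", m))
broker.publish("orders", {"order_id": 42, "status": "CREATED"})
```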