Microservices: The Big Picture

1> Microservices Architecture by Rag Dhiman

2> Modern Software Architecture by Dino Esposito

What Are Microservices?

The main difference between the SOA and microservices is that, though they both develop a single application as a suite of small services, in microservices, we deploy each of these services in their own process and communication with each other (in a lightweight manner, i think this is where json/rest comes into picture)

Each microservice live independently, but they rely on each other to achieve the task.

Bounded context
Autonomously developed
Independently deployable
Exchange messages (should be light weight)
Team more agile
Faster to develop (parallel teams)

TODO: Bounded context

Microservices Elements

Building a Monolith

Monolith

PRO’s

Simple to develop
Simple to build
simple to test
simple to deploy
simple to scale

Cons

Difficult for new team members
Growing teams
Code harder to understand
Difficult for emerging tech
Scaling - need to scale the whole app and not just the required service’s
Overloaded container
huge database

Building Microservices

Sub domains should be independent and separate. One technique is to duplicate the dependencies in both domains. eg. trail or service.

I am not a big fan of this though. TODO: Look for alternative patterns

Organization

In MS, teams are more independent and responsible as the sub domains are modeled that way. The team sizes can be different and easily managable

The teams can have their own documentation formats, repo and technologies.

Data Store

Since services are independent, they have their own data management rules and hence it helps to have their own data source.

Transaction heavy, some are write mostly and some read only. One may need a relational solution and another a noSQL solution. This kind of heterogeneous solution is only possible with MS.

But the concept of individual data sources create the problem of data synchronization (specifically the dependencies that we have duplicated across domains).

There is no distributed transaction concept in MS, as that will lead to low performance. So we will begin with immediate consistent and then move to eventual consistency (this is where event sourcing like kafka comes)

TODO: Read about Debezium for “Capture Data Change”

Data Source : Akka, Kafka, Rabbit MQ

User Interface

UI: Helps to have a composite UI, which is going to make multiple async calls and displays the data as and when it loads. eg:amazon

Services

Services: They need to communicate in some way,

“Remote Procedure Invocation” or RPC. This is based on request and reply concept and could be sync or async. REST, SOAP, gRPC
“Messaging” via a broker or channel using a publish and subscribe.

Async communication helps a lot to maintain things loosely coupled in MS architecture. They also improve the availability since the broker buffers the message till the consumer is able to process them

Messaging : Kafka, Rabbit MQ

Protocol Format Exchange

TEXT : XML, JSON, YAMLS (human readable and easy to process)
Binary : gRPC (more compact)

APIs and Contracts

Each MS need to publish a contract containing the set of services on what is available and how to invoke them
Depending on the protocol that we are using we will be using WSDL or Swagger
To suit multiple devices and needs, you can also have multiple API’s and different device specific contracts

Distributed Services

Service Registry

It is a phone book of services with their locations, letting clients look up services by their logical names.
We can hardcode the hostnames and their ports, but that’s not reliable. To make our app more reliable and scale up for more instances, we need Service Registry
Each service on start does a self registration to keep the registry uptodate for later lookup
Eureka, Zookeeper, Consul

CORS

Cross Origin Resource Sharing
We can only interact with documents on the same server as the current service
Origin = protocol + server + port
In MS, we deploy each service in different server’s, so this is a problem
Need to resolve by using the HTTP Headers “Access-Control-Allow-Origin”
TODO: Are we not relaxing security here? Is this the right thing to do?

Circuit Breaker

When we make service chaining (calling multiple services) and when one of the service is down, it leads to a cascade of failure and we call it “Domino Effect”
One failure in one system can lead the whole system to fail.
We resolve it by invoking a proxy that deviates the calls if it see failure more than a threshold
After timeout, it starts allowing calls or reintroducing traffic in limited numbers till the flowthrough is high.
TODO: Where will it deviate during failure? what will happen to the user action?
Hystrix, JRugged

Gateway

Can be used for resolving cross-cutting concerns like security
Acts as the single entry point to the service
Can handle authorization and authentication here rather than being duplicated in each service call.
Unified Interface
Can handle API Translation
Zuul, Netty, Finagle

Security

Not specific to MS, this is the per-requisite for all now
Identify and Access Management system can be delegated to take care of authorization and authentication
Gateway can take care of this and with Single Sign-On, individual services need to replicate the same.
authentication protocols : Kerberos, OpenID, OAuth 2.0, SAML
IAM: Okta, Keyclock, Shiro

Access Token

it securely stores information about a user and is then exchanged between services
helps to communicate the identity of the requester between services without the need to re-authenticate every time.
just need to verify the token and see if the user is authorized for the operation (without the need to authenticate)

Scalability and Availability

Scalability

Vertical Scaling : More CPU and RAM (i think is this used for monolithic, this is also the CON of monolothic)
Horizontal Scaling : Service Replication across multiple servers. This is where load balancer comes into picture.
Ribbon, Meraki

Monitoring

Many moving parts and machines to monitor
Needs to be centralized
Good to be Visual
Kibana, Grafana, splunk

Health Check

service could be running but incapable of handling requests, due to connection issues?
need a health check API to perform various checks
the health check is similar to heart beat and can be called periodically.

Log Aggregation

Helps to analyze and search the logs with more end-end infor
logStash, Splunk, PaperTrail

Availability

probability for the system to be available at a given time.
apps are usually 99.999% available.
Single Point of Failures like a single Gatway, a single broker or a single IAM could bring the whole system down

TODO: Check on Dapper, HTrace, Zipkin for Distributed Tracing

Deployment

Metrics
Auditing
Rate limiting (to save from DDos attacks)
Alerting
Distributed Tracing (correlation id and chain of calls)

Are Microservices Right for Your Organization?

TODO: Very nice slide summary (Microservices: The Big Picture)

Microservices Architecture

Introduction

What Is a Service?

Services are stateless. So the instance of the service doesn’t need to remember the previous request/call of the call to that service, it has all the info that it needs in the request.

Microservices Introduction

Microservice architecture is SOA done right. Its an evolution of SOA, it builds upon its principles.

SOA led to monolithic, but the concept it right. Microservice is a better implementation of it.

Microservice makes use of lightweight communication, both between,

client to service
service to service

Microservices needs to have a Technology Agnostic API, this means they need to follow a open communication protocol so that it does not dictate the tech. This enabled a .NET to talk with a JAVA service.

Independently

changeable
deployable

The Monolithic

In Monolithic, a minor change could result in complete rebuild of the application

The only advantage of a monolithic over microservice is that the whole application runs on a single machine and hence it is easy to replicate and configure environments.

Design Principles Introduction

Microservice Design Principles

High Cohesion
Autonomous
Business Domain Centric
Resilience
Observable
Automation

High Cohesion

Single focus
Single responsibility
Do one thing and only one good thing
Similar to encapsulation principle
Easily rewritable

Autonomous

Loose coupling
Change to a MS should not be the reason for the change in another MS
Honor contracts and interfaces
Stateless
Independently Changeable and Deployable
Concurrent development

Business Domain Centric

Bounded context
scope of service

Resilience

Embrace failure either by Degrade functionality (like in amazon, show only what is available) or default functionality
Multiple instances
Network failures and other kind of failures

Observable

System Health
Centralized Monitoring
- For capacity or scaling
Centralized logging
- For debugging distributed transactions

Automation

Automate everything

Microservices Design

High Cohesion

Identify a single focus
Split into finer grained services
Avoid “Is kind of the same”
Don’t get lazy by the additional work in creating/splitting a current service into multiple

Autonomous

Loosely coupled
Achieved by communicating with network
Synch - this has an advantage to know if the call is success before going to the next step. But in reality, i do not think so. These can be achieved easily with async too
Asynch - Publish events and others subscribed to event
Asynch is the most decoupled form of communication
Avoid client libraries, this increased coupling
Use Technology agnostic API (like JSON/REST)
But DO NOT overdo the communication and create a chatty exchange between services (i think that’s where DDD comes in future for this)
Avoid sharing things like database between services. It has shortterm benefits, but create high coupling in futures.

Autonomous: Ownership and Versioning

Versioning

Create a new version of the service if there are breaking changes on the contract and is not backward compatible.
Maintain concurrent versioning when we need more time to decom the old version
Co-existing endpoints
Semantic versioning
Major.Minor.Patch (e.g.: 15.2.1)
Major = non-backward compatible
Minor = backward compatible
Patch = bug fix

Resilience

Design for known failures
Failure of downstream systems
Degrade functionality on failure detection
Default functionality on failure detection
Design system to fail fast (no value in doing a degrade/default func after 5min, it should be seamless)
Use timeouts
Timeout after a threshold
service to service
service to other systems
network outages and latency too, not just system/service issues
Monitor and log timeouts

Observable: Centralized Monitoring

Centralized Monitoring
Real time monitoring
Monitor the host (CPU, memory, disk usage)
Expose metrics within the service
Response times
Timeouts
Exceptions and errors
Business data related metrics
Number of orders
Avg. time for checkout
Collect and aggregate monitoring data
Visualize the monitoring data for trends
Tool to compare data across servers
tool that can trigger alerts based on thresholds/inputs

Observable: Centralized Logging

Centralized logging
When to log
Startup or shutdown
code path milestones
timeouts, exceptions
Structured logging
Level (Info/debug/error/stat)
Data and time
Correlation ID
Host name
Service name and service instance
Message
Traceable distributed Correlation ID

Observable

The difference between Centralized Logging and Centralized Monitoring is
Monitoring is just the numbers and counts
Logging stores the detailed information behind those numbers
Monitoring tells that something happened and Logging tells what/why/how that happened.

Summary

High Cohesion
Single thing done well
Single focus
Approach
Keep splitting service until it has only one reason to change
Autonomous
Independently changeable and deploy able
Approach
Loosely coupled system
Versioning strategy
Microservice ownership by team
Business Domain Centric
Represent business functions or represent a business domain
Approach
Course grain business domains
Subgroup into functions (CRUD) and areas (get invoice)
Resilience
Embrace failure (design for it)
Default or degrade functionality
Approach
Design for known failures
Fail fast and recover fast
Observable
See system health
Centralized logging and monitoring
Approach
Tools for real-time centralized monitoring
Tools for centralized structured logging
Automation
Tools for testing and feedback
Tools for deployment
Approach
Continuous Integration Tools
Continuous Deployment Tools

Technology for Microservices

Synchronous Communication

Drawback of Synchronous Communication is that

both parties have to be available, even if one of the service is down, the whole transaction will not be fulfilled.
Performance is subject to network quality. Service may not be slow, it could be the network that is slow.

Asynchronous communication

mitigated the need of client and service availability, there by decoupling them.
More complicated to implement
System relies on other platform (Message Q’s), this creates another point of failure in the system
Queuing issues lead to difficulty in debugging

Moving Forward with Microservices

Brownfield Micro service : Approach

Existing system
Monolithic system
Organically grown
Seems too large to split
Lacks microservices design principles
Identify seams (the borders in your app along which you want to divide the domains)
These seams are the future microservice boundaries
Separation that reflects domains
Identify bounded contexts
These are not going to be clear and there will be lot of spill over. We need to identify the spills and decide where they needs to be added.
Start modularising the bounded contexts
Move code incrementally
Tidy up a section per release
This is to ensure existing functionality remains intact and usable
Write unit and integration tests to verify and validate the changes
keep reviewing and correcting

Brownfield Microservices : Migration

Convert bounded contexts into microservices
Start off with one (to get comfortable)
Make it switchable (maintain two versions of the code to switch in case of issues)
How to prioritize ?
By risk (starting with the least risk)
By Technology
By dependencies
Incremental Approach
Integrating with monolithic
Monitor both for impact
Monitor operations that talk to microservices (to see if the over the network microservice communication is degrading the performance of the monolithic)
Review and improve incrementally
Incrementally the monolithic will be converted

Brownfield MS : Database Migration

Split database across seams
To be compatable with existing monolithic, create a data link that connects to multiple databases
For tables that link across seams, we either need to redesign them or make use of a API call to fetch that data.
TODO: How to ensure data integrity between the services?

Brownfield MS : Transactions

Transactions are very easy in monolithic, not the same in microservice

Greenfield Microservices

Evolving requirements, so not clear understanding of the domains
System boundaries will evolve
Start off with a monolithic design and let the seams evolve into clearer boundaries
Modules becomes services

Microservices Provisos

TODO: “Provisos”, what is this?

Microservices: Getting Started

Understanding the Benefit of Microservices

Human Advantages of Microservices

Monolithic demands a huge domain knowledge as you need to have the ability to make changes across all the modules of the application to implement a feature as they are all tightly coupled together. It is a high learning curve.

It talks about the elimination of bottlenecks due to narrowed scope of responsibility and hence has low risk, this need not be true, as we give the ownership of a microservice to a team of two and hence the whole microservice could be potentially impacted. net-net, a debatable concept.

Daniel Pick’s 3 elements that are essential to creation motivation

Autonomy
Purpose
Mastery

Microservice helps address Autonomy and Mastery. That’s 67% of motivation out of the box.

Prefer a microservice per team, not teams per microservice. Subtle but critical. TODO: How to keep the team occupied if they are dedicated for a service and the service has no work in this PI?

Resilience Benefits of Microservices

Module Boundaries can blue over time. Enforcing isolated codebases reinforce boundaries.

The Challenges of Microservices over Monoliths

Microservices Are Hard

Cost of Microservice is “Complexity”

Resilience of Network Communications

IF not handled properly, the resilience challenges can lead to cascading failure, amplification effects and causing whole app downtime and outages all because a single microservice is down there by defeating the whole purpose of microservice.

Running Microservices in Production

Microservices emphasis strongly on automation as a single manual step in monolithic get amplified to Xtimes (once per each microservice). So the need for automation is more in the microservice.

Challenges of Independent Governance

TODO: What is the role of a software architect in microservices?

The emphasis in MS architecture is in ensuring the right balance between autonomy and consistency for the teams. They should be autonomous enough to do their task, but not so much that we lose the consistence across the applications. It’s a fine balance.

Pulling data reports in monolithic is easy as it’s just a single data store. Due to multiple data stores in MS, we have to redo the same across different data stores.

TODO: How do you do a join query?

Beware of “Distributed Monolith”, it has the complexity of microservices and the inflexibility of a monolithic. The worst of both world

Need for a join across multiple data stores may mean that the boundaries of microservices are poorly choose.

Reviewing Challenges of Microservices

Fancy word - “Module extraction is perilous”

Synchronous Communication Architectures

Introduction to RPC

RPC = Remote Procedure Call

HTTP Traffic, SQL Queries and even SOAP is also a RPC.
RPC Properties
Arguments
has a Return Value
Executed Immediately
Client controls communication (its a pull)
Two hosts involved
Client waits for server (hence synchronous)
Very easy to implement

Service Discovery

Network address of service is located dynamically
DNS of Internet is the same concept.
This enables each service is still independent and need not change when the IP of another service changes.

Apache Zookeper is a kind of Service Discovery Tool

Used internally in Hadoop, Kafka
Has Massive scalability
High performance
But comes with the cost of complexity

Hashicorp Consul

Provide Multi-cloud complexity
Provided security

Stateless and Stateful RPCs

Stateless RPCs

Functional Procedures - Pure functions that only influence return value
Eg: Time server and invoice generator
Conduits - serves as a facade to third party API

Stateful RPC

More complex than stateless
Most meaningful work is stateful (huh?)
State is distributed (very important)
Need to handle Unreliable network (eg: Payment transactions)

TODO: Network Partition Where network thinks some task is completed but the task never made through the network and is left waiting for response from the client end. When you do a retry, the stateful system rejects the duplicate.

TODO: Idempotence State should not be changed due to multiple calls. In a chat service, the decision to show multiple notifications or not should be left to the notification service rather than the chat service. this enables us to reduce the risk of coupling.

Co-ordination of Stateful RPC’s

All-or-nothing behavior of storing data is impossible to achieve.
Choose a failure mode and identify whose/which service state is more important to maintain.
The co-ordination of state across multiple services is often called distributed transaction

Performance of Synchronous Communications

Performance of Synchronous communication

Consider two services A and B with availability of 99.9 and 99.5%

Communication between them will result in a net availability of AxB = 99.9 x 99.5 = 99.4% . If you notice, its less than the individual availability of each of these services.

Now, add another service C with availability of 99.5, this leads to a over all availability of 98.9% which is way less for an acceptable SLA.

This example indicates that the more services are chained together in a sync communication, the more it will lead to cascading failures.

Now, add Latency to this calculation and it becomes way more interesting.

Total latency = A + B + C

Synchronous communications are very sensitive towards Availability and latency tradeoff’s

Asynchronous Communication Architectures

Introducing Asynchronous Communications

Asynchronous Communication

Makes use of Message Bus Communication
is a one way communication
Is a fire and forget mode of communication
In RPC, there are always 2 hosts, here there could be more than 2 due to the listener events
The number of hosts could be dynamic without code changes
The messages wait in a queue and may not be delivered immediately.

Good example of the difference between RPC and async using a chat application

Benefits of Asynchronous communication

Faster, no need to wait for the response from all the services. just fire and forget
Simpler redeploys; automatically handle brief downtime. As the messages are in queue, once the service is up, it will read from the queue and start processing.
This is the answer to the cascading failures in sync comm
Absorb spikes in load, helps when the load is more. demand > capacity. This ensures the servies are not overloaded and get the service degraded.
Isolate slow services (hide the flaws in design? :) ) or could be slow third party operations
New services could be easily added without changes to other services as the new listener are dynamic.

Message Bus Fundamentals

Anatomy of a message

It is the smallest unit of data
Payload
Business-relavent data
Headers
Timestamp
Sender Info
Correlation Id
Error handling information
Custom headers

Messages needs to be small

Large messages harm throughput
large messages Opportunity for failure
Size limits needs to be enforced
Offload to blob storage if need to send large message

Messages are published to a topic in a message bus

Topic defines semantics
eg: post liked, post created, chat message sent

Messages published to a topic are routed to a queue. A listener (client) reads from the queue. A consumer can subscribe to multiple topics

All of them could be routed to the same queue
or they could be routed to individual queues

Message Queue

Temporary storage location
Acknowledge to clear (after processing the message)
Supports Crash recovery

Designing Asynchronous Communication Flows

The problem in sync, where we are trying to dea with the failures in notification and chat service can be avoided by having them read to different channels, so that once post is successful then it will post a message to notification and then notification service will decide if this notification needs to be displayed or if it is a duplicate. Even in case of any failures in the chat service, the message is still stored in dead letter queue and can be salvaged later point of time.

Event Driven Architecture - A system that is driven by the production, detection, consumption and reaction to events.

Distributed Transactions

Distributed Transaction - A set of coordinated state changes spanning across a number of different independent services.

Example is a Create Post needs to be processed to

Sent to timeline
Facial Recognition
Photo screening for inappropriate content
Advertising profile
Notification service
Abuse screening
Payment processing (in case of a promoted tweet/post)

All these events needs to be coordinated together and hence called a distributed transaction

This can be achieved with chaining of events, but it has many pitfalls and is unrealistic

Pitfalls of Event chaining for Distributed Transactions

Hard to design for consistency at intermediate steps.
It gets exponentially more complex to implement as the number of events in the design keep growing. Its not realistic
Difficult to change routing without unintended consequences.
What is not all of them are relevant? like payment processing or facial recognition?
Hard to track the pending transactions as there will be many at each stage
How do you even cancel a post in such architecture.

Primary Concerns of Distributed Transactions

Message Routing
Can we make a single service responsible for all the routing rather than multiple small stake holders?
This can also help us address the conditional routing issues like that of promoted tweets
Transaction state
Helps in centralizing the state and track progress
Also solves the cancel problem
Can also add resiliency to handle timeouts or invalid processing
Failure compensation
Helps with resolving scattered chained messaging (one transaction fails and the whole message is stuck)
You get failure in step#4, then we can automatically perform rollback or compensating operations at step 1,2,3
Helps in achieving all or nothing model (but still not 100% as the rollback can fail too)

Saga Pattern and Routing Slip Pattern

Saga Pattern

Allows for distributed transactions
Originally proposed for LLT’s (Long Lived Transactions) that can take hours or days
Was originally for same database
For compensating operations (rollback)
Later extended to distributed systems
To centralize transaction state
For Message Routing
Failure compensation

Saga Pattern

This makes use of a centralized service that takes care of all state changes.
Each message will consist of a correlation id, which will be maintained consistent across all the states
Sends msg for step-1 and updates the DB with step-1-pending-status against the current correlation id
When it receives the response from step-1, then it updates the status of the message and then sends the message to step-2
this process continue for all the steps
Once the final step is completed, it will delete all the records against the correlation id, implying that the final state is reached.
This pattern is driven by receiving messages.
When they receive a message, they transition state and as part of the state transition, they often publish other messages which will lead to further state transition (when we get the reply)
Since this is all controlled centralized, we can control the sequence of state changes without any major code changes (we just need to change the sequence in the saga and the whole flow would be updated accordingly.)
This helps eliminate the complexity which we would have with a message chaining architecture.
If there is a failure, then we state running the compensating events for the previous steps, to revert the distributed transaction.
This ensures we move the system to a state, where the transaction never happened (remember, all-or-nothing)
This is called failure compensation
This patterns also helps us to handle cancellation without too much complexity
Timeouts can be scheduled too
Additionally helps us to handle Parallel processing. But this will increase the complexity exponentially.

Routing Slip Pattern

Inspired from manufacturing
Set of instructions attached to a work item
Which stations to do
What kind of work per station
Which order
At each station
They do the work
Pass the item to the next station.
Rather than depend on a centralized transaction service, this believes in de-centralized (or empowering the individual units ) to hand off the work to the next appropriate state.
Just like a double linked list. Every node knows what it needs to do and who is its prev and next states.
At each step, we can add additional information to the message before we handoff to next step. Hence this pattern servers excellently as message routing.

So, 1> We determine the processing steps in advance 2> Attach them to the message 3> The message gets routed in that sequence

This do not have a centralized transaction state, so we do not know where the message is in the big pile. so cancel could be difficult.
Once the initial slip is compiled, it will not be possible to change the sequence. Saga pattern is more dynamic in that case.

TODO: Save this slide, good differences,

Review: Asynchronous Communications

Asynchronous communications architectures are complex.

They have many benefits (below), but are undoubted complex to maintain. so measure the costs carefully before jumping into.

Resiliency
Decoupled
Load Smoothing
Flexibility

Event Driven architecture tend to exhibit “Eventual Consistency”, where there are temporary inconsistent states.

Developing Features in a Microservices Architecture

Controlling the Quality of a Microservice

In a micro service, if we consider HTTP as the API and the dependencies as Database, message bus and other third party libs. These can be interchanged as the inputs and outputs for a logic bus. i.e. make use of the API’s to control the behavior of the dependencies. When we are able to do that, then we will be able to automate the test of each and every part of our service/code and write unit tests accordingly.

System Boundary - A place in the code where control is passes to or from an external system

TODO: Take snippets from the slide here for the test pyramid. Good representation

Continuous Delivery Techniques

To add new features while the fear of breaking, we can use either

API Versioning
This would be needing more code changes as we need to maintain both code bases.
Used more for non-backward compatible or major changes to the logic.
Onus is on the developer to ensure bugs are fixed and propagated across all the versions
Differentiated either by the version number in the url or in the messages.
This defers the decision making to other services (like our svcorder v2/v3)
Feature toggles
Also called feature flags, where we control the old vs. new flow based on DB driven conditional operators.
This enables us to only turn this on for specific users or user groups.
This is more dynamic compared to API versioning
more tuned for experimental features
Bottom line is, avoid getting stuck waiting for the acceptance from all user groups before we deploy the services.
These techniques, help us to avoid the synchronous deployments of services (that breaks the independence of the services)

Microservice Team Culture

TODO: Nice slide

Managing Microservices Architecture

Characterizing Design Improvement

Good Read: Martin Fowler, “Design Stamina Hypothesis” - continuing to modify code without continuously improving design causes the cumulative functionality over time to look flat, where over time, more and more effort is required to deliver the same sized increment of functionality.

Focus more on Intent-Revealing Design

Emphasizes “why” over “what”
Hides away the irrelevant design

Good design ensures malleability (TODO)

Prioritizing Architectural Improvements

TODO: “Winning is the worst ting that can happen in Vegas” - David Heinemeier Hansson

TODO: “Winning is the worst thing that can happen in Vegas” - David Heinemeier Hansson

Pursuit for “flexibility” is often the reason for software becoming inflexible.

TODO: Good slide. First Derivative - Gregor Hohpe

Influencing Architecture

Prioritizing Architecture

WE often add speculative functionality under the same of “flexibility”
This leads way to maintenance tax and slows down the future delivery
Pursuit of “flexibility” is often the reason for software becoming inflexible
The question to ponder is, “How do we invest in architectural improvements and avoid speculative “flexibility”?
Architect lives in “First Derivative” - read more

Building Resilient Microservices Architectures

Resilience Starts with Quality

Software Quality

External (what can be observed by an user)
Visible to end-users
Correctness
User Experience
Reliability
Absence of errors
Performance
Internal (the state of the software design or architecture; ease to make changes in application)
Visible to programmers
Maintainability
Testability
Observability
Simplicity
Resilience starts with Quality
Attention to both internal and external quality is the foundation for resilience.
Understanding the systems limits is essential for managing resilience

Observable Microservices

Observability - The quality of a system that allows you to understand its internal behavior by inspecting its outputs.

Evaluating Microservices Readiness

Microservices Are a Poor Goal

Microservices are a means to an end. They are the solution for certain kind of problems and are not (and should not be) the goal in themselves

Problems Suited to Microservices

Key metrics for evaluating software development Capabilities

Deployment frequency - how frequently can we deploy changes
On demand
Once a week to a month
More than a month
Lead time for changes - time it takes to build, test and deploy a new increment or functionality
Mean time to recovery (MTTR) - time taken to restore a service
Change failure rate - % of changes delivered to production that cause service degradation or require immediate remediation

Microservices are well suited for challenges of

Scale
Complexity
Governance