***Key Concepts in Networking
Prerequisite — Switching and Routing
1. Basic Networking Principles:
Switches enable communication within a single network by connecting devices.
Routers interconnect networks, facilitating communication across different subnets.
Gateways serve as access points to external networks, including the internet.
2. Linux Networking Tools:
- `ip link`: View and modify network interfaces.
- `ip addr`: Display or assign IP addresses to interfaces.
- `route` or `ip route`: View and configure routing tables.
3. Network Namespaces:
- Understand the basics of network namespaces in Linux, which isolate network configurations for applications, ensuring better security and performance.
4. DNS and CoreDNS:
Explore DNS fundamentals and how to configure DNS settings on Linux systems.
Gain a basic introduction to CoreDNS, a DNS server often used in Kubernetes.
5. Routing and IP Forwarding:
- Routing tables define how packets travel across networks. Use commands like `ip route add` to add routing entries.
- IP forwarding in Linux governs the ability to route packets between interfaces. Enable it via `/proc/sys/net/ipv4/ip_forward` when a host must route traffic between private networks.
6. Persistent Changes:
- Ensure that network configurations persist across reboots by modifying files like `/etc/network/interfaces` or `/etc/sysctl.conf` (see the combined example after this list).
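A few illustrative commands tying these prerequisites together (interface names, addresses, and routes are examples, not fixed values):

```bash
# Inspect interfaces, addresses, and routes
ip link
ip addr
ip route

# Add a routing entry (example network and gateway)
ip route add 192.168.2.0/24 via 192.168.1.1

# Enable IP forwarding for the current boot
echo 1 > /proc/sys/net/ipv4/ip_forward

# Persist the setting across reboots
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
sysctl -p
```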
Connecting Networks
Consider a scenario where systems A and B are connected to one subnet (e.g., 192.168.1.0/24), and systems C and D are on another (192.168.2.0/24). A router links these subnets by assigning IPs on both networks (e.g., 192.168.1.1 and 192.168.2.1). Adding routes on each system ensures seamless communication between them.
For example:
- Add a route on system A: `ip route add 192.168.2.0/24 via 192.168.1.1`
- Add a reciprocal route on system C: `ip route add 192.168.1.0/24 via 192.168.2.1`
To further enable communication, configure the router (the system with an interface on both networks) to forward packets between its interfaces, as shown below.
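A minimal sketch of the whole setup, assuming the router's two interfaces are eth0 (192.168.1.1) and eth1 (192.168.2.1):

```bash
# On system A (192.168.1.0/24): reach 192.168.2.0/24 via the router
ip route add 192.168.2.0/24 via 192.168.1.1

# On system C (192.168.2.0/24): reach 192.168.1.0/24 via the router
ip route add 192.168.1.0/24 via 192.168.2.1

# On the router: allow packets to be forwarded between eth0 and eth1
echo 1 > /proc/sys/net/ipv4/ip_forward
```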
Advanced Scenarios
When scaling networks or connecting to the internet, additional configurations are required:
- Define a default gateway for systems to access external networks: `ip route add default via 192.168.1.1`
- Manage complex setups with multiple routers by specifying distinct routes for internal and external traffic.
***DNS in Linux for Absolute Beginners
Introduction to DNS
DNS (Domain Name System) is critical for resolving hostnames to IP addresses.
Key Concepts:
1. Local Name Resolution:
- Hosts can map names to IP addresses locally using the `/etc/hosts` file.
- Example: Adding `192.168.1.11 db` to `/etc/hosts` on system A allows referencing system B as `db` instead of its IP address.
- Note: Entries in `/etc/hosts` override actual hostnames and can map multiple names to the same system.
Key Insight: The `/etc/hosts` file allows multiple aliases for the same host. For example, system B can also be mapped as `www.google.com`, fooling system A into treating B as Google's server.
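A small illustration of such entries (the IP and names follow the example above):

```bash
# Append aliases to /etc/hosts on system A
cat >> /etc/hosts <<EOF
192.168.1.11  db
192.168.1.11  www.google.com
EOF

# Both names now resolve to system B
ping db
ping www.google.com
```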
2. Centralized Name Resolution:
- For scalable environments, managing mappings in `/etc/hosts` becomes inefficient. A DNS server is used for centralized management.
- All hosts are configured to refer to a DNS server via the `/etc/resolv.conf` file.
- Example: `nameserver 192.168.1.100`
3. Resolution Order:
- Hosts resolve names based on the order specified in `/etc/nsswitch.conf`.
- Default: `files` (local `/etc/hosts`) first, then `dns` (DNS server).
- This order can be customized for specific needs (see the excerpt below).
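The relevant line in `/etc/nsswitch.conf` typically looks like this:

```bash
# /etc/nsswitch.conf (excerpt)
hosts:          files dns
```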
4. Public and Private DNS:
DNS servers in organizations handle internal and external domains:
- Internal: Custom domains like `web.mycompany.com` or `db.mycompany.com`.
- External: Public domains such as `www.google.com` are resolved via public DNS servers (e.g., Google's 8.8.8.8).
- Forwarding: Internal DNS servers can forward unknown queries to public DNS servers.
5. Domain Names and Subdomains:
DNS uses a hierarchical naming structure:
- Top-Level Domains (TLDs): `.com`, `.edu`, `.org`, etc.
- Domain Names: `google.com` is Google's domain.
- Subdomains: Group related services, e.g., `maps.google.com` and `drive.google.com`.
6. Search Domains:
- Use the `search` entry in `/etc/resolv.conf` to append domain names automatically.
- Example: `search mycompany.com prod.mycompany.com` resolves `web` to `web.mycompany.com` or `web.prod.mycompany.com`.
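Putting the nameserver and search entries together, an `/etc/resolv.conf` for this example might look like:

```bash
# /etc/resolv.conf
nameserver 192.168.1.100
search mycompany.com prod.mycompany.com
```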
7. DNS Record Types:
A Record: Maps a hostname to an IPv4 address.
AAAA Record: Maps a hostname to an IPv6 address.
CNAME Record: Maps one hostname to another, useful for aliases.
8. Tools for DNS Testing:
- ping: Tests name resolution using `/etc/hosts` or DNS.
- nslookup: Queries only DNS servers, bypassing `/etc/hosts`.
- dig: Provides detailed DNS query results (see the examples below).
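For example (hostnames follow the earlier examples):

```bash
# Resolve via /etc/hosts or DNS, depending on nsswitch.conf
ping web.mycompany.com

# Query the DNS server directly, ignoring /etc/hosts
nslookup web.mycompany.com

# Detailed DNS answer, including record type and TTL
dig web.mycompany.com
```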
9. Challenges in Large Environments:
DNS servers address scalability issues:
Easier management when IP addresses change.
Cache results for faster lookups.
***Understanding Network Namespaces
Namespaces are like individual rooms within the house, offering privacy to each child (or process). Each “child” can only see what’s inside their room, while the parent (the host) has visibility across all rooms. In the same way, namespaces isolate processes or containers, ensuring they operate independently from the host and from one another.
When you create a container, a network namespace is allocated, isolating its network environment. As far as the container is concerned, it operates as if it were on its own host, with no awareness of the underlying host or other containers. The host, however, has complete visibility and control.
Network Namespace Basics
To demonstrate, we use the `ip netns` command to create and interact with network namespaces:
- Create namespaces: `ip netns add [namespace_name]`
- List namespaces: `ip netns`
- Execute commands inside namespaces: `ip netns exec [namespace_name] [command]`
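A quick sketch with two hypothetical namespaces, red and blue:

```bash
# Create two namespaces
ip netns add red
ip netns add blue

# List them
ip netns

# Run a command inside a namespace (here: list its interfaces)
ip netns exec red ip link
# Equivalent shorthand for ip commands
ip -n red link
```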
Within a network namespace, you can define virtual network interfaces, ARP tables, and routing rules, fully isolated from the host system.
Connecting Namespaces
Initially, namespaces have no network connectivity. To enable communication:
- Create a virtual Ethernet pair: `ip link add [interface1] type veth peer name [interface2]`
- Assign interfaces to namespaces: `ip link set [interface] netns [namespace_name]`
- Configure IP addresses: Assign IPs with `ip addr add` and bring interfaces up with `ip link set [interface] up`
- Test connectivity: Use `ping` between namespaces to verify the setup (see the sketch below).
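Continuing the red/blue example, a minimal sketch (interface names and IP addresses are illustrative):

```bash
# Create a veth pair and move one end into each namespace
ip link add veth-red type veth peer name veth-blue
ip link set veth-red netns red
ip link set veth-blue netns blue

# Assign addresses and bring the interfaces up
ip -n red addr add 192.168.15.1/24 dev veth-red
ip -n blue addr add 192.168.15.2/24 dev veth-blue
ip -n red link set veth-red up
ip -n blue link set veth-blue up

# Verify connectivity
ip netns exec red ping -c 2 192.168.15.2
```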
Scaling with Virtual Bridges
For multiple namespaces, direct connections become cumbersome. A virtual switch, such as a Linux bridge, simplifies this by acting as a central network hub:
- Create a bridge: `ip link add [bridge_name] type bridge`
- Connect namespaces to the bridge: Use virtual Ethernet pairs as before, attaching one end to the namespace and the other to the bridge.
- Assign IPs and enable communication: Each namespace gets a unique IP, enabling communication across the virtual network (see the sketch below).
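A sketch using a hypothetical bridge named v-net-0 (starting from a clean setup; names and addresses are illustrative):

```bash
# Create the bridge on the host and bring it up
ip link add v-net-0 type bridge
ip link set v-net-0 up

# For each namespace: one veth end inside it, the other attached to the bridge
ip link add veth-red type veth peer name veth-red-br
ip link set veth-red netns red
ip link set veth-red-br master v-net-0
ip link set veth-red-br up

ip -n red addr add 192.168.15.1/24 dev veth-red
ip -n red link set veth-red up

# Optionally give the bridge an address so the host can reach the namespaces
ip addr add 192.168.15.5/24 dev v-net-0
```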
External Network Connectivity
Namespaces are isolated from external networks. To enable external communication:
- Assign a gateway: Configure a route in the namespace routing table, pointing to the host interface on the bridge network.
- Enable NAT: Use `iptables` to masquerade traffic from the namespace network with the host's external IP, as shown below.
This allows namespaces to access the internet while maintaining isolation.
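For the 192.168.15.0/24 namespace network sketched above, a minimal setup might be:

```bash
# Inside the namespace: route external traffic via the host's bridge address
ip -n red route add default via 192.168.15.5

# On the host: masquerade traffic leaving the namespace network
iptables -t nat -A POSTROUTING -s 192.168.15.0/24 -j MASQUERADE
```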
Incoming Connectivity
To expose services within a namespace (e.g., a web application on port 80):
- Port Forwarding: Use `iptables` to forward traffic from the host's port 80 to the namespace's private IP and port 80 (see below).
- Optional Routing: Alternatively, configure external hosts with a route to the namespace network through the host.
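A sketch of the DNAT rule, assuming the web application inside the namespace listens at 192.168.15.1:80:

```bash
# Forward traffic arriving on the host's port 80 into the namespace
iptables -t nat -A PREROUTING -p tcp --dport 80 \
  -j DNAT --to-destination 192.168.15.1:80
```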
***Docker Networking:
When you run a container, Docker offers various networking options to suit different use cases. Let’s explore these in detail:
1. None Network
Description: Containers are not attached to any network, ensuring total isolation.
Characteristics:
The container cannot access external networks or be accessed externally.
Multiple containers in this mode cannot communicate with each other.
Use Case: Scenarios where absolute isolation is required, such as debugging or testing.
2. Host Network
Description: Containers share the host’s network stack.
Characteristics:
No network isolation; the container operates as if running directly on the host.
Applications running in the container are accessible on the same port as the host.
Port conflicts occur if multiple containers try to use the same port.
Use Case: High-performance scenarios where bypassing network virtualization is necessary.
3. Bridge Network (Default)
Description: Creates a private internal network connecting the Docker host and its containers.
Characteristics:
Each container gets an IP within the network (e.g., `172.17.0.0/16` by default).
Containers can communicate with each other but require port mapping for external access.
Use Case: Standard deployments requiring isolated container communication and selective external exposure.
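Selecting one of these modes at container creation time (the image name is just an example):

```bash
# Completely isolated container
docker run --network none nginx

# Share the host's network stack; nginx is reachable on the host's port 80
docker run --network host nginx

# Default bridge network; reachable from other containers on the same bridge
docker run nginx
```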
Bridge Network: A Deeper Dive
The Bridge Network is the default networking mode in Docker, and understanding its mechanics is essential.
1. Creation and Identification
When Docker is installed, it creates an internal bridge network named bridge (as seen in the Docker CLI), which appears on the host as an interface called docker0.
The bridge network is assigned an IP address (e.g., `172.17.0.1`) and acts as:
A switch for containers to communicate internally.
An interface for communication with the host.
You can view these networks using:
- `docker network ls` (lists Docker networks).
- `ip link` (shows the `docker0` interface on the host).
2. Attaching Containers to the Bridge Network
For every container:
Network Namespace Creation: Docker creates a namespace for the container.
Virtual Ethernet Pair: A virtual cable is created, with one end connected to the container's namespace and the other to the `docker0` bridge.
Each interface is assigned an IP address (e.g., the container gets `172.17.0.2`).
3. Verification:
- Use `ip link` or `ip addr` commands to inspect interfaces on the host and container.
- Use `docker inspect` to see namespace details.
Interface Pairing and Attachment
When a container is created:
1. Namespace Creation: Docker sets up a network namespace for the container.
2. Virtual Ethernet Pair: A virtual cable with two endpoints is established:
- One end connects to the container's namespace.
- The other end connects to the bridge network on the host (`docker0`).
3. Numbering System: Interface pairs are identified by their numerical sequence (e.g., 9–10, 11–12).
4. Inter-Container Communication: Once connected, all containers within the same bridge network can communicate directly.
Port Mapping and External Access
By default, containers in the bridge network are isolated from external access. To enable external communication, port mapping is used.
1. Default Scenario:
- A container (e.g., running an Nginx web application on port `80`) is only accessible from other containers in the same network and from the Docker host itself.
- External users cannot access the application.
2. Enabling External Access:
- Use the port publishing option to map a host port to a container port during container creation.
- Example: `-p 8080:80` maps port `8080` on the Docker host to port `80` in the container.
3. Outcome:
- External users can access the application using the Docker host's IP and the mapped port (e.g., `http://<host-ip>:8080`).
- Traffic to port `8080` on the host is forwarded to port `80` in the container.
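For example (the container name is illustrative):

```bash
# Publish container port 80 on host port 8080
docker run -d --name web -p 8080:80 nginx

# From an external machine (or the host itself)
curl http://<host-ip>:8080
```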
Mechanics of Port Forwarding
Docker implements port forwarding using NAT (Network Address Translation).
How it Works:
- A NAT rule is created in the `iptables` PREROUTING chain, specifying the source port on the Docker host (e.g., `8080`) and the destination container IP address and port (e.g., `172.17.0.3:80`).
- Docker dynamically manages these rules and adds them to the Docker-specific chain in `iptables`.
Verification:
- Use `iptables` commands to view and inspect the rules Docker creates, as shown below.
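For instance:

```bash
# List NAT rules in the Docker chain; a published port shows up as a DNAT rule
iptables -t nat -nvL DOCKER
```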
***Container Networking Interface (CNI)
Understanding CNI
The Container Networking Interface (CNI) defines a set of standards for creating networking plugins and integrating them with container runtimes.
Plugins: Programs that handle container networking (e.g., connecting a container to a network).
Bridge Plugin Example: A plugin named `bridge` performs tasks such as assigning IPs, attaching containers to namespaces, and managing routing.
CNI’s Responsibilities
1. For Container Runtimes:
- Create a network namespace for each container.
- Identify the networks to which a container should connect.
- Invoke the plugin when a container is created (`ADD` command) or deleted (`DEL` command).
- Configure plugins using a JSON file.
2. For Plugins:
- Support commands like `ADD`, `DEL`, and `CHECK`, which accept parameters such as container ID and network namespace.
- Handle tasks such as assigning IP addresses to containers and configuring routes to enable communication between containers.
- Return results in a standardized format.
As long as both runtimes and plugins adhere to these standards, they work seamlessly together, ensuring compatibility across platforms.
Built-In Plugins and Third-Party Integrations
CNI includes several prebuilt plugins, such as:
Bridge
VLAN, IPVLAN, MACVLAN
Windows-specific plugins
IPAM Plugins like Host Local and DHCP
Additionally, third-party plugins, such as Weave, Flannel, Calico, Cilium, and VMware NSX, implement the CNI standard, enabling diverse networking capabilities.
CNI vs. Docker’s CNM
While many container runtimes (e.g., Kubernetes, Mesos) adopt the CNI standard, Docker uses its own networking model, the Container Network Model (CNM).
Differences:
CNM provides an alternative approach to container networking, making Docker incompatible with CNI plugins by default.
As a result, Docker containers cannot natively use CNI plugins.
Workaround:
- Create a Docker container with no network configuration.
- Manually invoke the desired CNI plugin (e.g., `bridge`), as sketched below.
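A hedged sketch of that workaround, assuming CNI plugins are installed in /opt/cni/bin and a network configuration exists at /etc/cni/net.d/10-mynet.conf (both the file name and container name are assumptions):

```bash
# 1. Start a container with no networking
docker run -d --name web --network none nginx

# 2. Locate the container's network namespace via its PID
pid=$(docker inspect -f '{{.State.Pid}}' web)

# 3. Invoke the CNI bridge plugin manually; per the CNI spec, parameters are
#    passed as environment variables and the network config arrives on stdin
CNI_COMMAND=ADD \
CNI_CONTAINERID=web \
CNI_NETNS=/proc/$pid/ns/net \
CNI_IFNAME=eth0 \
CNI_PATH=/opt/cni/bin \
/opt/cni/bin/bridge < /etc/cni/net.d/10-mynet.conf
```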
This is precisely how Kubernetes interacts with Docker:
- Kubernetes creates containers with no network (`none`).
- Configured CNI plugins handle the rest of the networking setup.
***Cluster Networking
Basic Networking Requirements
A Kubernetes cluster consists of master and worker nodes, and each node must meet the following criteria:
1. Network Interface:
Each node should have at least one network interface connected to a network.
The interface must have a unique IP address.
2. Host Identity:
Each node must have:
A unique hostname.
A unique MAC address for its network interface.
Important: If you created the nodes by cloning existing virtual machines, ensure these identifiers are updated to avoid conflicts.
Required Open Ports
To facilitate communication between Kubernetes components, certain ports need to be open:
1. API Server (Port 6443):
- The master node's API server listens on this port.
- Used by worker nodes, the `kubectl` tool, external users, and other control plane components.
2. Kubelet (Port 10250):
Kubelet agents on both master and worker nodes listen on this port.
Note: The master node may also run a kubelet.
3. Scheduler (Port 10259):
- The kube-scheduler component requires this port to be open.
4. Controller Manager (Port 10257):
- The kube-controller-manager requires this port for communication.
5. Worker Node Service Ports (30000–32767):
- These ports are used to expose services on worker nodes for external access.
6. ETCD Server Ports:
Port 2379: Used by ETCD server to listen for incoming requests.
Port 2380: Required for communication between ETCD instances in multi-master configurations.
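A quick way to sanity-check these ports on a node (the firewall command assumes a ufw-based setup; your distribution's tooling may differ):

```bash
# Check which control plane components are listening, and on which ports
ss -tlnp | grep -E '6443|10250|10259|10257|2379|2380'

# Example: allow the API server port through a ufw-based firewall
ufw allow 6443/tcp
```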
Considerations for Multi-Master Clusters
If you have multiple master nodes, ensure all the ports listed above are open on every master node. This is especially important for ETCD communication (port 2380).
Network Security Configurations
When configuring networking for your Kubernetes cluster, keep these considerations in mind:
Firewall Rules: Ensure the necessary ports are allowed through firewalls.
IP Tables Rules: Review and configure rules to permit communication between nodes.
Cloud Security Groups: If deploying on platforms like AWS, Azure, or GCP, adjust the network security groups accordingly.
If something is not working as expected, checking these configurations is a good starting point during troubleshooting.
***Pod Networking
Understanding the Need for Pod Networking
In Kubernetes, the network setup that connects nodes together is distinct from the networking required for pods. As the cluster grows to accommodate numerous pods and services, key questions arise:
How are pods addressed?
How do they communicate with each other?
How can services running on these pods be accessed, both internally and externally?
Kubernetes does not provide a built-in solution for these challenges. Instead, it expects users to implement a networking model that meets specific requirements.
Kubernetes Pod Networking Requirements
Kubernetes defines the following expectations for pod networking:
Unique Pod IPs: Every pod must have a unique IP address.
Intra-Node Communication: Pods within the same node should be able to communicate using their IP addresses.
Inter-Node Communication: Pods on different nodes must also communicate seamlessly using the same IP addresses, without requiring Network Address Translation (NAT).
The specifics of IP ranges and subnets are left to the implementer, provided these criteria are satisfied.
Implementing Pod Networking
While there are several solutions available to address these requirements, such as Container Network Interface (CNI) plugins, understanding the fundamentals helps in grasping how these solutions work.
Setting the Foundation
Let’s begin with a three-node cluster where each node has an external IP address (e.g., 192.168.1.11, 192.168.1.12, 192.168.1.13). The nodes participate in a common network, and Kubernetes creates network namespaces for containers (pods) upon their creation. To facilitate communication, we:
Create a Bridge Network: A bridge network is established on each node to interconnect the containers.
Assign IP Subnets: Each node’s bridge network is assigned a unique subnet (e.g., 10.240.1.0/24 for Node 1, 10.240.2.0/24 for Node 2, etc.).
Connecting Containers
When a container is created:
- A virtual network cable (veth pair) is generated using `ip link add`.
- One end is attached to the container, and the other to the bridge, using `ip link set`.
- An IP address is assigned to the container using `ip addr`.
- A default route is configured to the bridge network's gateway.
This process is repeated for each container, enabling them to communicate within the same node.
Inter-Node Communication
To enable communication between nodes:
- Routes are added to each node’s routing table, directing traffic to the appropriate subnet of other nodes. For example:
- Node 1 routes traffic to 10.240.2.0/24 via Node 2’s external IP (192.168.1.12).
- Alternatively, a central router can manage these routes, simplifying configuration as the network grows.
Unified Network Architecture
The individual bridge networks now form a single logical network (e.g., 10.244.0.0/16), enabling seamless pod-to-pod communication across nodes.
Automation with CNI
Manual configuration of pod networking is impractical for large-scale environments. This is where CNI (Container Network Interface) comes into play.
CNI Overview:
CNI provides a standard framework for networking containers.
It ensures that when a pod is created, a pre-defined script is executed to handle network setup.
Script Adaptation:
The script must adhere to CNI standards, including:
An add section for connecting containers to the network.
A delete section for cleaning up resources when containers are removed.
When a container is created, the container runtime (e.g., Docker, CRI-O) uses the CNI configuration to locate and execute the script, automating the networking process.
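A minimal sketch of what such a CNI-conformant script's skeleton could look like (entirely illustrative; real plugins also emit their results as JSON on stdout):

```bash
#!/bin/bash
# net-script.sh - illustrative skeleton of a CNI-style plugin
case "$CNI_COMMAND" in
  ADD)
    # Create the veth pair, attach one end to the bridge and the other to
    # the container's namespace, assign an IP, and set the default route
    ;;
  DEL)
    # Remove the interface and release the allocated IP address
    ;;
esac
```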
***Container Networking Interface (CNI) in Kubernetes:
CNI Responsibilities in Kubernetes
CNI defines the responsibilities of the container runtime. For Kubernetes, this means:
Creating container network namespaces.
Attaching these namespaces to the appropriate networks by invoking the specified CNI plugin.
The container runtime, such as Containerd or CRI-O, is the Kubernetes component responsible for creating containers and triggering the appropriate network plugin after a container is created.
Configuring Kubernetes to Use CNI Plugins
CNI Plugin Installation
1. Plugin Directory:
- All network plugins are installed in `/opt/cni/bin`.
- This directory contains executable files for each plugin, such as `bridge`, `flannel`, `dhcp`, etc.
2. Configuration Files:
- Plugin configurations are stored in `/etc/cni/net.d`.
- Each file defines how a specific plugin should be used.
- If multiple configuration files exist, the container runtime selects the one with the alphabetically first name.
Key Elements:
- `name`: The name of the network (e.g., "mynet").
- `type`: The plugin type (e.g., "bridge").
- `isGateway`: Determines whether the bridge interface gets an IP address to act as a gateway.
- `ipMasq`: Enables NAT rules for IP masquerading.
- `ipam`: Handles IP address management.
  - `type`: Indicates whether IPs are managed locally (`host-local`) or by an external DHCP server.
  - `subnet`: Defines the IP address range for the network.
This configuration ties together concepts like bridging, routing, NAT, and IP allocation, which were covered in earlier lectures.
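A representative configuration file tying these elements together (the file name, network name, and subnet are assumptions, not fixed values):

```bash
cat /etc/cni/net.d/10-bridge.conf
{
  "cniVersion": "0.3.1",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/16",
    "routes": [{ "dst": "0.0.0.0/0" }]
  }
}
```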
Key Takeaways
1. Kubernetes relies on the container runtime (e.g., Containerd, CRI-O) to invoke CNI plugins and manage pod networking.
2. CNI plugins and their configurations are managed through standard directories: executables in `/opt/cni/bin` and configurations in `/etc/cni/net.d`.
3. The CNI configuration file defines the network type, IP allocation, routing rules, and NAT behavior.
***Understanding Weaveworks Weave CNI Plugin in Kubernetes
Challenges of Scaling Pod Networking
In the basic networking solution we built earlier, a routing table mapped networks across nodes. When a packet was sent between pods on different nodes, it traversed the network using this routing table. While this approach works in small, simple environments, it becomes impractical as the scale increases.
Scaling Limitations
Routing Table Size: Managing a large number of routes becomes inefficient as clusters grow to hundreds of nodes with thousands of pods.
Complexity: Maintaining accurate routing tables in large, dynamic environments is challenging.
Analogy: Scaling Office Communication
To simplify the concept, imagine the Kubernetes cluster as a company, where:
Nodes are different office sites.
Pods are departments or offices within each site.
Initially, employees (pods) sent packages directly to other offices using local couriers, relying on addresses and direct routes. As the company expanded globally, this approach failed due to logistical complexities.
Solution: A professional shipping company was engaged to streamline communication:
Agents: Representatives were placed at each site to manage shipments.
Coordination: Agents communicated to share information about all sites, departments, and offices.
Encapsulation: Packages were re-wrapped for efficient transport between sites.
Delivery: Agents ensured packages reached the correct department upon arrival.
How Weaveworks Solves Networking Challenges
Weaveworks employs a similar model:
Weave Agents: A Weave agent (or peer) is deployed on each Kubernetes node. These agents exchange information about nodes, networks, and pods to maintain a complete topology of the cluster.
Bridge Network:
Weave creates a bridge network named weave on each node.
IP addresses are assigned to the network and its pods dynamically.
Packet Flow in Weave
Source Node: When a packet is sent from a pod, Weave intercepts it and identifies the destination pod.
Encapsulation:
The packet is encapsulated into a new one with updated source and destination information.
Transmission: The encapsulated packet is sent across the network to the destination node.
Destination Node:
The Weave agent on the receiving node decapsulates the packet and forwards it to the correct pod.
This approach removes the need for extensive routing tables, ensuring efficient and scalable communication in large clusters.
Weave Deployment in Kubernetes
Deployment Methods
1. Manual Deployment:
- Weave agents can be installed as standalone services or daemons on each node.
2. Kubernetes Integration:
- The preferred method is deploying Weave as pods in the Kubernetes cluster using DaemonSets.
Why DaemonSets?
A DaemonSet ensures that a specific pod type is deployed on all nodes in the cluster. For Weave, this guarantees that each node runs a Weave agent, facilitating seamless communication.
Deployment Process
With a kubectl apply command, Weave can be deployed in a running Kubernetes cluster.
This command sets up:
Weave agents (peers) as a DaemonSet.
Additional components necessary for the plugin’s functionality.
Post-Deployment Observation
If your cluster is set up using tools like kubeadm, you can verify the Weave installation:
- Use `kubectl get pods -n kube-system` to list the Weave pods.
- Each node will have a Weave pod deployed.
Troubleshooting Weave
For debugging:
- Use `kubectl logs` to check logs from Weave pods.
- Investigate the configuration files and IP address allocation to identify potential issues.
***IP Address Management- Weave
Who Manages IP Address Allocation?
As per the Container Network Interface (CNI) standards, the responsibility for managing IP address assignment lies with the CNI plugin or the network solution provider.
IP Management in a Basic CNI Plugin
Previously, we implemented a custom CNI plugin, where we directly handled the assignment of IP addresses within the plugin itself. This included managing the allocation of IPs to container network namespaces.
While Kubernetes itself doesn’t dictate how IPs should be assigned, it mandates:
No duplicate IPs across the network.
Proper management of IP allocations.
Approaches to IP Address Management
Local File-Based Management
A straightforward method to manage IPs is to:
Maintain a list of allocated IP addresses in a file stored locally on each host.
Include logic in the CNI plugin script to read, update, and validate entries in this file.
CNI’s Built-in Plugins
Instead of implementing manual logic, CNI provides built-in IPAM plugins to handle this task.
1. Host-Local Plugin:
- Assigns IPs locally on each host, ensuring no overlap.
- Requires the CNI plugin script to invoke it explicitly.
2. Dynamic Configuration:
- CNI configuration files contain an `ipam` section specifying the type of plugin, subnet, and routes to be used.
- A dynamic script can read these configurations and invoke the appropriate plugin.
Integration with Network Providers
Each network solution provider has its unique method of managing IPs. For example, let's look at how Weaveworks' Weave handles this process.
Weaveworks Weave IP Address Management
Weave uses an efficient and scalable approach to IP allocation:
1. Default IP Range:
- By default, Weave allocates the range 10.32.0.0/12 to the cluster. This provides approximately one million usable IPs, from 10.32.0.1 to 10.47.255.254, for pod networking.
2. IP Range Distribution:
Weave peers (agents) on each node split the IP range equally among themselves.
Each node is assigned a subset of the IP range for use by its pods.
3. Configuration Options:
- The default range is configurable. Additional parameters can be specified during the deployment of the Weave plugin, allowing customization based on the cluster’s requirements.
Key Takeaways
CNI Plugins are responsible for IP allocation and management.
Kubernetes does not enforce specific methods but requires unique IPs across the cluster.
Weave uses a distributed approach for efficient IP range allocation, making it scalable for large clusters.
IPAM configurations can be customized through CNI configuration files and dynamic scripting.
***Understanding Service Networking in Kubernetes:
What are Services?
A service is an abstraction that provides a stable network endpoint to access applications running in pods.
Each service is assigned:
A unique IP address.
A name (for DNS-based access).
Types of Services
1. ClusterIP (Default):
- Makes the service accessible only within the cluster.
- Ideal for internal applications like databases.
2. NodePort:
- Extends accessibility outside the cluster by exposing the service on a specific port of all nodes.
- Useful for external-facing applications.
How Services Work in Kubernetes
Cluster-Wide Accessibility
Unlike pods, which are tied to specific nodes, services exist across the cluster. They are:
Virtual objects, meaning they do not involve actual processes, namespaces, or interfaces.
Managed entirely by Kubernetes components such as kube-proxy.
Service IP Address Assignment
- Kubernetes assigns service IPs from a predefined range specified by the `--service-cluster-ip-range` flag on the API server.
- Example: Service CIDR `10.96.0.0/12`, Pod CIDR `10.244.0.0/16`.
- The ranges must not overlap to avoid conflicts between pod and service IPs (see the check below).
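On a kubeadm-provisioned cluster, you can inspect both ranges roughly like this (assumes a standard kubeadm setup):

```bash
# Service CIDR, from the API server's flags
ps aux | grep kube-apiserver | grep -o 'service-cluster-ip-range=[^ ]*'

# Pod CIDRs, as recorded on the nodes
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
```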
Role of kube-proxy
The kube-proxy component:
Runs on every node.
Monitors the Kubernetes API server for service-related events.
Configures the network to route traffic to the appropriate pod(s).
How kube-proxy Operates
1. When a service is created, kube-proxy:
Retrieves the service’s assigned IP address and port.
Sets up forwarding rules on each node to map service IP traffic to pod IPs.
2. Supported Modes:
iptables (Default): Uses NAT rules to redirect traffic to backend pods.
IPVS: Implements a scalable, performance-optimized load balancer.
User space: Legacy mode, listens on service ports and proxies requests to pods.
The default mode is `iptables`, but it can be customized using the `--proxy-mode` option.
Example: Cluster IP Service
Scenario
- A pod named `db` runs on Node 1 with IP `10.244.1.2` and exposes a database on port 3306.
- A ClusterIP service is created to make this pod accessible within the cluster.
Process
- Kubernetes assigns the service an IP, e.g., `10.103.132.104`.
- kube-proxy adds DNAT rules to iptables: incoming traffic to the service IP `10.103.132.104:3306` is redirected to the pod IP `10.244.1.2:3306`.
Verification
- Use `iptables` to view NAT table entries, as shown below.
- Rules created by kube-proxy include comments with the service name for easy identification.
Example: NodePort Service
Scenario
A web application is hosted on a pod and requires external access.
A NodePort service is created, exposing the application on a specific port across all nodes.
Process
- kube-proxy sets up iptables rules to forward traffic from the node's port to the pod's IP and port.
- External users access the application using the `<NodeIP>:<NodePort>` combination.
Logging and Troubleshooting
To troubleshoot kube-proxy configurations:
1. Check the kube-proxy logs for details about:
Proxy mode in use (e.g., iptables, IPVS).
Service creation or deletion events.
2. Ensure appropriate log verbosity levels for detailed insights.
Key Considerations
Separate CIDR Ranges: Ensure the service and pod CIDR ranges do not overlap.
Dynamic Updates: kube-proxy dynamically updates iptables rules when services are added or removed.
Virtual Nature: Services are purely conceptual, relying on forwarding rules for functionality.
***Cluster DNS
DNS in Kubernetes Clusters:
Kubernetes deploys a built-in DNS server (usually Core DNS) by default. This server is responsible for managing name resolution within the cluster, enabling seamless communication between components like pods and services.
Key Points:
All pods and services have unique IP addresses.
The cluster networking is correctly configured.
DNS primarily focuses on pods and services; nodes and their external DNS configuration are outside the scope of this lecture.
Example: DNS for Pods and Services
Scenario Setup
- A test pod with IP `10.244.1.5`.
- A web pod with IP `10.244.2.5`.
- A web-service created to expose the web pod, assigned IP `10.107.37.188`.
Service DNS Records
1. When the service is created, Kubernetes automatically creates a DNS record for it.
- The service name (`web-service`) is mapped to its IP address (`10.107.37.188`).
2. Pods in the same namespace (default) can access the service using its name, `web-service`.
Understanding Namespaces and DNS Resolution
Within the Same Namespace
- Services and pods can be accessed using just their names (e.g., `web-service`).
Across Namespaces
To access a service in a different namespace, use the format `<service-name>.<namespace>` (e.g., `web-service.apps`).
Subdomains and Fully Qualified Domain Names (FQDN)
Kubernetes organizes services and pods within subdomains based on namespaces.
- Services are grouped into an additional subdomain, `svc`.
- By default, the cluster's root domain is `cluster.local`.
- A service's FQDN is structured as: `<service-name>.<namespace>.svc.cluster.local`
- Example: `web-service.apps.svc.cluster.local`
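You can verify this from any pod in the cluster, for example (the pod name is illustrative and the image must provide nslookup):

```bash
# Resolve the service by its fully qualified name
kubectl exec -it test-pod -- nslookup web-service.apps.svc.cluster.local
```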
DNS for Pods
Pod Records
Pod DNS records are not created by default but can be enabled explicitly.
Kubernetes does not use pod names for DNS. Instead, it generates a name by replacing dots in the pod’s IP address with dashes.
Pod FQDN
- A pod’s DNS record includes:
The converted IP address as the hostname.
Namespace as a subdomain.
Root domain set to `cluster.local`.
Example:
- Pod IP: `10.244.1.5`
- DNS record: `10-244-1-5.default.pod.cluster.local`
This record resolves to the pod’s IP address.
How DNS Facilitates Communication
Services
Services provide stable names and IPs for accessing applications hosted on pods.
DNS ensures that any pod can access a service using its name, even if the service’s backend pods change.
Pods
- While direct pod-to-pod communication using DNS is possible, it’s less common. Services are typically used as intermediaries to abstract and manage backend pods.
***Core DNS in Kubernetes:
How Does Kubernetes Implement DNS?
In large clusters where thousands of pods are dynamically created and destroyed, Kubernetes centralizes DNS resolution through a dedicated DNS server.
Centralized DNS Server in Kubernetes
1. DNS Resolution Mechanism:
- Pods are configured to point to a centralized DNS server via their `/etc/resolv.conf` file.
- The DNS server's IP (typically `10.96.0.10`) is specified as the nameserver.
2. Automated DNS Management:
When a pod or service is created, Kubernetes updates the DNS server with the appropriate DNS records.
Each pod’s
/etc/resolv.conf
is automatically configured to use the DNS server.
3. DNS Record Management:
Services get DNS records automatically.
Pods, however, only receive records if explicitly enabled via the Core DNS configuration (disabled by default).
Core DNS: The Default Kubernetes DNS Server
Starting with Kubernetes 1.12, Core DNS replaced the legacy kube-dns as the default DNS server. Core DNS is deployed within the cluster as:
- A ReplicaSet with two pods for redundancy.
- Located in the `kube-system` namespace.
CoreDNS Configuration
1. Corefile:
- CoreDNS uses a configuration file named `Corefile`, located at `/etc/coredns`.
- This file specifies plugins to handle various functionalities, such as:
Errors: Logging and handling DNS errors.
Health: Reporting the DNS server’s health.
Cache: Caching DNS responses.
Kubernetes Plugin: Integrates CoreDNS with the Kubernetes cluster.
2. Kubernetes Plugin:
- Configures the cluster's top-level domain, e.g., `cluster.local`.
- Manages DNS records for services and optionally for pods.
- Converts pod IP addresses into dashed hostnames when the `pods` option is enabled.
3. Forwarding Requests:
- If a DNS query cannot be resolved within the cluster, CoreDNS forwards it to the nameserver specified in its `/etc/resolv.conf`.
4. ConfigMap Integration:
- The `Corefile` is managed via a ConfigMap, allowing dynamic updates to the CoreDNS configuration without redeploying the pods (see the example below).
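You can view the Corefile through its ConfigMap; a typical default looks roughly like this (the exact plugin list varies by Kubernetes version):

```bash
kubectl -n kube-system describe configmap coredns

# Corefile (representative excerpt)
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
}
```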
DNS Configuration for Pods
1. Service DNS Records:
- Kubernetes automatically creates a service for CoreDNS, typically named kube-dns.
- This service is assigned an IP (e.g., `10.96.0.10`), which pods use to resolve names.
2. Kubelet’s Role:
The kubelet is responsible for configuring each pod’s DNS settings when it is created.
The DNS server’s IP and the cluster domain are included in the kubelet’s configuration file.
Accessing Services and Pods
Service Name Resolution
Services can be accessed using various formats:
web-service
web-service.default
web-service.default.svc
web-service.default.svc.cluster.local (FQDN)
Search Domains
The `/etc/resolv.conf` file on pods includes search domains such as:
- default.svc.cluster.local
- svc.cluster.local
- cluster.local
These allow flexible resolution of service names without needing the full FQDN. For example, querying `web-service` will automatically append `default.svc.cluster.local` for resolution.
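Inspecting a pod's resolver configuration shows this directly (the pod name is illustrative; the nameserver IP matches the kube-dns service):

```bash
kubectl exec -it test-pod -- cat /etc/resolv.conf

# Representative output
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
```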
Pod Name Resolution
Pods do not get short-name DNS entries by default.
For pod name resolution, you must use the full FQDN (e.g., `10-244-1-5.default.pod.cluster.local`).
How DNS Works in Practice
1. When a pod makes a DNS query:
- The query is sent to the CoreDNS service via the configured nameserver.
- CoreDNS checks its database for a matching record.
2. If a pod queries an external domain (e.g., `www.google.com`):
- CoreDNS forwards the request to an external DNS server defined in its `/etc/resolv.conf`.
3. For internal service queries (e.g., `web-service`):
- CoreDNS resolves the name to the service's IP and returns it to the querying pod.
***Ingress
Understanding Ingress in Kubernetes
What is the difference between Services and Ingress, and when should you use each?
Scenario: Deploying an Application
Imagine you’re deploying an online store application for a company that sells products. The application needs to be accessible at myonlinestore.com. Let’s break down the steps:
1. Application Deployment:
- Package your application as a Docker image and deploy it as a Pod in a Kubernetes Deployment.
- The application uses a MySQL database, deployed as a Pod and exposed internally using a ClusterIP Service named `mysql-service`.
2. External Access:
To expose your application to the outside world, create a NodePort Service, which makes the application available on a high port (e.g., 30880) on all cluster nodes.
Users can now access your application using a URL like `http://<Node_IP>:30880`.
3. Scaling:
- Increase the number of Pod replicas to handle more traffic. The Service ensures traffic is distributed among these replicas.
Challenges with NodePort Services
1. DNS and Port Management:
Manually managing IPs or high ports (e.g., 30880) isn’t user-friendly.
Configure your DNS to point to the Node’s IP, but users still need to include the port in the URL.
2. Proxy for Better Access:
Introduce a proxy server to route traffic on port 80 (standard HTTP) to the NodePort.
Users can now access your application at `myonlinestore.com` without specifying a port.
3. Cloud Environments:
In a public cloud, instead of using NodePort, you can create a LoadBalancer Service.
Kubernetes requests the cloud provider (e.g., GCP) to provision a load balancer that routes traffic to your application.
Scaling Applications with Multiple Services
As the business grows, new services (e.g., video streaming at `myonlinestore.com/watch`) are introduced:
Deploy the new service as a separate Deployment and expose it using another LoadBalancer Service.
Each LoadBalancer incurs additional cost, and managing multiple external IPs becomes cumbersome.
To simplify this, you need a centralized mechanism to:
- Route traffic to different services based on the URL (e.g., `/watch` or `/shop`).
- Enable SSL for secure communication.
Ingress to the Rescue
Ingress provides a unified solution:
Acts as a Layer 7 load balancer that can route traffic based on URL paths or domain names.
Manages SSL termination in one place.
Is defined using Kubernetes primitives, making it easy to version-control and manage alongside other Kubernetes resources.
How Ingress Works
1. Ingress Controller:
Ingress is powered by an Ingress Controller, which monitors the cluster for Ingress Resources (rules) and configures itself accordingly.
Popular Ingress Controllers include NGINX, Traefik, and GCE (Google’s HTTP Load Balancer).
2. Ingress Resource:
- Specifies routing rules, such as forwarding traffic to specific services based on URLs or hostnames.
3. Configuration Example:
- Route `myonlinestore.com/shop` to the `shop-service` and `myonlinestore.com/watch` to the `video-service`.
- Enable SSL and define a default backend for unmatched traffic (e.g., a 404 error page).
Deploying an Ingress Controller
To set up an NGINX Ingress Controller:
Deploy the controller as a Deployment with the necessary image and configuration.
Expose it using a NodePort or LoadBalancer Service for external access.
Create a ConfigMap to manage NGINX settings.
Provide a ServiceAccount with the required permissions to allow the controller to monitor and manage Ingress Resources.
Configuring Ingress Resources
1. Simple Routing:
- Route all traffic to a single service.
2. Path-Based Routing:
- Define multiple paths within a single rule to route traffic to different services.
- Example: `/shop` → `shop-service`, `/watch` → `video-service`.
3. Hostname-Based Routing:
- Use multiple rules to route traffic based on domain names.
- Example: `shop.myonlinestore.com` → `shop-service`, `watch.myonlinestore.com` → `video-service`.
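A representative path-based Ingress resource for the example above (the resource name is illustrative, and the services are assumed to listen on port 80):

```bash
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-myonlinestore
spec:
  rules:
  - host: myonlinestore.com
    http:
      paths:
      - path: /shop
        pathType: Prefix
        backend:
          service:
            name: shop-service
            port:
              number: 80
      - path: /watch
        pathType: Prefix
        backend:
          service:
            name: video-service
            port:
              number: 80
EOF
```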
🎉🎉In conclusion, Kubernetes redefines networking for modern application architecture with its innovative approach to services, load balancing and traffic management, empowering us to build scalable, secure and user-friendly applications.