Longhorn has shipped a new update, and it is full of surprises. We will peel them back one by one to look at all the latest features, bug fixes, and much more. This is a much-awaited release, so, without further ado, let's start.
What is Longhorn?
Longhorn is a CNCF sandbox project that provides distributed block storage for Kubernetes. It is built from Kubernetes and container primitives, which is why we refer to it as cloud native storage. Longhorn is lightweight, reliable, and robust. We can install it on an existing Kubernetes cluster with a single kubectl apply command or with Helm charts. Once the installation is complete, Longhorn adds persistent volume support to the Kubernetes cluster.
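As a quick illustration, both installation paths look roughly like the commands below (the manifest URL and chart values follow the pattern in the official docs; check the release page for the exact tag you want):

```shell
# Install with a single kubectl apply (replace v1.2.0 with the desired release tag)
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.2.0/deploy/longhorn.yaml

# Or install with the Helm chart
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace

# Verify the components come up
kubectl -n longhorn-system get pods
```

Either way, Longhorn lands in the `longhorn-system` namespace and registers its CSI driver so that PVCs can be provisioned against it.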
Longhorn implements distributed block storage using containers and microservices. It creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on different nodes.
Though this update focuses mainly on bug fixes from the previous release, it also brings new features. Version 1.2.0 adds a default recurring backup policy, automatic rebalancing of replicas based on soft anti-affinity, and backup encryption. It also debuts support for CSI volume cloning and volume encryption, plus a change that speeds up replica rebuilding by computing checksums concurrently. The new update enhances the Longhorn data plane in low-performance environments (spinning disks, 1 Gbps networks, low CPU, etc.). We also get the ability to create backing images from a volume or an upload, along with bumped dependency versions for Kubernetes, the API version group, and the CSI components.
The new update also refactors Longhorn backups. Several bug fixes deserve a special mention. With this release, the Volume Attachment Recovery Policy feature no longer conflicts with the Pod Deletion Policy. It also fixes the failed rebuilding of a large 3 TB replica. Recurring jobs now run again when we reattach a volume after it has been detached for a long time. Another fixed bug caused volumes not to be correctly mounted or unmounted when the kubelet restarts.
The new release supports Kubernetes v1.22 and updates the CSI components. It raises the minimum supported Kubernetes version to v1.18 and supports Kubernetes v1.22 by migrating deprecated resources to active resource versions and bumping the CSI component versions. Besides downloading backing images from a remote source, as in v1.1.1, users can now upload backing images from local files and create them from existing volumes. The update also enhances the Longhorn data plane in low-performance environments to better handle different resource constraints (spinning disks, low network bandwidth, low CPU, network latency, etc.).
Longhorn now supports encrypted volumes by utilizing Linux kernel modules, and it uses the Kubernetes Secret mechanism for key storage. With an encrypted volume, our data is encrypted both in transit and at rest, and any backups taken from that volume are encrypted as well. Version 1.2.0 also supports CSI volume cloning, along with automatic replica rebalancing when node status changes (on/off), based on node/zone soft anti-affinity.
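CSI volume cloning uses the standard Kubernetes `dataSource` field on a PVC. A minimal sketch, where `source-pvc` and the sizes are placeholder names for illustration (the source PVC must already exist in the same namespace and storage class):

```yaml
# Clone an existing Longhorn-backed PVC via the CSI dataSource field
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc
spec:
  storageClassName: longhorn
  dataSource:
    name: source-pvc            # existing PVC to clone (hypothetical name)
    kind: PersistentVolumeClaim
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi              # must be at least the size of the source
```

Applying this manifest asks the Longhorn CSI driver to provision a new volume pre-populated with the source volume's data.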
Longhorn’s latest release supports asynchronous backup operations by introducing backup target, backup volume, and backup custom resources and controllers, which improves the performance issues that backup operations had in previous versions. It also introduces the concept of recurring jobs and groups via recurring job custom resources and controllers. Users can now create and reuse recurring jobs and organize them into groups to apply to volumes.
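A recurring job is now a first-class custom resource. The sketch below follows the shape of the v1.2 RecurringJob CRD (field values such as the schedule and retention count are illustrative, so verify them against the Longhorn docs for your release):

```yaml
# A recurring backup job applied to all volumes in the "default" group
apiVersion: longhorn.io/v1beta1
kind: RecurringJob
metadata:
  name: backup-daily
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"   # run every day at 02:00
  task: "backup"      # "backup" or "snapshot"
  groups:
  - default           # volumes in the default group pick this job up
  retain: 7           # keep the last 7 backups
  concurrency: 2      # how many volumes to process in parallel
  labels:
    interval: daily
```

Volumes opt in by group membership or by referencing the job directly, so one job definition can be reused across many volumes instead of being duplicated per volume.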
The new version also brings several enhancements. The default log level for CSI sidecars has been changed to reduce leader-election log spam. Successful automatic backups now retain only the last snapshot. The update improves backing images, adds support for read-only volumes (PVC/PV) and mounts, and adds labels to the cron jobs and backup pods created for recurring jobs.
The latest update enables pprof endpoints in the longhorn-manager for troubleshooting and adds WebSocket support for pushing BackupVolume and Backup changes. The user interface also sees a notable shift: it now shows the exact time instead of only a limited number of days, as before. Other features, such as XFS support for RWX volumes, UI support for encrypted volumes, and encryption of RWX volumes, will also help users.
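Volume encryption is driven by a StorageClass that points at a Kubernetes Secret holding the passphrase. A minimal sketch following the documented pattern (the secret name, namespace, and passphrase here are placeholders):

```yaml
# Secret holding the encryption passphrase (placeholder values)
apiVersion: v1
kind: Secret
metadata:
  name: longhorn-crypto
  namespace: longhorn-system
stringData:
  CRYPTO_KEY_VALUE: "replace-with-a-strong-passphrase"
  CRYPTO_KEY_PROVIDER: "secret"
---
# StorageClass that provisions encrypted Longhorn volumes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-encrypted
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  encrypted: "true"
  csi.storage.k8s.io/provisioner-secret-name: "longhorn-crypto"
  csi.storage.k8s.io/provisioner-secret-namespace: "longhorn-system"
  csi.storage.k8s.io/node-publish-secret-name: "longhorn-crypto"
  csi.storage.k8s.io/node-publish-secret-namespace: "longhorn-system"
```

Any PVC that requests the `longhorn-encrypted` class then gets a volume that is encrypted at rest, and backups taken from it stay encrypted.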
The developers have also fixed a bug so that an application can now mount a volume after the volume has been expanded.
Longhorn's performance also improves in version 1.2.0 thanks to several fixes. The daemon set no longer takes a long time to come up with an RWX PVC. Another fix removes the seemingly useless transfer of full data over WebSocket every 30 seconds. The list order no longer causes a complete data transfer when no resource has been updated, and a related fix stops the list order from triggering a complete backing image data transfer when nothing has changed. The devs have also reduced Node status update churn by truncating the disk StorageAvailable value.
Version 1.2.0 comes with many bug fixes that will help users run Longhorn smoothly. First, an engine integration test failure has been fixed: restore_with_frontend can now restore a backup from a replica in RW mode without error. Scheduled backups now complete and no longer leave tons of snapshots behind, and backups no longer fail to complete due to a timeout that caused duplicate backups. The devs have also fixed the error handling of the /engineUpgrade API. The priority class is now set for recurring jobs. Another fix ensures that DR-restored data is no longer faulty on sles15-sp2. With this update, the bulk-delete button for backing images works even if a few images do not satisfy the deletion condition.
Again, a backing image can now be deleted even if its key is longer than 63 characters. The devs have fixed another bug so that replicas can reattach after a network disconnection. The name column in the event table on the volume detail page has been widened. The detach button on the volume detail page now works for a volume attached to a pod, and multiple UI selections are released after an action completes. Longhorn can also provide its own TLS certificate for ingress through the Helm chart, and, unlike version 1.1.0, it no longer fails while creating a volume on a retry. Missing deepcopy cases for EngineStatus.Snapshots.Snapshot have been added, the pod order for v.status.KubernetesStatus updates has been normalized, and AnnotateAWSIAMRoleArn no longer modifies pod instances returned directly by the lister.
With the update to v1.2.0, volumes no longer get stuck in detaching during a rebuild or snapshot coalescing (EXT4). Multiple UI selections on the backup page are now released after performing an action. kubectl drain no longer gets stuck forever. The usage of read-only listers has been improved so that they are only read from. There is no longer an NPE when listing the backups of a backup volume, and the 'no data' dialog no longer appears during a bulk update of recurring snapshots and backups. Version 1.2.0 also fixes the timeoutSeconds issue with liveness and readiness probes on Kubernetes v1.20 and later. Backup and restore no longer fail with an AWS IAM role due to NoCredentialProviders. Longhorn managers no longer occasionally show high CPU utilization from exhausting all available sockets through a leaking log-stream socket. The longhorn-engine image no longer contains the longhorn-instance-manager, longhorn-csi-plugin pods no longer restart because of the Longhorn client's 10-second timeout, and CPU utilization for Longhorn and replica instance managers is reduced, since an overwhelming number of backup statuses no longer accumulates in the engine.
The latest update fixes instance-manager gRPC connection leaks (VersionGet, ProcessLog, ProcessWatch). The devs have also fixed a datastore bug in which the raw cached object returned from GetEngine was modified. Deleting a node no longer leaves a volume stuck in the attaching state, and replica rebuilding no longer fails. The volume detail page now shows the actual size, and the forwarder/proxy now works for HTTPS requests.
The devs have fixed the bug where Longhorn v1.2.0-preview1 was incompatible with the old engine image v1.1.2 on backup creation. Longhorn CSI now also validates existing backing images against the fields in the StorageClass during volume creation.
A backing image can now be created via a YAML file. A DR volume can now be mounted after activation when the original volume is encrypted, and mounting a restored backup no longer fails when the original volume is encrypted and the PV filesystem is XFS.
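Declaring a backing image in YAML follows the shape of the BackingImage custom resource; a minimal sketch, where the name and URL are placeholders for illustration:

```yaml
# Create a backing image by downloading a raw/qcow2 image from a URL
apiVersion: longhorn.io/v1beta1
kind: BackingImage
metadata:
  name: my-backing-image        # hypothetical name
  namespace: longhorn-system
spec:
  sourceType: download          # other sources include upload and export from a volume
  sourceParameters:
    url: https://example.com/images/base.img   # placeholder URL
```

Once the resource is applied, Longhorn fetches the image and volumes can reference it as their base disk.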
Also, uninstall jobs no longer fail when there are many backups, and invalid backups are no longer listed on the backup page. Switching the backup target now immediately stops showing backups from the old target, and information about in-progress backups now appears in the table. Instance managers are no longer terminated and recreated while an uninstallation is in progress.
We can also see that a DR volume now continues to restore after a node reboot, and powering off a node during a backup volume restore no longer halts the restoration. A backing image bug has also been fixed where Upload From Local File with a large image file returned a 'Payload Too Large' error. Jobs no longer get stuck during an uninstallation. Lastly, backup and restore no longer fail because the size field of the BackupVolume is empty.
In the end, we can get a glimpse of some miscellaneous changes the devs have made. With the arrival of version 1.2.0, the controller has been refactored to move the LastBackup/LastBackupAt update into the volume controller. The doc search tool now works fully, and an issue affecting a fresh 1.1.0 installation on Kubernetes and UI editing has been fixed. There are also some valuable new docs: a Reference Architecture and Sizing Guidelines, and a document on how to deal with filesystem corruption.
We have gone through all of the updates that this new version brings, and I know you want to try the new Longhorn out. But you have to keep a few things in mind during the upgrade. Make sure the Kubernetes cluster is at least v1.18 before upgrading to Longhorn v1.2.0, because the supported Kubernetes version has been raised (>= v1.18) in v1.2.0. Also, after the upgrade, the recurring job settings of volumes will be migrated to the new recurring job resources, and the RecurringJobs field in the volume spec will be deprecated. You can easily find the installation guides here, and you can download Longhorn by clicking here.
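Before upgrading, a quick pre-flight check along these lines can save trouble (the manifest URL mirrors the install pattern; substitute the Helm upgrade command if you installed via Helm):

```shell
# Confirm the cluster meets the minimum supported version (>= v1.18)
kubectl version --short

# Upgrade via kubectl by applying the new release manifest
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.2.0/deploy/longhorn.yaml

# Watch the longhorn-system pods roll over to the new version
kubectl -n longhorn-system get pods -w
```

After the managers come up, the recurring job settings on existing volumes are migrated to the new recurring job resources automatically.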