My name is Philipp C. Heckel and I write about nerdy things.

Posts Categorized / Distributed Systems


  • Oct 19 / 2021
  • Comments Off
Distributed Systems

Lossless MySQL semi-sync replication and automated failover

MySQL is a really mature technology. It’s been around for a quarter of a century and it’s one of the most popular database management systems in the world. As an engineer, one therefore expects basic features such as replication and failover to be fleshed out, stable and ideally even easy to set up.

And while MySQL comes with replication functionality out of the box, automated failover and topology management are not part of its feature set. On top of that, it turns out to be surprisingly difficult not to shoot yourself in the foot when configuring replication.

In fact, without careful configuration and the right tools, a failover from a source to a replica server will almost certainly lose transactions that have already been acknowledged to the application as committed.

This is a blog post about setting up lossless MySQL replication with automated failover, i.e. ensuring that not a single transaction is lost during a failover, and that failovers happen entirely without human intervention.
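
To give a flavor of what “careful configuration” means: the heart of a lossless setup is semi-synchronous replication with the AFTER_SYNC wait point, so that the source acknowledges a commit to the client only after a replica has persisted it. Here is a minimal sketch in Python; the host, credentials and the MySQL 8.0.26+ “source”/“replica” variable names are assumptions for illustration, not the exact setup from the post:

    import mysql.connector  # pip install mysql-connector-python

    # Hypothetical source server; adjust host and credentials.
    conn = mysql.connector.connect(host="db1.example.com",
                                   user="admin", password="secret")
    cur = conn.cursor()

    # AFTER_SYNC is what makes the setup "lossless": the commit is only
    # acknowledged to the client after at least one replica has written
    # the transaction to its relay log.
    for stmt in [
        "INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so'",
        "SET GLOBAL rpl_semi_sync_source_enabled = ON",
        "SET GLOBAL rpl_semi_sync_source_wait_point = AFTER_SYNC",
        # A very large timeout avoids a silent fallback to async replication:
        "SET GLOBAL rpl_semi_sync_source_timeout = 2147483647",
    ]:
        cur.execute(stmt)

    conn.close()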

Continue Reading

  • Nov 19 / 2019
  • Comments Off
Cloud Computing, Distributed Systems, Programming

Providing remote access to devices via SSH tunnels

At my work, the backup appliances are typically physically located inside the LAN of our end users, much like other appliances such as routers, NAS devices or switches. Under normal circumstances, that means they are behind a NAT and not reachable from the public Internet without a VPN or another tunneling mechanism. For my employer’s customers, Managed Service Providers (MSPs), only being able to reach these devices through direct physical access would be a major inconvenience.

Fortunately, we’ve always provided a remote management feature called “Remote Web” for our customers: Remote Web lets them remotely access a device’s web interface as well as other services (mainly RDP, VNC and SSH), even when the device is behind a NAT.

Internally we call this feature RLY (pronounced: “relay”, like the owl, get it?). In this post, I’d like to talk about how we implemented the feature, what challenges we faced and what lessons we learned.
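
The basic mechanism behind such a relay is easy to sketch, even though the real RLY service is far more involved: the device behind the NAT dials out to a public relay host and opens a reverse SSH tunnel, and the relay forwards a port back to the device. A toy version in Python (host names and ports are made up for illustration; this is not our actual implementation):

    import subprocess

    RELAY_HOST = "relay.example.com"  # hypothetical public relay host
    RELAY_PORT = 9443                 # port opened on the relay
    LOCAL_PORT = 443                  # the device's local web interface

    # -N: no remote command, just forwarding; -R: reverse port forwarding.
    # Anyone who can reach relay.example.com:9443 now reaches the device,
    # even though the device itself sits behind a NAT. (The relay's sshd
    # needs GatewayPorts enabled for the port to be reachable externally.)
    subprocess.run([
        "ssh", "-N",
        "-R", f"{RELAY_PORT}:localhost:{LOCAL_PORT}",
        f"tunnel@{RELAY_HOST}",
    ])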

Continue Reading

  • Feb 14 / 2014
  • 7 comments
Cloud Computing, Programming, Synchronization

Deep into the code of Syncany – command line client, application flow and data model (part 2)

I recently published a blog post about my open source file sync project Syncany. I explained the main idea of the project and also went into some of the details about where the development is headed. The post was the first of a series I am planning to write, showing what the project is about from different angles.

While the first post had a few technical elements, it mostly discussed the project’s process and its high-level goals and ideas. In this second article, I’d like to move beyond the high-level concepts and dive a lot deeper into the different packages and modules of the software. Why, you ask? Because I think it might be interesting to others, and because I believe that supporters and other developers will benefit from it.

Continue Reading

  • Oct 18 / 2013
  • 30 comments
Cloud Computing, Programming, Security, Synchronization

Syncany explained: idea, progress, development and future (part 1)

Many many moons ago, I started Syncany, a small open source file synchronization project that allows users to back up and share certain folders of their workstations using any kind of storage, e.g. FTP, Amazon S3 or Google Storage.

At the time of the initial announcement of the project (May 2011), there was big hype around it. I received many e-mails and lots of support from people around the world. People were excited because the features Syncany offers are great: file synchronization à la Dropbox, paired with storage flexibility (use-your-own), client-side encryption (sorry about that, NSA!), and intelligent versioning.

Back then, I hadn’t actually released a runnable version of Syncany. The sole purpose of the announcement (on WebUpd8 and on the Ubuntu Podcast) was to get developers excited about the project and to enlist help for the last steps of creating a somewhat stable release. Unfortunately, I was further away from this “stable release” than I could have imagined.

In this blog post, I’d like to recap the idea behind Syncany, what went wrong with the development, and how I brought the project back on track (or so I believe). I’ll also talk about what I plan to do with Syncany and how people can help (if they still want to).

Continue Reading

  • May 20 / 2013
  • 3 comments
Cloud Computing, Distributed Systems, Security, Synchronization

Minimizing remote storage usage and synchronization time using deduplication and multichunking: Syncany as an example

This post introduces my Master’s thesis, “Minimizing remote storage usage and synchronization time using deduplication and multichunking: Syncany as an example”. I submitted the thesis in January 2012 and have now found a little time to post it here.

The key goal of this thesis was to determine the suitability of deduplication for end-user applications — particularly for my synchronization application Syncany. As part of this work, the thesis introduces Syncany, a file synchronizer designed with security and provider independence as a core part of its architecture.
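
The core idea of deduplication is simple to sketch: split files into chunks, identify each chunk by its hash, and store every distinct chunk only once. The toy version below uses fixed-size chunks; Syncany’s actual chunkers (content-defined, combined into multichunks) are considerably more involved:

    import hashlib

    CHUNK_SIZE = 512 * 1024  # 512 KiB; an arbitrary choice for this sketch

    def chunk_file(path):
        """Yield (sha1-digest, bytes) pairs for fixed-size chunks of a file."""
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                yield hashlib.sha1(chunk).hexdigest(), chunk

    def deduplicate(paths):
        """Store each distinct chunk once; a file becomes a list of chunk IDs."""
        store, index = {}, {}
        for path in paths:
            chunk_ids = []
            for digest, data in chunk_file(path):
                store.setdefault(digest, data)  # duplicate chunks are skipped
                chunk_ids.append(digest)
            index[path] = chunk_ids
        return store, index

Identical chunks across files, or across versions of the same file, are stored and uploaded only once; that is where both the storage and the synchronization-time savings come from.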

Continue Reading

  • Jun 15 / 2012
  • 9 comments
Cloud Computing, Linux, Scripting

Script: Your US proxy server in one minute using Amazon EC2

Many well-known websites determine your location based on your IP address and restrict their content or functionality based on the country you’re in. Some examples are Gmail (Germans only get @googlemail.com addresses, for legal reasons), YouTube (content is restricted by the GEMA), and Pandora (limited to US users), to name only a few. To circumvent these restrictions, being able to quickly get an IP address outside of your own country is most helpful.

To do exactly that, I wrote a little script that will start your very own US proxy server in one minute using Amazon EC2. In combination with browser plug-ins such as FoxyProxy, the script enables you to route all your web traffic through a proxy on an Amazon-owned machine, with an IP address in the US, Ireland, Singapore, Tokyo or São Paulo (locations of Amazon data centers).
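
The original is a shell script built on Amazon’s EC2 command-line tools, but the gist of it looks roughly like this in Python with boto3 (the AMI ID, key pair and user name are placeholders to substitute):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # a US region

    # Launch a small instance; any Linux AMI with sshd will do.
    resp = ec2.run_instances(
        ImageId="ami-xxxxxxxx",   # placeholder AMI ID
        InstanceType="t2.micro",
        KeyName="my-keypair",     # placeholder key pair, needed for SSH
        MinCount=1, MaxCount=1,
    )
    instance_id = resp["Instances"][0]["InstanceId"]

    # Wait until the instance is up, then fetch its public address.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    desc = ec2.describe_instances(InstanceIds=[instance_id])
    host = desc["Reservations"][0]["Instances"][0]["PublicDnsName"]

    # -D opens a local SOCKS proxy; point FoxyProxy at localhost:8080
    # and all browser traffic goes out through the US instance.
    print(f"ssh -N -D 8080 ec2-user@{host}")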

Continue Reading

  • May 08 / 2010
  • Comments Off
Distributed Systems, Virtualization

Hybrid Clouds: A Comparison of Cloud Toolkits

In the last few years, the importance of the Internet has risen constantly, making it indispensable for businesses and most individuals to be online around the clock. One of the greatest drivers of this development was, and still is, the shift from the traditional one-to-many Web to an advanced, participatory version of the World Wide Web. Rather than only making editorial information accessible to many users, the Web 2.0 encourages participation and enables user-generated contributions. Leveraging this new paradigm, services like Flickr, Facebook, or Twitter have become very prominent examples of this development.

An essential part of this evolution, but mostly hidden from the end consumer, is the set of tools that enables these large-scale applications. Cloud computing is a relatively new technology that serves as the underlying architecture for most of these platforms. By providing virtualized computing resources as a service in a pay-as-you-go manner, cloud computing enables new business models and cost-effective resource usage. Instead of having to maintain their own data center, companies can concentrate on their core business and purchase resources when needed. Especially when combining a privately maintained virtual infrastructure with publicly accessible clouds in a hybrid cloud, the technology can open up new opportunities for businesses and help consolidate resources.

However, since cloud computing is a very new term, there are as many definitions of its components as there are opinions about its usefulness. Most of the corresponding technologies are only a few years old, and the toolkits lack maturity and interoperability.

This article introduces the basic concepts of cloud computing and discusses the technical requirements for setting up a hybrid cloud. It briefly looks into security concerns and outlines the status quo of current cloud technologies. In particular, it evaluates several existing cloud toolkits with regard to these requirements, the problems that arise, and their interoperability.

Continue Reading

  • Mar 16 / 2009
  • 3 comments
Distributed Systems, Programming

KadS: a secure version of the Kademlia protocol

There are various peer-to-peer protocols out there. All of them focus on the decentralisation of storage and other system resources. Most implement a distributed hash table (DHT) to store information. That is, each node of the network only holds a small part of the hash table, but is able to locate and retrieve any requested entry. Kademlia, a protocol designed by two NYU students in 2002, is one of them.
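
At the heart of Kademlia is its XOR metric: the distance between two 160-bit IDs is simply their bitwise XOR interpreted as an integer, and each node organizes its routing table into k-buckets by the position of the highest differing bit. A minimal sketch of that metric (the security additions of KadS are not shown):

    import hashlib

    def node_id(name: bytes) -> int:
        """Derive a 160-bit ID the way Kademlia does, via SHA-1."""
        return int.from_bytes(hashlib.sha1(name).digest(), "big")

    def distance(a: int, b: int) -> int:
        """XOR distance: symmetric, zero iff a == b, and it satisfies the
        triangle inequality -- which is what makes lookups converge."""
        return a ^ b

    def bucket_index(own_id: int, other_id: int) -> int:
        """k-bucket index of a contact: the position of the highest
        differing bit (0..159 for distinct 160-bit IDs)."""
        return distance(own_id, other_id).bit_length() - 1

    me, peer = node_id(b"alice"), node_id(b"bob")
    print(bucket_index(me, peer))  # which of my 160 buckets "bob" falls into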

Continue Reading

  • Nov 01 / 2008
  • Comments Off
Distributed Systems, Virtualization

Server Virtualization with VMware Infrastructure (vSphere)

In the last few years, the Internet has become increasingly important in various areas of our lives. Not only have private households discovered the nearly endless possibilities of the Web, but companies have also found many ways of generating revenue through the online world. Most of the global players and many medium-sized IT companies have realized what opportunities the Web and its technologies provide, and have used them to build new services for consumers and businesses. In order to compete in the evolving market, companies from traditional business areas such as newspapers or TV broadcasters have had to diversify their product lines, and are forced to react quickly, flexibly and cost-efficiently to everyday changes in demands and technologies. In fact, every company has to adopt these technologies efficiently to stand a chance in the growing market.

Along with its benefits, cost savings and new customers, every new technology also comes with more or less well-known downsides. Even if IT managers are qualified to consider most of the details of how to use and implement them, new software, hardware or resources will, no matter what, always cause unpredicted problems. Because today’s companies depend on IT, every downtime, bug or system overload of a production system directly results in declining profits and higher costs. Especially for service providers, any downtime is business-critical for many dependent companies and has to be prevented.

Therefore, companies spend a considerable amount of money and time on creating a stable, flexible and extensible IT environment that supports their business by minimizing risks, increasing availability and allowing them to provide better service levels to customers.

Virtualization is a key technology for achieving these goals. It allows multiple virtual computers to run on the same physical system. By creating an abstraction of the underlying hardware, it makes it possible to execute a variety of virtual machines (VMs) on top of virtualized hardware.

This article will discuss how virtualization works, what advantages it offers and why it is an essential part of today’s data centers. The focus will be on the server virtualization solution VMware Infrastructure, the flagship product suite of VMware Inc.

Continue Reading