Tuesday, September 16, 2014

MySQL Central: It's that time of the year

It's that time of the year again: yes, Oracle Open World is coming up and with that I'll be travelling to San Francisco. New for this year is that we are part of the main Open World event and therefore have our own MySQL Central. Here you will have the opportunity of meeting many of the engineers behind MySQL, discuss technical problems you have, and also learn some about how we look at the future of the MySQL ecosystem.

This year, me and Narayanan Venkateswaran will be presenting two sessions:

Elastic Scalability in MySQL Fabric with OpenStack (Thursday, Oct 2, 1:15 PM-2:00 PM in Moscone South, 252)

In this session you will see how Fabric can use the new provisioning support to fetch servers from an OpenStack instance. The presentation will cover how to use the provisioning support to fetch servers from OpenStack Nova, OpenStack Trove, and also from Amazon AWS. You will also learn about the provisioning interface and how you can use it to create your own hardware registry support.

MySQL Fabric: High Availability at Different Levels (Wednesday, Oct 1, 2:00 PM-2:45 PM in Moscone South - 250)

MySQL Fabric is a distributed system that requires coordination among different components to provide high availability: connectors, servers, and MySQL Fabric nodes must be orchestrated to create a solution resilient in the face of failures. Ensuring that each component alone is fault-tolerant does not guarantee that applications will continue working in the event of a failure. In this session, you will learn how all components in MySQL Fabric were designed to provide a high-availability solution and how they cooperate to achieve this goal. The presentation shows you how to create your own infrastructure to monitor MySQL servers and manage manual switchover or automatic failover operations together with MySQL Fabric.

As every year, it's going to be fun to meet all the people in the MySQL community, both new and old, so I'm looking forward to meeting you all there.

Tuesday, May 27, 2014

MySQL Fabric: Musings on Release 1.4.3

As you might have noticed in the press release, we just released MySQL Utilities 1.4.3, containing MySQL Fabric, as a General Availability (GA) release. This concludes the first chapter of the MySQL Fabric story.

It all started with the idea that it should be as easy to manage and setup a distributed deployments with MySQL servers as it is to manage the MySQL servers themselves. We also noted that some of the features that were most interesting were sharding and high-availability. Since we also recognized that every user had different needs and needed to customize the solution, we set of to create a framework that would support sharding and high-availability, but also other solutions.

With the release of 1.4.3, we have a range of features that are now available to the community, and all under an open source license and wrapped in an easy-to-use package:

  • High-availability support using built-in slave promotion in a master-slave configuration.
  • A framework with an execution machinery, monitoring, and interfaces to support management of large server farms.
  • Sharding using hash and range sharding. Range sharding is currently limited to integers, but hash sharding support anything that looks like a string.
  • Shard management support to move and split shards.
  • Support for failure detectors, both built-in and custom ones.
  • Connectors with built-in load balancing and fail-over in the event of a master failure.

Beyond MySQL Fabric 1.4.3

As the MySQL Fabric story develop, we have a number of challenges ahead.

Loss-less Fail-over. MySQL 5.7 have extended the support for semi-sync so that transactions that are not replicated to a slave server will not be committed. With this support, we can truly have a loss-less fail-over so that you cannot lose a transaction if a single server fails.

More Fabric-aware connectors. We currently have support for Connector/J, Connector/PHP, and Connector/Python, but one common request is to have support for a Fabric-aware C API. This is both for applications developed using C/C++, but also to add Fabric support to connectors based on the MySQL C API, such as the Perl and Ruby connector.

Multi-Node Fabric Instance. Many have pointed out that the Fabric node is a single point of failure, and it is instead a single node, but if the Fabric node goes down, the system do not stop working. Since the connectors cache the data, they can "run on the cache" for the time it takes for the Fabric node to be brought up again. Procedures being executed will stop, but once the Fabric node is on-line again, execution will resume from where it left off. To ensure that the meta-data (the information about the servers in the farm) is not lost in the event of a machine failure, MySQL Cluster can be used as storage engine, and will then ensure that your meta-data is safe.

There are, however, a few advantages in having support for multiple Fabric nodes:

  • The most obvious advantage is that execution can fail-over to another node and there will be no interruption in the execution of procedures. If the fail-over is built-in, you avoid the need for external clusterware to manage several Fabric nodes.
  • If you have several Fabric nodes available to deliver data, you improve responsiveness to bursts in meta-data requests. This can happen if you have a large bunch of connectors brought on-line at the same time.
  • If you have multiple data centers, having a local version of the data to serve the applications deployed in the same center improve locality of data and avoid an unnecessary round-trip over WAN to fetch some meta-data.
  • With several nodes to execute management procedures, you can improve scaling by being able to execute several management procedures in parallel. This would require some solution to avoid that that procedures do no step over each other.
Location Awareness. In deployments spread over several data-centers, the location of all the components suddenly become important. There is no reason for a connector to be directed to a remote server when a local one suffices, but that require some sort of location awareness in the model allowing the location of servers (or other components) to be given.

Extending the model by adding data centers is not enough though. The location of components withing a data center might be important. For example, if a connector is located in a particular rack in the data center, going to a different rack to fetch data might be undesirable. For this reason, the location awareness need to be hierarchical and support several levels, e.g., continent, city, data center, hall, rack, etc.

Multi-Shard Queries. Sharding can improve performance significantly since it split the data horizontally across several machines and each query therefore go directly to the right shard of the data. In some cases, however, you also need to send queries to multiple shards. There are a few reasons for this:

  • You do not have the shard key available, so you want to search all servers for some object of interest. This of course affect performance, but in some cases there are few alternatives. Consider, for example, searching for a person given name and address when the database is sharded on the SSN.
  • You want to generate a report of several items in the database, for example, find all customers above 50 that have more than 2 cars.
  • You want a summary of some statistic over the database, for example, generate a histogram over the age of all your customers.
Session Consistency Guarantees. As Alfranio point out, when you use multiple servers in your farm, and transactions are sent to different servers at different times, it might well be that you write one transaction that goes to the master of a group and then try to read something from the same group. If the write transactions have not reached the server that you read from, then you might get an incorrect result from your transaction. In some cases, this is fine, but in other cases, you have certain guarantees that you want to have on your session. For example, you want to ensure that anything you write will also be available when you read in transactions following the write, you might want to guarantee that multiple reads read later data all the time (called "read monotonicity"), or other forms of guarantees on the result sets you get back from the distributed database. This might require connectors to wait for transactions to reach slaves before reading, but this should be transparent to the application.

This is just a small set of the possibilities for the future, so it is really going to be interesting to see how the MySQL Fabric story develops.

Tuesday, April 29, 2014

MySQL Fabric: Tales and Tails from Percona Live

Going to Percona Live and presenting MySQL Fabric gave me the opportunity to meet a lot of people and get a lot of good feedback. I talked to developers from many different companies and got a lot of great feedback that will affect the priorities we make, so to all I spoke to I would like to say a great "Thank you!" for the interesting discussions that we had. Your feedback is very valuable. It was very interesting to read the comments on MySQL Fabric on MySQL Performance Blog. The article discuss the current version of MySQL Fabric distributed with MySQL Utilities and give some brief points on features of MySQL Fabric. I think it could be good to give some context to some of the points they raise, both to elaborate on the points and show what they mean in reality, and also to give some background to how we were thinking around these points.

The Art of Framing the Fabric

It was a deliberate decision to make MySQL Fabric extensible, so it is not surprising that it have the feel of a framework. By making MySQL Fabric extensible, we allow community and users to explore ideas or add user-specific support.

In the MySQL Team at Oracle we are strong believers in the open source model and are working hard to keep it that way. There are many reasons to why we believe in this model, but one of the reasons is that we do not believe that one size fit all. For any users, there are always minor variations or tweaks that are required by the users own specific needs. This means that the ability to tweak and adapt the solution to their specific needs is very important. Without MySQL being open-source, this would not be possible. As you can see from WebScaleSQL, this is not just a theoretical exercise, this is how companies really use MySQL.

From the start, we therefore focused on building a framework and created the sharding and high-availability as plugins; granted, they are very important plugins, but they are nevertheless plugins. This took a little more effort, and a little more thinking, but by doing it this way we can ensure that the system is truly extensible for everybody.

Hey! I've got a server in my farm!

As noted, many if the issues related to high-availability and sharding require server-side support to get it really solid. This is also something we recognized quite early; the alternative would be to place the logic in the connectors or the Fabric node. We recognized that the right place to solve this is in the server, not in connector layer since that put a lot of complexity at the wrong place. Even if it was possible to handle everything in the connector, there is still a chance that something goes wrong if the constraints are not enforced in the server. This could be because of bugs, because of mistakes in the administration of the server, or any other number of reasons, so to build a solid solution, constraints on the data should be enforced by the servers and not in the connectors or in a proxy.

An example given is that there is no way to check that a row ends up in the right shard, which is very true. A generic solution to this would be to add CHECK constraint on the server, but unfortunately, this is a very big change in the server code-base. Adding triggers to the tables on the server is probably a good short-term solution, but that require managing and deploying extra code on all servers, which in turn is an additional burden on managing the servers, which is something we would like to avoid (the more "special" things you have to do with the servers, the higher the risk is of something going wrong).

On the proximity of things...

One of the central components of MySQL Fabric are the high-availability groups (or just groups, when it is clear from the context) that were discussed in an earlier post. The central idea around a group is that each group manages the same piece of data and MySQL Fabric is designed to handle and coordinate multiple groups into a federation of databases. The feature of being able to manage multiple groups is something that is critical to create a sharded system. On thing that is quite often raised is that it should be possible for a server to belong to multiple groups, but I think this comes from a misunderstanding on what a group represents. It is not a "replica set", which gives information about the topology, that is, how replication is set up, nor does it say anything about how the group is deployed. It is perfectly OK to have members of the group in different data centers (for geographical redundancy), and it is perfectly OK to have replication between groups to support, for example, functional partitioning. If a server belonged to two different groups, it would mean that it manages two different sets of data at the same time.

The fact that group members can be located in different data centers raises another important aspect, something that was often mentioned at Percona Live, that of managing the proximity of components in the system. There is some support for this in Hadoop where you have rack-awareness, but we need a slightly more flexible model. Imagine that you have a group set up with two servers in different data centers and you further have scale-out slaves attached locally. You have connectors deployed in both data centers, but when reading data you do not want to go to the other data center to execute the transaction, it should always be done locally. So, is it sufficient to be able to just have a simple grouping of the components? No, because you can have multiple levels of proximity, for example, data centers, continents, and even rooms or racks within a data center. You can also have different facets that you want to model, such as latency, throughput, or other properties that are interesting for particular uses. For that reason, whatever proximity model we deploy, it need to support a hierarchy and also have a more flexible cost model where you can model different aspects. Given that this problem have been raised several times on Percona Live and also by others, it is likely to be something we need to prioritize.

The crux of the problem

As most of you have already noted, there is a single Fabric node running that everybody talk to. Isn't this a single point of failure? It is indeed, but there is more to the story than just this. A single point of failure is a problem because if it goes down, so does the system... but in this case, it doesn't really go down, it will keep running most of the time.

The Fabric node does a lot of things: it keeps track of the status of all the components of the farm, execute procedures to handle fail-over, and deliver information about the farm on request. However, the connectors are the ones that route the transactions to the correct place, and to avoid having to ask the Fabric node about information each time, the connectors maintain caches. This means that in the event of a Fabric node failure, connectors might not even notice that it is gone unless they had to re-fill their caches. This means that if you restart the Fabric node, it will be able to serve the information again.

Another thing that stops when the Fabric node goes down is that no more fail-overs can be done and ongoing procedures are stopped in their tracks, which could potentially leave the farm in an unknown state. However, the state of the execution of any ongoing procedures are stored in the backing store, so when you bring up the Fabric node again, it will restore the procedures from the backing store and continue executing. This feature alone do not help against a complete loss of the machine where the Fabric node and the backing store are put, but, MySQL Fabric is not relying on specific storage engine features, any transactional engine will do, so by using MySQL Cluster as the storage engine it is possible to ensure safe-keeping of the state.

There are still good reasons to support multi-node Fabric instances:

  • If one Fabric node goes down, it should automatically fail over to another and continue execution. This will prevent any downtime in handling executions.
  • Detecting and bringing up a secondary Fabric node can become very complicated in the case of network partitions since it require handling split-brain scenarios reliably. It is then better to have this built into MySQL Fabric since it makes deployment and management significantly simpler.
  • Management of a farm does not put any significant pressure on the database back-end, but having a single Fabric node can be a bottleneck. In this case, it would be good to be able to execute multiple independent procedures on different Fabric nodes and coordinate the updates.
  • If a lot of connectors are required to fill their caches at the same time, we have a risk of a thundering herd. Having a set of Fabric nodes for read scale-out can then be beneficial.
  • If a group is deployed in two very remote data centers, it is desirable to have a local Fabric node for read-only purposes instead of having to go to the other data center.

More Fabric-aware Connectors

Currently we support connectors for Python, Java, and PHP, but one point that pop up quite often (both at Percona Live and elsewhere) is the lack of a Fabric-aware C connector. It is the basis for implementing both the Perl Database Interface MySQL driver DBD::mysql and for the Ruby connector, but is also desirable in itself for applications using C or C++ connector. All I can say at this point is that we are aware of the situation and know that it is something desired and important.

Interesting links

Tuesday, April 01, 2014

MySQL Fabric 1.4.2 Released

As you saw in the press release, MySQL Fabric 1.4.2 is now released! If you're interested in learning more about MySQL Fabric, there is a session titled Sharding and Scale-out using MySQL Fabric in Ballroom G. MySQL Fabric is a relatively new project in the MySQL ecosystem and it focuses on building a framework for working with large deployments of MySQL Servers. The architecture of MySQL Fabric is such that it allows extensions to be added and the first two extensions that we added were support for high-availability using High-Availability groups (HA groups) and sharding to manage very large databases. The first version of sharding have hash and range sharding implemented as well as procedures for moving and splitting shards.
A critical part of working with a collection of servers is the ability to route transactions to the correct servers, and for efficiency reasons we quite early decided to put this routing logic into the connectors. This avoid one extra network hop and hence improve performance by reducing latency, but it does require that the connectors containing routing logic, caches, and support for fetching data from MySQL Fabric. Putting the routing logic into the connector also make it easy to extend the API to add new support that applications can require.
MySQL Fabric 1.4.2 is distributed as part of MySQL Utilities 1.4.2. To avoid confusion, we have changed the version numbering to match the version of MySQL Utilities it is distributed in.
We have just done a few public releases, even though we did a few internal releases as well, but a brief history of our releases this far is:
  • MySQL Fabric 1.4.0
    • First public release
    • High-Availability groups for modeling farms
    • Event-driven Executor for execution of management procedures.
    • Simple failure detector with fail-over procedures.
    • Hash and Range sharding allowing management of large databases.
    • Shard move and shard split to support management of a sharded database.
    • Connector interfaces to support federated database systems.
    • Fabric-aware Connector/Python (labs)
    • Fabric-aware Connector/J (labs)
    • Fabric-aware Connector/PHP (labs)
  • MySQL Fabric 1.4.1
    • More solid scale-out support in connectors and MySQL Fabric
    • Improvements to the Executor to avoid stalling reads
    • Connector/Python 1.2.0 containing:
      • Range and Hash sharding
      • Load-balancing support
    • Labs release of Connector/J with Fabric-support
  • MySQL Fabric 1.4.2
    • Credentials in MySQL Fabric
    • External failure reporting interfaces supporting external failure detectors
    • Support for unreliable failure detectors in MySQL Fabric
    • Credentials support in Connector/Python
    • Connector/Python 1.2.1 containing:
      • Failure reporting
      • Credentials Support
  • Connector/J 5.1.30 containing Fabric support

  • Do you want to participate?

    There is a lot you can do if you want to help improve MySQL Fabric.

    Blogs about MySQL Fabric