ByteByteGo Logo
Database Partitioning Sharding

Vertical vs Horizontal Partitioning

Explore vertical vs horizontal partitioning strategies in databases.

In many large-scale applications, data is divided into partitions that can be accessed separately. There are two typical strategies for partitioning data.

  • Vertical partitioning: it means some columns are moved to new tables. Each table contains the same number of rows but fewer columns (see diagram below).

  • Horizontal partitioning (often called sharding): it divides a table into multiple smaller tables. Each table is a separate data store, and it contains the same number of columns, but fewer rows (see diagram below).

Horizontal partitioning is widely used so let’s take a closer look.

Routing algorithm

The routing algorithm decides which partition (shard) stores the data.

  • Range-based sharding. This algorithm uses ordered columns, such as integers, longs, timestamps, to separate the rows. For example, the diagram below uses the User ID column for range partition: User IDs 1 and 2 are in shard 1, User IDs 3 and 4 are in shard 2.

  • Hash-based sharding. This algorithm applies a hash function to one column or several columns to decide which row goes to which table. For example, the diagram below uses User ID mod 2 as a hash function. User IDs 1 and 3 are in shard 1, User IDs 2 and 4 are in shard 2.

Benefits

  • Facilitate horizontal scaling. Sharding facilitates the possibility of adding more machines to spread out the load.

  • Shorten response time. By sharding one table into multiple tables, queries go over fewer rows, and results are returned much more quickly.

Drawbacks

  • The order by operation is more complicated. Usually, we need to fetch data from different shards and sort the data in the application’s code.

  • Uneven distribution. Some shards may contain more data than others (this is also called the hotspot).