Replication for fault tolerance
Description
In this architecture, there are five servers configured. Two are used to host copies of the data. The other three servers are used to coordinate the replication of data. With this example, we'll create a database and table that will be replicated across both data nodes using the ReplicatedMergeTree table engine.
Level: Basic
Terminology
Replica
A copy of data. ClickHouse always has at least one copy of your data, and so the minimum number of replicas is one. This is an important detail: you may not be used to counting the original copy of your data as a replica, but that is the term used in ClickHouse code and documentation. Adding a second replica of your data provides fault tolerance.
Shard
A subset of data. ClickHouse always has at least one shard for your data, so if you do not split the data across multiple servers, your data will be stored in one shard. Sharding data across multiple servers can be used to divide the load if you exceed the capacity of a single server. The destination server is determined by the sharding key, which is defined when you create the distributed table. The sharding key can be random or the output of a hash function. The deployment examples involving sharding will use rand() as the sharding key, and will provide further information on when and how to choose a different sharding key.
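For illustration only: in a sharded deployment, each node holds a local table and inserts go through a Distributed table that routes rows by the sharding key. The sketch below is not part of the replication-only example in this guide; the cluster name cluster_2S_1R and the table names are hypothetical.
-- Hypothetical sketch: a Distributed table that spreads inserts across shards using rand().
CREATE TABLE db1.table1_dist ON CLUSTER cluster_2S_1R AS db1.table1
ENGINE = Distributed('cluster_2S_1R', 'db1', 'table1', rand());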
Distributed coordination
ClickHouse Keeper provides the coordination system for data replication and distributed DDL query execution. ClickHouse Keeper is compatible with Apache ZooKeeper.
Environment
Architecture Diagram
| Node | Description |
|---|---|
| clickhouse-01 | Data |
| clickhouse-02 | Data |
| clickhouse-keeper-01 | Distributed coordination |
| clickhouse-keeper-02 | Distributed coordination |
| clickhouse-keeper-03 | Distributed coordination |
In production environments, we strongly recommend using dedicated hosts for ClickHouse Keeper. In test environments, it is acceptable to run ClickHouse Server and ClickHouse Keeper combined on the same server. The other basic example, Scaling out, uses this method. In this example we present the recommended method of separating Keeper from ClickHouse Server. The Keeper servers can be smaller; 4GB of RAM is generally enough for each Keeper server until your ClickHouse Servers grow very large.
Install
Install ClickHouse server and client on the two servers clickhouse-01 and clickhouse-02, following the instructions for your archive type (.deb, .rpm, .tar.gz, etc.).
Install ClickHouse Keeper on the three servers clickhouse-keeper-01, clickhouse-keeper-02, and clickhouse-keeper-03, following the instructions for your archive type (.deb, .rpm, .tar.gz, etc.).
Editing configuration files
When configuring ClickHouse Server by adding or editing configuration files you should:
- Add files to the /etc/clickhouse-server/config.d/ directory
- Add files to the /etc/clickhouse-server/users.d/ directory
- Leave the /etc/clickhouse-server/config.xml file as it is
- Leave the /etc/clickhouse-server/users.xml file as it is
clickhouse-01 configuration
For clickhouse-01 there are five configuration files. You may choose to combine these files into a single file, but for clarity in the documentation it may be simpler to look at them separately. As you read through the configuration files you will see that most of the configuration is the same between clickhouse-01 and clickhouse-02; the differences will be highlighted.
Network and logging configuration
These values can be customized as you wish. This example configuration gives you:
- a debug log that rolls over once it reaches 1000M, keeping three rolled-over files
- the name displayed when you connect with clickhouse-client is cluster_1S_2R node 1
- ClickHouse will listen on the IPv4 network on ports 8123 and 9000
<clickhouse>
<logger>
<level>debug</level>
<log>/var/log/clickhouse-server/clickhouse-server.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
<size>1000M</size>
<count>3</count>
</logger>
<display_name>cluster_1S_2R node 1</display_name>
<listen_host>0.0.0.0</listen_host>
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
</clickhouse>
Macros configuration
The macros shard and replica reduce the complexity of distributed DDL: the configured values are automatically substituted into your DDL queries. The macros for this configuration specify the shard and replica number for each node.
In this 1 shard 2 replica example, the replica macro is 01 on clickhouse-01 and 02 on clickhouse-02. The shard macro is 01 on both clickhouse-01 and clickhouse-02 as there is only one shard.
<clickhouse>
<macros>
<shard>01</shard>
<replica>01</replica>
<cluster>cluster_1S_2R</cluster>
</macros>
</clickhouse>
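To confirm how these macros expand on each node, you can inspect the system.macros table once the server is running; this is an optional check rather than part of the original steps. The explicit ReplicatedMergeTree path in the comment is one common convention that uses the same macros, but it is not required here because the Testing section later relies on the engine's default path.
-- Run on clickhouse-01 and on clickhouse-02 to see the values substituted for {shard}, {replica}, and {cluster}.
SELECT macro, substitution FROM system.macros;
-- The macros can also be referenced explicitly in a table definition, for example:
--   ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/db1/table1', '{replica}')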
Replication and sharding configuration
Starting from the top:
- The remote_servers section of the XML specifies each of the clusters in the environment. The attribute replace="true" replaces the sample remote_servers in the default ClickHouse configuration with the remote_servers configuration specified in this file. Without this attribute, the remote servers in this file would be appended to the list of samples in the default configuration.
- In this example, there is one cluster named cluster_1S_2R.
- A secret is created for the cluster cluster_1S_2R with the value mysecretphrase. The secret is shared across all of the remote servers in the environment to ensure that the correct servers are joined together.
- The cluster cluster_1S_2R has one shard and two replicas. Take a look at the architecture diagram toward the beginning of this document and compare it with the shard definition in the XML below. The shard definition contains two replicas. The host and port for each replica are specified. One replica is stored on clickhouse-01, and the other replica is stored on clickhouse-02.
- Internal replication for the shard is set to true. Each shard can have the internal_replication parameter defined in the config file. If this parameter is set to true, the write operation selects the first healthy replica and writes data to it.
<clickhouse>
<remote_servers replace="true">
<cluster_1S_2R>
<secret>mysecretphrase</secret>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>clickhouse-01</host>
<port>9000</port>
</replica>
<replica>
<host>clickhouse-02</host>
<port>9000</port>
</replica>
</shard>
</cluster_1S_2R>
</remote_servers>
</clickhouse>
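After restarting the server with this configuration in place, a quick sanity check (not part of the original walkthrough) is to confirm that the cluster definition is visible:
-- Should return one row per replica: clickhouse-01 and clickhouse-02, both in shard 1.
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'cluster_1S_2R';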
Configuring the use of Keeper
This configuration file, use-keeper.xml, configures ClickHouse Server to use ClickHouse Keeper for the coordination of replication and distributed DDL. It specifies that ClickHouse Server should use Keeper on nodes clickhouse-keeper-01 through clickhouse-keeper-03 on port 9181, and the file is the same on clickhouse-01 and clickhouse-02.
<clickhouse>
<zookeeper>
<!-- where are the ZK nodes -->
<node>
<host>clickhouse-keeper-01</host>
<port>9181</port>
</node>
<node>
<host>clickhouse-keeper-02</host>
<port>9181</port>
</node>
<node>
<host>clickhouse-keeper-03</host>
<port>9181</port>
</node>
</zookeeper>
</clickhouse>
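Once the Keeper nodes configured below are running, an optional way to confirm that ClickHouse Server can reach them is to query the system.zookeeper table, which proxies requests to Keeper:
-- Lists the znodes at the Keeper root if the connection works; the query fails if Keeper is unreachable.
SELECT name FROM system.zookeeper WHERE path = '/';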
clickhouse-02 configuration
As the configuration is very similar on clickhouse-01 and clickhouse-02, only the differences will be pointed out here.
Network and logging configuration
This file is the same on both clickhouse-01 and clickhouse-02, with the exception of display_name.
<clickhouse>
<logger>
<level>debug</level>
<log>/var/log/clickhouse-server/clickhouse-server.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
<size>1000M</size>
<count>3</count>
</logger>
<display_name>cluster_1S_2R node 2</display_name>
<listen_host>0.0.0.0</listen_host>
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
</clickhouse>
Macros configuration
The macros configuration is different between clickhouse-01 and clickhouse-02; replica is set to 02 on this node.
<clickhouse>
<macros>
<shard>01</shard>
<replica>02</replica>
<cluster>cluster_1S_2R</cluster>
</macros>
</clickhouse>
Replication and sharding configuration
This file is the same on both clickhouse-01 and clickhouse-02.
<clickhouse>
<remote_servers replace="true">
<cluster_1S_2R>
<secret>mysecretphrase</secret>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>clickhouse-01</host>
<port>9000</port>
</replica>
<replica>
<host>clickhouse-02</host>
<port>9000</port>
</replica>
</shard>
</cluster_1S_2R>
</remote_servers>
</clickhouse>
Configuring the use of Keeper
This file is the same on both clickhouse-01 and clickhouse-02.
<clickhouse>
<zookeeper>
<!-- where are the ZK nodes -->
<node>
<host>clickhouse-keeper-01</host>
<port>9181</port>
</node>
<node>
<host>clickhouse-keeper-02</host>
<port>9181</port>
</node>
<node>
<host>clickhouse-keeper-03</host>
<port>9181</port>
</node>
</zookeeper>
</clickhouse>
clickhouse-keeper-01 configuration
When configuring ClickHouse Keeper by editing configuration files you should:
- Back up the /etc/clickhouse-keeper/keeper_config.xml file
- Edit the /etc/clickhouse-keeper/keeper_config.xml file
ClickHouse Keeper provides the coordination system for data replication and distributed DDL query execution. ClickHouse Keeper is compatible with Apache ZooKeeper. This configuration enables ClickHouse Keeper on port 9181. The highlighted line specifies that this instance of Keeper has server_id of 1. This is the only difference in the enable-keeper.xml file across the three servers: clickhouse-keeper-02 will have server_id set to 2, and clickhouse-keeper-03 will have server_id set to 3. The raft configuration section is the same on all three servers; it is highlighted below to show you the relationship between server_id and the server instances within the raft configuration.
If for any reason a Keeper node is replaced or rebuilt, do not reuse an existing server_id. For example, if the Keeper node with server_id of 2 is rebuilt, give it server_id of 4 or higher.
<clickhouse>
<logger>
<level>trace</level>
<log>/var/log/clickhouse-keeper/clickhouse-keeper.log</log>
<errorlog>/var/log/clickhouse-keeper/clickhouse-keeper.err.log</errorlog>
<size>1000M</size>
<count>3</count>
</logger>
<listen_host>0.0.0.0</listen_host>
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>1</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<coordination_settings>
<operation_timeout_ms>10000</operation_timeout_ms>
<session_timeout_ms>30000</session_timeout_ms>
<raft_logs_level>trace</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>clickhouse-keeper-01</hostname>
<port>9234</port>
</server>
<server>
<id>2</id>
<hostname>clickhouse-keeper-02</hostname>
<port>9234</port>
</server>
<server>
<id>3</id>
<hostname>clickhouse-keeper-03</hostname>
<port>9234</port>
</server>
</raft_configuration>
</keeper_server>
</clickhouse>
clickhouse-keeper-02 configuration
There is only one line of difference between clickhouse-keeper-01 and clickhouse-keeper-02: server_id is set to 2 on this node.
<clickhouse>
<logger>
<level>trace</level>
<log>/var/log/clickhouse-keeper/clickhouse-keeper.log</log>
<errorlog>/var/log/clickhouse-keeper/clickhouse-keeper.err.log</errorlog>
<size>1000M</size>
<count>3</count>
</logger>
<listen_host>0.0.0.0</listen_host>
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>2</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<coordination_settings>
<operation_timeout_ms>10000</operation_timeout_ms>
<session_timeout_ms>30000</session_timeout_ms>
<raft_logs_level>trace</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>clickhouse-keeper-01</hostname>
<port>9234</port>
</server>
<server>
<id>2</id>
<hostname>clickhouse-keeper-02</hostname>
<port>9234</port>
</server>
<server>
<id>3</id>
<hostname>clickhouse-keeper-03</hostname>
<port>9234</port>
</server>
</raft_configuration>
</keeper_server>
</clickhouse>
clickhouse-keeper-03 configuration
There is only one line of difference between clickhouse-keeper-01 and clickhouse-keeper-03: server_id is set to 3 on this node.
<clickhouse>
<logger>
<level>trace</level>
<log>/var/log/clickhouse-keeper/clickhouse-keeper.log</log>
<errorlog>/var/log/clickhouse-keeper/clickhouse-keeper.err.log</errorlog>
<size>1000M</size>
<count>3</count>
</logger>
<listen_host>0.0.0.0</listen_host>
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>3</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<coordination_settings>
<operation_timeout_ms>10000</operation_timeout_ms>
<session_timeout_ms>30000</session_timeout_ms>
<raft_logs_level>trace</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>clickhouse-keeper-01</hostname>
<port>9234</port>
</server>
<server>
<id>2</id>
<hostname>clickhouse-keeper-02</hostname>
<port>9234</port>
</server>
<server>
<id>3</id>
<hostname>clickhouse-keeper-03</hostname>
<port>9234</port>
</server>
</raft_configuration>
</keeper_server>
</clickhouse>
Testing
To gain experience with ReplicatedMergeTree and ClickHouse Keeper, you can run the following commands, which will have you:
- Create a database on the cluster configured above
- Create a table on the database using the ReplicatedMergeTree table engine
- Insert data on one node and query it on another node
- Stop one ClickHouse server node
- Insert more data on the running node
- Restart the stopped node
- Verify that the data is available when querying the restarted node
Verify that ClickHouse Keeper is running
The mntr command is used to verify that ClickHouse Keeper is running and to get state information about the relationship of the three Keeper nodes. In the configuration used in this example there are three nodes working together. The nodes will elect a leader, and the remaining nodes will be followers. The mntr command gives information related to performance and whether a particular node is a follower or a leader.
You may need to install netcat in order to send the mntr command to Keeper. Please see the nmap.org page for download information.
echo mntr | nc localhost 9181
Output from a follower node:
zk_version v23.3.1.2823-testing-46e85357ce2da2a99f56ee83a079e892d7ec3726
zk_avg_latency 0
zk_max_latency 0
zk_min_latency 0
zk_packets_received 0
zk_packets_sent 0
zk_num_alive_connections 0
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count 6
zk_watch_count 0
zk_ephemerals_count 0
zk_approximate_data_size 1271
zk_key_arena_size 4096
zk_latest_snapshot_size 0
zk_open_file_descriptor_count 46
zk_max_file_descriptor_count 18446744073709551615
Output from the leader node:
zk_version v23.3.1.2823-testing-46e85357ce2da2a99f56ee83a079e892d7ec3726
zk_avg_latency 0
zk_max_latency 0
zk_min_latency 0
zk_packets_received 0
zk_packets_sent 0
zk_num_alive_connections 0
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 6
zk_watch_count 0
zk_ephemerals_count 0
zk_approximate_data_size 1271
zk_key_arena_size 4096
zk_latest_snapshot_size 0
zk_open_file_descriptor_count 48
zk_max_file_descriptor_count 18446744073709551615
zk_followers 2
zk_synced_followers 2
Verify ClickHouse cluster functionality
Connect to node clickhouse-01 with clickhouse client in one shell, and connect to node clickhouse-02 with clickhouse client in another shell.
- Create a database on the cluster configured above
CREATE DATABASE db1 ON CLUSTER cluster_1S_2R
┌─host──────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-02 │ 9000 │ 0 │ │ 1 │ 0 │
│ clickhouse-01 │ 9000 │ 0 │ │ 0 │ 0 │
└───────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
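Because the statement was run ON CLUSTER, the database should now exist on both nodes. As an optional check, run the following from either shell:
SHOW DATABASES LIKE 'db1'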
- Create a table on the database using the ReplicatedMergeTree table engine
CREATE TABLE db1.table1 ON CLUSTER cluster_1S_2R
(
`id` UInt64,
`column1` String
)
ENGINE = ReplicatedMergeTree
ORDER BY id
┌─host──────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-02 │ 9000 │ 0 │ │ 1 │ 0 │
│ clickhouse-01 │ 9000 │ 0 │ │ 0 │ 0 │
└───────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
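As an optional check, you can see how the shard and replica macros were substituted into the table's path in Keeper by querying system.replicas on each node:
-- zookeeper_path is shared by both replicas; replica_name should reflect the replica macro (01 or 02).
SELECT zookeeper_path, replica_name
FROM system.replicas
WHERE database = 'db1' AND table = 'table1';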
- Insert data on one node and query it on another node
INSERT INTO db1.table1 (id, column1) VALUES (1, 'abc');
- Query the table on the node clickhouse-02
SELECT *
FROM db1.table1
┌─id─┬─column1─┐
│ 1 │ abc │
└────┴─────────┘
- Insert data on the other node and query it on the node clickhouse-01
INSERT INTO db1.table1 (id, column1) VALUES (2, 'def');
SELECT *
FROM db1.table1
┌─id─┬─column1─┐
│ 1 │ abc │
└────┴─────────┘
┌─id─┬─column1─┐
│ 2 │ def │
└────┴─────────┘
- Stop one ClickHouse server node
Stop one of the ClickHouse server nodes by running an operating system command similar to the command used to start the node. If you used systemctl start to start the node, then use systemctl stop to stop it.
- Insert more data on the running node
INSERT INTO db1.table1 (id, column1) VALUES (3, 'ghi');
Select the data:
SELECT *
FROM db1.table1
┌─id─┬─column1─┐
│ 1 │ abc │
└────┴─────────┘
┌─id─┬─column1─┐
│ 2 │ def │
└────┴─────────┘
┌─id─┬─column1─┐
│ 3 │ ghi │
└────┴─────────┘
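Optionally, before restarting the stopped node, you can check the replication state from the node that is still running. With one of the two servers down, you would expect active_replicas to drop to 1 while total_replicas stays at 2:
SELECT database, table, is_readonly, total_replicas, active_replicas
FROM system.replicas
WHERE database = 'db1' AND table = 'table1';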
- Restart the stopped node and select from it as well
SELECT *
FROM db1.table1
┌─id─┬─column1─┐
│ 1 │ abc │
└────┴─────────┘
┌─id─┬─column1─┐
│ 2 │ def │
└────┴─────────┘
┌─id─┬─column1─┐
│ 3 │ ghi │
└────┴─────────┘
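If the restarted replica has not caught up yet when you run the query, you can explicitly wait for it to process its replication queue and then re-check the data. This is an optional extra step beyond the walkthrough above:
-- Blocks until the local replica has processed its replication queue, then verify the row count.
SYSTEM SYNC REPLICA db1.table1;
SELECT count() FROM db1.table1;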