Why is ClickHouse Keeper recommended over ZooKeeper?

ClickHouse Keeper provides the coordination system for data replication and distributed DDL query execution. Because it is compatible with ZooKeeper, it might not be obvious why you should use ClickHouse Keeper instead. This article discusses some of the benefits of Keeper.

Answer

ClickHouse Cloud uses clickhouse-keeper at large scale for thousands of services in a multi-tenant environment. We designed and built Keeper so that we could remove our dependency on the Java-based ZooKeeper implementation. ClickHouse Keeper solves many well-known drawbacks of ZooKeeper and makes additional improvements, including:

Snapshots and logs consume much less disk space due to better compression
No limit on the default packet and node data size (the limit is 1 MB in ZooKeeper)
No zxid overflow issue (in ZooKeeper, it forces a restart every 2 billion transactions)
Faster recovery after network partitions due to the use of a better distributed consensus protocol (Raft, rather than ZooKeeper's ZAB)
Lower memory usage for the same volume of data
Easier setup, with no need to specify a JVM heap size or a custom garbage collection implementation
A few custom commands in the protocol that enable faster operations in ReplicatedMergeTree tables
Broader coverage by Jepsen tests

In addition, ClickHouse Support has observed a substantial decrease in cluster problems at sites that use clickhouse-keeper rather than ZooKeeper.

Check out the Keeper docs page for more details on how to configure and run ClickHouse Keeper.
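
As a small illustration of the ZooKeeper compatibility mentioned above, Keeper answers the same style of four-letter-word commands (such as `ruok`) over its client port. The sketch below is not an official client: the helper names `four_letter_command` and `is_healthy` are hypothetical, and it assumes the client port is 9181 (the value used in the Keeper docs examples) and that `ruok` is allowed by your Keeper configuration.

```python
import socket


def four_letter_command(host: str, port: int, command: str, timeout: float = 5.0) -> str:
    """Send a four-letter-word command over TCP and return the text reply.

    Hypothetical helper for illustration; not part of any ClickHouse library.
    """
    if len(command) != 4:
        raise ValueError("command must be exactly four characters")
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("ascii"))
        while True:
            try:
                data = conn.recv(4096)
            except socket.timeout:
                break  # stop waiting if the server keeps the connection open
            if not data:
                break  # the server closed the connection after replying
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_healthy(host: str = "localhost", port: int = 9181) -> bool:
    """Return True if the server answers `ruok` with `imok`.

    The default port 9181 is an assumption; use the tcp_port from your config.
    """
    try:
        return four_letter_command(host, port, "ruok").strip() == "imok"
    except OSError:
        return False  # connection refused, unreachable host, and similar errors
```

Calling `is_healthy("keeper-host", 9181)` against a running Keeper node should return `True` when the node is up. The same `four_letter_command` helper can send other commands, such as `mntr`, provided they are enabled on the server.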
