fileCluster Table Function
Enables simultaneous processing of files matching a specified path across multiple nodes within a cluster. The initiator establishes connections to worker nodes, expands globs in the file path, and delegates file-reading tasks to worker nodes. Each worker node is querying the initiator for the next file to process, repeating until all tasks are completed (all files are read).
This function will operate correctly only in case the set of files matching the initially specified path is identical across all nodes, and their content is consistent among different nodes.
In case these files differ between nodes, the return value cannot be predetermined and depends on the order in which worker nodes request tasks from the initiator.
Syntax
fileCluster(cluster_name, path[, format, structure, compression_method])
Arguments
cluster_name
— Name of a cluster that is used to build a set of addresses and connection parameters to remote and local servers.path
— The relative path to the file from user_files_path. Path to file also supports globs.format
— Format of the files. Type: String.structure
— Table structure in'UserID UInt64, Name String'
format. Determines column names and types. Type: String.compression_method
— Compression method. Supported compression types aregz
,br
,xz
,zst
,lz4
, andbz2
.
Returned value
A table with the specified format and structure and with data from files matching the specified path.
Example
Given a cluster named my_cluster
and given the following value of setting user_files_path
:
$ grep user_files_path /etc/clickhouse-server/config.xml
<user_files_path>/var/lib/clickhouse/user_files/</user_files_path>
Also, given there are files test1.csv
and test2.csv
inside user_files_path
of each cluster node, and their content is identical across different nodes:
$ cat /var/lib/clickhouse/user_files/test1.csv
1,"file1"
11,"file11"
$ cat /var/lib/clickhouse/user_files/test2.csv
2,"file2"
22,"file22"
For example, one can create these files by executing these two queries on every cluster node:
INSERT INTO TABLE FUNCTION file('file1.csv', 'CSV', 'i UInt32, s String') VALUES (1,'file1'), (11,'file11');
INSERT INTO TABLE FUNCTION file('file2.csv', 'CSV', 'i UInt32, s String') VALUES (2,'file2'), (22,'file22');
Now, read data contents of test1.csv
and test2.csv
via fileCluster
table function:
SELECT * FROM fileCluster('my_cluster', 'file{1,2}.csv', 'CSV', 'i UInt32, s String') ORDER BY i, s
┌──i─┬─s──────┐
│ 1 │ file1 │
│ 11 │ file11 │
└────┴────────┘
┌──i─┬─s──────┐
│ 2 │ file2 │
│ 22 │ file22 │
└────┴────────┘
Globs in Path
All patterns supported by File table function are supported by FileCluster.
See Also