tdigest
t-digest algorithm data structure for accurate quantile estimation with configurable compression parameters
📝 Syntax
td = tdigest()
td = tdigest(compression)
td = tdigest(X)
td = tdigest(compression, X)
📥 Input argument
compression - compression factor: positive integer scalar.
X - an array of double, single, integers, ...
📤 Output argument
td - t-digest representation of the array elements.
📄 Description
td = tdigest(compression, X) returns a t-digest representation of the array elements of X.
TDigest is a data structure for accurate on-line accumulation of rank-based statistics such as quantiles and cumulative distribution functions. It is particularly effective for large data sets and for estimating extreme quantiles. The algorithm is described in detail in the paper "Computing Extremely Accurate Quantiles Using t-Digests" by Ted Dunning and Otmar Ertl.
The t-digest is particularly useful for:
Large datasets where you need memory-efficient quantile estimation
Streaming data where data arrives continuously
Accurate extreme quantiles (like 99th percentile) even with limited memory
Online algorithms where you can't store all the data
The compression factor (100 in the examples) controls the trade-off between accuracy and memory usage - higher values give more accuracy but use more memory.
Once you have a t-digest object, you can add new data points to it using the + operator, and compute percentiles or quantiles using the percentile or quantile methods.
For more details, see the original paper linked in the bibliography.
Methods available:
percentile(p): Returns the value(s) at the given percentile(s)
p(in [0, 100]).quantile(q): Returns the value(s) at the given quantile(s)
q(in [0, 1]).+: Adds new data points to the t-digest object.
Properties:
compression: The compression factor used to create the t-digest.
totalWeight: The total weight of all the centroids in the t-digest.
Used function(s)
tdigest algorithm
📚 Bibliography
https://www.sciencedirect.com/science/article/pii/S2665963820300403
💡 Examples
M = rand(1, 15000);
td = tdigest(100, M);
td = td + [1:15000];
td.percentile([5, 50, 95])
td.quantile([0.05 0.5 0.95])streaming updates
td = tdigest(100);
while(1)
td = td + randn();
td.percentile([5, 50, 95])
end🔗 See also
mean.
🕔 History
1.15.0
initial version
Last updated
Was this helpful?