Statistical Methods
These methods perform statistical analysis on pipeline data.
finalVariance()
Calculates summary statistics (count, mean, variance, standard deviation, minimum, and maximum) for numeric data in the pipeline.
Signature: finalVariance(?callable $castFunc = null, ?RunningVariance $variance = null): RunningVariance
- $castFunc: A function to convert pipeline values to floats. Defaults to floatval. Return null to skip non-numeric values.
- $variance: An optional, pre-initialized RunningVariance object to continue calculations from.
Behavior:
- This is a terminal operation that returns a RunningVariance object.
- The RunningVariance object contains methods to get the mean, variance, standard deviation, min, max, and count.
- Values that the $castFunc returns as null are not included in the statistics.
Examples:
use Pipeline\Helper\RunningVariance;
use function Pipeline\take;
// Basic statistics
$stats = take([1, 2, 3, 4, 5])->finalVariance();
echo $stats->getCount(); // 5
echo $stats->getMean(); // 3.0
echo $stats->getVariance(); // 2.5
echo $stats->getStandardDeviation(); // ~1.58
echo $stats->getMin(); // 1.0
echo $stats->getMax(); // 5.0
// Statistics for a specific field
$stats = take($users)->finalVariance(fn($user) => $user['age']);
// Handling mixed data (skip non-numeric values)
$stats = take(['1', 'abc', 2, null, 3.5])
->finalVariance(fn($x) => is_numeric($x) ? (float)$x : null);
echo $stats->getCount(); // 3 (only numeric values counted)
// Continuing from existing statistics
$initialStats = take($firstBatch)->finalVariance();
$combinedStats = take($secondBatch)->finalVariance(null, $initialStats);
runningVariance()
Observes values as they pass through the pipeline, calculating statistics without consuming the pipeline.
Signature: runningVariance(?RunningVariance &$variance, ?callable $castFunc = null): self
- &$variance: A reference to a RunningVariance object, which will be updated with the statistics. It will be created if null.
- $castFunc: A function to convert pipeline values to floats.
Behavior:
- This is a non-terminal operation that allows you to inspect statistics at a point in the chain.
Examples:
$stats = null;
$processedData = take([1, 2, 3, 4, 5])
->runningVariance($stats)
->map(fn($x) => $x * 2)
->toList();
echo $stats->getMean(); // 3.0
RunningVariance Helper Class
The Pipeline\Helper\RunningVariance class collects the statistics described above. It uses Welford's online algorithm to calculate the mean and variance in a single pass, updating a few running totals per value instead of storing the values themselves.
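To make the single-pass idea concrete, here is a minimal, self-contained sketch of Welford's update step. The class name WelfordSketch and its observe() method are hypothetical and exist only for illustration. This is not the library's RunningVariance code, and returning NAN for fewer than two values is a choice made for this sketch.

```php
<?php

declare(strict_types=1);

// Hypothetical illustration class; NOT the library's RunningVariance.
final class WelfordSketch
{
    private int $count = 0;
    private float $mean = 0.0;
    private float $m2 = 0.0; // running sum of squared deviations from the mean

    public function observe(float $value): void
    {
        ++$this->count;
        $delta = $value - $this->mean;
        $this->mean += $delta / $this->count;
        // The second factor uses the already-updated mean; this is the core of Welford's method.
        $this->m2 += $delta * ($value - $this->mean);
    }

    public function getCount(): int
    {
        return $this->count;
    }

    public function getMean(): float
    {
        return $this->mean;
    }

    public function getVariance(): float
    {
        // Sample variance (n - 1 denominator). NAN below two values is an assumption of this sketch.
        return $this->count > 1 ? $this->m2 / ($this->count - 1) : NAN;
    }
}

$sketch = new WelfordSketch();
foreach ([1, 2, 3, 4, 5] as $value) {
    $sketch->observe((float) $value);
}

printf("%d %.1f %.1f\n", $sketch->getCount(), $sketch->getMean(), $sketch->getVariance());
// prints "5 3.0 2.5"
```

Only three numbers are kept between values (the count, the mean, and the sum of squared deviations), so memory use stays constant regardless of how many values pass through.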
Key RunningVariance Methods
- getCount(): int: The number of observations.
- getMean(): float: The arithmetic mean.
- getVariance(): float: The sample variance.
- getStandardDeviation(): float: The sample standard deviation.
- getMin(): float: The minimum value.
- getMax(): float: The maximum value.
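Because getVariance() and getStandardDeviation() are described as sample statistics, the figures for the input [1, 2, 3, 4, 5] can be checked by hand with the usual n - 1 formula, where the mean is 3:

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
    = \frac{(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2}{5-1}
    = \frac{10}{4} = 2.5,
\qquad
s = \sqrt{2.5} \approx 1.58
```

Dividing by n instead of n - 1 would give the population variance, 2.0, which is not what these methods are documented to return.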
Merging Statistics
You can merge RunningVariance instances, which is useful for parallel processing or combining historical and current data.
use Pipeline\Helper\RunningVariance;
// Merge stats from two different sources
$stats1 = take($source1)->finalVariance();
$stats2 = take($source2)->finalVariance();
$combinedStats = new RunningVariance($stats1, $stats2);
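Merging works without revisiting the original values because two Welford-style summaries can be combined algebraically. The sketch below shows the standard pairwise formula for combining counts, means, and sums of squared deviations. The function mergeSummaries() and its [count, mean, m2] array layout are hypothetical, and whether RunningVariance merges exactly this way internally is an assumption.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, for illustration only: merges two summaries of the
 * form [count, mean, m2], where m2 is the sum of squared deviations from the mean.
 */
function mergeSummaries(array $a, array $b): array
{
    [$countA, $meanA, $m2A] = $a;
    [$countB, $meanB, $m2B] = $b;

    $count = $countA + $countB;
    if ($count === 0) {
        return [0, 0.0, 0.0];
    }

    $delta = $meanB - $meanA;
    // Weighted shift of the mean towards the second summary.
    $mean = $meanA + $delta * $countB / $count;
    // Both sums of squares plus a correction term for the gap between the two means.
    $m2 = $m2A + $m2B + $delta * $delta * $countA * $countB / $count;

    return [$count, $mean, $m2];
}

// Summaries of [1, 2] and [3, 4, 5], worked out by hand.
$first = [2, 1.5, 0.5];
$second = [3, 4.0, 2.0];

[$count, $mean, $m2] = mergeSummaries($first, $second);
$variance = $m2 / ($count - 1); // sample variance of the combined data

printf("%d %.1f %.1f\n", $count, $mean, $variance);
// prints "5 3.0 2.5"
```

The merged result matches the statistics of [1, 2, 3, 4, 5] computed in one pass, which is why per-batch or per-worker summaries can be combined after the fact.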