Use the Statistical Tests Function
The Statistical Tests Function helps you determine how well a set of data matches a given distribution, especially the normal distribution. It does this by analyzing the data's characteristics and calculating the corresponding test statistics and p-values.
You can use the following types of statistical tests.
The Jarque-Bera test measures how a dataset's skewness and kurtosis compare to those of a normal distribution (skewness 0, kurtosis 3). A larger value indicates stronger evidence of non-normality.
Formula: JB = (n/6) (skewness^2 + (1/4)(kurtosis-3)^2)
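As an illustration of the formula above, the following is a minimal Python sketch rather than the product's implementation. The function name `jarque_bera` and the sample data are hypothetical. The sketch assumes population moments (dividing by n) and uses the asymptotic chi-squared approximation with 2 degrees of freedom for the p-value.

```python
import math

def jarque_bera(values):
    """Return (JB statistic, approximate p-value) for a sample."""
    n = len(values)
    mean = sum(values) / n
    # Central moments in population form (dividing by n): an assumption.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = (n / 6) * (skewness ** 2 + 0.25 * (kurtosis - 3) ** 2)
    # Under normality, JB is asymptotically chi-squared with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value

# Hypothetical sample: 100 evenly spaced values between 20 and 30.
sample = [20 + 10 * (i + 0.5) / 100 for i in range(100)]
jb, p_value = jarque_bera(sample)
print(f"JB = {jb:.3f}, p-value = {p_value:.4f}")
```

The evenly spaced sample is symmetric, so its skewness is close to 0 and the statistic is driven almost entirely by the kurtosis term.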
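The Cramer-von Mises test assesses the fit of a data sample to a specified cumulative distribution by analyzing the squared differences between the empirical and the specified distribution functions, with a low p-value indicating a poor fit.
Formula: cramerVonMises = (1/(12*n)) + sum(((2*i - 1)/(2*n) - Phi(Z_i))^2), where the sum runs over i = 1..n and Z_i is the i-th smallest standardized value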
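The Cramer-von Mises formula can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the product's code: the names `normal_cdf` and `cramer_von_mises` are hypothetical, the data is standardized with the sample mean and the n - 1 standard deviation, and Phi is the standard normal cumulative distribution function. Only the statistic is computed; the p-value lookup is omitted.

```python
import math
from statistics import NormalDist

def normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cramer_von_mises(values):
    """Cramer-von Mises statistic against a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation with n - 1 in the denominator: an assumption.
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    z = sorted((x - mean) / std for x in values)
    total = sum(((2 * i - 1) / (2 * n) - normal_cdf(z_i)) ** 2
                for i, z_i in enumerate(z, start=1))
    return 1 / (12 * n) + total

# Hypothetical samples: evenly spaced values and normal quantiles.
uniform_like = [20 + 10 * (i + 0.5) / 100 for i in range(100)]
normal_like = [NormalDist().inv_cdf((i - 0.5) / 100) for i in range(1, 101)]
print("uniform-like:", cramer_von_mises(uniform_like))
print("normal-like: ", cramer_von_mises(normal_like))
```

The normal-like sample should produce the smaller statistic, because its empirical distribution tracks Phi closely.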
The Anderson-Darling test focuses on the tails of the distribution to evaluate if the data conforms to a specified distribution, typically giving more weight to outliers.
Formula: A^2 = -n - (1/n) * sum((2*i - 1) * log(Phi(Z_i)) + (2*(n - i) + 1) * log(1 - Phi(Z_i))), where the sum runs over i = 1..n and Z_i is the i-th smallest standardized value
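A minimal Python sketch of the Anderson-Darling statistic follows. It is not the product's implementation. It uses the commonly published form A^2 = -n - (1/n) * sum((2*i - 1) * (log(Phi(Z_i)) + log(1 - Phi(Z_(n+1-i))))), with Z sorted in ascending order. The function names are hypothetical, and standardizing with the sample mean and the n - 1 standard deviation is an assumption.

```python
import math
from statistics import NormalDist

def normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def anderson_darling(values):
    """Anderson-Darling statistic A^2 against a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation with n - 1 in the denominator: an assumption.
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    z = sorted((x - mean) / std for x in values)
    total = 0.0
    for i in range(1, n + 1):
        lower = math.log(normal_cdf(z[i - 1]))        # log Phi(Z_i)
        upper = math.log(1.0 - normal_cdf(z[n - i]))  # log(1 - Phi(Z_(n+1-i)))
        total += (2 * i - 1) * (lower + upper)
    return -n - total / n

# Hypothetical samples: evenly spaced values and normal quantiles.
uniform_like = [20 + 10 * (i + 0.5) / 100 for i in range(100)]
normal_like = [NormalDist().inv_cdf((i - 0.5) / 100) for i in range(1, 101)]
print("uniform-like:", anderson_darling(uniform_like))
print("normal-like: ", anderson_darling(normal_like))
```

The logarithms grow quickly as Phi approaches 0 or 1, which is why deviations in the tails contribute more to A^2 than deviations near the center.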
The D'Agostino-Pearson test combines skewness and kurtosis to examine whether the shape of a dataset's distribution aligns with a normal distribution, looking for deviations in symmetry and in the sharpness of the peak.
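The source gives no formula for this test, so the following Python sketch is an assumption about how such a statistic is commonly computed, not the product's implementation. It converts skewness to a z-score with the D'Agostino transformation and kurtosis to a z-score with the Anscombe-Glynn transformation, then sums their squares into K^2, which is approximately chi-squared with 2 degrees of freedom under normality. The function name is hypothetical, and the approximations are intended for samples of roughly 20 values or more.

```python
import math

def dagostino_pearson(values):
    """Return (K^2 statistic, approximate p-value) for a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2

    # Skewness z-score (D'Agostino transformation).
    y = skewness * math.sqrt((n + 1) * (n + 3) / (6 * (n - 2)))
    beta2 = (3 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1 + math.sqrt(2 * (beta2 - 1))
    delta = 1 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2 / (w2 - 1))
    z_skew = delta * math.asinh(y / alpha)

    # Kurtosis z-score (Anscombe-Glynn transformation).
    mean_b2 = 3 * (n - 1) / (n + 1)
    var_b2 = 24 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    x = (kurtosis - mean_b2) / math.sqrt(var_b2)
    root_beta1 = (6 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
                  * math.sqrt(6 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6 + 8 / root_beta1 * (2 / root_beta1
                              + math.sqrt(1 + 4 / root_beta1 ** 2))
    ratio = (1 - 2 / a) / (1 + x * math.sqrt(2 / (a - 4)))
    # Real cube root that also handles a negative ratio.
    cube_root = math.copysign(abs(ratio) ** (1 / 3), ratio)
    z_kurt = ((1 - 2 / (9 * a)) - cube_root) / math.sqrt(2 / (9 * a))

    k2 = z_skew ** 2 + z_kurt ** 2
    p_value = math.exp(-k2 / 2)  # chi-squared survival function, 2 d.o.f.
    return k2, p_value

# Hypothetical sample: 100 evenly spaced values between 20 and 30.
sample = [20 + 10 * (i + 0.5) / 100 for i in range(100)]
k2, p_value = dagostino_pearson(sample)
print(f"K^2 = {k2:.3f}, p-value = {p_value:.2e}")
```

For this flat, symmetric sample the skewness term is close to 0, so K^2 comes almost entirely from the kurtosis z-score.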
The Kolmogorov-Smirnov test, particularly the Lilliefors modification, evaluates the normality of a dataset by comparing the empirical distribution function with the cumulative normal distribution.
Formula: K = max(D+, D-) * sqrt(n)
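The statistic above can be sketched in Python as follows. This is an illustrative sketch, not the product's code. It assumes D+ is the largest amount by which the empirical distribution function exceeds the normal CDF and D- is the largest amount by which it falls below it. As in the Lilliefors variant, the mean and standard deviation are estimated from the sample. The function names are hypothetical, and the p-value lookup is omitted.

```python
import math
from statistics import NormalDist

def normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lilliefors_statistic(values):
    """Return K = max(D+, D-) * sqrt(n) against a fitted normal CDF."""
    n = len(values)
    mean = sum(values) / n
    # Parameters estimated from the data (the Lilliefors modification).
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cdf = [normal_cdf((x - mean) / std) for x in sorted(values)]
    # D+: empirical CDF above the normal CDF; D-: empirical CDF below it.
    d_plus = max(i / n - f for i, f in enumerate(cdf, start=1))
    d_minus = max(f - (i - 1) / n for i, f in enumerate(cdf, start=1))
    return max(d_plus, d_minus) * math.sqrt(n)

# Hypothetical samples: evenly spaced values and normal quantiles.
uniform_like = [20 + 10 * (i + 0.5) / 100 for i in range(100)]
normal_like = [NormalDist().inv_cdf((i - 0.5) / 100) for i in range(1, 101)]
print("uniform-like:", lilliefors_statistic(uniform_like))
print("normal-like: ", lilliefors_statistic(normal_like))
```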
Review the following scenario for the Statistical Tests function. Then, you will simulate PLC data and calculate the corresponding test values for the collected data.
In a chemical manufacturing plant, quality control engineers use the Statistical Tests Function to ensure that the mixture ratios of raw materials are consistent with the required standards for product batches. By analyzing characteristics such as consistency and concentration, the tests determine if the batch data deviates from normal distribution, which is critical for product quality.
Follow the steps to Connect a Device and configure the following parameters:
- Device Type: Simulator
- Driver Name: Generator
- Enable Alias Topics: Select the checkbox.
After connecting the device, add the following tags. See Add Tags to learn more.
- Name: Select S - Random value generator
- Value Type: Select float64
- Polling Interval: Enter 1
- Tag Name: Enter input1
- Min_value: Enter 20
- Max_value: Enter 30
You can now create the analytics flows using data from the device and tag you previously created.
To create an analytics flow with the Statistical Function Processor:
- In Manufacturing Connect Edge, navigate to Analytics.
- On the analytics canvas, click Add processor. The Create a processor dialog box displays.
- Select DataHub Subscribe.
- In the Topic field, click the Search icon, select the device you previously created, and then select the alias topic for the input1 tag.
- Click Save.
- Click Add processor again and select the Statistical Function processor. The Edit a Processor dialog box appears.
- Window Size: Enter a value that represents the range of data over which the statistical tests are applied. For this example, enter 100.
- Select the Jarque Bera, Anderson Darling, Cramer Von Mises, D Agostino Pearson, and Kolmogorov Smirnov checkboxes.
- Click Save.
- Connect the DataHub Subscribe processor (tag: input1) to the Statistical Function processor with a wire and use the events connection.
On the analytics canvas, click Save. The configured analytics flows should look like the following:
Click the View icon in the Statistical Function processor to view the output values.
The tests suggest the sample data may not fit a normal distribution, as indicated by the p-values and test statistics provided.
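To see why this result is plausible, the scenario can be approximated offline. The Python sketch below is not how the product works internally, and it rests on several assumptions: the simulator's random value generator is taken to be uniform between Min_value 20 and Max_value 30, the window is modeled as a sliding buffer of the 100 most recent values, and only the Jarque-Bera test is shown. All names are hypothetical.

```python
import math
import random
from collections import deque

WINDOW_SIZE = 100          # matches the Window Size used in this example
rng = random.Random(42)    # fixed seed so the run is repeatable
window = deque(maxlen=WINDOW_SIZE)

def jarque_bera(values):
    """Return (JB statistic, approximate p-value) for a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = (n / 6) * (skewness ** 2 + 0.25 * (kurtosis - 3) ** 2)
    return jb, math.exp(-jb / 2)  # chi-squared approximation, 2 d.o.f.

# Simulate 250 polled values for the input1 tag (assumed uniform in 20..30).
for _ in range(250):
    window.append(rng.uniform(20.0, 30.0))

# The deque keeps only the most recent WINDOW_SIZE values.
jb, p_value = jarque_bera(list(window))
print(f"window length = {len(window)}, JB = {jb:.3f}, p-value = {p_value:.4f}")
```

A uniform distribution has a kurtosis of about 1.8, well below the normal value of 3, so tests that are sensitive to kurtosis tend to flag uniformly generated data as non-normal.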