Help Manual

Contents






Sigma Magic Help Version 15

Basic Statistics

Overview

Basic Statistics tool is used to calculate some basic statistical information about the data and plot a histogram of the given data. You can use this analysis whenever you have collected any data and want to summarize the data that you have collected.
MeasureData TypesDescription
Freq TableBinary, Nominal, OrdinalProvides a table of the frequencies of different categories present in the data.
MeanDiscrete, ContinuousReports the average or the mean value of the data within that group
MedianOrdinal, Discrete, ContinuousReports the median value of the data. The median value is the central value of the data when the data is put in either increasing or decreasing order. For ordinal data, we need to categories to be numeric in order to report the median value.
ModeAll TypesReports the most frequently occurring category for the binary, nominal and ordinal data and the most frequently occurring data value for discrete and continuous data types within a group.
Trimmed MeanDiscrete, ContinuousReports the mean value after subtracting the bottom 5% and the top 5% of the data points. Basically, it is like reporting the mean value after possibly removing the outliers.
MinOrdinal, Discrete, ContinuousReports the minimum value observed in the data. Note that for Ordinal data type, this value is reported only if the categories are numeric. If text values are entered in the category column, then no minimum values are reported.
MaxOrdinal, Discrete, ContinuousReports the maximum value observed in the data. Note that for Ordinal data type, this value is reported only if the categories are numeric. If text values are entered in the category column, then no maximum values are reported.
RangeOrdinal, Discrete, ContinuousReports the range values (max - min) observed in the data. Note that for Ordinal data type, this value is reported only if the categories are numeric. If text values are entered in the category column, then no range values are reported.
StdevDiscrete, ContinuousReports the sample standard deviation value for the given data set within a group. Note that this is the sample standard deviation and not the population standard deviation.
VarianceDiscrete, ContinuousReports the variance value which is the square of the standard deviation for the given data within a group. Note that this is the variance of the sample and not the population.
Q1Ordinal, Discrete, ContinuousReports the first quartile of the data values for discrete and continous. This value is reported for ordinal data only if the category values are numeric. The first quartile means that 25% of the data points are less than Q1 and 75% of the data points are greater than Q1.
Q3Ordinal, Discrete, ContinuousReports the third quartile of the data values for discrete and continous. This value is reported for ordinal data only if the category values are numeric. The third quartile means that 75% of the data points are less than Q3 and 25% of the data points are greater than Q3.
IQROrdinal, Discrete, ContinuousReports the difference between Q3 and Q1 values. This value is reported for ordinal data only if the category values are numeric.
SumDiscrete, ContinuousReports the sum of all the data values within a group.
SSQDiscrete, ContinuousReports the sum of squares of all the data values within a group. Note that the mean value is not subtracted from the data to calculate the Sum of Squares values.
CVDiscrete, ContinuousReports the coefficient of variation for the given set of data within a group. The coefficient of variation is the ration of the standard deviation to the mean value. It reports the amount of variation in the data. If CV < 1, then we can consider the data as having low variation and if CV > 1, then the data has high variation.
SkewnessContinuousReports a measure of the Skewness of the data. This measure can be used to check if the distribution is symmetric. If skewness is 0, then the distribution is symmetric. If the skewness is negative then left tail is longer and if the skewness is positive then the right tail is longer.
KurtosisContinuousReports a measure of the shape of the distribution. If the Kurtosis is close to 3, the shape of the distribution is similar to normal distribution. If Kurtosis is less than 3, then there are fewer outliers and if the Kurtosis is greater than 3, there could be more number of outliers.
NormalityContinuousReports the P value of the normality test to check if the given data is normally distributed. If P value is less than alpha, we conclude that the data is not normally distributed.
CIContinuousReports the confidence interval of the mean value. The confidence interval reports the likelihood of where the population mean is expected to lie and is based on the variation in the data and the confidence level required in the analysis.


This tool can be added to your active workbook by clicking on Stats and then selecting Basic Stats under the Tool Wizard menu.

Analysis Setup

Click on Analysis Setup to open the menu options for this tool.

Setup

A sample screenshot of the setup menu is shown below.
inputs page 1
1
Data Type: Click on the data type to specify the type of input data. The type of statistics you can compute and the graphs you can display depend on the type of data you have. Following are the available data types for this option:
OptionDescription
BinaryA Binary data has two possible values (like Good/Bad or True/False or 0/1).
NominalNominal data has more than two categories but they cannot be ranked (like nationality, religion etc.).
OrdinalOrdinal data has more than two categories and they can be compared with each other and ranked (like the grades in an exam A > B > C).
DiscreteDiscrete data is like count data that has only finite set of values (like the number of defects in a product example 20, 30, 43).
ContinuousContinuous data can take any arbitrary value (like the temperature of the room example 34.53 deg centigrade).
2
Notes Type: Specify the type of notes output you would like to generate from this analysis. The available options are:
OptionDescription
AutoThe system will automatically pick multiple columns if the number of groups is less than 4 or a single column if there are more than 4 groups.
Single ColumnSpecify if you want the descriptive stats in a single column for each group. Use this option if you have a large number of groups.
Multiple ColumnsSpecify if you want to display the descriptive stats for all groups in a column format. Use this option if there are fewer groups for ease of comparison.
3
Statistics: To select the statistics that you want to display for each analysis select the appropriate checkbox next to the statistic you are interested in. Depending on your data type only those metrics that are suitable for that data type can be selected. For example, for binary data you cannot calculate the mean value. For ordinal data, you can select a set of different metrics to calculate only if the categories are numeric. For example, if you are measuring the survey responses on a Likert scale with Disagree, Neutral, Agree. If you enter these as the categories then you can't calculate the median value but if you transpose the data to numeric values (like 1, 2, 3) then you can report the statistics such as the median. For discrete and continuous data, the data column must be numeric.
4
Help Button: Click on this button to open the help file.
5
Cancel Button: You can click on this button to cancel any changes you have made in the Analysis Setup and exit this form. No outputs are computed or updated.
6
OK Button: You can click on this button to save any changes you have made in the Analysis Setup and recompute all the outputs (notes and graphs section). Note that the outputs are updated only if you have all the required data entered on the worksheet. If there are missing inputs, then you will need to enter the missing information and then click on Compute Outputs to generate the analysis results.

Data

If you click on the Data button, you will see the following dialog box. Here you can specify the data required for this analysis. Data
1
Search Data: The available data displays all the columns of data that are available for analysis. You can use the search bar to filter this list and to speed up finding the right data to use for analysis. Enter a few characters in the search field and the software will filter and display the filtered data in the Available Data box.
2
Available Data: The available data box contains the list of data available for analysis. If your workbook does not have any data in tabular format, this box will display "No Data Found". The information displayed in this box includes the row number, whether the data is Numeric (N) or Text (T), and the name of the column variable. Note that the software displays data from all the tables in the current workbook. Even though data within the same table have unique column names, columns across different tables can have similar names. Hence, it is important that you not only specify the column name but also the table name.
3
Add or View Data: Click on this button either to add more data into your workbook for analysis or to view more details about the data listed in the available data box. When you click on this button, it opens up the Data Editor dialog box where you can import more data into your workbook, or you can switch from the list view to a table view to see the individual data values for each column.
4
Required Data: The code for the required data specifies what data can be specified for that box. An example code is N: 2-4. If the code starts with an N, then you will need to select only numeric columns. If the code starts with a T, then you can select both numeric and text columns. The numbers to the right of the colon specify the min-max values. For example, if the min-max values are 2-4, then you need to select a minimum of 2 columns of data and a maximum of 4 columns of data in this box. If the minimum value is 0, then no data is required to be specified for this box.
5
Select Button: Click on this button to select the data for analysis. Any data you select for the analysis is moved to the right. To select a column, click on the columns in the Available Databox to highlight them and then click on the Select Button. A second method to select the data is to double click on the columns in the list of Available Data. Finally, you can also drag and drop the columns you are interested in by holding down the select columns using your left mouse key and dragging and dropping them in one of the boxes on the right.
6
Selected Data: If the right amount of data columns has been specified, the list box header will be displayed in the black color. If sufficient data has not been specified, then the list box header will be displayed in the red color. Note that you can double-click on any of the columns in this box to remove them from the box.

The data you specify for this analysis depends on the options you have specified in the Setup tab.
OptionDescription
1If you want to analyze a single column of data, you can specify that column under Analysis Variables and leave the Grouping Variable blank.
2If you want to analyze multiple columns of data, you can specify the multiple columns under Analysis Variables and leave the Grouping Variable blank.
3If you are working with grouped data then you enter the variable data under the Analysis Variables and the column that contains grouping information under Grouping Variables. Note that you can have up to two grouping variables.
7
View Selection: Click on this button to view the data you have specified for this analysis. The data can be viewed either in a tablular format or in a graphical summary.

Charts

If you click on the Charts button, you will see the following dialog box. Charts
1
Title: The system will automatically pick a title for your chart. However, if you would like to override that with your own title you can specify a title for your chart here. Note that this input is optional.
2
Sub Title: The system will automatically pick a subtitle for your chart. However, if you would like to override that with your own subtitle you can specify a subtitle for your chart here. Note that this input is optional.
3
X Label: The system will automatically pick a label for the x-axis. However, if you would like to override that with your own label for the x-axis you can specify a different label here. Note that this input is optional.
4
Y Label: The system will automatically pick a label for the y-axis. However, if you would like to override that with your own label for the y-axis you can specify a different label here. Note that this input is optional.
5
X Axis: The system will automatically pick a scale for the x-axis. However, if you would like to override that with your values for the x-axis, you can specify them here. The format for this input is to specify the minimum, increment, and maximum values separated by a semi-colon. For example, if you specify 10;20 then the minimum x-axis scale is set at 10 and the maximum x-axis scale is set at 20. If you specify 10;2;20, then, in addition to minimum and maximum values, the x-axis increment is set at 2. Note that this input is currently disabled and you will not be able to change this setting.
6
Y Axis: The system will automatically pick a scale for the y-axis. However, if you would like to override that with your values for the y-axis, you can specify them here. The format for this input is to specify the minimum, increment, and maximum values separated by a semi-colon. For example, if you specify 10;20 then the minimum y-axis scale is set as 10, and the maximum y-axis scale is set as 20. If you specify 10;2;20, then, in addition to minimum and maximum values, the y-axis increment is set at 2. Note that this input is optional.
7
Horizontal Lines: If you want to add a few extra horizontal reference lines on top of your chart you can specify the values here. The format for this input is numeric values separated by semi-colon. For example, if you specify 12;15 then two horizontal lines are plotted at Y = 12 and Y = 15 respectively. Note that this input is optional.
8
Vertical Lines: If you want to add a few extra vertical reference lines on top of your chart you can specify the values here. The format for this input is numeric values separated by semi-colon. For example, if you specify 2;5 then two vertical lines are plotted at X = 2 and X = 5 respectively. Note that this input is optional.

Verify

If you click on the Verify button, the software will perform some checks on the data you have entered. A sample screenshot of the dialog box is shown in the figure below. Verify The objective of this analysis as well as any checks that are performed is listed in this dialog box. For example, the software may check if you have correctly specified the input options and entered the required data on the worksheet. The results of the analysis checks are listed on the right. If the checks are passed, then they are shown as a green-colored checkmark. If the verification checks fail, then they are shown as a red-colored cross. If the verification checks result in a warning, they are shown in the orange color exclamation mark and finally, any checks that are required to be performed by the user are shown as blue info icons.

Outputs

A sample screenshot of the worksheet is shown below. This worksheet was created for continuous data with groups. Basic Stats Example
1
Notes: The notes section shows a summary of the inputs, specifically the number of rows of data, any skipped rows, and the number of distinct groups. In the analysis results area, the analysis lists the measures that you have selected to report such as Mean, Median, Min and Max values, Q1, Q3, IQR etc.
2
Graphs: For binary, nominal and ordinal data, a bar chart of the data is shown and for discrete and continuous data, a histogram of the data is shown in the graphical output box. A normal curve is superimposed on the histogram.

Interpreting Results

The summary statistics for each group (Shift 1 and Shift 2) are shown in the notes section. We can see that the mean and median values are close for both shifts however, the variation is slightly higher for Shift 1 compared to Shift 2. There are only about 20 data points for both groups. The histogram shows that the data roughly follows the normal distribution. There are no outliers present in the data sets.

Notes

Here are a few pointers regarding this analysis:
  • If you are copying data from another Excel spreadsheet, make sure that you do not copy any extraneous information such as formatting, etc. Make sure that you always use "Paste Special" and only paste "Values" and nothing else. This will preserve the formatting on the worksheet.
  • The histogram is auto-created. You will not be able to change the way the bin sizes are calculated in this tool. If you want more control over how to create the histogram, you will need to use the histogram tool.
  • If you are calculating statistics for ordinal data such as min, max, range, quartiles, etc., you will need to enter numeric values for the categories - otherwise the software will not be able to compare them to rank them against each other.
  • You can change the number of decimal places displayed for the statistics by clicking on Options > General > Number Format.
  • When you change the data type, the column names will change only if no data is entered on the worksheet. Once you enter the data on the worksheet and then you change the data type, the column names will not be altered.

Examples

The following examples can be found by clicking on the Examples button on the main menu bar.
  • Compute the mean, median and IQR for the Pizza delivery time given in the data file (Basic Stats 1.xlsm).

Additional Info

Here are a few related topics of interest.