Improved Performance Configurations

Accuracy of reporting is important, especially when working with detailed, drill-down data. When looking at which courses Johnny Learner completed, for example, you need to know that what you’re seeing in reports is absolutely correct.

But when it comes to high-level reporting involving tens of thousands of data points or more, exact precision becomes less important. In a line chart tracking month-by-month use of a new learning experience platform, for example, it probably makes no practical difference whether 9,826 or 9,832 people accessed the platform in a given month; the numbers are close enough. This matters because crunching large data sets takes time, and reports on large data sets can run faster if you accept some loss of precision in the results. This guide outlines the Advanced Configuration required to enable those features.

Please note: The features described in this guide should only be used in reports where measures are performing calculations on tens of thousands of data points or more. Their accuracy is significantly lower with smaller data sets.

Who can use this feature?
User Roles
Global Admins, Area Admins, and some Users can use this feature.
Pricing
Available on paid plans (AnalystCLO and Enterprise).
Expertise
Experts can use this feature.

Distinct Count Sampling

The distinct count measure counts the number of unique items in a data set. In large data sets, statistical techniques can produce a good estimate of the number of unique items in the whole data set from a partial sample of it. This is what the Distinct Count Sampling feature does.
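This article does not say which estimation method the platform uses. As a rough illustration only, the Python sketch below shows one well-known way to estimate a distinct count from a sample: hash each value, keep only the values whose hash falls below the sample proportion, and scale the number of unique kept values back up. Hashing the value, rather than picking random rows, means every occurrence of a given item is kept or dropped together, so the scaled count is correct on average. The function names and data are hypothetical, and the sketch still reads every value, so it illustrates the accuracy side of the trade-off, not how reports are sped up.

```python
import hashlib


def hash_fraction(value: str) -> float:
    """Deterministically map a value to a number in [0, 1)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    # Take 53 bits of the digest so the result is exactly representable
    # as a float and always strictly less than 1.
    bits = int.from_bytes(digest[:8], "big") >> 11
    return bits / 2**53


def estimate_distinct_count(values, sample_percent: float) -> float:
    """Estimate how many unique values exist using hash-based sampling.

    A value is kept only if its hash fraction is below sample_percent, so
    every occurrence of the same value is kept or dropped together. The
    number of unique kept values is then scaled up by 1 / sample_percent.
    """
    if not 0 < sample_percent <= 1:
        raise ValueError("sample_percent must be greater than 0 and at most 1")
    sampled = {v for v in values if hash_fraction(v) < sample_percent}
    return len(sampled) / sample_percent


if __name__ == "__main__":
    # Hypothetical data: 50,000 statements about 10,000 distinct activities.
    statements = [f"activity-{i % 10_000}" for i in range(50_000)]
    exact = len(set(statements))
    for p in (1.0, 0.5, 0.2, 0.05):
        estimate = estimate_distinct_count(statements, p)
        print(f"sample_percent={p}: estimate={estimate:.0f}, exact={exact}")
```

Running the same function on a list with only a few dozen distinct items shows the scaled estimate can be far off, which mirrors the note above that these features are meant for large data sets.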

Sampling is enabled by adding the samplePercent property to a measure’s aggregation, as shown in the example below:

{
  "name": "Activity Count Sampled",
  "aggregation": {
    "type": "DISTINCT_COUNT",
    "samplePercent": 0.5
  },
  "valueProducer": {
    "type": "STATEMENT_PROPERTY",
    "statementProperty": "object.id",
    "caseSensitive": true
  }
}

The value of the samplePercent property is a number between 0 and 1 indicating the proportion of the data to sample. The higher the number, the more accurate the result; the lower the number, the faster the report runs. For example, a samplePercent of 0.5 samples 50% of the data, and a samplePercent of 0.2 samples 20%.
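If you maintain several sampled measures, it may be easier to generate their definitions than to edit the JSON by hand. The sketch below is a hypothetical helper: the function name is invented, and rejecting a samplePercent that is not greater than 0 and at most 1 is an assumption of this helper, not a documented product rule. It produces a measure shaped like the example above, here sampling 20% of the data.

```python
import json


def sampled_distinct_count_measure(name: str, statement_property: str,
                                   sample_percent: float) -> dict:
    """Return a DISTINCT_COUNT measure definition with sampling enabled.

    The structure mirrors the documented example; rejecting values outside
    (0, 1] is an assumption made by this hypothetical helper.
    """
    if not 0 < sample_percent <= 1:
        raise ValueError("samplePercent must be greater than 0 and at most 1")
    return {
        "name": name,
        "aggregation": {
            "type": "DISTINCT_COUNT",
            "samplePercent": sample_percent,
        },
        "valueProducer": {
            "type": "STATEMENT_PROPERTY",
            "statementProperty": statement_property,
            # Matches the documented example; adjust if your values differ.
            "caseSensitive": True,
        },
    }


if __name__ == "__main__":
    # Sample 20% of the data when counting unique activity IDs.
    measure = sampled_distinct_count_measure(
        "Activity Count Sampled", "object.id", 0.2
    )
    print(json.dumps(measure, indent=2))
```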

The image below illustrates how accurate sampling can be with data sets of tens of thousands of results at different sample percentages. 

[Image: sampling.png, sampling accuracy at different sample percentages]

The right samplePercent value depends on the size of the data set and on how much accuracy matters relative to speed for that report. The larger the data set, the more accurate a smaller sample percentage is likely to be, so for reports whose data will grow over time, you may want to revisit the card and lower its samplePercent as the data grows.
