ICAEW.com works better with JavaScript enabled.
Exclusive

Tip of the Week

Excel Tip of the Week #400 - VBA case study - Exhausting a question bank

Author: David Lyford-Smith

Published: 23 Jun 2021

Exclusive content
Access to our exclusive resources is for specific groups of students, users, subscribers and members.

Hello all and welcome back to the Excel Tip of the Week! This week we have a Developer-level post in which we are going to take a particular complicated scenario we want to model, and we’re going to use it to get deeper into understanding how loops and variables can help us when we’re working in VBA.

The basic problem

Imagine you roll a normal die and get a 5. Next time, you get a 4, and then another 5. How many rolls should you expect to need before you have seen all six sides of the die?

This version is fairly simple to solve explicitly mathematically. The first roll will give us a never-before-seen face for sure, so that will take 1 roll. Then 5/6 of the faces are novel, so we should expect to need another 6/5 rolls to see one of them. Then we have 4 new faces to find, which on average will take 6/4 rolls, and so on. That gives us:

6/6 + 6/5 + 6/4 + 6/3 + 6/2 + 6/1 = 14.7 

So on average we should expect to need 14.7 rolls.

This same logic applies for dice with different numbers of faces, or other similar situations. For example, if you are trying to collect a complete set of eleven football stickers, and they are equally likely to show up, then the same calculation tells you to expect to need to buy 33.2 packets of stickers on average.

In Excel 365 this calculation can be easily written for a number of events as:

=SUM(A1/SEQUENCE(A1))

Taking it to the limit

Now we are going to move to the more complicated version of our situation that we will be exploring via modelling with VBA. We imagine we have a question bank for an exam (or any other similar situation). The exam consists of a certain number of question slots, and there are multiple potential questions for each slot. How many times would somebody have to take the exam to see every question? This scenario is analogous to plenty of other situations with more complicated random behaviour.

Now getting an exact answer mathematically is much less approachable. Instead, we will simulate the process of taking and retaking the exam until every question has been seen, and then automate re-running that simulation many times, until we have enough data to approximate the result.

Let’s look at how to simulate this all in Excel. We’ll start by clearing the output range of anything from previous times the macro has been run and then defining our variables, and asking the user to set the values of the ones that are user-defined:

Sub TestTaker()

Range("A:A").ClearContents

Dim ExamSize As Integer
ExamSize = InputBox("How many questions in the exam?")

Dim PoolDepth As Integer
PoolDepth = InputBox("How many possibilities for each question?")

Dim SimulationSize As Integer
SimulationSize = InputBox("How many simulations to run?")

Dim QuestionsSeen As New Collection
Dim InCollection As Boolean
Dim LoopCounter As Integer
Dim QuestionValue As String

[code goes here]

End Sub

The first three variables are easily understandable by the InputBox statements that define them. QuestionsSeen is a collection of all the questions seen in the current run, InCollection is a true/false that we’ll use to check if a new question needs to be added to the collection or has been seen before, LoopCounter will track how many exams we have to take until our collection is complete, and QuestionValue is the ID of the current question we’re looking at.

Now we need a For statement that will run our simulation the required number of times:

For i = 1 To SimulationSize
  LoopCounter = 0
  Do While QuestionsSeen.Count > 1
    QuestionsSeen.Remove (1)
  Loop
  [code to run a simulation goes here]
  Cells(i, 1) = LoopCounter
Next i

We start by setting LoopCounter to 0, and clearing out anything in the QuestionsSeen collection so we can start afresh. We then run the process of taking exams until we’ve seen all the questions, counting how many times we do that by incrementing LoopCounter, and then finally output the number of attempts it took us in column A.

Let’s dive deeper and see how we’re actually simulating taking the exam many times:

Do While QuestionsSeen.Count < ExamSize * PoolDepth
  LoopCounter = LoopCounter + 1
  For j = 1 To ExamSize
    InCollection = False
    QuestionValue = j & "/" & WorksheetFunction.RandBetween(1, PoolDepth)
    For k = 1 To QuestionsSeen.Count
      If QuestionsSeen(k) = QuestionValue Then
        InCollection = True
      Else
      End If
    Next k
    If InCollection Then
    Else
      QuestionsSeen.Add (QuestionValue)
    End If
  Next j
Loop

Breaking this down into English:

  • We are going to run this process until the number of items in our QuestionsSeen collection is the total size of the question pool, aka the number of question slots multiplied by the depth of the question pool for each slot
  • We increment the number of loops by 1 (we start at 0 so we do this for every loop)
  • We run a for-loop that is the length of the exam, simulating a single taking of the test
  • For each question, we generate a random candidate in a format such as 1/2 – the question slot / the specific option that showed up this time
  • We then check through the QuestionsSeen collection to see if we have seen this one before
  • If we haven’t, we add the question to QuestionsSeen
  • We continue until all questions have been seen

The final output of this code will be a column listing how many exams had to be taken to exhaust the question bank for each simulation. From this we can get a sense of our expectation. For example, if we have a 30-question slot exam with four candidate questions for each slot, on average we should expect a test-taker to need to take a little over 19 exams to see all the questions. But more importantly, we can see how to use collections, For loops, Do While loops, and a few variables to model a complex scenario.

You can check out the full code in the attached file.

Excel community

This article is brought to you by the Excel Community where you can find additional extended articles and webinar recordings on a variety of Excel related topics. In addition to live training events, Excel Community members have access to a full suite of online training modules from Excel with Business.

Excel polaroid
Archive and Knowledge Base

This archive of Excel Community content from the ION platform will allow you to read the content of the articles but the functionality on the pages is limited. The ION search box, tags and navigation buttons on the archived pages will not work. Pages will load more slowly than a live website. You may be able to follow links to other articles but if this does not work, please return to the archive search. You can also search our Knowledge Base for access to all articles, new and archived, organised by topic.

Open AddCPD icon

Add Verified CPD Activity

Introducing AddCPD, a new way to record your CPD activities!

Log in to start using the AddCPD tool. Available only to ICAEW members.

Add this page to your CPD activity

Step 1 of 3
Download recorded
Download not recorded

Please download the related document if you wish to add this activity to your record

What time are you claiming for this activity?
Mandatory fields

Add this page to your CPD activity

Step 2 of 3
Mandatory field

Add activity to my record

Step 3 of 3
Mandatory field

Activity added

An error has occurred
Please try again

If the problem persists please contact our helpline on +44 (0)1908 248 250