Simple introduction to resource estimation part 2: From Kriging to Machine Learning

A fancier way to create estimates: Kriging

Actually, the title should be “A fancier way to estimate the weights”, because in part 1 we demonstrated that the quality of an estimate depends only on the weights we use. So how do we find the ‘best’ weights for our estimates? This is where some statistics and mathematics come into the equation, although we will not go through the details in this post.

In part 1 we only produced estimates for some unknown time in the future. Can we measure how good such an estimate actually is? For something in the future we cannot, but the situation is different for spatial estimates such as grades. So here we’ll switch to real-world assay data and leave our little babies behind.

From estimating the future to grade shells

During drilling we sample grade values of some possible resource. This means that instead of measuring people’s lengths at some point in time, we measure the grade value at various positions with coordinates X, Y, Z. Estimates in this scenario are then produced at locations where samples have not been taken. This way we can estimate on various types of grids to produce grade shells.

In these situations we can get a handle on how good an estimation actually is: we can produce estimates at known sample locations and compare them to the values we measured there. That, in a nutshell, is what Kriging does.

Using sample locations to determine weights: Ordinary Kriging

In Kriging, calculating weights can be thought of as a two-stage process. In a first pass, the similarity between each sample point and all other sample points is calculated using a variogram model. In a second step, the location where the estimate itself is to be made is used to further adjust the weights, based on the similarity between that estimation location and each known sample, using the same variogram model.

Variogram

Remember that for IDW we can use the distance directly, or use its squared or cubed value, to control how much influence points further away have on our estimate. Likewise, in variography this is established by creating a variogram using a spherical, Gaussian or exponential model (or a combination). The variogram model determines how quickly distance reduces a point’s influence on the estimates. Through variography we therefore get an idea of the influence distance should have on our estimates. In general: the further a point is from another point, the smaller its contribution to an estimate, just like with IDW.
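
To make this a bit more concrete, below is a small numpy sketch of the three model shapes. The nugget is left out, the sill is set to 1 and the range of 100 is made up purely for illustration. Notice how the semivariance grows with distance, which is just another way of saying that the similarity, and thus the weight, shrinks.

```python
import numpy as np

# Illustrative variogram model shapes (nugget = 0, sill = 1 for simplicity).
# 'h' is the separation distance between two points, 'a' the range parameter.

def spherical(h, a):
    """Spherical model: rises steadily and flattens exactly at the range."""
    h = np.minimum(h, a)
    return 1.5 * (h / a) - 0.5 * (h / a) ** 3

def exponential(h, a):
    """Exponential model: approaches the sill asymptotically."""
    return 1.0 - np.exp(-3.0 * h / a)

def gaussian(h, a):
    """Gaussian model: very smooth near the origin, then flattens."""
    return 1.0 - np.exp(-3.0 * (h / a) ** 2)

distances = np.linspace(0.0, 150.0, 7)
print(spherical(distances, a=100.0))    # semivariance grows with distance,
print(exponential(distances, a=100.0))  # i.e. similarity (and weight) shrinks
print(gaussian(distances, a=100.0))
```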

We can now use some clever math to ‘solve’ this puzzle and obtain our final weights, which we multiply by our sample values to get our final estimate, just like in our weighted average.
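
For those curious what that ‘clever math’ roughly looks like, here is a minimal Ordinary Kriging sketch with a handful of made-up sample locations and grades. It builds the sample-to-sample similarity matrix from the variogram (stage 1), then solves for the weights of one estimation location (stage 2); the extra row and column of ones simply force the weights to sum to 1.

```python
import numpy as np

def spherical(h, a=100.0, sill=1.0):
    # Spherical variogram model (nugget-free), as in the sketch above.
    h = np.minimum(h, a)
    return sill * (1.5 * h / a - 0.5 * (h / a) ** 3)

# Hypothetical sample locations (X, Y) and grades.
pts = np.array([[10.0, 20.0], [40.0, 25.0], [30.0, 60.0], [70.0, 55.0]])
grades = np.array([1.2, 0.8, 1.5, 0.6])
n = len(pts)

# Stage 1: similarity between every pair of samples (variogram matrix),
# extended by one row/column of ones for the unbiasedness constraint.
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
A = np.ones((n + 1, n + 1))
A[:n, :n] = spherical(d)
A[n, n] = 0.0

# Stage 2: similarity between the estimation location and each sample.
target = np.array([35.0, 35.0])
b = np.ones(n + 1)
b[:n] = spherical(np.linalg.norm(pts - target, axis=1))

w = np.linalg.solve(A, b)[:n]  # kriging weights (Lagrange term dropped)
print(w, w.sum())              # the weights sum to 1
print(float(w @ grades))       # the weighted-average estimate
```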

From Kriging to implicit modelling with RBFs

For Kriging, we mentioned how a variogram model determines the influence of far-away points. The variogram model is therefore very similar to a Radial Basis Function (RBF): a function that decays with increasing distance. That is why you will see similar names used in ‘Implicit Modelling’, i.e. RBFs. The most common RBFs are the Gaussian and the Spheroidal, but other functions or variogram models can be used as well. Such a function is nothing more than a way to adjust the weights with distance (again, compare with IDW).

Same, same but different? Faster estimation methods

Now that we understand that an RBF is similar to a variogram model, determining how quickly weights decay with distance, why are they not the same, and why are RBFs so much quicker at producing estimates?

Without going into details, recall that in Kriging we produce estimates in two stages. For larger datasets these steps become very calculation-intensive. But what if we could determine the weights of stage 1 in one go and re-use them for every estimate? That is roughly what RBFs do, and it is called fitting or creating an interpolant: the weights between each sample and every other sample are determined only once and re-used each time thereafter. That is why RBFs take a bit of time up front, but then produce estimates much faster than naïve Ordinary Kriging.
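
Here is a minimal sketch of what ‘fitting an interpolant’ means, again with made-up samples and a Gaussian RBF: the weights are solved once from the sample-to-sample distances, and every later estimate just re-uses them.

```python
import numpy as np

def gaussian_rbf(h, eps=0.02):
    # Gaussian RBF: decays smoothly with distance
    # (compare with 1 minus the Gaussian variogram above).
    return np.exp(-(eps * h) ** 2)

# Hypothetical samples, as before.
pts = np.array([[10.0, 20.0], [40.0, 25.0], [30.0, 60.0], [70.0, 55.0]])
grades = np.array([1.2, 0.8, 1.5, 0.6])

# 'Fitting the interpolant': solve for the weights ONCE, using only the
# sample-to-sample distances.
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
weights = np.linalg.solve(gaussian_rbf(d), grades)

# Every later estimate re-uses those weights; only the distances to the
# new location change.
def estimate(target):
    h = np.linalg.norm(pts - np.asarray(target, dtype=float), axis=1)
    return float(gaussian_rbf(h) @ weights)

print(estimate([35.0, 35.0]))
print(estimate([50.0, 40.0]))
```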

Selective points

However, recall also that producing good estimates does not depend on the weights alone; the sample points themselves play an important role too. By selectively including or excluding points we can further control our estimates. In Kriging this is typically done through a search radius that determines which points are included locally for an estimate. In implicit modelling with fast RBFs, advanced methods are used to speed up estimation. These typically depend on the distance between samples, based on the idea that points too far away will not influence an estimate. This can be compared a little to the use of a search ellipsoid.
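
As a simple illustration (a plain isotropic search radius rather than the clever neighbour schemes real implementations use), selecting which points take part in an estimate could look something like this:

```python
import numpy as np

# Hypothetical samples; the last one is deliberately far away.
pts = np.array([[10.0, 20.0], [40.0, 25.0], [30.0, 60.0],
                [70.0, 55.0], [250.0, 300.0]])
grades = np.array([1.2, 0.8, 1.5, 0.6, 3.0])

target = np.array([35.0, 35.0])
search_radius = 100.0  # an ellipsoid would simply scale each axis differently

h = np.linalg.norm(pts - target, axis=1)
inside = h <= search_radius
print(pts[inside], grades[inside])  # only nearby samples are passed to the estimator
```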

From RBFs to Machine Learning (ML)

Support Vector Machines (SVMs)

In many Machine Learning applications, for example Support Vector Machines (SVMs), a technique very similar to fitting RBFs is used, through slightly different mathematics. In SVMs people talk about kernels; a kernel, however, is just another name for an RBF or a variogram model. In fact, certain types of SVMs are nearly identical to fitting RBFs.

One benefit of SVMs is that they extract the most important points needed for the estimations, ignoring samples that do not contribute (compare our earlier example of estimating the length of a baby from the Netherlands). The end result is still a set of weights for the significant sample points, which are then used for subsequent estimations, providing the same benefit as RBFs once those initial weights are established.
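
As a small illustration using scikit-learn’s SVR with an RBF kernel (the data is a made-up toy grade field, not real assays): after fitting, only the support vectors carry weight, and predictions re-use those weights, much like an RBF interpolant does.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical 2D sample locations and a toy grade field.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 100.0, size=(200, 2))
grades = np.sin(pts[:, 0] / 20.0) + 0.1 * rng.normal(size=200)

# An RBF-kernel SVR: 'gamma' plays the role of the range/decay parameter.
svr = SVR(kernel="rbf", gamma=0.001, C=10.0, epsilon=0.05).fit(pts, grades)

# Only a subset of the samples ends up carrying weight: the support vectors.
print(len(svr.support_), "of", len(pts), "samples used")

# Estimation at new locations re-uses those weights.
print(svr.predict(np.array([[35.0, 35.0], [50.0, 40.0]])))
```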

Neural Networks (NNs)

An alternative ML implementation is a Neural Network (NN), which forms the foundation of most well-known ML applications. NNs are the key ingredient for Deep Learning in systems such as the Multi-Layer Perceptron (MLP), Convolutional Neural Networks (CNNs), etc. NNs can be used for exactly the same purposes as fitting RBFs or SVMs. A NN consists of a number of nodes, where each node contains an activation function and a weight. The weights and function outputs of all nodes are then multiplied and added together (just like in weighted averaging). The activation function can again be thought of as a variogram model and is often a Gaussian or sigmoid function. A NN is then trained to adjust these functions and weights to produce estimates: the NN produces an estimate, compares it to a known sample, and uses the error in the estimate to adjust its internal weights, so training becomes a bit like trial and error. There are some more subtleties, but this should give you a rough idea.
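
As a final sketch, here is a tiny network on the same toy data using scikit-learn’s MLPRegressor with a sigmoid (‘logistic’) activation; the training loop inside repeats exactly the estimate, compare, adjust cycle described above. The layer size and scaling are made up for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# The same toy grade field as in the SVR sketch.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 100.0, size=(200, 2))
grades = np.sin(pts[:, 0] / 20.0) + 0.1 * rng.normal(size=200)

# A small network: one hidden layer of nodes, each with a sigmoid ('logistic')
# activation and its own weights. Training repeatedly produces estimates,
# measures the error against the known samples and nudges the weights.
nn = MLPRegressor(hidden_layer_sizes=(16,), activation="logistic",
                  solver="lbfgs", max_iter=5000, random_state=0)
nn.fit(pts / 100.0, grades)  # inputs scaled to [0, 1] to help training

# Estimates at new (scaled) locations.
print(nn.predict(np.array([[0.35, 0.35], [0.50, 0.40]])))
```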

Final words

In this and the previous post we introduced the concept of a weighted average to estimate values. We showed that if all samples remain the same, the only difference lies in how we establish the weights. Using more advanced methods to calculate those weights, we are able to produce better estimates. A main factor is the influence of points further away, which is controlled by things like the power in IDW, the variogram model in Kriging, an RBF in implicit modelling or a kernel in SVMs. The user typically controls these by choosing the model and by setting variables such as the range.