Introduction to resource estimation part 4: importance of the covariance function

In our previous post we demonstrated that estimates are all calculated in roughly the same way: either by multiplying a set of weights with all the known samples (IDW and simple Kriging), or by multiplying the weights with the covariance vector between the estimation location and all known samples (Dual Kriging, RBF and ML).

What is clear is that the latter methods produce estimates at least as good as traditional Kriging or IDW, while being a lot faster. That is why we'll mainly focus on those, with a little help from our good old friend simple Kriging and its partner variography.

With this post we finalize our introduction to resource estimation and provide some practical insights into the choice of a variogram model (for all versions of Kriging), an RBF for implicit modelling, or a kernel for Machine Learning. Since these are all roughly the same (see previous post), understanding the basics applies to all interpolation methods.

Covariance matrix and covariance function

Let’s recap briefly how weights are calculated for Kriging:

\[\omega = K^{-1} k(s_0)\]

We have the matrix K and the vector k. Both contain covariances. But what are they and how are they calculated? We’ll start with the matrix K. For simple Kriging it is defined as:

\[ K =
\begin{pmatrix}
C(s_1 - s_1) & \dots & C(s_1 - s_n) \\
\vdots & \ddots & \vdots \\
C(s_n - s_1) & \dots & C(s_n - s_n)
\end{pmatrix}
\]

The C in this matrix is called the Covariance Function (CF). In Ordinary Kriging the matrix is slightly different, mainly because the C's are replaced by the semivariogram γ, which relates to the covariance as γ(h) = C(0) − C(h); with a sill of 1 and a nugget of 0 this simplifies to γ = 1 − C, C again being a covariance function as in simple Kriging. This matrix is fundamental to most methods, which vary mostly in the CF used to calculate its values.
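To make this concrete, here is a minimal NumPy sketch of the whole weight calculation, using the Gaussian CF introduced below and made-up sample locations and values (everything here is hypothetical and purely for illustration):

```python
import numpy as np

def gaussian_cov(d, r):
    """Gaussian covariance function with control parameter r (see the CF list below)."""
    return np.exp(-(d / r) ** 2)

# Made-up sample locations (X, Y, Z), sample values and an estimation location s0.
samples = np.array([[0.0, 0.0, 0.0],
                    [10.0, 0.0, 0.0],
                    [0.0, 10.0, 5.0]])
values = np.array([1.2, 0.8, 1.5])
s0 = np.array([5.0, 5.0, 2.0])
r = 15.0  # range-like control parameter

# Matrix K: covariances between all pairs of sample locations.
pair_dists = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
K = gaussian_cov(pair_dists, r)

# Vector k(s0): covariances between the estimation location and each sample.
k = gaussian_cov(np.linalg.norm(samples - s0, axis=-1), r)

# Weights omega = K^-1 k(s0), then the estimate as weights times sample values.
weights = np.linalg.solve(K, k)
estimate = weights @ values
```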

Covariance function terminology

Calculating the weights for methods like RBFs or SVMs involves a covariance matrix K in some form, as well as a covariance vector k. So, are they all the same then? In essence, yes, but each method uses a different name. Some are slightly different but can be derived from each other, such as the covariance and the semivariogram; since one can be derived directly from the other, we will call all of them a CF. Let's look at the different names used for a CF:

  • Kriging: Covariances, Semivariograms, or variogram models
  • RBFs: Radial Basis functions
  • SVMs: Kernels
  • NNs: Activation Functions

To further highlight the similarities, let's list some common CFs, starting with some variogram models (we will assume a nugget of 0 and a sill of 1, as we just want to highlight the functions). Don't be put off by the functions; just note the major parameters used to define them.

Commonly used Covariance Functions (CFs)

First, we need to define some variables: the distance d between the estimation location $s_0$ and a sample location $s_x$, $d = \|s_x - s_0\|$; a range $a$; and a control parameter similar to the range, $r$.

Kriging

  • Gaussian: $\exp(-\frac{d^2}{r^2})$
  • Exponential: $\exp(-\frac{d}{r})$
  • Spherical: $1.5 \frac{d}{a} - 0.5 (\frac{d}{a})^3$ if $d < a$, otherwise 1, where $a$ is the range

RBFs

  • Gaussian: as above
  • Inverse Multiquadric: $\frac{1}{\sqrt{1 + ( \frac{d}{r})^2}}$

SVMs

  • Gaussian: as above
  • Linear: $\langle s_i, s_j \rangle$, the dot product between two points $s_i$ and $s_j$, which are vectors with X, Y and Z
  • Polynomial: $\langle s_i, s_j \rangle^n$, where n is 2 or more.

NNs

  • Gaussian: as above
  • Sigmoid: $\frac{1}{1 + \exp(-d)}$

Worth noting: each CF depends on the distance in some form, plus a tuning parameter such as the range. Even though we did not write the dot products for SVMs as a distance, the dot product is closely related to it: $\|s_i - s_j\|^2 = \langle s_i, s_i \rangle - 2\langle s_i, s_j \rangle + \langle s_j, s_j \rangle$.
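To make the similarity explicit, here is a sketch of these CFs as Python functions (a hypothetical translation of the lists above; each takes the distance d plus, where needed, a tuning parameter):

```python
import numpy as np

# Each function takes a distance d and, where needed, a tuning
# parameter (range a or control parameter r), mirroring the lists above.
def gaussian(d, r):               # Kriging, RBFs, SVMs, NNs
    return np.exp(-(d ** 2) / (r ** 2))

def exponential(d, r):            # Kriging
    return np.exp(-d / r)

def spherical(d, a):              # Kriging (variogram form)
    d = np.asarray(d, dtype=float)
    g = 1.5 * (d / a) - 0.5 * (d / a) ** 3
    return np.where(d < a, g, 1.0)

def inverse_multiquadric(d, r):   # RBFs
    return 1.0 / np.sqrt(1.0 + (d / r) ** 2)

def linear_kernel(si, sj):        # SVMs: dot product between two points
    return np.dot(si, sj)

def sigmoid(d):                   # NNs
    return 1.0 / (1.0 + np.exp(-d))
```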

A word of warning: in mining, the term RBF is often used synonymously with implicit modelling, i.e. referring to fitting RBFs. However, in Machine Learning the term RBF kernel or RBF function often denotes specifically the Gaussian function. Just keep in mind that an RBF can be a range of different functions; we will mostly refer to fitting RBFs as implicit modelling, given its common usage in the mining industry.

Also worth noting is that we specify the distances as the distance between our sample locations $s_x$, referring to points in Euclidean space. In RBF-speak and RBF NNs those locations are referred to as centers, whereas in SVMs they are called basis vectors or support vectors.

What we hope to have demonstrated by listing some common CFs is that most of them rely on some form of the distance between points. And although we are mostly dealing with spatial data (X, Y, Z) and thus defined Euclidean distances, in ML distances can be calculated between any two arbitrary vectors. Hence, a distance could be between grade values, RGB values in images, combinations of spatial coordinates and other values, or even categorical data such as text after conversion to tokens (a completely different topic we won't discuss here).

For good estimation this means that the choice of method and CF is at the heart of any interpolation, followed by an appropriate value for the range. But what are good choices?

Choosing a covariance function

Until now we have somewhat loosely called all the functions above CFs. Strictly speaking, some are not covariance functions, but we'll keep using this generic term so that all interpolation methods share a common language.

It should be clear by now that the CF is crucial for any prediction, as it is fundamental to the calculation of the weights. Luckily, geostatistics has developed a method to help choose a CF. This is where variography comes into play.

Variography

Although we won't go into detail on variography, the key point is that it is a tool that can help determine a good CF from the known data, however crude the fit may be, and give a feel for the range parameter.

In practice, especially in exploration phases, one of the common CFs is simply chosen and the results are inspected visually to see whether the resulting surface or shell fits the expected shape; full variographic analysis is deferred until detailed resource models need to be calculated. When different CFs are tested without a full analysis, quick evaluation of the initial estimates becomes essential, as it allows multiple scenarios (methods and/or CFs) to be tested in a short time.
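For those curious, a crude experimental semivariogram, the starting point of variography, can be sketched in a few lines (a simplistic illustration rather than production variography; the binning scheme and function name are our own):

```python
import numpy as np

def experimental_variogram(locations, values, lag, n_lags):
    """Average half squared difference of sample values, binned by separation distance."""
    dists = np.linalg.norm(locations[:, None, :] - locations[None, :, :], axis=-1)
    sq_diffs = (values[:, None] - values[None, :]) ** 2
    lags, gammas = [], []
    for i in range(n_lags):
        # Pairs whose separation falls in this lag bin (self-pairs excluded).
        mask = (dists > i * lag) & (dists <= (i + 1) * lag)
        lags.append((i + 0.5) * lag)
        gammas.append(0.5 * sq_diffs[mask].mean() if mask.any() else np.nan)
    return np.array(lags), np.array(gammas)
```

Plotting the resulting gammas against the lags and comparing the shape against the CFs listed above is essentially what variogram fitting does.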

Some other points of interest

Performance

A major factor in the development of the various estimation methods has been performance, an increasingly important consideration during drilling phases as the industry shifts towards near real-time updates of models. For coarse grids (large blocks) this is no issue and Kriging or even IDW copes quite well. However, for fine grids (lots of blocks) slow methods become prohibitive, which is why improvements to Kriging have been developed.

RBFs

As we mentioned, the shift from standard Kriging to Dual Kriging and RBFs greatly improved performance. This was improved further by the fast RBF method, which exploits the fact that points far away do not play a major role in local estimates.
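We can illustrate the locality idea, though not the actual fast RBF algorithm itself, with a simple sketch: once the weights are known, only samples within a few ranges of the estimation point contribute noticeably, so a spatial index can skip the rest (all data below is made up):

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical data: sample locations and already-computed RBF weights.
rng = np.random.default_rng(0)
samples = rng.uniform(0, 1000, size=(5000, 3))
weights = rng.normal(size=5000)
r = 50.0  # control parameter of the Gaussian RBF

# Query a KD-tree for nearby samples instead of summing over all points;
# beyond roughly 3r the Gaussian contribution is negligible.
tree = cKDTree(samples)
s0 = np.array([500.0, 500.0, 500.0])
nearby = tree.query_ball_point(s0, r=3 * r)

d = np.linalg.norm(samples[nearby] - s0, axis=-1)
estimate = np.sum(weights[nearby] * np.exp(-(d / r) ** 2))
```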

ML

SVMs, due to the way their weights are calculated, can filter the input points down to a subset that still describes the geology or grades reliably. For example, when interpolating grades you will have many samples at 0.0 or 'below detection'. Many of those do not contribute significantly to an estimate and can therefore be ignored: imagine a single point within a region receiving a high weight, replacing many other points in that region that each carried a small weight.
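A hypothetical sketch of this filtering using scikit-learn's SVR (the data is made up; the point is that only a subset of the samples end up as support vectors):

```python
import numpy as np
from sklearn.svm import SVR

# Made-up grade samples: mostly zeros with a few mineralized points.
rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(200, 3))   # sample locations (X, Y, Z)
y = np.where(X[:, 0] > 80, 2.0, 0.0)     # toy grades, zero almost everywhere
y += rng.normal(scale=0.05, size=200)

# The epsilon-insensitive loss lets SVR ignore points it can already
# predict within +/- epsilon; only the remaining points become support vectors.
model = SVR(kernel="rbf", gamma=0.01, C=10.0, epsilon=0.1).fit(X, y)
print(f"kept {len(model.support_)} of {len(X)} samples as support vectors")
```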

NNs

NNs have been optimized for use on a GPU (Graphics Processing Unit). A GPU consists of many small processing cores, and because there are so many of them it can calculate things in parallel quickly. However, a distinction between NNs and the other methods is the way weights and centers are estimated. Instead of direct calculation as for SVMs and RBFs, NNs start with random values for the weights (and centers, if desired) and find the best weights through 'trial and error'.

Notice how in the RBF network above the kernel was also a Gaussian with centers c. In typical networks those centers would be trained from the data together with the weights. However, since we know those centers are our sample locations, we can 'pre-train' our RBF network by creating a number of hidden nodes equal to the number of sample points and setting each center equal to a sample location.
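A minimal sketch of such a 'pre-trained' RBF network, with one hidden node per sample and only the output weights fitted (via least squares here rather than backpropagation; the function names are our own):

```python
import numpy as np

def fit_rbf_network(samples, values, r):
    """One hidden node per sample, center fixed at that sample's location;
    only the output weights are fitted."""
    d = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    phi = np.exp(-(d / r) ** 2)              # hidden-layer activations
    weights, *_ = np.linalg.lstsq(phi, values, rcond=None)
    return weights

def predict(s0, samples, weights, r):
    """Evaluate the network at a new location s0."""
    d = np.linalg.norm(samples - s0, axis=-1)
    return np.exp(-(d / r) ** 2) @ weights
```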

An NN can also reduce its centers, and, more interestingly, it can find better ones. Remember that centers typically sit at the sample locations, but for estimation those might not be the optimal positions. By training them, better centers could be found, reducing the number of weights. This is purely theoretical, as we have not tested it ourselves. A downside is that training will take significantly longer and typically requires much more data, especially for more complex (multi-layered) NNs.

Updating weights

Another difference between ML methods and RBFs is the ability to update weights without recomputing everything. RBF weights need to be recalculated completely when a new drill hole arrives, which for large data sets can still take a significant amount of time. ML methods have been developed to allow 're-training', meaning the weights of an existing trained system are updated. Again, we have not tested this ourselves (yet), but it could become a powerful concept for future modelling systems.
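As a hypothetical illustration of the mechanism (not of spatial interpolation itself, since this particular model is linear), scikit-learn's SGDRegressor exposes partial_fit to update weights with new data instead of refitting from scratch:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Made-up data: an initial training set and a new drill hole arriving later.
rng = np.random.default_rng(2)
X_old = rng.uniform(0, 100, size=(500, 3))
y_old = rng.normal(size=500)
X_new = rng.uniform(0, 100, size=(20, 3))
y_new = rng.normal(size=20)

model = SGDRegressor()
model.fit(X_old, y_old)          # initial training
model.partial_fit(X_new, y_new)  # weights updated, not recomputed
```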

Anisotropy

Nowhere in the text above did we mention anything about anisotropy, yet we all know it plays a huge role in geology. So how does it affect the above? To partially answer that question, consider the use of the distance measure in the kernels again. In the case of a 'global' interpolation (global in the sense of a domain whose trend is constant), we can take the anisotropy into account by adjusting our distance: instead of the plain distance, we use a scaled distance adjusted according to the anisotropy. Everything written above then remains the same, except for our definition of the distance.
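A sketch of such a scaled distance, assuming a rotation matrix R that aligns the axes with the principal directions of the anisotropy (identity here for simplicity; all numbers are made up):

```python
import numpy as np

def anisotropic_distance(si, sj, R, ranges):
    """Rotate the separation vector into the principal directions,
    then divide each component by its directional range."""
    h = R @ (si - sj)
    return np.linalg.norm(h / np.asarray(ranges))

# Example: major axis along X with range 100, semi-minor 50, vertical 10.
R = np.eye(3)  # no rotation for simplicity
d = anisotropic_distance(np.array([30.0, 0.0, 0.0]),
                         np.array([0.0, 0.0, 0.0]),
                         R, ranges=[100.0, 50.0, 10.0])
# d is now in 'range units'; plug it into any CF above with r = 1.
```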

Final words

We hope our posts on resource estimation have been informative. What we hoped to demonstrate is that estimation through weighted means is actually not that complicated: in the end, it comes down to multiplying a set of weights with the measured sample values to obtain the estimate. How the weights are calculated depends on the method used and the parameters set by the user. Adjusting the parameters can be done through trial and error, i.e. tweaking them until the shape makes geological sense, or through more advanced analysis using variography.

In the end, the most important setting is the influence of values over distance, determined primarily by the choice of CF and an influence parameter such as the range. Using the same CF and parameters across the various methods will produce very similar results, at which point other factors like performance become more important.

References

Our work on NNs and SVMs is based mainly on over 20 years' experience playing with NNs and several years with SVMs. Most information for this article was gathered from various internet sources, and no single reference would do them justice here; an internet search will turn up plenty of material on these subjects. One exception is the book on SVMs below.

Hardcore SVMs

  • Learning with Kernels, Schölkopf and Smola, 2002

Articles

  • Fast multidimensional interpolations, Horowitz et al., 1996
  • Possible alternatives to geostatistics, Henley and Watson, 1998
  • Generalisation of the moving average to moving statistics (unpublished work), Henley 2013

Hardcore geostats

  • Geostatistics: Modeling Spatial Uncertainty, Chilès and Delfiner

Recommended reading

Linear algebra

  • Coding the Matrix: Linear Algebra through Computer Science Applications, Klein, 2013

Kriging and variography

  • Local Models for Spatial Analysis, 2nd edition, Lloyd, 2011