How does one fit a quadratic-plateau model in R?

Like this:

`quad_plateau(data)`

Or at least that is how it works once I explain a few things. Like my post on linear-plateau models in R, this post is for the person^{1} who has been googling how to fit quadratic-plateau models in R. This post provides my creative function that helps one do quadratic-plateau fits *and* make a `ggplot`

style figure all at once.

So first, get out your library card and borrow some packages.

```
library(tidyverse) # for data manipulation and plotting and so on
library(minpack.lm)# for the model fitting
library(broom) # for getting model info in a tidy way
library(rcompanion)# for the r-squared
library(nlstools) # for confidence intervals
library(agridat) # for the data
```

## The Data

The data is from some cotton trials in the `agridat`

package. Basically, more potassium measured in the soil should correlate to more cotton yield, but likely the relationship will not be linear. Fitting a model to determine something about this relationship is central to the soil-test correlation process. Let’s plot it to see what we’re working with.

```
cotton <- tibble(potassium = agridat::cate.potassium$potassium,
yield = agridat::cate.potassium$yield)
cotton %>%
ggplot(aes(potassium, yield)) +
geom_point(size = 2, alpha = 0.5) +
geom_rug(alpha = 0.5, length = unit(2, "pt")) +
scale_x_continuous(breaks = seq(0, 1000, 20)) +
scale_y_continuous(breaks = seq(0, 200, 5)) +
labs(title = paste(
"Relative yield and soil potassium for", nrow(cotton), "cotton trials"),
y = "Relative yield (%)",
x = "Concentration of Soil Potassium (ppm)")
```

The relationship between soil-test potassium (STK) and relative cotton yield may be nonlinear^{2}. Perhaps a polynomial function could be fit, or the data could be transformed, but we’ll fit a non-linear model known as the quadratic-plateau. Quadratic-plateau (QP) models are also called **quad-plats**^{3}. Quad-plats are good in that they have a curved component (important to biological systems) followed by a join point and a zero-slope plateau (convenient for researchers). Notice how the relative yield of the cotton does not appear to increase anymore from about 60-200 ppm K (the plateau!)

## The Function(s)

The equation for the QP is essentially that of a quadratic up until the join point: y = B0 + B1x + Bx^2. While we could define the QP equation in the `nls()`

function later, lets make a function. Actually, let’s make two functions^{4}.

```
# b0 = intercept
# b1 = slope
# b2 = quadratic term
# jp = join point = critical concentration
qp <- function(x, b0, b1, b2) {
jp <- -0.5 * b1 / b2
if_else(
condition = x < jp,
true = b0 + (b1 * x) + (b2 * x * x),
false = b0 + (b1 * jp) + (b2 * jp * jp)
)
}
qp_jp <- function(x, b0, b1, jp) {
b2 <- -0.5 * b1 / jp
if_else(
condition = x < jp,
true = b0 + (b1 * x) + (b2 * x * x),
false = b0 + (b1 * jp) + (b2 * jp * jp)
)
}
```

These functions say that up until some join-point, the relationship is a second-order polynomial (quadratic), after which it hits a flat line (plateau). This model is sometimes also called quadratic*- plus-*plateau. That join-point is important; in the context of soil fertility it represents a critical concentration for the nutrient in the soil. The join point of a quad-plat will always be higher than with a linear-plateau, an important consideration for another time and place.

The `qp`

and `qp_jp`

functions will be used inside the `nlsLM()`

functions inside my `quad_plateau()`

function soon. So look for them there.

Next is my big `quad_plateau()`

function that fits the QP model and either gets useful numbers into a table or plots it visually. I’m not a function wizard, but I’m trying, so forgive any poor coding practices. That’s why in my function, I’ve just put in the variables `yield`

and `potassium`

. Here are the steps of the function:

Get initial values for the

`nlsLM()`

function coming up.- this can be done reliably by fitting a quadratic model, and yanking out coefficients.

Run

`nlsLM()`

^{5}with the`qp()`

and`qp_jp()`

functions as part of the formula.Make a null model, and get a pseudo-R

^{2}value.Get a confidence interval for the join-point (critical soil test value).

Get coefficients from model for use in table and plot.

Create a table of results.

OR

Plot the cotton yield data with a regression line using

`nlsLM`

and the model final values as starting values.

```
quad_plateau <- function(data, confidence = 95, plot = FALSE) {
# initial values
start <- lm(yield ~ poly(potassium, 2, raw = TRUE),
data = data)
start_b0 <- start$coef[[1]]
start_b1 <- start$coef[[2]]
start_b2 <- start$coef[[3]]
start_jp <- mean(data$potassium)
### Quad plateau model ###
# nlsLM tends to converge/work better on average than nls, IMO
try(corr_model <<- minpack.lm::nlsLM(
formula = yield ~ qp(potassium, b0, b1, b2),
data = data,
start = list(b0 = start_b0,
b1 = start_b1,
b2 = start_b2)
))
try(corr_model_jp <<- minpack.lm::nlsLM(
formula = yield ~ qp_jp(potassium, b0, b1, jp),
data = data,
start = list(b0 = start_b0,
b1 = start_b1,
jp = start_jp)
))
# Define null model
nullfunct <- function(x, m) {
m
}
m_ini <- mean(data$yield)
null <- nls(yield ~ nullfunct(potassium, m),
data = data,
start = list(m = m_ini),
trace = FALSE)
# Find p-value and pseudo R-squared
model_error <- round(summary(corr_model)$sigma, 2)
pseudo_r2 <- nagelkerke(corr_model, null)
label_pseudo_r2 <- round(pseudo_r2$Pseudo.R.squared.for.model.vs.null[3], 2)
# print and set confidence interval (use JP model)
ci <- nlstools::confint2(corr_model_jp, level = (confidence / 100))
ci_lower <- round(ci[3, 1], 0)
ci_upper <- round(ci[3, 2], 0)
# get model coefficients
b0 <- tidy(corr_model)$estimate[1]
b1 <- tidy(corr_model)$estimate[2]
b2 <- tidy(corr_model)$estimate[3] # quadratic term of equation
# critical concentration value = join point of segmented regression
cc <- tidy(corr_model_jp)$estimate[3]
plateau <- round(b0 + (b1 * cc) + (b2 * cc * cc), 1)
equation <- paste0(round(b0, 1), " + ",
round(b1, 2), "x + ",
round(b2, 3), "xx")
se_b0 <- round(tidy(corr_model)$std.error[1], 2)
se_b1 <- round(tidy(corr_model)$std.error[2], 2)
se_b2 <- round(tidy(corr_model)$std.error[3], 3)
pval_b0 <- tidy(corr_model)$p.value[1]
pval_b1 <- tidy(corr_model)$p.value[2]
pval_b2 <- tidy(corr_model)$p.value[3]
pval_b0 <- case_when(
tidy(corr_model)$p.value[1] > 0.05 ~ "NS",
tidy(corr_model)$p.value[1] <= 0.001 ~ "***",
tidy(corr_model)$p.value[1] <= 0.01 ~ "**",
tidy(corr_model)$p.value[1] <= 0.05 ~ "*")
pval_b1 <- case_when(
tidy(corr_model)$p.value[2] > 0.05 ~ "NS",
tidy(corr_model)$p.value[2] <= 0.001 ~ "***",
tidy(corr_model)$p.value[2] <= 0.01 ~ "**",
tidy(corr_model)$p.value[2] <= 0.05 ~ "*")
pval_b2 <- case_when(
tidy(corr_model)$p.value[3] > 0.05 ~ "NS",
tidy(corr_model)$p.value[3] <= 0.001 ~ "***",
tidy(corr_model)$p.value[3] <= 0.01 ~ "**",
tidy(corr_model)$p.value[3] <= 0.05 ~ "*")
# Printouts
if (plot == FALSE) {
tibble(
equation,
plateau,
critical = round(cc, 0),
ci_lower,
ci_upper,
r_square = label_pseudo_r2,
model_error,
AIC = round(AIC(corr_model),0),
BIC = round(BIC(corr_model),0),
se_b0,
se_b1,
se_b2,
pval_b0,
pval_b1,
pval_b2
)
} else {
# Residual plots and normality
#nls_resid <- nlsResiduals(corr_model)
#plot(nls_resid, which = 0)
# ggplot of data
data %>%
ggplot(aes(x = potassium, y = yield)) +
geom_point(size = 2, alpha = 0.5) +
geom_rug(alpha = 0.5, length = unit(2, "pt")) +
geom_vline(xintercept = round(cc, 0),
linetype = 3,
alpha = 0.5) +
geom_line(
stat = "smooth",
method = "nlsLM",
formula = y ~ qp(x, b0, b1, b2),
method.args = list(start = as.list(coef(corr_model))),
se = FALSE,
alpha = 0.5
) +
annotate(
"text",
label = paste(
equation,
"\nR^2 =",
label_pseudo_r2,
"\nPlateau =",
round(plateau, 1),
"% RY",
"\nCritical conc. =",
round(cc, 0),
"ppm",
"\nConf Int [",
ci_lower,
",",
ci_upper,
"]"
),
x = max(data$potassium) * 0.7,
y = min(data$yield) + 5,
hjust = 0,
family = "Roboto Condensed"
) +
scale_x_continuous(breaks = seq(0, 1000, 20)) +
scale_y_continuous(breaks = seq(0, 200, 5))
}
}
```

## The result

So how do you fit a quadratic-plateau model in R? Like this:

`quad_plateau(cotton)`

```
## # A tibble: 1 x 15
## equation plateau critical ci_lower ci_upper r_square model_error AIC BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 12.9 + 1.~ 95.3 86 45 127 0.68 10.8 187 192
## # ... with 6 more variables: se_b0 <dbl>, se_b1 <dbl>, se_b2 <dbl>,
## # pval_b0 <chr>, pval_b1 <chr>, pval_b2 <chr>
```

The model is written out in a data frame, ready for other use and manipulation. This is really helpful if for example you have a grouped data frame of different fertilizer trials, and want to perform a QP fit to each group using something like `group_modify()`

.

If you want to plot it along with an overlay of model results, use the `plot = TRUE`

argument and add your labels. Use `group_map()`

if you want to “loop” this over multiple groups in your data frame.

```
quad_plateau(cotton, plot = TRUE) +
labs(title = paste(
"Relative yield and soil potassium for", nrow(cotton), "cotton trials"),
y = "Relative yield (%)",
x = "Soil Test Potassium (ppm)")
```

The critical soil concentration for potassium appears to be around 86 ppm, higher than the linear-plateau model. The confidence interval for the join-point is also wider than the “lp” model. In fact, the confidence interval is just wide. This is one of the problems with getting the confidence limits from the join point, rather than at a lower point such as 95% of the maximum (plateau), but this is to be discussed at another time and place.

The model summary shows that the quadratic term is not actually statistically significant. It may be then that a linear-plateau model would be better for this dataset. The pseudo-R^{2} is about 0.67, so pretty good for noisy crop yield data. Still, the distribution of points has gaps in it, which limits the confidence of the correlation.

Now, what if the two points with “high” STK (> 180 ppm), were removed? How would that change the analysis? Go back and test yourself^{6}. How do other models compare? Linear-plateau? Mitscherlich-Bray?

## Other resources about nonlinear functions in R

`rcompanion`

package - gave me a good jumpstart and has some useful functions inside`nlstools`

package`nlraa`

package by Fernando Miguez- he also has a series of YouTube videos worth checking out.

`easynls`

package - it’s quick and dirty, but not as useful as it could be`nls.multstart`

package - tries lots of starting values to try and better get at the best values for the equation. But I think for QP models, my approach works fine and using multiple starting values is unnecessary.

Probably a grad student↩︎

No one will judge you if you hold a clear, plastic ruler up to your computer screen.↩︎

This isn’t true, but I wish it was.↩︎

I’m humble enough to admit that, as of now, I can’t figure out how to write one function that gets me 4 model parameters instead of three AND doesn’t yell at me for creating a singularity in space or something. So I’m using two functions, one to get the first three (intercept, slope, and quadratic term) and then the second to get the join point.↩︎

`nlsLM()`

seems more reliable overall than plain`nls()`

from base R. I’ve had decent luck with nls(), but sometimes it fails.`nlsLM()`

seems to fail less, probably due to using a different algorithm. You can test their similarity in reaching the same final results by using the output of`nlsLM()`

as starting values in`nls()`

.↩︎Hint:

`filter()`

↩︎