How does one fit a quadratic-plateau model in R?

Like this:

`quad_plateau(data)`

Or at least that is how it works once I explain a few things. Like my post on linear-plateau models in R, this post is for the person^{1} who has been googling how to fit quadratic-plateau models in R. This post provides three ways to accomplish this, and also shows off my ~~convoluted~~ creative function that helps one do quadratic-plateau fits *and* make a `ggplot`

style figure all at once.

So first, get out your library card and borrow some packages.

```
library(tidyverse) # for data manipulation and plotting and so on
library(nlraa) # for self-starting models
library(minpack.lm)# alternative for nls model making
library(broom) # for getting model info in a tidy way
library(modelr) # for the r-squared and rmse
library(nlstools) # for Wald confidence intervals
library(agridat) # for the data
```

## The Data

The data is from some cotton trials in the `agridat`

package. Basically, more potassium measured in the soil should correlate to more cotton yield, but likely the relationship will not be linear. Fitting a model to determine something about this relationship is central to the soil-test correlation process. Let’s plot it to see what we’re working with.

```
cotton <- tibble(x = agridat::cate.potassium$potassium,
y = agridat::cate.potassium$yield)
cotton %>%
ggplot(aes(x, y)) +
geom_point(size = 2, alpha = 0.5) +
geom_rug(alpha = 0.5, length = unit(2, "pt")) +
scale_x_continuous(breaks = seq(0, 1000, 20)) +
scale_y_continuous(breaks = seq(0, 200, 10)) +
labs(title = paste(
"Relative yield and soil potassium for", nrow(cotton), "cotton trials"),
y = "Relative yield (%)",
x = "Concentration of Soil Potassium (ppm)")
```

The relationship between soil-test potassium (STK) and relative cotton yield may be nonlinear^{2}. Perhaps a polynomial function could be fit, or the data could be transformed, but we’ll fit a non-linear model known as the quadratic-plateau. Quadratic-plateau (QP) models are also called **quad-plats**^{3}. Quad-plats are good in that they have a curved component (important to biological systems) followed by a join point and a zero-slope plateau (convenient for researchers). Notice how the relative yield of the cotton does not appear to increase anymore from about 70-200 ppm K (the plateau!).

## The QP Function

The equation for the QP is essentially that of a quadratic up until the join point: y = B0 + B1x + Bx^{2}. While we could define the QP equation in the `nls()`

function later, let’s make a function now.

```
# b0 = intercept
# b1 = slope
# b2 = quadratic term (curvy bit)
# jp = join point = critical concentration
qp <- function(x, b0, b1, jp) {
b2 <- -0.5 * b1 / jp
if_else(
condition = x < jp,
true = b0 + (b1 * x) + (b2 * x * x),
false = b0 + (b1 * jp) + (b2 * jp * jp)
)
}
```

This function says that up until some join-point, the relationship is a second-order polynomial (quadratic), after which it hits a flat line (plateau). This model is sometimes also called quadratic*- plus-*plateau. That join-point is important. In the context of soil fertility it represents a critical concentration for the nutrient in the soil. The join point of a quad-plat will always be higher than with a linear-plateau, an important consideration for another time and place.

The `qp`

function will be used inside the base R `nls()`

function inside my `quad_plateau()`

function later. So look for them there.

# Method 1: base R

Using the `nls()`

function in base R requires that we provide the formula (in this case the QP function), the data, and starting values. **Starting values are critical**^{4}, and the computer doesn’t really know where to start, so poor starting values may result in failures of the model to converge as the algorithms working their tails off behind the scenes try to home in on the best values. It helps, though, to know that 1) the slope will be positive, 2) the quadratic term will therefore be negative, and 3) the join point will certainly not be less than your minimum soil test value (and ideally not more than your maximum).

### Get starting values by guessing

Eyeballing that scatter plot, I’m trying to visualize in my mind’s eye a quadratic line coming up and bending over at a join point. The join point is easiest to imagine, and I’d guess it’s somewhere between 60 and 80 ppm K. Going to the left, that imaginary quadratic line looks like it might intercept the y-axis around 0. The slope of that line would be positive, and looks like maybe it’s around 2 (2% RY gain for every 1 ppm increase).

```
nls(formula = y ~ qp(x, b0, b1, jp),
data = cotton,
start = list(b0 = 0,
b1 = 2,
jp = 70))
```

```
## Nonlinear regression model
## model: y ~ qp(x, b0, b1, jp)
## data: cotton
## b0 b1 jp
## 12.864 1.912 86.197
## residual sum-of-squares: 2463
##
## Number of iterations to convergence: 6
## Achieved convergence tolerance: 4.329e-06
```

And it worked! The response curve is 12.8 + 1.91x - ??x^{2} up to 86 ppm K. Wait what is the quadratic term? It’s not in the model, but it is in the QP formula which is B2 = -0.5*B1/JP, or -0.011.

### Get starting values with *augmented* guessing

Another way to estimate starting values is to mess around with sliders on a graphing calculator like Desmos. If the equation is a + bx - cx^{2}, for example, then you can move the sliders and their value ranges until you get something you could imagine would go through your points. So zoom out until the x-axis and y-axis ranges match your dataset, then move sliders, then use those values in the starting values list like before.

### Get starting values by fitting a quadratic polynomial

If you’re a sane person, you might decide to get starting values for a QP model by fitting a regular ol’ quadratic polynomial^{5}. From this the intercept and slope can be estimated more precisely, and the join point can then be estimated by taking the mean or median of the soil test values. In fact, we could wrap all of these steps into a function.

```
fit_qp <- function(data) {
# get starting values for nls
start <- lm(y ~ poly(x, 2, raw = TRUE), data = data)
# make model
fit <- nlsLM(formula = y ~ qp(x, b0, b1, jp),
data = data,
start = list(b0 = start$coef[[1]],
b1 = start$coef[[2]],
jp = mean(data$x)))
return(fit)
}
fit_qp(cotton)
```

```
## Nonlinear regression model
## model: y ~ qp(x, b0, b1, jp)
## data: data
## b0 b1 jp
## 12.865 1.912 86.198
## residual sum-of-squares: 2463
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 1.49e-08
```

It worked again. Also, did you notice that I used `nlsLM`

instead of `nls`

in my function? More on that later.

# Method 2: `nlraa`

Better than providing starting values to `nls`

or `nlsLM`

is to use a function that can *“self-start”*. The `nlraa`

package provides just that, a self-starting function for a quadratic-plateau model. There are two functions available, but the one with 4 parameters seems to have issues, so let’s just use the `SSquadp3`

three parameter function.

```
nls(formula = y ~ SSquadp3(x, b0, b1, b2),
data = cotton)
```

```
## Nonlinear regression model
## model: y ~ SSquadp3(x, b0, b1, b2)
## data: cotton
## b0 b1 b2
## 12.86429 1.91241 -0.01109
## residual sum-of-squares: 2463
##
## Number of iterations to convergence: 8
## Achieved convergence tolerance: 1.441e-06
```

Flawless. This function optimizes the starting values through some sort of searching ^{6}. One difference though is that it doesn’t give the join-point. Instead, calculate as JP = -0.5 * B1 / B2. One downside with not getting a join-point in the model is that confidence intervals around it cannot be estimated as quickly.

# Method 3: Convoluted way

Next is my big `quad_plateau()`

^{7} that fits the QP model and either gets useful numbers into a table or plots it visually. Getting results into a table means that this function is compatible with *tidyverse/tidymodels* style work, usable on grouped data frames and so on. Getting results into a plot could be done with a separate function, but why not cram it all into one? Here are the steps of the function:

Get starting values for the

`nlsLM()`

function by fitting a quadratic polynomial.Run

`nlsLM()`

^{8}with the`fit_qp`

function (defined earlier).Test goodness-of-fit with R

^{2}, RMSE, and model error.Get model coefficients for response curve equation.

Get a confidence interval for the join-point (critical soil test value).

Optional: look at standard error and p-values for the coefficients, though these may not matter much.

Optional: get the soil test value at some point below the join-point, such as 95% of the maximum yield plateau.

Create a table of results.

OR

Plot the cotton yield data with a regression line using

`nlsLM`

and the model final values as starting values (which all ensure identical results).

```
quad_plateau <- function(data,
alpha = 0.05,
confidence = 95,
percent_of_max = 95,
plot = FALSE) {
# uses the fit_qp function defined earlier
try(corr_model <- fit_qp(data))
# How did the model do overall?
rsquared <- round(modelr::rsquare(corr_model, data), 2)
rmse <- round(modelr::rmse(corr_model, data), 2)
model_error <- round(summary(corr_model)$sigma, 2)
# get model coefficients
b0 <- tidy(corr_model)$estimate[1]
b1 <- tidy(corr_model)$estimate[2]
# join point of segmented regression = critical soil test value
jp <- tidy(corr_model)$estimate[3]
b2 <- -0.5 * b1 / jp
# equation from QP function
plateau <- round(b0 + (b1 * jp) + (b2 * jp * jp), 1)
equation <- paste0(round(b0, 1), " + ",
round(b1, 2), "x + ",
round(b2, 3), "xx")
# get wald confidence interval of join point at 100%
ci <- nlstools::confint2(corr_model, level = confidence / 100)
lower_cl <- round(ci[3, 1], 0)
upper_cl <- round(ci[3, 2], 0)
# CSTV at defined % of max/plateau (use percent_of_max argument)
adjust_cstv <- function(b0, b1, b2) {
jp <- -0.5 * b1 / b2
newplateau <- (b0 + (b1 * jp) + (b2 * jp * jp)) * percent_of_max / 100
newb0 <- b0 - newplateau
discriminant <- (b1 ^ 2) - (4 * newb0 * b2)
if (discriminant < 0) {
return(NA)
}
else if (discriminant > 0) {
cstv_adj <- (-b1 + sqrt(discriminant)) / (2 * b2)
return(cstv_adj)
}
}
cstv_adj <- round(adjust_cstv(b0, b1, b2), 0)
# Printouts
if (plot == FALSE) {
tibble(
equation,
plateau,
join_point = round(jp, 0),
lower_cl,
upper_cl,
rsquared,
rmse,
model_error,
AIC = round(AIC(corr_model),0)
)
} else {
# Residual plots and normality
nls_resid <- nlstools::nlsResiduals(corr_model)
plot(nls_resid, which = 0)
predicted <- nlraa::predict_nls(corr_model,
newdata = data,
interval = "confidence") %>%
as_tibble() %>%
bind_cols(data)
# ggplot of data
predicted %>%
ggplot(aes(x, y)) +
geom_ribbon(aes(y = Estimate,
ymin = Q2.5,
ymax = Q97.5),
alpha = 0.05) +
ggpp::geom_vhlines(xintercept = jp,
yintercept = plateau,
alpha = 0.5,
linetype = 3) +
ggpp::geom_vhlines(xintercept = cstv_adj,
yintercept = plateau * percent_of_max / 100,
alpha = 0.3,
linetype = 3) +
geom_line(
stat="smooth",
method = "nlsLM",
formula = y ~ qp(x, b0, b1, jp),
method.args = list(start = as.list(coef(corr_model))),
se = FALSE,
color = "#CC0000"
) +
geom_point(size = 2, alpha = 0.5) +
geom_rug(alpha = 0.2, length = unit(2, "pt")) +
annotate("text",
label = paste0("y = ", equation,
"\nR-squared = ", rsquared,
"\nRMSE = ", rmse,
"\nCSTV at ", round(plateau, 1), "% is ", round(jp, 0)," ppm (CI ", lower_cl, "-", upper_cl, ")",
"\nAdjusted CSTV at ", percent_of_max, "% of max = ", cstv_adj, " ppm"),
x = max(data$x) * 0.6,
y = quantile(data$y, 0.10),
hjust = 0,
family = rc
) +
labs(x = "Soil test K (ppm)",
y = "Relative yield (%)",
caption = paste("Each point is a site. n =", nrow(data)))
}
}
```

## Convoluted result

So how do you fit a quadratic-plateau model in R? Like so:

`quad_plateau(cotton)`

```
## # A tibble: 1 x 9
## equation plateau join_point lower_cl upper_cl rsquared rmse model_error AIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 12.9 + ~ 95.3 86 45 127 0.68 10.1 10.8 187
```

The model is written out in a data frame, ready for other use and manipulation. This is really helpful if for example you have a grouped data frame of different fertilizer trials, and want to perform a QP fit to each group using something like `group_modify()`

.

If you want to plot it along with an overlay of model results, use the `plot = TRUE`

argument and add your labels. Use `group_map()`

if you want to “loop” this over multiple groups in your data frame.

`quad_plateau(cotton, plot = TRUE)`

The critical soil concentration for potassium appears to be around 86 ppm, higher than the linear-plateau model. The confidence interval for the join-point is also wider than the LP model. In fact, the confidence interval is just wide (45-127!). This problematic width is partially due to getting the confidence limits at the join point, rather than at a lower point such as 99 or 95% of the maximum yield plateau. But in the case of this dataset, I’d say the problem has to do with data availability. There are gaps in the data, including a hole between 40-80 ppm STK and 65-90% RY. Basically the model lacks confidence in the fit approaching the join point, and this is reflected in the wide gray RY confidence band shown around the red line in the plot. The experimenter needs more data in that gap.

Now, what if the two points with “high” STK (> 180 ppm), were removed? How would that change the analysis? Go back and test yourself^{9}. How do other models compare? Linear-plateau? Mitscherlich-Bray?

## More resources about nonlinear functions in R

`rcompanion`

package - gave me a good jumpstart and has some useful functions inside`nlstools`

package`nlraa`

package by Fernando Miguez- he also has a series of YouTube videos worth checking out.

`easynls`

package - it’s quick and dirty, but not as useful as it could be`nls.multstart`

package - tries lots of starting values to try and better get at the best values for the equation. But I think for QP models, my approach works fine and using multiple starting values is unnecessary.

# Bonus: Quick figure with QP line

What if you just want a quick figure of a QP fitted to your scatter plot points, and do not care so much about the model results and information, etc? Using the self-starting function from `nlraa`

, try the following:

```
cotton %>%
ggplot(aes(x, y)) +
geom_point(size = 2,
alpha = 0.5) +
geom_line(stat="smooth",
method = "nlsLM",
formula = y ~ SSquadp3(x, b0, b1, jp),
se = FALSE,
color = "#CC0000") +
labs(title = paste(
"Relative yield and soil potassium for", nrow(cotton), "cotton trials"),
y = "Relative yield (%)",
x = "Concentration of Soil Potassium (ppm)")
```

Congratulations! You are now certified to quad-plat in R.

Probably a grad student↩︎

No one will judge you if you hold a clear, plastic ruler up to your computer screen.↩︎

This isn’t true, but I wish it was.↩︎

What if the starting values result in a failure of converge? Check out these tips from Dr. Miguez.↩︎

Also known as a quadrapoly.↩︎

The trade-off is that it takes a little longer to run than my

`fit_qp`

function. You probably won’t notice a difference unless you are running it 1000 times in a bootstrapping analysis or unless you are super fast with a stopwatch. The reliability of the`SSquadp3`

function can be worth the trade-off.↩︎I’m not a function wizard, but I’m trying, so forgive any poor coding practices.↩︎

`nlsLM()`

seems more reliable overall than plain`nls()`

from base R. I’ve had decent luck with nls(), but sometimes it fails.`nlsLM()`

seems to fail less, probably due to using a different algorithm. You can test their similarity in reaching the same final results by using the output of`nlsLM()`

as starting values in`nls()`

.↩︎Hint:

`filter()`

↩︎