The `rcompanion`

package is the main reason I figured out how to fit non-linear models to agricultural data. Go reference it here. I’ll wait.

This post provides minimal explanations, and is intended for the person who has been googling how to fit linear-plateau models in R. This post also provides the function within another function that sort of helps one do linear-plateau fits *and* make a `ggplot`

style figure all at once. So first, load some packages.

```
library(tidyverse)
library(rcompanion)
library(minpack.lm)
library(nlstools)
library(agridat)
```

## The data

The data is from some cotton trials in the `agridat`

package. Basically, more potassium measured in the soil should correlate to more cotton yield, but the relationship will not be linear. Fitting a model to determine something about this relationship is central to the soil-test correlation process. Let’s plot it.

```
cotton <- agridat::cate.potassium
cotton %>%
ggplot(aes(x = potassium, y = yield)) +
geom_point(size = 1, alpha = 0.5) +
geom_rug(alpha = 0.5, length = unit(2, "pt")) +
scale_x_continuous(breaks = seq(0, 1000, 20)) +
scale_y_continuous(breaks = seq(0, 200, 5)) +
labs(title = paste(
"Relative yield and soil potassium for", nrow(cotton), "cotton trials"),
y = "Relative yield (%)",
x = "Concentration of Soil Potassium (ppm)")
```

The relationship between soil-test potassium (STK) and relative cotton yield may be non-linear. Perhaps a polynomial function could be fit, or the data could be transformed, but we’ll fit a non-linear model known as the linear-plateau.

```
## Linear-plateau model
lp <- function(x, a, b, c) {
if_else(condition = x < c,
true = a + b * x,
false = a + b * c)
}
```

This function says that up until some join-point, the relationship is linear, after which the stick breaks and it is a flat line (plateau). Sometimes this model is called *broken-stick* or *linear-plus-plateau*. That join-point is important; it represents a critical concentration for the nutrient in the soil.

Below is my big function to fit it, get some useful numbers, and plot it. I’m not a function wizard, but I’m trying, so forgive any poor coding practices. That’s why in my function, I’ve just put in the variables `yield`

and `potassium`

,

## What is going on in the following ‘function’?

- Get initial values for the
`nlsLM()`

function coming up.

- this can be done by fitting a linear model, and yanking out coefficients.

- Run
`nlsLM()`

^{1}with the`lp()`

model as part of the formula. - Look at your model summary.
- Make a null model, and get a pseudo-R
^{2}value. - Get confidence interval for the join-point.
- Look at residuals.
- Get coefficients from model for use in
`ggplot`

. - Make a line from coefficients, and plot it with the cotton yield data.

```
# run model and graph all in one!
lin_plateau <- function(data) {
# initial values
ini_fit <- lm(data = data, formula = yield ~ potassium)
ini_a <- ini_fit$coef[[1]]
ini_b <- ini_fit$coef[[2]]
ini_c <- mean(data$potassium)
# linear plateau model
lp_model <<- nlsLM(
formula = yield ~ lp(potassium, a, b, c),
data = data,
start = list(a = ini_a, b = ini_c, c = ini_c)
)
print(summary(lp_model))
# Define null model
m.ini <- mean(data$yield)
nullfunct <- function(x, m) {
m
}
null <- nls(yield ~ nullfunct(potassium, m),
data = data,
start = list(m = m.ini),
trace = FALSE,
nls.control(maxiter = 1000))
# Find p-value and pseudo R-squared
pseudo_r2 <- nagelkerke(lp_model, null)[2]
print(pseudo_r2)
# print and set confidence interval
confinterval <- confint2(lp_model, level = 0.95)
conflo <- round(confinterval[3, 1], 1)
confhi <- round(confinterval[3, 2], 1)
print("Confidence Interval")
print(confinterval)
plotNormalHistogram(residuals(lp_model))
plot(fitted(lp_model), residuals(lp_model))
a <- coef(lp_model)[[1]]
b <- coef(lp_model)[[2]]
cc <- coef(lp_model)[[3]]
plateau <- round(a + b * cc, 1)
print(paste("Plateau at ", plateau, "% yield"))
# other plot
data %>%
ggplot(aes(x = potassium, y = yield)) +
geom_point(size = 1, alpha = 0.5) +
geom_rug(alpha = 0.5, length = unit(2, "pt")) +
geom_vline(xintercept = cc,
color = "#CC0000") +
geom_vline(
xintercept = c(conflo, confhi),
color = "#CC0000",
alpha = 0.2
) +
geom_smooth(method = "nlsLM",
formula = y ~ lp(x, a, b, c),
method.args = list(start = as.list(coef(lp_model))),
se = FALSE,
color = "#4156a1",
size = 0.5) +
scale_x_continuous(breaks = seq(0, 1000, 20)) +
scale_y_continuous(breaks = seq(0, 200, 5)) +
labs(subtitle = paste0("Critical concentraion = ", round(cc, 0),
" ppm; 95% CL [", conflo, ", ", confhi, "]"))
}
```

## The result

So how do you fit a linear-plateau model in R? Like this:

```
lin_plateau(cotton) +
labs(title = paste(
"Relative yield and soil potassium for", nrow(cotton), "cotton trials"),
y = "Relative yield (%)",
x = "Concentration of Soil Potassium (ppm)")
```

```
##
## Formula: yield ~ lp(potassium, a, b, c)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 39.3394 9.4833 4.148 0.000456 ***
## b 0.7489 0.2077 3.606 0.001659 **
## c 75.0000 10.7277 6.991 6.66e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.06 on 21 degrees of freedom
##
## Number of iterations to convergence: 13
## Achieved convergence tolerance: 1.49e-08
##
## $Pseudo.R.squared.for.model.vs.null
## Pseudo.R.squared
## McFadden 0.126276
## Cox and Snell (ML) 0.662275
## Nagelkerke (Cragg and Uhler) 0.662397
##
## [1] "Confidence Interval"
## 2.5 % 97.5 %
## a 19.6177161 59.061050
## b 0.3170364 1.180782
## c 52.6905311 97.309472
```

`## [1] "Plateau at 95.5 % yield"`

```
ggsave("static/linear-plateau.png",
dpi = 300, width = 7, height = 5)
```

The critical soil concentration for potassium appears to be around 75 ppm, though the confidence interval is pretty wide. This is probably due to the lack of data points. The model summary shows that most of the error comes from the the *a* and *c* parameters, essentially the intercept and join-point. The psuedo-R^{2} is about 0.66, so not too bad. Still, the distribution of points has gaps in it, which limits the confidence of the correlation.

Now, what if the two points with “high” STK (> 180 ppm), were removed? How would that change the analysis? Go back and test yourself^{2}. How do other models compare? Quadratic-plateau? Mitscherlich-Bray?