```
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE, fig.height = 5, fig.width = 10)
```

The aim of `factorMerger` is to provide a set of tools to support results from post hoc comparisons. Post hoc testing is an analysis performed after running *ANOVA* to examine differences between group means (of a numeric response variable) for each pair of groups (groups are defined by a factor variable).
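For orientation, a classical pairwise post hoc analysis can be sketched in base R as below. This is only an illustration of the standard ANOVA workflow on simulated data, not part of `factorMerger`; the variable names and the simulated means are assumptions made for the example.

```r
# Simulated data: three groups, two of which share the same mean.
set.seed(1)
group <- factor(rep(c("a", "b", "c"), each = 30))
response <- rnorm(90, mean = c(0, 0, 1)[as.integer(group)])

# Step 1: one-way ANOVA tests whether any group means differ at all.
fit <- aov(response ~ group)
summary(fit)

# Step 2: the post hoc test compares every pair of groups.
posthoc <- TukeyHSD(fit)

# One row per pair: difference in means, confidence bounds, adjusted p-value.
posthoc$group
```

With $k$ groups this yields $k(k-1)/2$ pairwise results, which may be hard to summarise as a single grouping of levels.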

This project arose from the need for a post hoc testing method that gives a hierarchical interpretation of the relations between group means. As a result, for a given significance level we may divide groups into non-overlapping clusters.

In the current version the **factorMerger** package supports parametric models:

- one-dimensional Gaussian (with the argument `family = "gaussian"`),
- multi-dimensional Gaussian (with the argument `family = "gaussian"`),
- binomial (with the argument `family = "binomial"`),
- survival (with the argument `family = "survival"`).

There are four algorithms available: adaptive, fast-adaptive, fixed and fast-fixed (they are set with the `method` argument of `mergeFactors`). *Fast* algorithms allow merging only those groups whose group statistics (i.e. means in the Gaussian case) are close.
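The difference in the number of candidate merges can be illustrated with a small base R sketch. It assumes that "close" means adjacent after sorting groups by their statistic; this is an interpretation for illustration only, and the group means below are made-up values, not package output.

```r
# Hypothetical group means for five factor levels (illustrative values only).
groupMeans <- c(A = 2.1, B = 0.3, C = 1.9, D = 0.4, E = 3.0)

# Exhaustive search: every pair of groups is a candidate for merging.
allPairs <- t(combn(names(groupMeans), 2))

# "Fast" idea (assumed): sort groups by their statistic, keep only neighbours.
orderedGroups <- names(sort(groupMeans))
adjacentPairs <- cbind(head(orderedGroups, -1), tail(orderedGroups, -1))

nrow(allPairs)       # k * (k - 1) / 2 candidate pairs
nrow(adjacentPairs)  # k - 1 candidate pairs
```

Restricting the search to neighbours reduces the number of comparisons per step from quadratic to linear in the number of groups.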

To illustrate the functionality of `factorMerger` we use simulated samples or real data examples whose response variable follows one of the distributions listed above. The corresponding factor variable is sampled uniformly from a finite set of size $k$.

To do so, we may use the function `generateSample` or `generateMultivariateSample`.

```
library(factorMerger)
library(knitr)
library(dplyr)

randSample <- generateMultivariateSample(N = 100, k = 10, d = 3)
```

`mergeFactors` is a function that performs hierarchical post hoc testing. As arguments it takes:

- matrix/data.frame/vector with numeric response,
- factor vector defining groups.

By default (with argument `abbreviate = TRUE`) factor levels are abbreviated and surrounded with brackets.

```
fmAll <- mergeFactors(response = randSample$response, factor = randSample$factor)
```

`mergeFactors` returns an object with information about the 'merging history'.

```
mergingHistory(fmAll, showStats = TRUE) %>% kable()
```

Each row of the above frame describes one step of the merging algorithm. The first two columns specify which groups were merged in the iteration; columns *model* and *GIC* give the loglikelihood and the Generalized Information Criterion for the model after merging. The last two columns are p-values for the *Likelihood Ratio Test* against the full model (*pvalVsFull*) and against the previous one (*pvalVsPrevious*).
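To make these statistics concrete, the sketch below computes a likelihood ratio test and a GIC-type criterion for a single merge using base R only. It assumes the form GIC = -2 * loglikelihood + penalty * (number of parameters); the exact parameter count used by the package may differ, and the data and helper `gic` are hypothetical.

```r
set.seed(2)
group <- factor(rep(c("a", "b", "c"), each = 40))
response <- rnorm(120, mean = c(0, 0.1, 1)[as.integer(group)])

# Full model: every level has its own mean.
fullModel <- lm(response ~ group)

# Merged model: levels "a" and "b" are collapsed into one group.
merged <- group
levels(merged) <- list("(a)(b)" = c("a", "b"), "(c)" = "c")
mergedModel <- lm(response ~ merged)

# Likelihood ratio test of the merged model against the full model.
lrStat <- as.numeric(2 * (logLik(fullModel) - logLik(mergedModel)))
dfDiff <- attr(logLik(fullModel), "df") - attr(logLik(mergedModel), "df")
pvalVsFull <- pchisq(lrStat, df = dfDiff, lower.tail = FALSE)

# Assumed GIC form: -2 * loglikelihood + penalty * number of parameters.
gic <- function(model, penalty = 2) {
  ll <- logLik(model)
  as.numeric(-2 * ll + penalty * attr(ll, "df"))
}

c(full = gic(fullModel), merged = gic(mergedModel), pvalVsFull = pvalVsFull)
```

Under this assumed form, `penalty = 2` coincides with AIC, and a larger penalty favours models with fewer groups.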

If we choose a *fast* version of merging, a one-dimensional representation of the response is fitted using `isoMDS{MASS}`. Then, in each step, only groups whose means are close are compared.

```
fm <- mergeFactors(response = randSample$response, factor = randSample$factor, method = "fast-fixed")
mergingHistory(fm, showStats = TRUE) %>% kable()
```
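The projection step behind the *fast* variants can be sketched with `MASS::isoMDS` directly. This is a minimal illustration that assumes the MASS package is installed and that the projection is applied to the matrix of group means; how the package applies it internally is not shown in this document, and the simulated data are hypothetical.

```r
set.seed(3)
# Simulated three-dimensional response for six groups.
group <- factor(rep(letters[1:6], each = 20))
response <- matrix(rnorm(120 * 3), ncol = 3) + as.integer(group)

# Group means: one row per group, one column per response dimension.
groupMeans <- apply(response, 2, function(col) tapply(col, group, mean))

# Project the group means onto a single dimension with non-metric MDS.
projection <- MASS::isoMDS(dist(groupMeans), k = 1, trace = FALSE)$points[, 1]
names(projection) <- rownames(groupMeans)

# The ordering along this axis determines which groups count as neighbours.
names(sort(projection))
```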

Algorithms implemented in the **factorMerger** package make it possible to create an unequivocal partition of a factor. Below we present how to extract the partition from the `mergeFactors` output.

- predict new labels for observations

```
cutTree(fm)
```

By default, `cutTree` returns a factor split for the optimal GIC (with penalty = 2) model. However, we can specify a different metric (`stat = c("loglikelihood", "p-value", "GIC")`) to use for cutting. If `loglikelihood` or `p-value` is chosen, an exact threshold must be given as the `value` parameter. Then `cutTree` returns the factor for the smallest model whose statistic is higher than the threshold. If we choose `GIC`, then `value` is interpreted as the GIC penalty.

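```
mH <- mergingHistory(fm, showStats = TRUE)
# Use the loglikelihood of the model halfway along the merging path as the threshold
thres <- mH$model[nrow(mH) / 2]
cutTree(fm, stat = "loglikelihood", value = thres)
```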

In this example, the data partition is created for the last model from the merging path whose loglikelihood is greater than the threshold `thres`.
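The thresholding rule described above can be sketched on a plain data frame. The table and the helper `pickModel` below are hypothetical and only mimic the selection logic; they are not the package's implementation.

```r
# Hypothetical merging history: one row per model, from the full model
# (most groups) down to the single-group model. Values are illustrative.
history <- data.frame(
  nGroups = c(5, 4, 3, 2, 1),
  loglikelihood = c(-140.2, -140.5, -141.9, -150.3, -171.8)
)

# Pick the smallest model whose loglikelihood still exceeds the threshold.
pickModel <- function(history, threshold) {
  eligible <- history[history$loglikelihood > threshold, ]
  eligible[which.min(eligible$nGroups), ]
}

pickModel(history, threshold = -145)
```

Here the three-group model is selected: it is the model with the fewest groups among those whose loglikelihood is above -145.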

- get final clusters and clusters dictionary

```
getOptimalPartition(fm)
```

Function `getOptimalPartition` returns a vector with the final cluster names from the factorMerger object.

```
getOptimalPartitionDf(fm)
```

Function `getOptimalPartitionDf` returns a dictionary in a data frame format. Each row gives an original label of a factor level and its new (cluster) label.
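A dictionary of this shape can be used to relabel the original factor, as in the base R sketch below. The dictionary is written by hand and its column names `orig` and `pred` are assumptions for illustration; check the names of the data frame actually returned before reusing this.

```r
# Hypothetical dictionary; the column names `orig` and `pred` are assumptions.
dictionary <- data.frame(
  orig = c("a", "b", "c", "d"),
  pred = c("(a)(b)", "(a)(b)", "(c)(d)", "(c)(d)")
)

# Original factor to be relabelled.
original <- factor(c("a", "c", "b", "d", "a"))

# Look up each observation's original level and take its cluster label.
relabelled <- factor(dictionary$pred[match(as.character(original), dictionary$orig)])

table(original, relabelled)
```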

Similarly to `cutTree`, functions `getOptimalPartition` and `getOptimalPartitionDf` take arguments `stat` and `value`.

We may plot results using the function `plot`.

```
plot(fm, panel = "all", nodesSpacing = "equidistant", colorCluster = TRUE)
```

```
plot(fmAll, panel = "tree", statistic = "p-value", nodesSpacing = "effects", colorCluster = TRUE)
```

```
plot(fm, colorCluster = TRUE, panel = "response")
```

The heatmap on the right shows the group means of all variables included in the analysis.

```
plot(fm, colorCluster = TRUE, panel = "response", responsePanel = "profile")
```

In the above plots, colours correspond to groups. The plot on the right shows the rankings of group means for all variables included in the algorithm.

It is also possible to plot *GIC* together with the merging path plot.

```
plot(fm, panel = "GIC", penalty = 5)
```

The model with the lowest GIC is marked.

```
oneDimRandSample <- generateSample(1000, 10)
```

```
oneDimFm <- mergeFactors(response = oneDimRandSample$response, factor = oneDimRandSample$factor, method = "fixed")
mergingHistory(oneDimFm, showStats = TRUE) %>% kable()
```

```
plot(oneDimFm, palette = "Reds")
```

```
plot(oneDimFm, responsePanel = "boxplot", colorCluster = TRUE)
```

If `family = "binomial"`, the response must take two values: `0` and `1` (`1` is interpreted as success).

```
binomRandSample <- generateSample(1000, 10, distr = "binomial")
table(binomRandSample$response, binomRandSample$factor) %>% kable()
```

```
binomFm <- mergeFactors(response = binomRandSample$response, factor = binomRandSample$factor, family = "binomial", method = "fast-adaptive")
mergingHistory(binomFm, showStats = TRUE) %>% kable()
```

```
plot(binomFm, colorCluster = TRUE, penalty = 7)
```

```
plot(binomFm, gicPanelColor = "red")
```

If `family = "survival"`, the response must be of class `Surv`.

```
library(survival)
data(veteran)
survResponse <- Surv(time = veteran$time, event = veteran$status)
survivalFm <- mergeFactors(response = survResponse, factor = veteran$celltype, family = "survival")
```

```
mergingHistory(survivalFm, showStats = TRUE) %>% kable()
```

```
plot(survivalFm)
```

```
plot(survivalFm, nodesSpacing = "effects", colorCluster = TRUE)
```
