SCE parameters

The options for the SCE algorithm are explained below, other options are explained in Input options, Creating animations and Parallelisation.

The progress bar shows the percentage complete, the current learning rate \(\eta\), the divergence \(Eq\) (and clash rate, see Parallelisation).:

Optimizing   Progress: 99.9%, eta=0.0010, Eq=0.0547636151, clashes=0.1%
Optimizing done in 20s

The algorithm should be run until \(Eq\) stabilises.

SCE options:

--perplexity PERPLEXITY
                      Perplexity for distance to similarity conversion [default = 15]
--no-preprocessing    Turn off entropy pre-processing of distances
--weight-file WEIGHT_FILE
                      Weights for samples
--maxIter MAXITER     Maximum SCE iterations [default = 100000]
--nRepuSamp NREPUSAMP
                      Number of neighbours for calculating repulsion (1 or 5) [default = 5]
--eta0 ETA0           Learning rate [default = 1]
--bInit BINIT         1 for over-exaggeration in early stage [default = 0]
--no-clustering       Turn off HDBSCAN clustering after SCE
  • --perplexity roughly sets the balance between global and local structure, smaller values making fewer neighbours matter, and emphasising local structure. Typical values are between 5 and 50, but making plots at multiple values is often useful.

  • --no-preprocessing uses raw similarities as input, rather than a probability distribtion using a desired perplexity. Do not use this option unless you are having issues setting the perplexity.

  • --weight-file allows you to give different samples different weights in being picked to attract or repel. The default is to equally weight samples, but you could for example weight by cluster size or its inverse. This requires a tab-separated file with no header, the first column with sample names, the second column with their weights.

  • --maxIter sets how long the algorithm will run for. Larger datasets will need more iterations. We recommend running until \(Eq\) stabilises.

  • --nRepuSamp is the number of neighbours used for calculating the repulsion at each iteration. This can be one or five.

  • --eta0 sets the scale of the learning rate. A higher value will move points around more at each iteration.

  • --bInit turns on over-exaggeration, which sets the stength of attraction four times higher in the first 10% of the iterations.

  • --no-clustering turns off the HDBSCAN clustering if you don’t want spatial clustering run on the embedding (which may stall when no structure has been found). This is implied if --labels have been provided.