# Using LibSVM in Java

For the past couple of months, I’ve been trying to get my feet wet with machine learning and started work on implementing a Behavioral Authentication mechanism for Android devices using Support Vector Machines (more on that later in another blog post). SVM is a relatively popular classifier that seemed appropriate for a beginner like me, and everything went well until I had to port the R prototype to Java.

I went with One-Class SVM for modelling purposes, and the obvious choice was the `libsvm` library by Chih-Chung Chang and Chih-Jen Lin. However, there’s virtually no documentation for the Java version either on their homepage or on GitHub; both simply refer you to the C documentation. So after digging through all of their Java examples, I had a basic version of my port ready, but it gave wildly different results compared to the R version.

Turns out you need to scale and normalize all data values to lie between `0` and `1`, at least for `OC-SVM`. These are the `double` values used to construct the 2D `svm_node` array `x` in the `svm_problem` object. Without doing this, the classifier just goes bat-shit crazy and spits out random values. I imagine the R version of the library does that automatically for the given data (R’s `e1071::svm()`, for instance, scales inputs by default through its `scale = TRUE` argument, although to zero mean and unit variance rather than to `[0, 1]`). Other than that, you also don’t need an extra `svm_node` object with an index of `-1` at the end of each `x[]` array to mark the end of the vector (as in the C version); the Java port uses the array’s length instead.
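Since the exact scaling code isn’t shown here, the following is a minimal sketch assuming plain per-feature min-max scaling. `MinMaxScaler` and its `fit`/`transform` methods are hypothetical helpers, not part of `libsvm`; the important detail is that the minimum and maximum are learned from the training data only and then reused for every point you classify later.

```java
import java.util.Arrays;

/**
 * Hypothetical min-max scaler: learns each feature's min/max from the
 * training rows and maps values into [0, 1] using those same ranges.
 */
final class MinMaxScaler {
    private final double[] min;
    private final double[] max;

    private MinMaxScaler(double[] min, double[] max) {
        this.min = min;
        this.max = max;
    }

    /** Learns per-feature minimum and maximum from the training rows. */
    static MinMaxScaler fit(double[][] rows) {
        int dims = rows[0].length;
        double[] min = new double[dims];
        double[] max = new double[dims];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : rows) {
            for (int j = 0; j < dims; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }
        return new MinMaxScaler(min, max);
    }

    /** Scales one row with the learned ranges; constant features map to 0. */
    double[] transform(double[] row) {
        double[] out = new double[row.length];
        for (int j = 0; j < row.length; j++) {
            double range = max[j] - min[j];
            out[j] = range == 0 ? 0.0 : (row[j] - min[j]) / range;
        }
        return out;
    }
}
```

In this post’s setup you would call `fit` once on the raw training points, pass each transformed row to `buildPoint`, and keep the learned ranges alongside the model so prediction-time points go through the same mapping. Query points outside the training range end up slightly outside `[0, 1]`; this sketch deliberately leaves them unclamped.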

For running the One-Class classifier, everything else was pretty much the same as the C code or the available Java examples, but I would usually use some sort of helper function to build the node arrays. For example, for building 2D points on a plane, I used something like this:

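```java
import libsvm.svm_node;

// Builds a 2D point as a libsvm feature vector
// (1-based feature indices, following libsvm's data format)
public static svm_node[] buildPoint(double x, double y) {
    svm_node[] point = new svm_node[2];
    // x coordinate -> feature 1
    point[0] = new svm_node();
    point[0].index = 1;
    point[0].value = x;
    // y coordinate -> feature 2
    point[1] = new svm_node();
    point[1].index = 2;
    point[1].value = y;
    return point;
}
```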

Combine many of these together and you get the 2D array `svm_node[][]` we need for the SVM problem. Building the model is pretty straightforward (use your own gamma & nu values depending on your data):

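```java
import libsvm.svm;
import libsvm.svm_model;
import libsvm.svm_node;
import libsvm.svm_parameter;
import libsvm.svm_problem;

public static svm_model buildModel(svm_node[][] nodes) {
    // Build Parameters
    svm_parameter param = new svm_parameter();
    param.svm_type = svm_parameter.ONE_CLASS;
    param.kernel_type = svm_parameter.RBF;
    param.gamma = 0.802;
    param.nu = 0.1608;
    param.cache_size = 100;
    // svm_parameter fields default to 0 in Java, so set the stopping
    // tolerance and shrinking explicitly (these are libsvm's defaults)
    param.eps = 0.001;
    param.shrinking = 1;
    // Build Problem
    svm_problem problem = new svm_problem();
    problem.x = nodes;
    problem.l = nodes.length;
    problem.y = prepareY(nodes.length);
    // Catch invalid parameter combinations before training
    String error = svm.svm_check_parameter(problem, param);
    if (error != null) {
        throw new IllegalArgumentException(error);
    }
    // Build Model
    return svm.svm_train(problem, param);
}

// One-class training doesn't use the labels; filling y with +1 keeps the problem complete
private static double[] prepareY(int size) {
    double[] y = new double[size];
    for (int i = 0; i < size; i++) {
        y[i] = 1;
    }
    return y;
}
```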
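For reference, here are the two knobs used in `buildModel`, written out as standard one-class SVM facts rather than anything specific to this post. `libsvm` defines the RBF kernel as below, so `gamma` controls how quickly similarity decays with distance, while `nu` (in $(0, 1]$) bounds the fraction of the $\ell$ training points (`problem.l`) that end up outside the learned region from above and the fraction that become support vectors from below:

$$
K(\mathbf{u}, \mathbf{v}) = \exp\left(-\gamma \, \lVert \mathbf{u} - \mathbf{v} \rVert^2\right)
$$

$$
\frac{\#\{\text{training points outside the region}\}}{\ell} \;\le\; \nu \;\le\; \frac{\#\{\text{support vectors}\}}{\ell}
$$

With `gamma = 0.802` and features scaled to `[0, 1]`, even the two farthest-apart corners of the unit square (squared distance 2) keep a kernel value of about $e^{-1.604} \approx 0.20$. On unscaled data with distances in the hundreds, the same `gamma` drives almost every kernel value to zero, which is one plausible reason an unscaled model behaves erratically. Likewise, `nu = 0.1608` allows at most about 16% of the training points to fall outside the learned region.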

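For classification, there’s the `svm.svm_predict(model, nodes)` function, which returns either `-1` or `+1` for one-class, but there’s another method available, `svm.svm_predict_values(m, n, v)`, which also exposes the decision value (a confidence score) behind that positive or negative label. For one-class SVM this score is f(x) = Σᵢ αᵢ·K(xᵢ, x) − ρ: positive for points inside the learned region and negative outside it. With `RBF` it is not a distance from the center of an ellipse; it is proportional to the signed distance from the separating hyperplane in the kernel’s feature space, and in input space the boundary can take fairly arbitrary shapes. Getting this “score” is a bit different, since this function itself also returns either `-1` or `+1`. You have to pass a `double[]` array as the third argument (for one-class models only its first element is used, so a single element is enough). After calling it, the first value of the array will contain the score: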

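```java
import libsvm.svm;
import libsvm.svm_model;
import libsvm.svm_node;

// Returns the one-class decision value (positive = inside the learned region)
public static double predict(svm_model model, svm_node[] nodes) {
    // For one-class models libsvm only writes scores[0]
    double[] scores = new double[2];
    // The return value is the +1/-1 label; the raw score lands in scores[0]
    svm.svm_predict_values(model, nodes, scores);
    return scores[0];
}
```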
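To make that score less mysterious, here is a minimal stand-alone sketch of the quantity `svm_predict_values` computes for a one-class RBF model. It uses dense `double[]` vectors instead of `svm_node[]` so it runs without `libsvm` on the classpath, and the `OneClassDecision` class is hypothetical. In a trained `libsvm` model, the inputs would correspond to `model.SV` (support vectors), `model.sv_coef[0]` (their coefficients), `model.rho[0]` (the offset) and `model.param.gamma`.

```java
/**
 * Hypothetical sketch of the one-class RBF decision value:
 *   f(x) = sum_i coef[i] * exp(-gamma * ||sv[i] - x||^2) - rho
 * Dense vectors are used here instead of libsvm's sparse svm_node[].
 */
final class OneClassDecision {
    /** RBF kernel as libsvm defines it: exp(-gamma * squared Euclidean distance). */
    static double rbf(double[] u, double[] v, double gamma) {
        double sq = 0.0;
        for (int i = 0; i < u.length; i++) {
            double d = u[i] - v[i];
            sq += d * d;
        }
        return Math.exp(-gamma * sq);
    }

    /** Coefficient-weighted kernel sum over the support vectors, minus rho. */
    static double decisionValue(double[][] supportVectors, double[] coefs,
                                double rho, double gamma, double[] x) {
        double sum = 0.0;
        for (int i = 0; i < supportVectors.length; i++) {
            sum += coefs[i] * rbf(supportVectors[i], x, gamma);
        }
        return sum - rho;
    }

    /** Labelling rule used for one-class prediction: +1 only if f(x) > 0. */
    static int label(double decisionValue) {
        return decisionValue > 0 ? 1 : -1;
    }
}
```

Because the label is just the sign of this value, comparing the raw score against a cutoff other than `0` is one way to trade false accepts against false rejects, which is where the score becomes more useful than the `±1` label.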

I really hope someone writes a better version/wrapper of LibSVM in Java, or improves the documentation so beginners like me can avoid wasting hours over implementation issues.