Run a Scikit Learn Job
The code for this specimen is available on GitHub.
A job is an execution of code with a fixed start and end, unlike a serving, which is an endless process. With that clear, let's run a simple job using sklearn. The code is taken from the sklearn documentation, and we extend it to connect to Relics, which is used to store all the artifacts generated by the job.
Code
Let's start by creating a file main.py and adding the code from here:
from nbox import operator


def bench_k_means(kmeans, name, data, labels):
    """Benchmark to evaluate the KMeans initialization methods.

    Parameters
    ----------
    kmeans : KMeans instance
        A :class:`~sklearn.cluster.KMeans` instance with the initialization
        already set.
    name : str
        Name given to the strategy. It will be used to show the results in a
        table.
    data : ndarray of shape (n_samples, n_features)
        The data to cluster.
    labels : ndarray of shape (n_samples,)
        The labels used to compute the clustering metrics, which require some
        supervision.
    """
    ...


@operator()
def benchmark(n_init: int = 5, save_to_relic: bool = False):
    """Benchmark to evaluate the KMeans initialization methods.

    Parameters
    ----------
    n_init : int, default=5
        Number of times the k-means algorithm will be run with different
        centroid seeds. The final result will be the best output of n_init
        consecutive runs in terms of inertia.
    save_to_relic : bool, default=False
        Whether to save the benchmark results to Relics.
    """
    ...
Note that we want to run the benchmark function as a job. It takes a couple of arguments, such as n_init, which determines the number of times the k-means algorithm will be run with different centroid seeds. The final result is the best output of n_init consecutive runs in terms of inertia.
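To make the n_init semantics concrete, here is a minimal pure-Python sketch of k-means restarted n_init times, keeping the run with the lowest inertia (the sum of squared distances from each point to its assigned centroid). It is not scikit-learn's implementation: it uses plain random seeding instead of sklearn's default k-means++ initialization, and the helper names kmeans_once and kmeans_best_of are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Result = Tuple[List[Point], List[int], float]  # (centroids, labels, inertia)


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(points: Sequence[Point], centroids: List[Point]) -> List[int]:
    # Index of the nearest centroid for every point.
    return [min(range(len(centroids)), key=lambda c: _sq_dist(p, centroids[c]))
            for p in points]


def kmeans_once(points: Sequence[Point], k: int, rng: random.Random,
                max_iter: int = 100) -> Result:
    # Plain random seeding: k distinct points become the initial centroids.
    centroids = [tuple(float(x) for x in p) for p in rng.sample(list(points), k)]
    for _ in range(max_iter):
        labels = _assign(points, centroids)
        # Move each centroid to the mean of the points assigned to it.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster where it is
        if updated == centroids:
            break  # converged: the centroids stopped moving
        centroids = updated
    labels = _assign(points, centroids)
    # Inertia: sum of squared distances from each point to its centroid.
    inertia = sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def kmeans_best_of(points: Sequence[Point], k: int, n_init: int = 5,
                   seed: int = 0) -> Result:
    # Run k-means n_init times with different seeds; keep the lowest inertia.
    if n_init < 1:
        raise ValueError("n_init must be at least 1")
    rng = random.Random(seed)
    best: Optional[Result] = None
    for _ in range(n_init):
        result = kmeans_once(points, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best
```

For example, kmeans_best_of([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2) should group the first three and the last three points together. In sklearn itself, this knob is the n_init parameter of sklearn.cluster.KMeans.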
It also takes a second argument, save_to_relic, which, if passed as True, will save the matplotlib plot to Relics. We will see how to use Relics in the next section.
To deploy this job, all we need to run is this single command in a shell terminal (you can also use the "Compute Fabric" strategy):
nbx jobs upload \
main:benchmark \
--id "<my_id>" \
--trigger
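The main:benchmark argument names the benchmark function inside main.py using the common module:attribute notation (the same style used by setuptools entry points). The following is a hypothetical sketch of how such a spec can be turned into a callable with importlib; it assumes the command is run from the directory containing main.py, and it does not describe how nbx actually loads the function. The name resolve_target is illustrative only.

```python
import importlib
from typing import Any


def resolve_target(spec: str) -> Any:
    # Split "module:attribute" into its two halves.
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    # Import the module (e.g. main.py -> "main") from the current sys.path.
    obj = importlib.import_module(module_name)
    # Walk dotted attributes, so "module:Class.method" also works.
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj
```

With main.py in the working directory, resolve_target("main:benchmark") would return the decorated benchmark object.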
We have also ensured that you can pass arguments to the running function through the CLI. So if you want to save the resulting image in Relics, simply append --save_to_relic True to the above command.
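How nbx maps flags such as --save_to_relic True onto the function is not covered here. As a hypothetical sketch, the mapping can be done by reading the function's signature with inspect.signature and converting each string value using the type of the parameter's default. One gotcha it guards against: bool("False") is True in Python, so booleans need explicit parsing. The names parse_cli_kwargs and _coerce are illustrative, not part of nbox.

```python
import inspect
from typing import Any, Callable, Dict, List


def _coerce(raw: str, default: Any) -> Any:
    # Convert a CLI string using the type of the parameter's default value.
    # bool is checked before int, because bool is a subclass of int.
    if isinstance(default, bool):
        # bool("False") is True in Python, so booleans are parsed explicitly.
        lowered = raw.lower()
        if lowered in ("true", "1", "yes"):
            return True
        if lowered in ("false", "0", "no"):
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    if isinstance(default, int):
        return int(raw)
    if isinstance(default, float):
        return float(raw)
    return raw  # no usable default: keep the string as-is


def parse_cli_kwargs(fn: Callable[..., Any], argv: List[str]) -> Dict[str, Any]:
    # Turn ["--name", "value", ...] into keyword arguments for fn.
    params = inspect.signature(fn).parameters
    kwargs: Dict[str, Any] = {}
    it = iter(argv)
    for flag in it:
        if not flag.startswith("--"):
            raise ValueError(f"expected a --flag, got {flag!r}")
        name = flag[2:]
        if name not in params:
            raise ValueError(f"unknown argument: {name}")
        value = next(it, None)
        if value is None:
            raise ValueError(f"missing value for --{name}")
        kwargs[name] = _coerce(value, params[name].default)
    return kwargs
```

For the benchmark signature above, parse_cli_kwargs(benchmark, ["--save_to_relic", "True"]) would yield {"save_to_relic": True}. Coercing from the default's type is one design choice; reading the type annotations is another, and the real nbx CLI may handle this differently.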
On the dashboard you will see logs that look like this:
[2023-01-24T07:10:15+0000] [INFO] [auth.py:136] Current workspace id: None (None)
[2023-01-24T07:10:16+0000] [INFO] [dist.py:49] Workspace Id: wnja9glc
[2023-01-24T07:10:16+0000] [INFO] [dist.py:61] Tag:
[2023-01-24T07:10:16+0000] [INFO] [dist.py:87] Looking for init.pkl at hxo6nga7/args_kwargs
# digits: 10; # samples: 1797; # features 64
__________________________________________________________________________________
init time inertia homo compl v-meas ARI AMI silhouette
k-means++ 0.637s 69662 0.680 0.719 0.699 0.570 0.695 0.175
random 0.145s 69707 0.675 0.716 0.694 0.560 0.691 0.167
PCA-based 0.107s 72686 0.636 0.658 0.647 0.521 0.643 0.147
[2023-01-24T07:10:20+0000] [INFO] [dist.py:104] Saving output to hxo6nga7/return
[2023-01-24T07:10:20+0000] [INFO] [dist.py:113] Job hxo6nga7 completed with status 7
Conclusion
In this demo, we saw how to run a job on the NimbleBox platform.