Glossary Page
Definitions of all the terms.
This page contains the definitions of all the objects in the NimbleBox system and how we treat them. The easiest way to find something here is to cmd/ctrl+f for it.
Project / nbox.Project
A NimbleBox Project is a single workbench for your ML pipelines and it's the first thing you need to create. It is a container for all the other objects like Jobs, Deploy, Relics, Experiment Tracking and the Live Tracker. You can use projects as follows:
from nbox import Project
p = Project('j898sj1')
# when running it on NBX Jobs/Deploy you don't need to pass the project ID
# p = Project()
relic = p.get_relic() # get relic instance
exp_tracker = p.get_exp_tracker(
  metadata = {"lr": 0.001} # pass any metadata you want to track
)
live_tracker = p.get_live_tracker() # get live tracker instance
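The handles returned above should expose the same APIs described in the Relics and Experiment Tracking sections further down this page (an assumption; see those sections for the full reference), so a quick, purely illustrative sketch of using them looks like this:
relic.put_to("model.pt", "checkpoints/model.pt") # upload a local file (illustrative path)
exp_tracker.log({"loss": 0.1}) # log a metric for this run (illustrative value)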
Build / nbox.Instance
Build Instances are development-ready VMs with GPUs. You can run any application and develop on them. Here's a quick API reference:
from nbox import Instance
i = Instance('2993')
i.start(gpu = "p100") # to start an instance with a p100 gpu
i.stop() # to stop the instance
# you can run some nicer commands like
i.ls() # to list the files in the instance
i.mv(
  'test.txt', 'nbx://test.txt'
) # to move a local file onto the instance
i.rm('nbx://test.txt') # to remove a file
i.remote('git clone my-repo.git') # to run a command on the instance
Model (Deploy-Model) / nbox.Serve
A single model is a single API endpoint that can also be controlled programmatically; this class exposes those methods.
from nbox import Serve

model = Serve(
  serving_id = "qwerty12",
  model_id = "blackbry"
)
model.pin() # pin the model to the deploy group
model.unpin() # unpin the model from the deploy group
model.scale() # scale the model to the desired number of replicas
model.logs() # get realtime logs for the model
Link to full documentation of nbox.Serve.
Jobs / nbox.Job
A Job has a defined start and a defined end, unlike Deploy, which has no end; this is the only difference between the two. A Job consists of two components: the logic of the program, which is an arbitrary nbox.Operator tree, and the nbox.Resource to run it on.
from nbox import Job

j = Job(job_id = "qwerty12")
j.trigger() # trigger the job
j.pause() # pause the job
j.resume() # resume the job
j.logs() # stream logs for the job
# to get information about the runs
runs = j.get_runs() # get runs for the job
j.display_runs() # display runs in a table
runs = j.last_n_runs() # get last n runs
j.delete() # delete the job
Link to full documentation of nbox.Job.
Experiment Tracking / nbox.Lmao
LMAO is an experiment tracking system built on top of NBX-Jobs/Deploy/Relics to provide a single dashboard for training.
from nbox import Lmao

# connect to or create a project
lmao = Lmao(
  "project_name",
  metadata = {
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 10,
  }
)
for i in range(10):
  # store any logs that you want
  lmao.log({
    "loss": 0.1,
    "accuracy": 0.9,
  })
lmao.save_file("model.pt") # automatically sync artifacts
lmao.end()
Operators / nbox.Operator, nbox.operator
In the NimbleBox way of MLOps, an Operator is the unit of computation that you want to run. The reasoning behind Operator is too long to summarize properly and has a page of its own. You can define an Operator in 4 different ways:
- @operator on a function
- @operator on a class
- Operator.from_job, which will latch itself to a job; calling it will run the job
- Operator.from_serving, which will latch itself to a serving group; calling it will call the API
import nbox
from nbox import operator, Operator

# applying this over a function makes no difference
@operator()
def foo(x: float, y: float):
  return {"data": x + y}

assert foo(1, 2) == {"data": 3} # the decorator does not change the return value

# applying this over a class also makes no difference (*)
# * some method names are reserved and will be overridden
@operator()
class Counter:
  def __init__(self, a: int = 3):
    self.a = a

  def inc(self, x: int = 0):
    self.a = self.a + x

  def dec(self, x: int = 0):
    self.a = self.a - x

cntr = Counter(2)
cntr.inc(1)
assert cntr.a == 3
cntr.dec(2)
assert cntr.a == 1
These are the two recommended ways of creating Operator objects: the first is a function that returns a dict, and the other is an OOP-style way of defining a counter. But the real magic is the "Compute Fabric", where we can just deploy these functions as a Job or a Serving API.
# since foo is a function we can deploy it both as a job and as a serving
j: nbox.Job = foo.deploy(type = "job") # will return a Job object
s_foo: nbox.Serve = foo.deploy(type = "serve") # will return a Serve object
# while Counter is a class and can only be deployed as a serving
s_cntr: nbox.Serve = Counter.deploy(type = "serve") # will return a Serve object
# to get the details
print(j, s_foo, s_cntr)
And then it all comes full circle: you can use these deployed Jobs and Servings as Operators.
# Operator can latch to a job or a serving
foo_job = Operator.from_job(j.job_id)
foo_serve = Operator.from_serving(s_foo.serving_id)
out_foo = foo(1, 2)
out_foo_job = foo_job(1, 2)
out_foo_serve = foo_serve(1, 2)
assert out_foo == out_foo_job == out_foo_serve
# Operator can also latch to the Counter serving and make the network invisible
cntr_serve = Operator.from_serving(s_cntr.serving_id)
cntr_serve.inc(1)
print(cntr_serve.a)
cntr_serve.dec(2)
print(cntr_serve.a)
Link to full documentation of nbox.Operator.
Relics / nbox.RelicsNBX
Relics is a front end for object stores like AWS S3, GCP Buckets and Azure Blob Storage. Relics generates the download / upload links, tracks activity using logs and provides a simple UI to manage and look up files. Here's how you can use Relics:
from nbox import RelicsNBX as Relics # the Relics class lives at nbox.RelicsNBX

# create a sample file
with open("whoami.txt", "w") as f:
f.write("This is Sparta!")
r = Relics("my-relic-name", create = True) # create a new relic
r.put_to("whoami.txt", "new_data/whoami.txt") # local -> remote
r.has("new_data/whoami.txt") # check if file exists
r.get_from("whoami.txt", "new_data/whoami.txt") # remote -> local
r.list_files("new_data/") # list files in a directory
r.rm("new_data/whoami.txt") # delete a file
# store and retrieve python objects
py_obj_original = (1, 2, 3, 4, 5) # a python object
r.put_object("py/obj", py_obj_original) # put a python object
py_obj_retrieved = r.get_object("py/obj") # get a python object
assert py_obj_original == py_obj_retrieved
r.delete() # delete the relic
Link to full documentation of nbox.Relics.
Resource / nbox.Resource
This object is used to define a Kubernetes-style hardware resource on which you want to run any computation. It can take the following arguments (a construction sketch follows the list):
- cpu: a string like "100m"
- memory: a string like "100Mi"
- gpu: a string like "nvidia-tesla-k80"
- gpu_count: a string like "1"
- disk_size: a string like "4Gi"
- timeout: an integer like 3600
- max_retries: an integer like 3
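As a minimal sketch, putting these together might look like the following. The import path follows the nbox.Resource heading above and the values are just the examples from the list, so treat this as illustrative rather than as the definitive constructor signature.
from nbox import Resource # import path assumed from the nbox.Resource heading above

# build a resource from the example values listed above
res = Resource(
  cpu = "100m",
  memory = "100Mi",
  gpu = "nvidia-tesla-k80",
  gpu_count = "1",
  disk_size = "4Gi",
  timeout = 3600,
  max_retries = 3,
)
A Resource like this goes together with a Job (see the Jobs section above), which pairs an Operator tree with the Resource to run it on.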
Schedule / nbox.Schedule
Define the schedule at which you want to run any Job. The NimbleBox system will honour the schedule but does not guarantee the exact time of the run, i.e. it will try to run the Job at the closest possible time after the defined schedule.
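For illustration only, a schedule could be sketched roughly as below; the argument names (hour, minute) are assumptions for this sketch, and the actual constructor arguments should be taken from the nbox.Schedule reference.
from nbox import Schedule # import path assumed from the nbox.Schedule heading above

# hypothetical sketch: a job that should run daily around 04:30;
# the (hour, minute) arguments are an assumption, not the confirmed signature
daily_0430 = Schedule(hour = 4, minute = 30)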