Blob Manager Plugin¶
Blob Managers are the central storage that supports uploading and downloading files. There are four purposes of having them:
The AIFlow client needs to submit artifacts(user codes, dependencies, and resources) to AIFlow server.
The AIFlow Server needs to distribute artifacts among workers.
The artifacts of each execution should be stored in persistent storage for restoring.
Users may need to transfer files between jobs in the same project.
Blob Managers have a common API and are “pluggable”, meaning you can swap Blob Manager based on your needs. AIFlow provides some built-in implementations, you can choose one of them or even implement your own BlobManager if needed.
Each project can only have one Blob Manager configured at a time, this is set by the blob section on top-level of the project.yaml. The blob section has two required sub-configs:
blob_manager_class: the fully-qualified name of the
Blob Managerclass.blob_manager_config: custom configuration of this type of implementation.
Built-in Blob Managers¶
LocalBlobManager¶
LocalBlobManager is only used when the AIFlow client, server, and workers are all on the same host because it relies on the local file system. LocalBlobManager has following custom configurations:
Key |
Type |
DESCRIPTION |
|---|---|---|
root_directory |
String |
The root directory of local filesystem to store artifacts |
A complete configuration example of LocalBlobManager in project.yaml.
blob:
blob_manager_class: ai_flow_plugins.blob_manager_plugins.local_blob_manager.LocalBlobManager
blob_manager_config:
root_directory: /tmp
OssBlobManager¶
OssBlobManager relies on Alibaba Cloud OSS to store resources. To use OssBlobManager you need to install python SDK for OSS client on every node that needs to access OSS file system.
pip install 'ai-flow-nightly[oss]'
OssBlobManager has following custom configurations:
Key |
Type |
DESCRIPTION |
|---|---|---|
root_directory |
String |
The root path of OSS filesystem to store artifacts |
access_key_id |
String |
The id of the access key |
access_key_secret |
String |
The secret of the access key |
endpoint |
String |
Access domain name or CNAME |
bucket |
String |
The name of OSS bucket |
A complete configuration example of OssBlobManager in project.yaml.
blob:
blob_manager_class: ai_flow_plugins.blob_manager_plugins.oss_blob_manager.OssBlobManager
blob_manager_config:
access_key_id: xxx
access_key_secret: xxx
endpoint: oss-cn-hangzhou.aliyuncs.com
bucket: ai-flow
root_directory: tmp
HDFSBlobManager¶
HDFSBlobManager relies on HDFS to store resources. To use HDFSBlobManager you need to install python SDK for HDFS client on every node which needs to access HDFSBlobManager.
pip install 'ai-flow-nightly[hdfs]'
HDFSBlobManager has following custom configurations:
Key |
Type |
DESCRIPTION |
|---|---|---|
hdfs_url |
String |
The url of WebHDFS |
hdfs_user |
String |
The user to access HDFS |
root_directory |
String |
The root path of HDFS filesystem to store artifacts |
A complete configuration example of HDFSBlobManager in project.yaml.
blob:
blob_manager_class: ai_flow_plugins.blob_manager_plugins.hdfs_blob_manager.HDFSBlobManager
blob_manager_config:
hdfs_url: http://hadoop-dfs:50070
hdfs_user: hdfs
root_directory: /tmp
S3BlobManager¶
// TODO
Using Blob Manager in a Workflow¶
The Blob Manager is not only be used by the AIFlow framework, users can also upload or download files with the Blob Manager if it has been configured in project.yaml. E.g.
from ai_flow.context.project_context import current_project_config
from ai_flow.workflow.workflow import WorkflowPropertyKeys
from ai_flow.plugin_interface.blob_manager_interface import BlobConfig, BlobManagerFactory
blob_config = BlobConfig(current_project_config().get(WorkflowPropertyKeys.BLOB))
blob_manager = BlobManagerFactory.create_blob_manager(blob_config.blob_manager_class(),
blob_config.blob_manager_config())
blob_manager.upload(local_file_path='/tmp/file')
Customizing Blob Manager¶
You can also implement your own Blob Manager if the built-in ones do not meet your requirements. To create a blob manager plugin, one needs to implement a subclass of ai_flow.plugin_interface.blob_manager_interface.BlobManager to upload and download artifacts. To take configurations upon construction, the subclass should have a __init__(self, config: Dict) method. The configurations can be added when someone setup AIFlow to use the custom blob manager.