ai_flow.operators.spark package¶
Submodules¶
ai_flow.operators.spark.spark_sql module¶
-
class
ai_flow.operators.spark.spark_sql.SparkSqlOperator(name: str, sql: str, master: str = 'yarn', application_name: Optional[str] = None, executable_path: Optional[str] = None, **kwargs)[source]¶ Bases:
ai_flow.model.operator.AIFlowOperatorSparkSqlOperator only supports client mode for now.
- Parameters
name – The operator’s name.
kwargs – Operator’s extended parameters.
-
await_termination(context: ai_flow.model.context.Context, timeout: Optional[int] = None)[source]¶ Wait for a task instance to finish. :param context: The context in which the operator is executed. :param timeout: If timeout is None, wait until the task ends.
If timeout is not None, wait for the task to end or the time exceeds timeout(seconds).
-
start(context: ai_flow.model.context.Context)[source]¶ Start a task instance.
-
stop(context: ai_flow.model.context.Context)[source]¶ Stop a task instance.
ai_flow.operators.spark.spark_submit module¶
-
class
ai_flow.operators.spark.spark_submit.SparkSubmitOperator(name: str, application: str, application_args: Optional[List[Any]] = None, executable_path: Optional[str] = None, master: str = 'yarn', deploy_mode: str = 'client', application_name: Optional[str] = None, submit_options: Optional[str] = None, k8s_namespace: Optional[str] = None, env_vars: Optional[Dict[str, Any]] = None, **kwargs)[source]¶ Bases:
ai_flow.model.operator.AIFlowOperatorSparkSubmitOperator is used to submit spark job with spark-submit command line.
- Parameters
name – The name of the operator.
application – The application file to be submitted, like app jar, python file or R file.
application_args – Args of the application.
executable_path – The path of spark-submit command.
master – spark://host:port, yarn, mesos://host:port, k8s://https://host:port, or local.
deploy_mode – Launch the program in client(by default) mode or cluster mode.
application_name – The name of spark application.
submit_options – The options that passes to command-line, e.g. –conf, –class and –files
k8s_namespace – The namespace of k8s, when submit application to k8s, it should be passed.
env_vars – Environment variables for spark-submit. It supports yarn and k8s mode too.
status_poll_interval – Seconds to wait between polls of driver status in cluster mode (Default: 1)
name – The operator’s name.
kwargs – Operator’s extended parameters.
-
await_termination(context: ai_flow.model.context.Context, timeout: Optional[int] = None)[source]¶ Wait for a task instance to finish. :param context: The context in which the operator is executed. :param timeout: If timeout is None, wait until the task ends.
If timeout is not None, wait for the task to end or the time exceeds timeout(seconds).
-
start(context: ai_flow.model.context.Context)[source]¶ Start a task instance.
-
stop(context: ai_flow.model.context.Context)[source]¶ Stop a task instance.