ai_flow.operators.spark package

Submodules

ai_flow.operators.spark.spark_sql module

class ai_flow.operators.spark.spark_sql.SparkSqlOperator(name: str, sql: str, master: str = 'yarn', application_name: Optional[str] = None, executable_path: Optional[str] = None, **kwargs)[source]

Bases: ai_flow.model.operator.AIFlowOperator

SparkSqlOperator only supports client mode for now.

Parameters
  • name – The operator’s name.

  • kwargs – Operator’s extended parameters.
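As a rough illustration of how the constructor arguments could map onto a spark-sql command line, the sketch below assembles an argument list. The helper build_spark_sql_command is hypothetical and not part of ai_flow. The fallback binary name spark-sql and the use of the --master, --name and -e flags are assumptions based on the standard Spark SQL CLI, not on the operator's source.

```python
from typing import List, Optional


def build_spark_sql_command(sql: str,
                            master: str = 'yarn',
                            application_name: Optional[str] = None,
                            executable_path: Optional[str] = None) -> List[str]:
    """Hypothetical helper: assemble a spark-sql invocation as an argv list."""
    # Assumption: fall back to the `spark-sql` binary on PATH when no path is given.
    command = [executable_path or 'spark-sql', '--master', master]
    if application_name:
        command += ['--name', application_name]
    # `-e` passes the query inline on the command line.
    command += ['-e', sql]
    return command


# Example: a query submitted to YARN under the application name "demo".
command = build_spark_sql_command('SELECT 1', application_name='demo')
print(command)
```

Returning a list instead of a single string avoids shell quoting problems when the SQL text contains spaces or quotes.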

await_termination(context: ai_flow.model.context.Context, timeout: Optional[int] = None)[source]

Wait for a task instance to finish.

Parameters
  • context – The context in which the operator is executed.

  • timeout – If timeout is None, wait until the task ends. Otherwise, wait until the task ends or until timeout seconds have elapsed.
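The timeout semantics described here can be sketched with a plain child process. This is only an illustration: the helper wait_for_process is hypothetical, and whether the operator tracks its Spark job through a subprocess is an assumption, not something this reference states.

```python
import subprocess
import sys
from typing import Optional


def wait_for_process(process: subprocess.Popen, timeout: Optional[int] = None) -> bool:
    """Hypothetical helper: return True if the process ended, False on timeout."""
    try:
        # With timeout=None this blocks until the process exits.
        process.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # The deadline passed first; the process is still running.
        return False


# A short-lived child process stands in for the Spark task.
child = subprocess.Popen([sys.executable, '-c', 'pass'])
finished = wait_for_process(child, timeout=30)
```

Note that hitting the timeout in this sketch does not stop the process; the caller still has to decide whether to keep waiting or to stop it.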

start(context: ai_flow.model.context.Context)[source]

Start a task instance.

stop(context: ai_flow.model.context.Context)[source]

Stop a task instance.

ai_flow.operators.spark.spark_submit module

class ai_flow.operators.spark.spark_submit.SparkSubmitOperator(name: str, application: str, application_args: Optional[List[Any]] = None, executable_path: Optional[str] = None, master: str = 'yarn', deploy_mode: str = 'client', application_name: Optional[str] = None, submit_options: Optional[str] = None, k8s_namespace: Optional[str] = None, env_vars: Optional[Dict[str, Any]] = None, **kwargs)[source]

Bases: ai_flow.model.operator.AIFlowOperator

SparkSubmitOperator submits a Spark job using the spark-submit command line.

Parameters
  • name – The name of the operator.

  • application – The application file to submit, such as a JAR, a Python file, or an R file.

  • application_args – Arguments passed to the application.

  • executable_path – The path to the spark-submit command.

  • master – spark://host:port, yarn, mesos://host:port, k8s://https://host:port, or local.

  • deploy_mode – Whether to launch the program in client mode (the default) or cluster mode.

  • application_name – The name of the Spark application.

  • submit_options – Options passed to the command line, e.g. --conf, --class and --files.

  • k8s_namespace – The Kubernetes namespace. It should be passed when submitting the application to Kubernetes.

  • env_vars – Environment variables for spark-submit. YARN and Kubernetes modes are also supported.

  • status_poll_interval – Seconds to wait between polls of the driver status in cluster mode (default: 1).

  • kwargs – Operator’s extended parameters.
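To make the parameter list more concrete, the sketch below shows one way these arguments could be turned into a spark-submit command line. The helper build_spark_submit_command is hypothetical and is not the operator's actual implementation. The flags (--master, --deploy-mode, --name, --conf) and the Spark properties spark.kubernetes.namespace, spark.yarn.appMasterEnv.* and spark.kubernetes.driverEnv.* are standard Spark options, but how the operator itself applies k8s_namespace and env_vars is an assumption here.

```python
import shlex
from typing import Any, Dict, List, Optional


def build_spark_submit_command(application: str,
                               application_args: Optional[List[Any]] = None,
                               executable_path: Optional[str] = None,
                               master: str = 'yarn',
                               deploy_mode: str = 'client',
                               application_name: Optional[str] = None,
                               submit_options: Optional[str] = None,
                               k8s_namespace: Optional[str] = None,
                               env_vars: Optional[Dict[str, Any]] = None) -> List[str]:
    """Hypothetical helper: assemble a spark-submit invocation as an argv list."""
    # Assumption: fall back to the `spark-submit` binary on PATH when no path is given.
    command = [executable_path or 'spark-submit',
               '--master', master, '--deploy-mode', deploy_mode]
    if application_name:
        command += ['--name', application_name]
    if k8s_namespace and master.startswith('k8s'):
        command += ['--conf', f'spark.kubernetes.namespace={k8s_namespace}']
    # Assumption: on YARN and Kubernetes, environment variables are forwarded as
    # Spark properties; other masters are ignored in this sketch.
    for key, value in (env_vars or {}).items():
        if master == 'yarn':
            command += ['--conf', f'spark.yarn.appMasterEnv.{key}={value}']
        elif master.startswith('k8s'):
            command += ['--conf', f'spark.kubernetes.driverEnv.{key}={value}']
    if submit_options:
        # Split the free-form option string the way a POSIX shell would.
        command += shlex.split(submit_options)
    # The application file comes last, followed by its own arguments.
    command.append(application)
    command += [str(arg) for arg in (application_args or [])]
    return command


command = build_spark_submit_command(
    application='/tmp/app.py',
    application_args=['--date', '2024-01-01'],
    application_name='demo',
    submit_options='--conf spark.executor.memory=2g',
    env_vars={'STAGE': 'dev'},
)
print(command)
```

The application path and its arguments are placed after every spark-submit option, because spark-submit treats everything following the application file as arguments to the application.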

await_termination(context: ai_flow.model.context.Context, timeout: Optional[int] = None)[source]

Wait for a task instance to finish.

Parameters
  • context – The context in which the operator is executed.

  • timeout – If timeout is None, wait until the task ends. Otherwise, wait until the task ends or until timeout seconds have elapsed.

start(context: ai_flow.model.context.Context)[source]

Start a task instance.

stop(context: ai_flow.model.context.Context)[source]

Stop a task instance.