
types

ApplicationAttempt

id : string

ID of the YARN application attempt.

amContainerId : string

ID of the YARN Application Master container.

ApplicationInfo

id : string

ID of the YARN application.

applicationAttempts : ApplicationAttempt

YARN application attempts.

AutoscalingConfig

maxHostsCount : int64

Upper limit for the total number of instances in the subcluster.

preemptible : bool

Preemptible instances are stopped at least once every 24 hours, and can be stopped at any time if their resources are needed by Compute. For more information, see Preemptible Virtual Machines.

measurementDuration : google.protobuf.Duration

Time in seconds allotted for averaging metrics.

warmupDuration : google.protobuf.Duration

The warmup time of the instance in seconds. During this time, traffic is sent to the instance, but instance metrics are not collected.

stabilizationDuration : google.protobuf.Duration

Minimum amount of time in seconds allotted for monitoring before Instance Groups can reduce the number of instances in the group. During this time, the group size doesn't decrease, even if the new metric values indicate that it should.

cpuUtilizationTarget : double

Defines an autoscaling rule based on the average CPU utilization of the instance group.

decommissionTimeout : int64

Timeout to gracefully decommission nodes during downscaling, in seconds. Default value: 120.
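Taken together, the fields above can be sketched as a request payload. This is a minimal illustration under stated assumptions, not an exact API call: the field names follow the reference, but every value, the duration string format, and the units of the CPU target are assumptions.

```python
# Hypothetical AutoscalingConfig payload; all values are illustrative only.
autoscaling_config = {
    "maxHostsCount": 10,             # upper limit for instances in the subcluster
    "preemptible": False,            # preemptible VMs may be stopped at any time
    "measurementDuration": "60s",    # metric averaging window (assumed format)
    "warmupDuration": "120s",        # traffic served, metrics not yet collected
    "stabilizationDuration": "300s", # group size will not shrink in this window
    "cpuUtilizationTarget": 75.0,    # average CPU target (units per the API)
    "decommissionTimeout": 120,      # graceful decommission, seconds (default)
}
```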

Cluster

A Data Proc cluster. For details about the concept, see documentation.

Status

  • STATUS_UNKNOWN

    Cluster state is unknown.

  • CREATING

    Cluster is being created.

  • RUNNING

    Cluster is running normally.

  • ERROR

    Cluster encountered a problem and cannot operate.

  • STOPPING

    Cluster is stopping.

  • STOPPED

Cluster is stopped.

  • STARTING

    Cluster is starting.

id : string

ID of the cluster. Generated at creation time.

folderId : string

ID of the folder that the cluster belongs to.

createdAt : google.protobuf.Timestamp

Creation timestamp.

name : string

Name of the cluster. The name is unique within the folder.

description : string

Description of the cluster.

labels : string

Cluster labels as key:value pairs.

monitoring : Monitoring

Monitoring systems relevant to the cluster.

config : ClusterConfig

Configuration of the cluster.

health : Health

Aggregated cluster health.

status : Status

Cluster status.

zoneId : string

ID of the availability zone where the cluster resides.

serviceAccountId : string

ID of the service account used by the Data Proc manager agent.

bucket : string

Object Storage bucket to be used for Data Proc jobs that are run in the cluster.

uiProxy : bool

Whether UI Proxy feature is enabled.

securityGroupIds : string

User security groups.

hostGroupIds : string

Host groups hosting VMs of the cluster.

deletionProtection : bool

Deletion protection prevents the cluster from being deleted.

logGroupId : string

ID of the Cloud Logging log group to write logs to. If not set, the default log group for the folder is used. To prevent logs from being sent to the cloud, set the cluster property dataproc:disable_cloud_logging = true.

ClusterConfig

versionId : string

Image version for cluster provisioning. All available versions are listed in the documentation.

hadoop : HadoopConfig

Data Proc specific configuration options.

HadoopConfig

Hadoop configuration that describes services installed in a cluster, their properties and settings.

Service

  • SERVICE_UNSPECIFIED

  • HDFS

  • YARN

  • MAPREDUCE

  • HIVE

  • TEZ

  • ZOOKEEPER

  • HBASE

  • SQOOP

  • FLUME

  • SPARK

  • ZEPPELIN

  • OOZIE

  • LIVY

services : Service

Set of services used in the cluster (if empty, the default set is used).

properties : string

Properties set for all hosts in *-site.xml configurations. The key should indicate the service and the property.

For example, use the key 'hdfs:dfs.replication' to set the dfs.replication property in the file /etc/hadoop/conf/hdfs-site.xml.
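The key convention described above can be sketched as follows. The helper name is hypothetical, and generalizing the /etc/hadoop/conf path beyond the hdfs example is an assumption:

```python
# Split a 'service:property' key as used in the properties map.
# split_property_key is a hypothetical helper, not part of any API.
def split_property_key(key: str) -> tuple[str, str]:
    service, prop = key.split(":", 1)
    return service, prop

service, prop = split_property_key("hdfs:dfs.replication")
# The hdfs example above places this property in hdfs-site.xml:
config_file = f"/etc/hadoop/conf/{service}-site.xml"
```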

sshPublicKeys : string

List of public SSH keys for access to the cluster hosts.

initializationActions : InitializationAction

Set of initialization actions.

HiveJob

properties : string

Property names and values, used to configure Data Proc and Hive.

continueOnFailure : bool

Flag indicating whether a job should continue to run if a query fails.

scriptVariables : string

Query variables and their values.

jarFileUris : string

JAR file URIs to add to CLASSPATH of the Hive driver and each task.

One of queryType

  • queryFileUri : string

    URI of the script with all the necessary Hive queries.

  • queryList : QueryList

    List of Hive queries to be used in the job.
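Because queryFileUri and queryList form the queryType oneof, a job spec sets exactly one of them. A hedged sketch with hypothetical URIs, queries, and variable names:

```python
# Two hypothetical HiveJob specs; exactly one queryType field is set in each.
hive_job_from_file = {
    "continueOnFailure": False,
    "scriptVariables": {"ds": "2024-01-01"},         # illustrative variable
    "queryFileUri": "s3a://example-bucket/etl.sql",  # hypothetical URI
}
hive_job_inline = {
    "continueOnFailure": False,
    "queryList": {"queries": ["SHOW TABLES;", "SELECT 1;"]},
}
```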

Host

A Data Proc host. For details about the concept, see documentation.

name : string

Name of the Data Proc host. The host name is assigned by Data Proc at creation time and cannot be changed. The name is generated to be unique across all Data Proc hosts that exist on the platform, as it defines the FQDN of the host.

subclusterId : string

ID of the Data Proc subcluster that the host belongs to.

health : Health

Status code of the aggregated health of the host.

computeInstanceId : string

ID of the Compute virtual machine that is used as the Data Proc host.

role : Role

Role of the host in the cluster.

InitializationAction

uri : string

URI of the executable file.

args : string

Arguments to the initialization action.

timeout : int64

Execution timeout.

Job

A Data Proc job. For details about the concept, see documentation.

Status

  • STATUS_UNSPECIFIED

  • PROVISIONING

    Job is logged in the database and is waiting for the agent to run it.

  • PENDING

    Job is acquired by the agent and is in the queue for execution.

  • RUNNING

    Job is being run in the cluster.

  • ERROR

Job failed to finish properly.

  • DONE

    Job is finished.

  • CANCELLED

    Job is cancelled.

  • CANCELLING

    Job is waiting for cancellation.

id : string

ID of the job. Generated at creation time.

clusterId : string

ID of the Data Proc cluster that the job belongs to.

createdAt : google.protobuf.Timestamp

Creation timestamp.

startedAt : google.protobuf.Timestamp

The time when the job was started.

finishedAt : google.protobuf.Timestamp

The time when the job was finished.

name : string

Name of the job, specified in the JobService.Create request.

createdBy : string

ID of the user who created the job.

status : Status

Job status.

One of jobSpec

Specification for the job.

  • mapreduceJob : MapreduceJob

    Specification for a MapReduce job.

  • sparkJob : SparkJob

    Specification for a Spark job.

  • pysparkJob : PysparkJob

    Specification for a PySpark job.

  • hiveJob : HiveJob

    Specification for a Hive job.

applicationInfo : ApplicationInfo

Attributes of YARN application.

MapreduceJob

args : string

Optional arguments to pass to the driver.

jarFileUris : string

JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.

fileUris : string

URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.

archiveUris : string

URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.

properties : string

Property names and values, used to configure Data Proc and MapReduce.

One of driver

  • mainJarFileUri : string

    HCFS URI of the .jar file containing the driver class.

  • mainClass : string

    The name of the driver class.
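Since mainJarFileUri and mainClass belong to the driver oneof, a MapReduce job identifies its driver either by JAR or by class name, never both. A sketch with hypothetical URIs:

```python
# Hypothetical MapreduceJob specs; mainJarFileUri and mainClass are mutually
# exclusive members of the 'driver' oneof.
mr_by_jar = {
    "mainJarFileUri": "s3a://example-bucket/wordcount.jar",  # hypothetical URI
    "args": ["input/", "output/"],
}
mr_by_class = {
    "mainClass": "org.apache.hadoop.examples.WordCount",
    "jarFileUris": ["s3a://example-bucket/examples.jar"],    # hypothetical URI
    "args": ["input/", "output/"],
}
```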

Monitoring

Metadata of a monitoring system for a Data Proc cluster.

name : string

Name of the monitoring system.

description : string

Description of the monitoring system.

link : string

Link to the monitoring system.

PysparkJob

args : string

Optional arguments to pass to the driver.

jarFileUris : string

JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.

fileUris : string

URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.

archiveUris : string

URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.

properties : string

Property names and values, used to configure Data Proc and PySpark.

mainPythonFileUri : string

URI of the file with the driver code. Must be a .py file.

pythonFileUris : string

URIs of Python files to pass to the PySpark framework.

packages : string

List of maven coordinates of jars to include on the driver and executor classpaths.

repositories : string

List of additional remote repositories to search for the maven coordinates given with --packages.

excludePackages : string

List of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts.
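The PysparkJob fields can be sketched as one payload. All URIs and coordinates here are hypothetical; the only hard constraint stated above is that mainPythonFileUri must point to a .py file:

```python
# Hypothetical PysparkJob spec; every URI and coordinate is illustrative.
pyspark_job = {
    "mainPythonFileUri": "s3a://example-bucket/job.py",   # must be a .py file
    "pythonFileUris": ["s3a://example-bucket/utils.py"],  # extra Python files
    "args": ["--date", "2024-01-01"],
    "packages": ["org.apache.spark:spark-avro_2.12:3.3.0"],  # maven coordinates
    "excludePackages": ["org.slf4j:slf4j-api"],           # groupId:artifactId
}
```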

QueryList

queries : string

List of Hive queries.

ResourcePreset

A ResourcePreset resource for describing hardware configuration presets.

id : string

ID of the ResourcePreset resource.

zoneIds : string

IDs of availability zones where the resource preset is available.

cores : int64

Number of CPU cores for a Data Proc host created with the preset.

memory : int64

RAM volume for a Data Proc host created with the preset, in bytes.

Resources

resourcePresetId : string

ID of the resource preset for computational resources available to a host (CPU, memory etc.). All available presets are listed in the documentation.

diskTypeId : string

Type of the storage environment for the host. Possible values:

  • network-hdd - network HDD drive,
  • network-ssd - network SSD drive.

diskSize : int64

Volume of the storage available to a host, in bytes.
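Since diskSize is expressed in bytes, a size given in GiB has to be converted. A sketch with a hypothetical preset ID:

```python
# Resources payload sketch; diskSize is in bytes, so convert from GiB.
GIB = 1024 ** 3
resources = {
    "resourcePresetId": "s2.micro",  # hypothetical preset ID
    "diskTypeId": "network-ssd",     # or "network-hdd"
    "diskSize": 128 * GIB,           # 128 GiB expressed in bytes
}
```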

SparkJob

args : string

Optional arguments to pass to the driver.

jarFileUris : string

JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.

fileUris : string

URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.

archiveUris : string

URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.

properties : string

Property names and values, used to configure Data Proc and Spark.

mainJarFileUri : string

The HCFS URI of the JAR file containing the main class for the job.

mainClass : string

The name of the driver class.

packages : string

List of maven coordinates of jars to include on the driver and executor classpaths.

repositories : string

List of additional remote repositories to search for the maven coordinates given with --packages.

excludePackages : string

List of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts.
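A SparkJob sketch tying the dependency fields together: packages use full group:artifact:version maven coordinates, while exclusions use only groupId:artifactId, as described above. The URI, class name, and repository are hypothetical:

```python
# Hypothetical SparkJob spec illustrating the coordinate formats.
spark_job = {
    "mainJarFileUri": "s3a://example-bucket/app.jar",      # hypothetical URI
    "mainClass": "com.example.App",                        # hypothetical class
    "packages": ["org.apache.kafka:kafka-clients:3.4.0"],  # group:artifact:version
    "excludePackages": ["org.slf4j:slf4j-api"],            # groupId:artifactId
    "repositories": ["https://repo.example.com/maven2"],   # hypothetical repo
}
```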

Subcluster

A Data Proc subcluster. For details about the concept, see documentation.

id : string

ID of the subcluster. Generated at creation time.

clusterId : string

ID of the Data Proc cluster that the subcluster belongs to.

createdAt : google.protobuf.Timestamp

Creation timestamp.

name : string

Name of the subcluster. The name is unique within the cluster.

role : Role

Role that is fulfilled by hosts of the subcluster.

resources : Resources

Resources allocated for each host in the subcluster.

subnetId : string

ID of the VPC subnet used for hosts in the subcluster.

hostsCount : int64

Number of hosts in the subcluster.

assignPublicIp : bool

Assign public IP addresses for all hosts in the subcluster.

autoscalingConfig : AutoscalingConfig

Configuration for instance-group-based subclusters.

instanceGroupId : string

ID of the Compute Instance Group for autoscaling subclusters.