
types

ApplicationAttempt

id : string

ID of the YARN application attempt.

amContainerId : string

ID of the YARN Application Master container.

ApplicationInfo

id : string

ID of the YARN application.

applicationAttempts : ApplicationAttempt

YARN application attempts.

AutoscalingConfig

maxHostsCount : int64

Upper limit for the total number of instances in the subcluster.

preemptible : bool

Preemptible instances are stopped at least once every 24 hours, and can be stopped at any time if their resources are needed by Compute. For more information, see Preemptible Virtual Machines.

measurementDuration : google.protobuf.Duration

Time in seconds allotted for averaging metrics.

warmupDuration : google.protobuf.Duration

The warmup time of the instance in seconds. During this time, traffic is sent to the instance, but instance metrics are not collected.

stabilizationDuration : google.protobuf.Duration

Minimum amount of time in seconds allotted for monitoring before Instance Groups can reduce the number of instances in the group. During this time, the group size doesn't decrease, even if the new metric values indicate that it should.

cpuUtilizationTarget : double

Defines an autoscaling rule based on the average CPU utilization of the instance group.

decommissionTimeout : int64

Timeout to gracefully decommission nodes during downscaling, in seconds. Default value: 120.
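Taken together, the fields above can be sketched as a request payload. This is a minimal illustration under stated assumptions, not an exact API call: the field names follow the reference, but every value, the duration string format, and the units of the CPU target are assumptions.

```python
# Hypothetical AutoscalingConfig payload; all values are illustrative only.
autoscaling_config = {
    "maxHostsCount": 10,             # upper limit for instances in the subcluster
    "preemptible": False,            # preemptible VMs may be stopped at any time
    "measurementDuration": "60s",    # metric averaging window (assumed format)
    "warmupDuration": "120s",        # traffic served, metrics not yet collected
    "stabilizationDuration": "300s", # group size will not shrink in this window
    "cpuUtilizationTarget": 75.0,    # average CPU target (units per the API)
    "decommissionTimeout": 120,      # graceful decommission, seconds (default)
}
```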

Cluster

A Data Proc cluster. For details about the concept, see documentation.

Status

  • STATUS_UNKNOWN

    Cluster state is unknown.

  • CREATING

    Cluster is being created.

  • RUNNING

    Cluster is running normally.

  • ERROR

    Cluster encountered a problem and cannot operate.

  • STOPPING

    Cluster is stopping.

  • STOPPED

Cluster is stopped.

  • STARTING

    Cluster is starting.

id : string

ID of the cluster. Generated at creation time.

folderId : string

ID of the folder that the cluster belongs to.

createdAt : google.protobuf.Timestamp

Creation timestamp.

name : string

Name of the cluster. The name is unique within the folder.

description : string

Description of the cluster.

labels : string

Cluster labels as key:value pairs.

monitoring : Monitoring

Monitoring systems relevant to the cluster.

config : ClusterConfig

Configuration of the cluster.

health : Health

Aggregated cluster health.

status : Status

Cluster status.

zoneId : string

ID of the availability zone where the cluster resides.

serviceAccountId : string

ID of the service account used by the Data Proc manager agent.

bucket : string

Object Storage bucket to be used for Data Proc jobs that are run in the cluster.

uiProxy : bool

Whether UI Proxy feature is enabled.

securityGroupIds : string

User security groups.

hostGroupIds : string

Host groups hosting VMs of the cluster.

deletionProtection : bool

Deletion protection prevents the cluster from being deleted.

logGroupId : string

ID of the Cloud Logging log group to write logs to. If not set, the default log group for the folder is used. To prevent logs from being sent to the cloud, set the cluster property dataproc:disable_cloud_logging = true.

ClusterConfig

versionId : string

Image version for cluster provisioning. All available versions are listed in the documentation.

hadoop : HadoopConfig

Data Proc specific configuration options.

HadoopConfig

Hadoop configuration that describes services installed in a cluster, their properties and settings.

Service

  • SERVICE_UNSPECIFIED

  • HDFS

  • YARN

  • MAPREDUCE

  • HIVE

  • TEZ

  • ZOOKEEPER

  • HBASE

  • SQOOP

  • FLUME

  • SPARK

  • ZEPPELIN

  • OOZIE

  • LIVY

services : Service

Set of services used in the cluster (if empty, the default set is used).

properties : string

Properties set for all hosts in *-site.xml configurations. The key should indicate the service and the property.

For example, use the key 'hdfs:dfs.replication' to set the dfs.replication property in the file /etc/hadoop/conf/hdfs-site.xml.
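The key convention described above can be sketched as follows. The helper name is hypothetical, and generalizing the /etc/hadoop/conf path beyond the hdfs example is an assumption:

```python
# Split a 'service:property' key as used in the properties map.
# split_property_key is a hypothetical helper, not part of any API.
def split_property_key(key: str) -> tuple[str, str]:
    service, prop = key.split(":", 1)
    return service, prop

service, prop = split_property_key("hdfs:dfs.replication")
# The hdfs example above places this property in hdfs-site.xml:
config_file = f"/etc/hadoop/conf/{service}-site.xml"
```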

sshPublicKeys : string

List of public SSH keys for access to the cluster hosts.

initializationActions : InitializationAction

Set of initialization actions.

HiveJob

properties : string

Property names and values, used to configure Data Proc and Hive.

continueOnFailure : bool

Flag indicating whether a job should continue to run if a query fails.

scriptVariables : string

Query variables and their values.

jarFileUris : string

JAR file URIs to add to CLASSPATH of the Hive driver and each task.

One of queryType

  • queryFileUri : string

    URI of the script with all the necessary Hive queries.

  • queryList : QueryList

    List of Hive queries to be used in the job.
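Because queryFileUri and queryList form the queryType oneof, a job spec sets exactly one of them. A hedged sketch with hypothetical URIs, queries, and variable names:

```python
# Two hypothetical HiveJob specs; exactly one queryType field is set in each.
hive_job_from_file = {
    "continueOnFailure": False,
    "scriptVariables": {"ds": "2024-01-01"},         # illustrative variable
    "queryFileUri": "s3a://example-bucket/etl.sql",  # hypothetical URI
}
hive_job_inline = {
    "continueOnFailure": False,
    "queryList": {"queries": ["SHOW TABLES;", "SELECT 1;"]},
}
```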

Host

A Data Proc host. For details about the concept, see documentation.

name : string

Name of the Data Proc host. The host name is assigned by Data Proc at creation time and cannot be changed. The name is generated to be unique across all Data Proc hosts that exist on the platform, as it defines the FQDN of the host.

subclusterId : string

ID of the Data Proc subcluster that the host belongs to.

health : Health

Status code of the aggregated health of the host.

computeInstanceId : string

ID of the Compute virtual machine that is used as the Data Proc host.

role : Role

Role of the host in the cluster.

InitializationAction

uri : string

URI of the executable file.

args : string

Arguments to the initialization action.

timeout : int64

Execution timeout.

Job

A Data Proc job. For details about the concept, see documentation.

Status

  • STATUS_UNSPECIFIED

  • PROVISIONING

    Job is logged in the database and is waiting for the agent to run it.

  • PENDING

    Job is acquired by the agent and is in the queue for execution.

  • RUNNING

    Job is being run in the cluster.

  • ERROR

Job failed to finish properly.

  • DONE

    Job is finished.

  • CANCELLED

    Job is cancelled.

  • CANCELLING

    Job is waiting for cancellation.

id : string

ID of the job. Generated at creation time.

clusterId : string

ID of the Data Proc cluster that the job belongs to.

createdAt : google.protobuf.Timestamp

Creation timestamp.

startedAt : google.protobuf.Timestamp

The time when the job was started.

finishedAt : google.protobuf.Timestamp

The time when the job was finished.

name : string

Name of the job, specified in the JobService.Create request.

createdBy : string

ID of the user who created the job.

status : Status

Job status.

One of jobSpec

Specification for the job.

  • mapreduceJob : MapreduceJob

    Specification for a MapReduce job.

  • sparkJob : SparkJob

    Specification for a Spark job.

  • pysparkJob : PysparkJob

    Specification for a PySpark job.

  • hiveJob : HiveJob

    Specification for a Hive job.

applicationInfo : ApplicationInfo

Attributes of YARN application.

MapreduceJob

args : string

Optional arguments to pass to the driver.

jarFileUris : string

JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.

fileUris : string

URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.

archiveUris : string

URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.

properties : string

Property names and values, used to configure Data Proc and MapReduce.

One of driver

  • mainJarFileUri : string

    HCFS URI of the .jar file containing the driver class.

  • mainClass : string

    The name of the driver class.
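Since mainJarFileUri and mainClass belong to the driver oneof, a MapReduce job identifies its driver either by JAR or by class name, never both. A sketch with hypothetical URIs:

```python
# Hypothetical MapreduceJob specs; mainJarFileUri and mainClass are mutually
# exclusive members of the 'driver' oneof.
mr_by_jar = {
    "mainJarFileUri": "s3a://example-bucket/wordcount.jar",  # hypothetical URI
    "args": ["input/", "output/"],
}
mr_by_class = {
    "mainClass": "org.apache.hadoop.examples.WordCount",
    "jarFileUris": ["s3a://example-bucket/examples.jar"],    # hypothetical URI
    "args": ["input/", "output/"],
}
```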

Monitoring

Metadata of a monitoring system for a Data Proc cluster.

name : string

Name of the monitoring system.

description : string

Description of the monitoring system.

link : string

Link to the monitoring system.

PysparkJob

args : string

Optional arguments to pass to the driver.

jarFileUris : string

JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.

fileUris : string

URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.

archiveUris : string

URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.

properties : string

Property names and values, used to configure Data Proc and PySpark.

mainPythonFileUri : string

URI of the file with the driver code. Must be a .py file.

pythonFileUris : string

URIs of Python files to pass to the PySpark framework.

packages : string

List of maven coordinates of jars to include on the driver and executor classpaths.

repositories : string

List of additional remote repositories to search for the maven coordinates given with --packages.

excludePackages : string

List of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts.
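The PysparkJob fields can be sketched as one payload. All URIs and coordinates here are hypothetical; the only hard constraint stated above is that mainPythonFileUri must point to a .py file:

```python
# Hypothetical PysparkJob spec; every URI and coordinate is illustrative.
pyspark_job = {
    "mainPythonFileUri": "s3a://example-bucket/job.py",   # must be a .py file
    "pythonFileUris": ["s3a://example-bucket/utils.py"],  # extra Python files
    "args": ["--date", "2024-01-01"],
    "packages": ["org.apache.spark:spark-avro_2.12:3.3.0"],  # maven coordinates
    "excludePackages": ["org.slf4j:slf4j-api"],           # groupId:artifactId
}
```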

QueryList

queries : string

List of Hive queries.

ResourcePreset

A ResourcePreset resource for describing hardware configuration presets.

id : string

ID of the ResourcePreset resource.

zoneIds : string

IDs of availability zones where the resource preset is available.

cores : int64

Number of CPU cores for a Data Proc host created with the preset.

memory : int64

RAM volume for a Data Proc host created with the preset, in bytes.

Resources

resourcePresetId : string

ID of the resource preset for computational resources available to a host (CPU, memory etc.). All available presets are listed in the documentation.

diskTypeId : string

Type of the storage environment for the host. Possible values:

  • network-hdd - network HDD drive,
  • network-ssd - network SSD drive.

diskSize : int64

Volume of the storage available to a host, in bytes.
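Since diskSize is expressed in bytes, a size given in GiB has to be converted. A sketch with a hypothetical preset ID:

```python
# Resources payload sketch; diskSize is in bytes, so convert from GiB.
GIB = 1024 ** 3
resources = {
    "resourcePresetId": "s2.micro",  # hypothetical preset ID
    "diskTypeId": "network-ssd",     # or "network-hdd"
    "diskSize": 128 * GIB,           # 128 GiB expressed in bytes
}
```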

SparkJob

args : string

Optional arguments to pass to the driver.

jarFileUris : string

JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.

fileUris : string

URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.

archiveUris : string

URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.

properties : string

Property names and values, used to configure Data Proc and Spark.

mainJarFileUri : string

The HCFS URI of the JAR file containing the main class for the job.

mainClass : string

The name of the driver class.

packages : string

List of maven coordinates of jars to include on the driver and executor classpaths.

repositories : string

List of additional remote repositories to search for the maven coordinates given with --packages.

excludePackages : string

List of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts.
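A SparkJob sketch tying the dependency fields together: packages use full group:artifact:version maven coordinates, while exclusions use only groupId:artifactId, as described above. The URI, class name, and repository are hypothetical:

```python
# Hypothetical SparkJob spec illustrating the coordinate formats.
spark_job = {
    "mainJarFileUri": "s3a://example-bucket/app.jar",      # hypothetical URI
    "mainClass": "com.example.App",                        # hypothetical class
    "packages": ["org.apache.kafka:kafka-clients:3.4.0"],  # group:artifact:version
    "excludePackages": ["org.slf4j:slf4j-api"],            # groupId:artifactId
    "repositories": ["https://repo.example.com/maven2"],   # hypothetical repo
}
```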

Subcluster

A Data Proc subcluster. For details about the concept, see documentation.

id : string

ID of the subcluster. Generated at creation time.

clusterId : string

ID of the Data Proc cluster that the subcluster belongs to.

createdAt : google.protobuf.Timestamp

Creation timestamp.

name : string

Name of the subcluster. The name is unique within the cluster.

role : Role

Role that is fulfilled by hosts of the subcluster.

resources : Resources

Resources allocated for each host in the subcluster.

subnetId : string

ID of the VPC subnet used for hosts in the subcluster.

hostsCount : int64

Number of hosts in the subcluster.

assignPublicIp : bool

Assign public IP addresses for all hosts in the subcluster.

autoscalingConfig : AutoscalingConfig

Configuration for instance-group-based subclusters.

instanceGroupId : string

ID of the Compute Instance Group for autoscaling subclusters.