Types
ApplicationAttempt
id
: string
ID of the YARN application attempt.
amContainerId
: string
ID of the YARN Application Master container.
ApplicationInfo
id
: string
ID of the YARN application.
applicationAttempts
: ApplicationAttempt
YARN application attempts.
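To make the shape of these messages concrete, here is a minimal sketch of an ApplicationInfo object as a Python dict; the ID values are illustrative placeholders that follow the standard YARN naming patterns (application_*, appattempt_*, container_*).

# Hypothetical ApplicationInfo for a job with one application attempt.
application_info = {
    "id": "application_1600000000000_0001",  # placeholder YARN application ID
    "applicationAttempts": [
        {
            "id": "appattempt_1600000000000_0001_000001",               # placeholder attempt ID
            "amContainerId": "container_1600000000000_0001_01_000001",  # placeholder AM container ID
        }
    ],
}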
AutoscalingConfig
maxHostsCount
: int64
Upper limit for the total number of instances in the subcluster.
preemptible
: bool
Preemptible instances are stopped at least once every 24 hours, and can be stopped at any time if their resources are needed by Compute. For more information, see Preemptible Virtual Machines.
measurementDuration
: google.protobuf.Duration
Time in seconds allotted for averaging metrics.
warmupDuration
: google.protobuf.Duration
The warmup time of the instance in seconds. During this time, traffic is sent to the instance, but instance metrics are not collected.
stabilizationDuration
: google.protobuf.Duration
Minimum amount of time in seconds allotted for monitoring before Instance Groups can reduce the number of instances in the group. During this time, the group size doesn't decrease, even if the new metric values indicate that it should.
cpuUtilizationTarget
: double
Defines an autoscaling rule based on the average CPU utilization of the instance group.
decommissionTimeout
: int64
Timeout to gracefully decommission nodes during downscaling, in seconds. Default value: 120.
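A minimal sketch of an AutoscalingConfig as a Python dict, assuming the JSON-style field names above; every value is illustrative rather than a recommended setting, and the duration fields are shown as second-denominated strings in the google.protobuf.Duration style.

# Hypothetical AutoscalingConfig; all values are illustrative.
autoscaling_config = {
    "maxHostsCount": 10,              # upper limit for instances in the subcluster
    "preemptible": False,             # preemptible VMs can be stopped at any time
    "measurementDuration": "60s",     # window for averaging metrics
    "warmupDuration": "120s",         # metrics are not collected during warmup
    "stabilizationDuration": "300s",  # group size cannot shrink during this window
    "cpuUtilizationTarget": 60.0,     # scale on average CPU utilization of the group
    "decommissionTimeout": 120,       # graceful decommission timeout, in seconds
}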
Cluster
A Data Proc cluster. For details about the concept, see documentation.
Status
STATUS_UNKNOWN
Cluster state is unknown.
CREATING
Cluster is being created.
RUNNING
Cluster is running normally.
ERROR
Cluster encountered a problem and cannot operate.
STOPPING
Cluster is stopping.
STOPPED
Cluster stopped.
STARTING
Cluster is starting.
id
: string
ID of the cluster. Generated at creation time.
folderId
: string
ID of the folder that the cluster belongs to.
createdAt
: google.protobuf.Timestamp
Creation timestamp.
name
: string
Name of the cluster. The name is unique within the folder.
description
: string
Description of the cluster.
labels
: string
Cluster labels as key:value pairs.
monitoring
: Monitoring
Monitoring systems relevant to the cluster.
config
: ClusterConfig
Configuration of the cluster.
health
: Health
Aggregated cluster health.
status
: Status
Cluster status.
zoneId
: string
ID of the availability zone where the cluster resides.
serviceAccountId
: string
ID of the service account used by the Data Proc manager agent.
bucket
: string
Object Storage bucket to be used for Data Proc jobs that are run in the cluster.
uiProxy
: bool
Whether the UI Proxy feature is enabled.
securityGroupIds
: string
User security groups.
hostGroupIds
: string
Host groups hosting VMs of the cluster.
deletionProtection
: bool
Deletion Protection inhibits deletion of the cluster.
logGroupId
: string
ID of the Cloud Logging log group to write logs to. If not set, the default log group for the folder is used. To prevent logs from being sent to the cloud, set the cluster property dataproc:disable_cloud_logging = true.
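Since a client typically creates a cluster and then waits for it to become usable, here is a sketch of polling the Status enum above; get_cluster is a hypothetical helper standing in for whatever client call returns the Cluster fields described in this section.

import time

def wait_until_running(get_cluster, cluster_id, poll_seconds=10):
    # Poll a hypothetical get_cluster(cluster_id) helper until the cluster
    # reaches RUNNING, using the Status values defined above.
    while True:
        cluster = get_cluster(cluster_id)
        status = cluster["status"]
        if status == "RUNNING":
            return cluster
        if status == "ERROR":
            raise RuntimeError(f"cluster {cluster_id} cannot operate (status ERROR)")
        time.sleep(poll_seconds)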
ClusterConfig
versionId
: string
Image version for cluster provisioning. All available versions are listed in the documentation.
hadoop
: HadoopConfig
Data Proc-specific configuration options.
HadoopConfig
Hadoop configuration that describes services installed in a cluster, their properties and settings.
Service
SERVICE_UNSPECIFIED
HDFS
YARN
MAPREDUCE
HIVE
TEZ
ZOOKEEPER
HBASE
SQOOP
FLUME
SPARK
ZEPPELIN
OOZIE
LIVY
services
: Service
Set of services used in the cluster (if empty, the default set is used).
properties
: string
Properties set for all hosts in *-site.xml configurations. The key should indicate the service and the property. For example, use the key 'hdfs:dfs.replication' to set the dfs.replication property in the file /etc/hadoop/conf/hdfs-site.xml.
sshPublicKeys
: string
List of public SSH keys for access to cluster hosts.
initializationActions
: InitializationAction
Set of initialization actions.
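A minimal HadoopConfig sketch as a Python dict, showing the 'service:property' key convention described above; the property values and SSH key are placeholders.

# Hypothetical HadoopConfig. Keys in `properties` follow the
# 'service:property' convention and land in the matching *-site.xml file.
hadoop_config = {
    "services": ["HDFS", "YARN", "SPARK"],  # an empty list means the default set
    "properties": {
        "hdfs:dfs.replication": "2",        # -> /etc/hadoop/conf/hdfs-site.xml
    },
    "sshPublicKeys": ["ssh-ed25519 AAAA... user@example"],  # placeholder key
    "initializationActions": [],            # see the InitializationAction sketch below
}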
HiveJob
properties
: string
Property names and values, used to configure Data Proc and Hive.
continueOnFailure
: bool
Flag indicating whether a job should continue to run if a query fails.
scriptVariables
: string
Query variables and their values.
jarFileUris
: string
JAR file URIs to add to CLASSPATH of the Hive driver and each task.
One of queryType
queryFileUri
: string
URI of the script with all the necessary Hive queries.
queryList
: QueryList
List of Hive queries to be used in the job.
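A HiveJob sketch as a Python dict; because queryFileUri and queryList belong to the queryType oneof, exactly one of them may be set. All URIs and values are placeholders.

# Hypothetical HiveJob spec. Only one branch of the queryType oneof is set.
hive_job = {
    "properties": {"hive.exec.dynamic.partition": "true"},  # illustrative property
    "continueOnFailure": False,
    "scriptVariables": {"SOURCE_TABLE": "events"},  # placeholder query variables
    "jarFileUris": ["s3a://my-bucket/udfs.jar"],    # placeholder JAR for the driver
    "queryList": {"queries": ["SELECT count(*) FROM events;"]},
    # "queryFileUri": "s3a://my-bucket/queries.hql",  # the other oneof branch
}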
Host
A Data Proc host. For details about the concept, see documentation.
name
: string
Name of the Data Proc host. The host name is assigned by Data Proc at creation time and cannot be changed. The name is generated to be unique across all Data Proc hosts that exist on the platform, as it defines the FQDN of the host.
subclusterId
: string
ID of the Data Proc subcluster that the host belongs to.
health
: Health
Status code of the aggregated health of the host.
computeInstanceId
: string
ID of the Compute virtual machine that is used as the Data Proc host.
role
: Role
Role of the host in the cluster.
InitializationAction
uri
: string
URI of the executable file.
args
: string
Arguments to the initialization action.
timeout
: int64
Execution timeout.
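An InitializationAction sketch; the URI and arguments are placeholders.

# Hypothetical InitializationAction: an executable fetched from `uri` and run
# with `args`, bounded by `timeout`.
init_action = {
    "uri": "s3a://my-bucket/bootstrap.sh",  # placeholder executable URI
    "args": ["--install", "extra-tools"],   # placeholder arguments
    "timeout": 600,                         # execution timeout
}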
Job
A Data Proc job. For details about the concept, see documentation.
Status
STATUS_UNSPECIFIED
Job status is not specified.
PROVISIONING
Job is logged in the database and is waiting for the agent to run it.
PENDING
Job is acquired by the agent and is in the queue for execution.
RUNNING
Job is being run in the cluster.
ERROR
Job failed to finish the run properly.
DONE
Job is finished.
CANCELLED
Job is cancelled.
CANCELLING
Job is waiting for cancellation.
id
: string
ID of the job. Generated at creation time.
clusterId
: string
ID of the Data Proc cluster that the job belongs to.
createdAt
: google.protobuf.Timestamp
Creation timestamp.
startedAt
: google.protobuf.Timestamp
The time when the job was started.
finishedAt
: google.protobuf.Timestamp
The time when the job was finished.
name
: string
Name of the job, specified in the JobService.Create request.
createdBy
: string
ID of the user who created the job.
status
: Status
Job status.
One of jobSpec
Specification for the job.
mapreduceJob
: MapreduceJob
Specification for a MapReduce job.
sparkJob
: SparkJob
Specification for a Spark job.
pysparkJob
: PysparkJob
Specification for a PySpark job.
hiveJob
: HiveJob
Specification for a Hive job.
applicationInfo
: ApplicationInfo
Attributes of the YARN application.
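Putting the Job fields together, here is a sketch of a Job object; exactly one of the jobSpec branches is populated, and all IDs are placeholders.

# Hypothetical Job as the service might return it. Only one jobSpec branch
# (mapreduceJob / sparkJob / pysparkJob / hiveJob) is ever populated.
job = {
    "id": "job-placeholder-id",             # generated at creation time
    "clusterId": "cluster-placeholder-id",
    "name": "nightly-aggregation",          # from the JobService.Create request
    "createdBy": "user-placeholder-id",
    "status": "RUNNING",                    # one of the Status values above
    "hiveJob": {},                          # e.g. the HiveJob sketch shown earlier
    "applicationInfo": {},                  # e.g. the ApplicationInfo sketch shown earlier
}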
MapreduceJob
args
: string
Optional arguments to pass to the driver.
jarFileUris
: string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
fileUris
: string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archiveUris
: string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties
: string
Property names and values, used to configure Data Proc and MapReduce.
One of driver
mainJarFileUri
: string
HCFS URI of the .jar file containing the driver class.
mainClass
: string
The name of the driver class.
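A MapreduceJob sketch; the driver oneof means either mainJarFileUri or mainClass is set, never both. Paths and property values are placeholders.

# Hypothetical MapReduce job spec using the mainClass branch of the driver oneof.
mapreduce_job = {
    "args": ["s3a://my-bucket/input", "s3a://my-bucket/output"],  # placeholder paths
    "jarFileUris": [],                             # extra JARs for the driver and tasks
    "fileUris": [],                                # copied to the working directory
    "archiveUris": [],                             # extracted into the working directory
    "properties": {"mapreduce.job.reduces": "4"},  # illustrative Hadoop property
    "mainClass": "org.apache.hadoop.examples.WordCount",  # placeholder driver class
    # "mainJarFileUri": "s3a://my-bucket/wordcount.jar",  # the other oneof branch
}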
Monitoring
Metadata of a monitoring system for a Data Proc cluster.
name
: string
Name of the monitoring system.
description
: string
Description of the monitoring system.
link
: string
Link to the monitoring system.
PysparkJob
args
: string
Optional arguments to pass to the driver.
jarFileUris
: string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
fileUris
: string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archiveUris
: string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties
: string
Property names and values, used to configure Data Proc and PySpark.
mainPythonFileUri
: string
URI of the file with the driver code. Must be a .py file.
pythonFileUris
: string
URIs of Python files to pass to the PySpark framework.
packages
: string
List of Maven coordinates of JAR files to include on the driver and executor classpaths.
repositories
: string
List of additional remote repositories to search for the Maven coordinates given with --packages.
excludePackages
: string
List of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.
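A PysparkJob sketch showing how the dependency fields map onto Spark's --packages mechanism; coordinates, URIs, and property values are placeholders.

# Hypothetical PySpark job spec.
pyspark_job = {
    "mainPythonFileUri": "s3a://my-bucket/main.py",    # driver code, must be a .py file
    "pythonFileUris": ["s3a://my-bucket/helpers.py"],  # extra Python files for PySpark
    "args": ["--date", "2024-01-01"],                  # placeholder driver arguments
    "properties": {"spark.executor.memory": "4g"},     # illustrative Spark property
    "packages": ["org.apache.spark:spark-avro_2.12:3.3.0"],  # Maven coordinates
    "repositories": ["https://repo1.maven.org/maven2/"],     # extra Maven repositories
    "excludePackages": ["org.slf4j:slf4j-api"],        # groupId:artifactId to exclude
}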
QueryList
queries
: string
List of Hive queries.
ResourcePreset
A ResourcePreset resource for describing hardware configuration presets.
id
: string
ID of the ResourcePreset resource.
zoneIds
: string
IDs of availability zones where the resource preset is available.
cores
: int64
Number of CPU cores for a Data Proc host created with the preset.
memory
: int64
RAM volume for a Data Proc host created with the preset, in bytes.
Resources
resourcePresetId
: string
ID of the resource preset for computational resources available to a host (CPU, memory etc.). All available presets are listed in the documentation.
diskTypeId
: string
Type of the storage environment for the host. Possible values:
- network-hdd - network HDD drive,
- network-ssd - network SSD drive.
diskSize
: int64
Volume of the storage available to a host, in bytes.
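A Resources sketch; since diskSize is in bytes, sizes like 128 GiB have to be spelled out. The preset ID is a placeholder, as the available presets are listed in the documentation.

# Hypothetical Resources block for one host.
resources = {
    "resourcePresetId": "s2.small",  # placeholder preset ID
    "diskTypeId": "network-ssd",     # or "network-hdd"
    "diskSize": 128 * 1024 ** 3,     # 137438953472 bytes = 128 GiB
}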
SparkJob
args
: string
Optional arguments to pass to the driver.
jarFileUris
: string
JAR file URIs to add to CLASSPATH of the Data Proc driver and each task.
fileUris
: string
URIs of resource files to be copied to the working directory of Data Proc drivers and distributed Hadoop tasks.
archiveUris
: string
URIs of archives to be extracted to the working directory of Data Proc drivers and tasks.
properties
: string
Property names and values, used to configure Data Proc and Spark.
mainJarFileUri
: string
The HCFS URI of the JAR file containing the main class for the job.
mainClass
: string
The name of the driver class.
packages
: string
List of Maven coordinates of JAR files to include on the driver and executor classpaths.
repositories
: string
List of additional remote repositories to search for the Maven coordinates given with --packages.
excludePackages
: string
List of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.
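A SparkJob sketch, analogous to the MapReduce and PySpark sketches above; all URIs, class names, and values are placeholders.

# Hypothetical Spark job spec.
spark_job = {
    "mainJarFileUri": "s3a://my-bucket/app.jar",   # HCFS URI of the application JAR
    "mainClass": "com.example.Main",               # placeholder driver class
    "args": ["--input", "s3a://my-bucket/input"],  # placeholder arguments
    "properties": {"spark.sql.shuffle.partitions": "200"},  # illustrative property
    "packages": [],
    "repositories": [],
    "excludePackages": [],
}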
Subcluster
A Data Proc subcluster. For details about the concept, see documentation.
id
: string
ID of the subcluster. Generated at creation time.
clusterId
: string
ID of the Data Proc cluster that the subcluster belongs to.
createdAt
: google.protobuf.Timestamp
Creation timestamp.
name
: string
Name of the subcluster. The name is unique within the cluster.
role
: Role
Role that is fulfilled by hosts of the subcluster.
resources
: Resources
Resources allocated for each host in the subcluster.
subnetId
: string
ID of the VPC subnet used for hosts in the subcluster.
hostsCount
: int64
Number of hosts in the subcluster.
assignPublicIp
: bool
Assign public IP addresses to all hosts in the subcluster.
autoscalingConfig
: AutoscalingConfig
Configuration for instance-group-based subclusters.
instanceGroupId
: string
ID of the Compute instance group for autoscaling subclusters.
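Finally, a Subcluster sketch with autoscaling; autoscalingConfig and instanceGroupId only apply to instance-group-based subclusters, all IDs are placeholders, and the Role value is illustrative since the Role enum is defined elsewhere in this reference.

# Hypothetical autoscaling Subcluster.
subcluster = {
    "id": "subcluster-placeholder-id",
    "clusterId": "cluster-placeholder-id",
    "name": "compute-autoscaling",        # unique within the cluster
    "role": "COMPUTENODE",                # illustrative Role value
    "resources": {},                      # e.g. the Resources sketch above
    "subnetId": "subnet-placeholder-id",  # VPC subnet for the hosts
    "hostsCount": 2,
    "assignPublicIp": False,
    "autoscalingConfig": {},              # e.g. the AutoscalingConfig sketch above
    "instanceGroupId": "ig-placeholder-id",
}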