Lifecycle of Managing an Application Cluster
This section describes the typical workflows for managing an application cluster, including creating/deleting a cluster, adding/deleting cluster nodes (horizontal scaling), vertical scaling, and upgrading.
Create Cluster
1. Prepare resources for the cluster, including creating instances, volumes, networks, etc.
2. Register the cluster information with the metadata service.
3. Start the confd agent on all nodes. The agent watches the metadata service and updates the configuration files defined by the template files (toml & tmpl) under /etc/confd whenever any cluster information changes. If reload_cmd is defined in the toml file and the related metadata has changed, that command is executed.
4. Execute the commands defined in the init and start services, in the order determined by post_start_service in the init service: the init command runs after the start command if post_start_service is set to true. The same service command on nodes with different roles executes in ascending order of the ‘order’ property defined in the mustache file; the smaller the value of ‘order’, the higher the priority of the command. The default value of ‘order’ is zero (the highest priority), which means all nodes execute the same service command in parallel. (See the sketches after this list.)
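As a concrete illustration of step 3, a minimal confd template resource could look like the following. This is a hedged sketch: the watched key paths and script names are assumptions, not part of the specification.

```toml
# Hypothetical template resource: /etc/confd/conf.d/app.conf.toml
[template]
src        = "app.conf.tmpl"            # template file under /etc/confd/templates
dest       = "/opt/app/conf/app.conf"   # configuration file to (re)generate
keys       = ["/hosts", "/cluster"]     # metadata keys to watch (assumed paths)
reload_cmd = "/opt/app/bin/reload.sh"   # executed only when the watched metadata changes
```

Likewise, a sketch of how the init and start services in step 4 might be declared for one role in the cluster config (mustache) file; all names and values are illustrative:

```json
{
  "role": "master",
  "services": {
    "init": {
      "cmd": "/opt/app/bin/init.sh",
      "post_start_service": true,
      "order": 1
    },
    "start": {
      "cmd": "/opt/app/bin/start.sh",
      "order": 1
    }
  }
}
```

With post_start_service set to true, init.sh runs after start.sh; if every role kept the default ‘order’ of zero, all nodes would run the same command in parallel.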
Delete Cluster
1. If a destroy service is defined for the nodes to be deleted, skip step 2 and start from step 3; otherwise execute steps 2 and 8.
2. On each node, execute the stop command in ascending order of the ‘order’ property.
3. If post_stop_service of the destroy service is set to false for the nodes to be deleted, skip step 4 and start from step 5; otherwise execute steps 4 and 8 (see the sketch after this list).
4. On each node to be deleted, execute the stop command in ascending order of the ‘order’ property, then execute the destroy command in the same way.
5. On each node to be deleted, execute the destroy command in ascending order of the ‘order’ property, then move to the corresponding step according to the exit code: success (zero) or failure (non-zero). This step prevents data loss when a cluster is deleted by mistake.
6. If the destroy command returns non-zero and the end user does not choose ‘Force Delete’, terminate this ‘Deleting’ task as failed.
7. If the destroy command returns zero, or the end user selects ‘Force Delete’ after the destroy command returns non-zero, execute the stop command in ascending order of the ‘order’ property.
8. Delete all resources of this cluster and deregister the cluster information from the metadata service.
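For illustration, a destroy service guarding against accidental data loss (step 5) might be declared like this; the script names are hypothetical:

```json
{
  "role": "data_node",
  "services": {
    "stop": {
      "cmd": "/opt/app/bin/stop.sh",
      "order": 1
    },
    "destroy": {
      "cmd": "/opt/app/bin/check-no-data.sh",
      "post_stop_service": false,
      "order": 1
    }
  }
}
```

With post_stop_service set to false, the destroy command runs before stop (steps 5 - 7 above); a non-zero exit from the hypothetical check-no-data.sh is what triggers the ‘Force Delete’ prompt.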
Add Node
To support adding nodes, define scale_horizontal in advanced_actions (a sketch follows); refer to the Development Specification - Full Version.
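A minimal sketch of what this declaration might look like in the cluster config, assuming advanced_actions takes a list of action names:

```json
{
  "advanced_actions": ["scale_horizontal"]
}
```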
1. Prepare resources for the new nodes, including instances, volumes, networks, etc.
2. Register the new nodes' information into the metadata service under /hosts and /adding-hosts. The latter is a temporary folder for pre-processing operations during scale-out.
3. As the cluster information in the metadata service has changed, the configuration files related to that information on the existing nodes (not the ones just added) may be updated, and the reload command is executed if reload_cmd is defined in the related toml file.
4. Start the confd agent on each added node, which likewise updates the configuration and executes the reload command if necessary.
5. Execute the init and start commands on the added nodes in the order determined by post_start_service in the init service. The same service command on nodes with different roles executes in ascending order of the ‘order’ property defined in the mustache file; the smaller the value of ‘order’, the higher the priority of the command. The default value of ‘order’ is zero (the highest priority), which means all nodes execute the same service command in parallel.
6. Execute the scale_out command defined on the existing nodes (not the ones just added). Be aware that the command is executed only on the nodes selected by the ‘nodes_to_execute_on’ property (see the sketch after this list).
7. Delete the temporary content under /adding-hosts in the metadata service.
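As an illustration of step 6, a scale_out service limited to a subset of the existing nodes might be declared as follows; the script name and values are assumptions:

```json
{
  "role": "master",
  "services": {
    "scale_out": {
      "cmd": "/opt/app/bin/rebalance.sh",
      "nodes_to_execute_on": 1,
      "order": 1,
      "timeout": 600
    }
  }
}
```

Here nodes_to_execute_on restricts execution to one node of the role, which is useful when a single coordinator should rebalance the cluster after new nodes join.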
Delete Node
To support deleting nodes, define scale_horizontal in advanced_actions; refer to the Development Specification - Full Version.
1. Register the nodes to be deleted into the metadata service under /deleting-hosts, which is a temporary folder for pre-processing operations during scale-in.
2. As the cluster information in the metadata service has changed, the configuration files related to that information on each node are updated, and the reload command is executed if reload_cmd is defined in the related toml file.
3. Start from step 5 if a destroy service is defined on the nodes to be deleted; otherwise execute steps 4 and 10 - 12.
4. Execute the scale_in command in ascending order of the ‘order’ property on the existing nodes (not the ones being deleted) of different roles, then execute stop_cmd in ascending order on the nodes to be deleted (see the sketch after this list).
5. If post_stop_service is set to false on the nodes to be deleted, go to step 7; otherwise execute steps 6 and 10 - 12.
6. Execute the scale_in command in ascending order on the existing nodes (not the ones being deleted) of different roles. Execute stop_cmd in ascending order on the nodes to be deleted, then execute destroy_cmd in ascending order on the same nodes.
7. Execute the destroy command in ascending order on the nodes to be deleted of different roles, then move to the corresponding step according to the exit code: success (zero) or failure (non-zero). This step prevents data loss when nodes are deleted by mistake.
8. If the destroy command returns non-zero and the end user does not select ‘Force Delete’, stop deleting the nodes, terminate this ‘Deleting’ task as failed, and delete the data under /deleting-hosts in the metadata service.
9. If the destroy command returns zero, or the end user selects ‘Force Delete’ after the destroy command returns non-zero, execute the scale_in command in ascending order on nodes of different roles, then execute the stop command in the same way.
10. Delete the resources of the nodes to be deleted.
11. Deregister the deleted nodes from the metadata service and remove the information under /deleting-hosts.
12. As the cluster information in the metadata service has changed, the configuration files related to that information on the remaining nodes (not the ones just deleted) are updated. Execute the reload command if reload_cmd is defined in the related toml file.
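As an illustration of steps 4 and 6, the scale_in service can carry different ‘order’ values on different roles so that, for example, a proxy role stops routing traffic to departing nodes before a data role runs its own scale-in handler. All names and values below are hypothetical:

```json
[
  {
    "role": "proxy",
    "services": {
      "scale_in": { "cmd": "/opt/app/bin/drain.sh", "order": 1 }
    }
  },
  {
    "role": "data_node",
    "services": {
      "scale_in": { "cmd": "/opt/app/bin/scale-in.sh", "order": 2 },
      "destroy": {
        "cmd": "/opt/app/bin/check-no-data.sh",
        "post_stop_service": true
      }
    }
  }
]
```

Because drain.sh has the smaller ‘order’, it runs before scale-in.sh; and with post_stop_service set to true, the destroy command on data_node runs after stop_cmd, as in step 6.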
Scale Vertical
1. Register the node role to be scaled into vertical-scaling-roles in the metadata service.
2. If only volumes are being resized, scale vertically online, then execute steps 6 - 7.
3. If a stop service is defined on the nodes to be scaled, execute steps 4 and 6 - 7; otherwise execute steps 5 and 6 - 7.
4. According to the defined vertical_scaling_policy, perform the following operations sequentially or in parallel: execute the stop command on the nodes to be scaled, scale the nodes vertically, then execute the start command on them (see the sketch after this list).
5. Shut down the nodes to be scaled (no stop command is executed, since no stop service is defined), scale them vertically, then start them again (likewise without executing a start command).
6. Update the data in the metadata service for the vertically scaled nodes and delete vertical-scaling-roles.
7. As the cluster information in the metadata service has changed, the configuration files related to that information on all nodes are updated, and the reload command is executed if reload_cmd is defined in the related toml file.
Note: vertical-scaling-roles will be deleted even if an exception occurs during vertical scaling.
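The policy referenced in step 4 might be declared at the cluster level as sketched below; this assumes it accepts values such as "sequential" or "parallel" (the exact accepted values are defined in the Development Specification - Full Version):

```json
{
  "vertical_scaling_policy": "sequential"
}
```

With a sequential policy, each node is stopped, scaled, and restarted before the next one is touched, so part of the cluster can keep serving traffic during the operation.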
Upgrade
1. Stop the cluster nodes.
2. Start the cluster nodes with the image of the new app_version.
3. Start the confd agent on all nodes. The agent watches the metadata service and updates the configuration files defined by the template files (toml & tmpl) under /etc/confd whenever any cluster information changes. If reload_cmd is defined in the toml file and the related metadata has changed, that command is executed.
4. Execute the start and upgrade commands on the nodes in the order determined by post_start_service in the upgrade service. The same service command on nodes with different roles executes in ascending order of the ‘order’ property (see the sketch after this list).
5. If the exit code of any command in step 4 is non-zero (failed), the upgrade task is terminated and marked as failed; otherwise the upgrade task finishes successfully. In case of an upgrade failure, stop the cluster and start downgrading from step 6.
6. Start the cluster nodes with the image of the old app_version.
7. Start the confd agent on all nodes, as in step 3.
8. Execute the start command in ascending order of the ‘order’ property on nodes of different roles.
Note: Steps 1 through 4 are executed sequentially or in parallel according to the defined upgrading_policy.
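As an illustration of step 4 and the note above, an upgrade service and the policy might be declared as follows; the script names and the policy value are assumptions:

```json
{
  "upgrading_policy": "sequential",
  "nodes": [
    {
      "role": "data_node",
      "services": {
        "start": { "cmd": "/opt/app/bin/start.sh", "order": 1 },
        "upgrade": {
          "cmd": "/opt/app/bin/migrate-data.sh",
          "post_start_service": true,
          "order": 1
        }
      }
    }
  ]
}
```

With post_start_service set to true, the hypothetical migrate-data.sh runs after start.sh on each node, so any schema or data migration executes against a service already running on the new app_version image.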