In the first two parts of this series, I covered the prerequisites for deploying the automation and the machine learning (ML) model: selecting the right tools, understanding the underlying infrastructure, and making sure every dependency is in place so the pipeline runs end to end. Each of these elements lays the foundation for a deployment pipeline that is both efficient and reliable.
In this third part, we will look at the Oracle Cloud Infrastructure (OCI) functions that drive the automation. OCI Functions is a serverless platform: code runs in response to events without any servers to provision or manage. I use two functions here. The first performs a comprehensive vulnerability scan across the environment, a crucial step for maintaining security, since identifying vulnerabilities lets us mitigate risks proactively.
After the vulnerability assessment, the first function schedules a second function that takes an instance boot volume backup, so the instance can be recovered if anything goes wrong during the upgrade. The second function then schedules an update job that systematically applies the pending updates to the OCI instance, and once the updates are installed it restarts the instance so the new software and configuration take full effect.
A key advantage of the first function is that it schedules a separate instance of the second function for each VM, so the upgrades run in parallel. Upgrading all instances simultaneously shortens the overall maintenance window, minimizes downtime, and keeps the upgrade process manageable and reliable across the fleet.

Let's explore the first function in detail:
To begin, we import all the libraries the function depends on. One note of caution: numpy must be listed explicitly (it is pinned to 1.26.4 in requirements.txt) even though the code never calls it directly, because pandas is built against a specific numpy version; omitting the pin, or letting an incompatible version resolve, can make the function fail at import time.
Now, let's delve into the various components and functions that are integral to our implementation:
get_boot_volume_id: This function is responsible for retrieving the unique identifier associated with the boot volume of a particular instance. The boot volume is critical as it contains the operating system and essential files required for the instance to run.
boot_volume_details: This function provides detailed information about the boot volume, including its size, type, and state. Understanding these details is vital for monitoring and managing the resources allocated to an instance.
get_running_windows_instances: This function identifies and lists all the currently running instances that are operating on the Windows platform. It is particularly useful for administrators who need to manage Windows environments and ensure that all instances are functioning correctly.
clean_os_version: This function cleans and standardizes the operating system version strings to ensure consistency across different instances. This is particularly important when comparing versions or applying updates, as discrepancies in version formatting can lead to errors.
get_instance_details: This function retrieves comprehensive details about a specific instance, including its configuration, status, and resource usage. Having access to these details allows for better management and optimization of cloud resources.
fetch_metric: This function is designed to gather various performance metrics related to the instance. These metrics can include CPU usage, memory consumption, and disk I/O statistics, which are crucial for performance monitoring and troubleshooting.
storage_fetch_metric: Similar to the fetch_metric function, this one specifically focuses on retrieving metrics related to free storage space.
list_available_patches_osmh: This function lists all the available patches for the operating system, which is essential for maintaining security and stability. Keeping the system updated with the latest patches helps protect against vulnerabilities and bugs.
get_msrc_api_url: This function retrieves the API URL for the Microsoft Security Response Center (MSRC), which is a key resource for obtaining information about security updates and patches. Accessing this API allows for automated retrieval of security-related data.
get_cve_from_kb: This function extracts Common Vulnerabilities and Exposures (CVE) information from the Microsoft Knowledge Base (KB). This is important for organizations that need to assess their security posture and address known vulnerabilities.
get_highest_cvss_score: This function calculates and returns the highest Common Vulnerability Scoring System (CVSS) score from a list of vulnerabilities. The CVSS score is a crucial metric for prioritizing which vulnerabilities to address first based on their severity.
convert_size_to_mb: This utility function converts sizes from bytes to megabytes, providing a more user-friendly representation of data sizes. This is particularly useful when dealing with storage metrics or data transfer sizes.
get_patch_size: This function determines the size of a specific patch that needs to be applied to an instance. Knowing the patch size helps in planning for bandwidth usage and storage requirements during the update process.
get_last_patch_status: This function checks and retrieves the status of the last patch that was applied to an instance. Understanding the patch status is critical for ensuring that the system is up-to-date and secure.
create_oci_function: This function is used to create an Oracle Cloud Infrastructure (OCI) function, which can be utilized for various serverless operations. This allows for the execution of code without the need for managing servers, thereby enhancing scalability and efficiency.
create_schedule: Finally, this function is responsible for creating a schedule for tasks such as applying patches or running maintenance scripts. Scheduling these tasks helps automate processes, ensuring that they are performed regularly and without manual intervention.
import io
import json
import logging
import oci
import requests
import re
import pandas
from datetime import datetime, timedelta, timezone
from oci.exceptions import ServiceError
from bs4 import BeautifulSoup
from oci.signer import Signer
import random
from io import StringIO
import numpy
from fdk import response
def get_boot_volume_id(compartment_id, instance_id, compute_client, availability_domain):
    """Fetches the Boot Volume ID attached to the instance."""
    try:
        attachments = compute_client.list_boot_volume_attachments(
            compartment_id=compartment_id, availability_domain=availability_domain
        ).data
        for attachment in attachments:
            if attachment.instance_id == instance_id:
                return attachment.boot_volume_id
        raise Exception(f"No boot volume found for instance {instance_id}")
    except ServiceError as e:
        print(f"Error fetching boot volume ID: {e}")
        raise
def boot_volume_details(boot_volume_id, block_storage_client):
    """Fetches the Boot Volume details."""
    try:
        boot_volume = block_storage_client.get_boot_volume(boot_volume_id).data
        return boot_volume
    except ServiceError as e:
        print(f"Error fetching boot volume details: {e}")
        raise
def get_running_windows_instances(compute_client, block_storage_client, compartment_id):
    """Fetches all running Windows instances and their details."""
    instances = compute_client.list_instances(compartment_id).data
    powered_on_instances = [inst for inst in instances if inst.lifecycle_state == "RUNNING"]
    print(f"Total Running Instances: {len(powered_on_instances)}")
    windows_instances = []
    for inst in powered_on_instances:
        try:
            image = compute_client.get_image(inst.image_id).data
            os_name = image.operating_system.lower()
            os_version = image.operating_system_version
            availability_domain = inst.availability_domain
            if "windows" in os_name:
                boot_id = get_boot_volume_id(compartment_id, inst.id, compute_client, availability_domain)
                boot_details = boot_volume_details(boot_id, block_storage_client).size_in_mbs
                windows_instances.append((inst.display_name, inst.id, os_version, availability_domain, boot_details))
        except oci.exceptions.ServiceError as e:
            print(f"Error fetching image details for instance {inst.display_name}: {e}")
    return windows_instances
def clean_os_version(os_version):
    """Extracts the main OS version (e.g., 'Server 2022' from 'Server 2022 Standard')."""
    match = re.match(r"Server (2022|2019|2016)", os_version)
    return match.group(1) if match else os_version
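Because clean_os_version is a pure helper, it is easy to sanity-check in isolation. Here is a minimal standalone sketch (the function body is repeated verbatim so the snippet runs on its own); note that unmatched strings pass through unchanged:

```python
import re

def clean_os_version(os_version):
    """Extracts the main OS version (e.g., '2022' from 'Server 2022 Standard')."""
    match = re.match(r"Server (2022|2019|2016)", os_version)
    return match.group(1) if match else os_version

print(clean_os_version("Server 2022 Standard"))    # -> 2022
print(clean_os_version("Server 2019 Datacenter"))  # -> 2019
print(clean_os_version("10 Pro"))                  # no match, passed through -> 10 Pro
```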
def get_instance_details(compute_client, instance_ocid):
    """Fetches instance metadata and tags."""
    instance = compute_client.get_instance(instance_ocid).data
    image_data = compute_client.get_image(instance.image_id).data
    freeform_tags = instance.freeform_tags

    def clean_value(value):
        return value.strip("\"") if value else None

    return {
        "Instance Name": instance.display_name,
        "Instance OCID": instance_ocid,
        "OS Name": image_data.operating_system,
        "OS Type": clean_os_version(image_data.operating_system_version),
        "Application Owner": freeform_tags.get("application-owner"),
        "Downtime Start": clean_value(freeform_tags.get("downtime-start-time")),
        "Downtime End": clean_value(freeform_tags.get("downtime-end-time")),
        "OS Owner": freeform_tags.get("os-owner"),
        "Defined Tags": instance.defined_tags
    }
def fetch_metric(metric_name, namespace, hostname, monitoring_client, compartment_id, statistic="mean"):
    """Fetches performance metrics from OCI Monitoring."""
    try:
        response = monitoring_client.summarize_metrics_data(
            compartment_id=compartment_id,
            summarize_metrics_data_details=oci.monitoring.models.SummarizeMetricsDataDetails(
                namespace=namespace,
                query=f"{metric_name}[15m]{{resourceDisplayName = \"{hostname}\"}}.{statistic}()"
            )
        )
        if response.data:
            return round(response.data[0].aggregated_datapoints[-1].value, 2)
        else:
            return None
    except Exception as e:
        print(f"Error fetching {metric_name}: {e}")
        return None
def storage_fetch_metric(metric_name, namespace, hostname, monitoring_client, compartment_id, statistic="mean"):
    """Fetches storage metrics from OCI Monitoring."""
    try:
        response = monitoring_client.summarize_metrics_data(
            compartment_id=compartment_id,
            summarize_metrics_data_details=oci.monitoring.models.SummarizeMetricsDataDetails(
                namespace=namespace,
                query=f"{metric_name}[5m]{{agentHostName = \"{hostname}\"}}.{statistic}()"
            )
        )
        if response.data:
            return float(response.data[0].aggregated_datapoints[-1].value)
        else:
            return None
    except Exception as e:
        print(f"Error fetching {metric_name}: {e}")
        return None
def list_available_patches_osmh(instance_id, osmh_client):
    # Fetch available Windows updates for the managed instance
    response = osmh_client.list_managed_instance_available_windows_updates(
        managed_instance_id=instance_id
    )
    patches = response.data
    patch_list = []
    for patch in patches.items:
        patch_list.append(patch.name)
    return patch_list
def extract_kb_numbers(package_list):
    kb_pattern = r"KB\d+"  # Regular expression to match KB numbers
    kb_numbers = []
    for package in package_list:
        match = re.search(kb_pattern, package)
        if match:
            kb_numbers.append(match.group())
    return kb_numbers
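This helper is also pure, so its regex behavior can be verified standalone. A quick sketch (function repeated so it runs on its own; the update titles are illustrative, not real patch names):

```python
import re

def extract_kb_numbers(package_list):
    kb_pattern = r"KB\d+"  # Matches KB followed by one or more digits
    kb_numbers = []
    for package in package_list:
        match = re.search(kb_pattern, package)
        if match:
            kb_numbers.append(match.group())
    return kb_numbers

updates = [
    "2025-02 Cumulative Update for Microsoft server operating system (KB5051979)",
    "Windows Malicious Software Removal Tool x64",  # no KB number -> skipped
]
print(extract_kb_numbers(updates))  # -> ['KB5051979']
```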
def get_msrc_api_url():
    today = datetime.today()
    year = today.year
    month = today.month
    # Find the first Tuesday of the month
    first_day = datetime(year, month, 1)
    first_tuesday = first_day + timedelta(days=(1 - first_day.weekday() + 7) % 7)
    # Second Tuesday of the month (Patch Tuesday)
    second_tuesday = first_tuesday + timedelta(days=7)
    # If today is before the second Tuesday, go to the previous month's Patch Tuesday
    if today < second_tuesday:
        previous_month = month - 1 if month > 1 else 12
        year = year if month > 1 else year - 1
    else:
        previous_month = month
    # Format month as "Jan", "Feb", etc.
    month_str = datetime(year, previous_month, 1).strftime("%b")
    return f"https://api.msrc.microsoft.com/cvrf/v3.0/cvrf/{year}-{month_str}"
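The Patch Tuesday arithmetic above is easiest to verify with the date injected as a parameter instead of taken from the clock. Here is a testable sketch of the same logic (`msrc_cvrf_url_for` is a hypothetical name introduced only for this illustration, not part of the function above):

```python
from datetime import datetime, timedelta

def msrc_cvrf_url_for(today):
    """Same logic as get_msrc_api_url, but with `today` injected for testing."""
    year, month = today.year, today.month
    first_day = datetime(year, month, 1)
    # weekday(): Monday=0, so Tuesday is 1
    first_tuesday = first_day + timedelta(days=(1 - first_day.weekday() + 7) % 7)
    second_tuesday = first_tuesday + timedelta(days=7)
    # Before this month's Patch Tuesday, fall back to the previous month's document
    if today < second_tuesday:
        previous_month = month - 1 if month > 1 else 12
        year = year if month > 1 else year - 1
    else:
        previous_month = month
    month_str = datetime(year, previous_month, 1).strftime("%b")
    return f"https://api.msrc.microsoft.com/cvrf/v3.0/cvrf/{year}-{month_str}"

print(msrc_cvrf_url_for(datetime(2025, 3, 5)))   # before 2nd Tuesday -> .../2025-Feb
print(msrc_cvrf_url_for(datetime(2025, 3, 15)))  # after it           -> .../2025-Mar
print(msrc_cvrf_url_for(datetime(2025, 1, 5)))   # January rolls back -> .../2024-Dec
```

Note the January edge case: before January's Patch Tuesday the year rolls back as well.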
def get_cve_from_kb(kb_numbers, os_name):
    api_url = get_msrc_api_url()
    headers = {"Accept": "application/json"}
    cve_list = set()
    try:
        response = requests.get(api_url, headers=headers)
        response.raise_for_status()
        data = response.json()
        vulnerabilities = data.get("Vulnerability", [])
        for vuln in vulnerabilities:
            for remediation in vuln.get("Remediations", []):
                remediation_value = remediation.get("Description", {}).get("Value")  # Extract KB number
                if remediation_value and any(kb in remediation_value for kb in kb_numbers):
                    # Check if OS matches in "DocumentNotes" or "ProductTree"
                    document_notes = data.get("DocumentNotes", [])
                    product_names = data.get("ProductTree", {}).get("FullProductName", [])
                    os_match = any(os_name.lower() in note.get("Value", "").lower() for note in document_notes)
                    os_match |= any(os_name.lower() in product.get("Value", "").lower() for product in product_names)
                    if os_match:
                        cve_list.add(vuln.get("CVE"))
        return list(cve_list) if cve_list else None
    except requests.exceptions.RequestException as e:
        print(f"Error fetching data: {e}")
        return None
def get_highest_cvss_score(cve_list):
    if not cve_list:
        return None
    api_url = get_msrc_api_url()
    headers = {"Accept": "application/json"}
    highest_score = 0
    try:
        response = requests.get(api_url, headers=headers)
        response.raise_for_status()
        data = response.json()
        vulnerabilities = data.get("Vulnerability", [])
        for vuln in vulnerabilities:
            if vuln.get("CVE") in cve_list:
                for score_set in vuln.get("CVSSScoreSets", []):
                    base_score = score_set.get("BaseScore")
                    if base_score is not None:
                        highest_score = max(highest_score, base_score)
        return highest_score if highest_score > 0 else None
    except requests.exceptions.RequestException as e:
        print(f"Error fetching CVSS scores: {e}")
        return None
def convert_size_to_mb(size_str):
    """
    Convert patch size from KB, MB, GB to MB.
    """
    size_str = size_str.upper()
    size_match = re.search(r"([\d.]+)\s*(KB|MB|GB)", size_str)
    if size_match:
        size_value, size_unit = float(size_match.group(1)), size_match.group(2)
        if size_unit == "KB":
            return size_value / 1024  # Convert KB to MB
        elif size_unit == "GB":
            return size_value * 1024  # Convert GB to MB
        return size_value  # MB remains the same
    return None  # Return None if no valid size found
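A few spot checks make the unit conversion concrete (function repeated verbatim so the snippet is self-contained):

```python
import re

def convert_size_to_mb(size_str):
    """Convert a size string in KB, MB, or GB to megabytes."""
    size_str = size_str.upper()
    size_match = re.search(r"([\d.]+)\s*(KB|MB|GB)", size_str)
    if size_match:
        size_value, size_unit = float(size_match.group(1)), size_match.group(2)
        if size_unit == "KB":
            return size_value / 1024
        elif size_unit == "GB":
            return size_value * 1024
        return size_value
    return None

print(convert_size_to_mb("512 KB"))   # -> 0.5
print(convert_size_to_mb("1.5 GB"))   # -> 1536.0
print(convert_size_to_mb("750 MB"))   # -> 750.0
print(convert_size_to_mb("unknown"))  # no numeric size -> None
```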
def get_patch_size(kb_numbers):
    patch_sizes = {}
    headers = {"User-Agent": "Mozilla/5.0"}
    for kb in kb_numbers:
        url = f"https://www.catalog.update.microsoft.com/Search.aspx?q={kb}"
        response = requests.get(url, headers=headers)
        if response.status_code != 200:
            continue  # Skip this KB if the request fails
        soup = BeautifulSoup(response.text, "html.parser")
        size_texts = soup.find_all(string=lambda text: text and ("MB" in text or "KB" in text or "GB" in text))
        if size_texts and len(size_texts) > 2:
            size_mb = convert_size_to_mb(size_texts[2].strip())
            if size_mb is not None:
                patch_sizes[kb] = size_mb
    return patch_sizes
def get_last_patch_status(compartment_id, instance_id, osmh_work_client):
    try:
        # Fetch work requests related to the instance
        response = osmh_work_client.list_work_requests(
            compartment_id=compartment_id,
            resource_id=instance_id
        )
        # Get the list of work requests
        work_requests = response.data.items
        if not work_requests:
            print("No patch history found. Marking as null.")
            return "Not Available", "Not Available"  # No patch history
        # Sort by latest creation time (descending order)
        work_requests.sort(key=lambda wr: wr.time_created, reverse=True)
        # Get the most recent work request status
        last_patch_date = work_requests[0].time_created
        last_patch_status = work_requests[0].status
        return last_patch_date, last_patch_status
    except oci.exceptions.ServiceError as e:
        print(f"OCI Service Error: {e}")
    except Exception as e:
        print(f"Unexpected Error: {e}")
    return "Not Available", "Not Available"  # Fallback if there's an error
def create_oci_function(functions_client, display_name, vm_name):
    try:
        application_id = "ocid1.fnapp.oc1.iad.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
        function_details = oci.functions.models.CreateFunctionDetails(
            display_name=display_name,
            application_id=application_id,
            image="iad.ocir.io/XXXXXXXXXXXXX/XXXXXXXX/patchrolloutconfig01:0.0.6",
            memory_in_mbs=256,
            timeout_in_seconds=300,
            config={"instance_name": vm_name}
        )
        response = functions_client.create_function(function_details)
        function_id = response.data.id
        print(f"Function created successfully: {function_id}")
        return function_id
    except oci.exceptions.ServiceError as e:
        print(f"OCI Service Error: {e.message}")
    except Exception as e:
        print(f"General Error: {str(e)}")
    return None
def create_schedule(resource_scheduler_client, start_time, display_name, function_id):
    try:
        # Define the required parameters
        compartment_id = "ocid1.tenancy.oc1..XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
        action = "START_RESOURCE"
        recurrence_details = "FREQ=DAILY;COUNT=1"
        recurrence_type = "ICAL"
        # Create schedule details
        schedule_details = oci.resource_scheduler.models.CreateScheduleDetails(
            compartment_id=compartment_id,
            action=action,
            recurrence_details=recurrence_details,
            recurrence_type=recurrence_type,
            display_name=display_name,
            resources=[oci.resource_scheduler.models.Resource(id=function_id)],
            time_starts=start_time
        )
        # Call OCI API to create the schedule
        create_schedule_response = resource_scheduler_client.create_schedule(
            create_schedule_details=schedule_details
        )
        print(f"Schedule '{display_name}' created successfully: {create_schedule_response.data}")
        return create_schedule_response.data
    except oci.exceptions.ServiceError as e:
        print(f"OCI Service Error: {e.message}")
    except Exception as e:
        print(f"General Error: {str(e)}")
    return None
def parse_quartz_cron(cron_expr):
    pattern = r"(\d+)\s+(\d+)\s+\?\s+\*\s+(\d+)#(\d+)"
    match = re.match(pattern, cron_expr)
    if not match:
        raise ValueError("Invalid Quartz cron expression. Expected format: '0 20 ? * 6#3'")
    minute = int(match.group(1))
    hour = int(match.group(2))
    quartz_day_of_week = int(match.group(3))  # Quartz: 0=Sunday, 6=Saturday
    nth_occurrence = int(match.group(4))  # 1st, 2nd, 3rd, 4th, 5th
    # Convert Quartz weekday (0=Sunday, 6=Saturday) to Python weekday (0=Monday, 6=Sunday)
    python_day_of_week = (quartz_day_of_week - 1) % 7
    return minute, hour, python_day_of_week, nth_occurrence
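The weekday conversion is the subtle part here, so a standalone check is worthwhile (function repeated so the snippet runs on its own). With the convention above, Quartz day 6 maps to Python weekday 5:

```python
import re

def parse_quartz_cron(cron_expr):
    """Parse 'minute hour ? * dow#nth' Quartz expressions."""
    pattern = r"(\d+)\s+(\d+)\s+\?\s+\*\s+(\d+)#(\d+)"
    match = re.match(pattern, cron_expr)
    if not match:
        raise ValueError("Invalid Quartz cron expression. Expected format: '0 20 ? * 6#3'")
    minute = int(match.group(1))
    hour = int(match.group(2))
    quartz_day_of_week = int(match.group(3))
    nth_occurrence = int(match.group(4))
    # Quartz 0=Sunday..6=Saturday -> Python 0=Monday..6=Sunday
    python_day_of_week = (quartz_day_of_week - 1) % 7
    return minute, hour, python_day_of_week, nth_occurrence

print(parse_quartz_cron("0 20 ? * 6#3"))  # -> (0, 20, 5, 3): 20:00, 3rd Saturday
print(parse_quartz_cron("30 2 ? * 1#2"))  # -> (30, 2, 0, 2): 02:30, 2nd Monday
```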
def get_nth_weekday_of_month(year, month, day_of_week, nth_occurrence):
    first_day = datetime(year, month, 1)
    # List all occurrences of the given weekday in the month
    weekdays = [
        first_day + timedelta(days=i) for i in range(31)
        if (first_day + timedelta(days=i)).weekday() == day_of_week and (first_day + timedelta(days=i)).month == month
    ]
    # If the nth occurrence does not exist, return None
    return weekdays[nth_occurrence - 1] if len(weekdays) >= nth_occurrence else None
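A concrete example (function repeated so it runs standalone): March 2025 starts on a Saturday, so its third Saturday is March 15, and there are only four Fridays, so asking for a fifth returns None:

```python
from datetime import datetime, timedelta

def get_nth_weekday_of_month(year, month, day_of_week, nth_occurrence):
    """Return the nth given weekday (Python convention, Monday=0) of a month, or None."""
    first_day = datetime(year, month, 1)
    weekdays = [
        first_day + timedelta(days=i) for i in range(31)
        if (first_day + timedelta(days=i)).weekday() == day_of_week
        and (first_day + timedelta(days=i)).month == month
    ]
    return weekdays[nth_occurrence - 1] if len(weekdays) >= nth_occurrence else None

print(get_nth_weekday_of_month(2025, 3, 5, 3))  # 3rd Saturday -> 2025-03-15
print(get_nth_weekday_of_month(2025, 3, 4, 5))  # 5th Friday does not exist -> None
```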
def get_next_downtime(cron_expr):
    minute, hour, day_of_week, nth_occurrence = parse_quartz_cron(cron_expr)
    now = pandas.Timestamp.utcnow()
    # Ensure `now` is timezone-aware
    if now.tzinfo is None:
        now = now.tz_localize("UTC")
    current_year, current_month = now.year, now.month
    # Find the next occurrence in this month
    next_downtime = get_nth_weekday_of_month(current_year, current_month, day_of_week, nth_occurrence)
    # If this month doesn't have the required nth occurrence OR it's in the past, move to next month
    if not next_downtime or pandas.Timestamp(next_downtime).tz_localize("UTC") < now:
        next_month = current_month + 1 if current_month < 12 else 1
        next_year = current_year if current_month < 12 else current_year + 1
        next_downtime = get_nth_weekday_of_month(next_year, next_month, day_of_week, nth_occurrence)
        if not next_downtime:
            raise ValueError("Invalid Nth occurrence, no such date exists in this month or next.")
    # Set the exact time and localize to UTC
    next_downtime = pandas.Timestamp(next_downtime.replace(hour=hour, minute=minute, second=0))
    if next_downtime.tzinfo is None:
        next_downtime = next_downtime.tz_localize("UTC")
    else:
        next_downtime = next_downtime.tz_convert("UTC")
    return next_downtime
def handler(ctx, data: io.BytesIO = None):
    try:
        signer = oci.auth.signers.get_resource_principals_signer()
        compartment_id = "ocid1.compartment.oc1..XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
        compute_client = oci.core.ComputeClient(config={}, signer=signer)
        monitoring_client = oci.monitoring.MonitoringClient(config={}, signer=signer)
        block_storage_client = oci.core.BlockstorageClient(config={}, signer=signer)
        osmh_client = oci.os_management_hub.ManagedInstanceClient(config={}, signer=signer)
        osmh_schedule_client = oci.os_management_hub.WorkRequestClient(config={}, signer=signer)
        osmh_work_client = oci.os_management_hub.WorkRequestClient(config={}, signer=signer)
        functions_client = oci.functions.FunctionsManagementClient(config={}, signer=signer)
        resource_scheduler_client = oci.resource_scheduler.ScheduleClient(config={}, signer=signer)
        namespace_compute = "oci_computeagent"
        namespace_storage = "oci_managementagent"
        # Fetch all running Windows instances
        windows_instances = get_running_windows_instances(compute_client, block_storage_client, compartment_id)
        # Collect per-instance details for the DataFrame
        data = []
        for instance_name, instance_ocid, os_version, availability_domain, boot_volume_size in windows_instances:
            instance_details = get_instance_details(compute_client, instance_ocid)
            patch_details = list_available_patches_osmh(instance_ocid, osmh_client)
            kb_numbers = extract_kb_numbers(patch_details)
            patch_sizes = get_patch_size(kb_numbers)
            rm_kb = [kb.replace("KB", "") for kb in kb_numbers]
            os_name = instance_details["OS Name"] + " Server " + instance_details["OS Type"]
            cve_ids = get_cve_from_kb(rm_kb, os_name)
            if cve_ids:
                highest_cvss = get_highest_cvss_score(cve_ids)
                print(f"Highest CVSS Base Score for {kb_numbers} on {os_name}: {highest_cvss}")
            else:
                highest_cvss = None
            if patch_sizes:
                # Get the KB with the maximum patch size
                max_kb = max(patch_sizes, key=patch_sizes.get)
                max_size = patch_sizes[max_kb]
            else:
                print("No valid patch sizes found.")
                max_size = None
            hostname = instance_name + ".horizonvdi.local"
            cpu_utilization = fetch_metric("CpuUtilization", namespace_compute, instance_name, monitoring_client, compartment_id)
            memory_utilization = fetch_metric("MemoryUtilization", namespace_compute, instance_name, monitoring_client, compartment_id)
            free_disk_utilization = storage_fetch_metric("diskUsageFree", namespace_storage, hostname, monitoring_client, compartment_id)
            last_patch_date, last_patch_status = get_last_patch_status(compartment_id, instance_ocid, osmh_work_client)
            data.append({
                "instance_id": instance_name,
                "os_type": instance_details["OS Name"],
                "os_version": instance_details["OS Type"],
                "patch_id": kb_numbers,
                "patch_severity": highest_cvss or 0,
                "patch_size": max_size or 0,
                "Availability Domain": availability_domain,
                "Boot Volume Size (MB)": boot_volume_size,
                "OS Owner": instance_details["OS Owner"],
                "Application Owner": instance_details["Application Owner"],
                "downtime_start": instance_details["Downtime Start"],
                "downtime_end": instance_details["Downtime End"],
                "cpu_usage": cpu_utilization,
                "memory_usage": memory_utilization,
                "Free Disk (GB)": free_disk_utilization,
                "network_latency": 5,
                "scan_date": datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f"),
                "last_patch_date": last_patch_date,
                "last_patch_status": last_patch_status
            })
        # Convert data to DataFrame
        df = pandas.DataFrame(data)
        df["disk_usage"] = round((100 - (df["Free Disk (GB)"] / df["Boot Volume Size (MB)"]) * 100), 2)
        df['downtime_start'] = df['downtime_start'].apply(get_next_downtime)
        df['downtime_end'] = df['downtime_end'].apply(get_next_downtime)
        df['downtime_start'] = pandas.to_datetime(df['downtime_start'], format='%Y-%m-%d %H:%M %Z')
        df['downtime_end'] = pandas.to_datetime(df['downtime_end'], format='%Y-%m-%d %H:%M %Z')
        # Calculate the downtime duration in minutes
        df['downtime_duration'] = df['downtime_end'] - df['downtime_start']
        df['downtime_duration'] = df['downtime_duration'].dt.total_seconds() / 60
        # Drop rows where the "patch_id" column is empty
        df = df[df["patch_id"].astype(bool)]
        df['last_patch_date'] = df['last_patch_date'].apply(lambda x: datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f") if x == 'Not Available' else x)
        # Convert all 'last_patch_date' values to datetime and strip timezone info
        df['last_patch_date'] = pandas.to_datetime(df['last_patch_date'], utc=True, errors='coerce')
        # Format the datetime values to match the desired format
        df['last_patch_date'] = df['last_patch_date'].dt.strftime('%Y-%m-%d %H:%M:%S.%f')
        df['os_version'] = df['os_version'].astype('int64')
        df['downtime_start'] = df['downtime_start'].astype(str)
        df['downtime_end'] = df['downtime_end'].astype(str)
        final_df = df[["instance_id", "os_type", "os_version", "patch_id", "patch_severity", "cpu_usage", "memory_usage", "disk_usage", "network_latency", "patch_size", "scan_date", "last_patch_date", "downtime_start", "downtime_end", "downtime_duration", "last_patch_status"]]
        final_df = final_df.reset_index(drop=True)
        print(final_df)
        data = final_df.to_dict(orient="records")
        endpoint = "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.xxxxxxxxxxxxxxxxxxxxxxxxxx/predict"
        result = requests.post(endpoint, json=data, auth=signer).json()
        print(result)
        df_result = pandas.read_json(StringIO(result))
        df = pandas.concat([final_df, df_result["Prediction"]], axis=1)
        df = df[["instance_id", "downtime_start", "Prediction"]]
        # Keep only instances the model predicts will patch successfully;
        # .copy() avoids pandas' SettingWithCopyWarning on the assignment below
        filtered_df = df[df["Prediction"] == "Success"].copy()
        now = datetime.now(timezone.utc)
        filtered_df['downtime_start'] = pandas.to_datetime(filtered_df['downtime_start'])
        within_48_hours = filtered_df[(filtered_df['downtime_start'] >= now) & (filtered_df['downtime_start'] <= now + timedelta(hours=48))]
        t1 = within_48_hours.to_dict(orient="records")
        for item in t1:
            vm_name = item.get("instance_id")
            start_time = item.get("downtime_start")
            display_name = f"Schedule-{random.randint(1000, 9999)}"
            function_id = create_oci_function(functions_client, display_name, vm_name)
            schedule_data = create_schedule(resource_scheduler_client, start_time, display_name, function_id)
            print(schedule_data)
    except Exception as handler_error:
        logging.getLogger().error(handler_error)
    return response.Response(
        ctx,
        response_data=json.dumps({"status": "Patch Scheduled"}),
        headers={"Content-Type": "application/json"}
    )
func.yaml
schema_version: 20180708
name: intelligentpatchupdate01
version: 0.0.16
runtime: python
build_image: fnproject/python:3.11-dev
run_image: fnproject/python:3.11
entrypoint: /python/bin/fdk /function/func.py handler
memory: 512
requirements.txt
fdk>=0.1.89
oci
pandas==2.2.3
beautifulsoup4==4.12.3
requests
numpy==1.26.4
Let's explore the second function, component by component.
generate_backup_name
Builds a descriptive, date-stamped name for the backup by combining a prefix with the current UTC date (YYYYMMDD). A consistent naming convention makes backups easy to identify and organize chronologically, which matters when administrators manage many instances.
get_instance_id_by_name
Resolves the human-readable instance name (passed in through the function's configuration) to the instance OCID and availability domain. This bridges the gap between display names and the machine-readable identifiers that every subsequent API call requires.
get_boot_volume_id
Fetches the ID of the boot volume attached to the instance. The boot volume holds the operating system, so resolving the correct ID ensures the backup targets the right storage and safeguards the integrity of the backup.
create_boot_volume_backup
Initiates a full backup of the boot volume and polls until it completes. The reliability of this step directly determines whether the instance can be restored after a failed upgrade, so the function blocks until the backup reaches the AVAILABLE state.
trigger_os_management_hub_patch
Invoked once the backup succeeds. It uses OS Management Hub to apply the pending operating system updates and security patches to the instance, which is critical for mitigating known vulnerabilities promptly.
reboot_instance
Restarts the instance after the updates are applied. A reboot finalizes the installation and ensures all changes, including any configuration updates, take effect before the instance returns to service.
import io
import json
import logging
import oci
import time
from oci.exceptions import ServiceError
from datetime import datetime, timezone, timedelta
from oci.signer import Signer
import os
from fdk import response
signer = oci.auth.signers.get_resource_principals_signer()
compute_client = oci.core.ComputeClient(config={}, signer=signer)
block_storage_client = oci.core.BlockstorageClient(config={}, signer=signer)
os_management_hub_client = oci.os_management_hub.ScheduledJobClient(config={}, signer=signer)
os_management_hub_client_work = oci.os_management_hub.WorkRequestClient(config={}, signer=signer)
compute_attachments_client = oci.core.ComputeManagementClient(config={}, signer=signer)
instance_name = os.getenv("instance_name")
print(instance_name)
if not instance_name:
    raise ValueError("ERROR: Missing configuration key instance_name")
def generate_backup_name(prefix="OS_Boot_Volume_Backup"):
    timestamp = datetime.utcnow().strftime("%Y%m%d")  # Format: YYYYMMDD
    return f"{prefix}_{timestamp}"
def get_instance_id_by_name(compartment_id, instance_name):
    """Fetches the instance OCID and availability domain by display name."""
    try:
        instances = compute_client.list_instances(compartment_id).data
        for instance in instances:
            if instance.display_name == instance_name:
                print(f"Found Instance: {instance_name} (ID: {instance.id})")
                return instance.id, instance.availability_domain
        raise Exception(f"Instance '{instance_name}' not found in compartment {compartment_id}")
    except ServiceError as e:
        print(f"Error fetching instance: {e}")
        raise
def get_boot_volume_id(compartment_id, instance_id, availability_domain):
    """Fetches the boot volume ID attached to the instance."""
    try:
        attachments = compute_client.list_boot_volume_attachments(
            compartment_id=compartment_id, availability_domain=availability_domain
        ).data
        for attachment in attachments:
            if attachment.instance_id == instance_id:
                print(f"Boot Volume ID: {attachment.boot_volume_id}")
                return attachment.boot_volume_id
        raise Exception(f"No boot volume found for instance {instance_id}")
    except ServiceError as e:
        print(f"Error fetching boot volume ID: {e}")
        raise
def create_boot_volume_backup(boot_volume_id, backup_name):
    """Creates a full backup of the boot volume and waits for completion."""
    try:
        create_boot_volume_backup_response = block_storage_client.create_boot_volume_backup(
            oci.core.models.CreateBootVolumeBackupDetails(
                display_name=backup_name,
                boot_volume_id=boot_volume_id,
                type="FULL"
            )
        )
        backup_id = create_boot_volume_backup_response.data.id
        print(f"Backup initiated: {backup_id}")
        # Poll until the backup reaches a terminal lifecycle state
        while True:
            backup = block_storage_client.get_boot_volume_backup(backup_id).data
            if backup.lifecycle_state == 'AVAILABLE':
                print(f"Backup completed successfully: {backup_id}")
                break
            elif backup.lifecycle_state == 'FAILED':
                raise Exception(f"Backup failed: {backup_id}")
            time.sleep(10)  # Poll every 10 seconds
        return backup_id
    except ServiceError as e:
        print(f"Error creating backup: {e}")
        raise
def trigger_os_management_hub_patch(instance_id, compartment_id):
    """Schedules a one-time OS Management Hub patch job and waits for it to finish."""
    try:
        time_next_execution = datetime.now(timezone.utc) + timedelta(minutes=1)
        patch_job_details = oci.os_management_hub.models.CreateScheduledJobDetails(
            display_name="Patch Update Job",
            compartment_id=compartment_id,
            managed_instance_ids=[instance_id],
            schedule_type="ONETIME",
            time_next_execution=time_next_execution,
            operations=[
                oci.os_management_hub.models.ScheduledJobOperation(
                    operation_type='INSTALL_ALL_WINDOWS_UPDATES')
            ]
        )
        patch_job_response = os_management_hub_client.create_scheduled_job(patch_job_details)
        job_id = patch_job_response.data.id
        print(f"Patch job scheduled: {job_id}")
        # Poll until the job's work request reaches a terminal status
        while True:
            job = os_management_hub_client.get_scheduled_job(job_id).data
            if job.work_request_ids:
                # A one-time job produces a single work request; track the first ID.
                work_request_id = job.work_request_ids[0]
                wjob_data = os_management_hub_client_work.get_work_request(
                    work_request_id=work_request_id
                ).data
                if wjob_data.status == "SUCCEEDED":
                    print("Patch job completed successfully.")
                    break
                elif wjob_data.status == "FAILED":
                    raise Exception(f"Patch job failed: {work_request_id}")
            elif job.lifecycle_state == 'FAILED':
                raise Exception(f"Patch job failed: {job_id}")
            time.sleep(10)  # Poll every 10 seconds
        return job_id
    except ServiceError as e:
        print(f"Error triggering OS Management Hub patch job: {e}")
        raise
def reboot_instance(instance_id):
    """Soft-reboots the OCI instance so installed updates take effect."""
    try:
        print(f"Rebooting instance: {instance_id}")
        compute_client.instance_action(instance_id, "SOFTRESET")
        print("Instance reboot initiated.")
    except ServiceError as e:
        print(f"Error rebooting instance: {e}")
        raise
def handler(ctx, data: io.BytesIO = None):
    try:
        logging.getLogger().info("function handler start")
        COMPARTMENT_ID = "ocid1.compartment.oc1..xxxxxxxxxxxxxxxxxxxxxxx"
        backup_name = generate_backup_name()
        instance_id, availability_domain = get_instance_id_by_name(COMPARTMENT_ID, instance_name)
        boot_volume_id = get_boot_volume_id(COMPARTMENT_ID, instance_id, availability_domain)
        create_boot_volume_backup(boot_volume_id, backup_name)
        trigger_os_management_hub_patch(instance_id, COMPARTMENT_ID)
        reboot_instance(instance_id)
    except Exception as handler_error:
        logging.getLogger().error(handler_error)
        return response.Response(
            ctx,
            response_data=json.dumps({"status": "Patch Failed", "error": str(handler_error)}),
            headers={"Content-Type": "application/json"}
        )
    return response.Response(
        ctx,
        response_data=json.dumps({"status": "Patch Successful"}),
        headers={"Content-Type": "application/json"}
    )
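Both create_boot_volume_backup and trigger_os_management_hub_patch use the same poll-until-terminal pattern. The pattern can be sketched in isolation, with a fake status source standing in for the OCI SDK calls (the function name wait_for_state and the short interval are illustrative, not part of the SDK):

```python
import time

def wait_for_state(get_state, success="AVAILABLE", failure="FAILED",
                   interval=0.01, timeout=5.0):
    """Poll get_state() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_state()
        if state == success:
            return state
        if state == failure:
            raise RuntimeError(f"operation ended in state {state}")
        time.sleep(interval)
    raise TimeoutError("operation did not reach a terminal state in time")

# Fake status source standing in for get_boot_volume_backup(...).data.lifecycle_state
states = iter(["CREATING", "CREATING", "AVAILABLE"])
print(wait_for_state(lambda: next(states)))  # AVAILABLE
```

Adding a timeout, as this sketch does, is worth considering in the real functions too: OCI Functions have a maximum execution time, so an operation that never reaches a terminal state would otherwise poll until the function is killed.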
Func.yaml
schema_version: 20180708
name: patchrolloutconfig01
version: 0.0.6
runtime: python
build_image: fnproject/python:3.11-dev
run_image: fnproject/python:3.11
entrypoint: /python/bin/fdk /function/func.py handler
memory: 256
Requirements.txt
fdk>=0.1.89
oci
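The pure helpers in func.py can be sanity-checked locally without OCI credentials or the SDK installed. For example, the display-name format produced by generate_backup_name can be verified with only the standard library (the function body below mirrors the helper in func.py):

```python
from datetime import datetime, timezone

def generate_backup_name(prefix="OS_Boot_Volume_Backup"):
    # Same logic as in func.py: prefix plus a YYYYMMDD UTC date suffix.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    return f"{prefix}_{timestamp}"

name = generate_backup_name()
suffix = name.rsplit("_", 1)[-1]
print(name)
assert name.startswith("OS_Boot_Volume_Backup_")
assert len(suffix) == 8 and suffix.isdigit()
```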
In the next part, we will schedule the first function with the OCI Resource Scheduler and validate all the steps we have performed so far.