
Intelligent Vulnerability Management On OCI (Part 03)

Nikhil Verma

In the first two parts of this series, I covered the prerequisites for deploying the automation and the machine learning (ML) model: selecting the right tools, understanding the underlying infrastructure, and making sure every dependency is in place so the pipeline runs reliably.

In this third part, we will look at the OCI Functions that drive the automation. OCI Functions is Oracle Cloud Infrastructure's serverless compute service: code runs in response to events, with no servers to provision or manage. I use two functions here. The first scans every instance for vulnerabilities, a crucial step for keeping the environment secure, since identifying vulnerabilities early lets us mitigate risks proactively.

After the vulnerability assessment, the first function schedules a second function. That second function takes an instance boot volume backup (so we can recover if anything goes wrong during the upgrade), schedules an update job that applies the pending patches to the instance, and finally restarts the instance so the new updates take effect.

Because the first function schedules one of these functions per instance, the upgrades run in parallel. All instances are upgraded at the same time, which shortens the overall upgrade window, minimizes downtime, and keeps the process reliable across the fleet.



Flowchart

Let's explore the first function in detail:

To begin with, we import all the libraries the function depends on. One requirement that is easy to miss: numpy must be imported (and pinned in requirements.txt) even though the code never calls it directly; without it the function fails, because pandas needs numpy at runtime.

Now, let's delve into the various components and functions that are integral to our implementation:

get_boot_volume_id: This function is responsible for retrieving the unique identifier associated with the boot volume of a particular instance. The boot volume is critical as it contains the operating system and essential files required for the instance to run.

boot_volume_details: This function provides detailed information about the boot volume, including its size, type, and state. Understanding these details is vital for monitoring and managing the resources allocated to an instance.

get_running_windows_instances: This function identifies and lists all the currently running instances that are operating on the Windows platform. It is particularly useful for administrators who need to manage Windows environments and ensure that all instances are functioning correctly.

clean_os_version: This function cleans and standardizes the operating system version strings to ensure consistency across different instances. This is particularly important when comparing versions or applying updates, as discrepancies in version formatting can lead to errors.

get_instance_details: This function retrieves comprehensive details about a specific instance, including its configuration, status, and resource usage. Having access to these details allows for better management and optimization of cloud resources.

fetch_metric: This function is designed to gather various performance metrics related to the instance. These metrics can include CPU usage, memory consumption, and disk I/O statistics, which are crucial for performance monitoring and troubleshooting.

storage_fetch_metric: Similar to the fetch_metric function, this one specifically focuses on retrieving metrics related to free storage space.

list_available_patches_osmh: This function lists all the available patches for the operating system, which is essential for maintaining security and stability. Keeping the system updated with the latest patches helps protect against vulnerabilities and bugs.

get_msrc_api_url: This function retrieves the API URL for the Microsoft Security Response Center (MSRC), which is a key resource for obtaining information about security updates and patches. Accessing this API allows for automated retrieval of security-related data.

get_cve_from_kb: This function extracts Common Vulnerabilities and Exposures (CVE) information from the Microsoft Knowledge Base (KB). This is important for organizations that need to assess their security posture and address known vulnerabilities.

get_highest_cvss_score: This function calculates and returns the highest Common Vulnerability Scoring System (CVSS) score from a list of vulnerabilities. The CVSS score is a crucial metric for prioritizing which vulnerabilities to address first based on their severity.

convert_size_to_mb: This utility function converts sizes from bytes to megabytes, providing a more user-friendly representation of data sizes. This is particularly useful when dealing with storage metrics or data transfer sizes.

get_patch_size: This function determines the size of a specific patch that needs to be applied to an instance. Knowing the patch size helps in planning for bandwidth usage and storage requirements during the update process.

get_last_patch_status: This function checks and retrieves the status of the last patch that was applied to an instance. Understanding the patch status is critical for ensuring that the system is up-to-date and secure.

create_oci_function: This function is used to create an Oracle Cloud Infrastructure (OCI) function, which can be utilized for various serverless operations. This allows for the execution of code without the need for managing servers, thereby enhancing scalability and efficiency.

create_schedule: Finally, this function is responsible for creating a schedule for tasks such as applying patches or running maintenance scripts. Scheduling these tasks helps automate processes, ensuring that they are performed regularly and without manual intervention.


import io
import json
import logging
import oci
import requests
import re
import pandas
from datetime import datetime, timedelta, timezone
from oci.exceptions import ServiceError
from bs4 import BeautifulSoup
import random
from io import StringIO
import numpy

from fdk import response

def get_boot_volume_id(compartment_id, instance_id,compute_client,availability_domain):
    """Fetches the Boot Volume ID attached to the instance."""
    try:
        attachments = compute_client.list_boot_volume_attachments(
            compartment_id=compartment_id, availability_domain=availability_domain
        ).data

        for attachment in attachments:
            if attachment.instance_id == instance_id:
                return attachment.boot_volume_id

        raise Exception(f"No boot volume found for instance {instance_id}")
    except ServiceError as e:
        print(f"Error fetching boot volume ID: {e}")
        raise


def boot_volume_details(boot_volume_id,block_storage_client):
    """Fetches the Boot Volume details."""
    try:
        boot_volume = block_storage_client.get_boot_volume(boot_volume_id).data
        return boot_volume
    except ServiceError as e:
        print(f"Error fetching boot volume details: {e}")
        raise


def get_running_windows_instances(compute_client,block_storage_client,compartment_id):
    """Fetches all running Windows instances and their details."""
    instances = compute_client.list_instances(compartment_id).data
    powered_on_instances = [inst for inst in instances if inst.lifecycle_state == "RUNNING"]
    print(f"Total Running Instances: {len(powered_on_instances)}")

    windows_instances = []

    for inst in powered_on_instances:
        try:
            image = compute_client.get_image(inst.image_id).data
            os_name = image.operating_system.lower()
            os_version = image.operating_system_version
            availability_domain = inst.availability_domain

            if "windows" in os_name:
                boot_id = get_boot_volume_id(compartment_id, inst.id,compute_client,availability_domain)
                boot_details = boot_volume_details(boot_id,block_storage_client).size_in_mbs
                windows_instances.append((inst.display_name, inst.id, os_version, availability_domain, boot_details))

        except oci.exceptions.ServiceError as e:
            print(f"Error fetching image details for instance {inst.display_name}: {e}")

    return windows_instances


def clean_os_version(os_version):
    """Extracts the main OS version (e.g., 'Server 2022' from 'Server 2022 Standard')."""
    match = re.match(r"Server (2022|2019|2016)", os_version)
    return match.group(1) if match else os_version


def get_instance_details(compute_client, instance_ocid):
    """Fetches instance metadata and tags."""
    instance = compute_client.get_instance(instance_ocid).data
    image_data = compute_client.get_image(instance.image_id).data
    freeform_tags = instance.freeform_tags
    def clean_value(value):
        return value.strip("\"") if value else None
    return {
        "Instance Name": instance.display_name,
        "Instance OCID": instance_ocid,
        "OS Name": image_data.operating_system,
        "OS Type": clean_os_version(image_data.operating_system_version),
        "Application Owner": freeform_tags.get("application-owner"),
        "Downtime Start": clean_value(freeform_tags.get("downtime-start-time")),
        "Downtime End": clean_value(freeform_tags.get("downtime-end-time")),
        "OS Owner": freeform_tags.get("os-owner"),
        "Defined Tags": instance.defined_tags
    }


def fetch_metric(metric_name, namespace, hostname, monitoring_client,compartment_id, statistic="mean"):
    """Fetches performance metrics from OCI Monitoring."""
    try:
        response = monitoring_client.summarize_metrics_data(
            compartment_id=compartment_id,
            summarize_metrics_data_details=oci.monitoring.models.SummarizeMetricsDataDetails(
                namespace=namespace,
                query=f"{metric_name}[15m]{{resourceDisplayName = \"{hostname}\"}}.{statistic}()"
            )
        )

        if response.data:
            return round(response.data[0].aggregated_datapoints[-1].value, 2)
        else:
            return None
    except Exception as e:
        print(f"Error fetching {metric_name}: {e}")
        return None


def storage_fetch_metric(metric_name, namespace, hostname,monitoring_client,compartment_id, statistic="mean"):
    """Fetches storage metrics from OCI Monitoring."""
    try:
        response = monitoring_client.summarize_metrics_data(
            compartment_id=compartment_id,
            summarize_metrics_data_details=oci.monitoring.models.SummarizeMetricsDataDetails(
                namespace=namespace,
                query=f"{metric_name}[5m]{{agentHostName = \"{hostname}\"}}.{statistic}()"
            )
        )

        if response.data:
            return float(response.data[0].aggregated_datapoints[-1].value)
        else:
            return None
    except Exception as e:
        print(f"Error fetching {metric_name}: {e}")
        return None
    
def list_available_patches_osmh(instance_id, osmh_client):
    """Lists the available Windows updates for a managed instance."""
    # Fetch available updates for the instance
    response = osmh_client.list_managed_instance_available_windows_updates(
        managed_instance_id = instance_id
        )

    patches = response.data
    patch_list = []
    for patch in patches.items:
        patch_list.append(patch.name)
    return patch_list

def extract_kb_numbers(package_list):

    kb_pattern = r"KB\d+"  # Regular expression to match KB numbers
    kb_numbers = []
    for package in package_list:
        match = re.search(kb_pattern, package)
        if match:
            kb_numbers.append(match.group())

    return kb_numbers

def get_msrc_api_url():
    today = datetime.today()
    year = today.year
    month = today.month

    # Find the first Tuesday of the month
    first_day = datetime(year, month, 1)
    first_tuesday = first_day + timedelta(days=(1 - first_day.weekday() + 7) % 7)

    # Second Tuesday of the month
    second_tuesday = first_tuesday + timedelta(days=7)

    # If today is before the second Tuesday, go to the previous month's Patch Tuesday
    if today < second_tuesday:
        previous_month = month - 1 if month > 1 else 12
        year = year if month > 1 else year - 1
    else:
        previous_month = month

    # Format month as "Jan", "Feb", etc.
    month_str = datetime(year, previous_month, 1).strftime("%b")

    return f"https://api.msrc.microsoft.com/cvrf/v3.0/cvrf/{year}-{month_str}"

def get_cve_from_kb(kb_numbers, os_name):

    api_url = get_msrc_api_url()
    headers = {"Accept": "application/json"}

    cve_list = set()

    try:
        response = requests.get(api_url, headers=headers)
        response.raise_for_status()
        
        data = response.json()
        vulnerabilities = data.get("Vulnerability", [])

        for vuln in vulnerabilities:
            for remediation in vuln.get("Remediations", []):
                remediation_value = remediation.get("Description", {}).get("Value")  # Extract KB number
                
                if remediation_value and any(kb in remediation_value for kb in kb_numbers):
                    # Check if OS matches in "DocumentNotes" or "ProductTree"
                    document_notes = data.get("DocumentNotes", [])
                    product_names = data.get("ProductTree", {}).get("FullProductName", [])

                    os_match = any(os_name.lower() in note.get("Value", "").lower() for note in document_notes)
                    os_match |= any(os_name.lower() in product.get("Value", "").lower() for product in product_names)

                    if os_match:
                        cve_list.add(vuln.get("CVE"))

        return list(cve_list) if cve_list else None

    except requests.exceptions.RequestException as e:
        print(f"Error fetching data: {e}")
        return None


def get_highest_cvss_score(cve_list):

    if not cve_list:
        return None

    api_url = get_msrc_api_url()
    headers = {"Accept": "application/json"}

    highest_score = 0

    try:
        response = requests.get(api_url, headers=headers)
        response.raise_for_status()

        data = response.json()
        vulnerabilities = data.get("Vulnerability", [])

        for vuln in vulnerabilities:
            if vuln.get("CVE") in cve_list:
                for score_set in vuln.get("CVSSScoreSets", []):
                    base_score = score_set.get("BaseScore")
                    if base_score is not None:
                        highest_score = max(highest_score, base_score)

        return highest_score if highest_score > 0 else None

    except requests.exceptions.RequestException as e:
        print(f"Error fetching CVSS scores: {e}")
        return None
    
def convert_size_to_mb(size_str):
    """
    Convert patch size from KB, MB, GB to MB.
    """
    size_str = size_str.upper()
    size_match = re.search(r"([\d.]+)\s*(KB|MB|GB)", size_str)
    
    if size_match:
        size_value, size_unit = float(size_match.group(1)), size_match.group(2)
        
        if size_unit == "KB":
            return size_value / 1024  # Convert KB to MB
        elif size_unit == "GB":
            return size_value * 1024  # Convert GB to MB
        return size_value  # MB remains the same
    
    return None  # Return None if no valid size found

def get_patch_size(kb_numbers):

    patch_sizes = {}
    headers = {"User-Agent": "Mozilla/5.0"}
    
    for kb in kb_numbers:
        url = f"https://www.catalog.update.microsoft.com/Search.aspx?q={kb}"
        response = requests.get(url, headers=headers)
        if response.status_code != 200:
            continue  # Skip this KB if the request fails

        soup = BeautifulSoup(response.text, "html.parser")
        size_texts = soup.find_all(string=lambda text: text and ("MB" in text or "KB" in text or "GB" in text))
        
        if size_texts and len(size_texts) > 2:
            size_mb = convert_size_to_mb(size_texts[2].strip())
            if size_mb is not None:
                patch_sizes[kb] = size_mb
        
    return patch_sizes

def get_last_patch_status(compartment_id, instance_id, osmh_work_client):
    """Returns the date and status of the most recent patch work request."""
    try:
        # Fetch work requests related to the instance
        response = osmh_work_client.list_work_requests(
            compartment_id=compartment_id,
            resource_id=instance_id
        )

        # Get the list of work requests
        work_requests = response.data.items

        if not work_requests:
            print("No patch history found. Marking as null.")
            return "Not Available", "Not Available"  # No patch history

        # Sort by latest completion time (descending order)
        work_requests.sort(key=lambda wr: wr.time_created, reverse=True)

        # Get the most recent work request status
        last_patch_date = work_requests[0].time_created
        last_patch_status = work_requests[0].status

        return last_patch_date, last_patch_status

    except oci.exceptions.ServiceError as e:
        print(f"OCI Service Error: {e}")
    except Exception as e:
        print(f"Unexpected Error: {e}")

    return "Not Available", "Not Available"  # Fall back to "Not Available" if an error occurred

def create_oci_function(functions_client, display_name, vm_name):
    try:
        application_id = "ocid1.fnapp.oc1.iad.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
        
        function_details = oci.functions.models.CreateFunctionDetails(
            display_name=display_name,
            application_id=application_id,
            image="iad.ocir.io/XXXXXXXXXXXXX/XXXXXXXX/patchrolloutconfig01:0.0.6",
            memory_in_mbs=256,
            timeout_in_seconds=300,
            config={"instance_name": vm_name}
        )
        
        response = functions_client.create_function(function_details)
        function_id = response.data.id
        print(f"Function created successfully: {function_id}")
        return function_id  
    
    except oci.exceptions.ServiceError as e:
        print(f"OCI Service Error: {e.message}")
    except Exception as e:
        print(f"General Error: {str(e)}")
    
    return None

def create_schedule(resource_scheduler_client, start_time, display_name, function_id):
    try:
        # Define the required parameters
        compartment_id = "ocid1.tenancy.oc1..XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
        action = "START_RESOURCE"
        recurrence_details = "FREQ=DAILY;COUNT=1"
        recurrence_type = "ICAL"
        d1_id = function_id

        # Create schedule details
        schedule_details = oci.resource_scheduler.models.CreateScheduleDetails(
            compartment_id=compartment_id,
            action=action,
            recurrence_details=recurrence_details,
            recurrence_type=recurrence_type,
            display_name=display_name,
            resources=[oci.resource_scheduler.models.Resource(id=function_id)],
            time_starts=start_time
        )

        # Call OCI API to create the schedule
        create_schedule_response = resource_scheduler_client.create_schedule(
            create_schedule_details=schedule_details
        )
        
        print(f"Schedule '{display_name}' created successfully: {create_schedule_response.data}")
        return create_schedule_response.data

    except oci.exceptions.ServiceError as e:
        print(f"OCI Service Error: {e.message}")
    except Exception as e:
        print(f"General Error: {str(e)}")
    
    return None

def parse_quartz_cron(cron_expr):

    pattern = r"(\d+)\s+(\d+)\s+\?\s+\*\s+(\d+)#(\d+)"
    match = re.match(pattern, cron_expr)

    if not match:
        raise ValueError("Invalid Quartz cron expression. Expected format: '0 20 ? * 6#3'")

    minute = int(match.group(1))
    hour = int(match.group(2))
    quartz_day_of_week = int(match.group(3))  # Quartz: 0=Sunday, 6=Saturday
    nth_occurrence = int(match.group(4))  # 1st, 2nd, 3rd, 4th, 5th

    # Convert Quartz weekday (0=Sunday, 6=Saturday) to Python weekday (0=Monday, 6=Sunday)
    python_day_of_week = (quartz_day_of_week - 1) % 7  # Sunday (0) in Quartz → Python's Sunday (6)

    return minute, hour, python_day_of_week, nth_occurrence

def get_nth_weekday_of_month(year, month, day_of_week, nth_occurrence):

    first_day = datetime(year, month, 1)

    # List all occurrences of the given weekday in the month
    weekdays = [
        first_day + timedelta(days=i) for i in range(31)
        if (first_day + timedelta(days=i)).weekday() == day_of_week and (first_day + timedelta(days=i)).month == month
    ]

    # If nth occurrence does not exist, return None
    return weekdays[nth_occurrence - 1] if len(weekdays) >= nth_occurrence else None

def get_next_downtime(cron_expr):

    minute, hour, day_of_week, nth_occurrence = parse_quartz_cron(cron_expr)

    now = pandas.Timestamp.utcnow()
    
    # Ensure `now` is timezone-aware
    if now.tzinfo is None:
        now = now.tz_localize("UTC")

    current_year, current_month = now.year, now.month

    # Find the next occurrence in this month
    next_downtime = get_nth_weekday_of_month(current_year, current_month, day_of_week, nth_occurrence)

    # If this month doesn't have the required nth occurrence OR it's in the past, move to next month
    if not next_downtime or pandas.Timestamp(next_downtime).tz_localize("UTC") < now:
        next_month = current_month + 1 if current_month < 12 else 1
        next_year = current_year if current_month < 12 else current_year + 1
        next_downtime = get_nth_weekday_of_month(next_year, next_month, day_of_week, nth_occurrence)

    if not next_downtime:
        raise ValueError("Invalid Nth occurrence, no such date exists in this month or next.")

    # Set the exact time and localize to UTC
    next_downtime = pandas.Timestamp(next_downtime.replace(hour=hour, minute=minute, second=0))

    if next_downtime.tzinfo is None:
        next_downtime = next_downtime.tz_localize("UTC")
    else:
        next_downtime = next_downtime.tz_convert("UTC")

    return next_downtime


def handler(ctx, data: io.BytesIO = None):

    try:
        signer = oci.auth.signers.get_resource_principals_signer()
        compartment_id = "ocid1.compartment.oc1..XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
        compute_client = oci.core.ComputeClient(config={},signer=signer)
        monitoring_client = oci.monitoring.MonitoringClient(config={},signer=signer)
        block_storage_client = oci.core.BlockstorageClient(config={},signer=signer)
        osmh_client = oci.os_management_hub.ManagedInstanceClient(config={},signer=signer)
        osmh_work_client = oci.os_management_hub.WorkRequestClient(config={},signer=signer)
        functions_client = oci.functions.FunctionsManagementClient(config={}, signer=signer)
        resource_scheduler_client = oci.resource_scheduler.ScheduleClient(config={}, signer=signer)
        namespace_compute = "oci_computeagent"
        namespace_storage = "oci_managementagent"
    
        # Fetch all running Windows instances
        windows_instances = get_running_windows_instances(compute_client,block_storage_client, compartment_id)
    
    
        # Store details in Pandas DataFrame
        data = []
    
        for instance_name, instance_ocid, os_version, availability_domain, boot_volume_size in windows_instances:
            instance_details = get_instance_details(compute_client, instance_ocid)
            patch_details = list_available_patches_osmh(instance_ocid,osmh_client)
            kb_numbers = extract_kb_numbers(patch_details)
            patch_sizes = get_patch_size(kb_numbers)
            rm_kb = [kb.replace("KB", "") for kb in kb_numbers]
            os_name = instance_details["OS Name"]+" "+"Server"+" "+ instance_details["OS Type"]
            cve_ids = get_cve_from_kb(rm_kb, os_name)
            if cve_ids:
               highest_cvss = get_highest_cvss_score(cve_ids)
               print(f"Highest CVSS Base Score for {kb_numbers} on {os_name}: {highest_cvss}")
            else:
                  highest_cvss = None
            if patch_sizes:
                # Get the KB with the maximum patch size
                max_kb = max(patch_sizes, key=patch_sizes.get)
                max_size = patch_sizes[max_kb]
            else:
                print("No valid patch sizes found.")
                max_size = None
            hostname = instance_name + ".horizonvdi.local"
            cpu_utilization = fetch_metric("CpuUtilization", namespace_compute, instance_name,monitoring_client,compartment_id)
            memory_utilization = fetch_metric("MemoryUtilization", namespace_compute, instance_name,monitoring_client,compartment_id)
            free_disk_utilization = storage_fetch_metric("diskUsageFree", namespace_storage, hostname,monitoring_client,compartment_id)
            last_patch_date, last_patch_status = get_last_patch_status(compartment_id, instance_ocid,osmh_work_client)
            data.append({
                "instance_id": instance_name,
                "os_type": instance_details["OS Name"],
                "os_version": instance_details["OS Type"],
                "patch_id": kb_numbers,
                "patch_severity": highest_cvss or 0,
                "patch_size": max_size or 0,
                "Availability Domain": availability_domain,
                "Boot Volume Size (MB)": boot_volume_size,
                "OS Owner": instance_details["OS Owner"],
                "Application Owner": instance_details["Application Owner"],
                "downtime_start": instance_details["Downtime Start"],
                "downtime_end": instance_details["Downtime End"],
                "cpu_usage": cpu_utilization,
                "memory_usage": memory_utilization,
                "Free Disk (GB)": free_disk_utilization,
                "network_latency": 5,
                "scan_date": datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f"),
                "last_patch_date": last_patch_date,
                "last_patch_status": last_patch_status
            })
    
        # Convert data to DataFrame
        df = pandas.DataFrame(data)
        df["disk_usage"] = round((100 - (df["Free Disk (GB)"]/df["Boot Volume Size (MB)"])*100),2)
        df['downtime_start'] = df['downtime_start'].apply(get_next_downtime)
        df['downtime_end'] = df['downtime_end'].apply(get_next_downtime)
        df['downtime_start'] = pandas.to_datetime(df['downtime_start'], format='%Y-%m-%d %H:%M %Z')
        df['downtime_end'] = pandas.to_datetime(df['downtime_end'], format='%Y-%m-%d %H:%M %Z')
        
        # Calculate duration
        df['downtime_duration'] = df['downtime_end'] - df['downtime_start']
        
        # Convert duration to minutes or hours if needed
        df['downtime_duration'] = df['downtime_duration'].dt.total_seconds() / 60
        # Delete rows where "Patches" column is empty
        df = df[df["patch_id"].astype(bool)]
        df['last_patch_date'] = df['last_patch_date'].apply(lambda x: datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f") if x == 'Not Available' else x)
        # Convert all 'last_patch_date' values to datetime and strip timezone info
        df['last_patch_date'] = pandas.to_datetime(df['last_patch_date'],utc=True, errors='coerce')
        
        # Format the datetime values to match the desired format
        df['last_patch_date'] = df['last_patch_date'].dt.strftime('%Y-%m-%d %H:%M:%S.%f')
        df['os_version'] = df['os_version'].astype('int64')
        df['downtime_start'] = df['downtime_start'].astype(str)
        df['downtime_end'] = df['downtime_end'].astype(str)
        final_df = df[["instance_id", "os_type", "os_version", "patch_id", "patch_severity", "cpu_usage","memory_usage","disk_usage","network_latency", "patch_size", "scan_date","last_patch_date","downtime_start","downtime_end","downtime_duration","last_patch_status"]]
        final_df = final_df.reset_index(drop=True)
        print(final_df)
        data = final_df.to_dict(orient="records")
        endpoint = "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.xxxxxxxxxxxxxxxxxxxxxxxxxx/predict"
        result=requests.post(endpoint, json=data, auth=signer).json()
        print(result)
        df_result = pandas.read_json(StringIO(result))
        df = pandas.concat([final_df,df_result["Prediction"]],axis=1)
        df = df[["instance_id","downtime_start","Prediction"]]
        filtered_df = df[df["Prediction"] == "Success"].copy()  # .copy() avoids pandas SettingWithCopyWarning on the next assignment
        now = datetime.now(timezone.utc)
        filtered_df['downtime_start'] = pandas.to_datetime(filtered_df['downtime_start'])
        within_48_hours = filtered_df[(filtered_df['downtime_start'] >= now) & (filtered_df['downtime_start'] <= now + timedelta(hours=48))]
        t1 = within_48_hours.to_dict(orient="records")
        for item in t1:
            vm_name = item.get("instance_id")
            start_time = item.get("downtime_start")
            display_name = f"Schedule-{random.randint(1000, 9999)}"
            function_id = create_oci_function(functions_client, display_name, vm_name)
            schedule_data = create_schedule(resource_scheduler_client, start_time, display_name, function_id)
            print(schedule_data)
        
    except Exception as handler_error:
        logging.getLogger().error(handler_error)

    return response.Response(
        ctx, 
        response_data=json.dumps({"status": "Patch Scheduled"}),
        headers={"Content-Type": "application/json"}
        )

func.yaml

schema_version: 20180708
name: intelligentpatchupdate01
version: 0.0.16
runtime: python
build_image: fnproject/python:3.11-dev
run_image: fnproject/python:3.11
entrypoint: /python/bin/fdk /function/func.py handler
memory: 512

requirements.txt

fdk>=0.1.89
oci
pandas==2.2.3
beautifulsoup4==4.12.3
requests
numpy==1.26.4

Let's explore the second function in greater detail, examining its various components and their significance in the overall process.

generate_backup_name

This helper creates a unique, descriptive name for the backup about to be generated. In this implementation it combines a fixed prefix with a UTC date stamp in YYYYMMDD format, so each backup can be easily identified and retrieved later. The consistent naming convention also helps organize backups chronologically, which is particularly useful for administrators managing multiple instances.
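The naming scheme is just a prefix plus a UTC date stamp; a minimal sketch of the convention:

```python
from datetime import datetime, timezone

def generate_backup_name(prefix="OS_Boot_Volume_Backup"):
    # UTC date in YYYYMMDD form keeps names sortable and unambiguous across regions
    return f"{prefix}_{datetime.now(timezone.utc).strftime('%Y%m%d')}"

name = generate_backup_name()
print(name)  # e.g. OS_Boot_Volume_Backup_20250101
```

Note that this yields one name per day; if you take several backups of the same volume on the same day, you may want to extend the stamp with hours and minutes.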

get_instance_id_by_name

Next, the function get_instance_id_by_name identifies the specific instance that needs to be backed up. Given an instance's display name, it lists the instances in the compartment and returns both the matching instance's OCID and its availability domain, which is needed later to look up the boot volume attachment. This step is essential because it bridges human-readable names and the machine-readable identifiers required for the subsequent operations.

get_boot_volume_id

Following this, we have the function get_boot_volume_id, which is responsible for fetching the boot volume ID of the specified instance. The boot volume is the storage that contains the operating system and is critical for the instance's operation. By obtaining the boot volume ID, this function ensures that the subsequent backup processes are targeting the correct storage, thus safeguarding the integrity of the backup.

create_boot_volume_backup

The function create_boot_volume_backup is where the actual backup process takes place. Utilizing the boot volume ID obtained earlier, this function initiates the procedure to create a backup of the boot volume. This may involve copying the data to a secure storage location, ensuring that it is preserved in a state that can be restored later if needed. The efficiency and reliability of this function are paramount, as it directly impacts the recoverability of the instance in case of failure or data loss.
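The backup code below polls `get_boot_volume_backup` in an open-ended loop until the lifecycle state reaches `AVAILABLE` or `FAILED`. A common refinement is to bound the wait so a stuck backup cannot hang the function until its invocation timeout. A hedged sketch of such a polling helper; `fetch_state` is a stand-in for the real SDK call (e.g. a lambda returning `block_storage_client.get_boot_volume_backup(backup_id).data.lifecycle_state`):

```python
import time

def wait_for_state(fetch_state, done="AVAILABLE", failed="FAILED",
                   interval=10, timeout=1800):
    """Poll fetch_state() until it returns `done` or `failed`, or until `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = fetch_state()
        if state == done:
            return state
        if state == failed:
            raise RuntimeError(f"resource entered {failed} state")
        time.sleep(interval)
    raise TimeoutError("resource did not reach the target state in time")

# Example with a fake state sequence standing in for the OCI call
states = iter(["CREATING", "CREATING", "AVAILABLE"])
print(wait_for_state(lambda: next(states), interval=0))  # AVAILABLE
```

The same helper works unchanged for the patch-job polling later in the function, since both loops follow the identical poll-until-terminal-state pattern.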

trigger_os_management_hub_patch

Once the backup is successfully created, the function trigger_os_management_hub_patch is invoked. This function is designed to manage the operating system updates and patches for the instance. By triggering the OS management hub, it ensures that the instance is up-to-date with the latest security patches and features, which is critical for maintaining the overall health and security of the system. This step is particularly important in environments where security vulnerabilities need to be mitigated promptly.

reboot_instance

Finally, the function reboot_instance is executed to restart the instance after the updates have been applied. Rebooting is often necessary to finalize the installation of updates and ensure that all changes take effect properly. This function not only helps in refreshing the system but also in applying any configuration changes that may have occurred during the update process. The rebooting phase is a critical step in ensuring that the instance runs smoothly and efficiently after all maintenance tasks have been completed.


import io
import json
import logging
import oci
import time
from oci.exceptions import ServiceError
from datetime import datetime, timezone, timedelta
from oci.signer import Signer
import os 

from fdk import response

signer = oci.auth.signers.get_resource_principals_signer()
compute_client = oci.core.ComputeClient(config={}, signer=signer)
block_storage_client = oci.core.BlockstorageClient(config={}, signer=signer)
os_management_hub_client = oci.os_management_hub.ScheduledJobClient(config={}, signer=signer)
os_management_hub_client_work = oci.os_management_hub.WorkRequestClient(config={}, signer=signer)
compute_attachments_client = oci.core.ComputeManagementClient(config={}, signer=signer)
instance_name = os.getenv("instance_name")
print(instance_name)
if not instance_name:
    raise ValueError("ERROR: Missing configuration key instance_name")

def generate_backup_name(prefix="OS_Boot_Volume_Backup"):
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d")  # Format: YYYYMMDD (UTC); utcnow() is deprecated
    return f"{prefix}_{timestamp}"

def get_instance_id_by_name(compartment_id, instance_name):
    """Fetches the instance ID based on the instance name."""
    try:
        instances = compute_client.list_instances(compartment_id).data
        for instance in instances:
            if instance.display_name == instance_name:
                print(f"Found Instance: {instance_name} (ID: {instance.id})")
                return instance.id, instance.availability_domain
        raise Exception(f"Instance '{instance_name}' not found in compartment {compartment_id}")
    except ServiceError as e:
        print(f"Error fetching instance: {e}")
        raise

def get_boot_volume_id(compartment_id, instance_id, availability_domain):
    """Fetches the Boot Volume ID attached to the instance."""
    try:
        attachments = compute_client.list_boot_volume_attachments(
            compartment_id=compartment_id, availability_domain=availability_domain
        ).data

        for attachment in attachments:
            if attachment.instance_id == instance_id:
                print(f"Boot Volume ID: {attachment.boot_volume_id}")
                return attachment.boot_volume_id

        raise Exception(f"No boot volume found for instance {instance_id}")
    except ServiceError as e:
        print(f"Error fetching boot volume ID: {e}")
        raise

def create_boot_volume_backup(boot_volume_id,backup_name):
    """Creates a full backup of the boot volume and waits for completion."""
    print(boot_volume_id)
    try:
        # Create backup
        create_boot_volume_backup_response = block_storage_client.create_boot_volume_backup(
            oci.core.models.CreateBootVolumeBackupDetails(
                display_name=backup_name,
                boot_volume_id=boot_volume_id,
                type = "FULL"
                )
        )
        #backup_response = block_storage_client.create_volume_backup(backup_details)
        backup_id = create_boot_volume_backup_response.data.id
        print(f"Backup initiated: {backup_id}")

        # Wait for backup to complete
        while True:
            backup = block_storage_client.get_boot_volume_backup(backup_id).data
            print(f"Backup state: {backup.lifecycle_state}")
            if backup.lifecycle_state == 'AVAILABLE':
                print(f"Backup completed successfully: {backup_id}")
                break
            elif backup.lifecycle_state == 'FAILED':
                raise Exception(f"Backup failed: {backup_id}")
            time.sleep(10)  # Poll every 10 seconds

        return backup_id

    except ServiceError as e:
        print(f"Error creating backup: {e}")
        raise

def trigger_os_management_hub_patch(instance_id,compartment_id):
    """Triggers an OS Management Hub patch job for a specific managed instance group."""
    try:
        time_next_execution = datetime.now(timezone.utc) + timedelta(minutes=1)
        patch_job_details = oci.os_management_hub.models.CreateScheduledJobDetails(
            display_name="Patch Update Job",
            compartment_id = compartment_id,
            managed_instance_ids = [instance_id],
            schedule_type="ONETIME",
            time_next_execution=time_next_execution,
            operations = [
                oci.os_management_hub.models.ScheduledJobOperation(
                    operation_type= 'INSTALL_ALL_WINDOWS_UPDATES')
            ]

        )

        patch_job_response = os_management_hub_client.create_scheduled_job(patch_job_details)
        job_id = patch_job_response.data.id
        print(f"Patch job scheduled: {job_id}")

        # Wait for patch job completion
        while True:
            job = os_management_hub_client.get_scheduled_job(job_id).data
            print(job)
            #print("Patch installation in progress")
            #print(job.lifecycle_state)
            if len(job.work_request_ids) > 0:
                # work_request_ids is a list; track the first work request spawned by the job
                work_request_id = job.work_request_ids[0]
                wjob = os_management_hub_client_work.get_work_request(
                    work_request_id=work_request_id
                )
                wjob_data = wjob.data
                if wjob_data.status == "SUCCEEDED":
                    print("Patch job completed successfully.")
                    break
                elif wjob_data.status == "FAILED":
                    raise Exception(f"Patch job failed: {wjob}")
            elif job.lifecycle_state == 'FAILED':
                raise Exception(f"Patch job failed: {job_id}")
            time.sleep(10)  # Poll every 10 seconds

        return job_id

    except ServiceError as e:
        print(f"Error triggering OS Management Hub patch job: {e}")
        raise

def reboot_instance(instance_id):
    """Reboots the OCI instance."""
    try:
        print(f"Rebooting instance: {instance_id}")
        compute_client.instance_action(instance_id, "SOFTRESET")
        print("Instance reboot initiated.")
    except ServiceError as e:
        print(f"Error rebooting instance: {e}")
        raise

def handler(ctx, data: io.BytesIO = None):
    try:
        logging.getLogger().info("function handler start")
        COMPARTMENT_ID = "ocid1.compartment.oc1..xxxxxxxxxxxxxxxxxxxxxxx"
        INSTANCE_NAME = instance_name
        print(INSTANCE_NAME)
        BACKUP_NAME = generate_backup_name()
        instance_id, availability_domain = get_instance_id_by_name(COMPARTMENT_ID, INSTANCE_NAME)
        boot_volume_id = get_boot_volume_id(COMPARTMENT_ID, instance_id, availability_domain)
        backup_id = create_boot_volume_backup(boot_volume_id, BACKUP_NAME)
        patch_job_id = trigger_os_management_hub_patch(instance_id, COMPARTMENT_ID)
        # Reboot so the newly installed updates take full effect
        reboot_instance(instance_id)

    except Exception as handler_error:
        logging.getLogger().error(handler_error)

    return response.Response(
        ctx, 
        response_data=json.dumps({"status": "Patch Successful"}),
        headers={"Content-Type": "application/json"}
    )

Func.yaml

schema_version: 20180708
name: patchrolloutconfig01
version: 0.0.6
runtime: python
build_image: fnproject/python:3.11-dev
run_image: fnproject/python:3.11
entrypoint: /python/bin/fdk /function/func.py handler
memory: 256

Requirements.txt

fdk>=0.1.89
oci

In the next part, we will schedule the first function with the OCI Resource Scheduler and validate all the steps we have performed so far.
