Azure Cloud

How to monitor Azure Databricks with Grafana.

Azure Databricks and Grafana

Let's set up Databricks monitoring with Grafana

I assume you have a Databricks cluster running.

First, we'll create a new resource group called "grafana-databricks" to hold everything we build. That makes it easier to clean everything up later and to keep track of costs.
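With the Azure CLI that looks like this (eastus is only an example region, use whichever you prefer):

az group create --name grafana-databricks --location eastus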

You also need Docker and two CLIs working locally: the Azure CLI and the Databricks CLI.
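If you don't have the CLIs yet, here is a quick setup sketch (this assumes the legacy Databricks CLI from PyPI, which provides the dbfs command used later):

az login
pip install databricks-cli
databricks configure --token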

For monitoring, we use this library: spark-monitoring

Send logs to the Log Analytics workspace

Please create a new Log Analytics workspace, and remember to use the resource group we set up earlier.

az monitor log-analytics workspace create --resource-group <myGroup> \
       --workspace-name <myWorkspace>

Once it's created, go to the Agent Management menu and note the workspace ID and the primary key; we'll use them shortly.
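If you prefer the CLI, both values can also be read with az:

az monitor log-analytics workspace show --resource-group <myGroup> \
       --workspace-name <myWorkspace> --query customerId -o tsv
az monitor log-analytics workspace get-shared-keys --resource-group <myGroup> \
       --workspace-name <myWorkspace> --query primarySharedKey -o tsv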

Clone the project https://github.com/mspnp/spark-monitoring and run the Docker build command for your operating system, as described in https://github.com/mspnp/spark-monitoring#option-1-docker
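If you haven't cloned it yet:

git clone https://github.com/mspnp/spark-monitoring.git
cd spark-monitoring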

While the build runs, open the file src/spark-listeners/scripts/spark-monitoring.sh and configure the Log Analytics workspace:

export LOG_ANALYTICS_WORKSPACE_ID=
export LOG_ANALYTICS_WORKSPACE_KEY=

Also set the following values, which you can find in the Azure portal.

export AZ_SUBSCRIPTION_ID=11111111-5c17-4032-ae54-fc33d56047c2
export AZ_RSRC_GRP_NAME=myAzResourceGroup
export AZ_RSRC_PROV_NAMESPACE=Microsoft.Databricks
export AZ_RSRC_TYPE=workspaces
export AZ_RSRC_NAME=myDatabricks
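If you're unsure of your subscription ID, the Azure CLI can print the one you're currently logged into:

az account show --query id -o tsv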

Once the Docker build finishes, it creates two new JAR files in the src/target directory. We upload these, together with the script, to DBFS:

First, we upload the shell script we configured earlier, using the Databricks CLI:

dbfs cp src/spark-listeners/scripts/spark-monitoring.sh dbfs:/databricks/spark-monitoring/spark-monitoring.sh

Now we upload the JAR files:

dbfs cp --overwrite --recursive src/target/ dbfs:/databricks/spark-monitoring/
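You can verify that everything landed in DBFS with:

dbfs ls dbfs:/databricks/spark-monitoring/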

Enable the library on the cluster

Inside your Databricks workspace, go to the Compute menu and select the cluster you want to monitor.

Edit the Cluster

Go to advanced options

Go to Init Scripts

Select DBFS and enter the path: dbfs:/databricks/spark-monitoring/spark-monitoring.sh

Save and restart the Cluster

Check that it is working

Go to the Logs section of your Log Analytics workspace and run a query like this one:

let results=SparkListenerEvent_CL
| where TimeGenerated > ago(7d)
| where  Event_s == "SparkListenerJobStart"
| extend metricsns=column_ifexists("Properties_spark_metrics_namespace_s",Properties_spark_app_id_s)
| extend apptag=iif(isnotempty(metricsns),metricsns,Properties_spark_app_id_s)
| project Job_ID_d,apptag,Properties_spark_databricks_clusterUsageTags_clusterName_s,TimeGenerated
| order by TimeGenerated asc nulls last
| join kind= inner (
    SparkListenerEvent_CL
    | where Event_s == "SparkListenerJobEnd"
    | where Job_Result_Result_s == "JobSucceeded"
    | project Event_s,Job_ID_d,TimeGenerated
) on Job_ID_d;
results
| extend slice=strcat("#JobsCompleted ",Properties_spark_databricks_clusterUsageTags_clusterName_s,"-",apptag)
| summarize count() by bin(TimeGenerated, 1h),slice
| order by TimeGenerated asc nulls last

If events are flowing, you will see some rows in the result.

Now let's create the Grafana server.

First, accept the terms for the Bitnami Grafana image:

az vm image terms accept --publisher bitnami --offer grafana --plan default

Now create the VM with the grafanaDeploy.json template from the cloned repository (under perftools/deployment/grafana):

export DATA_SOURCE="https://raw.githubusercontent.com/mspnp/spark-monitoring/master/perftools/deployment/grafana/AzureDataSource.sh"
az deployment group create \
    --resource-group <resource-group-name> \
    --template-file grafanaDeploy.json \
    --parameters adminPass='<vm password>' dataSource=$DATA_SOURCE

Now you need the admin password generated during the deployment. Go to the VM you just created, open the Boot diagnostics menu, select Serial log, and search for "Setting the Bitnami App Password"; the admin password is printed there. Note it down to use later.

Create a Grafana Dash

Get the VM's public IP and open http://<IP address>:3000. Log in as admin with the password you noted before.
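One way to get the public IP is the Azure CLI (the VM name is whatever the template created; check your resource group):

az vm show -d --resource-group <resource-group-name> --name <vm-name> \
    --query publicIps -o tsv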

Go back to the repository you cloned locally, open the "perftools/dashboards/grafana" directory, put your Azure Log Analytics workspace ID in the WORKSPACE variable, and execute DashGen.sh:

export WORKSPACE=<your Azure Log Analytics workspace ID>
export LOGTYPE=SparkListenerEvent_CL

sh DashGen.sh

It'll create the SparkMonitoringDash.json file.

Go back to Grafana, click the plus icon to create a new dash, and import the JSON file.

We're done. You can now see your monitoring data in the dash.

To improve security, you can configure the VM's firewall to accept connections only from your VPN's IP address.
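For example, with an inbound rule on the VM's network security group (the NSG name here is a placeholder; use the one attached to the VM's network interface or subnet):

az network nsg rule create --resource-group <resource-group-name> \
    --nsg-name <vm-nsg-name> --name AllowGrafanaFromVPN \
    --priority 100 --direction Inbound --access Allow --protocol Tcp \
    --source-address-prefixes <your-vpn-ip>/32 --destination-port-ranges 3000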


December 20, 2022
