My Lab Setup for Azure Databricks practice

Azure Databricks is the Microsoft Azure-based version of the Databricks platform, which is built on open-source Apache Spark. Similar to an Azure Synapse Analytics workspace, an Azure Databricks workspace provides a central point for managing Databricks clusters, data, and resources on Azure.

Step 1. Sign up for the Azure free trial

Link: https://azure.microsoft.com/en-in/free

Step 2. Sign in to the Azure portal

Link: https://portal.azure.com

Step 3. Create an Azure resource group

Step 4. Open Cloud Shell in Azure Portal

Use the [>_] button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a PowerShell environment and creating storage if prompted. Cloud Shell provides a command-line interface in a pane at the bottom of the Azure portal.


You will be prompted to select a subscription and to create or reuse a resource group, storage account, and file share.

Let us create the resources from Cloud Shell, though they can also be created from the portal GUI without much hassle.


Step 5. Clone the repository from GitHub

Here we can clone an existing repository to create the Databricks service, or we can create it manually in the Azure portal. The repository also contains resources, such as datasets, that will be useful later.

In the PowerShell pane, enter the following commands to clone this repo:

rm -r mslearn-databricks -f
git clone https://github.com/MicrosoftLearning/mslearn-databricks

After the repo has been cloned, enter the following command to run the setup.ps1 script, which provisions an Azure Databricks workspace in an available region:

./mslearn-databricks/setup.ps1

I tried running the setup.ps1 script from the "mslearn-databricks" folder, but had no luck: it failed with an "Insufficient resources" warning.

Step 6. Create Azure Databricks Service

This step would not have been required if setup.ps1 in the previous step had been successful.

setup.ps1 was not able to create the Azure resources

I can see that the Azure Databricks service has been created, along with the managed resource group that Azure Databricks requires.


Step 7. Launch Azure Databricks

Search for the Azure Databricks service in the resource list, open it, and click the "Launch Workspace" button.

Step 8. Create a cluster in Azure Databricks

Azure Databricks is a distributed processing platform that uses Apache Spark clusters to process data in parallel on multiple nodes. Each cluster consists of a driver node to coordinate the work, and worker nodes to perform processing tasks. For practice, we’ll create a single-node cluster to minimize the compute resources used in the lab environment (in which resources may be constrained). In a production environment, we typically create a cluster with multiple worker nodes.


In the sidebar on the left, select the (+) New task, and then select Cluster.
In the New Cluster page, create a new cluster with the following settings:

Cluster name: User Name’s cluster (the default cluster name)
Policy: Unrestricted
Cluster mode: Single Node
Access mode: Single user (with your user account selected)
Databricks runtime version: 13.3 LTS (Spark 3.4.1, Scala 2.12) or later
Use Photon Acceleration: Selected
Node type: Standard_DS3_v2
Terminate after 20 minutes of inactivity

Wait for the cluster to be created. It may take a minute or two.
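
The settings above can also be written down as a JSON cluster specification, which is handy if you need to recreate the cluster later. The following is a minimal sketch, assuming the field names of the Databricks Clusters REST API (clusters/create). The cluster name and user email are placeholders, and the exact spark_version key for 13.3 LTS should be checked against the runtime list in your own workspace. The code only builds and prints the payload; it does not call any API.

```python
import json


def single_node_cluster_spec(cluster_name, user_email):
    """Build a cluster spec mirroring the lab settings (a sketch, not an official template)."""
    return {
        "cluster_name": cluster_name,              # placeholder name
        "spark_version": "13.3.x-scala2.12",       # assumed key for 13.3 LTS
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 0,                          # single node: driver only
        "runtime_engine": "PHOTON",                # Photon acceleration selected
        "data_security_mode": "SINGLE_USER",       # access mode: single user
        "single_user_name": user_email,            # placeholder account
        "autotermination_minutes": 20,
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }


spec = single_node_cluster_spec("my-lab-cluster", "user@example.com")
print(json.dumps(spec, indent=2))
```

Keeping the specification in a file makes it easy to compare against what the New Cluster page shows.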

Step 9. Create a PySpark notebook

In the sidebar, use the (+) New link to create a Notebook.
Change the default notebook name (Untitled Notebook [date]) to a new name
In the Connect drop-down list, select your cluster if it is not already selected. If the cluster is not running, it may take a minute or so to start.
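
Once the notebook is attached to the cluster, a first cell along these lines can confirm that everything works. This is only a sketch: the sample rows and the run_first_cell helper are made up for illustration, and it relies on the spark session object that Databricks notebooks provide automatically.

```python
# Hypothetical sample data for a first test run
SAMPLE_ROWS = [("Alice", 34), ("Bob", 28), ("Carol", 45)]
COLUMNS = ["name", "age"]


def run_first_cell(spark):
    """Create a small DataFrame, filter it, and return the matching row count."""
    df = spark.createDataFrame(SAMPLE_ROWS, schema=COLUMNS)
    over_30 = df.filter("age > 30")  # SQL-style filter expression
    over_30.show()                   # prints the filtered rows in the cell output
    return over_30.count()


# In a Databricks notebook cell, `spark` already exists, so you would call:
# run_first_cell(spark)
```

If the cell prints a small table and returns a count, the cluster and notebook are wired up correctly.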


Now you can practice as much code as you want and prepare for your interviews for at least a month.

Step 10. Clean up resources

In the Azure Databricks portal, on the Compute page, select your cluster and select ■ Terminate to shut it down.
If you’ve finished exploring Azure Databricks, you can delete the resources you’ve created to avoid unnecessary Azure costs and free up capacity in your subscription.
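
If all the lab resources live in one resource group, deleting that group removes the workspace along with it. As a rough sketch, the snippet below only assembles the Azure CLI command so that you can review it before pasting it into Cloud Shell. The resource group name is a placeholder, and the snippet assumes everything created for the lab sits in that single group.

```python
import shlex


def delete_group_command(resource_group):
    """Return the Azure CLI arguments for deleting a resource group."""
    # --yes skips the confirmation prompt; --no-wait returns without waiting for completion
    return ["az", "group", "delete", "--name", resource_group, "--yes", "--no-wait"]


cmd = delete_group_command("my-databricks-lab-rg")  # placeholder group name
print(shlex.join(cmd))
```

Nothing is deleted until you run the printed command yourself.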
