Azure Databricks Lab: How to Get Started


 My Lab Setup for Azure Databricks practice


Azure Databricks is a Microsoft Azure-based version of the popular Databricks platform, which is built on open-source Apache Spark. Like an Azure Synapse Analytics workspace, an Azure Databricks workspace provides a central point for managing Databricks clusters, data, and resources on Azure.


Step 1. Sign up for an Azure free trial

Link: https://azure.microsoft.com/en-in/free



Step 2. Sign in to the Azure portal


https://portal.azure.com



Step 3. Create Azure Resource Group

Create a resource group in the Azure portal. Once it has been created, it appears in the resource group list.


Step 4. Open Cloud Shell in Azure Portal


Use the [>_] button to the right of the search bar at the top of the page to create a new Cloud Shell in the Azure portal, selecting a PowerShell environment and creating storage if prompted. Cloud Shell provides a command-line interface in a pane at the bottom of the Azure portal.



Cloud Shell gives us the option to select a subscription and to create or reuse a resource group, storage account, and file share.
Let us create the lab resources using Cloud Shell, though we could also create them from the portal GUI without much hassle.




Step 5. Clone the repository from GitHub


Here we can clone an existing repository whose setup script creates the Databricks service, or we can create the service manually in the Azure portal. The repository also contains resources, such as datasets, that will be used later.

In the PowerShell pane, enter the following commands to clone this repo:
  •  rm -r mslearn-databricks -f
  •  git clone https://github.com/MicrosoftLearning/mslearn-databricks



After the repo has been cloned, enter the following command to run the setup.ps1 script, which provisions an Azure Databricks workspace in an available region:
  •  ./mslearn-databricks/setup.ps1


I tried executing the setup.ps1 file in the "mslearn-databricks" folder, but had no luck: the script stopped with an "Insufficient resources" warning.



Step 6. Create Azure Databricks Service

This step would not have been required if setup.ps1 in the previous step had succeeded. Because setup.ps1 was not able to create the Azure resources, the Azure Databricks service has to be created manually.


I can see that the Azure Databricks service has been created, along with the managed resource group that Azure Databricks requires.



Step 7. Launch Azure Databricks


Search for the Azure Databricks service in the resource list, open it, and click the "Launch Workspace" button.

Alternatively, you can click the workspace URL.



Step 8. Create a cluster in Azure Databricks


Azure Databricks is a distributed processing platform that uses Apache Spark clusters to process data in parallel on multiple nodes. Each cluster consists of a driver node to coordinate the work, and worker nodes to perform processing tasks. For practice, we’ll create a single-node cluster to minimize the compute resources used in the lab environment (in which resources may be constrained). In a production environment, we typically create a cluster with multiple worker nodes.




  • In the sidebar on the left, select the (+) New task, and then select Cluster.
  • In the New Cluster page, create a new cluster with the following settings:
    • Cluster name: User Name’s cluster (the default cluster name)
    • Policy: Unrestricted
    • Cluster mode: Single Node
    • Access mode: Single user (with your user account selected)
    • Databricks runtime version: 13.3 LTS (Spark 3.4.1, Scala 2.12) or later
    • Use Photon Acceleration: Selected
    • Node type: Standard_DS3_v2
    • Terminate after 20 minutes of inactivity
  • Wait for the cluster to be created. It may take a minute or two.
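
For reference, the same settings can be written down as a cluster specification, which is handy if you later want to recreate the cluster without clicking through the UI. The following is only a minimal sketch: the helper name build_single_node_cluster_spec and the example email address are hypothetical, and the field names and the "13.3.x-scala2.12" version key are my assumption of how the Databricks Clusters API represents these UI settings, so check them against the API documentation before sending the payload anywhere. The "Unrestricted" policy is represented here simply by leaving out any policy ID.

```python
import json


def build_single_node_cluster_spec(user_name: str, user_email: str) -> dict:
    """Hypothetical helper: mirror the UI settings from this step as a dict.

    Field names are assumed to follow the Databricks Clusters API; verify
    them against the official reference before use.
    """
    return {
        # Default cluster name shown in the UI
        "cluster_name": f"{user_name}'s cluster",
        # Assumed version key for "13.3 LTS (Spark 3.4.1, Scala 2.12)"
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        # Single Node mode: the driver does all the work, so no workers
        "num_workers": 0,
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
        # "Use Photon Acceleration: Selected"
        "runtime_engine": "PHOTON",
        # "Access mode: Single user" with your own account selected
        "data_security_mode": "SINGLE_USER",
        "single_user_name": user_email,
        # "Terminate after 20 minutes of inactivity"
        "autotermination_minutes": 20,
    }


# Example values only; replace them with your own name and sign-in address.
spec = build_single_node_cluster_spec("User Name", "user@example.com")
print(json.dumps(spec, indent=2))
```

Keeping the specification in a plain dictionary makes it easy to compare with what the New Cluster page shows before you rely on it.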




Step 9. Create a PySpark notebook


  • In the sidebar, use the (+) New link to create a Notebook.
  • Change the default notebook name (Untitled Notebook [date]) to a new name
  • In the Connect drop-down list, select your cluster if it is not already selected. If the cluster is not running, it may take a minute or so to start.







Now you can practice all the code you want and prepare for your interview for at least a month.



Last step. Clean up resources


  • In the Azure Databricks portal, on the Compute page, select your cluster and select ■ Terminate to shut it down.
  • If you’ve finished exploring Azure Databricks, you can delete the resources you’ve created to avoid unnecessary Azure costs and free up capacity in your subscription.
