Explore Azure Databricks
Step 1. Set up an Azure Databricks workspace and open a notebook
Detailed steps are available at: https://saurabhsinhainblogs.blogspot.com/2024/01/azure-databricks-lab-how-to-start-with.html
- Connect to the Azure Portal
- Set up Azure Databricks
- Set up a cluster for Azure Databricks
- Open a notebook
Step 2. Prepare the data to consume
- Go to the URL: https://raw.githubusercontent.com/MicrosoftLearning/mslearn-databricks/main/data/products.csv
- Download the data as a CSV file; I named the file products.csv.
- On the File menu, select Upload data to DBFS.
- In the Upload Data dialog box, note the DBFS Target Directory to where the file will be uploaded.
- Then select the Files area and upload the products.csv file you downloaded to your computer.
- When the file has been uploaded, select Next.
- In the Access files from notebooks pane, select the sample PySpark code and copy it to the clipboard.
- You will use it to load the data from the file into a data frame. Then select Done.
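Optionally, if you prefer to skip the manual download-and-upload, you can fetch the file straight into DBFS from a notebook cell instead. This is a rough sketch, reusing the example DBFS path that appears later in this post; adjust it to your own upload directory.
Code:
import requests

# Download the CSV over HTTPS and write it into DBFS
# (the target path is an example; match it to your own DBFS upload directory)
csv_text = requests.get(
    "https://raw.githubusercontent.com/MicrosoftLearning/mslearn-databricks/main/data/products.csv"
).text
dbutils.fs.put("dbfs:/FileStore/shared_uploads/a@b.com/products.csv", csv_text, True)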
Step 3. Execute code in the notebook
- In the notebook, in the empty code cell, paste the code you copied, which should look similar to the code shown below:
- Use the ▸ Run Cell menu option at the top-right of the cell to run it, starting and attaching the cluster if prompted.
- Wait for the Spark job run by the code to complete. The code has created a data frame object named df1 from the data in the file you uploaded.
Code:
# Load the uploaded CSV from DBFS into a data frame; the header row supplies the column names
df1 = spark.read.format("csv").option("header", "true").load("dbfs:/FileStore/shared_uploads/a@b.com/products.csv")
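By default every column is loaded as a string. If you want Spark to type the numeric columns (such as ListPrice) as well, you can optionally add schema inference; a small variant of the same load, assuming the same upload path:
Code:
# Same load, but let Spark infer column types from the data (an extra pass over the file),
# so that numeric columns such as ListPrice become numbers instead of strings
df1 = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("dbfs:/FileStore/shared_uploads/a@b.com/products.csv")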
Step 4. Display the contents of the data frame
- Under the existing code cell, use the + icon to add a new code cell. Then in the new cell, enter the following code:
- Use the ▸ Run Cell menu option at the top-right of the new cell to run it. This code displays the contents of the data frame as a table of results.
Code:
# print() shows only the data frame's column names and types, not its rows
print(df1)
# collect() brings every row back to the driver as a list of Row objects
df1.collect()
# display() renders the data frame as an interactive table in the notebook
display(df1)
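If you just want a quick look at the data without pulling every row back to the driver (which is what collect() does), the standard Spark methods below work too; this is optional:
Code:
# Print the column names and their types
df1.printSchema()

# Show just the first five rows as plain text in the cell output
df1.show(5)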
Step 5. Create Data Visualization and Data Profile
- Above the table of results, select + and then select Visualization to view the visualization editor, and then apply the following options:
- Visualization type: Bar
- X Column: Category
- Y Column: Add a new column and select ProductID. Apply the Count aggregation.
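The bar chart is essentially a count of products per category, so if you prefer to compute that aggregation yourself, here is a minimal PySpark sketch using the df1 data frame from the earlier steps (the column names are taken from the chart options above):
Code:
from pyspark.sql.functions import count

# Count of products per category, mirroring the bar chart's Count(ProductID) by Category
category_counts = df1.groupBy("Category").agg(count("ProductID").alias("ProductCount"))
display(category_counts)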
Step 6. Create and query a table
- Save the data frame as a table object.
- Use SQL to return the name and price of products in the Touring Bikes category.
Code:
# Persist the data frame as a managed table named "products"
df1.write.saveAsTable("products")

# In a new cell, use the %sql magic to query the table
%sql
SELECT ProductName, ListPrice FROM products WHERE Category = 'Touring Bikes';
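If you would rather stay in Python, roughly the same query can be written with the DataFrame API; a sketch, assuming the column names used in the SQL above:
Code:
from pyspark.sql.functions import col

# Name and price of products in the Touring Bikes category, via the DataFrame API
touring_bikes = df1.filter(col("Category") == "Touring Bikes").select("ProductName", "ListPrice")
display(touring_bikes)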
Now, if you want, you can try a couple more options; one example is sketched below.
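For example, you can aggregate the products table you created in Step 6. The average-price query below is my own illustration, not part of the original lab:
Code:
# Average list price per category from the "products" table created in Step 6
# (ListPrice was loaded from CSV as a string, so cast it before averaging)
avg_prices = spark.sql("""
    SELECT Category, AVG(CAST(ListPrice AS DOUBLE)) AS AvgListPrice
    FROM products
    GROUP BY Category
    ORDER BY AvgListPrice DESC
""")
display(avg_prices)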
Step Last. Clean up resources
- In the Azure Databricks portal, on the Compute page, select your cluster and select ■ Terminate to shut it down.
- If you’ve finished exploring Azure Databricks, you can delete the resources you’ve created to avoid unnecessary Azure costs and free up capacity in your subscription.