Connect to Azure Data Lake Store using MuleSoft


Azure Data Lake Store is an enterprise-wide hyper-scale repository for big data analytic workloads. Azure Data Lake enables you to capture data of any size, type, and ingestion speed in one single place for operational and exploratory analytics.

This guide shows how to connect to Azure Data Lake Store from MuleSoft using the WebHDFS REST APIs that the service exposes. Azure Data Lake Store is essentially a version of HDFS. While MuleSoft has a connector for HDFS, Azure requires authentication through Active Directory using OAuth, so the project here calls the REST APIs directly. The guide shows you how to set up an Azure Data Lake Store, set up an Azure Active Directory app, and connect to Azure using a pre-configured project in MuleSoft Anypoint Studio.


Prerequisites

  • Microsoft Azure account
  • MuleSoft Anypoint Studio

Create an Azure Data Lake Store

If you don’t already have an Azure Data Lake Store, follow these steps to set one up. You need an Azure account: you can sign up for a 30-day account or use a Pay-As-You-Go account, which is what I used.

  1. Log in to the Azure portal and click on All services in the left-hand navigation bar.
  2. In the search field, type in data lake and click on Data Lake Store.
  3. In Data Lake Store, click on Create Data Lake Store in the center or click on Add in the top left.
  4. In the New Data Lake Store window, enter the following information:
    • Name: mulesoft
    • Subscription: This depends on your account. If you recently signed up, you should have a 30-day subscription. I selected Pay-As-You-Go.
    • Resource group: Either create a new resource group or use an existing one (e.g. mulesoft).
    • Location: leave the default East US 2
    • Encryption settings: I set this to Do not enable encryption.
  5. Click on Create
  6. Once the Data Lake Store is created, click on Data explorer in the left-hand navigation menu. We want to grant access to all users for this demo, so click on the Access button.
  7. In the Access screen, check the Read, Write, and Execute checkboxes under Everyone else and then click on Save.

Create an Azure Active Directory “Web” Application

    1. In the left-hand navigation menu, click on Azure Active Directory. If the menu item isn’t there, click on All services and search for it.
    2. In the navigation menu for Azure Active Directory, click on App registrations
    3. Click on New application registration
    4. In the Create window, enter the following data:
      • Name: mule
      • Application type: Keep the default Web app / API
      • Sign-on URL: Just enter http://localhost:8081. This can be changed later and doesn’t affect anything in this demo.
      • Click on Create
        Once the app has been created, copy down the Application ID: e.g. bdcabff5-af3c-4127-b69b-38bcf1792bfd
    5. Next, click on Settings and then click on Required permissions
    6. In the Required permissions screen, click on Add and then click on Select an API. Then select Azure Data Lake from the list of available APIs and then click on Select
    7. In the Enable Access screen, check the Have full access to the Azure Data Lake service checkbox and then click on Select
    8. Next, we need to generate a key. Click on Keys in the left-hand navigation bar for the app settings.
    9. In the Keys window, enter the following:
      • Description: mule-app-key
      • Expires: Set this to Never expires
      • Click on Save. A value will appear for the key. Copy that value down, e.g. +zAbZQgXomvqsfHgCH32Yv+VCvkT3ZcxRyw5CWaw4dw=
    10. Next, let’s get your tenant ID. In Azure Active Directory, click on Properties in the left-hand navigation bar.
    11. Copy down the Directory ID value. This is the tenant ID you’ll need to make the OAuth call, e.g. 57744783-79ff-49ab-b27e-26245d4d97ef
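
The three values collected above (Application ID, key, and Directory ID) are what an OAuth client-credentials token request needs. As a rough illustration of what the Mule project does with them, here is a minimal Python sketch. The host `login.microsoftonline.com`, the resource value `https://datalake.azure.net/`, and the function names are assumptions for illustration and are not taken from the Mule project itself.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the standard Azure AD token endpoint host.
AUTHORITY = "https://login.microsoftonline.com"
# Assumption: the resource identifier for Azure Data Lake Store.
ADLS_RESOURCE = "https://datalake.azure.net/"


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) a client-credentials token request."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "resource": ADLS_RESOURCE,
        "client_id": client_id,          # Application ID
        "client_secret": client_secret,  # key value from the Keys screen
    }).encode("ascii")
    # Same path shape as the oauth.path property: <tenant ID>/oauth2/token
    url = f"{AUTHORITY}/{tenant_id}/oauth2/token"
    return urllib.request.Request(url, data=form, method="POST")


def fetch_access_token(tenant_id, client_id, client_secret):
    """Send the request and return the access token from the JSON reply."""
    request = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

Note that generated keys often contain characters such as `+` and `=`. `urlencode` escapes them, which matters if you ever build the form body by hand.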


Download and Run the Mule Project

  1. Download the project from GitHub
  2. Import the project into Anypoint Studio
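  3. Open the project’s properties file and modify the following properties:
    • The property that holds the store name: <name of your Data Lake Store>
    • oauth.path: <tenant ID>/oauth2/token, where the tenant ID is the Directory ID from step 11 of the Azure Active Directory section
    • adls.client_id: This is the Application ID from step 4 of the Azure Active Directory section
    • adls.client_secret: This is the key from step 9 of the Azure Active Directory section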
  4. The first flow lists the files and folders in the root directory of the Data Lake Store. Once the flow receives the request, it builds the parameters for the OAuth request. If the OAuth request is successful, it returns an access token that is used to make the HDFS request to list the folders.
  5. The second flow shows how to upload data to the Data Lake Store. Similar to the first flow, it makes an OAuth request and passes the access token to make the HDFS request.
  6. Run the project and open up Postman.
  7. Let’s test the first flow. Paste the following into the request URL: http://localhost:8081/liststatus and click Send. If you add some folders and files, you’ll receive more data from the API call.
  8. Open another tab and paste the following URL: http://localhost:8081/create?name=list2.txt
  9. Change the method to PUT. Under the Body section, select the binary radio button and select the file name list.txt from the src/main/resources folder from the project. Click on Send.
  10. If successful, when you switch back to the Data Lake Store, you should see the file at the root directory. Click to open the file.
  11. The file should contain the data you uploaded from list.txt if everything was configured successfully.
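
The two flows described in steps 4 and 5 both end in a WebHDFS call that carries the access token. The following Python sketch shows roughly what those calls might look like if made directly against the store. The host pattern `<store>.azuredatalakestore.net`, the `write=true` parameter, and the function names are assumptions for illustration. The Mule project may build its requests differently.

```python
import urllib.parse
import urllib.request


def adls_url(store_name, path, op, **params):
    """Build a WebHDFS URL for the given store, path, and operation."""
    query = urllib.parse.urlencode({"op": op, **params})
    quoted = urllib.parse.quote(path.lstrip("/"))
    # Assumption: Data Lake Store hosts follow this naming pattern.
    return f"https://{store_name}.azuredatalakestore.net/webhdfs/v1/{quoted}?{query}"


def list_status_request(store_name, token, path="/"):
    """GET request that lists files and folders (first flow)."""
    return urllib.request.Request(
        adls_url(store_name, path, "LISTSTATUS"),
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def create_request(store_name, token, path, data):
    """PUT request that uploads bytes as a new file (second flow)."""
    # Assumption: write=true asks the service to accept the data
    # directly rather than answering with a redirect.
    url = adls_url(store_name, path, "CREATE", overwrite="true", write="true")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Authorization": f"Bearer {token}"},
        method="PUT",
    )
```

Passing either request to `urllib.request.urlopen` would send it. The functions only build the requests, so you can inspect the URL and headers before anything goes over the network.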

My Products I Can’t Live Without (2009)

In response to Michael Arrington’s list on TechCrunch, here’s my list of products that I can’t live without. This list is in no particular order:

  • Google Reader – I’ve been a fan since the start and their ongoing improvements never disappoint me. Along with Google Chrome, Google Reader can’t be beat.
  • Twitter – At first I was a little apprehensive about this tool but I’ve recently become hooked. TwitterBerry adds to the addiction by making it possible to follow my fellow Twitters and tweet when I’m on the road.
  • Facebook – At one point I was on at least a dozen social networking sites (Friendster, Bebo, MySpace, etc…) but in the end Facebook won, primarily because it’s what all my friends use and it has the most user-friendly interface (to some).
  • FriendFeed – I don’t think this tool has caught on in the mainstream yet, partly because it tends to be information overload. But once you figure out how to manage the people that you follow and set up your lists correctly, it’s great.
  • – Four shows make this a product that I can’t live without: The Simpsons, The Daily Show, The Colbert Report, and Family Guy
  • Mint – Mint is what I consider my FriendFeed to my finances. Since they added the ability to track investments this year, Mint has replaced Quicken as my tool of choice to analyze my financial data.
  • – I’ve used this site since its inception and it’s never failed me once. It makes organizing all my trips a piece of cake.
  • WordPress – The latest release of 2.8 makes WordPress the blogging tool of choice.
  • Blackberry Pearl – Though I’d love to upgrade to the latest Storm or Bold, the form factor of the 8100 can’t be beat. It’s nice not to have that cell-phone bulge in your coat or pants pocket.
  • Google Chrome / FireFox – It’s a toss-up between these two right now because I’ve grown so accustomed to the FireFox extensions that make my web-browsing experience so easy and comfortable. But Chrome is fast, and with so many AJAX-intensive sites out there today, once they add plugins to the mix, I’ll probably make the switch.
  • Gmail – I agree that Yahoo! Mail has the best UI experience but Gmail provides so much more functionality (Search, POP/IMAP, Chat, Tasks, etc…) making it the product of choice.
  • LinkedIn – The career networking site of choice, there’s really no other site out there that can match the number of users that are on LinkedIn.