Fetch web data using HTML Agility Pack in Windows Phone 8

In this article I’ll explain how to fetch data from the web using HTML Agility Pack. It is a very powerful tool: it lets you turn a static website into an app, and the same tricks work on dynamic websites to some extent. You need basic knowledge of HTML and at least a working understanding of LINQ queries. I’ll use a website popularly known as “songs.pk” to fetch album names, and I’ll display them using data binding. If you’re not familiar with data binding, please read this article first:

How to bind ListBox to data in Windows phone application – Introduction

Okay, let’s get started. Create a blank project (Windows Phone 8.1), then add a new class named Album containing the lines shown below.
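The original listing is not reproduced here. As a minimal sketch, the class might look like the following, assuming one property for the album name (albumName, the name the binding uses later in this article) and one for the album link. The url property name is an assumption made for illustration. Public properties are used rather than fields because XAML bindings resolve against properties.

```csharp
// Album.cs
// Sketch of the data class. "url" is an assumed name, not taken from the original listing.
public class Album
{
    // Display name of the album; the XAML binding refers to this property by name.
    public string albumName { get; set; }

    // Address of the album page (assumed property).
    public string url { get; set; }
}
```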



This class is the data structure for each item in the list that will be shown on MainPage.

The next step is to add a reference to HtmlAgilityPack. There is no official build of HtmlAgilityPack for Windows Phone 8.1; it was built for WP 8.0. Since the library is free, its developer did not produce a WP 8.1 version. The original build still works in WP 8.1 Silverlight projects, but Windows Phone 8.0 and 8.1 Silverlight do not support writing to the SD card. That is why I chose to work with the 8.1 version, so that you can make the best use of web scraping, for example by downloading files from the internet. So the question is: how do you add a reference to HtmlAgilityPack in an 8.1 project?

A modified version of HtmlAgilityPack has been released by another developer; I found it on Stack Overflow. You can download it from here:
HtmlAgilityPack.src
You will have to add it to your solution manually.

First copy the HtmlAgilityPack.src folder into your project folder. You’ll find the HtmlAgilityPack.dll file in this directory:

HtmlAgilityPack.src → HtmlAgilityPack.Universal → bin → Debug

Now right-click References in your Visual Studio project and click Add Reference. In the dialog that appears, click Browse and select the HtmlAgilityPack.dll file.

Done! Now you are finally ready to code. Put these lines into the main Grid of your MainPage.xaml file.



Here you can see that data binding is applied to a TextBlock. albumName is the name of the member you declared in your Album class.

Now go to the MainPage.xaml.cs file and write this line to use HtmlAgilityPack.

using HtmlAgilityPack;

Before you use HtmlAgilityPack, you need to have some basic knowledge about async and await.

When the async keyword is added to a method, the method is allowed to use await and can pause at those points without blocking the UI of the app. It does not by itself move the method to another thread. It is mostly used for slow operations such as writing a large file or downloading data from the internet, so that the user doesn’t feel the app has frozen.

The await keyword is written before a call to the method that does the downloading or writing. It suspends the current method until that operation completes and hands control back to the caller in the meantime.

This is a very high-level description; you can search the web to learn more. These keywords work as a pair, much like the phrase ‘as ... if’: await can only be used inside a method marked async.
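To make the pairing concrete, here is a small sketch that is independent of HtmlAgilityPack. FetchTextAsync is a hypothetical stand-in for a slow download; it only waits and then returns a string. The names AsyncDemo, FetchTextAsync and MeasureAsync are invented for this illustration.

```csharp
using System.Threading.Tasks;

public static class AsyncDemo
{
    // Hypothetical stand-in for a slow download: waits, then returns some text.
    public static async Task<string> FetchTextAsync(int delayMs)
    {
        // The method is suspended here; no thread is blocked while the delay runs.
        await Task.Delay(delayMs);
        return "done";
    }

    // await is only legal here because this method is marked async.
    public static async Task<int> MeasureAsync()
    {
        string text = await FetchTextAsync(50);
        return text.Length;
    }
}
```

In a page, an event handler marked async could await MeasureAsync() in the same way, and the UI would stay responsive while the delay is pending.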

Now put this function in your MainPage.xaml.cs file. You’ll see some errors at first; don’t worry, they will be fixed as we add the rest of the code.




Here you can see the async and await pair in use. We declared a variable of type ObservableCollection and called a method named scrape with a parameter.

ObservableCollection behaves like a List, but it also raises a notification whenever items are added or removed, which lets bound controls update themselves. You could use a List here as well. Now paste the core function, scrape, into this file; it does the actual scraping.
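The difference from a plain List can be seen in a short sketch. It subscribes to the collection's CollectionChanged event and counts how many times an item is added. The class and method names are invented for this illustration.

```csharp
using System.Collections.ObjectModel;
using System.Collections.Specialized;

public static class CollectionDemo
{
    // Adds two items and returns how many "Add" notifications were raised.
    public static int CountAddNotifications()
    {
        var names = new ObservableCollection<string>();
        int added = 0;

        // A bound list control listens to this same event to refresh itself.
        names.CollectionChanged += (sender, e) =>
        {
            if (e.Action == NotifyCollectionChangedAction.Add) added++;
        };

        names.Add("Album A");
        names.Add("Album B");
        return added;
    }
}
```

A List<string> has no such event, so a control bound to it would not learn about items added after the binding was set.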



Much of this will be unfamiliar right now, so I’ll go through the code piece by piece. To use HtmlAgilityPack we need an HtmlDocument, which here is obtained through the HtmlWeb class.

The above line downloads the whole HTML source of the desired URL.
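The line this sentence refers to is not reproduced here. As a sketch, the download step might look like the following. It assumes that the HtmlAgilityPack build you referenced exposes a LoadFromWebAsync overload that takes a URL string; PageDownloader and DownloadAsync are names invented for this illustration.

```csharp
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class PageDownloader
{
    // Downloads the page at the given URL and parses it into an HtmlDocument.
    // Assumes the referenced build offers HtmlWeb.LoadFromWebAsync(string).
    public static async Task<HtmlDocument> DownloadAsync(string url)
    {
        var web = new HtmlWeb();
        return await web.LoadFromWebAsync(url);
    }
}
```

The returned document exposes its root through DocumentNode, which is the starting point for selecting parts of the page.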

Next we make use of the other functions available in HtmlAgilityPack. If you view the source of songspk.name, you’ll see that all the latest album names are kept under the class songs-list1, which holds an unordered list (<ul>) node. From the downloaded document we’ll select only that portion. To select part of the document, we use the HtmlNode class.

We’ll talk more about it later. If we look closely at the source of the <ul> node, we’ll see that it contains several <li> nodes. The first <li> node is of no use for our example, so we’ll remove it. Another problem is that when we fetch data via HtmlAgilityPack, we get a garbage node alongside each of the internal nodes (typically a text node holding only the whitespace between tags). We’ll have to remove those nodes as well.

After getting the desired nodes, we have to filter them again, because each <li> contains several internal nodes and we only need the one that holds the URL and the name of the album. Using a foreach loop, we create objects of the Album class and add them to our ObservableCollection.
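The selection and filtering steps described above can be sketched as follows. This is not the original scrape function. It parses an HTML string instead of a downloaded page so that it can be tried offline, and it assumes the page structure described in this article: an element with class songs-list1 holding a <ul>, whose first <li> is skipped and whose remaining items each contain an <a> link. AlbumParser, Parse and the url property are names invented for this illustration.

```csharp
using System.Collections.ObjectModel;
using System.Linq;
using HtmlAgilityPack;

// Compact copy of the data class so the sketch stands alone; "url" is an assumed name.
public class Album
{
    public string albumName { get; set; }
    public string url { get; set; }
}

public static class AlbumParser
{
    public static ObservableCollection<Album> Parse(string html)
    {
        var albums = new ObservableCollection<Album>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Find the element carrying the songs-list1 class.
        HtmlNode container = doc.DocumentNode.Descendants()
            .FirstOrDefault(n => n.GetAttributeValue("class", "") == "songs-list1");
        if (container == null) return albums;

        // Use the element itself if it is the <ul>, otherwise the first <ul> inside it.
        HtmlNode list = container.Name == "ul"
            ? container
            : container.Descendants("ul").FirstOrDefault();
        if (list == null) return albums;

        // Selecting by tag name leaves out whitespace text nodes; Skip(1) drops the first <li>.
        foreach (HtmlNode item in list.Descendants("li").Skip(1))
        {
            // Keep only the inner node that carries the link and the album name.
            HtmlNode link = item.Descendants("a").FirstOrDefault();
            if (link == null) continue;

            albums.Add(new Album
            {
                albumName = HtmlEntity.DeEntitize(link.InnerText).Trim(),
                url = link.GetAttributeValue("href", "")
            });
        }
        return albums;
    }
}
```

Once the page has been downloaded into an HtmlDocument, the same steps can run on that document directly instead of on a string.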

Another thing to look at is the parameter of the Descendants function. It is a LINQ query; if you’re not familiar with LINQ, please search the web to learn the basics. Also note that the scrape function returns Task<…>, which is what lets the caller await it. A Task represents work that will finish in the future; it does not by itself create a new thread.

Download WebScraping.zip

Disclaimer: Windows App Tutorials doesn’t encourage you to use unauthorized content from any website without prior permissions. The demo is just to explain the usage of HTMLAgilityPack.

Faysal

I'm an undergraduate student at North South University, Bangladesh, studying Computer Science and Engineering, and I work as a part-time freelancer. I like to discuss innovative ideas. To me, the most fascinating thing about developing an app or product is working as a team for hours on end and discussing how to improve the efficiency of the program. I also enjoy teaching programming over Skype, which gives me a way to express myself.

  • Patrick

    An exception of type ‘System.TypeAccessException’ occurred in HtmlAgilityPack.DLL and wasn’t handled before a managed/native boundary

    Additional information: Attempt by security transparent method ‘HtmlAgilityPack.HtmlWeb.LoadFromWebAsync(System.Uri, System.Text.Encoding, System.Net.NetworkCredential)’ to access security critical type ‘System.Net.NetworkCredential’ failed.

  • anant

    Amazing and really helpful article . Thank u . And could u post another article wherein I can automatically login to a website like facebook then scrape the data .

    • well, there’s no way to login automatically into a website unless the website owners gives the access via API.

  • Priom Biswas

    Nice article Faysal. It would be even better if you used MVVM Light; you could modify it that way.
