In this article I’m going to explain how you can fetch data from the web. It’s a very powerful tool, I must say. We can turn almost any static website into an ‘app’, and the same tricks work on dynamic websites to some extent. You need basic knowledge of HTML and at least a working understanding of LINQ queries. I’ll be using a website popularly known as “songs.pk” to fetch the album names, and I’ll also use data binding. If you’re not familiar with data binding, please go to this link.
Okay, let’s get started. Create an empty project (Windows Phone 8.1). Create a new class Album and put these lines in it, as shown below.
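A minimal sketch of what such a class might look like. It assumes two string properties: albumName, the name referred to later in the binding, and url, a hypothetical name for the album link. They are written as properties rather than plain fields because XAML binding reads public properties.

```csharp
// Data structure for one row of the album list.
public class Album
{
    // Name shown in the list; the XAML binding refers to this property.
    public string albumName { get; set; }

    // Link to the album page (hypothetical name, chosen for this sketch).
    public string url { get; set; }
}
```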
Basically, this will be the data structure for the list shown in the MainPage.
The next step is to add a reference to HtmlAgilityPack. Let me tell you up front that there is no official version of HtmlAgilityPack for Windows Phone 8.1. It was built for WP 8.0, and as the library is free of cost, the developer didn’t make a WP 8.1 version. The original version still works in WP 8.1 Silverlight projects, but Windows Phone 8.0 and 8.1 Silverlight don’t support writing to the SD card. This is why I chose to work with the 8.1 version, so that you can make the best use of web scraping, such as downloading files from the internet. So the question is: how do I add a reference to HtmlAgilityPack in an 8.1 project?
A modified version of HtmlAgilityPack has been released by another developer. I got it from Stack Overflow. You can download it from here. You will have to add it to your solution manually.
First, copy the HtmlAgilityPack.src folder into your project folder. You’ll find the HtmlAgilityPack.dll file in this directory:
HtmlAgilityPack.src → HtmlAgilityPack.Universal → bin → Debug
Now all you have to do is right-click References in your Visual Studio project and click Add Reference. In the dialog that appears, click Browse and select the HtmlAgilityPack.dll file.
Done! Now you are finally ready to code. Put these lines into the main Grid of your MainPage.xaml file.
Here you can see that Databinding is implemented in a
albumName is basically the variable name you declared in your Album class.
Now go to the MainPage.xaml.cs file and write this line to use HtmlAgilityPack.
Before you use HtmlAgilityPack, you need some basic knowledge of async and await.
When the keyword async is written on a function, the function is allowed to use await inside it. Such a function can wait for slow work without blocking the UI of the app; the keyword by itself does not move the function to another thread. It is mostly used when writing a large file to the system or downloading information from the net, so that the user doesn’t feel the app has frozen.
The await keyword is written before a call to the method that is responsible for the downloading or writing. The function pauses at that point and resumes once the work has finished.
This is a very high-level description. You can Google to learn more. By the way, these keywords work as a pair, like the phrase ‘as ….. if’.
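To make the pairing concrete, here is a small sketch that is independent of the app. The class and method names (AsyncDemo, DownloadTextAsync, GetLengthAsync) are made up for illustration, and Task.Delay stands in for a real download.

```csharp
using System.Threading.Tasks;

public static class AsyncDemo
{
    // Stand-in for a slow download; Task.Delay waits without blocking a thread.
    public static async Task<string> DownloadTextAsync()
    {
        await Task.Delay(500);      // simulated network latency
        return "downloaded text";
    }

    // An async method can await another one; while it waits,
    // the caller (for example the UI) stays responsive.
    public static async Task<int> GetLengthAsync()
    {
        string text = await DownloadTextAsync();
        return text.Length;
    }
}
```

Calling GetLengthAsync from another async method with await gives you the length once the simulated download finishes.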
Now put this function in your MainPage.xaml.cs file. You’ll see some errors; don’t worry, eventually everything will be fixed.
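As a rough idea of the shape of such a function, here is a sketch that can be tried outside the app. LoadAlbumsAsync and AlbumLoader are hypothetical names, the url property on Album is an assumption, and scrape is only a stub here that returns one hard-coded album so the example runs on its own.

```csharp
using System.Collections.ObjectModel;
using System.Threading.Tasks;

public class Album
{
    public string albumName { get; set; }
    public string url { get; set; }   // hypothetical property name
}

public class AlbumLoader
{
    // Stub standing in for the real scrape function; it does no network work.
    private Task<ObservableCollection<Album>> scrape(string pageUrl)
    {
        var albums = new ObservableCollection<Album>
        {
            new Album { albumName = "Sample Album", url = pageUrl }
        };
        return Task.FromResult(albums);
    }

    // Hypothetical caller: declares the collection and awaits scrape.
    public async Task<ObservableCollection<Album>> LoadAlbumsAsync()
    {
        ObservableCollection<Album> albums = await scrape("http://example.com/");
        return albums;
    }
}
```

In the page itself, the returned collection would typically be assigned to the ItemsSource of the list control so that the binding can display it.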
Here you can see the use of the async-await pair. We declared a variable of the ObservableCollection data type and called a method scrape with a parameter. ObservableCollection is similar to List, except that it notifies the UI whenever items are added or removed, so a bound list refreshes automatically. Google it to find out more. You can also use List here. Now paste the core function scrape, which does the actual scraping, into this file.
Lots of this is probably unfamiliar to you right now, so I’m going to break down every bit of the code. Basically, to use HtmlAgilityPack, we need an HtmlDocument, which can be used only by
The above line actually downloads the whole source code of your desired URL.
Now you have to make use of the other functions available in HtmlAgilityPack. If you go to the source of songspk.name, you’ll see that all the latest album names are kept under a class songs-list1, which has an unordered list <ul> node. So from the downloaded document we’ll select only that portion. To select a part of the document, we need to use the class
We’ll talk more about it later. Now if we look closely at the source of the <ul> node, we’ll see that there are some <li> nodes inside it. The first <li> node is of no use for our example, so we’ll remove it. Another problem is that when we fetch data from the net via HtmlAgilityPack, we get a garbage node for each of the internal nodes. We’ll have to remove those nodes as well.
After getting the desired nodes, we’ll have to filter them again, because each <li> contains several internal nodes and we only need the one that holds the URL and the name of the album. Using a foreach loop, we’ll create objects of the Album class and put them into our collection.
Another thing to look at is the query used with the Descendants function. It is a LINQ query. If you’re not familiar with LINQ, please search the web to learn some basic queries. Also note that the scrape function returns Task<…>. This return type is what lets the caller use await on the function. A Task represents work that will finish later; it does not by itself mean that a new thread is created and destroyed for the function.
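The steps above can be pulled together in a rough sketch. Several things here are assumptions rather than the original listing: the page is downloaded with HttpClient instead of HtmlAgilityPack’s own loader, the url property on Album and the ParseAlbums helper are made-up names, and the garbage nodes are taken to be the whitespace text nodes that sit between elements. Treat it as an illustration of the approach, not as the exact code for songs.pk.

```csharp
using System.Collections.ObjectModel;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public class Album
{
    public string albumName { get; set; }
    public string url { get; set; }   // hypothetical property name
}

public static class AlbumScraper
{
    // Downloads the page source, then hands it to the parser.
    public static async Task<ObservableCollection<Album>> scrape(string pageUrl)
    {
        using (var client = new HttpClient())
        {
            string html = await client.GetStringAsync(pageUrl);
            return ParseAlbums(html);
        }
    }

    // Parsing step, kept separate so it can be tried on any HTML string.
    public static ObservableCollection<Album> ParseAlbums(string html)
    {
        var albums = new ObservableCollection<Album>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // LINQ query: first node whose class attribute is "songs-list1".
        HtmlNode list = doc.DocumentNode.Descendants()
            .FirstOrDefault(n => n.GetAttributeValue("class", "") == "songs-list1");
        if (list == null) return albums;

        // Asking for "li" by name leaves out the text nodes between elements;
        // Skip(1) drops the first <li>, which is not an album.
        foreach (HtmlNode item in list.Descendants("li").Skip(1))
        {
            // Keep only the inner node that carries the link and the name.
            HtmlNode link = item.Descendants("a").FirstOrDefault();
            if (link == null) continue;

            albums.Add(new Album
            {
                albumName = link.InnerText.Trim(),
                url = link.GetAttributeValue("href", "")
            });
        }
        return albums;
    }
}
```

Splitting the download from the parsing is a design choice of this sketch: ParseAlbums can be checked against a saved copy of the page without touching the network.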
Disclaimer: Windows App Tutorials doesn’t encourage you to use unauthorized content from any website without prior permission. The demo is just to explain the usage of HtmlAgilityPack.