Tuesday, July 31, 2012

Microsoft's all new Outlook

Microsoft today launched its all-new email service, "Outlook" (outlook.com). Microsoft has refurbished its old email service, Hotmail, and given it an absolutely stunning interface: elegant, simple, yet sophisticated. It offers an emailing experience I never expected. It also has a different strategy for managing all those annoying newsletters you subscribed to out of curiosity.

Outlook also helps you to be "connected" (as the VP of cloud services at MSFT calls it). You can see information from your social networking accounts right in your email account and actually respond to your friends' comments or photos. Looks like MSFT is making its deal with Facebook count as much as possible, not to forget the "tagging your friends with Bing search" feature, which Microsoft introduced not very long ago.







Interview with Chris Jones, VP of cloud services at Microsoft

TechCrunch indicated that Outlook comes with 7 GB of free cloud storage. Even before I logged in to check it out, I assumed MSFT was bringing in the 7 GB of free storage it already gives with SkyDrive, and I wasn't wrong. I had to look around for a while, though, to learn how to access my SkyDrive from outlook.com. Once found, it is neatly integrated with the email service. You can log in to outlook.com using your existing Windows Live username, or what Microsoft calls Microsoft Passport. It need not be a @hotmail.com or an @outlook.com account, which is a good thing, as no one needs or wants to create a new email address for every email service. You can integrate any of your email accounts with Outlook and use outlook.com forever (at least until yet another mind-blowing email service is made available).

Microsoft also promises to integrate Skype with the new Outlook, which means we will have a Gmail-style video chat experience in Outlook (without the delay though :)). The news comes the day after Gmail's announcement that it is integrating Google Hangouts with Gmail (which didn't really make me a happy man). I hope the Skype integration doesn't come at the expense of access speed to emails. Though I feel MSFT has taken a while to get here, I should say it has got here in style. We have to wait and see what its strategy will be to pull Gmail users towards Outlook.

The only thing that keeps me worried about this new Outlook is the irrelevant Bing ads in the sidebar. I got an immediate reply from an Outlook representative to one of my TechCrunch comments, saying "One of the core pillars of Outlook.com is that we don't read your email in order to show you ads. The Bing deals and offers on the right hand side of your inbox are based on the info you provide us; your gender, age, location etc. We're working hard to improve the relevancy of the Bing deals and offers you see, and these will improve over time, but we want to make sure that is not at the expense of your personal privacy. If you'd like to see more relevant Bing deals and offers make sure your account information is up to date with you age, gender, location etc." Though I should respect MSFT's privacy policy, I am wondering how the quality of ads could improve just by knowing a person's age, gender, location, etc. Maybe I am not paying enough attention to the word "etc" here. :) Let's wait and watch the game.

Wednesday, July 25, 2012

HTML Agility Pack

Problems with loading and parsing an HTML file as an XML document

Recently I had a situation where I had to parse entire live HTML pages. I implemented it by loading the HTML file as an XmlDocument and parsed it without any problem. It worked perfectly for months, until one day I received the error "XML parsing error at line 46 position 260". I tried analyzing my piece of code but couldn't find anything weird or meaningless. After digging around and beating around the bush for a while, I figured out that the HTML page had an "&" in it, and XmlDocument was not able to load the document because "&" has a whole different meaning in the XML world: it starts an entity reference, so a bare "&" is not well-formed XML. Unfortunately, none of the XML libraries I looked at would tolerate this. I tried various options like XmlResolver, XmlReaderSettings, ValidationOptions, etc. to solve the problem, but couldn't come up with what I wanted and what would have made me happy.
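To see the failure in isolation, here is a minimal sketch in Python using only the standard library. It is not the .NET code from this post; the sample markup and the helper name try_parse are made up for illustration. It shows a strict XML parser rejecting a bare "&" and accepting the same markup once the ampersands are escaped.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: a bare "&" in both an attribute and the text.
page = "<html><body><a href='/search?q=a&b=1'>Tom & Jerry</a></body></html>"


def try_parse(markup):
    """Return "ok" if markup is well-formed XML, else the parser's error text."""
    try:
        ET.fromstring(markup)
        return "ok"
    except ET.ParseError as err:
        return f"ParseError: {err}"


# A bare "&" starts an entity reference, so the strict parser gives up.
raw_result = try_parse(page)

# Escaping every "&" makes this sample well-formed. A blanket replace is only
# safe here because the sample contains no entities that are already escaped.
escaped_result = try_parse(page.replace("&", "&amp;"))

print(raw_result)
print(escaped_result)
```

Blindly escaping ampersands is not a general fix for live pages, since real HTML mixes bare ampersands with legitimate entities, which is part of why parsing the page as HTML is the more robust route.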


Problem: DTDs make this particularly serious, because a DTD usually includes general entity declarations (especially for things like &) that the XML file relies on. So if a parser chooses not to load the DTD, and the XML uses a general entity reference, the parser will fail to do the job it was intended to do. The only solution I could see is to create a transparent caching entity resolver that puts the downloaded files into an archive on the library search path, so that the archive is created dynamically and bundled automatically with any software distributed. I guess even in the Java world there isn't an impressive EntityResolver that could do the job (the Java gurus shouldn't get pissed off!).
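The dependence on entity declarations can be sketched in Python's standard library as well (again an illustration, not the .NET setup described here). The entity name nbsp and the helper parse_text are assumptions chosen for the example: the same reference fails when no declaration is visible to the parser and resolves when the declaration is supplied inline in the DOCTYPE.

```python
import xml.etree.ElementTree as ET

# The same element twice: once with no declaration for the entity,
# once with the entity declared in an internal DTD subset.
without_decl = "<p>a&nbsp;b</p>"
with_decl = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>'


def parse_text(markup):
    """Return the root element's text, or the parser's error text on failure."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError as err:
        return f"ParseError: {err}"


# No declaration available: the reference cannot be resolved.
missing = parse_text(without_decl)

# Declaration available: the reference expands to a no-break space (U+00A0).
resolved = parse_text(with_decl)

print(repr(missing))
print(repr(resolved))
```

An HTML page typically points at an external DTD instead of declaring entities inline, which is where the resolver and caching question above comes in.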

So I decided to load the HTML just as HTML and parse it. Google introduced me to the HTML Agility Pack. This library provides all the necessary methods that the XML namespace has in .NET. Loading and parsing the HTML page was very similar to parsing with XmlDocument, and I altered my parser in no time.



The parsing looks very clean and does the job exactly how I wanted it done.

The code snippet above shows how I loaded the HTML page and parsed it. As you can see, it is as simple as parsing an XML document. Using the HTML Agility Pack, you can also parse the document using LINQ. As of now I am not a huge enthusiast of the LINQ feature in .NET, so I stuck to the traditional parsing methodology.
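For readers outside .NET, the same "parse HTML as HTML" idea can be sketched with Python's standard library html.parser. This is not the HTML Agility Pack API, only a rough analog; the LinkCollector class and the sample page are hypothetical. The point is that a lenient HTML parser walks straight past the bare "&" and the unclosed tags that would stop a strict XML parser.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Hypothetical page: bare "&" in text and URL, an unclosed <p> and a <br>.
page = (
    "<html><body><p>Tom & Jerry<br>"
    "<a href='/search?q=a&b=1'>more</a></body></html>"
)

collector = LinkCollector()
collector.feed(page)
collector.close()
links = collector.links

print(links)
```

The event-style callbacks here differ from the document-tree style used with XmlDocument, so treat this only as a demonstration of lenient parsing, not as a drop-in equivalent.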