FF Plugin: News articles on Twitter hashtags

I wanted to try my hand at Firefox plugin programming. And, guess what, I did! 🙂 This post covers why I did it, what I built & how. My technical blogs typically contain a lot of information, just the way I prefer to read/learn from other blogs. So you can expect details… details… details… and more words! 🙂

Well, it all started with this. While tweeting, I noticed a hashtag about a recent international event and realized I did not know anything about it. To find out, I had to switch tabs and open Google (well, I don’t bing! 😉 ). This got me thinking. Why? Isn’t Twitter actually a stream of news? Yeah, it could be anything from news about my ex to news about my friend’s neighbor’s boring cousin to news about an exciting product launch. But it is all news, isn’t it? So why am I forced to leave Twitter to read the news?

Thus was born my plugin development task.

My requirement is simple: “News articles about hashtags.”

Of course, this requires news sites that provide APIs I can query. My obvious choice was Google. Then I tried Yahoo and Bing. Well, they either weren’t free, weren’t useful, or were too complex for my simple requirement. Anyway, my hunt for a free search API deserves an entirely different blog article. Finally, I found ‘Faroo’. Thankfully, it offers news search as well as web search. With that in hand, I ventured into the Firefox plugin world.

Let me break the requirement down into smaller logical units. The algorithm/pseudo-code is going to be simple:

  1. Wait for the page to be loaded.
  2. Extract all the hashtags.
  3. Query Faroo for news articles.
  4. Pop up a tooltip on mouseover.

Simple, yeah?! 😉
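The four steps above can be read as one small pipeline. Here is a minimal JavaScript sketch of that flow; `runPipeline` and the three step functions it receives (`extractHashtags`, `queryNews`, `attachTooltip`) are hypothetical names for illustration, not the plugin's actual functions. Step 1 is left to whoever calls it from a load handler.

```javascript
// Hypothetical skeleton of the four-step algorithm. The step functions are
// passed in, so the control flow can be read (and exercised) on its own.
function runPipeline(doc, steps) {
  // Step 1 (waiting for the page) is the caller's job: call this from a load handler.
  var hashtags = steps.extractHashtags(doc);           // Step 2: extract hashtags
  hashtags.forEach(function (tag) {
    steps.queryNews(tag, function (articles) {         // Step 3: query the news search
      steps.attachTooltip(tag, articles);              // Step 4: show results on mouseover
    });
  });
  return hashtags;
}
```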

Enter inside

As far as I understand (which, by the way, is limited), any plugin is event-based. In other words, a plugin kicks in & gets control because an event happened and the plugin had registered for that event. So, the first thing to do is to register for an event. But which event? Well, the events that I considered were:

  • DOMContentLoaded event: fired when the document has been completely loaded and parsed, without waiting for stylesheets, images, and subframes to finish loading.
  • Load event: fired when the whole page has loaded, including stylesheets, images, and subframes, so it can be used to detect a fully loaded page.

Before I step into which events I listened to & how, here are a few technical terms, paraphrased in my own words:

  • ‘window’: the browser window. In the overlay script, this is the Firefox window that holds the tabs; each web page inside a tab also has its own window object. A window object has properties like length, innerWidth, innerHeight, name, closed (whether it has been closed), its parent, and more.
  • ‘browser’: where the web pages are loaded. (The way I look at it, Firefox is an application for browsing the Internet, whereas this ‘browser’ object is the one that does the actual rendering of the HTML pages.)
  • ‘document’: the HTML, ASPX, PHP, or other page that is loaded into the browser. The document is loaded inside a window object and has properties like title, URL, cookie, etc.

Okies. I am assuming so far it has been _not_ difficult :). Now the very first step to do is to wait for the ‘window’ to be loaded.

window.addEventListener("load", FS.Core.init, false);

This calls our function ‘FS.Core.init’ (i.e., the init function of the Core object inside the FS namespace) once the window object (as defined above) has finished loading.

The next step is to get hold of the browser’s document. Once we have it, we can manipulate the document (i.e., the HTML page). We do that by listening for an event (again!) inside FS.Core.init(). All I need is the DOM content, not peripherals like CSS or images. Hence, here I listen to the DOMContentLoaded event:

var appcontent = document.getElementById("appcontent");
appcontent.addEventListener("DOMContentLoaded", FS.Core.onPageLoad, true);

As described above, onPageLoad is called whenever the DOM content of a page is loaded and ready for manipulation. 🙂 With this, we are fully inside the plugin with all the data we need.
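Putting the two listeners together, the overlay script could be organised roughly like this. It is a sketch, assuming an overlay on Firefox's main browser window (where the `appcontent` element lives); the body of `onPageLoad` is a placeholder (`lastDocument` is a hypothetical property), and the `typeof window` guard only exists so the snippet can also be loaded outside Firefox. A method passed to `addEventListener` is not called with `this` bound to `FS.Core`, so the sketch refers to `FS.Core` explicitly.

```javascript
// One namespace object keeps the overlay's globals from clashing with other add-ons.
var FS = FS || {};

FS.Core = {
  // Called once the browser window has loaded.
  init: function (event) {
    var appcontent = document.getElementById("appcontent");
    if (appcontent) {
      // Third argument true = capture phase, as in the snippet above.
      appcontent.addEventListener("DOMContentLoaded", FS.Core.onPageLoad, true);
    }
  },

  // Called for every page whose DOM is ready.
  onPageLoad: function (aEvent) {
    var doc = aEvent.originalTarget;
    // Placeholder (hypothetical): the host check and hashtag work would go here.
    FS.Core.lastDocument = doc;
  }
};

// Only register when running inside a browser window.
if (typeof window !== "undefined") {
  window.addEventListener("load", FS.Core.init, false);
}
```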

Ensure the URL is twitter

Now that we are in, let us understand what this means! We get an “entry in” every time a web page is opened. In other words, we get a handle _AFTER_ the document (i.e., the HTML & related data) has been received from the server and parsed, but _BEFORE_ its images, stylesheets, and subframes have finished loading. This is exactly the position we want to be in, since it gives us the option to edit the document while the page is still loading!

Now, this also means that we get a handle to _EVERY_ web page, and we obviously don’t want that. So we gotta filter for only the Twitter pages. How do we get hold of the domain? All we have is the event. So, now…

var doc = aEvent.originalTarget;
JS Tips

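‘originalTarget’ is a Mozilla-specific event property: the original target of the event, before any retargeting. For DOMContentLoaded, that is the document that has just been loaded.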

A little JavaScript knowledge tells you how to retrieve the domain 🙂

if (doc.location.host !== 'twitter.com') {
    return;
}
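The host comparison can be pulled into a small function so it is easy to test and extend. A minimal sketch; `isTwitterDocument` and `TWITTER_HOSTS` are hypothetical names, and also accepting `www.twitter.com` is my assumption rather than something the plugin did. Checking for `doc.location` first means an event target without a location is simply treated as "not Twitter".

```javascript
// Hypothetical helper: true only for documents served from Twitter.
// Accepting "www.twitter.com" as well is an assumption, not part of the original check.
var TWITTER_HOSTS = ["twitter.com", "www.twitter.com"];

function isTwitterDocument(doc) {
  // Targets without a location cannot be Twitter pages.
  if (!doc || !doc.location) {
    return false;
  }
  return TWITTER_HOSTS.indexOf(doc.location.host) !== -1;
}
```

Inside `onPageLoad`, the check then becomes `if (!isTwitterDocument(doc)) return;`.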

Grab & extract all the hashtags

Thankfully, Twitter’s HTML page puts an anchor tag around each hashtag. So, all we gotta do is look for a specifically formatted anchor tag. For that, let us extract all the links in the given document. This is actually fairly simple:

var links = doc.evaluate('//a[@href][not(@faroosearch)]',
                         doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
                         null);

This returns a snapshot of the links in the same order they appear on the page. It is not a JavaScript array: loop from 0 to links.snapshotLength and fetch each link with links.snapshotItem(i). Ensure that each link’s domain is Twitter and ignore all the non-hashtag Twitter links like user profiles, trends, etc. Then, for each remaining link, register for the events of interest, i.e., the ‘mouseover’ and ‘mouseout’ events.
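Here is a sketch of that loop. It assumes hashtag links start with the same prefix used in the extraction step further down (`https://twitter.com/search?q=%23`); `collectHashtagLinks`, `onOver` and `onOut` are hypothetical names. It reads the `href` property rather than the attribute, because the property is already resolved to an absolute URL.

```javascript
// Hypothetical sketch: walk an XPath snapshot of links, keep only hashtag links,
// and attach the tooltip handlers to each of them.
var HASHTAG_PREFIX = "https://twitter.com/search?q=%23";

function collectHashtagLinks(snapshot, onOver, onOut) {
  var hashtagLinks = [];
  for (var i = 0; i < snapshot.snapshotLength; i++) {
    var anchor = snapshot.snapshotItem(i);
    // Skip profile links, trends, external links, etc.
    if (anchor.href.indexOf(HASHTAG_PREFIX) !== 0) {
      continue;
    }
    anchor.addEventListener("mouseover", onOver, false);
    anchor.addEventListener("mouseout", onOut, false);
    hashtagLinks.push(anchor);
  }
  return hashtagLinks;
}
```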

Query the search engine

A quick recap of where we are: we have a hook into the Twitter page’s DOM content, and we have registered the hashtag links for mouseover & mouseout events. Cool! The next step is to prepare a query from the hashtag and actually send it to our search engine of choice.

Fortunately, Twitter’s hashtag links are simple & follow a fixed pattern: ‘https://twitter.com/search?q=%23&src=hash’, with the hashtag text sitting between ‘%23’ (the URL-encoded ‘#’) and ‘&src=hash’. Hence, extracting the hashtag from that link isn’t difficult.

var prefix = 'https://twitter.com/search?q=%23';
var suffix = '&src=hash';
// Drop the prefix, then the trailing '&src=hash'
var hash_with_extra = url.substring(prefix.length);
var hash = hash_with_extra.substring(0, hash_with_extra.length - suffix.length);

Voila! Given any hashtag URL of this form, we can extract the hashtag as shown above! 🙂
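The substring approach relies on ‘&src=hash’ being the last parameter. A slightly more forgiving alternative (my own, not what the plugin does) parses the query string with the standard `URL` API, which also decodes percent-escapes such as `%23`; `extractHashtag` is a hypothetical name.

```javascript
// Alternative sketch: read the hashtag from the `q` parameter instead of slicing strings.
// Returns null for links that are not hashtag searches.
function extractHashtag(href) {
  var url;
  try {
    url = new URL(href);
  } catch (e) {
    return null;                       // not a valid absolute URL
  }
  if (url.pathname !== "/search" || url.searchParams.get("src") !== "hash") {
    return null;
  }
  var q = url.searchParams.get("q");   // already decoded, e.g. "#firefox"
  if (!q || q.charAt(0) !== "#") {
    return null;
  }
  return q.substring(1);
}
```

Parameter order no longer matters, so a link like `...search?src=hash&q=%23news` still yields `news`.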

The next step is to query our search engine for news articles about the extracted hashtag. At the time of writing, Faroo search had the following API format for querying:

http://www.faroo.com/api?q=&start=&length=&l=&src=<news/web/blog>&f=&key=
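A sketch of how `FS.API.searchurl` (called in the request code below) might assemble that URL. The parameter names come from the format above; the values for `start`, `length`, `l` and `f` are placeholders I picked for illustration, and passing the developer `key` as a second argument is my own addition (the plugin takes the key from its options window).

```javascript
var FS = FS || {};
FS.API = FS.API || {};

// Sketch of building the Faroo query URL. Parameter names follow the format above;
// the values marked "assumed" are illustrative placeholders.
FS.API.searchurl = function (hash, key) {
  var params = new URLSearchParams({
    q: hash,          // the hashtag, without '#'
    start: "1",       // assumed: first result
    length: "5",      // assumed: number of results
    l: "en",          // assumed: language
    src: "news",      // news search (the format also allows web/blog)
    f: "json",        // assumed: response format
    key: key || ""    // developer key (hypothetical second argument)
  });
  return "http://www.faroo.com/api?" + params.toString();
};
```

`URLSearchParams` takes care of escaping, so hashtags with non-ASCII characters are encoded correctly.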

Now, the best way to send this query is XMLHttpRequest, which retrieves data from a URL without a full page refresh. For a compact overview, check here. And a nice quick tutorial on using XMLHttpRequest for AJAX helps explain what happens here.
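The request code below calls an `FS.AJAX.request` helper whose body the post does not show. A minimal sketch of what such a wrapper around XMLHttpRequest could look like, assuming an asynchronous GET and treating any status other than 200 as an error; the optional second argument (an XMLHttpRequest-like constructor) is a hypothetical hook so the helper can be exercised without a browser.

```javascript
var FS = FS || {};
FS.AJAX = FS.AJAX || {};

// Sketch: options = { url, onSuccess(responseText), onError(status, statusText) }.
// `XHR` defaults to the browser's XMLHttpRequest.
FS.AJAX.request = function (options, XHR) {
  var Ctor = XHR || XMLHttpRequest;
  var xhr = new Ctor();
  xhr.open("GET", options.url, true);            // true = asynchronous
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) {                  // 4 = DONE
      return;
    }
    if (xhr.status === 200) {
      options.onSuccess(xhr.responseText);
    } else {
      options.onError(xhr.status, xhr.statusText);
    }
  };
  xhr.send(null);
  return xhr;
};
```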

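FS.AJAX.request({
    url: FS.API.searchurl(hash),
    onError: function onError(status, statusText) {
        // The query failed: log it and show no tooltip for this hashtag
        console.log('Faroo query failed: ' + status + ' ' + statusText);
    },
    onSuccess: function onSuccess(responseText) {
        // 1. Parse the JSON response
        var response = JSON.parse(responseText);
        // 2. For every result entry, extract 'title', 'domain' & 'url' for the tooltip
        //    (assumes the entries arrive in a 'results' array)
        var entries = (response.results || []).map(function (r) {
            return { title: r.title, domain: r.domain, url: r.url };
        });
        // ...then build the tooltip for this hashtag from `entries`
    }
});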

Bingo!! Whenever a mouseover event fires on a hashtag link, the tooltip pops up as shown below! 🙂

Grand finale!

I decided that the 5 (or fewer) latest news articles are more than enough, and each entry links to its article. So, my pop-up looks as ugly as this:

faroo search pop up
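The pop-up content itself can be produced by a small function like the one below. It is a sketch, assuming each entry has the `title`, `domain` and `url` fields extracted earlier and that the entries are already sorted newest first; `buildTooltipHtml`, `escapeHtml`, `safeUrl` and the "No news found." text are hypothetical. Building an HTML string is just one option; creating nodes with `document.createElement` avoids string escaping altogether.

```javascript
var MAX_ARTICLES = 5;

// Titles and URLs come from a remote service, so escape them before building HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Only allow http(s) links; anything else (e.g. "javascript:") becomes "#".
function safeUrl(url) {
  return /^https?:\/\//i.test(url) ? url : "#";
}

// Hypothetical sketch: render at most MAX_ARTICLES entries as a list of links.
function buildTooltipHtml(entries) {
  if (!entries || entries.length === 0) {
    return "<p>No news found.</p>";
  }
  var items = entries.slice(0, MAX_ARTICLES).map(function (e) {
    return '<li><a href="' + escapeHtml(safeUrl(e.url)) + '">' + escapeHtml(e.title) +
           "</a> (" + escapeHtml(e.domain) + ")</li>";
  });
  return "<ul>" + items.join("") + "</ul>";
}
```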

VERY IMPORTANT!

For this plugin to work in your environment, please get yourself a developer key from Faroo search and enter it in the options window as shown below.

faroo search options

Optimization

I have done plenty of optimization in the code. So if you are planning to browse through my code, you had better read this section as well.

Marking each URL

Note that Twitter is an auto-reloading page, i.e., newer tweets appear without the page being reloaded. Since our plugin waits on the ‘load’ and ‘DOMContentLoaded’ events, we will _NEVER_ get a callback when newer tweets are retrieved. To handle such scenarios, I reapply our core logic of retrieving & modifying the hashtag URLs. But this poses a problem: it retrieves _ALL_ hashtags, not just the _NEW_ ones. We don’t want this, do we? So, I mark every link with a new attribute called ‘faroosearch’ (the name should be unique!). Whenever I process a hashtag link, I add this attribute and set it to true as shown below:

anchor.setAttribute('faroosearch', 'true');

Now, all I gotta do is ignore any link that has faroosearch set. Note the ‘not(@faroosearch)’ predicate in the XPath query of the hashtag-grabbing section! 🙂
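Here is the marking idea as a small function, a sketch assuming anchor objects with the standard `getAttribute`/`setAttribute` methods (`getAttribute` returns null when the attribute is absent); `takeUnmarked` is a hypothetical name. It does in plain JavaScript what the `not(@faroosearch)` predicate does in XPath, so running it again on the same page only returns links it has not seen before.

```javascript
// Hypothetical sketch: return only anchors not yet processed, marking them as we go,
// so that re-running on an auto-refreshed page handles only the new hashtag links.
var MARKER = "faroosearch";

function takeUnmarked(anchors) {
  var fresh = [];
  for (var i = 0; i < anchors.length; i++) {
    var anchor = anchors[i];
    if (anchor.getAttribute(MARKER) !== null) {
      continue;                          // already handled on an earlier pass
    }
    anchor.setAttribute(MARKER, "true");
    fresh.push(anchor);
  }
  return fresh;
}
```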

Cache

Simple. Once a hashtag has been queried and the response retrieved, I don’t need to re-query it. So, I use a lookup table with hashtags as the keys, and every time I get a response, I store it in the cache for later retrieval.
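A minimal sketch of such a lookup table; `FS.Cache` and its methods are hypothetical names. `Object.create(null)` gives an object without a prototype, so a hashtag like `#constructor` cannot collide with inherited properties. Entries are never evicted here, which keeps the sketch simple.

```javascript
var FS = FS || {};

// Hypothetical sketch of the hashtag -> results lookup table.
FS.Cache = {
  // No prototype, so keys such as "constructor" are not mistaken for cached entries.
  _entries: Object.create(null),

  has: function (hash) {
    return hash in this._entries;
  },
  get: function (hash) {
    return this._entries[hash];
  },
  put: function (hash, results) {
    this._entries[hash] = results;
  }
};
```

Before querying, reuse `FS.Cache.get(hash)` when `FS.Cache.has(hash)` is true; in `onSuccess`, store the parsed entries with `FS.Cache.put(hash, entries)`.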

Queue

This avoids querying a hashtag while a previous query for the same hashtag is still in progress. Before I send out a query, I check whether one is already in progress for that hashtag; if so, I don’t send it.
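A sketch of that check, keeping a set of hashtags whose query has not finished yet; `FS.Queue`, `start` and `isPending` are hypothetical names. The `done` callback has to be called from both the success and the error path, otherwise a failed query would block that hashtag for the rest of the session.

```javascript
var FS = FS || {};

// Hypothetical sketch: tracks hashtags whose query has been sent but not finished.
FS.Queue = {
  _pending: Object.create(null),

  // Calls send(hash, done) unless a query for `hash` is already in flight.
  // Returns true if a query was started, false if it was skipped.
  start: function (hash, send) {
    if (this._pending[hash]) {
      return false;
    }
    this._pending[hash] = true;
    var pending = this._pending;
    // `done` must be called on success AND on error, or the hashtag stays blocked.
    send(hash, function done() {
      delete pending[hash];
    });
    return true;
  },

  isPending: function (hash) {
    return hash in this._pending;
  }
};
```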

Logging

I used the Firebug console for logging, and logging can be turned on or off by toggling the _debug value inside each class. The Firebug console log looks like this:

firebug log
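A sketch of the `_debug` switch as a tiny logger factory; `makeLogger` is a hypothetical helper, and it writes with the standard `console.log` rather than a Firebug-specific call. The flag is read on every call, so flipping `_debug` takes effect immediately.

```javascript
// Hypothetical helper: returns a log function that only writes while owner._debug is true.
// `sink` lets the caller redirect the output; by default it goes to console.log.
function makeLogger(owner, prefix, sink) {
  var write = sink || function (line) { console.log(line); };
  return function log(message) {
    // The flag is read on every call, so toggling _debug takes effect immediately.
    if (owner._debug) {
      write("[" + prefix + "] " + message);
    }
  };
}
```

For example, with `FS.Core.log = makeLogger(FS.Core, "FS.Core");`, a call like `FS.Core.log("found 3 hashtags")` prints nothing until `FS.Core._debug` is set to true.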

My Code

As always, all my personal code is available on GitHub. This particular one is available here.

References

I referenced plenty of blogs & articles; after all, I had to build my knowledge from the ground up. The most important of them are linked here. Feel free to check them out:
