I was researching social monitoring tools like Mention.com, Socialmention.com, and Nitrogram (http://nitrogr.am/) and noticed that they search for keywords across many different social media platforms. From an engineering perspective, do they simply poll real-time information via the platform APIs, or do they crawl and cache the data in order to speed up searching and indexing? There are probably various levels of this, but I wanted to seek insights from a best-practices perspective.
It's hard to give a definitive answer, as these companies will not tell us how their systems work. Having said that: based on my experience working with these APIs, it is entirely possible to implement a real-time search like Socialmention.com purely on top of the search APIs from Google, Twitter, Yahoo, or Facebook. On the other hand, crawling and storing all that data just in case a customer might search for it someday is probably not economically viable, and it would also violate the Terms of Service of most of these APIs. Bottom line: they are probably not crawling and caching.
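To illustrate what I mean by searching purely through the APIs, here is a minimal sketch of the fan-out approach: the keyword is sent to every platform at query time and the results are merged, with nothing stored in between. This is only my assumption of how such a tool could be built, not how any of the named products actually work. The provider functions (`search_twitter_stub`, `search_facebook_stub`, `search_failing_stub`) and the result fields are hypothetical stand-ins that return canned data; in a real system each one would wrap an HTTP call to that platform's search API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A provider takes a keyword and returns a list of result dicts.
Provider = Callable[[str], List[dict]]


# Hypothetical stand-ins for real platform API clients. They return canned
# data so the sketch runs without network access or API keys.
def search_twitter_stub(keyword: str) -> List[dict]:
    return [{"source": "twitter", "text": f"tweet about {keyword}", "timestamp": 300}]


def search_facebook_stub(keyword: str) -> List[dict]:
    return [{"source": "facebook", "text": f"post mentioning {keyword}", "timestamp": 200}]


def search_failing_stub(keyword: str) -> List[dict]:
    # Simulates a platform that is down or rate limiting us.
    raise RuntimeError("rate limit exceeded")


def federated_search(keyword: str, providers: Dict[str, Provider],
                     timeout: float = 5.0) -> List[dict]:
    """Query every provider at search time and merge the results.

    Nothing is crawled or stored ahead of time; each search triggers
    fresh calls to the providers.
    """
    results: List[dict] = []
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        # Submit all provider calls first so they run concurrently.
        futures = {name: pool.submit(fn, keyword) for name, fn in providers.items()}
        for name, future in futures.items():
            try:
                # timeout bounds how long we wait for this result; the pool
                # itself still waits for its worker threads on exit.
                results.extend(future.result(timeout=timeout))
            except Exception:
                # A slow or failing platform should not break the whole search.
                continue
    # Newest first, using the assumed "timestamp" field.
    results.sort(key=lambda r: r["timestamp"], reverse=True)
    return results


if __name__ == "__main__":
    providers = {
        "twitter": search_twitter_stub,
        "facebook": search_facebook_stub,
        "broken": search_failing_stub,
    }
    for hit in federated_search("python", providers):
        print(hit["source"], hit["text"])
```

The trade-off is that every search costs one API call per platform, so response time and rate limits are dictated by the slowest and strictest provider. That is the price of not maintaining your own index.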