I've spent the latter part of my career studying search and text mining technologies. One reason I've been so captivated by blogs is the way their discussion flow and sentiment can yield unique, untapped insights. I'm also captivated by the social-networking aspect of blogs.

Some have asked me about the "tech" backbone of BlogPulse. Essentially, BlogPulse employs multiple strategies to gather new postings every day from millions of weblogs. The postings are extracted, transformed into XML, and then full-text indexed along with some useful metadata. At this time, BlogPulse is aware of between 4 and 5 million blogs, of which some 1.5 to 2 million are "active", in that they've generated at least one new post in the past 30-60 days.

The full-text search facility of BlogPulse searches only blog posts rather than the full content of blog pages, which makes it possible for users to find what other bloggers said about a topic without being inundated with useless and irrelevant results. We have applied our full set of advanced text mining technologies to analyze a large sample of blog posts every day and determine trends in a completely automated setting:
-- "Phrase mining" using a background model of blog data identifies bursty phrases in blog postings, while "concept clustering" allows us to identify the key themes and stories of the day. Compared with other attempts to create so-called "word bursts", our technology is superior because our toolkit chains many different phrase mining algorithms in a pipeline to produce very readable phrases that indicate what is top of mind for bloggers on any given day.
-- "Entity extraction" technology spots people's names in blog postings with over 95% accuracy. The system produces a list of "leaders" as well as "movers", which allows users to spot personalities whose prominence in the blogosphere is rising or falling.
-- "Citation analysis" technology finds the top citations or links from blog postings every day. The top links are presented along with the contexts in which they were cited within blog postings.
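
To make the idea of "bursty" phrases against a background model a little more concrete, here is a minimal sketch in Python. It is not BlogPulse's actual pipeline: the function names (bigrams, bursty_phrases), the restriction to two-word phrases, the add-one smoothing and the count-weighted log-ratio score are all assumptions chosen for illustration. It simply ranks phrases that are much more frequent in today's posts than in an older background sample.

```python
from collections import Counter
import math
import re


def bigrams(text):
    """Split text into lowercase words and return adjacent two-word phrases."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]


def bursty_phrases(today_posts, background_posts, top_n=5, min_count=2):
    """Rank today's phrases by how strongly they exceed the background rate.

    Hypothetical scoring (an assumption, not the BlogPulse method):
    count_today * log(P(phrase | today) / P(phrase | background)),
    with add-one smoothing so unseen background phrases do not divide by zero.
    """
    today = Counter(p for post in today_posts for p in bigrams(post))
    background = Counter(p for post in background_posts for p in bigrams(post))

    today_total = sum(today.values())
    background_total = sum(background.values())
    vocab_size = len(set(today) | set(background))

    scores = {}
    for phrase, count in today.items():
        if count < min_count:
            continue  # ignore phrases too rare today to call a burst
        p_today = (count + 1) / (today_total + vocab_size)
        p_background = (background[phrase] + 1) / (background_total + vocab_size)
        scores[phrase] = count * math.log(p_today / p_background)

    # Highest score first: frequent today, rare in the background.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]
```

Calling bursty_phrases(todays_posts, older_posts) returns (phrase, score) pairs. Phrases that are equally common in both samples, such as everyday filler, score at or below zero and drop out of the top of the list, which is the effect a background model is meant to have.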
We are showing only a very small set of our search, text mining and visualization capabilities on BlogPulse today. Stay tuned to our showcase on BlogPulse, where we'll continually deliver new features and tools to help you get the most from the world of blogs.