wsu National Science Foundation


 
Deriving Emergent Semantics From Users' Browsing Paths:
- DV Sreenath

The World Wide Web has become the primary repository for information of all kinds: scientific, business, and entertainment. Because no central authority validates or authenticates this content, too much information can be worse than too little. Authors of Web pages are not aware of the ways their content is used, and innocent information published on the Web could be used for a malicious purpose. Some implications of our approach are that the author of a Web page cannot completely define that document's semantics and that semantics emerge through use. Contextual document semantics emerge through identification of various users' browsing paths through this multimedia collection.

In this paper, we present techniques that use multimedia information as part of this determination. We attempt to derive the semantics of Web pages from users' browsing paths. The effort combines analysis of link information with users' actual navigation paths to derive emergent semantics of Web pages that the author of a page may not have intended. Each Web page has some meaning that can be derived from static link analysis of the connected graph formed by its incoming and outgoing links. This is the approach used by most search engines. Our research instead focuses on dynamic link analysis of users' browsing paths.
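To illustrate the distinction between static and dynamic link analysis, the following minimal Python sketch counts how often users actually follow each page-to-page transition and reports the transitions that do not correspond to any author-defined hyperlink. The page names, `static_links`, `browsing_paths`, and both helper functions are hypothetical and chosen only for illustration; the counting scheme is an assumption, not the method used in this work.

```python
from collections import Counter

def transition_counts(paths):
    """Count how often each (source, target) transition occurs across all paths."""
    counts = Counter()
    for path in paths:
        # Pair each page with the page visited immediately after it.
        for src, dst in zip(path, path[1:]):
            counts[(src, dst)] += 1
    return counts

def unlinked_transitions(counts, static_links):
    """Return observed transitions that are not author-defined hyperlinks."""
    return {edge: n for edge, n in counts.items() if edge not in static_links}

# Hypothetical static link graph (what a crawler would see).
static_links = {("home", "news"), ("news", "sports"), ("home", "about")}

# Hypothetical browsing paths (what users actually did).
browsing_paths = [
    ["home", "news", "sports"],
    ["home", "news", "weather"],
    ["news", "weather"],
]

counts = transition_counts(browsing_paths)
dynamic_only = unlinked_transitions(counts, static_links)
print(dynamic_only)
```

In this toy data, the transition from `news` to `weather` appears only in the usage data, which is the kind of association that a purely static analysis would miss.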

The primary goal is to derive the emergent semantics of the page(s) using the Web browsing patterns of the users. The ultimate goal is to derive the semantics of the browsing path of the user. In the case of a search engine, a user enters a query string and the search engine retrieves a list of URLs that match the query, in order of relevance. Our research can be considered the inverse of a search engine: the problem is to derive the semantics given the ordered sequence of Web pages visited by a user. Using an iterative process, we derive the semantic breakpoints of long browsing paths. This identifies short sub-paths with uniform semantics. Using the coherent semantics exhibited by each sub-path, we attempt to derive the high-level semantics of a user's Web activity. Web usage log data is used for this analysis, and the WordNet database supplies the high-level concepts. With additional training data, one specific application of this research is terrorist trend detection. Preliminary results are promising.
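The idea of semantic breakpoints can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this work: each visited page is assumed to be already reduced to a set of keywords, similarity between adjacent pages is measured with Jaccard overlap (a stand-in for WordNet-based concept matching), and the `threshold` value is arbitrary. The URLs, keyword sets, and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def split_path(pages, threshold=0.2):
    """Recursively split a path at its weakest adjacent-page similarity.

    pages: list of (url, keyword_set) pairs in visit order.
    Returns a list of sub-paths, each a list of URLs.
    """
    if len(pages) < 2:
        return [[url for url, _ in pages]]
    sims = [jaccard(pages[i][1], pages[i + 1][1]) for i in range(len(pages) - 1)]
    weakest = min(range(len(sims)), key=sims.__getitem__)
    if sims[weakest] >= threshold:
        # Every adjacent pair is similar enough: no breakpoint in this sub-path.
        return [[url for url, _ in pages]]
    cut = weakest + 1
    return split_path(pages[:cut], threshold) + split_path(pages[cut:], threshold)

# Hypothetical browsing path with keyword sets assumed to be extracted beforehand.
path = [
    ("a.example/flights", {"travel", "flight", "booking"}),
    ("a.example/hotels", {"travel", "hotel", "booking"}),
    ("b.example/scores", {"football", "score", "league"}),
    ("b.example/teams", {"football", "team", "league"}),
]

segments = split_path(path)
print(segments)
```

Splitting at the weakest link first and then re-examining each half mirrors the iterative flavor of the description: a long path is repeatedly divided until every remaining sub-path is internally coherent under the chosen similarity measure.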

A typical user these days has several windows open at once: several browser sessions, several instant messaging windows, and at least one email client. The user's event-driven activity can be analyzed fully only when every activity is monitored, rather than just the Web usage logs studied by most Web mining efforts. The context of the browsing activity depends on the events occurring in the other applications the user has open. Our research currently focuses on Web browsing paths only. Extending it to other network activity, such as instant messaging and email, is a matter of monitoring the corresponding ports. However, we ignore the various other events that could contribute to the activity, such as multiple computers at the user's desk, a telephone call from another user, or an event or alarm from a calendar or Palm Pilot.

  
Wayne State University
IGERT High Performance
Computing Applications