
Proceedings of the Sixteenth International World Wide Web Conference
(WWW2007)
May 8-12, 2007
Banff, Alberta, CANADA



CHAIRS' MESSAGES

ORGANIZATION

SPONSORS

PAPERS

Track: Browsers and User Interfaces

Session: Personalization

Homepage Live: Automatic Block Tracing for Web Personalization         1
J. Han, D. Han (Shanghai Jiao-Tong University),
C. Lin, H.-J. Zeng, Z. Chen (Microsoft Research Asia),
Y. Yu (Shanghai Jiao-Tong University)

Open User Profiles for Adaptive News Systems: Help or Harm?         11
J.-w. Ahn, P. Brusilovsky, J. Grady, D. He, S. Y. Syn (University of Pittsburgh)

Investigating Behavioral Variability in Web Search         21
R. W. White (Microsoft Research), S. M. Drucker (Microsoft Live Laboratories)

Session: Smarter Browsing

CSurf: A Context-Driven Non-Visual Web-Browser         31
J. Mahmud, Y. Borodin, I. V. Ramakrishnan (Stony Brook University)

GeoTracker: Geospatial and Temporal RSS Navigation         41
Y.-F. Chen, G. Di Fabbrizio, D. Gibbon, R. Jana, S. Jora, B. Renger, B. Wei (AT&T Laboratories – Research)

Learning Information Intent via Observation         51
A. Tomasic, I. Simmons, J. Zimmerman (Carnegie Mellon University)

Track: Data Mining

Session: Identifying Structure in Web Pages

Page-level Template Detection via Isotonic Smoothing         61
D. Chakrabarti, R. Kumar (Yahoo! Research),
K. Punera (University of Texas at Austin)

Towards Domain-Independent Information Extraction from Web Tables         71
W. Gatterbauer, P. Bohunsky, M. Herzog, B. Krüpl, B. Pollak (Vienna University of Technology)

Web Object Retrieval         81
Z. Nie, Y. Ma, S. Shi, J.-R. Wen, W.-Y. Ma (Microsoft Research Asia)

Session: Mining Textual Data

Summarizing Email Conversations with Clue Words         91
G. Carenini, R. T. Ng, X. Zhou (University of British Columbia)

Organizing and Searching the World Wide Web of Facts — Step Two: Harnessing the Wisdom of the Crowds         101
M. Paşca (Google Inc.)

Do Not Crawl in the DUST: Different URLs with Similar Text         111
Z. Bar-Yossef (Technion and Google Haifa Engineering Center),
I. Keidar (Technion),
U. Schonfeld (University of California at Los Angeles)

Session: Similarity Search

A New Suffix Tree Similarity Measure for Document Clustering         121
H. Chim, X. Deng (City University of Hong Kong)

Scaling Up All Pairs Similarity Search         131
R. J. Bayardo (Google, Inc.), Y. Ma (University of California at Irvine), R. Srikant (Google, Inc.)

Detecting Near-Duplicates for Web Crawling         141
G. S. Manku, Jain (Google Inc.), A. D. Sarma (Stanford University)

Session: Predictive Modeling of Web Users 

Demographic Prediction Based on User's Browsing Behavior         151
J. Hu, H.-J. Zeng, H. Li, C. Niu, Z. Chen (Microsoft Research Asia)

Why We Search: Visualizing and Predicting User Behavior         161
E. Adar, D. S. Weld, B. N. Bershad, S. D. Gribble (University of Washington)

Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs         171
Q. Mei, X. Ling, M. Wondra (University of Illinois at Urbana-Champaign),
H. Su (Vanderbilt University),
CX. Zhai (University of Illinois at Urbana-Champaign)

Session: Mining in Social Networks 

Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography         181
L. Backstrom (Cornell University),
C. Dwork (Microsoft Research),
J. Kleinberg (Cornell University)

Information Flow Modeling based on Diffusion Rate for Prediction and Ranking         191
X. Song, Y. Chi, K. Hino, B. L. Tseng (NEC Laboratories America)

NetProbe: A Fast & Scalable System for Fraud Detection in Online Auction Networks         201
S. Pandit, D. H. Chau, S. Wang, C. Faloutsos (Carnegie Mellon University)

Track: E* Applications

Session: E-Communities

The Complex Dynamics of Collaborative Tagging         211
H. Halpin (University of Edinburgh),
V. Robu (CWI, Center for Mathematics and Computer Science),
H. Shepherd (Princeton University)

Expertise Networks in Online Communities: Structure and Algorithms         221
J. Zhang, M. S. Ackerman, L. Adamic (University of Michigan)

Internet-Scale Collection of Human-Reviewed Data         231
Q. Su, D. Pavlov, J.-H. Chow, W. C. Baker (Yahoo! Inc.)

Session: E-Commerce and E-Content 

DETECTIVES: DETEcting Coalition hiT Inflation attacks in adVertising nEtworks Streams         241
A. Metwally, D. Agrawal, A. El Abbadi (University of California at Santa Barbara)

Extraction and Search of Chemical Formulae in Text Documents on the Web         251
B. Sun, Q. Tan, P. Mitra, C. L. Giles (The Pennsylvania State University)

A Content-Driven Reputation System for the Wikipedia         261
B. T. Adler, L. de Alfaro (University of California at Santa Cruz)

Track: Industrial Practice & Experience

Google News Personalization: Scalable Online Collaborative Filtering         271
A. Das, M. Datar, A. Garg (Google Inc.),
S. Rajaram (University of Illinois at Urbana-Champaign)

Exploring in the Weblog Space by Detecting Informative and Affective Articles         281
X. Ni, G.-R. Xue, X. Ling, Y. Yu (Shanghai Jiao-Tong University),
Q. Yang (Hong Kong University of Science & Technology)

Spam Double-Funnel: Connecting Web Spammers with Advertisers         291
Y.-M. Wang, M. Ma (Microsoft Research),
Y. Niu, H. Chen (University of California at Davis)

Track: Performance and Scalability 

Session: Scalable Systems for Dynamic Content

GlobeTP: Template-Based Database Replication for Scalable Web Applications         301
T. Groothuyse, S. Sivasubramanian, G. Pierre (Vrije Universiteit)

Consistency-preserving Caching of Dynamic Database Content         311
N. Tolia, M. Satyanarayanan (Carnegie Mellon University)

Optimized Query Planning of Continuous Aggregation Queries in Dynamic Data Dissemination Networks         321
R. Gupta (IBM India Research Laboratory),
K. Ramamritham (Indian Institute of Technology)

Session: Performance Engineering of Web Applications

A Scalable Application Placement Controller for Enterprise Data Centers         331
C. Tang, M. Steinder, M. Spreitzer, G. Pacifici (IBM T.J. Watson Research Center)

A Unified Platform for Data Driven Web Applications with Automatic Client-Server Partitioning         341
F. Yang, N. Gupta, N. Gerner, X. Qi, A. Demers, J. Gehrke (Cornell University),
J. Shanmugasundaram (Yahoo!)

MyXDNS: A Request Routing DNS Server with Decoupled Server Selection         351
H. A. Alzoubi, M. Rabinovich (Case Western Reserve University),
O. Spatscheck (AT&T Research Laboratories)

Track: Pervasive Web and Mobility 

Robust Web Page Segmentation for Mobile Terminal Using Content-Distances and Page Layout Information         361
G. Hattori, K. Hoashi, K. Matsumoto, F. Sugaya (KDDI R&D Laboratories)

PRIVÉ: Anonymous Location-Based Queries in Distributed Mobile Systems         371
G. Ghinita, P. Kalnis (National University of Singapore),
S. Skiadopoulos (University of Peloponnese)

A Mobile Application Framework for the Geospatial Web         381
R. Simon, P. Fröhlich (Telecommunications Research Center Vienna)

Track: Search 

Session: Search Potpourri

Navigation-Aided Retrieval         391
S. Pandit (Carnegie Mellon University),
C. Olston (Yahoo! Research)

Efficient Search Engine Measurements         401
Z. Bar-Yossef, M. Gurevich (Technion - Israel Institute of Technology)

Efficient Search in Large Textual Collections with Redundancy         411
J. Zhang, T. Suel (Polytechnic University)

Session: Crawlers 

The Discoverability of the Web         421
A. Dasgupta, A. Ghosh, R. Kumar, C. Olston, S. Pandey, A. Tomkins (Yahoo! Research)

Combining Classifiers to Identify Online Databases         431
L. Barbosa, J. Freire (University of Utah)

An Adaptive Crawler for Locating Hidden-Web Entry Points         441
L. Barbosa, J. Freire (University of Utah)

Session: Web Graphs 

Random Web Crawls         451
T. Bennouas (Criteo R&D),
F. de Montgolfier (LIAFA - Université Paris 7)

Extraction and Classification of Dense Communities in the Web         461
Y. Dourisboure, F. Geraci, M. Pellegrini (Istituto di Informatica e Telematica)

Web Projections: Learning from Contextual Subgraphs of the Web         471
J. Leskovec (Carnegie Mellon University),
S. Dumais, E. Horvitz (Microsoft Research)

Session: Search Quality and Precision 

Supervised Rank Aggregation         481
Y.-T. Liu (Microsoft Research Asia & Beijing Jiaotong University),
T.-Y. Liu (Microsoft Research Asia),
T. Qin (Microsoft Research Asia & Tsinghua University),
Z.-M. Ma (Chinese Academy of Science),
H. Li (Microsoft Research Asia)

Navigating the Intranet with High Precision         491
H. Zhu (IBM Almaden Research Center),
A. Löser (SAP Research CEC Dresden),
S. Raghavan, S. Vaithyanathan (IBM Almaden Research Center)

Optimizing Web Search Using Social Annotations         501
S. Bao, X. Wu (Shanghai JiaoTong University),
B. Fei (IBM China Research Laboratory),
G. Xue (Shanghai JiaoTong University),
Z. Su (IBM China Research Laboratory),
Y. Yu (Shanghai JiaoTong University)

Session: Advertisements & Click Estimates

Robust Methodologies for Modeling Web Click Distributions         511
K. Ali, M. Scarr (Yahoo!)

Predicting Clicks: Estimating the Click-Through Rate for New Ads         521
M. Richardson (Microsoft Research),
E. Dominowska (Microsoft),
R. Ragno (Microsoft Research)

Dynamics of Bid Optimization in Online Advertisement Auctions         531
C. Borgs, J. Chayes (Microsoft Research),
O. Etesami (University of California at Berkeley),
N. Immorlica, K. A. Jain (Microsoft Research),
M. Mahdian (Yahoo! Research)

Session: Knowledge Discovery 

Compare&Contrast: Using the Web to Discover Comparable Cases for News Stories         541
J. Liu, E. Wagner, L. Birnbaum (Northwestern University)

Answering Bounded Continuous Search Queries in the World Wide Web         551
D. Kukulenz (Institute of Information Systems),
A. Ntoulas (Microsoft Search Laboratories)

Answering Relationship Queries on the Web         561
G. Luo, C. Tan, Y.-l. Tian (IBM T.J. Watson Research Center)

Session: Personalization 

Dynamic Personalized Pagerank in Entity-Relation Graphs         571
S. Chakrabarti (IIT Bombay)

A Large-scale Evaluation and Analysis of Personalized Search Strategies         581
Z. Dou (Nankai University),
R. Song, J.-R. Wen (Microsoft Research Asia)

Privacy-Enhancing Personalized Web Search         591
Y. Xu (Simon Fraser University),
B. Zhang, Z. Chen (Microsoft Research Asia),
K. Wang (Simon Fraser University)

Track: Security, Privacy, Reliability, & Ethics

Session: Defending Against Emerging Threats 

Defeating Script Injection Attacks with Browser-Enforced Embedded Policies         601
T. Jim (AT&T Laboratories – Research),
N. Swamy, M. Hicks (University of Maryland)

Subspace: Secure Cross-Domain Communication for Web Mashups         611
C. Jackson (Stanford University),
H. J. Wang (Microsoft Research)

Exposing Private Information by Timing Web Applications         621
A. Bortz, D. Boneh (Stanford University), P. Nandy

On Anonymizing Query Logs via Token-based Hashing         629
R. Kumar, J. Novak, B. Pang, A. Tomkins (Yahoo! Research)

Session: Passwords and Phishing 

CANTINA: A Content-Based Approach to Detecting Phishing Web Sites         639
Y. Zhang (University of Pittsburgh),
J. Hong, L. Cranor (Carnegie Mellon University)

Learning to Detect Phishing Emails         649
I. Fette, N. Sadeh, A. Tomasic (Carnegie Mellon Univ.)

A Large-Scale Study of Web Password Habits         657
D. Florêncio, C. Herley (Microsoft Research)

Session: Access Control and Trust on the Web

A Fault Model and Mutation Testing of Access Control Policies         667
E. Martin, T. Xie (North Carolina State University)

Analyzing Web Access Control Policies         677
V. Kolovski, J. Hendler (University of Maryland),
B. Parsia (University of Manchester)

Compiling Cryptographic Protocols for Deployment on the Web         687
J. McCarthy (Brown University),
J. D. Guttman, J. D. Ramsdell (MITRE Corporation),
S. Krishnamurthi (Brown University)

Track: Semantic Web 

Session: Ontologies 

YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia         697
F. M. Suchanek, G. Kasneci, G. Weikum (Max-Planck-Institut)

Ontology Summarization Based on RDF Sentence Graph         707
X. Zhang, G. Cheng, Y. Qu (Southeast University)

Just the Right Amount: Extracting Modules from Ontologies         717
B. C. Grau, I. Horrocks, Y. Kazakov, U. Sattler (The University of Manchester)

Session: Applications

Toward Expressive Syndication on the Web         727
C. Halaschek-Wiener, J. Hendler (University of Maryland)

Exhibit: Lightweight Structured Data Publishing         737
D. F. Huynh, D. R. Karger, R. C. Miller (Massachusetts Institute of Technology)

Explorations in the Use of Semantic Web Technologies for Product Information Management         747
J.-S. Brunner, L. Ma, C. Wang, L. Zhang (IBM China Research Laboratory),
D. C. Wolfson (IBM Software Group),
Y. Pan (IBM China Research Laboratory),
K. Srinivas (IBM T.J. Watson Research Center)

Session: Similarity and Extraction

Measuring Semantic Similarity between Words Using Web Search Engines         757
D. Bollegala (The University of Tokyo),
Y. Matsuo (National Institute of Advanced Industrial Science & Technology),
M. Ishizuka (The University of Tokyo)

Using Google Distance to Weight Approximate Ontology Matches         767
R. Gligorov, Z. Aleksovski, W. ten Kate (Philips Research),
F. van Harmelen (Vrije Universiteit)

Hierarchical, Perceptron-like Learning for Ontology-Based Information Extraction         777
Y. Li, K. Bontcheva (University of Sheffield)

Session: Query Languages and DBs 

From SPARQL to Rules (and back)         787
A. Polleres (Universidad Rey Juan Carlos)

SPARQ2L: Towards Support for Subgraph Extraction Queries in RDF Databases         797
K. Anyanwu, A. Maduko (University of Georgia),
A. Sheth (Wright State University)

Bridging the Gap Between OWL and Relational Databases         807
B. Motik, I. Horrocks, U. Sattler (University of Manchester)

ActiveRDF: Object-Oriented Semantic Web Programming         817
E. Oren, R. Delbru, S. Gerke, A. Haller, S. Decker (National University of Ireland)

Session: Semantic Web and Web 2.0 

The Two Cultures: Mashing up Web 2.0 and the Semantic Web         825
A. Ankolekar, M. Krötzsch, T. Tran, D. Vrandecic (Universität Karlsruhe)

Analysis of Topological Characteristics of Huge Online Social Networking Services         835
Y.-Y. Ahn, S. Han, H. Kwak, S. Moon, H. Jeong (KAIST)

P-TAG: Large Scale Automatic Generation of Personalized Annotation TAGs for the Web         845
P.-A. Chirita, S. Costache (University of Hannover),
S. Handschuh (National University of Ireland),
W. Nejdl (University of Hannover)

Track: Technology for Developing Regions 

Session: Communication in Developing Regions

Connecting the 'Bottom of the Pyramid' – An Exploratory Case Study of India's Rural Communication Environment         855
S. Seshagiri, A. Sagar, D. Joshi (Motorola India Research Laboratories)

Communication as Information-Seeking: The Case for Mobile Social Software for Developing Regions         863
B. E. Kolko, E. J. Rose, E. Johnson (University of Washington)

Optimal Audio-Visual Representations for Illiterate Users of Computers         873
I. Medhi, A. Prasad, K. Toyama (Microsoft Research Laboratories India)

Session: Networking Issues in the Web

Identifying and Discriminating Between Web and Peer-to-Peer Traffic in the Network Core         883
J. Erman, A. Mahanti, M. Arlitt, C. Williamson (University of Calgary)

Long Distance Wireless Mesh Network Planning: Problem Formulation and Solution         893
S. Sen, B. Raman (IIT Kanpur)

Is High-Quality VoD Feasible using P2P Swarming?         903
S. Annapureddy (New York University),
S. Guha (Cornell University),
C. Gkantsidis, D. Gunawardena (Microsoft Research),
P. Rodriguez (Telefonica Research)

Track: Web Engineering

Session: Web Modeling

Turning Portlets into Services: The Consumer Profile         913
O. Díaz, S. Trujillo, S. Pérez (University of the Basque Country)

A Framework for Rapid Integration of Presentation Components         923
J. Yu, B. Benatallah, R. Saint-Paul (University of New South Wales),
F. Casati (University of Trento),
F. Daniel, M. Matera (Politecnico di Milano)

Integrating Value-based Requirement Engineering Models to WebML using VIP Business Modeling Framework         933
F. Azam, Z. Li, R. Ahmad (Beijing University of Aeronautics & Astronautics)

Session: End-User Perspectives and Measurement in Web Engineering

Towards Effective Browsing of Large Scale Social Annotations         943
R. Li, S. Bao (Shanghai JiaoTong University),
B. Fei, Z. Su (IBM China Research Laboratory),
Y. Yu (Shanghai JiaoTong University)

Supporting End-Users in the Creation of Dependable Web Clips         953
S. Lingam, S. Elbaum (University of Nebraska-Lincoln)

Effort Estimation: How Valuable is it for a Web Company to Use a Cross-company Data Set, Compared to Using Its Own Single-company Data Set?         963
E. Mendes (The University of Auckland),
S. Di Martino, F. Ferrucci, C. Gravino (Univ. di Salerno)

Track: Web Services

Session: Orchestration and Choreography 

Towards the Theoretical Foundation of Choreography         973
Z. Qiu, X. Zhao, C. Cai, H. Yang (Peking University)

Introduction and Evaluation of Martlet, a Scientific Workflow Language for Abstracted Parallelisation         983
D. Goodman (Oxford University Computing Laboratory)

Semi-Automated Adaptation of Service Interactions         993
H. R. Motahari Nezhad, B. Benatallah (University of New South Wales),
A. Martens, F. Curbera (IBM T.J. Watson Research Center),
F. Casati (University of Trento)

Session: SLAs and QoS

Reliable QoS Monitoring Based on Client Feedback         1003
R. Jurca (Ecole Polytechnique Fédérale de Lausanne),
W. Binder (University of Lugano),
B. Faltings (Ecole Polytechnique Fédérale de Lausanne)

Preference-based Selection of Highly Configurable Web Services         1013
S. Lamparter, A. Ankolekar, R. Studer (University of Karlsruhe),
S. Grimm (FZI Research Center for Information Technologies)

Speeding up Adaptation of Web Service Compositions Using Expiration Times         1023
J. Harney, P. Doshi (University of Georgia)

DIANE - An Integrated Approach to Automated Service Discovery, Matchmaking and Composition         1033
U. Küster, B. König-Ries (Friedrich-Schiller University Jena),
M. Stern, M. Klein (University of Karlsruhe)

Track: XML and Web Data

Session: Querying & Transforming XML

Multiway SLCA-based Keyword Search in XML Data         1043
C. Sun, C.-Y. Chan, A. K. Goenka (National University of Singapore)

Visibly Pushdown Automata for Streaming XML         1053
V. Kumar, P. Madhusudan, M. Viswanathan (University of Illinois at Urbana-Champaign)

Mapping-Driven XML Transformation         1063
H. Jiang, H. Ho, L. Popa (IBM Almaden Research Center),
W.-S. Han (Kyungpook National University)

Session: Parsing, Normalizing, & Storing XML

Querying and Maintaining a Compact XML Storage         1073
R. K. Wong, F. Lam, W. M. Shui (University of New South Wales & Green Pea Software)

XML Design for Relational Storage         1083
S. Kolahi (University of Toronto),
L. Libkin (University of Edinburgh)

A High-Performance Interpretive Approach to Schema-Directed Parsing         1093
M. Matsa, E. Perkins, A. Heifets, M. Gaitatzes Kostoulas, D. Silva, N. Mendelsohn, M. Leger (IBM Corporation)

POSTERS

Topic: Developing Regions 

Collaborative ICT for Indian Business Clusters         1115
S. Roy, S. Biswas (Motorola India Research Laboratories)

Delay Tolerant Applications for Low Bandwidth and Intermittently Connected Users: the aAQUA Experience         1117
S. Sahni, K. Ramamritham (Indian Institute of Technology Bombay)

Topic: Search

A Cautious Surfer for PageRank         1119
L. Nie, B. Wu, B. D. Davison (Lehigh University)

A Clustering Method For Web Data with Multi-Type Interrelated Components         1121
L. Bolelli, S. Ertekin, D. Zhou, C. L. Giles (The Pennsylvania State University)

A Large-Scale Study of Robots.txt         1123
Y. Sun, Z. Zhuang, C. L. Giles (The Pennsylvania State University)

A Link-Based Ranking Scheme for Focused Search         1125
T. Abou-Assaleh, Y. Miao, T. Das, P. O'Brien, W. Gao, Z. Zhen (GenieKnows.com)

A Link Classification Based Approach to Website Topic Hierarchy Generation         1127
N. Liu, C. C. Yang (The Chinese University of Hong Kong)

A Search-based Chinese Word Segmentation Method         1129
X.-J. Wang (IBM China Research Center),
W. Liu (Huazhong University of Science & Technology),
Y. Qin (IBM China Research Center)

Anchor-based Proximity Measures         1131
A. Joshi, R. Kumar, B. Reed, A. Tomkins (Yahoo! Research)

Automatic Search Engine Performance Evaluation with Click-through Data Analysis         1133
Y. Liu, Y. Fu, M. Zhang, S. Ma (Tsinghua University),
L. Ru (Sohu Incorporation)

Automatic Searching of Tables in Digital Libraries         1135
Y. Liu, K. Bai, P. Mitra, C. L. Giles (The Pennsylvania State University)

Bayesian Network based Sentence Retrieval Model         1137
K. Cai, J. Bu, C. Chen, K. Liu, W. Chen (Zhejiang University)

Brand Awareness and the Evaluation of Search Results         1139
B. J. Jansen, M. Zhang, Y. Zhang (The Pennsylvania State University)

Causal Relation of Queries from Temporal Logs         1141
Y. Sun (Peking University),
N. Liu (Microsoft Research Asia),
K. Xie (Peking University),
S. Yan (University of Illinois at Urbana-Champaign),
B. Zhang, Z. Chen (Microsoft Research Asia)

Classifying Web Sites         1143
C. Lindemann, L. Littig (University of Leipzig)

Comparing Apples and Oranges: Normalized PageRank for Evolving Graphs         1145
K. Berberich, S. Bedathur, G. Weikum (Max-Planck Institute for Informatics),
M. Vazirgiannis (INRIA/FUTURS)

Designing Efficient Sampling Techniques to Detect Webpage Updates         1147
Q. Tan, Z. Zhuang, P. Mitra, C. L. Giles (The Pennsylvania State University)

Determining the User Intent of Web Search Engine Queries         1149
B. J. Jansen, D. L. Booth (The Pennsylvania State University),
A. Spink (Queensland University of Technology)

EPCI: Extracting Potentially Copyright Infringement Texts from the Web         1151
T. Tashiro, T. Ueda, T. Hori, Y. Hirate, H. Yamana (Waseda University & National Institute of Informatics)

Efficient Training on Biased Minimax Probability Machine for Imbalanced Text Classification         1153
X. Peng, I. King (The Chinese University of Hong Kong)

Electoral Search Using the VerkiezingsKijker: An Experience Report         1155
V. Jijkoun, M. Marx, M. de Rijke, F. van Waveren (University of Amsterdam)

Exploration of Query Context for Information Retrieval         1157
K. Cai, C. Chen, J. Bu, P. Huang, Z. Kang (Zhejiang University)

First-order Focused Crawling         1159
Q. Xu, W. Zuo (Jilin University)

Academic Web Search Engine — Generating a Survey Automatically         1161
Y. Wang, Z. Geng, S. Huang, X. Wang, A. Zhou (Fudan University)

Generative Models for Name Disambiguation         1163
Y. Song, J. Huang, I. G. Councill, J. Li, C. L. Giles (The Pennsylvania State University)

GigaHash: Scalable Minimal Perfect Hashing for Billions of URLs         1165
K. Chellapilla, A. Mityagin, D. Charles (Microsoft Live Laboratories)

How NAGA Uncoils: Searching with Entities and Relations         1167
G. Kasneci, F. M. Suchanek, M. Ramanath, G. Weikum (Max-Planck-Institut)

Identifying Ambiguous Queries in Web Search         1169
R. Song (Shanghai Jiao Tong University & Microsoft Research Asia),
Z. Luo (Fudan University),
J.-R. Wen (Microsoft Research Asia),
Y. Yu (Shanghai Jiao Tong University),
H.-W. Hong (Microsoft Research Asia)

Web Page Classification with Heterogeneous Data Fusion         1171
Z. Xu, I. King, M. R. Lyu (The Chinese University of Hong Kong)

Learning Information Diffusion Process on the Web         1173
X. Wan, J. Yang (Peking University)

MedSearch: A Specialized Search Engine for Medical Information         1175
G. Luo, C. Tang, H. Yang (IBM T.J. Watson Research Center),
X. Wei (University of Massachusetts at Amherst)

Mining Contiguous Sequential Patterns from Web Logs         1177
J. Chen (Queens College, CUNY),
T. Cook (City University of New York)

Monitoring the Evolution of Cached Content in Google and MSN         1179
I. Anagnostopoulos (University of the Aegean)

Multi-factor Clustering for a Marketplace Search Interface         1181
N. Sundaresan, K. Ganesan, R. Grandhi (eBay Research Laboratories)

On Ranking Techniques for Desktop Search         1183
S. Cohen, C. Domshlak, N. Zwerdling (Technion—Israel Institute of Technology)

Query-Driven Indexing for Peer-to-Peer Text Retrieval         1185
G. Skobeltsyn, T. Luu (Ecole Polytechnique Fédérale de Lausanne),
I. P. Žarko (University of Zagreb),
M. Rajman, K. Aberer (Ecole Polytechnique Fédérale de Lausanne)

Query Topic Detection for Reformulation         1187
X. He (Peking University),
J. Yan (Microsoft Research Asia),
J. Ma (Peking University),
N. Liu, Z. Chen (Microsoft Research Asia)

Review Spam Detection         1189
N. Jindal, B. Liu (University of Illinois at Chicago)

SCAN: A Small-World Structured P2P Overlay for Multi-Dimensional Queries         1191
X. Sun (Graduate School of Chinese Academy of Sciences)

SRing: A Structured Non DHT P2P Overlay Supporting String Range Queries         1193
X. Sun, X. Chen (Graduate School of Chinese Academy of Sciences)

Search Engine Retrieval of Changing Information         1195
Y. S. Kim, B. H. Kang (University of Tasmania),
P. Compton (The University of New South Wales),
H. Motoda (Osaka University)

Search Engines and Their Public Interfaces: Which APIs are the Most Synchronized?         1197
F. McCown, M. L. Nelson (Old Dominion University)

Spam and Popularity Ratings for Combating Link Spam         1199
M. Dalal (LDI)

Summary Attributes and Perceived Search Quality         1201
D. E. Rose (A9.com Inc.),
D. Orr, R. G. P. Kantamneni (Yahoo! Inc.)

Tag Clouds for Summarizing Web Search Results         1203
B. Y.-L. Kuo (The University of British Columbia),
T. Hentrich (The University of British Columbia & Simon Fraser University),
B. M. Good, M. D. Wilkinson (The University of British Columbia)

Towards Efficient Dominant Relationship Exploration of the Product Items on the Web         1205
Z. Yang, L. Li, B. Wang, M. Kitsuregawa (University of Tokyo)

Understanding Web Search via a Learning Paradigm         1207
B. J. Jansen, B. Smith, D. L. Booth (The Pennsylvania State University)

Using d-gap Patterns for Index Compression         1209
J. Chen (Queens College, CUNY),
T. Cook (City University of New York)

Utility Analysis for Topically Biased PageRank         1211
C. Kohlschütter, P.-A. Chirita, W. Nejdl (L3S/University of Hannover)

Sliding Window Technique for the Web Log Analysis         1213
N. Buzikashvili (Russian Academy of Science)

A Password Stretching Method using User Specific Salts         1215
C. Lee (INITECH), H. Lee (Korea University)

Simple Authentication for the Web         1217
T. W. van der Horst, K. E. Seamons (Brigham Young University)

Topic: Semantic Web

A Management and Performance Framework for Semantic Web Servers         1219
M. Mesarina, V. K. Srinivasmurthy, N. Lyons, C. Sayers (Hewlett-Packard)

A Probabilistic Semantic Approach for Discovering Web Services         1221
J. Ma (Victoria University), J. Cao (La Trobe University), Y. Zhang (Victoria University)

Acquiring Ontological Knowledge from Query Logs         1223
S. Sekine (New York University), H. Suzuki (Microsoft Research)

Altering Document Term Vectors for Classification - Ontologies as Expectations of Co-occurrence         1225
M. Nagarajan, A. Sheth (Wright State University),
M. Aguilera, K. Keeton, A. Merchant, M. Uysal (Hewlett-Packard Laboratories)

Building and Managing Personalized Semantic Portals         1227
M. Şah, W. Hall (University of Southampton)

Deriving Knowledge from Figures for Digital Libraries         1229
X. Lu, J. Z. Wang, P. Mitra, C. L. Giles (The Pennsylvania State University)

Development of a Semantic Web Based Mobile Local Search System         1231
J.-S. Jeon, G.-J. Lee (KTF R&D Group)

Estimating the Cardinality of RDF Graph Patterns         1233
A. Maduko, K. Anyanwu (University of Georgia),
A. Sheth (Wright State University),
P. Schliekelman (University of Georgia)

Extending WebML towards Semantic Web         1235
F. M. Facca, M. Brambilla (Politecnico di Milano)

Image Annotation by Hierarchical Mapping of Features         1237
Q. Zhao, P. Mitra, C. L. Giles (The Pennsylvania State University)

Integrating Web Directories by Learning their Structures         1239
C. C. Yang, J. Lin (The Chinese University of Hong Kong)

Learning Ontologies to Improve the Quality of Automatic Web Service Matching         1241
H. Guo (Stony Brook University),
A. Ivan, R. Akkiraju, R. Goodwin (IBM T.J. Watson Research Center)

Ontology Engineering Using Volunteer Labor         1243
B. M. Good, M. D. Wilkinson (The University of British Columbia)

Semantic Personalization of Web Portal Contents         1245
C. Tziviskou, M. Brambilla (Politecnico di Milano)

The Largest Scholarly Semantic Network… Ever.         1247
J. Bollen, M. A. Rodriguez, H. Van de Sompel, L. L. Balakireva, A. Hagberg (Los Alamos National Laboratory)

Topic: Services

A Kernel based Structure Matching for Web Services Search         1249
J. Yu, S. Guo, H. Su, H. Zhang, K. Xu (Beihang University)

A Novel Collaborative Filtering-Based Framework for Personalized Services in M-Commerce         1251
Q. Li, C. Wang, G. Geng, R. Dai (Chinese Academy of Sciences)

Towards Service Pool Based Approach for Services Discovery and Subscription         1253
X. Liu, L. Zhou, G. Huang, H. Mei (Peking University)

Crawling Multiple UDDI Business Registries         1255
E. Al-Masri, Q. H. Mahmoud (University of Guelph)

Discovering the Best Web Service         1257
E. Al-Masri, Q. H. Mahmoud (University of Guelph)

Mobile Shopping Assistant: Integration of Mobile Applications and Web Services         1259
H. Wu, Y. Natchetoi (SAP Laboratories)

On Automated Composition for Web Services         1261
Z. Shen, J. Su (University of California at Santa Barbara)

Providing Session Management as Core Business Service         1263
I. Ari, J. Li, R. Ghosh, M. Dekhil (Hewlett-Packard Laboratories)

Towards Automating Regression Test Selection for Web Services         1265
M. Ruth, S. Tu (University of New Orleans)

Towards Environment Generated Media: Object-participation-type Weblog in Home Sensor Network         1267
T. Maekawa, Y. Yanagisawa, T. Okadome (NTT Communication Science Laboratories)

Topic: Social Networks

BlogScope: Spatio-temporal Analysis of the Blogosphere         1269
N. Bansal, N. Koudas (University of Toronto)

EOS: Expertise Oriented Search Using Social Networks         1271
J. Li, J. Tang, J. Zhang (Tsinghua University),
Q. Luo, Y. Liu (The Hong Kong University of Science & Technology),
M. Hong (Tsinghua University)

Exploring Social Dynamics in Online Media Sharing         1273
M. Halvey, M. T. Keane (University College Dublin)

Finding Community Structure in Mega-scale Social Networks [Extended Abstract]         1275
K. Wakita, T. Tsurumi (Tokyo Institute of Technology)

Life is Sharable: Mechanisms to Support and Sustain Blogging Life Experience         1277
Y.-M. Cheng (Tatung University),
T.-C. Chou (Academia Sinica),
W. Yu (Queen's University Belfast),
L.-C. Chen, C.-L. Yeh (Tatung University),
M.-C. Chen (Academia Sinica)

Measuring Credibility of Users in an E-learning Environment         1279
W. Wei (The Royal Institute of Technology),
J. Lee, I. King (The Chinese University of Hong Kong)

Modeling User Behavior in Recommender Systems based on Maximum Entropy         1281
T. Iwata, K. Saito, T. Yamada (NTT Communication Science Laboratories)

Parallel Crawling for Online Social Networks         1283
D. H. Chau, S. Pandit, S. Wang, C. Faloutsos (Carnegie Mellon University)

Personalized Social & Real-Time Collaborative Search         1285
M. Dalal (LDI)

Towards Extracting Flickr Tag Semantics         1287
T. Rattenbury, N. Good, M. Naaman (Yahoo! Research Berkeley)

Topic: Systems

A No-Frills Architecture for Lightweight Answer Retrieval         1289
M. Paşca (Google Inc.)

AutoPerf: An Automated Load Generator and Performance Measurement Tool for Multi-tier Software Systems         1291
S. Shirodkar, V. Apte (IIT Bombay)

Construction by Linking: The Linkbase Method         1293
J. Meinecke, F. Majer (University of Karlsruhe),
M. Gaedke (Chemnitz University of Technology)

Image Collector III: A Web Image-Gathering System with Bag-of-Keypoints         1295
K. Yanai (The University of Electro-Communications)

Mirror Site Maintenance Based on Evolution Associations of Web Directories         1297
L. Chen (L3S/University of Hannover),
S. Bhowmick (Nanyang Technological University),
W. Nejdl (L3S/University of Hannover)

On Building Graphs of Documents with Artificial Ants         1299
H. Azzag, J. Lavergne, G. Venturini (Laboratoire d'Informatique de l'Université de Tours),
C. Guinot (CE.R.I.E.S.)

Towards a Scalable Search and Query Engine for the Web         1301
A. Hogan, A. Harth, J. Umbrich, S. Decker (National University of Ireland)

Web4CE: Accessing Web-based Applications on Consumer Devices         1303
W. Dee, P. Shrubsole (Philips Research Laboratories)

Web Mashup Scripting Language         1305
M. Sabbouh, J. Higginson, S. Semy, D. Gagne (The MITRE Corporation)

Topic: User Interfaces & Accessibility

A Browser for a Public-Domain SpeechWeb         1307
R. A. Frost, X. Ma, Y. Shi (University of Windsor)

A Novel Clustering-based RSS Aggregator         1309
X. Li (Peking University),
J. Yan (Microsoft Research Asia),
Z. Deng (Peking University),
L. Ji (Microsoft Research Asia),
W. Fan (Virginia Polytechnic Institute & State University),
B. Zhang, Z. Chen (Microsoft Research Asia)

Adaptive Faceted Browser for Navigation in Open Information Spaces         1311
M. Tvarožek, M. Bieliková (Slovak University of Technology)

An Assessment of Tag Presentation Techniques         1313
M. Halvey, M. T. Keane (School of Computer Science & Informatics, UCD)

An Information State-Based Dialogue Manager for Making Voice Web Smarter         1315
M. Gatius, M. González, E. Comelles (Technical University of Catalonia)

Behavior Based Web Page Evaluation         1317
G. Velayathan, S. Yamada (National Institute of Informatics)

Generating Efficient Labels to Facilitate Web Accessibility         1319
L. Spalteholz, K. F. Li, N. Livingston (University of Victoria)

Generation, Documentation and Presentation of Mathematical Equations and Symbolic Scientific Expressions Using Pure HTML and CSS         1321
K. Alabi (State University of New York at Stony Brook)

GeoTV: Navigating Geocoded RSS to Create an IPTV Experience         1323
Y.-F. Chen, G. Di Fabbrizio, D. Gibbon, R. Jana, S. Jora, B. Renger, B. Wei (AT&T Laboratories – Research)

Summarization of Online Image Collections via Implicit Feedback         1325
S. Ahern, S. King, M. Naaman, R. Nair (Yahoo! Research Berkeley)

System for Reminding a User of Information Obtained through a Web Browsing Experience         1327
T. Morita, T. Hidaka, A. Tanaka, Y. Kato (NTT Corporation)

The ScratchPad: Sensemaking Support for the Web         1329
D. Gotz (IBM T.J. Watson Research Center)

Towards Multi-granularity Multi-facet E-Book Retrieval         1331
C. Huang (Chinese Academy of Sciences),
Y. Tian (Chinese Academy of Sciences & Peking University),
Z. Zhou (Chinese Academy of Sciences),
T. Huang (Chinese Academy of Sciences & Peking University)

Visualizing Structural Patterns in Web Collections         1333
M. S. Ali, M. P. Consens, F. Rizzolo (University of Toronto)

Topic: XML

Adaptive Record Extraction From Web Pages         1335
J. Park, D. Barbosa (University of Calgary)

Exploit Sequencing Views in Semantic Cache to Accelerate XPath Query Evaluation         1337
J. Feng, N. Ta, Y. Zhang, G. Li (Tsinghua University)

Extensible Schema Documentation with XSLT 2.0         1339
F. Michel (ETH Zürich),
E. Wilde (University of California at Berkeley)

Preserving XML Queries during Schema Evolution         1341
M. M. Moro (University of California at Riverside),
S. Malaika (IBM Silicon Valley Laboratory),
L. Lim (IBM T.J. Watson Research Center)

SPath: A Path Language for XML Schema         1343
E. Wilde (University of California at Berkeley),
F. Michel (ETH Zürich)

The Use of XML to Express a Historical Knowledge Base         1345
K. T. Nakahira, M. Matsui, Y. Mikami (Nagaoka University of Technology)

U-REST: An Unsupervised Record Extraction SysTem         1347
Y. K. Shen, D. R. Karger (Massachusetts Institute of Technology)

XML-Based Multimodal Interaction Framework for Contact Center Applications         1349
N. Anisimov, B. Galvin, H. Ristock (Genesys Telecommunications Laboratories)

XML-Based XML Schema Access         1351
E. Wilde (University of California at Berkeley), F. Michel (ETH Zürich)

AUTHOR INDEX         1353