ANI
21 Feb 2026, 01:01 GMT+10
Guwahati (Assam) [India], February 20 (ANI): Researchers at the Indian Institute of Technology Guwahati have developed a multilingual and scalable method to identify and correct Surface Name Errors (SNEs) in Wikipedia, helping improve information reliability for both human users and artificial intelligence systems.
Wikipedia is a free, multilingual online encyclopaedia created and maintained by a global community of volunteers through open collaboration.
In a press statement, IIT Guwahati stated that a surface name refers to the text used in Wikipedia articles to mention or link to another entity.
'A Surface Name Error (SNE) occurs when this text is incorrect. For example, using a misspelt word like "Parise" to link to the page for Paris. A study conducted by the IIT Guwahati research team found that about 3% to 6% of all entity mentions in Wikipedia contain Surface Name Errors. While these errors may appear minor, they have significant implications,' said the press statement.
For human users, an incorrect surface name can reduce the perceived credibility and reliability of the information provided.
Similarly, many machine learning and deep learning models use Wikipedia as a core dataset. Such errors in surface names can negatively impact AI tasks and model performance.
To address this challenge, Amit Awekar, Associate Professor in the Department of Computer Science and Engineering, along with then M.Tech student Anuj Khare (batch of 2022), built a method that relies on mathematical frequency patterns rather than language-specific rules, making it adaptable across languages.
The first step involved scanning Wikipedia and converting every link into a quadruplet containing the page where the link appears, the page it points to, the surface name used in the link, and the surrounding textual context.
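As a rough illustration of this first step, the sketch below pulls quadruplets out of a page's raw wikitext. It is not the researchers' code: the `LinkRecord` type, the `extract_links` function, the regular expression and the 30-character context window are all assumptions made for this example, and real wikitext has many link forms that this pattern does not handle.

```python
import re
from typing import List, NamedTuple


class LinkRecord(NamedTuple):
    """One link as a quadruplet (hypothetical field names)."""
    source_page: str   # page where the link appears
    target_page: str   # page the link points to
    surface_name: str  # text shown for the link
    context: str       # text surrounding the link


# Matches [[Target]] and [[Target|surface name]]; other link forms are ignored.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]|]+))?\]\]")


def extract_links(source_page: str, wikitext: str, window: int = 30) -> List[LinkRecord]:
    """Return one LinkRecord per wikilink found in the wikitext."""
    records = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        # With no pipe, the displayed text is the target title itself.
        surface = (match.group(2) or target).strip()
        start = max(0, match.start() - window)
        end = min(len(wikitext), match.end() + window)
        records.append(LinkRecord(source_page, target, surface, wikitext[start:end]))
    return records


if __name__ == "__main__":
    text = "The capital is [[Paris|Parise]] and [[Lyon]] is nearby."
    for record in extract_links("France", text):
        print(record.target_page, "<-", record.surface_name)
```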
In the next step, the method reviewed each surface name and considered it correct only if it met two conditions: it appeared at least 10 times, and it accounted for at least 5% of all links pointing to a specific page.
Surface names that did not meet these criteria were flagged as potential errors.
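The frequency test described above can be sketched in a few lines. The thresholds of 10 occurrences and 5% come from the press statement; everything else, including the function name `flag_surface_names`, the constant names and the input format of (target page, surface name) pairs, is an assumption made for illustration.

```python
from collections import Counter, defaultdict

MIN_COUNT = 10    # surface name must appear at least this many times
MIN_SHARE = 0.05  # and account for at least this share of links to the page


def flag_surface_names(links):
    """Return the set of (target_page, surface_name) pairs flagged as potential errors.

    `links` is an iterable of (target_page, surface_name) pairs, one per link.
    """
    counts = defaultdict(Counter)
    for target, surface in links:
        counts[target][surface] += 1

    flagged = set()
    for target, surfaces in counts.items():
        total = sum(surfaces.values())
        for surface, n in surfaces.items():
            # A surface name is accepted only if it passes both tests.
            if n < MIN_COUNT or n / total < MIN_SHARE:
                flagged.add((target, surface))
    return flagged


if __name__ == "__main__":
    sample = (
        [("Paris", "Paris")] * 95
        + [("Paris", "City of Light")] * 12
        + [("Paris", "Parise")] * 3
    )
    print(flag_surface_names(sample))
```

In this made-up sample, "Parise" is flagged because it appears only three times, while "City of Light" passes both tests with 12 of 110 links.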
In the final step, it categorised the detected errors into 'typing mistakes', such as 'Gawahati' instead of 'Guwahati', or 'entity span errors', where extra or incorrect words are mistakenly included in the link.
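The article does not say how the two categories are told apart. One plausible heuristic, shown here purely as an assumption, compares a flagged surface name against the accepted names for the same page: if an accepted name sits inside the flagged text as whole words, treat it as an entity span error; if the two strings are nearly identical, treat it as a typing mistake. The `categorise` function and the 0.8 similarity cutoff are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def categorise(flagged: str, accepted: List[str], typo_ratio: float = 0.8) -> str:
    """Label a flagged surface name as 'entity_span', 'typo' or 'other'.

    Heuristic only: the cutoff and the matching rules are assumptions.
    """
    flagged_l = flagged.lower()
    for name in accepted:
        name_l = name.lower()
        # Accepted name embedded in a longer link text, on word boundaries.
        if name_l != flagged_l and f" {name_l} " in f" {flagged_l} ":
            return "entity_span"
    for name in accepted:
        # High character-level similarity suggests a misspelling.
        if SequenceMatcher(None, flagged_l, name.lower()).ratio() >= typo_ratio:
            return "typo"
    return "other"


if __name__ == "__main__":
    print(categorise("Gawahati", ["Guwahati"]))
    print(categorise("city of Guwahati", ["Guwahati"]))
```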
The researchers tested the method on eight languages (English, Sanskrit, German, Italian, Urdu, Hindi, Marathi, and Gujarati) and reported accurate results across all of them.
Speaking about the real-world application of the developed method, Awekar said, 'This work shows us that we should not be trusting the data from the web blindly, both for human use and training AI models. Good data is the beginning of any good AI model and downstream application.'
To validate the developed method, the research team compared snapshots of English Wikipedia from 2018 and 2022 and found that about 30% of the errors predicted by the method had been corrected on Wikipedia over four years, confirming its accuracy.
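The snapshot comparison amounts to simple set arithmetic, sketched below under stated assumptions. Each link is represented as a (source page, target page, surface name) triple, and a flagged link counts as corrected if that exact triple no longer exists in the later snapshot. The function name `correction_rate` and the sample data are hypothetical, and this definition also counts links removed for unrelated reasons.

```python
def correction_rate(flagged_old, links_new):
    """Share of links flagged in the old snapshot that are absent from the new one.

    Both arguments are collections of (source_page, target_page, surface_name)
    triples. Returns 0.0 when nothing was flagged.
    """
    flagged_old = set(flagged_old)
    if not flagged_old:
        return 0.0
    corrected = flagged_old - set(links_new)
    return len(corrected) / len(flagged_old)


if __name__ == "__main__":
    # Made-up data: one of two flagged links was fixed in the later snapshot.
    flagged_2018 = {
        ("France", "Paris", "Parise"),
        ("Assam", "Guwahati", "Gawahati"),
    }
    links_2022 = {
        ("France", "Paris", "Paris"),
        ("Assam", "Guwahati", "Gawahati"),
    }
    print(correction_rate(flagged_2018, links_2022))
```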
Wikipedia is maintained by volunteers worldwide, and the developed method can help editors identify hidden typos and linking errors that might otherwise remain unnoticed for years.
As further validation, the Wikipedia community has accepted more than 99% of the manual corrections suggested by the researchers.
By combining scalable data processing with practical validation through the Wikipedia community, the IIT Guwahati team has demonstrated an effective approach to strengthening digital knowledge systems. (ANI)
