

Patent Research: The Importance of Normalizing Patent Data


In a previous article, we looked at the importance of normalizing publication, application and priority numbers and dates in bibliographic patent data. Now let’s look at the issues database vendors and, hence, users face when working with the various names, classes and other data elements found on the first page of the patent document.

Most patent authorities publish records in a single language only.  The main exceptions to this rule are publications issued by the WIPO and the EPO, which may publish the titles and abstracts in Arabic, Chinese, English, French, German, Japanese, Korean, Portuguese, Russian or Spanish (WIPO), or English, French and German (EPO).  This presents searchers with the challenge of finding documents that are not published in their preferred language.  Databases in which the title and abstract fields are translated, either by humans or by high-quality machine translation, can help.  As an alternative, users can seek out databases that let them search the text of equivalent family members published in their preferred language.

When considering bibliographic data, most people will think of Assignees/Applicants and Inventors, but this type of content often includes the name of the Law Firm prosecuting the application, sometimes the names of the Attorney(s) involved and occasionally the names of the patent office Examiners.  Moreover, a patent application can be bought and sold (re-assigned) many times during its lifetime and, of course, might have been submitted by more than one assignee (joint assignees) and often by multiple inventors.  In order to improve search, retrieval and analysis, it is preferable to normalize as many of these names as possible.

Normalizing Assignee/Applicant names means applying a series of rules to the names, such as correcting typographical errors, removing all punctuation and ignoring legal entity designations (PLC, INC, BV, SA etc.), and then grouping organizations together so that, for example, records for the Dutch technology company Philips can be found regardless of whether they were published under Philips Electronics, Philips Gloeilampenfabrieken N.V. or Koninklijke Philips.
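The rules described above can be sketched in a few lines of Python. This is only a minimal illustration, not any vendor's actual method: the suffix list and the grouping table are hypothetical and far shorter than a production list would be, and the typo-correction step is left out because it usually needs curated data or fuzzy matching.

```python
import re

# Hypothetical, deliberately short list of legal-entity designations to ignore.
LEGAL_SUFFIXES = {"PLC", "INC", "BV", "NV", "SA", "LTD", "GMBH", "CORP"}

# Hypothetical grouping table: cleaned name variant -> normalized group name.
GROUPS = {
    "PHILIPS ELECTRONICS": "PHILIPS",
    "PHILIPS GLOEILAMPENFABRIEKEN": "PHILIPS",
    "KONINKLIJKE PHILIPS": "PHILIPS",
}


def clean_name(raw):
    """Upper-case the name, strip punctuation and drop legal-entity tokens."""
    # Drop dots first so that "N.V." collapses to the single token "NV".
    name = raw.upper().replace(".", "")
    # Replace any remaining punctuation with spaces.
    name = re.sub(r"[^\w\s]", " ", name)
    tokens = [t for t in name.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def normalize_assignee(raw):
    """Return the group name if the cleaned name is known, else the cleaned name."""
    cleaned = clean_name(raw)
    return GROUPS.get(cleaned, cleaned)


for variant in (
    "Philips Electronics, Inc.",
    "Philips Gloeilampenfabrieken N.V.",
    "Koninklijke Philips",
):
    print(variant, "->", normalize_assignee(variant))
```

With this sketch, all three Philips variants resolve to the same group name, while a name that is not in the table is simply returned in its cleaned form.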

Numerous patent classification systems have been developed over the years, but today the majority of patent publications fall into either the International Patent Classification (IPC) or the Cooperative Patent Classification (CPC) schemes.  Previous systems like the USPTO’s Patent Classification (USPC) and the EPO’s European Classification (ECLA) have been merged into the CPC, and other national schemes are expected to follow suit.  Whereas the IPC follows a strictly hierarchical system, which makes it simpler to find related technologies, the CPC is bigger and more complex.  It is important that users are able to search classes at the highest levels, Section and Class (e.g. A01), as well as more granularly, including Subclass, Group and Subgroup (e.g. A01B33/00). The ability to “truncate” IPC and CPC codes will allow users to retrieve all records in a specific or broad technological category.
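As a rough illustration of truncation, the sketch below treats a trailing `*` in a query as "match every code that starts with this prefix". The record identifiers, the sample data and the function names are hypothetical, and real databases differ in their truncation syntax. Note also that this is plain string-prefix matching: it does not follow the dot-indentation hierarchy that relates subgroups to one another within a main group.

```python
def normalize_code(code):
    """Remove spaces and upper-case a classification code, e.g. 'a01b 33/00'."""
    return code.replace(" ", "").upper()


def matches(code, query):
    """True if `code` equals `query`, or starts with it when `query` ends in '*'."""
    c = normalize_code(code)
    q = normalize_code(query)
    if q.endswith("*"):
        return c.startswith(q[:-1])
    return c == q


def search(records, query):
    """Return the ids of records with at least one class matching the query."""
    return [
        rec_id
        for rec_id, codes in records.items()
        if any(matches(code, query) for code in codes)
    ]


# Hypothetical records: placeholder ids mapped to illustrative IPC/CPC codes.
records = {
    "DOC-1": ["A01B33/00"],
    "DOC-2": ["A01B 33/02", "B60K1/00"],
    "DOC-3": ["A01C1/00"],
    "DOC-4": ["B60K1/00"],
}

print(search(records, "A01*"))       # broad: everything in class A01
print(search(records, "A01B33/*"))   # narrow: main group A01B33 only
print(search(records, "B60K1/00"))   # exact code, no truncation
```

Searching on `A01*` retrieves the broad technological category, while `A01B33/*` narrows the result to a single main group, which mirrors the broad-versus-specific choice described above.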

These are just some of the issues database vendors and, hence, users face when working with the various names, classes and other data elements found on the first page of the patent document.







