About: The online recruitment industry holds a large amount of user-generated content in the form of job postings, resumes, etc. This content finds its way into knowledge bases (KBs), causing duplicate and non-standard representations of entities (such as company names, institute names, designations, and skills). These non-standard entity representations impact various applications such as search, recommendations, and information retrieval. Therefore, KB canonicalization, i.e., mapping multiple references to the same entity into unique clusters, is imperative for online recruitment platforms. Existing research proposes various approaches that use enriched semantic context or external context (from sources like Freebase) to perform KB canonicalization. In fields where such external sources of context do not exist, the problem remains challenging. To address these challenges, we propose a novel deep Siamese architecture with character-based attention and word embeddings that (a) estimates pairwise similarity between all entity mentions, and (b) then uses these similarity scores to create canonical clusters, each representing a unique entity in the KB. Our experiments on a recruitment-domain dataset comprising 62,288 unique entities of various types, such as companies, institutes, skills, and designations, demonstrate the effectiveness of our approach. We also provide insights into different network architectures, each of which encapsulates a different set of variations while performing canonicalization.
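The two-step pipeline described in the abstract (pairwise similarity, then clustering on the scores) can be illustrated with a minimal sketch. This is not the authors' model: the learned Siamese similarity is replaced here by a simple character-trigram Jaccard overlap, and the clustering step is assumed to be single-link grouping over a fixed threshold, which the abstract does not specify. The function names and the threshold value are hypothetical.

```python
from itertools import combinations


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded mention."""
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))}


def similarity(a, b):
    """Jaccard overlap of character trigrams.

    Assumption: a stand-in for the learned Siamese similarity score.
    """
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)


def canonical_clusters(mentions, threshold=0.5):
    """Group mentions linked by a pairwise similarity at or above the threshold.

    Assumption: single-link clustering via union-find; the threshold is illustrative.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step (a): score every pair; step (b): merge pairs that are similar enough.
    for i, j in combinations(range(len(mentions)), 2):
        if similarity(mentions[i], mentions[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, mention in enumerate(mentions):
        clusters.setdefault(find(i), []).append(mention)
    return list(clusters.values())


if __name__ == "__main__":
    print(canonical_clusters(["Google Inc", "Google Inc.", "Infosys"]))
```

Scoring all pairs is quadratic in the number of mentions, so a dataset on the scale mentioned in the abstract would likely need some form of candidate blocking before this step; that is outside the scope of this sketch.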

An Entity of Type : fabio:Abstract, within Data Space : covidontheweb.inria.fr associated with source document(s)

Attributes and Values
type
Subject
  • Concepts in logic
  • Information retrieval
  • Natural language processing
  • User interfaces
  • Knowledge bases
  • Computing terminology
  • Language modeling
  • Text user interface
part of
is abstract of
is hasSource of