An Entity of Type: fabio:Abstract, within Data Space: covidontheweb.inria.fr, associated with source document(s)

Attributes	Values
type	abstract
value	Word embeddings, made widely popular in 2013 with the release of word2vec, have become a mainstay of NLP engineering pipelines. Recently, with the release of BERT, word embeddings have moved from the term-based embedding space to the contextual embedding space—each term is no longer represented by a single low-dimensional vector but instead each term and its context determine the vector weights. BERT’s setup and architecture have been shown to be general enough to be applicable to many natural language tasks. Importantly for Information Retrieval (IR), in contrast to prior deep learning solutions to IR problems which required significant tuning of neural net architectures and training regimes, “vanilla BERT” has been shown to outperform existing retrieval algorithms by a wide margin, including on tasks and corpora that have long resisted retrieval effectiveness gains over traditional IR baselines (such as Robust04). In this paper, we employ the recently proposed axiomatic dataset analysis technique—that is, we create diagnostic datasets that each fulfil a retrieval heuristic (both term matching and semantic-based)—to explore what BERT is able to learn. In contrast to our expectations, we find BERT, when applied to a recently released large-scale web corpus with ad-hoc topics, to not adhere to any of the explored axioms. At the same time, BERT outperforms the traditional query likelihood retrieval model by 40%. This means that the axiomatic approach to IR (and its extension of diagnostic datasets created for retrieval heuristics) may in its current form not be applicable to large-scale corpora. Additional—different—axioms are needed.
Subject	Problem solving methods; Information retrieval; Natural language processing; Artificial neural networks; Mexican cuisine; Language modeling
part of	Diagnosing BERT with Retrieval Heuristics
is abstract of	Diagnosing BERT with Retrieval Heuristics
is hasSource of	covid:ann/target/0a7bea3363c1aa8b5b4cef7492c21eb938ee26aa covid:ann/target/0e3cee21bec8abcd3c29cf691bd336fa1622addd covid:ann/target/82f835e5a30d5bdbab5003e9ea6c809c44ca500d covid:ann/target/613f951ee4249eae2f16aeb86ad099affb480167 covid:ann/target/4a57c8a168abd562d57f992aedd34429e19124c4 covid:ann/target/04bb4013bad980576bae81deab7cc32aa07493ab covid:ann/target/3fb59932a6ef8929296cc59541fd3f122b07e224 covid:ann/target/8c8583dc4b66949fdd9ce51735f1e7f4c4a06590 covid:ann/target/fc394aafc32c3022426991b78d9088e179b7f326 covid:ann/target/fee1248c6b408c98fd5dd8b8da41c803b0803bfc covid:ann/target/dc6e19827c1facfa12d58654813b85de62ec77ed covid:ann/target/0dee3db6d5ed511b852b2460908df863e9b747b2 covid:ann/target/bbee274ba1b792d0a4dfb01684ae78e237d2d9a4 covid:ann/target/2cc98da90c2f68fcda2fdcbfe594554d24aaa837 covid:ann/target/4d0342730fd401ef822fe04b0d6fa84dc4dbaa59 covid:ann/target/c1e486453b702124099fe002aff689a5c3a5363c covid:ann/target/92446be4179b76ad862d0e0f6773ae9d90fcc8ff covid:ann/target/40c2cdce920bc3d9bd5184fdbc02cb1667beeee2 covid:ann/target/f301a4643cb6b2e3b1e4e2c6cade16ac02b6a2b6 covid:ann/target/d04d952562837313d6f940cff166f7242913a49a covid:ann/target/8ab3718945d4c712d9ab6436bf7fc1bb9fa07891 covid:ann/target/bd8964598e4b770966e3eb671b92fdbb09fd7f0f covid:ann/target/589e88d274b23d69b6fa7e7283ec2dd4678fc21a covid:ann/target/a3ef657544247025c4236fc3b01adc040f5a913d covid:ann/target/69d7b8291b35a3bd8dacc280168a05fba440e28a covid:ann/target/94c970610a28be972c414b440f75a2711a912d00 covid:ann/target/f5d0f8daa3f7c116012eaa9f8083ae277908bec6 covid:ann/target/0772fe9dc6fc20107612930e59a81d4e4f25e40f covid:ann/target/81cd43b54fe444a2ff5430e6dee051896b948c1e covid:ann/target/c81cfe58d5db67e46eb0d428114fce9a034f8f21 covid:ann/target/1cb2b43316b809e142d3e92401f843ec51976a52 covid:ann/target/90873bc21c87a69ad254e764b3813f48f685606b covid:ann/target/bacd43765adaf1dd137da2d43662e627c47c6e73 covid:ann/target/4484f262b23d9dcdd3bb41ed407c8d90bdbf2749 
covid:ann/target/2bf0bdc066ca6a7df584cb50af98ea0684d4fcf5 covid:ann/target/31982937490a3f65627da3c750414e23bee47173 covid:ann/target/d505e1a52e9eb36b75cb052fa42aae772c91d4cd covid:ann/target/ec5bf606d44933da3ae5596b0248da5f3a3573af covid:ann/target/434b0e6cb6c4bd91e2323fcdd494aa2b4742ef55 covid:ann/target/c49450475c2272ccb7be5d0e371ea7926b1f175d covid:ann/target/487472fed56d1377c81127c13063b37bbad728e4 covid:ann/target/5cbe5b9959e60f007db89993d986e4d89decea76 covid:ann/target/120a6ec3dccf3c6d40a33e6482546edcc6fda480 covid:ann/target/b3f25e08d6e76b643accf54c5d7cfde3ed420acc covid:ann/target/c294bda658e720231cef16b0e74fc1a37137dbc2 covid:ann/target/06fd9c3febdbc9818fe9abec7fc3aedd9ef6a371 covid:ann/target/f3d27e4506bf487d1d75523b0d04531f8f2313d6 covid:ann/target/685f762615a07e953b4a2fcb617d33cd3792e382 covid:ann/target/b677d9a59f5e937a04111da3f4c5760e4f996846 covid:ann/target/ba9d27dba1c6c32dfeb10d3c905cf96efc72eaa7 covid:ann/target/40b9b714be4c4960e1edaa97b8f2bd0b58ebf670 covid:ann/target/48d1b2027492468201c0ff844fb74d5215acd494 covid:ann/target/97d2ee11d1bb0a9c7258f5323d07ae8e8bf3dcd6 covid:ann/target/135feca9e02f04c7c5debc02e7a674ce662d639f covid:ann/target/5572b7f32ef6aaac5b240099a72357fec792678e covid:ann/target/07d92d808c3d02bf9badff5f303073a76cb55231 covid:ann/target/12c23f8d385d5e95e50730f5fa3f55cbf853ee23 covid:ann/target/217192620eaf8dbf2d27d392202eb8491c8ea41a covid:ann/target/17dd76e4d5b0017196752d991b6b63fd318bc760 covid:ann/target/3cac09167a6c3e7da221111fe8e64b8c060bfc28 covid:ann/target/431b038c13dac631a1a313f144c68dcbb9850daf covid:ann/target/516408965d9890cd7019d4279314843ca40537ee covid:ann/target/89ec7e678ceac745afd90c2116dae9182431be00 covid:ann/target/97e4fc15e22414ef073630a7de1fb1c9e87e78f2 covid:ann/target/af9e0c6224b3604581a2a33dfe127a95a059889c covid:ann/target/d6941e08e5ac6b02aad0344911ae9ae6f797d06d covid:ann/target/3d9ceb78f066504c787ae186a92b934e36ee620d covid:ann/target/85f83540ba5b6488a1428fabcdf0fbee919c894d 
covid:ann/target/1dbdb063f3d4edb003eb79cb04be7660ab993941 covid:ann/target/5893ac5546a858584df7785c56f8e17b48e58682 covid:ann/target/cc80a2cd19ceb82afa73598ac14fa3c8b243b19d covid:ann/target/dbe9980b33abd966dad4c49c8cd8ee14419dee35 covid:ann/target/5f1f44259ca55716e0b8553b2d03a94cdc4b3adf covid:ann/target/8f055c31cd735b578963356aa8a248c273607b15 covid:ann/target/e2ece87e957433af7ab8a5fbde70bdcd5deb00d2 covid:ann/target/27ff7f234d07a356ff65a77756ebe74723566e9a covid:ann/target/5f5983a28474158ace48ae457d7da628b3e06537 covid:ann/target/7524c366798cdad5ad1f9c00e7d3cdb65cef341b covid:ann/target/a8dfb711f3cf09c2df44efc6c5ba95383fae214d covid:ann/target/17b0117aba55bad7a3ec539003de8ca176d65973 covid:ann/target/61f9be9d4db3ffa31fa2d3aae5227851e93b34c3 covid:ann/target/9d6b51526fa9e0d5042f68042666c39bf1ffc350 covid:ann/target/c5738a0101a2ae7357634fdbdf9b75758825e93b covid:ann/target/3eb8d80e766385ac504b754f767896e759ef6a2b covid:ann/target/405e1ce77f52648b53fa7d8a9bdb6dd6b698b306 covid:ann/target/5d54f667c66d53643fc3c9076da2a9179c750f21 covid:ann/target/6470f41200326822bf2702ed68972f4fef47e095 covid:ann/target/8637ce156f9115c82dee6159d06ac9c4bdd89c96 covid:ann/target/918482a1b0c72c9dac2e622c9e06e40955b4f908 covid:ann/target/02e01a1651a6e1b55d023b4bcda0fbcd42bb4b94
