|
The Tokenization and Named Entity recognition of the Text is of primary importance to feeding the data to the BERT Input layer of the Model.
Moreover, the conversion of the text to numbers must be done in a way where the tokenization of a specific named entity in the text should be established with a specific id.
For the Named Entity Recognition, the test sequences were tokenized and each token was labelled with Five (5) types of classes (O, B-Comp, I-Comp, B-Task, I-Task).
Table below shows the distribution of each class and also the definition of each class to which a specific token is mapped.
|