CANINE is a pre-trained encoder for language understanding that operates directly on character sequences, without explicit tokenization or a vocabulary. It is paired with a pre-training strategy that uses soft inductive biases in place of hard token boundaries. To use this finer-grained input effectively and efficiently, CANINE combines downsampling, which reduces the input sequence length, with a deep transformer stack, which encodes context.
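The sketch below illustrates the length-reduction idea in plain Python: characters become Unicode code points (no tokenizer, no vocabulary), each position gets a vector, and the sequence is shortened before any deep encoder would see it. The names `to_codepoints`, `embed`, `downsample` and `attention_pairs` are hypothetical. The toy `embed` function is not CANINE's embedding scheme. Simple average pooling stands in for CANINE's learned downsampling layer, and the rate of 4 is an illustrative assumption.

```python
from typing import List

Vector = List[float]


def to_codepoints(text: str) -> List[int]:
    """Map each character to its Unicode code point: no tokenizer, no vocabulary."""
    return [ord(ch) for ch in text]


def embed(codepoints: List[int], dim: int = 4) -> List[Vector]:
    """Hypothetical toy embedding: a deterministic vector per code point.

    This is NOT CANINE's embedding; it only gives each position a vector.
    """
    return [[((cp * (i + 1)) % 97) / 97.0 for i in range(dim)] for cp in codepoints]


def downsample(vectors: List[Vector], rate: int = 4) -> List[Vector]:
    """Average non-overlapping windows of `rate` positions.

    Stand-in for a learned downsampling layer. A trailing partial window
    is averaged over the positions it actually contains.
    """
    out: List[Vector] = []
    for start in range(0, len(vectors), rate):
        window = vectors[start:start + rate]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def attention_pairs(length: int) -> int:
    """Number of query-key pairs full self-attention scores at a given length."""
    return length * length


if __name__ == "__main__":
    text = "tokenization-free"
    chars = embed(to_codepoints(text))
    down = downsample(chars, rate=4)
    print(len(chars), attention_pairs(len(chars)))  # character-level length and cost
    print(len(down), attention_pairs(len(down)))    # downsampled length and cost
```

Because full self-attention scores every pair of positions, shortening the sequence by a factor of r cuts that work by roughly r². Here the 17-character input shrinks to 5 positions, so the encoder would score 25 pairs instead of 289.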
Source: CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
Task | Papers | Share |
---|---|---|
Phishing Website Detection | 1 | 25.00% |
Document Classification | 1 | 25.00% |
Specificity | 1 | 25.00% |
Malware Classification | 1 | 25.00% |