  1. What does Keras Tokenizer method exactly do? - Stack Overflow

    On occasion, circumstances require us to do the following: from keras.preprocessing.text import Tokenizer tokenizer = Tokenizer(num_words=my_max) Then, invariably, we chant this mantra: …
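
    A minimal sketch of the workflow the question is about, assuming the classic keras.preprocessing.text API (the exact import path varies across Keras/TensorFlow versions); num_words and the sample texts are arbitrary:

    ```python
    # Minimal Keras Tokenizer workflow (sketch): build a vocabulary, then map
    # texts to integer sequences. num_words caps the vocabulary to the most
    # frequent words when converting.
    from keras.preprocessing.text import Tokenizer

    texts = ["the cat sat on the mat", "the dog ate my homework"]

    tokenizer = Tokenizer(num_words=1000)
    tokenizer.fit_on_texts(texts)                     # builds the word -> index mapping
    sequences = tokenizer.texts_to_sequences(texts)   # maps each text to a list of indices

    print(tokenizer.word_index)   # e.g. {'the': 1, 'cat': 2, ...}
    print(sequences)
    ```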

  2. Looking for a clear definition of what a "tokenizer", "parser" and ...

    A tokenizer breaks a stream of text into tokens, usually by looking for whitespace (tabs, spaces, new lines). A lexer is basically a tokenizer, but it usually attaches extra context to the tokens -- …
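
    An illustrative Python sketch of that distinction: a plain whitespace tokenizer versus a lexer that also attaches a category to each token (the categories here are invented for the example):

    ```python
    import re

    def tokenize(text):
        # Tokenizer: just break the stream on whitespace.
        return text.split()

    def lex(text):
        # Lexer: same split, but each token also carries extra context (a kind).
        tokens = []
        for tok in text.split():
            if re.fullmatch(r"\d+", tok):
                kind = "NUMBER"
            elif re.fullmatch(r"[A-Za-z_]\w*", tok):
                kind = "IDENTIFIER"
            else:
                kind = "SYMBOL"
            tokens.append((kind, tok))
        return tokens

    print(tokenize("x = 42"))  # ['x', '=', '42']
    print(lex("x = 42"))       # [('IDENTIFIER', 'x'), ('SYMBOL', '='), ('NUMBER', '42')]
    ```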

  3. python - AutoTokenizer.from_pretrained fails to load locally saved ...

    from transformers import AutoTokenizer, AutoConfig tokenizer = AutoTokenizer.from_pretrained('distilroberta-base') config = …
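
    A sketch of the round trip the question describes: save a pretrained tokenizer to a local directory, then reload it from that path (the directory name ./my_tokenizer is just an example):

    ```python
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
    tokenizer.save_pretrained("./my_tokenizer")                  # writes vocab/config files locally

    reloaded = AutoTokenizer.from_pretrained("./my_tokenizer")   # load from the local path
    print(reloaded("hello world")["input_ids"])
    ```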

  4. What's the difference between tokenizer.encode and …

    The main difference stems from the additional information that encode_plus provides. If you read the documentation on the respective functions, then there is a slight difference for …
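
    A rough illustration of that difference with a standard Hugging Face tokenizer: encode returns only the token ids, while encode_plus returns a dict with additional fields such as the attention mask:

    ```python
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    ids = tokenizer.encode("hello world")       # list of token ids, incl. [CLS]/[SEP]
    enc = tokenizer.encode_plus("hello world")  # dict: input_ids, token_type_ids, attention_mask

    print(ids)
    print(enc.keys())
    ```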

  5. Saving Tokenizer object to file for scoring - Stack Overflow

    tokenizer_from_json doesn't seem to be available in Keras anymore, or rather it's not listed in their docs or available in the conda package. @Max do you still do it this way?
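
    One common workaround when tokenizer_from_json is not exposed in a given Keras version is to pickle the fitted tokenizer and reload it at scoring time; a minimal sketch (file name and training texts are placeholders):

    ```python
    import pickle
    from keras.preprocessing.text import Tokenizer

    tokenizer = Tokenizer(num_words=1000)
    tokenizer.fit_on_texts(["some training text", "more training text"])

    with open("tokenizer.pkl", "wb") as f:
        pickle.dump(tokenizer, f)      # persist the fitted tokenizer

    with open("tokenizer.pkl", "rb") as f:
        tokenizer = pickle.load(f)     # reload it for scoring

    print(tokenizer.texts_to_sequences(["more text"]))
    ```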

  6. How to do Tokenizer Batch processing? - HuggingFace

    Jun 7, 2023 · In the Tokenizer documentation from huggingface, the call function accepts List[List[str]] and says: text (str, List[str], List[List[str]], optional) — The sequence or batch of …
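
    A sketch of batch processing through the tokenizer's __call__ interface: a List[str] is treated as a batch of sentences, and a List[List[str]] is typically pre-tokenized input passed with is_split_into_words=True:

    ```python
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    batch = ["first sentence", "a second, longer sentence"]
    enc = tokenizer(batch, padding=True, truncation=True)   # one call handles the whole batch
    print(enc["input_ids"])

    pretokenized = [["first", "sentence"], ["a", "second", "sentence"]]
    enc2 = tokenizer(pretokenized, is_split_into_words=True, padding=True)
    print(enc2["input_ids"])
    ```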

  7. How to download punkt tokenizer in nltk? - Stack Overflow

    How to download punkt tokenizer in nltk? Asked 2 years, 2 months ago Modified 7 months ago Viewed 24k times
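
    The usual fix, assuming a standard NLTK install (newer NLTK releases may also ask for the punkt_tab resource):

    ```python
    import nltk

    nltk.download("punkt")   # fetches the Punkt sentence tokenizer data

    from nltk.tokenize import word_tokenize
    print(word_tokenize("Hello there, world!"))   # works once punkt is available
    ```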

  8. How does one set the pad token correctly (not to eos) during fine ...

    Jul 7, 2023 · tl;dr: what I really want to know is the official way to set the pad token for fine-tuning when it wasn't set during the original training, so that the model still learns to predict EOS. colab: …
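
    One widely used pattern (not necessarily the "official" answer the question is after): add a dedicated pad token instead of reusing EOS, then resize the model's embedding matrix; gpt2 is only an illustrative model here:

    ```python
    from transformers import AutoTokenizer, AutoModelForCausalLM

    model_name = "gpt2"   # example of a model shipped without a pad token
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    if tokenizer.pad_token is None:
        tokenizer.add_special_tokens({"pad_token": "[PAD]"})   # distinct pad token, not EOS
        model.resize_token_embeddings(len(tokenizer))          # grow embeddings for the new token

    print(tokenizer.pad_token, tokenizer.pad_token_id)
    ```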

  9. pytorch - Tokens to Words mapping in the tokenizer decode step ...

    Jun 11, 2020 · Is there a way to know the mapping from the tokens back to the original words in the tokenizer.decode() function? For example: from transformers.tokenization_roberta import …
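
    With newer transformers releases and a "fast" tokenizer, the encoding keeps a token-to-word mapping, which is one way to recover what the question asks for; a sketch (the split()-based lookup only lines up because the example text has no punctuation):

    ```python
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("roberta-base", use_fast=True)

    text = "Tokenization maps words to subword tokens"
    enc = tokenizer(text)

    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"])
    word_ids = enc.word_ids()   # word index per token, None for special tokens

    for tok, wid in zip(tokens, word_ids):
        print(tok, "->", None if wid is None else text.split()[wid])
    ```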

  10. java - Why is StringTokenizer deprecated? - Stack Overflow

    From the javadoc for StringTokenizer: StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that …