Category: Tokenization
-
Word2Vec Mexican Spanish Model: Lyrics, News Documents
A Corpus of Colloquial Lyrics & News Documents for Mexican Spanish. This experimental dataset was developed by four social science specialists and one industry expert (myself), with samples drawn from Mexico-specific news texts and normalized song lyrics. The intent is to understand how small, phrase-level constituents interact with larger, editorialized-style […]
-
Tokenizing Text, Finding Word Frequencies Within Corpora
One way to think about tokenization is as finding the smallest useful unit of analysis for computational linguistics tasks. As such, tokenization is among the first steps (along with normalization) in a typical NLP pipeline or computational linguistics analysis. This process helps break text down in a manner […]
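A minimal sketch of the idea in the excerpt above: tokenize a small text, normalize it to lowercase, and count word frequencies. This is an illustrative example only (the sample sentence and the regex-based tokenizer are my own assumptions, not the post's code); real pipelines often rely on library tokenizers such as NLTK's or spaCy's, which handle punctuation and clitics more carefully.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens.

    A simple regex tokenizer for illustration: \\w+ matches runs of
    letters/digits, keeping accented Spanish characters (e.g. "canción") intact.
    """
    return re.findall(r"\w+", text.lower())

# Hypothetical sample text, not from the actual corpus.
corpus = "La canción habla de la ciudad. La ciudad nunca duerme."
tokens = tokenize(corpus)

# Word frequencies across the (tiny) corpus.
freqs = Counter(tokens)
print(freqs.most_common(2))  # [('la', 3), ('ciudad', 2)]
```

Counting over normalized tokens like this is the usual starting point before anything heavier (n-grams, embeddings such as Word2Vec) is layered on top.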