Category: Tokenization
Word2Vec Mexican Spanish Model: Lyrics, News Documents
A Corpus Containing Colloquial Lyrics & News Documents for Mexican Spanish

This experimental dataset was developed by four social science specialists and one industry expert (myself), using samples drawn from Mexico-specific news texts and normalized song lyrics. The intent is to understand how small, phrase-level constituents interact with larger, editorialized-style […]
Tokenizing Text, Finding Word Frequencies Within Corpora
One way to think about tokenization is as finding the smallest possible unit of analysis for computational linguistics tasks. We can therefore treat tokenization as one of the first steps (along with normalization) in a typical NLP pipeline or computational linguistics analysis. This process breaks text down in a manner […]
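As a minimal sketch of the idea described above, assuming a simple regex-based tokenizer rather than the corpus-specific pipeline discussed in the post, tokenizing text and counting word frequencies across a small corpus might look like:

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase, then split on non-word characters — a basic
    # normalization + tokenization step (illustrative only).
    return re.findall(r"\w+", text.lower())

# A toy stand-in corpus; the actual dataset uses news texts and lyrics.
corpus = [
    "Tokenization finds the smallest unit of analysis.",
    "Tokenization and normalization are early NLP pipeline steps.",
]

# Aggregate token counts over every document in the corpus.
freqs = Counter(token for doc in corpus for token in tokenize(doc))
print(freqs.most_common(3))
```

From these frequency counts one can move on to the usual corpus statistics (type/token ratios, stopword filtering, and so on).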
Recent Posts
- Artificial Intelligence Could Help Engage Homeless Populations, But Organizations Must Be Ready
- Hallucination: the Term for When Artificial Intelligence Models Get Things Wrong
- Barbie Debuts in Mexico, Colombia, Uruguay, and the United States This Friday
- Air Quality in Chicago Declines to “Very Unhealthy” Levels as Alert Issued
- Police Officers in Colombia Arrested for Trying to Swallow Extortion Evidence