Category: Linguistics

AI Computational Linguistics

If You’re A Linguist, Don’t Sweat The Formal Math (Too Much) in Natural Language Processing

In Natural Language Processing (NLP), a foundational understanding of linear algebra does help (somewhat) when you are trying to earn money with NLP. Fundamentally, the concern for all of us Philosophers, Linguists and Psychologists is to earn a decent living without being penalized in the workforce because our training lacks an immediate payoff. The overarching […]

Ricardo Lezama 
AI Computational Linguistics Data NLP

Hallucination: The Term for When Artificial Intelligence Models Get It Wrong

Although it is impressive that a chatbot can respond to an input, academics, scientists and experts in applied artificial intelligence have not settled on a position regarding AI in psychological terms. Cognitive science was, in fact, the inspiration for the so-called ‘neural networks’ that define the architecture of some of […]

Ricardo Lezama 
Linguistics Mexico Technology

Italy Blocks OpenAI Chatbot for Violating Consumer Data Protection Law

Italy has taken a decisive step to protect its citizens from the potential misuse of artificial intelligence technology by blocking ChatGPT, the chatbot developed by the American technology company OpenAI. On Friday, the Italian Data Protection Authority (GPDP) suspended ChatGPT with immediate effect, citing a data breach on March 20 concerning its users and […]

Ricardo Lezama 
Computational Linguistics Data Linguistics Python Tokenization

Word2Vec Mexican Spanish Model: Lyrics, News Documents

A Corpus That Contains Colloquial Lyrics & News Documents For Mexican Spanish

This experimental dataset was developed by four social science specialists and one industry expert (myself), using samples of Mexico-specific news texts and normalized song lyrics. The intent is to understand how small, phrase-level constituents will interact with larger, editorialized-style […]

Ricardo Lezama 
Computational Linguistics Linguistics Tokenization Tutorial

Tokenizing Text, Finding Word Frequencies Within Corpora

One way to think about tokenization is as finding the smallest possible unit of analysis for computational linguistics tasks. As such, tokenization is among the first steps (along with normalization) in the average NLP pipeline or computational linguistics analysis. This process helps break down text into a manner […]

Ricardo Lezama 
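
As a minimal sketch of the tokenize-then-count idea described above, assuming plain Python, lowercasing as the only normalization step, and a simple regular-expression tokenizer (not the tokenizer used in the original post), word frequencies over a small corpus can be computed with `collections.Counter`:

```python
import re
from collections import Counter
from typing import Iterable, List

# In Python 3, \w is Unicode-aware, so accented letters (é, ñ) stay inside tokens
# while punctuation such as "¿", "," and "." is discarded.
TOKEN_RE = re.compile(r"\w+")


def normalize(text: str) -> str:
    """Minimal normalization: lowercase everything so 'El' and 'el' count as one type."""
    return text.lower()


def tokenize(text: str) -> List[str]:
    """Split normalized text into word tokens."""
    return TOKEN_RE.findall(normalize(text))


def word_frequencies(docs: Iterable[str]) -> Counter:
    """Count how often each token type occurs across all documents in the corpus."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


corpus = ["El perro come.", "El gato y el perro duermen."]
print(tokenize("¿Qué pasó, México?"))  # ['qué', 'pasó', 'méxico']
print(word_frequencies(corpus).most_common(2))  # [('el', 3), ('perro', 2)]
```

A pattern like `\w+` is a deliberate simplification: it splits contractions such as "don't" into two tokens and treats digits and underscores as word characters, which is why larger pipelines usually rely on a library tokenizer instead.
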
Computational Linguistics Linguistics Named Entity Recognition Python Tutorial

Frequency Counts For Named Entities Using Spacy/Python Over MX Spanish News Text

In this post, we review some straightforward Python code that lets a user process text and retrieve named entities alongside their counts. The main dependencies are spaCy, with a small, compact version of its Spanish language model built for Named Entity Recognition, and the plotting library Matplotlib, if you’re looking […]

Ricardo Lezama
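
A minimal sketch of the entity-counting step, assuming spaCy v3 and its small Spanish pipeline `es_core_news_sm` (the excerpt does not name the exact model, and the function name `count_entities` is hypothetical). The function accepts any loaded spaCy pipeline, so the statistical model can be swapped for another one without changing the counting logic:

```python
from collections import Counter
from typing import Iterable

import spacy
from spacy.language import Language


def count_entities(nlp: Language, texts: Iterable[str]) -> Counter:
    """Count (entity text, entity label) pairs across a collection of texts."""
    counts: Counter = Counter()
    # nlp.pipe streams texts through the pipeline in batches, which is
    # usually faster than calling nlp(text) once per document.
    for doc in nlp.pipe(texts):
        for ent in doc.ents:
            # Keying on both text and label keeps, e.g., "México" tagged as a
            # location separate from the same string tagged as an organization.
            counts[(ent.text, ent.label_)] += 1
    return counts
```

Usage, once the model has been installed with `python -m spacy download es_core_news_sm`: `count_entities(spacy.load("es_core_news_sm"), news_texts).most_common(10)`, where `news_texts` is a hypothetical list of article strings. The resulting (entity, label, count) rows are easy to hand to Matplotlib for a bar chart.
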