Author: Ricardo Lezama

Computational Linguistics Data Linguistics Python Tokenization

Word2Vec Mexican Spanish Model: Lyrics, News Documents

A Corpus That Contains Colloquial Lyrics & News Documents For Mexican Spanish

This experimental dataset was developed by four social science specialists and one industry expert, myself, using samples of Mexico-specific news texts and normalized song lyrics. The intent is to understand how small, phrase-level constituents will interact with larger, editorialized style […]

Boxing California Chicano Mexican

Mikey Garcia – A Manny Pacquiao Style Loss

The Sandor Martin upset is reminiscent of the great Manny “Pacman” Pacquiao's upset loss to Jeff Horn. In Fresno, California, Mikey Garcia delivered a slow and methodical performance against Sandor Martin. For his part, Martin delivered the expected southpaw style, a consistent jab, and constant lateral movement, which proved more valuable to the California judges […]

California Chicano Demographics Hispanic Mexican

Survey: US-Based Mexican Average Salary Between 47k and 67k Annually

From May 2nd to May 5th of 2020, I gathered data through an online survey, with the help of Mexico-based data analysts who recruited participants and reviewed the data. My goal was to understand how COVID-19 affected my community’s economic status and employment prospects. The descriptions here apply to 77 confirmed […]

Computational Linguistics Linguistics Tokenization Tutorial

Tokenizing Text, Finding Word Frequencies Within Corpora

One way to think about tokenization is as finding the smallest possible unit of analysis for computational linguistics tasks. As such, tokenization is one of the first steps (along with normalization) in a typical NLP pipeline or computational linguistics analysis. This process helps break down text into a manner […]
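To make the idea concrete, here is a minimal sketch of tokenizing text and counting word frequencies using only the Python standard library. It is not the code from the full post: the regular expression, the lowercasing step, and the function names `tokenize` and `word_frequencies` are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Assumption: a token is any run of letters, digits, or underscores.
# In Python 3, \w matches accented characters such as "ó" and "ñ".
TOKEN_PATTERN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text (a simple normalization) and split it into tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def word_frequencies(texts):
    """Count how often each token occurs across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts
```

Calling `word_frequencies(corpus).most_common(10)` on a list of strings would return the ten most frequent tokens with their counts. A regex split like this ignores clitics, multiword expressions, and punctuation as tokens, so a real corpus study may need a more careful tokenizer.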

Computational Linguistics Linguistics Named Entity Recognition Python Tutorial

Frequency Counts For Named Entities Using spaCy/Python Over MX Spanish News Text

In this post, we review some straightforward Python code that lets a user process text and retrieve named entities alongside their counts. The main dependencies are spaCy, a small, compact version of its Spanish language model built for Named Entity Recognition, and the plotting library Matplotlib, if you’re looking […]
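As a rough sketch of the counting step described above, the function below tallies named entities from already-processed documents. It is not the code from the full post: the name `count_entities` is hypothetical, and it assumes each document exposes an `ents` sequence whose items carry `text` and `label_` attributes, as spaCy `Doc` objects do.

```python
from collections import Counter


def count_entities(docs):
    """Count (entity text, entity label) pairs across processed documents.

    Assumption: each item in `docs` has an `ents` attribute whose elements
    provide `text` and `label_`, which is how spaCy exposes entities.
    """
    counts = Counter()
    for doc in docs:
        for ent in doc.ents:
            counts[(ent.text, ent.label_)] += 1
    return counts
```

With spaCy installed, the documents could come from `nlp.pipe(texts)` after `nlp = spacy.load("es_core_news_sm")`, assuming that small Spanish model has been downloaded. The resulting `Counter` can then be passed to Matplotlib for a bar chart of the most frequent entities.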
