Ricardo Lezama
https://ricardolezama.com
Computational Linguistics, Chicano/Mexican Politics & FAQ Linguistics In An Enterprise

Mexican Sports Update
https://ricardolezama.com/demographics/mexican/chicano/mexican-sports-update/
Thu, 30 Dec 2021 06:57:59 +0000

Chicano boxers are always making a splash in the boxing scene. For more updates, check out the constantly updated and curated tweet feed on this page. Mentions of the Mexican liga (league) and Mexico-related opinions (good, bad and offensive) are also included in this feed. If anything crosses the line, one can report it to Twitter, but sports tend to inflame passions. So my recommendation is: do not be offended easily.

Chicano Culture Update
https://ricardolezama.com/demographics/mexican/chicano/chicano-culture-update/
Mon, 27 Dec 2021 23:23:11 +0000

Mentions of cultural events in the news and on people's minds within the Twitter space.

Updates consist of the 20 or so most recent available tweets deemed newsworthy for the Chicano community. Expect lots of discussion about skin tones, reactions against Hollywood and other topics.

Explicit Content Related To Mexicans – Please Review
https://ricardolezama.com/demographics/mexican/chicano/explicit-content-related-to-mexicans-please-review/
Mon, 27 Dec 2021 02:36:34 +0000

For cultural news, please see here: Chicano Culture.

On this page, we review possibly objectionable content related to Mexicans. We have stored these tweets in a database. Many people make statements on Twitter "with a pinch of salt". However, therein lies a powerful question: who gets to define what is simply a cheeky reference and what crosses the line into cementing or fomenting detrimental worldviews? A few simple questions will help readers of the tweets figure out what to report to Twitter:

  • Can this person make this statement in front of the demographic mentioned?
  • Is this person a member of the community referenced?
  • Can this person make this statement at work without an HR consult afterwards or some other kind of censure?

If the answer to any of the above questions is "no", then the tweet is likely objectionable and worth reporting to Twitter.

The social media representation of communities is important. While freedom of speech is also important, and we should not seek to prevent statements from being uttered or tweeted, we can check their propagation: a racially biased or offensive view must always be countered by a concerted rebuttal.

Chicano Chatter On Twitter
https://ricardolezama.com/demographics/mexican/chicano/chicano-updates-from-around-twitter/
Sun, 26 Dec 2021 21:59:39 +0000

Check out the latest chatter from people using the word "Chicano" on Twitter.

In an effort to highlight more content, we developed a few database queries to routinely retrieve uncontroversial tweets. Some of these contain frivolous references; others, insightful comments. Unfortunately, on many social media platforms, some of the least informed content gets elevated in the general public's consciousness. This page is an effort to add visibility to the reactions, concerns and ideas of the less prominent ("unliked", less indexed) voices on Twitter, which are equally valid.

On this page, you can monitor content that contains the keyword "Chicano" without any explicit content. For more flagrant content, please visit this link. This relatively neutral content should be easy enough to follow along. I sort this list of tweets programmatically: using the Twitter Search API, I am able to amass a daily sampling of tweets on these concepts.
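The filtering described above (keyword in, explicit terms out) can be sketched as a parameterized SQL query. This is a minimal sketch, not the site's actual queries: it assumes a SQLite table named tweets with created_at and text columns, and the neutral_tweets helper and the placeholder exclusion terms are hypothetical.

```python
import sqlite3

# Hypothetical exclusion list: placeholder terms stand in for whatever explicit
# vocabulary the real queries screen out.
EXCLUDED_TERMS = ("explicitterm1", "explicitterm2")


def neutral_tweets(conn, keyword="chicano", excluded=EXCLUDED_TERMS):
    """Return tweets mentioning `keyword` that contain none of the excluded terms, newest first."""
    # One LIKE for the keyword plus one NOT LIKE per excluded term. Values are bound
    # as parameters instead of being pasted into the SQL string. SQLite's LIKE
    # ignores case for ASCII letters, so "chicano" also matches "Chicano".
    conditions = ["text LIKE ?"] + ["text NOT LIKE ?"] * len(excluded)
    sql = (
        "SELECT text FROM tweets WHERE "
        + " AND ".join(conditions)
        + " ORDER BY created_at DESC"
    )
    params = [f"%{keyword}%"] + [f"%{term}%" for term in excluded]
    return [row[0] for row in conn.execute(sql, params)]


# Demo with an in-memory table; the schema (created_at, text) is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (created_at TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?)",
    [
        ("2021-12-26", "Chicano art show this weekend"),
        ("2021-12-25", "Chicano explicitterm1 rant"),
        ("2021-12-24", "Unrelated tweet"),
    ],
)
print(neutral_tweets(conn))  # ['Chicano art show this weekend']
```

Keeping the exclusion list as data rather than hard-coding it into the query makes it easy to grow the list as new objectionable terms show up.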

At any rate, these pages allow one to observe what topics are on the minds of the more vocal members of the community. Feel free to report any objectionable content to Twitter. The tweets are shown in their entirety, and the views expressed there are not my own or those of my employer.

The content is refreshed roughly every 24 hours. You will either get today’s results or the day before.

US Employee Pensions Finance PEGASUS Software; University of California, CALPERS Among Group
https://ricardolezama.com/politica/us-employee-pensions-finance-pegasus-software-university-of-california-calpers-among-group/
Fri, 17 Dec 2021 05:08:12 +0000

This article is reshared with permission from La Cartita. Originally published on that platform 12/16/2017.

La Cartita (6/30/2017): PEGASUS is the world's most advanced spyware, a special type of software designed to spy on cellular phones and computers without the user's permission. The software is most often used to target a victim's phone camera and microphone. The audio and video are recorded and then leveraged against the victim in some way. PEGASUS is designed by the NSO Group, a team of former and current Israeli soldiers from UNIT 8200, a signals intelligence unit of the Israeli army (the Israel Defense Forces, or IDF). The company was (and may still be) subsidized by the Israeli government. All of the funds that develop Israel's espionage capacity ultimately come from the large military aid package provided by the US government.

Francisco Partners LP, the real owners of PEGASUS

PEGASUS was recently the subject of a widely circulated NY Times article detailing how the NSO Group's software was found to have been used by the Mexican government against activist lawyers and journalists. The NY Times article was based primarily on a report from the Citizen Lab group in Toronto. NSO Group works exclusively with governments. The first documented use of the software was against Ahmed Mansoor, a respected legal scholar who speaks out against torture.

Unfortunately, NSO Group does not operate independently of private capital. NSO Group was acquired by a private equity firm: Francisco Partners LP. The firm has several technology holdings, for instance, a software unit from Dell Computers that was spun off to Francisco Partners LP.

II. CALPERS puts 100 Million on Pegasus’ Owner; UC Regents 25 million
CALPERS funds Francisco Partners LP, owners of the NSO Group

Francisco Partners LP has two publicly listed locations that function as its corporate offices: one at 1 Letterman Drive, C Suite 410, San Francisco, California, and another in London. Its holdings are valued at 8 billion dollars. Ironically, the firm is in an increasingly strong position to exploit commercial software, since it owns increasingly ubiquitous software and hardware platforms to which the NSO Group can presumably gain privileged access.

Francisco Partners LP has many government contacts. At least, one can assume this to be the case given the high number of public pension funds that have invested in the company. Most notably for some of our readers, CALPERS has paid 100,000,000 dollars into a Francisco Partners LP fund. The following is a cursory review of the amounts invested in Francisco Partners LP's funds by US public pensions.

The following public pensions pay into Francisco Partners LP funds. How to interpret the figures: the amount invested follows each fund's name, and the rightmost column gives the date of the latest known investment made by that public pension fund into the Francisco Partners LP funds that finance company operations (e.g., capitalization, providing loan collateral, operating costs, etc.).

    California Public Employees’ Retirement System USD 100,000,000 9/30/2016
    Oregon Public Employees Retirement System USD 100,000,000 12/31/2016
    University of Texas Investment Management Co/The USD 75,000,000 5/31/2016
    California State Teachers’ Retirement System USD 75,000,000 9/30/2016
    Florida Retirement System USD 75,000,000 9/30/2016
    New York City Fire Pension Fund USD 75,000,000 6/30/2016
    Colorado Public Employees’ Retirement Association USD 50,000,000 12/31/2015
    School Employees Retirement System of Ohio USD 40,000,000 12/31/2016
    Regents of the University of California/The USD 25,000,000 9/30/2014
    West Midlands Pension Fund USD 30,008,541 3/31/2016
    University of Michigan USD 20,000,000 9/17/2009
    Pennsylvania State Employees’ Retirement System USD 20,000,000 12/31/2015
    Ohio Police & Fire Pension Fund USD 15,000,000 6/30/2014
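As a quick cross-check of scale, the amounts in the list can be totaled. This minimal sketch copies the figures exactly as listed and treats every amount as USD, as the list does; the California subtotal simply matches fund names that contain "California" (CalPERS, CalSTRS and the UC Regents), and the variable names are illustrative.

```python
# Figures copied from the list above: (fund, amount in USD, date of latest known investment).
PENSION_COMMITMENTS = [
    ("California Public Employees' Retirement System", 100_000_000, "9/30/2016"),
    ("Oregon Public Employees Retirement System", 100_000_000, "12/31/2016"),
    ("University of Texas Investment Management Co/The", 75_000_000, "5/31/2016"),
    ("California State Teachers' Retirement System", 75_000_000, "9/30/2016"),
    ("Florida Retirement System", 75_000_000, "9/30/2016"),
    ("New York City Fire Pension Fund", 75_000_000, "6/30/2016"),
    ("Colorado Public Employees' Retirement Association", 50_000_000, "12/31/2015"),
    ("School Employees Retirement System of Ohio", 40_000_000, "12/31/2016"),
    ("Regents of the University of California/The", 25_000_000, "9/30/2014"),
    ("West Midlands Pension Fund", 30_008_541, "3/31/2016"),
    ("University of Michigan", 20_000_000, "9/17/2009"),
    ("Pennsylvania State Employees' Retirement System", 20_000_000, "12/31/2015"),
    ("Ohio Police & Fire Pension Fund", 15_000_000, "6/30/2014"),
]

total = sum(amount for _, amount, _ in PENSION_COMMITMENTS)
# Subtotal for funds whose listed name mentions California.
california = sum(
    amount for name, amount, _ in PENSION_COMMITMENTS if "California" in name
)

print(f"{len(PENSION_COMMITMENTS)} funds, USD {total:,} in total")
print(f"California funds: USD {california:,}")
```

Running it prints 13 funds totaling USD 700,008,541, of which USD 200,000,000 comes from the three California funds.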

III. The profit model for NSO Group: Hack More, Pay Less: Realizing Scale

Documents leaked to the NY Times revealed the NSO Group's external clients and their fee structure. The NSO Group charges USD 500,000 to a client state that wishes to install its software on some piece of hardware. An additional USD 650,000 is assessed to intercept/hack 10 iPhones or 10 Androids. Finally, a client may be charged USD 800,000 more to hack 100 phones of any make or model. This pricing model reflects a disposition to hack more in order for a government to "get its money's worth".
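To make the "hack more, pay less" logic concrete, the fees above can be worked out per phone. This is a minimal sketch that assumes the USD 800,000 tier covers 100 phones on top of the first 10 and that the installation fee is paid once; the constant and function names are illustrative.

```python
# Fees as quoted above (USD). Treating the 100-phone tier as additional targets
# on top of the first 10 is an assumption.
SETUP_FEE = 500_000
FIRST_TIER = (10, 650_000)    # 10 iPhones or 10 Androids
SECOND_TIER = (100, 800_000)  # 100 more phones, any make or model


def cost_summary():
    n1, c1 = FIRST_TIER
    n2, c2 = SECOND_TIER
    total = SETUP_FEE + c1 + c2
    return {
        # Setup fee folded into the first 10 phones.
        "first_tier_per_phone": (SETUP_FEE + c1) / n1,
        # What each extra phone costs once the first tier is paid for.
        "second_tier_marginal_per_phone": c2 / n2,
        "average_per_phone_all": total / (n1 + n2),
        "total": total,
    }


for name, value in cost_summary().items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions, the first 10 phones cost USD 115,000 each, every additional phone costs USD 8,000, and the average across all 110 phones falls to roughly USD 17,727: the more phones a client hacks, the cheaper each one becomes.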

IV. Government of Mexico: Ayotzinapa Hacks

The Government of Mexico, even before it had a massive fiasco on its hands with the Ayotzinapa case of 2014, has invested at least 80 million dollars in projects with the NSO Group since 2013. That figure can only have gone up as the EPN administration struggles to maintain power.

The Ayotzinapa case involves many dozens of lawyers and activist groups. A rough estimate from the Inter-American Commission on Human Rights claims that at least 196 people were affected on the night of September 26, 2014. These people and their extended families should presume themselves to be subjects of surveillance in one shape or another because of their legal connection and right to claim restitution. At the time of writing, many of the direct family members of the 43 disappeared have phones that exhibit strange behavior.

WebScraping As Sourcing Technique For NLP
https://ricardolezama.com/linguistics/webscraping-as-sourcing-technique-for-nlp/
Tue, 07 Dec 2021 21:49:00 +0000

Introduction

In this post, we provide a series of web scraping examples and references for people looking to bootstrap text for a language model. The advantage is that a greater number of spoken speech domains can be covered. Newer vocabulary, or very common slang, is picked up through this method, since most corporate language managers do not often interact with this type of speech.

Most people would not consider Spanish necessarily under-resourced. However, considering the word error rate in products like the speech recognition feature in a Hyundai or Mercedes-Benz, or in text classification on social media platforms generally (which skews toward English-centric content), there certainly seems to be a performance gap between contemporary #Spanish speech in the US and the products developed for that demographic of speakers.

An excellent example of an ML model struggling because of the lack of an exclusion list.




Overview:

  • Point to Letras.com
  • Retrieve Artist
  • Retrieve Artist Songs
  • Generate individual texts for songs until complete.
  • Repeat until all artists in artists file are retrieved.

The above steps are very abbreviated, and even the description below is perhaps too short. If you're a beginner, feel free to reach out to lezama@lacartita.com. I'd rather help beginners directly; experienced Python programmers should have no issue with the present documentation or with modifying the basic script and idea to their liking.

Sourcing

In NLP, the number one issue will never be a lack of innovative techniques, community or documentation for commonly used libraries. The number one issue is, and will continue to be, the proper sourcing and development of training data.

Many practitioners have found that accurate, use-case-specific data are better than a generalized solution, like BERT or other large language models. These issues are most evident in languages like Spanish that do not have as high a presence in the resources used to train these models, like Wikipedia and Reddit.

Song Lyrics As Useful Test Case

At a high level, we created a list of relevant artists (the artistas file), then looped through the list to check on letras.com whether each artist had any songs. Once a request yielded a result, we looped through the individual songs for each artist.
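The loop just described can be sketched without touching the network by passing the two fetching steps in as functions. In this minimal sketch, artist_urls, crawl and BASE_URL are hypothetical names; in the script below, list_song_urls would wrap artist_songs_url (reading each row's data-shareurl) and fetch_lyrics would wrap lyrics_url. The slug format (one artist slug per line, e.g. jose-jose) is an assumption based on how main() builds its URLs.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.letras.com/"  # hypothetical constant; main() hard-codes the same base


def artist_urls(slugs):
    """Turn artist slugs (one per line, e.g. 'jose-jose') into landing-page URLs, skipping blanks."""
    return [urljoin(BASE_URL, slug.strip().strip("/") + "/") for slug in slugs if slug.strip()]


def crawl(slugs, list_song_urls, fetch_lyrics):
    """Yield (song_url, lyrics) for every song of every artist that has any.

    list_song_urls(artist_url) -> list of song URLs (empty if the artist has none)
    fetch_lyrics(song_url)     -> lyrics text, or None/'' if the page has no lyrics
    """
    for artist_url in artist_urls(slugs):
        song_urls = list_song_urls(artist_url)
        if not song_urls:  # the request yielded no songs: move on to the next artist
            continue
        for song_url in song_urls:
            lyrics = fetch_lyrics(song_url)
            if lyrics:
                yield song_url, lyrics


# Usage with canned data instead of live requests:
fake_songs = {"https://www.letras.com/jose-jose/": ["https://www.letras.com/jose-jose/135222/"]}
fake_lyrics = {"https://www.letras.com/jose-jose/135222/": "(lyrics text)"}
for url, text in crawl(["jose-jose", "unknown-artist"], lambda u: fake_songs.get(u, []), fake_lyrics.get):
    print(url, text)
```

Keeping the fetchers as parameters makes the loop easy to test with canned pages. Skipping artists whose page yields no songs mirrors the "once a request yielded a result" check.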

Lyrics are a great reference point for spoken speech. This contrasts greatly with long-form news articles, which are almost academic in tone. Read speech also carries a certain intonation, which does not reflect the short, abbreviated or elliptical patterns that characterize spoken speech. As such, knowing how to parse the https://letras.com resource may be a good idea for those refining and expanding language models with "real world speech".

Requests, BS4

The proper acquisition of data can be accomplished with BeautifulSoup. The library has been around for over 10 years, and it offers an easy way to process HTML or XML parse trees in Python; you can think of BS as a way to acquire the useful content of an HTML page: everything bounded by tags. The requests library is also important, as it is how we reach out to a web page and retrieve the entirety of its HTML.

# -*- coding: utf-8 -*-
"""
Created on Sat Oct 16 22:36:11 2021
@author: RicardoLezama.com
"""
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4
artist = requests.get("https://www.letras.com").text

The line `requests.get("https://www.letras.com").text` does what the attribute `text` implies: the call fetches the page and returns the HTML file's content as a string available within the Python program. Adding a function definition helps group this useful content together.

Functions For WebScraping

Creating a bs4 object is easy enough. Pass the page link as the first argument, fetch the page, then parse it for its DIV content. The function lyrics_url returns all the div tags with a particular class value (cnt-letra p402_premium); on a song page, that div holds the lyrics. The argument is therefore the URL of a single song page; the artist's landing page, which lists the available songs, is handled separately by artist_songs_url.

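def lyrics_url(web_link):
    """
    Fetch a song page and return its lyric container(s) as BS4 tags.

    Args: web_link: URL of a single song page, e.g. https://www.letras.com/jose-jose/135222/

    Return: list of <div class="cnt-letra p402_premium"> tags (empty if none found).
    """
    html = requests.get(web_link, timeout=30).text
    check_soup = BeautifulSoup(html, 'html.parser')
    return check_soup.find_all('div', class_='cnt-letra p402_premium')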
Image: on letras.com, the highlighted portion is contained within a <div> tag.

The image above shows the content of a page that could be passed to lyrics_url: "https://www.letras.com/jose-jose/135222/". See the GitHub repository for more details.
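To see what "everything bounded by tags" means for a lyric page, the sketch below pulls plain text out of that div using Python's built-in html.parser instead of BeautifulSoup (a deliberate swap, so it runs without extra packages). LyricDivParser and lyrics_text are hypothetical helpers; they assume the lyrics sit in <p> blocks separated by <br> tags inside the div, which may not hold for every page. Because generate_text saves str(song_lyrics[0]), i.e. the div's raw HTML, the same helper could also be run over the saved files to strip the markup.

```python
from html.parser import HTMLParser


class LyricDivParser(HTMLParser):
    """Collect the text inside <div class="cnt-letra p402_premium">, one lyric line per line.

    Like the class_ filter in lyrics_url, the class attribute must match this exact string.
    """

    def __init__(self, target_class="cnt-letra p402_premium"):
        super().__init__()  # convert_charrefs=True (default) turns &ntilde; into ñ
        self.target_class = target_class
        self.depth = 0  # > 0 while inside the target div; counts nested <div>s
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1
            elif tag == "br":  # <br/> also lands here via handle_startendtag
                self.parts.append("\n")
        elif tag == "div" and dict(attrs).get("class") == self.target_class:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            if tag == "div":
                self.depth -= 1
            elif tag == "p":
                self.parts.append("\n")  # paragraph (verse) boundary

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def text(self):
        lines = (line.strip() for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def lyrics_text(html):
    """Return the plain-text lyrics found in an HTML string (empty string if none)."""
    parser = LyricDivParser()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Tracking nesting depth rather than a simple on/off flag keeps the parser inside the lyric div even if it contains nested <div> elements.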

Organizing Content

Drilling down to a specific artist requires basic knowledge of how Letras.com organizes songs on an artist's home page. The function artist_songs_url parses the entirety of a given artist's song list so that we can drill down further into each specific title.
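Each song row on an artist page carries its own URL in a data-shareurl attribute, which is what generate_text reads. As a standard-library-only illustration of that step (again using html.parser in place of BeautifulSoup), the hypothetical song_urls helper below collects those attributes from <li class="cnt-list-row -song"> rows and drops duplicates while keeping page order. The markup it expects is inferred from the class names in the script, not verified against the live site.

```python
from html.parser import HTMLParser


class SongRowParser(HTMLParser):
    """Collect data-shareurl values from <li class="cnt-list-row -song"> rows."""

    def __init__(self, row_class="cnt-list-row -song"):
        super().__init__()
        self.row_class = row_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "li" and attributes.get("class") == self.row_class:
            url = attributes.get("data-shareurl")
            if url:
                self.urls.append(url)


def song_urls(html):
    """Return the song URLs listed on an artist page, de-duplicated in page order."""
    parser = SongRowParser()
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.urls))
```

De-duplicating matters because a repeated row would otherwise produce a second, identically worded results file under a different uuid name.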

In main(), we call these functions to loop through each artist's page and song list, generating a uniquely named file for each song's lyrics. The function generate_text writes one set of lyrics into each individual file. Later, for Gensim, we can turn each lyrics file into a single coherent Gensim list.
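Gensim's Word2Vec, for example, takes its training corpus as an iterable of token lists (its sentences argument). This is a minimal sketch of building that shape from the per-song files. It assumes the files hold plain lyric text; generate_text as written saves the div's raw HTML, so tags would need stripping first. load_sentences and the letters-only tokenizing regex are illustrative choices, not Gensim's own preprocessing.

```python
import re
from pathlib import Path

# Runs of Unicode letters: keeps ñ and accented vowels, drops digits and punctuation.
WORD_RE = re.compile(r"[^\W\d_]+")


def load_sentences(folder=".", pattern="*results.txt"):
    """Return one lowercase token list per non-empty lyric line across all matching files."""
    sentences = []
    for path in sorted(Path(folder).glob(pattern)):
        for line in path.read_text(encoding="utf-8").splitlines():
            tokens = WORD_RE.findall(line.lower())
            if tokens:
                sentences.append(tokens)
    return sentences
```

Treating each lyric line as one "sentence" keeps Word2Vec's context windows inside a line, which roughly matches the short, elliptical units of sung speech.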



def artist_songs_url(web_link):
    """
    Land on an artist's page and collect the rows that link to each song.

    Args: web_link: URL of the artist's landing page, e.g. https://www.letras.com/jose-jose/

    Return: list of <li class="cnt-list-row -song"> tags; each carries the song URL
    in its data-shareurl attribute.
    """
    response = requests.get(web_link, timeout=30)  # one request instead of two
    print("Status Code", response.status_code)
    check_soup = BeautifulSoup(response.text, 'html.parser')
    songs = check_soup.find_all('li', class_='cnt-list-row -song')
    return songs
# Lyrics on each song page live in: div class="cnt-letra p402_premium"

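def generate_text(url):
    """Download every song listed on one artist page into its own uniquely named file."""
    import uuid  # uuid1() gives each lyrics file a unique name
    songs = artist_songs_url(url)
    for a in songs:
        song_url = a['data-shareurl']
        print(song_url)
        song_lyrics = lyrics_url(song_url)
        if not song_lyrics:  # no lyric div on this page: skip it instead of raising IndexError
            continue
        with open(str(uuid.uuid1()) + 'results.txt', 'w', encoding='utf-8') as new_file:
            new_file.write(str(song_lyrics[0]))
        print(song_lyrics)
    print('we have completed the download for', url)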


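def main():
    # 'artistas' holds one letras.com artist slug per line, e.g. jose-jose
    with open('artistas', 'r', encoding='utf-8') as artist_file:
        artistas = artist_file.read().splitlines()
    url = 'https://www.letras.com/'
    for a in artistas:
        if not a.strip():  # skip blank lines
            continue
        generate_text(url + a.strip() + "/")
        print('done')
# Once complete, on Windows run: copy *results.txt output.txt
# to consolidate the lyrics into a single file.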


if __name__ == '__main__':
    main()
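The closing comment in the script relies on the Windows copy command. A cross-platform alternative is sketched below with pathlib; consolidate is a hypothetical helper, and it assumes the per-song files end in results.txt, as generate_text names them.

```python
from pathlib import Path


def consolidate(folder=".", pattern="*results.txt", output="output.txt"):
    """Concatenate every per-song file into one UTF-8 file; return how many files were merged."""
    folder = Path(folder)
    out_path = folder / output
    # Sorted for a stable order; the output file itself is excluded in case it matches the pattern.
    files = sorted(p for p in folder.glob(pattern) if p != out_path)
    with out_path.open("w", encoding="utf-8") as out:
        for path in files:
            # Normalize trailing newlines so songs never run together on one line.
            out.write(path.read_text(encoding="utf-8").rstrip("\n") + "\n")
    return len(files)
```

Calling consolidate() from the scraping folder produces the same single output.txt as the copy command, but it also works on macOS and Linux and guarantees UTF-8 output.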
New B.1.1.529 Coronavirus Variant Poised To Be Deadlier Than Delta
https://ricardolezama.com/covid-19/new-b-1-1-529-coronavirus-variant-poised-to-be-deadlier-than-delta/
Fri, 26 Nov 2021 08:21:16 +0000

Markets, medical experts and governments are raising concerns over the latest Coronavirus variant.


As the public in the United States gathers in observance of Thanksgiving, South African experts and global governments are alarmed at the B.1.1.529 variant of the Coronavirus. Enough concern has been raised to partially shut down air traffic between the UK/Europe and parts of Africa. The new variant was reported Wednesday, 8:11 pm PST, and will eventually receive a Greek letter name, according to Bloomberg News.

Origin

First spotted in Botswana, the B.1.1.529 variant appears to carry more of the spike protein mutations associated with more aggressive variants, like Delta. Roughly speaking, the spike protein allows the virus to penetrate the membrane of a healthy human cell. Afterwards, the virus inserts its RNA into the cell, which then causes issues in vulnerable lung tissue, e.g., Covid-19.

The B.1.1.529 variant is thought to have evolved from an untreated HIV/AIDS patient, according to Francois Balloux via Bloomberg News. Unfortunately, people who are immunocompromised can carry the coronavirus for longer periods of time, allowing for significantly different variants to emerge and eventually infect others.

This, paired with the fact that healthy but unvaccinated individuals will also contract this variant, makes for a perfect storm of conditions that raises the prospect of a new wave of Covid-19 infections.

New Variant B.1.1.529 Raises Concerns Globally (Source: Guardian)

India Increases Testing, Israel Reacts

India is now increasing testing for foreign travelers out of fear that this variant, deadlier and more transmissible than Delta, will reach its already vulnerable populace.

Israel is also testing for the variant.

Affects Those Under 25

In places with fewer vaccinations, the under-25 populace is expected to see a spike in B.1.1.529 infections. For instance, South Africa has around 1/4 of its under-25 populace vaccinated, and it is this population that is most affected by the variant in Gauteng. South African authorities have several confirmed cases, with laboratories expecting to confirm additional ones after sequencing is performed on new samples.

According to Indian public health authorities:

“This variant is reported to have a significantly high number of mutations, and thus, has serious public health implications for the country, in view of recently relaxed visa restrictions and opening up of international travel.”

National Centre for Disease Control (NCDC) via OdishaTV

Flights Between South Africa, UK Halted

The UK is now banning flights from six African countries. Travelers arriving from these six countries will face a strict quarantine: South Africa, Lesotho, Botswana (the original site of the variant), Mozambique, Namibia and Eswatini.

Markets React

Shares in Intercontinental are now down 6.7 percent in stock futures after a busy holiday season in the US, according to Dow Jones. If past behavior is an indicator, travelers will now think twice about travel plans, globally and within the US, just as Christmas travel was gearing up to be a windfall for airlines, hotels and oil companies/gas retailers.

Another Zero Day Exploit For Microsoft
https://ricardolezama.com/programming-languages/windows-commands/another-zero-day-exploit-for-microsoft/
Tue, 23 Nov 2021 03:23:03 +0000

Even Windows 11 is affected.

Apparently, one can open a command line window and deploy an exploit to raise permissions on a machine using a .exe file freely available on GitHub. Nice.

The exploit works on Windows 10, Windows 11 and Windows Server versions of this OS. The exploit consists of a low privileged user raising their own privileges by running basic commands on the CMD prompt. Fascinating.

Bleeping Computer Blog Finds Exploit

The exact issue was described by BleepingComputer yesterday in a much-circulated blog post:

[BP] has tested the exploit and used it to open to command prompt with SYSTEM privileges from an account with only low-level ‘Standard’ privileges.

– Bleeping Computer
After Success Unifying Super Middleweight Division, Canelo Calculates Legacy With Cruiserweight Challenge
https://ricardolezama.com/sports/boxing/after-success-unifying-super-middleweight-division-canelo-calculates-legacy-with-cruiserweight-challenge/
Sat, 20 Nov 2021 06:33:02 +0000

Saul "Canelo" Alvarez is now contemplating challenging a much bigger man: the WBC champion in the Cruiserweight division.

The Canelo legacy keeps rising as the 31-year-old Mexican enters his prime and enjoys success compounded by multiple successful title defenses. Most recently, the Mexican unified the competitive 168 lb division.

His fanbase is expanding globally, with English speakers placing support behind the ‘face of boxing’ amidst the usual controversies and biases that all combat sports tend to manifest.

The Mexican fanbase too looks on as they toil away at their jobs being the backbone of multiple regional and national economies. Every Canelo fight affirms some of the positive image that exists globally of the Mexican man. At least, that is how most sports watchers interpret the presence of Canelo in media depictions. To talk about this legend in development is to talk about the importance of boxing within the Mexican community. Thus, the moves he makes will define the sport for decades to come.

Early Details On Canelo’s Move To Cruiserweight

According to Michael Benson, Canelo is planning to weigh in at 180 lbs as he faces Ilunga Makabu, an opponent with a significant weight and height advantage: 200 lbs and much taller.

Canelo vs Plant Is Finally Here
https://ricardolezama.com/sports/boxing/canelo-versus-plant-is-finally-here/
Sat, 06 Nov 2021 04:47:38 +0000

Canelo is now set to face his last and potentially most difficult fight for Super Middleweight supremacy: Caleb Plant.

The Super Middleweight unification bout is set to kick off tomorrow at around 6 pm PT from Las Vegas, Nevada. It costs 75 dollars, which is not terrible given a decent undercard. Canelo is already a four-division world champion, while Plant is the IBF champion. Whoever wins becomes the first undisputed super middleweight world champion in boxing history. The stakes could not be higher.

Canelo marks 168lbs vs Plant at 167lbs – fight night rehydration may add 10 pounds, but the muscle density is on Canelo’s side.

Weigh-In For #CaneloPlant.

At 168 pounds, Canelo looked bulky and ready to deliver powerful blows. He made weight spot-on (168 lbs is the Super Middleweight limit), even going as far as wearing a heavy gold pendant on the scale. For his part, Caleb Plant weighed in at 167 pounds:

The current IBF 168-pound title holder looked muscular as well, but thinner and trimmer, as he is over 6 ft tall: a bit of a liability when fighting a compact, explosive opponent. Our best guess is that Caleb Plant's 167 lb frame is an indication that he will fight at a distance ("run", as some detractors say) during the fight:

Regardless, this looks to be an historic night with one man ready to unify all the belts. Reportedly, Al Haymon and Eddy Reynoso have been planning or open to additional fights.

Resumes Heading In To Fight

Each fighter has a respectable resume, but the best belongs to the current pound-for-pound king, Saul "Canelo" Alvarez. He has most recently defeated 2 previously unbeaten Super Middleweights and defended his title against a formidable challenger in Avni Yildirim.

With respect to Plant, he does have 4 title defenses with his best win being over Jose Uzcategui. Mike Lee was also a respectable opponent, but one would be hard pressed to compare either one with Billy Joe Saunders or Callum Smith – the two Brits defeated by Alvarez.

Prediction

Canelo must KO Plant because, as the PBC fighter, Plant is likely to get the judges' nod. Canelo must realize this, and he is predicting an 8th-round KO. It's tough to take this type of assertion seriously, and many caution fighters against making such predictions. However, in Canelo's case, most make an exception.

Personally, my fear is that this fight will be boring, with Plant excessively moving as his track leg physique indicated at the weigh-in. I hope I am wrong, but think I may not be.
