tag:blog.barac.at,2013:/posts Ahmad Baracat 2022-11-12T12:46:34Z Ahmad Baracat tag:blog.barac.at,2013:Post/1885079 2022-09-30T20:06:30Z 2022-10-15T13:53:38Z [Request for Funding] A Business Experiment in Data Dignity

Resources (TL;DR)

Link to Pitch Deck

AI Grant 1.5 mins Pitch

The problem

  • AI-focused products/startups lack a business model aligning the incentives of both the company and the domain experts (Data Dignity)
  • Given recent advances in text-to-image generation, we will use logo generation from text as a testbed for a new business model

The product

  • Building the equivalent of Stability AI's DreamStudio for logo generation
  • The main question behind this startup is what would happen if the ML builders and the domain experts had co-ownership of the data/model.
  • Essentially, this is a business experiment in Data Dignity (a term coined by Satya Nadella and popularized by Jaron Lanier) economics.

Why focus on Data Dignity?

  • Lack of business model experimentation
    • Not many companies (if any) are experimenting with business models that don’t alienate domain experts and that provide co-ownership of the value created
  • Better data quality
    • We have seen firsthand how big companies like Amazon use services like MTurk to label data cheaply, and how this affects the quality of the data
  • Build better AI products
    • We believe we can build better AI products if the ML community and the domain experts are better aligned, both financially (co-ownership) and directionally (what we want a specific AI product to do)

Why logo generation?

  • Intuitively, it is a narrower problem than generic image generation (the domain of Stability AI), so it should need less data
  • There is strong demand for logo generation services
    • Fiverr has 300k+ logo generation services listed

Who are the customers?

  • Logo designers looking to get creatively unblocked
  • Businesses looking to buy unique/inexpensive logos

Differences from current offerings

  • Stability AI DreamStudio
    • You can use the generated images for commercial purposes
  • Fiverr (ex: you get 2 concepts & 3 revisions for $13, with 2-day delivery)
    • You get 1k concepts for $10
    • You get instant delivery
    • You get to play around with the service

Action Plan

For the initial launch, we will impose a lot of constraints on ourselves so that we can focus on the meat of the business experiment: “can we get it to work economically?” That is, can we get experts to share their data and customers to use the resulting model for their use cases, while giving experts a financial incentive and keeping the AI company profitable (see Simple Business Model below)?

  1. Figure out which community of artists/experts is the easiest to access and the most likely to participate in this business experiment
    1. We need an image generation task/gap that, once cracked, will prove profitable (because non-domain experts are likely to rely on it to generate images for their use cases)
    2. We have potential communities in mind that span different parts of the spectrum
      1. Anime/cartoon style image generation for the general public
        1. Experts: cartoonists and anime artists
        2. Customers: general public
      2. Realistic image generation for the general public
        1. Experts: general public photos taken with their smartphones
        2. Customers: general public
      3. Cartoon/Diagram generation for scientific use cases (ex: to generate illustrations for papers and posters)
        1. Experts: scientific illustrators
        2. Customers: Masters/PhD students in STEM fields
      4. Logo ideas generation
        1. Experts: logo designers
        2. Customers: other logo designers looking to get creatively unblocked, or businesses looking to buy unique/inexpensive logos
    3. Hand-pick experts and take referrals from the ones already vetted
      1. For v1, only onboard experts who have good reviews/ratings and who we can easily pay (ease of sending money, tax implications, etc.)
    4. Build the data capturing tool
      1. Need to check for collisions for similarly submitted data (we need to make sure that the data is valuable to the model before accepting it)
      2. If the data is similar to already submitted data, we show the expert examples of similar works (either in terms of image or caption)
      3. Vetted experts can choose to either upload new data or to vote on the validity/quality of already uploaded data
    5. Train the model
      1. We will first train a baseline model on public domain data
      2. Fine tune on the provided expert data
    6. Deploy the model as a web app (similar to what Stability AI has built)
    7. Charge by API request (bulk or singular calls)
    8. Distribute the earnings
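
The collision check in the data capturing tool could start out as a simple caption comparison. Below is a minimal sketch, assuming Jaccard overlap on lowercase word sets and a hypothetical threshold of 0.6; the function names are illustrative, and image similarity (ex: perceptual hashing) would be a separate check that is not shown here.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two captions, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def find_similar(new_caption: str, existing: list[str], threshold: float = 0.6) -> list[str]:
    """Return already-submitted captions that look too close to the new one."""
    return [c for c in existing if jaccard(new_caption, c) >= threshold]


existing = [
    "minimalist blue fox logo for a tech startup",
    "vintage coffee shop logo with a steaming cup",
]
# Anything returned here would be shown to the expert as "similar work".
similar = find_similar("minimalist blue fox logo for a startup", existing)
```

A submission with a non-empty `similar` list would not be rejected outright; per the plan above, the expert is shown the similar works first.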

Why bother?

  • We need alternatives to the business models current AI companies operate under
  • We need to experiment and share the outcomes of these experiments for other companies to learn from
  • We want to be as open as possible about the experiment/revenue/etc., so business dashboards will be publicly accessible

Worst Case Outcomes

Best Case Outcomes


Differences from 

  • Current offerings (Stability AI)
    • You can use the generated images for commercial purposes
  • Why not use Fiverr (example). Ex: you get 2 concepts (3 revisions) for $13 + 2 days delivery
    • You get 1k revisions/images for $10, instant delivery and you get to play around

Simple Business Model

Ambiguities
  • How many images/labels do we need for the v1 model/service?
    • OpenAI’s CLIP model (whose text encoder Stable Diffusion uses) was trained on 400M image-text pairs
    • But most of these images/annotations are garbage (need to cite the paper whose authors managed to train a model in 10x or 100x fewer iterations by discarding unneeded data)
    • We don’t need the same number of images because we will first focus on a niche (logo generation), which is easier for the model to learn, and we will spend money/effort on vetting the data, i.e. the data will be high quality
    • Let’s assume 100k logos/captions needed for v1
  • How long does it take to create one logo?
    • 2-20 hours (Source: Quora)
    • Assuming 10 hours/logo
  • How long would it take to gather and vet the v1 dataset?
    • Let’s assume we want to create v1 in 1 month
  • How many experts?
    • Assuming 1 month, each expert will create ~20 logos
    • We need 5k experts (there are 300k logo services on Fiverr that we may leverage, source)
  • Initial dataset cost
    • Assuming that we pay the experts 50% of their current rate/logo because most of the value comes from the royalties
    • Currently, 2 concepts → $13
    • 1 logo is $7
    • 50% of that is $3.5
    • Assuming that we need 20k unique logos 
    • Cost for getting the 20k logos → $70k
    • Assuming 5 captions per logo
    • Cost for each caption is $0.1 → $10k (How much did we pay MTurks?)
    • Assuming 5 votes per logo/caption
    • Cost for each vote on validity of caption/logo $0.05 → $25k (How much did we pay MTurks?)
    • Total is $70k + $10K + $25k = $105k
  • Initial training cost
    • Cost for training CLIP model on 400M images → $221k - $767k (depending on the model, cite paper)
    • Let’s take the lower bound as a starting model, so ~$200k
    • We first train on 1M public domain logo images or 10M public domain images/captions 
    • Then we fine-tune on the 100k logos/captions
    • Assuming 10M + 100k
    • $5k initial training cost (assuming we don’t use the idea of the paper that trained the models in 10x or 100x fewer iterations)
  • Total v1 cost
    • $110k
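
The v1 cost estimate above can be reproduced with a few lines of arithmetic. This is a sketch that follows the cost lines as written (20k unique logos, 5 captions each) and assumes training cost scales linearly with the number of images, which is a simplification. Note that the 5k-expert figure corresponds to 100k logos, whereas the cost lines use 20k unique logos; the sketch uses the latter.

```python
# Dataset cost, using the figures from the list above.
unique_logos = 20_000
pay_per_logo = 3.5          # 50% of the ~$7 per-logo Fiverr rate
captions_per_logo = 5
cost_per_caption = 0.10
votes_per_pair = 5
cost_per_vote = 0.05

logo_cost = unique_logos * pay_per_logo
pairs = unique_logos * captions_per_logo            # logo/caption pairs
caption_cost = pairs * cost_per_caption
vote_cost = pairs * votes_per_pair * cost_per_vote
dataset_cost = logo_cost + caption_cost + vote_cost

# Training cost, assuming cost scales linearly with image count.
reference_cost = 200_000            # ~lower bound for 400M images
reference_images = 400_000_000
train_images = 10_000_000 + 100_000
training_cost = reference_cost * train_images / reference_images

total_v1_cost = dataset_cost + training_cost
print(f"dataset ${dataset_cost:,.0f}, training ${training_cost:,.0f}, total ${total_v1_cost:,.0f}")
```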

Assumptions

  • Assuming same pricing as Stability AI DreamStudio ($10 → 1k image generations)
  • 3 ways experts can contribute (each earns instant money plus ownership of the model/data, which means long-term revenue)
    • Upload an image + a corresponding caption
    • Write new captions for existing images
    • Vote on the validity of an image/caption pair
  • Assuming 1k hardcore users (i.e. they use the service extensively → pay us $100/month → 10k image generations)
    • Revenue will be $100k/month
    • Number of images generated: 10M
    • Images generated evenly over 30 days and 12 hours/day (to account for “peak” hours :D)
      • ~30K images/hour
      • 500 images/min
      • ~10 images/sec (10 TPS)
    • Assuming a single V100 GPU can handle that load
    • We will rent 2 V100 GPU machines for redundancy
    • Cost on Azure is $3/hr/1xV100
    • Cost per month for 1 machine (reverting to 24 hours/day) → ~$2k
    • Cost of 2 machines → ~$4k
    • Need to estimate cost of bandwidth and other API services used, but let’s assume they are $1k per month for this load
    • So total is $5k/month
  • For a safety margin, let’s assume the monthly costs are $10k (up from the $5k above) + $10k (each founder takes a $5k salary → £3k net salary per month per founder) + $10k (partial cost of training/gathering the dataset)
    • We are including the partial cost of training/gathering the dataset because we want to model the initial investment as a cost that we need to recoup in 12 months
  • Profit will be $70k
  • Split it 10%/90% with company/experts
  • $7k for company & $63k for experts
  • We assumed 5k experts → each expert will get $12 per month (they were already paid $70 each for providing 20 logos for the v1 dataset)
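
The monthly numbers above can be checked with a short script. It is a sketch using the assumptions exactly as listed; the unrounded throughput comes out at about 7.7 images/sec, which the list rounds up to ~10 TPS, and the raw infrastructure cost is about $5.3k before the safety margin.

```python
# Revenue and load
users = 1_000
spend_per_user = 100            # $/month
price_per_1k_images = 10        # $ per 1k generations (DreamStudio-like pricing)

revenue = users * spend_per_user
images = users * (spend_per_user // price_per_1k_images) * 1_000
images_per_hour = images / (30 * 12)        # 30 days, 12 "peak" hours per day
images_per_sec = images_per_hour / 3600

# Infrastructure (2 x V100 at $3/hr, billed 24 hours/day) plus other services
gpu_monthly = 3 * 24 * 30
infra_monthly = 2 * gpu_monthly + 1_000

# Budgeted monthly costs: infra with safety margin, salaries, dataset recoup
budgeted_costs = 10_000 + 10_000 + 10_000
profit = revenue - budgeted_costs

company_share = profit * 10 // 100          # 10% to the company
expert_pool = profit - company_share        # 90% to the experts
per_expert = expert_pool / 5_000

print(f"profit ${profit:,}, experts ${expert_pool:,}, per expert ${per_expert:.2f}/month")
```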
]]>
Ahmad Baracat
tag:blog.barac.at,2013:Post/1868138 2022-08-14T16:46:11Z 2022-08-14T16:46:11Z [Book Notes] 12 Rules for Life

Quotes:

“Before you help someone, you should find out why that person is in trouble. You shouldn't merely assume that he or she is a noble victim of unjust circumstances and exploitation. It's the most unlikely explanation, not the most probable. In my experience —clinical and otherwise— it's just never been that simple. Besides, if you buy the story that everything terrible just happened on its own, with no personal responsibility on the part of the victim, you deny that person all agency in the past (and, by implication, in the present and future, as well). In this manner, you strip him or her of all power.”

“Success: that's the mystery. Virtue: that's what's inexplicable. To fail, you merely have to cultivate a few bad habits. You just have to bide your time. And once someone has spent enough time cultivating bad habits and biding their time, they are much diminished. Much of what they could have been has dissipated, and much of the less that they have become is now real.”

“Maybe I should at least wait, to help you, until it's clear that you want to be helped. Carl Rogers, the famous humanistic psychologist, believed it was impossible to start a therapeutic relationship if the person seeking help did not want to improve.”

“Thus, I continue helping you, and console myself with my pointless martyrdom. Maybe I can then conclude, about myself, "Someone that self-sacrificing, that willing to help someone -that has to be a good person." Not so. It might be just a person trying to look good pretending to solve what appears to be a difficult problem instead of actually being good and addressing something real.

Maybe instead of continuing our friendship I should just go off somewhere, get my act together, and lead by example.

And none of this is a justification for abandoning those in real need to pursue your narrow, blind ambition, in case it has to be said.”

“Here's something to consider: If you have a friend whose friendship you wouldn't recommend to your sister, or your father, or your son, why would you have such a friend for yourself? You might say: out of loyalty. Well, loyalty is not identical to stupidity. Loyalty must be negotiated, fairly and honestly. Friendship is a reciprocal arrangement. You are not morally obliged to support someone who is making the world a worse place. Quite the opposite.”

“It is for this reason that every good example is a fateful challenge, and every hero, a judge. Michelangelo's great perfect marble David cries out to its observer: "You could be more than you are." When you dare aspire upward, you reveal the inadequacy of the present and the promise of the future. Then you disturb others, in the depths of their souls, where they understand that their cynicism and immobility are unjustifiable. You play Abel to their Cain. You remind them that they ceased caring not because of life's horrors, which are undeniable, but because they do not want to lift the world up on to their shoulders, where it belongs.”

“Does that mean that what we see is dependent on our religious beliefs? Yes! And what we don't see, as well! You might object, "But I'm an atheist." No, you're not (and if you want to understand this, you could read Dostoevsky's Crime and Punishment, perhaps the greatest novel ever written, in which the main character, Raskolnikov, decides to take his atheism with true seriousness, commits what he has rationalized as a benevolent murder, and pays the price). You're simply not an atheist in your actions, and it is your actions that most accurately reflect your deepest beliefs--those that are implicit, embedded in your being, underneath your conscious apprehensions and articulable attitudes and surface-level self-knowledge. You can only find out what you actually believe (rather than what you think you believe) by watching how you act. You simply don't know what you believe, before that. You are too complex to understand yourself.”

“Pay attention. Focus on your surroundings, physical and psychological. Notice something that bothers you, that concerns you, that will not let you be, which you could fix, that you would fix.”

“Here's a fifth and final and most general principle. Parents have a duty to act as proxies for the real world —merciful proxies, caring proxies— but proxies, nonetheless. This obligation supersedes any responsibility to ensure happiness, foster creativity, or boost self-esteem. It is the primary duty of parents to make their children socially desirable. That will provide the child with opportunity, self-regard, and security. It's more important even than fostering individual identity. That Holy Grail can only be pursued, in any case, after a high degree of social sophistication has been established.”

]]>
Ahmad Baracat
tag:blog.barac.at,2013:Post/1867513 2022-08-12T16:12:34Z 2022-08-12T16:13:21Z On providence

Quotes

“The moment one definitely commits oneself, then providence moves too. All sorts of things occur to help one that would never otherwise have occurred… Unforeseen incidents, meetings, and material assistance, which no man could have dreamed would have come his way.” W. H. Murray, The Scottish Himalayan Expedition (often misattributed to J.W. Goethe)

]]>
Ahmad Baracat
tag:blog.barac.at,2013:Post/1867509 2022-08-12T16:01:02Z 2022-08-15T13:38:44Z On fighting Distraction

Other keywords: phone-dependence / phone addiction / interruption addiction

I later found that most of the points below, and more, are articulated very well in the Center for Humane Technology's "Take Control" article.

  • Processes
    • remove apps
    • deactivate social accounts
    • add screen time limits
    • activate Do Not Disturb modes that don’t allow any form of instant notification, even multiple successive calls
      • you will be surprised by how rarely something truly urgent needs you and how creative people will get in reaching you if absolutely needed
    • disable unhelpful notifications from appearing on the home screen
    • remove irrelevant and unhelpful apps from screens
    • declutter/limit the number of app pages
    • limit number of apps/icons per screen
    • disable email notifications from social accounts (if necessary, leave DM notifications on)
    • use social accounts from the browser instead of the dedicated app
    • write through proxy services (ex: Buffer to post on social media, if absolutely needed)
  • Go out without your phone for long walks to explore, try new things
  • Be unreachable
  • Go to a bookstore and read tables of contents; find a section that piques your curiosity and start reading it
]]>
Ahmad Baracat
tag:blog.barac.at,2013:Post/1867508 2022-08-12T15:57:41Z 2022-08-12T16:07:19Z Problems of our generation
]]>
Ahmad Baracat
tag:blog.barac.at,2013:Post/1863983 2022-08-04T11:13:56Z 2022-08-04T11:13:56Z We only need to plant 1 trillion trees
  1. Global CO2 emissions from fossil fuels are ~35 billion tonnes (35 trillion kg) per year [1]
  2. Over one year, on average, a mature tree will absorb about 22 kg of CO2 from the atmosphere [2]

Following from 1 & 2, we need on the order of 1 trillion trees (35 trillion kg ÷ 22 kg per tree ≈ 1.6 trillion) to offset global CO2 emissions from fossil fuels.
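
As a quick sanity check of the arithmetic, dividing the emissions in point 1 by the per-tree absorption in point 2 gives roughly 1.6 trillion trees, the same order of magnitude as the 1 trillion headline figure. This is a back-of-the-envelope sketch that treats every tree as a mature, average tree:

```python
emissions_tonnes = 35e9                 # ~35 billion tonnes of CO2 per year [1]
emissions_kg = emissions_tonnes * 1_000
absorbed_per_tree_kg = 22               # per mature tree per year [2]

trees_needed = emissions_kg / absorbed_per_tree_kg
print(f"{trees_needed / 1e12:.2f} trillion trees")
```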

This doesn’t mean that we should stop the other initiatives; it just means that with a simple (not easy) solution such as planting mini forests, we can offset our CO2 emissions and reap additional benefits.

The EU forest strategy for 2030 plans to plant an additional 3 billion trees. The report is very detailed and addresses concerns such as space needed to plant the trees, etc. as well as the tree planting benefits such as lowering temperature in cities by 2-8°C, diminishing air conditioning by 30% and saving 20-50% energy used for heating. [3]

The picture below is from Afforestt, a company following the Miyawaki Technique for planting dense mini forests. More than 3,000 forests have been created globally so far. One of the main benefits of this technique is that after 3 years the forests become self-sustainable and require no maintenance. [4]

Turns out that there is already a Trillion Tree Campaign. Worth checking out.

References

  1. https://ourworldindata.org/co2-emissions#global-co2-emissions-from-fossil-fuels-and-land-use-change
  2. https://www.eea.europa.eu/articles/forests-health-and-climate-change/key-facts/
  3. https://ec.europa.eu/environment/pdf/forests/swd_3bn_trees.pdf
  4. https://www.afforestt.com/methodology
]]>
Ahmad Baracat
tag:blog.barac.at,2013:Post/1717279 2021-07-25T15:39:19Z 2022-08-14T12:32:35Z Ask yourself these questions before starting something new
  • If no one is doing it, will you still do it?
  • If everyone is doing it, will you still do it?


  • If no one knows, will you still do it?
  • If everyone knows, will you still do it?
]]>
Ahmad Baracat
tag:blog.barac.at,2013:Post/1717267 2021-07-25T14:26:42Z 2021-07-25T14:26:42Z Here’s a challenge for You

Put your phone on airplane mode. 

Shut it down. 

Put it in your wardrobe. 

Leave it there for 7 days.


(the first 2 days will be hard)

(for cases of emergencies, however you define them, make sure you are still reachable somehow. Ex: email or landline or neighbors, etc.)


What’s in it for you? There is only one way to know the answer.

]]>
Ahmad Baracat
tag:blog.barac.at,2013:Post/1709110 2021-07-12T19:33:42Z 2022-10-16T16:46:28Z My tips for landing a software engineering job

Read this whole list before you start applying, especially Patrick McKenzie's article (linked in the salary negotiation section).

Get your resume in shape

Show don't tell. Focus on the objective data. Cut the weasel words. Exercise attention to detail.

Let's look at a few examples from real resumes and how we can improve them:
  • "Developed advanced image recognition algorithms using opencv-python." 
    • Why not tell the reader which algorithm you used? 
    • You may also add more details such as: the metric used for evaluation, the final performance on validation or testing, etc.
  • "Developed alongside a team of python developers an AI powered assistant bot that can recognize speech patterns and respond or help out with tasks accordingly." 
    • I would add how big the team was and would use "engineers" instead of "python developers". Ex: Designed, alongside a team of 3 engineers,...
    • I would be more specific about what I mean by AI. Ex: a logistic regression powered bot...
    • I would be more specific about what the bot did. Ex: to execute PC commands using voice input...
    • I would highlight the results, if they exist. Ex: allowing engineers to save 20% of time when setting up new repositories.
    • The final sentence would then be "Designed, alongside a team of 3 engineers, a logistic regression powered bot to execute PC commands using voice input, allowing engineers to save 20% of time when setting up new repositories."
For more concrete examples, have a look at the resume advice from Gayle McDowell (author of Cracking the Coding Interview) and this article by Lewis Lin describing the writing culture at Amazon.

Land an interview

Don't apply directly, get referred instead. Genuinely reach out to people on LinkedIn. Attend conferences or meetups or host ones if you don't find meetups locally.

  • Usually, applying directly to job openings online should be your last resort, not your first. Instead, you want to get referred by friends, former colleagues or same-university alumni who work at the company/startup you want to join or who know someone who does. 
  • If finding friends or friends of friends to refer you is not possible, you can ask current employees on LinkedIn for their time. Ask them genuine questions (that you hopefully have) about the company, its culture, etc. They may tell you about other positions/tracks inside the company that you didn't know about and that might be a better fit; they may even offer to refer you.
  • Attend conferences or meetups to expand your network (i.e. get to know other people who share your interests). 
  • If you can't find local meetups about topics you care about, you can create your own meetup.

Prepare for interview

  • For the time being, you can expect the interview process, especially from junior to senior levels, to involve a lot of LeetCode-style questions. So make sure to practice these types of questions on your preferred platform (not necessarily LeetCode though).
  • Additionally, you can expect design interviews. Depending on the position you are applying to, these might be machine learning design, system design or other types.
  • There is also the "behavioral" aspect of the interview process. There could be a dedicated interview to assess cultural fit, where you talk about previous work situations, how you dealt with them and what you learned.
  • Finally, it is important to know that implicitly, all your interviewers are assessing your communication skills, ability to deal with ambiguity, etc.

Pro Tip: watch mock interviews on YouTube (I personally like this channel), but also try to do mock interviews with others (either friends or other software engineers whom you might pay for their time).

Salary negotiation

    I am not an expert :), but I highly recommend going through the following 2 articles:

    Other resources

    If you are Arabic speaking, I have created 2 short courses to get you started in Data Science as well as a short course with career tips (the course I wish existed when I graduated university), which are free to access.
    ]]>
    Ahmad Baracat
    tag:blog.barac.at,2013:Post/1709093 2021-06-30T21:24:38Z 2022-07-14T13:43:04Z Things I wish I did when I first moved to the UK

    This will be a live document, which I will keep adding to as I learn.

    Open 2 bank accounts

    Open a bank account with a well known "physical" bank (ex: HSBC, Barclays, etc.) as well as a digital bank (Ex: Monzo, Revolut, etc.). The idea is to have your income deposited in the physical bank along with rent and utilities payments whereas the digital bank is to be used for your day-to-day payments. This setup has a couple of benefits:

    Why have a physical bank account with a reputable bank

    • Helps later on with getting a mortgage, loan, etc.
    • Ability to visit the branch and talk with humans for financial advice

    Why have a digital bank account

    • The digital bank account is used as proxy especially in online payments (so that the physical account is not exposed online)
    • Since you have to move money from the physical account to the digital one, you become more aware of your spending habits than you would with a single account
    • Digital banks, being the new kid on the block, are usually way ahead of physical banks in terms of online/app features

    Open an ISA account as soon as you start earning

    Get into the habit of investing a portion of your monthly income, be it in stocks or other instruments. These small monthly investments will compound over the long run.

    Escape the cities to recharge

    It is easy to get caught up in the day-to-day routine and to forget to recharge. In my case, this led several times to mild depressive episodes. Remember that the UK offers easy access (hop on a train and you are good to go) to the countryside and seaside.

    Don't do sports alone all the time

    For the most part in the UK, I got used to running alone, which helps me clear my mind and listen to my own thought patterns. However, very recently, I came to realize that the UK is full of sports clubs (running, cycling, etc.) and that it helps, from time to time, to practice in a group. It is also a great way to connect with others who share similar sports habits/passions.

    ]]>
    Ahmad Baracat
    tag:blog.barac.at,2013:Post/1571113 2020-07-09T14:17:42Z 2020-07-11T11:54:23Z How Machine Learning Pipelines Evolve Based on your Business Maturity

    Disclaimer

    • This article is intentionally high-level and devoid of technical jargon.
    • That said, at times, I will mention a few technologies/services to give you more concrete examples. For these examples, I will be referring to AWS services, but you should be able to find similar offerings on other cloud providers.

    The audience

    This article is an overview of what an ML pipeline looks like depending on the stage your team or company is in, and what to keep in mind at each stage. The main audience is engineers & scientists working on ML productionization, and executives or their equivalents in startups (ex: CTOs) who want a broad overview of the topic.

    Background

    During my tenure at Amazon, I was very lucky to be involved in multiple projects designing and building ML pipelines.

    In 2017-18, I designed and built an end-to-end ML pipeline, with the different components discussed below, for Amazon Advertising: the team responsible for running all personalized display advertising on behalf of Amazon worldwide, with millions of Transactions per Second (TPS) and <100 ms latency.

    In 2018-19, while working on Amazon Alexa, we had to manage the deployment of our models, but being a newly formed team, we faced different trade-offs as compared to Amazon Advertising.

    During the past few years, I have also consulted for startups and individuals on their ML engineering-related problems.

    Why Invest in an ML Pipeline

    Imagine you are Amazon Advertising and you are using ML models to predict how much to bid on a given Ad slot, or you are Amazon Alexa and you want to filter out offensive images on multi-modal devices, or you are a startup in Egypt building eKYC solutions and you need models to analyze national ID cards. In all of these cases, ML models are a core component of your value proposition and technical barrier to entry.

    For these reasons, you need a reliable way of managing them. Managing in this case would not only mean building a reproducible way of deploying them to production, but also keeping an eye on these models using monitoring and alarming as well as other aspects discussed below.

    What is an ML Pipeline

    From a bird’s-eye view, an ML pipeline has a few components.
    1. Dataset generation: responsible for fetching, joining and aggregating data from the different sources, cleaning the data, applying different filters (ex: subsampling) and generating cleaned training and validation datasets
    2. Training: responsible for training the ML model, performing hyperparameter optimization and generating a trained model
    3. Validation: responsible for running the trained model on the validation dataset and generating validation metrics
    4. Deployment: responsible for copying the trained model to production service (if the model passes certain validation criteria and is outperforming the current model in production)
    5. Monitoring & Alarming: there are at least 2 sets of metrics and alarms depending on the role:
      • Engineering: responsible for metrics such as model latency, memory usage, CPU usage, etc.
      • Science: responsible for metrics such as model bias, input or output distribution drift, etc.
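
To make the "distribution drift" metric more concrete, here is a minimal sketch of one common drift score, the Population Stability Index (PSI), computed between a reference sample (ex: training data) and live traffic. PSI is my choice for illustration, not necessarily what any of the teams mentioned in this article use; the bin count and the 0.2 alarm threshold are rule-of-thumb assumptions.

```python
import math


def psi(reference, live, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0   # fall back to 1.0 if all values are equal

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1   # clip out-of-range values
        # eps avoids log(0) for empty bins
        return [max(c / len(values), eps) for c in counts]

    ref, cur = fractions(reference), fractions(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


reference = [i / 100 for i in range(100)]       # roughly uniform on [0, 1)
live = [0.5 + i / 200 for i in range(100)]      # shifted towards [0.5, 1)

drift = psi(reference, live)
alarm = drift > 0.2   # hypothetical alarm threshold
```

A science-owned alarm could then fire whenever `alarm` is true for a monitored input or output.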

    Business Maturity/Stages

    Note: It is crucial to understand and be explicit about the trade-offs involved in technical decisions. That’s why I will highlight a number of trade-offs associated with each stage

    Starting Up

    (doing things that don’t scale is the name of the game)

    If you are just starting up, don’t be ashamed of taking the path of least friction to an ML pipeline. At this stage, you are not yet sure whether it is worth investing engineering effort in building a robust ML pipeline. What you need is a cheap & quick way of getting models from development environments to production. This will allow you to assess how much value the model is creating for your business. In essence, you are trading robustness for speed of experimentation.

    For that reason, dataset generation, training, validation and deployment could just be a series of Python scripts that you run manually following a runbook (checked into your Git repo or posted on your internal wiki). Depending on your use case, monitoring & alarming may be a bonus. Whichever path you choose, make sure that there is a reproducible and documented way of getting models into production. This is crucial and will come in handy when things go awry.
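
To illustrate how small such a script-based pipeline can be, here is a toy sketch with one function per stage. Everything in it is a stand-in: the data is synthetic, the "model" is a single threshold, the 0.9 acceptance bar is hypothetical, and "deployment" just writes a JSON file to a temporary directory.

```python
import json
import random
import tempfile
from pathlib import Path


def generate_dataset(n=200, seed=0):
    """Stand-in for fetching/cleaning data: the label is 1 when x > 0.5."""
    rng = random.Random(seed)
    rows = [(x, int(x > 0.5)) for x in (rng.random() for _ in range(n))]
    split = int(0.8 * n)
    return rows[:split], rows[split:]


def train(rows):
    """Toy 'model': a threshold halfway between the two class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return {"threshold": (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2}


def validate(model, rows):
    """Accuracy on the held-out rows."""
    hits = sum(int(x > model["threshold"]) == y for x, y in rows)
    return hits / len(rows)


def deploy(model, path):
    """'Deployment' here is just writing the artifact where a service would read it."""
    Path(path).write_text(json.dumps(model))


# The runbook, step by step.
train_rows, val_rows = generate_dataset()
model = train(train_rows)
accuracy = validate(model, val_rows)
artifact = Path(tempfile.gettempdir()) / "model.json"
if accuracy >= 0.9:   # hypothetical acceptance bar written down in the runbook
    deploy(model, artifact)
```

The point is less the code than the fixed seed and the explicit order of steps, which are what make the process reproducible.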

    Note that if you are already familiar with off-the-shelf services like SageMaker and can move fast with them, then by all means use them. The important thing at this stage is to optimize for moving fast and experimenting.

    Short to Medium Term

    Once you have validated that having an ML model in production actually provides value for your business and hopefully have a sense of how much value it is creating, you can justify investing engineering effort in making the ML pipeline more robust.

    Note that, at this stage, you still value moving as fast as you possibly can and thus you want to spend the least time possible on engineering or maintenance of ML pipelines. For example, you would rather spend more engineering time building more valuable features for your customers.

    For these reasons (improving robustness while keeping the investment low), you should consider using off-the-shelf services like SageMaker (which supports all core components of ML pipelines mentioned above). Such services will allow you to have robust ML pipelines with auto-scaling, ease of rollback, etc. without investing too much engineering time. Note that there is still effort to be invested in setting things up, but for the most part, once that cost is paid, it should be a “hands off the wheel” experience in terms of deployment and scaling.

    However, SageMaker will cost more than running your own infrastructure on EC2 for example. The trade-off then is the cost on one hand and robustness & speed of experimentation on the other hand. That cost can be justifiable at this stage since you are saving precious engineering time.

    Long-Term Investment (Custom Build)

    Assuming everything goes well and your business is thriving (ex: hundreds, thousands or millions of TPS), you may start to care more about cost as well as having more control over your ML pipeline, since the off-the-shelf solutions may not be enough in terms of feature support. You are also willing to invest more engineering effort to build a custom solution and to maintain it. You are essentially trading engineering & maintenance effort for customization, lower cost and the ability to add new features.

    Concretely, you will borrow the features that you care about from the off-the-shelf services and add more advanced custom ones depending on your use case. For example, at this stage, the pipeline may look as follows:
    1. Dataset generation is automated and runs on a specified schedule (ex: daily) or dynamically when the input or output distribution of the model changes in production: a cluster (ex: Spark) is spun up and runs a script to gather, transform and output the data to a durable place (ex: S3)
    2. Once a new dataset is generated, it triggers the training process. The output of the training step may not only be the trained model, but also other meta information (ex: hyperparameters used, dataset version, model version, etc.)
    3. Once the model is trained, a validation step is triggered, which compares the performance of the newly trained model against the current model in production (ex: comparing their performance on a Golden Standard dataset, but beware of dataset drift)
    4. If the model passes validation, it gets deployed to production
    5. A monitoring service then kicks in, keeps an eye on the important metrics, triggers alarms when an anomaly is detected and may support rolling back a model automatically
    6. You also have the ability to manually intervene and roll back your models
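    Steps 3 to 6 above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the names ModelArtifact, Registry, validate_and_deploy and monitor are all hypothetical, the "models" are toy threshold functions, and accuracy on a tiny golden set stands in for whatever metric you actually care about.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

GoldenSet = List[Tuple[float, int]]  # (input, expected label) pairs


@dataclass
class ModelArtifact:
    """A trained model plus the meta information produced in step 2."""
    version: str
    dataset_version: str
    hyperparameters: Dict[str, float]
    predict: Callable[[float], int]


@dataclass
class Registry:
    """Tracks the live model and keeps earlier ones for rollbacks."""
    production: Optional[ModelArtifact] = None
    history: List[ModelArtifact] = field(default_factory=list)

    def deploy(self, model: ModelArtifact) -> None:
        if self.production is not None:
            self.history.append(self.production)
        self.production = model

    def rollback(self) -> bool:
        """Step 6: restore the previous model; False if there is none."""
        if not self.history:
            return False
        self.production = self.history.pop()
        return True


def accuracy(model: ModelArtifact, golden_set: GoldenSet) -> float:
    """Fraction of golden-set examples the model labels correctly."""
    correct = sum(1 for x, y in golden_set if model.predict(x) == y)
    return correct / len(golden_set)


def validate_and_deploy(candidate: ModelArtifact, registry: Registry,
                        golden_set: GoldenSet) -> bool:
    """Steps 3 and 4: deploy only if the candidate is not worse than production."""
    current = registry.production
    if current is not None and accuracy(candidate, golden_set) < accuracy(current, golden_set):
        return False
    registry.deploy(candidate)
    return True


def monitor(registry: Registry, live_error_rate: float,
            max_error_rate: float = 0.2) -> bool:
    """Step 5: roll back automatically when a live metric looks anomalous."""
    if live_error_rate > max_error_rate:
        return registry.rollback()
    return False


# Toy data and models (hypothetical): labels are 1 when the input exceeds 0.5.
golden_set = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
v1 = ModelArtifact("v1", "2020-01-01", {"threshold": 0.5}, lambda x: int(x > 0.5))
v2 = ModelArtifact("v2", "2020-01-02", {"threshold": 0.8}, lambda x: int(x > 0.8))

registry = Registry()
validate_and_deploy(v1, registry, golden_set)  # nothing in production yet, so v1 ships
validate_and_deploy(v2, registry, golden_set)  # v2 scores lower on the golden set
print(registry.production.version)
```

    In this toy run the second model is rejected because it misclassifies one golden-set example, so the first model stays in production. In a real pipeline each function would be a separate job triggered by the previous one, and the golden set itself would need refreshing to guard against dataset drift.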

    Wrap-up

    I hope you got a good overview of what an ML pipeline is, why you would build one and a few points to keep in mind depending on the stage your company or team is in.

    If you are interested in learning more about the topics discussed above or related ones, please let me know in the comments below. You can also subscribe below to get notified when I post a new article.

    If you are an Arabic speaker, I have created 2 short courses to get you started in Data Science as well as a short course with career tips (the course I wish existed when I was in the last few years of university or just graduating), all of which are free to access.
    ]]>
    Ahmad Baracat
    tag:blog.barac.at,2013:Post/1571172 2020-06-06T14:12:00Z 2022-11-12T12:46:34Z My Experience Earning the Tier 1 Global Talent UK Visa via TechNation

    Logistics

    Let’s get the logistics out of the way:

    • This is not an official guide; for official guidance, check gov.uk and the Tech Nation guide
    • This is just my personal experience so take it with a grain of salt
    • Exceptional Talent (Tier 1) visa got renamed to Global Talent visa in Feb. 2020, but they are essentially the same visa. I personally like the Exceptional Talent naming as it feels more prestigious 😇

    Background Story

    You can safely skip this section if you are here for the meat 😅

    In Jan 2019, a year before I actually applied for the visa, I took a step back to think about where I wanted to live and what I wanted to do moving forward. At that point, I had been in the UK for 1.5 years. I had moved to London a few months earlier to work for Amazon Alexa as a software engineer; before that, I was working for Amazon Dynamic Advertising in Edinburgh, Scotland.

    Back then I reached the conclusion that I wanted to work independently: to answer a few questions in my mind, to gain more freedom (which I really missed while working as an employee) and just to explore. Equally important, I needed to maximize my ability to travel back and forth between wherever I was living and Cairo to see my family. The issue was that being on a Tier 2 visa, as well as working for Amazon, prevented me from registering a business, working as self-employed or traveling as I pleased.

    I started exploring the potential of moving to different countries and cities that would allow me to work independently, but that’s not the topic of this essay. Eventually, I reached the conclusion that the UK/London was probably the best country/city, which fits my personal living preferences (warm weather, travel time to Cairo, opportunities, liveliness, among other criteria) as well as the ability to work independently if I switch to another visa.

    In the beginning, I quickly discarded the Tier 1 Exceptional visa because the official requirements on the UK gov. website made me realize that I needed to be, ahem, “exceptional”, which I didn’t think I was at the time (or now, to be honest 😅). Unique, yes, but exceptional, I don’t think so. This left me with only one option, which was the Startup visa. In order to get that visa, you need to be endorsed, and although I tried contacting a couple of the endorsing bodies, they either didn’t reply or hadn’t started accepting applications yet (since it was a new visa at the time).

    Since I couldn’t make progress on the Startup visa, I started digging again to understand whether I could, by any chance, fit the Tier 1 Exceptional visa requirements. My viewpoint shifted from “I can’t apply to this visa” to “I can potentially get endorsed” when I was reading Tech Nation’s very detailed guide on how to apply. In it, they listed examples of the kind of evidence that I could present in my application. Going through these, I realized I could present many of them.

    Impostor Syndrome

    The most important idea I want to leave you with is this: don’t assume that you can’t apply; validate it (i.e. read the requirements, talk to other people who applied, look at previous applications, etc.). I believe this is an important lesson in life in general. As I said, if you have a quick look at the gov.uk website, you can easily reach the conclusion that the Tier 1 visa is not for you, but if you start digging a little deeper, you may realize that you can do it.

    “Don’t assume, validate” - Ahmad Baracat

    Benefits of Switching

    To me, the main benefits of switching from the Tier 2 General visa to the Tier 1 Exceptional Talent visa are:

    • Not needing sponsorship, which makes it easier to switch companies
    • Freedom to apply to companies and startups that don’t offer sponsorship
    • Freedom to start your own businesses in the UK or be self-employed
    • Freedom to leave your company, stay in the UK or travel outside and be able to come back

    The Process

    1. Application preparation
      1. Gather evidence: If you are a software engineer, this might include open source contributions, side-projects, invitations to speak at conferences, articles written, etc.
      2. Write your personal statement
      3. Ask for reference letters
    2. Stage 1: you are essentially uploading your prepared application to a portal (to be assessed by Tech Nation if you are applying within the digital technology field)
    3. Stage 2: you are applying for the visa and answering immigration-related questions (such as criminal convictions, being deported from countries, etc.) as well as choosing a biometric appointment
    4. Biometric appointment
    5. Receive your new Biometric Residence Permit (BRP)

    The Timeline

    Even though your timeline may vary, I wanted to share mine to give you an idea of what it might look like:

    • Few days of side-work to prepare the application
    • 28th Dec 2019: Apply to stage 1
    • 20th Jan 2020: Receive Tech Nation endorsement
    • 23rd Jan 2020: Apply to stage 2 + Visa Appointment
    • 29th Jan 2020: Stage 2 successful
    • 31st Jan 2020: Receive BRP

    Fees

    • £456: Stage 1 (if you don’t get endorsed you only lose this unless you applied to both stages together)
    • £1200: Health Surcharge (I applied for 3 years: 3 x £400)
    • £171: Stage 2
    • £112: Visa Appointment

    If you think £2K is a lot to switch visas, let me just remind you that you are essentially buying the benefits of switching listed above, which to me justifies the investment.
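    For the curious, here is the arithmetic behind the roughly £2K figure, using only the fees listed above as I paid them at the time (the variable names are just for illustration, and current fees may differ):

```python
# Fees in GBP exactly as listed above (what I paid, not current official prices).
fees_gbp = {
    "Stage 1": 456,
    "Health Surcharge (3 years x 400)": 3 * 400,
    "Stage 2": 171,
    "Visa Appointment": 112,
}

total_gbp = sum(fees_gbp.values())
print(f"Total: £{total_gbp}")  # Total: £1939
```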

    Tips

    • If you decide to apply, don’t wait: requirements may change and may get harder. For example, when I was reading the guidelines back in June 2019, I only needed 2 reference letters to apply. When I was applying in Dec. 2019, I was surprised to find that they had increased it to 3 😅.
    • When asking for reference letters, make it as easy as possible for the referee to write them:
      • Remind them of what you did and what they should write about
      • Provide them with a template that they can easily fill
    • If you don’t have the referee’s resume, link to their LinkedIn profile in the reference letter instead. For example, one of my referees, being a professor, got very busy and it was hard to get them to reply to my emails.

    Application Example

    If you have checked gov.uk as well as the Tech Nation guide and you think you might benefit from seeing what a complete application might look like, check my application example.

    ]]>
    Ahmad Baracat