Journal Entries

Journal Entry, January 12th to January 19th 2024

January 19, 2024 · 3 min read

Laravel

I ran into an issue using the groupBy() method for collections where I wanted to "roll up" line items to list a single SKU and the quantity instead of one item per entry. This worked fine until "SKU sharing" was introduced where the same SKU could have different price points. In this case, it was a free opening night but the concept of "early bird pricing" was also introduced, hoping to share the same SKU. I'm unsure if this is against typical eCommerce rules but this is to synchronize orders into a centralized system, not a widespread concern.

A Deeper Look

The code looked something like:

return $registrations->groupBy('ticketLedgerCode')->map(function (Collection $registrations, string $key) {
    $quantity = $registrations->count();
    $registration = $registrations->first();

    return [
        'line_type' => 1,
        'item_number' => $registration?->ticketLedgerCode,
        'description' => $registration?->ticketName,
        'unit_price' => $this->getAmountFromStripe($registration?->price),
        'discount_code' => $registration?->promoCode,
        'discount_price' => $registration?->price,
        'discount_charge' => $registration?->charge,
        'quantity_ordered' => $quantity,
        'quantity_UOM' => "EA", // default
        'start_date' => date('Y-m-d'),
        'duration' => 12,
    ];
})->values()->toArray();

You may notice part of the problem immediately. The groupBy method converts an existing collection into a subcollection where $registration is the group of distinct ticketLedgerCode. Calling first() here is where the problem originates as this collection can have different prices that I didn't account for. After mapping over the result we use values() because the collection key is the ticketLedgerCode which we don't really need, we only care about the underlying result.

Correction

The groupBy method takes an array but it was harder to reason about for me and more difficult to traverse to the point I cared about. In the example above, $registrations would have the price as a key and a collection of the items that matched that price.

The result of the nesting looks something like:

{
  "123456789001": {
    "25900": [
      (registration)
    ]
  }
}

The approach that was easier to reason about was to nest another groupBy clause for the price field:

return $registrations->groupBy('ticketLedgerCode')->map(function (Collection $itemNumbers, string $key) {
    return $itemNumbers->groupBy('price')->map(function (Collection $prices, string $key) {
        $quantity = $prices->count();
        $registration = $prices->first();

        return [
            'line_type' => 1,
            'item_number' => $registration?->ticketLedgerCode,
            'description' => $registration?->ticketName,
            'unit_price' => $this->getAmountFromStripe($registration?->price),
            'discount_code' => $registration?->promoCode,
            'discount_price' => $registration?->price,
            'discount_charge' => $registration?->charge,
            'quantity_ordered' => $quantity,
            'quantity_UOM' => "EA",
            'start_date' => date('Y-m-d'),
            'duration' => 18,
        ];
    })->values();
})->values()->flatten(1)->toArray();

I had originally used first() instead of flatten(1) as I incorrectly assumed the outer array would always be the same but in the case of different SKUs the output is different than the JSON above:

{
  "123456789000": {
    "19900": [
      (registrations)
    ],
    "25900": [
      (registrations)
    ],
  },
  "123456789001": {
    "19900": [
      (registrations)
    ],
    "25900": [
      (registrations)
    ]
  }
}

The JSON notation is a little misleading using objects vs arrays but in Laravel and PHP in general this is represented as nested associative arrays.

Aside

Turns out there is an approach using groupBy(['ticketLedgerCode', 'price']) that I just worked out by creating this journal entry. Instead of $itemNumbers->groupBy('price')->map the internal section should just be $itemNumbers->map. If I were to nest a 3rd or 4th time, I would need to map down to those levels and flatten(1) would roll everything back together.

Conclusion

The real superstar here is flatten(1) as I incorrectly assumed it was the same as flatten(), but it isn't. The method defaults to infinity and will turn an array of records into an array of every single property, which is not what I wanted.

I'm not going to change the code to use an array as I don't mind the explicit groupBy call.

I had also tried using both Cody and ChatGPT to explain the issue and while Cody would generally do better during its beta, it wouldn't produce what I was looking for. ChatGPT 3.5, on the other hand, correctly intuited that flatten(1) was what I needed. AI code tools are great for explanations, and correct completions in Cody are like magic, but the hallucinations range from subtle to extremely problematic.

Journal Entry, July 14th to July 21st 2023

July 21, 2023 · 2 min read

MariaDB

Wed Jul 19 I ran into an issue where I needed to process a number of Stripe webhooks stored in a webhook_calls table. To mark progress, I thought it would be useful to join the invoices table but I ran into a snag. In the webhook payload, the JSON has double quotes so an invoice shows as something like "in_" and in the invoices table it isn't quoted as the varchar covers it. I looked into the QUOTE() function first but it turns out TRIM('"' FROM) is what I was after. This trims the leading and ending quote characters to match the invoices datatype.

An example of the SQL statement used:

SELECT
 webhook_calls.id,
 payload_type,
 JSON_EXTRACT(webhook_calls.payload, '$.data.object.id') AS `type_id`,
 JSON_EXTRACT(webhook_calls.payload, '$.data.object.object') AS `type`,
 JSON_EXTRACT(webhook_calls.payload, '$.data.object.lines.data[0].metadata.user_id') AS `user_id`,
 JSON_EXTRACT(webhook_calls.payload, '$.data.object.lines.data[0].plan.nickname') AS `nickname`,
 JSON_EXTRACT(webhook_calls.payload, '$.data.object.lines.data[0].plan.metadata.base_sku') AS `base_sku`,
 webhook_calls.created_at
FROM
 webhook_calls
LEFT JOIN
 invoices ON invoices.invoice_id = TRIM('"' FROM JSON_EXTRACT(webhook_calls.payload, '$.data.object.id'))
WHERE
 MONTH(webhook_calls.created_at) = 7
 AND YEAR(webhook_calls.created_at) = 2023
 AND JSON_EXTRACT(webhook_calls.payload, '$.data.object.lines.data[0].plan.metadata.base_sku') IS NOT NULL
 AND invoices.invoice_id IS NULL
ORDER BY
 JSON_EXTRACT(webhook_calls.payload, '$.data.object.lines.data[0].metadata.user_id')
 DESC,
 payload_type ASC

This gets the webhook_calls records that have been created July 2023 with a base_sku not null as this processor ignores any products that aren't explicitly configured. This paired down thousands of events into a manageable number. We process only invoice events so this was enough to create a list where I could manually match up the invoice.created to its pair invoice.payment_succeeded event. I thought about automating this but it wasn't that many records to deal with and I wasn't sure what would happen if this was outside of artisan tinker. I needed the jobs to be synchronous to make sure the invoices record was created first, followed by the subscriptions that gets created from the second event.

If I were to automate this I would sort by created_at, find the first event, dispatch, then find the second and sleep for a second before dispatching again. As a sanity check I could check that the invoices was created and subscriptions after that.

I'm not the biggest fan of MariaDB as I really wish it had MySQL's -> and ->> shorthand for the JSON functions but this SQL is compatible with both systems.

Journal Entry, June 23rd to June 30th 2023

June 30, 2023 · 2 min read

Digital Ocean Spaces

Due to an issue with Digital Ocean Spaces in the NYC3 region, we migrated a 16TB bucket to their newest SFO2 region. The primary bottleneck we faced was ListObjectsV2 was exceptionally slow after the bucket filled and the user load began hammering it. We tried numerous troubleshooting steps on the same region and would see some improvement before periodically crashing. It also was not consistent where things would be fine some days but not others, and I suspect load was a contributing factor.

Why mention the backstory? Because I think it's worth highlighting that sometimes the quickest solution in any cloud offering is to move regions. Hindsight was 20/20 on the whole ordeal.

k6

Where does k6 fit into this? My task was to load-test the new data center for peace of mind, knowing we couldn't simulate the type of load our users brought. I could test our droplet that leveraged Spaces, but the rate limits made that more troublesome than necessary.

This issue describes the problem I had, as well as my naivete showing some rookie mistakes. Understanding how to alter the host was crucial as Spaces do not conform to the s3.region.amazonaws.com format of AWS but use region.digitaloceanspaces.com instead.

With that out of the way, I tinkered with reimplementing listBuckets to see if I could isolate the problem. In a stroke of dumb luck and getting Insomnia functional using the insomnia-plugin-aws-iam-v4 plugin for authentication, I stumbled on the fix.

It turns out that instead of signedRequest.body || '' the snippet should be:

const res = http.request(method, signedRequest.url, signedRequest.body || null, {
    headers: signedRequest.headers,
})

Spaces cannot seemingly process a blank body, but the body can be null.

Conclusion

The problem was a fluke we've never seen in 6 years of leveraging Digital Ocean Spaces. It was intermittent to where we would never see it happen until recently when it affected everyone. Region switching wasn't easy, as several processes are slower due to droplets being in NYC3. Most problems did not require this response for some of our most annoying corrections.

I had also given up numerous times on finding a solution with k6 because it's an implementation of JavaScript on a Go runtime, so we can't always throw in random JS packages and hope for the best. I also concluded we didn't need to load test services that we couldn't stress too much.

I'm not a fan of giving up on anything though I know I can't win every battle. I don't technically have a solution here either, as my attempts to parse the XML using the same methods aren't producing the results I expected. I believe this change would possibly be a net positive for S3-compatible stores like Minio, where I would want to stress-test an instance on Fly.io or other droplets.

Journal Entry, February 24th to March 3rd 2023

March 3, 2023 · 2 min read

Postgres

I have a project that compliments Laravel Forge by storing a centralized copy of the .env file for all sites and servers in the fleet of Digital Ocean droplets.

There is an environment_raw text column for the .env file with an embedded environment jsonb column that has an id, key, and value tuple when we look at the field in something like TablePlus.

Get a distinct list of environment variable keys

WITH fields_rowset AS (
    SELECT
        jsonb_array_elements(environment) AS fields
    FROM
        sites
) SELECT DISTINCT
    fields -> 'key' AS key_name
    -- fields -> 'value' AS key_value
FROM
    fields_rowset
WHERE
    fields -> 'value' IS NOT NULL
ORDER BY
    key_name ASC

Get a list of sites where STRIPE_SECRET is the specified value

The following queries utilize a few different methodologies for searching the embedded schema. We start with a singular key-value pair.

SELECT
    *
FROM
    sites
WHERE
    environment @> '[{"key": "STRIPE_SECRET", "value": "sk_test_DUMMY"}]';

Get a list of sites where STRIPE_SECRET is one of the specified values

SELECT
    *
FROM
    sites
WHERE
    environment @> ANY (ARRAY['[{"key": "STRIPE_SECRET", "value": "sk_live_DUMMY"}]']::jsonb[]);

Get a list of sites where STRIPE_SECRET is any value

SELECT
    *
FROM
    sites
WHERE
    environment @> ANY (ARRAY['[{"key": "STRIPE_SECRET"}]']::jsonb[]);

Get a list of sites where any variable has the specified value

SELECT
    *
FROM
    sites
WHERE
    environment @> ANY (ARRAY['[{"value": "sk_live_DUMMY"}]']::jsonb[]);

Get a list of sites where STRIPE_KEY or STRIPE_SECRET are their specified values

The array and any keywords are more versatile approaches but require casting to jsonb explicitly. There is another syntax like environment -> 'key' @> '"APP_DEBUG"' but I can't produce the expected result. It's an empty result despite being configured in just about every Laravel environment.

SELECT
    *
FROM
    sites
WHERE
    environment @> ANY (ARRAY['[{"key": "STRIPE_KEY", "value": "pk_test_DUMMY"}]', '[{"key": "STRIPE_SECRET", "value": "sk_test_DUMMY"}]']::jsonb[]);

Conclusion

While it is redundant for an embedded schema of key/value pairs, it makes the data model easier to query than guessing which variables are present on any given system. Some machines have WordPress installed with totally different settings than Laravel. There are also multiple Laravel versions where certain variables have changed names throughout the years.

There are shortcomings to the approach as this is separate data to Forge or the servers themselves, but it is a much faster interface than making API calls to every server to find this information. One shortcoming was not having a versioned schema for historical data for auditing reasons.

After creating a very rudimentary version of this, I realized tools like Doppler or Infisical offer a more robust centralized place for secrets management far beyond what my initial version could deliver.

Journal Entry, October 22nd to October 29th 2021

October 29, 2021 · 1 min read

Phoenix Framework

Mon Nov 1 I took the weekend to work through how to add a name column to the scaffolding generated by mix phx.gen.auth to more closely match Laravel Jetstream's profile update flow. In Jetstream, the photo, name, and email address are part of the same change request.

For gen.auth, email changes are gated behind a validation request happening first. This makes sense, you want to make sure the email is valid and accessible before you change the user to it. The problem with this is that lumping the name into the same changset creates a slight code smell.

To carve out saving name involves somewhat of a side effect as we modify email_changeset in lib/opportunity_knocks/accounts/user.ex to allow the pattern %{changes: %{name: _}} = changeset -> changeset |> delete_change(:email) in the case statement. This allows name to go through but we push email out of the change in the event both name and email are updated at the same time. This lets us change apply_user_email in lib/opportunity_knocks/accounts.ex to use the snippet:
```
|> case do
  %{changes: %{name: _}} = changeset -> changeset |> Repo.update()
  %{changes: %{email: _}} = changeset -> changeset |> Ecto.Changeset.apply_action(:update)
```
This is where the smell is particularly strong as I'm doing the same pattern matching to allow the change in the case of name and ignore it for email. I believe as I gain more Elixir knowledge I may refactor this but I also know how deviation can be due to my exposure to Laravel Shift. Shift uses a series of patches to perform framework upgrades. Files that are not altered get changed more easily and when a deviation is detected (merge conflict), Shift goes with its version so you are consistently applying your changes on top of the now improved version. This is the approach we would all take given time or time machines.

Journal Entry, November 27th to December 4th 2020

December 4, 2020 · 1 min read

Laravel

Tue Dec 1
- Forgot to transfer environment variables for the following:
```
QUEUE_CONNECTION=redis
REDIS_DB=0
REDIS_CACHE_DB=1
REDIS_HORIZON_DB=2
```
- This caused 2 issues, horizon was no longer split between databases but since 0, 1, and 2 are the defaults this really didn't impact anything.
- The bigger problem was QUEUE_CONNECTION defaulted to synchronized jobs instead of via redis so Horizon had nothing to do.
- Getting the environment variables and horizon working didn't show any change, I believe because the data seeded in the redis database was trying to mix earlier version Horizon 2.x with 3.x usage patterns.
- Flushing the db with flushdb reset horizon and now running jobs via dispatch() show up as expected.

Journal Entry, November 13th to November 20th 2020

November 20, 2020 · 1 min read

MySQL

Tue Nov 17
- Research into keeping the latest x rows
- You can't use skip without a take
- You cant delete with a skip/take (some mysql limitation)
```
DELETE FROM `apiLogs`
WHERE id <= (
    SELECT id
    FROM (
        SELECT id
        FROM `apiLogs`
        ORDER BY id DESC
        LIMIT 1 OFFSET 42 -- keep this many records
    ) foo
)
```
- Ended up going with a more generic version as rather than start from now we can just start at zero and delete up to the count we're looking for.
```
SELECT
    `id`
FROM
    `apiLogs`
WHERE
    `application_id` NOT in('SAGESYNC', 'SAGESYNCMS', 'SAGESYNCXC4')
ORDER BY
    `created_at` ASC
LIMIT 3500
```

Journal Entry, November 6th to November 13th 2020

November 13, 2020 · 2 min read

Laravel

Mon Nov 9

To introduce a blocklist to filter out apiLog records, we first filter by application_id and then 1 or more arrays of values.
This lets us do thing like filter multiple statuses, methods, or uris.
Ideally we could take it a step further and filter by only status 422 or DELETE methods but the 1 or more aspect makes things seemingly more difficult.
Initially my brain halted using nested conditionals in a "if value is present then check the other fields" repeated.
The null coalescing operator is doing this for us by either returning null when nothing is found or the result of the in_array check if present.
Because something like null ?? false ?? true returns true, it feels almost hackish but I suspect works for any field we throw at it.

My first attempt was to have an array of application_id but this made the Collection->where() usage more difficult.

{
    "blocklist": [
        {
            "application_id": ["SAGESYNC", "SAGESYNCMS"],
            "status": ["422"]
        },
        {
            "application_id": ["SAGESYNCXC4"]
        }
    ]
}

Ultimately we had to settle on this form as the where() clause looks for values.

{
    "blocklist": [
        {
            "application_id": "SAGESYNC",
            "status": ["422"]
        },
        {
            "application_id": "SAGESYNCMS",
            "status": ["422"]
        },
        {
            "application_id": "SAGESYNCXC4"
        }
    ]
}

/**
 * Filter columns against configured blocklist.
 *
 * @param string|null $application_id
 * @param string|null $method
 * @param string|null $uri
 * @param string|null $status
 * @return void
 */
public function isBlocked($application_id, $method, $uri, $status)
{
    $result = false;
    $blocklist = collect($this->getBlocklist());
    $blocked = $blocklist->where('application_id', $application_id);
    $blocked->each(function ($block) use (&$result, $method, $uri, $status) {
        $methods = Arr::get($block, 'method', null);
        $uris = Arr::get($block, 'uri', null);
        $statuses = Arr::get($block, 'status', null);
        $filtered = $this->hasValue($methods, $method) ?? $this->hasValue($uris, $uri) ?? $this->hasValue($statuses, $status);
        if (is_null($filtered) || $filtered === true) {
            // $filtered is true when a value is found or null when all fields are null.
            $result = true;
        }
    });

    return $result;
}

/**
 * An array has value (has normally checks for keys not values).
 *
 * @param array|null $array
 * @param mixed $value
 * @return boolean
 */
private function hasValue($array, $value)
{
    if (is_null($array)) {
        return null;
    }

    return in_array($value, $array);
}

/**
 * Retrieve blocklist section from config file.
 *
 * @return null|array
 */
private function getBlocklist()
{
    $localDisk = \Storage::disk('local');
    $contents = $localDisk->get("data/apiLogs/config.json");
    $json = json_decode($contents, true);

    if (!is_null($json)) {
        return $json['blocklist'];
    }

    return null;
}

Journal Entry, October 30th to November 6th 2020

November 6, 2020 · 1 min read

Laravel

Mon Nov 2
- The process has been signaled with signal "9".
- Ran out of virtual memory, the local php setup is having issues and should be restarted. When running in a VM, the VM memory may be exhausted.
- See https://laracasts.com/discuss/channels/laravel/the-process-has-been-signaled-with-signal-9
```
valet restart
```

Elixir

Fri Nov 6
- Elixir Koans upgraded with broken functionality.
- Took a minute to try to install Erlang 20.0 (see below).
- Once I got Erlang 20.x installed, realized Elixir 1.5.x is no longer required.
- Recent changes introduced a break where now Elixir 1.10+ is required.
- The docker image and my local .tool-versions for asdf no longer work.
  - Google: asdf erlang make[4]: *** [obj/x86_64-apple-darwin19.6.0/opt/smp/inet_drv.o] Error 1
  - Clicked https://github.com/asdf-vm/asdf-erlang/issues/158
  - Follow to https://github.com/asdf-vm/asdf-erlang/issues/157
  - Correct via https://github.com/asdf-vm/asdf-erlang/issues/157#issuecomment-696037378
```
export CFLAGS="-O2 -g -fno-stack-check -Wno-error=implicit-function-declaration"
asdf install erlang 23.0.3
```

Journal Entry, March 20th to March 27th 2020

March 27, 2020 · 1 min read

MariaDB

Wed Mar 25

MariaDB generated JSON columns are possible if we unpack the syntax.

// $table->string('payload_type')->storedAs('payload->>"$.type"')->after('name');
$table->string('payload_type')->storedAs("JSON_UNQUOTE(JSON_EXTRACT(`payload`, '$.type'))")->after('name');