cbloom rants

5/21/2015

05-21-15 - Umm

I sent a lawyer an email yesterday.

Today they sent me back an email saying :

"I need your email address so I can send you the documents you need to sign"

Umm... you are not inspiring great confidence in your abilities.

Also, pursuant to my last post about spam - pretty much all my correspondence with lawyers over the past few months, Google decides to put in the spam folder. I keep thinking "WTF why didn't this lawyer get back to me - oh crap, go check the spam". Now, I'm totally down with the comic social commentary that Google is making ("ha ha, all email from lawyers is spam, amirite? lol"). But WTF your algorithms are insanely broken. I mean, fucking seriously you suck so bad.

05-21-15 - Software Patents are Fucking Awesome

Awesome. Spotted on encode.ru. It was inevitable I suppose :

"System and method for compressing data using asymmetric numeral systems with probability distributions"

By these tards :

Storleap

Someone in the UK go over and punch them in the balls.

For those not aware of the background, ANS is probably the biggest invention in data compression in the last 20 years. Its inventor (Jarek Duda) has explicitly tried to publish it openly and make it patent-free, because he's awesome.

In the next 10 years I'm sure we will get patents for "using ANS with string-matching data compression", "using ANS with block mocomp data compression", "using ANS as a replacement for Huffman coding", "deferred summation with ANS", etc. etc. Lots of brilliant inventions like that. Really stimulating for innovation.

(as has happened over and over in data compression, and software in general in the past; hey let's take two obvious previously existing things; LZ string matching + Huffman = patent. LZ + hash table = patent. JPEG + arithmetic = patent. Mocomp + Huffman = patent. etc. etc.)

(often glossed over in the famous Stac-Microsoft suit story is the question of WHAT THE FUCK the LZS patent was supposed to be for? What was the invention there exactly? Doing LZ with a certain fixed bit encoding? Umm, yeah, like everyone does?)

Our patent system is working great. It obviously protects and motivates the real inventors, and doesn't just act as a way for the richest companies to lock in semi-monopolies of technologies they didn't even invent. Nope.

Recently at RAD we've made a few innovations related to ANS that are mostly in the vein of small improvements or clever usages, things that I wouldn't even imagine patenting, but of course that's wrong.

I've also noticed in general a lot of these vaporware companies in the UK. We saw one at RAD a few years ago that claimed to use "multi-dimensional curve interpolation for data compression" or some crackpot nonsense. There was another one that used alternate numeral systems (not ANS, but p-adic or some such) for god knows what.

They generally get some PhDs from Cambridge or whatever to be founders. They bring a bunch of "industry luminaries" onto the board. They patent a bunch of nonsense. And then ...

... profit? There's a step missing where they actually ever make anything that works. But I guess sometimes they get bought for their vapor, or they manage to get a bullshit patent that's overly-general on something they didn't actually invent, and then they're golden.

I wonder if these places are getting college-backed "incubation" incentives? Pretty fucking gross up and down and all around. Everyone involved is scum.


On a more practical note, if anyone knows the process for objecting to a patent in the UK, chime in.

Also, shame on us all for not doing more to fight the system. All our work should be going in the Anti-Patent Patent Pool.

05-21-15 - LZ-Sub

LZ-Sub decoder :

delta_literal = get_sub_literal();

if ( delta_literal != 0 )
{
    *ptr++ = delta_literal + ptr[-lastOffset];
}
else // delta_literal == 0
{
    if ( ! get_offset_flag() )
    {
        *ptr++ = ptr[-lastOffset];
    }
    else if ( get_lastoffset_flag() )
    {
        int lo_index = get_lo_index();
        lastOffset = last_offsets[lo_index];
        // do MTF or whatever using lo_index
        
        *ptr++ = ptr[-lastOffset];
        // extra 0 delta literal implied :
        *ptr++ = ptr[-lastOffset];
    }
    else
    {
        lastOffset = get_offset();
        // put offset in last_offsets set
        
        *ptr++ = ptr[-lastOffset];
        *ptr++ = ptr[-lastOffset];
        // some automatic zero deltas follow for larger offsets
        if ( lastOffset > 128 )
        {
            *ptr++ = ptr[-lastOffset];
            if ( lastOffset > 16384 )
            {
                *ptr++ = ptr[-lastOffset];
            }
        }   
    }

    // each single zero is followed by a zero runlen
    //  (this is just a speed optimization)
    int zrl = get_zero_runlen();
    while(zrl--)
        *ptr++ = ptr[-lastOffset];
}

This is basically LZMA. (sub literals instead of bitwise-LAM, but structurally the same) (also I've reversed the implied structure here; zero delta -> offset flag here, whereas in normal LZ you do offset flag -> zero delta)

This is what a modern LZ is. You're sending deltas from the prediction. The prediction is the source of the match. In the "match" range, the delta is zero.

The thing about modern LZ's (LZMA, etc.) is that the literals-after-match (LAMs) are very important too. These are the deltas after the zero run range. You can't really think of the match as just applying to the zero-run range. It applies until you send the next offset.

You can also of course do a simpler & more general variant :

Generalized-LZ-Sub decoder :


if ( get_offset_flag() )
{
    // also lastoffset LRU and so on not shown here
    lastOffset = get_offset();
}

delta_literal = get_sub_literal();

*ptr++ = delta_literal + ptr[-lastOffset];

Generalized-LZ-Sub just sends deltas from prediction. Matches are a bunch of zeros. I've removed the acceleration of sending zeros as a runlen for simplicity, but you could still do that.

The main difference is that you can send offsets anywhere, not just at certain spots where there are a bunch of zero deltas generated (aka "min match lengths").

This could be useful. For example when coding images/video/sound, there is often not an exact match that gives you a bunch of exact zero deltas, but there might be a very good match that gives you a bunch of small deltas. It would be worth sending that offset to get the small deltas, but normal LZ can't do it.

Generalized-LZ-Sub could also give you literal-before-match. That is, instead of sending the offset at the run of zero deltas, you could send it slightly *before* that, where the deltas are not zero but are small.
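For example, a toy encoder heuristic might look like this (entirely hypothetical, just to illustrate the idea) : among candidate offsets from your match finder, pick the one that minimizes the estimated cost of the next W delta literals, even if none of them are exactly zero :

static int num_bits(U8 x) { int n = 0; while (x) { n++; x >>= 1; } return n; }

int best_offset_for_window(const U8 * ptr, const int * candidate_offsets,
                           int num_candidates, int W)
{
    int best_off = candidate_offsets[0];
    int best_cost = 1<<30;
    for (int c=0;c<num_candidates;c++)
    {
        int off = candidate_offsets[c];
        int cost = 0;
        for (int i=0;i<W;i++)
        {
            U8 delta = (U8)( ptr[i] - ptr[i-off] );
            // crude cost estimate : zero deltas are very cheap,
            // small deltas cost a few bits, big deltas cost more
            cost += delta ? ( 2 + num_bits(delta) ) : 1;
        }
        if ( cost < best_cost ) { best_cost = cost; best_off = off; }
    }
    return best_off;
}

A real encoder would use actual model costs instead of that crude estimate, but the point is the decision becomes "which offset makes the residual cheapest", not "which offset gives the longest exact match".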

(when compressing text, "sub" should be replaced with some kind of smart lexicographical distance; for each character precompute a list of its most likely substitution character in order of probability.)
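Something like this could build that substitution table (sketch only; all the names are made up, and counts[pred][actual] would come from whatever training data or earlier pass you have) :

struct SubRank
{
    U8 rank_of[256][256];   // rank_of[pred][actual] : small if likely
    U8 char_at[256][256];   // inverse mapping, for the decoder

    void build(const U32 counts[256][256])  // counts[pred][actual]
    {
        for (int p=0;p<256;p++)
        {
            int order[256];
            for (int i=0;i<256;i++) order[i] = i;

            // sort actual chars by how often they substitute for prediction p :
            // (dumb selection sort, it's 256 entries, who cares)
            for (int i=0;i<256;i++)
            {
                int best = i;
                for (int j=i+1;j<256;j++)
                    if ( counts[p][order[j]] > counts[p][order[best]] )
                        best = j;
                int t = order[i]; order[i] = order[best]; order[best] = t;
            }

            for (int r=0;r<256;r++)
            {
                rank_of[p][ order[r] ] = (U8) r;
                char_at[p][ r ] = (U8) order[r];
            }
        }
    }
};

// encoder sends rank_of[ ptr[-lastOffset] ][ *ptr ] instead of the raw sub;
// decoder does *ptr = char_at[ ptr[-lastOffset] ][ rank ];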

LZ is a bit like a BWT, but instead of the contexts being inferred by the prefix sort, you transmit them explicitly. Weird.

5/17/2015

05-17-15 - The Google Spam Filter is Intentionally Bad

I'm convinced at this point that Google intentionally filters spam wrong.

Not in a nefarious way, like haha we're going to send your good mails to "spam" and let the crap through! Take that!

But actually in a sort of more deeply evil way. A capitalist way. They specifically *want* to allow through mass-mailings from corporations that they do not consider spam.

In my opinion, those are all spam. There is not a single corporate mass-mailing that I ever intentionally subscribed to.

Basically there's a very very easy spam filtering problem :

Easy 1. Reject all mass-mailings. Reject all mailings about sales, products, offers. Reject all mailings about porn or penises or Nigerian princes.

Easy 2. Allow through all mail that's hand-written by a human to me. Particularly from one that I have written to in the past.

That would be fine with me. That would get 99.99% of it right for me.

They don't want to solve that problem. Instead they try to solve the much-harder problem of allowing through viagra offers that are for some reason not spam. For the email user who *wants* to get mass-mail offers of 50% off your next order.

I just don't understand how "yeah, let's go out to dinner" from my friend, who is responding with a quote to a fucking mail that I sent, goes in the Spam box, but "get direct email mass-marketing secrets to double your business!" goes in my inbox. How can it be so bad? I just really don't understand it. Fucking the most basic keyword include/exclude type of filter could do better.

I should have just written my own, because it's the kind of problem that you want to be constantly tweaking on. Every time a mail is misclassified, I want to run it through my system and see why that happened and then try to fix it.

It would be SOOO fucking easy for them. Being in a position as a central mail processor, they can tell which mails are unique and which are mass-sent, and just FUCKING BLOCK ALL THE MASS-SENT MAIL. God dammit. You are fucking me up and I know you're doing it intentionally. I hate you.


I mean, fuck. It's ridiculous.

They are responding to a mail I sent. The mail I sent is fucking quoted right there. I sent the fucking mail from gmail so you can confirm it's for real. I sent to their address with gmail. AND YOU PUT THEIR REPLY IN SPAM. WTF WTF WTF

But this is not spam :

Report: creative teamwork is easier with cloud-based apps

Businesses Increase Screening of Facebook, Twitter Before Hiring

Trying to solve the Prospecting Paradox?

I'd like to add you to my professional network on LinkedIn


Maybe I'm being a bit overly simplistic and harsh. Maybe there are mass-mailings that look spammish, but you actually want to get? Like, your credit card bill is due?

I'm not sure. I'm not sure that I ever need to get any of that. I don't need those "shipping confirmation" emails from Amazon. If they just all got filed to the "mass mail" folder, I could go look for them when I need them.


I want to make my own private internet. And then not allow anyone else to use it because you'd all just fuck it up.

5/16/2015

05-16-15 - Threading Primitive - monitored semaphore

A monitored semaphore allows two-sided waiting :

The consumer side decs the semaphore, and waits on the count being positive.

The producer side incs the semaphore, and can wait on the count being a certain negative value (some number of waiting consumers).

Monitored semaphore solves a specific common problem :

In a worker thread system, you may need to wait on all work being done. This is hard to do in a race-free way using normal primitives. Typical ad-hoc solutions may miss work that is pushed during the wait-for-all-done phase. This is hard to enforce, ugly, and makes bugs. (it's particularly bad when work items may spawn new work items).

I've heard of many ad-hoc hacky ways of dealing with this. There's no need to muck around with that, because there's a simple and efficient way to just get it right.

The monitored semaphore also provides a race-free way to snapshot the state of the work system - how many work items are available, how many workers are sleeping. This allows you to wait on the joint condition - all workers are sleeping AND there is no work available. Any check of those two using separate primitives is likely a race.

The implementation is similar to the fastsemaphore I posted before.

"fastsemaphore" wraps some kind of underlying semaphore which actually provides the OS waits. The underlying semaphore is only used when the count goes negative. When count is positive, pops are done with simple atomic ops to avoid OS calls. eg. we only do an OS call when there's a possibility it will put our thread to sleep or wake a thread.

"fastsemaphore_monitored" uses the same kind atomic variable wrapping an underlying semaphore, but adds an eventcount for the waiter side to be triggered when enough workers are waiting. (see who ordered event count? )

Usage is like this :


To push a work item :

push item on your queue (MPMC FIFO or whatever)
fastsemaphore_monitored.post();

To pop a work item :

fastsemaphore_monitored.wait();
pop item from queue

To flush all work :

fastsemaphore_monitored.wait_for_waiters(num_worker_threads);

NOTE : in my implementation, post & wait can be called from any thread, but wait_for_waiters must be called from only one thread. This assumes you either have a "main thread" that does that wait, or that you wrap that call with a mutex.
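Rough sketch of how that fits into a worker system (hypothetical names; "mpmc_queue", "os_semaphore", "WorkItem", etc. are stand-ins for whatever you already have) :

fastsemaphore_monitored<os_semaphore> g_work_sem;
mpmc_queue<WorkItem *> g_work_queue;

void push_work(WorkItem * item)
{
    g_work_queue.push(item);
    g_work_sem.post();
}

void worker_thread_loop()
{
    for(;;)
    {
        g_work_sem.wait();                 // sleeps when there's no work
        WorkItem * item = g_work_queue.pop();
        if ( item == NULL ) break;         // eg. a shutdown sentinel
        item->run();                       // may call push_work() again
    }
}

void flush_all_work()   // "main thread" only
{
    // returns only when every worker is asleep and no work is pending,
    // even if work items spawned more work while we were waiting :
    g_work_sem.wait_for_waiters(NUM_WORKER_THREADS);
}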

template <typename t_base_sem>
class fastsemaphore_monitored
{
    atomic<S32> m_state;
    eventcount m_waiters_ec;
    t_base_sem m_sem;

    enum { FSM_COUNT_SHIFT = 8 };
    enum { FSM_COUNT_MASK = 0xFFFFFF00UL };
    enum { FSM_COUNT_MAX = ((U32)FSM_COUNT_MASK>>FSM_COUNT_SHIFT) };
    enum { FSM_WAIT_FOR_SHIFT = 0 };
    enum { FSM_WAIT_FOR_MASK = 0xFF };
    enum { FSM_WAIT_FOR_MAX = (FSM_WAIT_FOR_MASK>>FSM_WAIT_FOR_SHIFT) };

public:
    fastsemaphore_monitored(S32 count = 0)
    :   m_state(count<<FSM_COUNT_SHIFT)
    {
        RL_ASSERT(count >= 0);
    }

    ~fastsemaphore_monitored()
    {
    }

public:

    inline S32 state_fetch_add_count(S32 inc)
    {
        S32 prev = m_state($).fetch_add(inc<<FSM_COUNT_SHIFT,mo_acq_rel);
        S32 count = ( prev >> FSM_COUNT_SHIFT );
        RR_ASSERT( count < 0 || ( (U32)count < (FSM_COUNT_MAX-2) ) );
        return count;
    }

    // warning : wait_for_waiters can only be called from one thread!
    void wait_for_waiters(S32 wait_for_count)
    {
        RL_ASSERT( wait_for_count > 0 && wait_for_count < FSM_WAIT_FOR_MAX );
        
        S32 state = m_state($).load(mo_acquire);
        
        for(;;)
        {
            S32 cur_count = state >> FSM_COUNT_SHIFT;

            if ( (-cur_count) == wait_for_count )
                break; // got it
        
            S32 new_state = (cur_count<<FSM_COUNT_SHIFT) | (wait_for_count << FSM_WAIT_FOR_SHIFT);
            
            S32 ec = m_waiters_ec.prepare_wait();
            
            // double check and signal what we're waiting for :
            if ( ! m_state($).compare_exchange_strong(state,new_state,mo_acq_rel) )
                continue; // retry ; state was reloaded
            
            m_waiters_ec.wait(ec);
            
            state = m_state($).load(mo_acquire);
        }
        
        // now turn off the mask :
        
        for(;;)
        {
            S32 new_state = state & FSM_COUNT_MASK;
            if ( state == new_state ) return;
        
            if ( m_state($).compare_exchange_strong(state,new_state,mo_acq_rel) )
                return; 
                
            // retry ; state was reloaded
        }
    }

    void post()
    {
        if ( state_fetch_add_count(1) < 0 )
        {
            m_sem.post();
        }
    }

    void wait_no_spin()
    {
        S32 prev_state = m_state($).fetch_add((-1)<<FSM_COUNT_SHIFT,mo_acq_rel);
        S32 prev_count = prev_state>>FSM_COUNT_SHIFT;
        if ( prev_count <= 0 )
        {
            S32 waiters = (-prev_count) + 1;
            RR_ASSERT( waiters >= 1 );
            S32 wait_for = prev_state & FSM_WAIT_FOR_MASK;
            if ( waiters == wait_for )
            {
                RR_ASSERT( wait_for >= 1 );
                m_waiters_ec.notify_all();
            }
            
            m_sem.wait();
        }
    }
    
    void post(S32 n)
    {
        RR_ASSERT( n > 0 );
        for(S32 i=0;i<n;i++)
            post();
    }
       
    bool try_wait()
    {
        // see if we can dec count before preparing the wait
        S32 state = m_state($).load(mo_acquire);
        for(;;)
        {
            if ( state < (1<<FSM_COUNT_SHIFT) ) return false;
            // dec count and leave the rest the same :
            //S32 new_state = ((c-1)<<FSM_COUNT_SHIFT) | (state & FSM_WAIT_FOR_MASK);
            S32 new_state = state - (1<<FSM_COUNT_SHIFT);
            RR_ASSERT( (new_state>>FSM_COUNT_SHIFT) >= 0 );
            if ( m_state($).compare_exchange_strong(state,new_state,mo_acq_rel) )
                return true;
            // state was reloaded
            // loop
            // backoff here optional
        }
    }
     
       
    S32 try_wait_all()
    {
        // see if we can dec count before preparing the wait
        S32 state = m_state($).load(mo_acquire);
        for(;;)
        {
            S32 count = state >> FSM_COUNT_SHIFT;
            if ( count <= 0 ) return 0;
            // swap count to zero and leave the rest the same :
            S32 new_state = state & FSM_WAIT_FOR_MASK;
            if ( m_state($).compare_exchange_strong(state,new_state,mo_acq_rel) )
                return count;
            // state was reloaded
            // loop
            // backoff here optional
        }
    }
           
    void wait()
    {
        int spin_count = rrGetSpinCount();
        while(spin_count--)
        {
            if ( try_wait() ) 
                return;
        }
        
        wait_no_spin();
    }

};

05-16-15 - LZ literals after match

Some vague rambling about LAMs.

LAMs are weird.

LAM0 , the first literal after a match, has the strong exclusion property (assuming maximum match lengths). LAM0 is strictly != lolit. (lolit = literal at last offset).

LAM1, the next literal after end of match, has the exact opposite - VERY strong prediction of LAM1 == lolit. This prediction continues but weakens as you go to LAM2, LAM3, etc.

In Oodle LZNA (and in many other coders), I send a flag for (LAM == lolit) as a separate event. That means in the actual literal coding path you still have LAM1 != lolit. (the LAM == lolit flag should be context-coded using the distance from the end of the match).
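In code, the flag path is something like this (sketch with made-up names, not the actual LZNA coder; "bin_model", "encode_bit", etc. are stand-ins for your binary coder and literal models) :

enum { NUM_LAM_CTX = 8 };

void code_literal_after_match(U8 lit, U8 lolit, int dist_from_match_end,
                              bin_model lam_eq_flag[NUM_LAM_CTX],
                              literal_models & lits)
{
    if ( dist_from_match_end > 0 )  // LAM0 is excluded, it's never == lolit
    {
        int ctx = ( dist_from_match_end < NUM_LAM_CTX ) ? dist_from_match_end : (NUM_LAM_CTX-1);
        encode_bit( lam_eq_flag[ctx], (lit == lolit) ? 1 : 0 );
        if ( lit == lolit )
            return;  // literal was sent entirely as the flag
    }

    // in this path lit != lolit is known (exclusion), but lolit is still
    // a strong predictor, so code the sub/excluded literal against it :
    encode_literal_with_exclusion( lits, lit, lolit );
}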

In all cases, even though you know LAM != lolit, lolit is still a very strong predictor for LAM. Most likely LAM is *similar* to lolit.

LAM is both an exclude AND a predictor!

What similar means depends on the file type. In text it means something like vowels stay vowels, punctuation stays punctuation. lolit -> LAM is sort of like a one-character substitution. In binary, it often means that they are numerically close. This means that the delta |LAM - lolit| is never zero, but is often small.

One of the interesting things about the delta is that it gives you a data-adaptive stride for a delta filter.

On some files, you can get huge compression wins by running the right delta filter. But the ideal delta distance is data-dependent (*). The sort of magic thing that works out is that the LZ match offsets will naturally pick up the structure & word sizes. In a file of 32-byte structs made of DWORDs, you'll get offsets of 4,8,12,32,etc. So when you take that offset and form the LAM sub, you're just doing a delta with that deduced stride. On DWORD or F32 data, you tend to get a lot of offset=4, so the LAM sub tends to just be doing a delta from the previous word (note of course this is a bytewise delta, not a proper dword delta).

(* = this is a huge thing that someone needs to work on; automatic detection of delta filters for arbitrary data; deltas could be byte,word,dword, other, from immediate neighbors or from struct/row strides, etc. In a compression world where we are fighting over 1% gains, this can be a 10-20% jump.)

Experimentally we have observed that LAMs are very rapidly changing. They benefit greatly from very quickly adapting models. They like geometric adaptation rates (more recent events are much more important). They cannot be modeled with large contexts (without very sophisticated handling of sparsity and fast adaptation), they need small contexts to get lots of events and statistical density. They seem to benefit greatly from modeling in groups (eg. bitwise or nibblewise or other), so that events on one symbol also affect other probabilities for faster group learning. Many of these observations are similar for post-BWT data. LAM sub literals do seem to behave like post-BWT data to some extent, and similar principles of modeling apply.

So, for example, just coding an 8-bit symbol using the 8-bit lolit as context is a no-go. In theory this would give you full modeling of the effects of lolit on the current symbol. In practice it dilutes your statistics way too much. (in theory you could do some kind of scheme where one count boosts other counts (or a secondary coding table a la PPMZ SEE), but in practice that's a mess). Also as noted previously, if you have the full 8-bit context, then whether you code the symbol raw or xor or sub is irrelevant, but if you do not have the full context then it does change things.
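One concrete way to get a small context (just a sketch of the general idea, not what any particular coder does) is to quantize lolit down to a few bits :

static inline int lolit_context(U8 lolit)
{
    // top 3 bits -> 8 contexts; enough statistical density to adapt fast,
    // but still separates the broad byte classes
    return lolit >> 5;
}

// then code the sub literal with a fast-adapting model per context :
//   encode_sub_literal( models[ lolit_context(lolit) ], (U8)(lit - lolit) );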

Related posts :

cbloom rants 08-20-10 - Deobfuscating LZMA
cbloom rants 09-14-10 - A small note on structured data
cbloom rants 03-10-13 - Two LZ Notes
cbloom rants 06-12-14 - Some LZMA Notes
cbloom rants 06-16-14 - Rep0 Exclusion in LZMA-like coders
cbloom rants 03-15-15 - LZ Literal Correlation Images

5/13/2015

05-13-15 - Skewed Pareto Chart

It's hard to see just the decomp speed in the normal Pareto Chart. It gets squished down over at the far-right Y-intercept.

The obvious fix is just to magnify the right side. This is a linear scaling of the data; *1 on the far left, *10 on the far right :

The far-left is still proportional to the compression ratio, the far right is proportional to the decompression speed. The compressor lines are still speedups vs. memcpy, but the memcpy baseline is now sloped.

I'm not really sure how I feel about the warped chart vs unwarped.

The Pareto curves are in fact sigmoids (tanh's).


speedup = 1 / (1/compression_ratio + disk_speed / decompress_speed)

speedup = 1 / (1/compression_ratio + exp( log_disk_speed ) / decompress_speed)

(here they're warped sigmoids because of the magnification; the ones back in the LZNA post are true sigmoids)

I believe (but have not proven) that a principle of the Pareto Frontier is that the maximum of all compressors should also be a sigmoid.


max_speedup(disk_speed) = MAX{c}( speedup[compressor c](disk_speed) );
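In code the chart is just this, swept over disk_speed (a sketch of the chart math, not the actual plotting code; ratio = raw size / compressed size, and both speeds are in the same units) :

struct Compressor { double ratio; double decode_speed; };

double speedup(const Compressor & c, double disk_speed)
{
    // time to load raw, divided by time to load compressed + decompress :
    return 1.0 / ( 1.0/c.ratio + disk_speed / c.decode_speed );
}

double max_speedup(const Compressor * cs, int num, double disk_speed)
{
    double best = 0;
    for (int i=0;i<num;i++)
    {
        double s = speedup(cs[i],disk_speed);
        if ( s > best ) best = s;
    }
    return best;
}

Sweeping disk_speed on a log scale is what makes these curves sigmoids in log_disk_speed.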

One of the nice things about these charts is it makes it easy to see where some compressors are not as good as possible. If we fit a sigmoid over the top of all the curves :

We can easily see that LZHLW and LZNIB are not touching the curve. They're not as good as they should be in space/speed. Even though nothing beats them at the moment (that I know of), they are algorithmically short of what's possible.

There are two things that keep compressors from being better in a space/speed way. There's 1. what is our current best known algorithm. And then there's 2. what is possible given knowledge of all possible algorithms. #2 is the absolute limit, and eventually it runs into a thermodynamic limit. In a certain amount of cpu time (cpu bit flips, which increase entropy), how much entropy can you take out of a given data stream? You can't beat that limit no matter how good your algorithm is. So our goal in compression is always to just find improvements in the algorithms to edge closer to that eventual limit.

Anyway. I think I know how to fix them, and hopefully they'll be up at the gray line soon.

5/11/2015

05-11-15 - ANS Minimal Flush

A detail for the record :

ANS (TANS or RANS) in the straightforward implementation writes a large minimum number of bytes.

To be concrete I'll consider a particular extremely bad case : 64-bit RANS with 32-bit renormalization.

The standard coder is :


initialize encoder (at end of stream) :

x = 1<<31

renormalize so x stays in the range x >= (1<<31) and x < (1<<63)

flush encoder (at the beginning of the stream) :

output all 8 bytes of x

decoder initializes by reading 8 bytes of x

decoder renormalizes via :

if ( x < (1<<31) )
{
  x <<= 32;  x |= get32(ptr); ptr += 4;
}

decoder terminates and can assert that x == 1<<31

this coder outputs a minimum of 8 bytes, which means it wastes up to 7 bytes on low-entropy data (assuming 1 byte minimum output and that the 1 byte required to byte-align output is not "waste").

In contrast, it's well known how to do minimal flush of arithmetic coders. When the arithmetic coder reaches the end, it has a "low" and "range" specifying an interval. "low" might be 64-bits, but you don't need to output them all, you only need to output enough such that the decoder will get something in the correct interval between "low" and "low+range".

Historically people often did arithmetic coder minimum flush assuming that the decoder would read zero-valued bytes after EOF. I no longer do that. I prefer to do a minimum flush such that decoder will get something in the correct interval no matter what byte follows EOF. This allows the decoder to just read past the end of your buffer with no extra work. (the arithmetic coder reads some # of bytes past EOF because it reads enough to fill "low" with bits, even though the top bits are all that are needed at the end of the stream).

The arithmetic coder minimum flush outputs a number of bytes proportional to log2(1/range) , which is the number of bits of information that are currently held pending in the arithmetic coder state, which is good. The excess is at most 1 byte.

So, to make ANS as clean as arithmetic coding we need a minimal flush. There are two sources of the waste in the normal ANS procedure outlined above.

One is the initial value of x (at the end of the stream). By setting x to (1<<31), the low end of the renormalization interval, we have essentially filled it with bits it has to flush. (the # of pending bits in x is log2(x)). But those bits don't contain anything useful (except a value we can check at the end of decoding). One way to remove that waste is to stuff some other value in the initial state which contains bits you care about. Any value you initialize x with, you get back at the end of decoding, so then those bits aren't "wasted". But it can be annoying to find something useful to put in there, since you don't get that value out until the end of decoding.

The other source of waste is the final flush of x (at the beginning of the stream). This one is obvious - the # of pending bits stored in x at any time is log2(x). Clearly we should be flushing the final value of x in a # of bits proportional to log2(x).

So to do ANS minimal flush, here's one way :


initialize encoder (at end of stream) :

x = 0

renormalize so x stays in the range x < (1<<63)

flush encoder (at the beginning of the stream) :

output # of bytes with bits set in x, and those bytes

decoder initializes by reading variable # of bytes of x

decoder renormalizes via :

if ( x < (1<<31) )
{
  if ( ptr < ptrend )
  {
    x <<= 32;  x |= get32(ptr); ptr += 4;
  }
}

decoder terminates and can assert that x == 0

This ANS variant will output only 1 byte on very-low-entropy data.

There are now two phases of the coder. In the beginning of encoding (at the ending of the stream), x is allowed to be way below the renormalization range. During this phase, encoding just puts information into x, and the value of x grows. (note that x can actually stay 0 and never hold any bits if your stream consists entirely of the bottom symbol in RANS). Once x grows up into the renormalization interval, you enter the next phase where bits of x are pushed to the output to keep x in the renormalization interval. Decoding, in the first phase you read bytes from the stream to fill x with bits and keep it in the renormalization interval. Once the decoder read pointer hits the end, you switch to the second phase, and now x is allowed to shrink below the renormalization minimum and you can continue to decode the remaining information held in it.

This appears to add an extra branch to the decoder renormalization, but that can be removed by duplicating your decoder into "not near the end" and "near the end" variants.

The #sigbit output of x at the head is just the right thing and should always be done in all variants of ANS.
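Concretely, the head flush/init is something like this (sketch; put_byte/get_byte are stand-ins for your byte stream, and in a real coder you'd probably pack the byte count into a few spare header bits rather than spend a whole byte on it) :

// encoder flush, written at the head of the stream :
void rans_flush_minimal( U64 x, byte_stream * out )
{
    int nbytes = 0;
    for ( U64 t = x; t ; t >>= 8 ) nbytes++;    // # of significant bytes of x
    put_byte( out, (U8) nbytes );
    for ( int i = nbytes-1; i >= 0; i-- )       // top byte first
        put_byte( out, (U8)(x >> (i*8)) );
}

// decoder init :
U64 rans_init_minimal( byte_stream * in )
{
    int nbytes = get_byte( in );
    U64 x = 0;
    while ( nbytes-- )
        x = (x << 8) | get_byte( in );
    return x;
}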

The checking ptr vs. ptrend and starting x = 0 is the variant that I call "minimal ANS".

Unfortunately "minimal ANS" doesn't play well with the ILP multi-state interleaved ANS. To do interleaved ANS like this you would need an EOF marker for each state. That's possible in theory (and could be done compactly in theory) but is a pain in the butt in practice.

5/10/2015

05-10-15 - Did I ever mention that I fucking hate the fucking web-

(I might be a bit cranky today. Too much work and not enough sex. I should probably just go to a bar and talk about how I love Obama and taxes so I can get in a fight. Instead I'll rage about the fucking web.)

I'm trying to get together the photos of my baby to share with my mom. What a fucking nightmare. They're mostly on my phone, and auto-backed up to Google Photos. Should be easy, right?

The Google Photos web interface is fucking wrist-slashing insanity. It's SO FUCKING SLOW. It should not take so long to show me a few little thumbnails. Fucking quit all the fucking AJAX fancy bullshit whatever the fuck you're fucking doing oh my god.

It always only wants to show me "highlights". Who told you to fucking do that? I have never highlighted anything so I'm not sure how you decided what was a highlight and what wasn't. You fucking dicks.

Simple shit like making an album and trying to put the correct photos in the album just has no decent workflow. FUCK.

So I'm going to just download them and do it on my computer. Fine.

There's no download all. I'm supposed to what, click each fucking one and download? (which is a frustrating nightmare because the click is some super slow awful web popup).

Okay, I can use Google Takeouts to just get the whole thing. Hmm. Why are my photos fucking 8 GB? Oh, because it's giving me all my videos too. FUCK FUCK FUCK. I just want the photos not the videos. Nope, Takeouts gives you everything.

Okay, I'll just download the 8 GB. Oh awesome, the download failed. Oh awesome, it failed again.

Okay, I'll get the download URL and give it to DownloadThemAll which is good and can do resumes and so on and the main reason I cling to Firefox.

NOPE the fucking download link is not an actual file it's some fucking redirect login bullshit that DTA can't handle. ARG ARG ARG.

And now fuck my fucking baby photos and fuck my mom (sorry mom) I'm not fucking dealing with this shit and I fucking hate the fucking web god dammit.


For some time I've been using Google Classic Maps ("https://maps.google.com/maps?output=classic&dg=opt"). And now it's been killed. Maybe I'll switch to Bing? Or fuck that. Maybe I should just buy a good set of paper maps. I'm not sure that even exists any more. Ever since the Thomas Guide switched to computer-generated maps they really suck, they're ugly and the layout is no good and hard to read.

The reason I saw on the web for killing it (Google Classic Maps) was that too many people were opting out of new maps. You killed it because people liked it. I don't know if that's true, but it is awesomely in character.

For a while I was on the Google Forums complaining about Blogger. Just about everyone who runs a blog at some point gets a troll and realizes that they need the ability to just ban an individual. Can't do it. So they go on the Google product forum and say hey can we get black listing and white listing? The Google response was "we know you want that, and fuck you".

REMINDER TO SELF : always download all the images made by Google Charts because that service will die at some point. (this would be good practice even if Google didn't randomly chop off its own limbs on a regular basis)


I don't keep any cookies or browse history. With everyone going to fucking two-phase login this is starting to get annoying. To login I now have to get a text code to my phone and enter that. It's tedious.

But the thing that really kills me is this stupid detail :

I get the numeric code sent to me. I go to Google Voice on my computer (because actually ever touching the phone is to be avoided at all cost). I double-click the number to copy it. I paste it in the two-phase entry.

It fails. Wrong code.

I try again. It fails.

The fucking double-click is selecting the space after the number, and the fucking login doesn't ignore the trailing space. It's lazy bad programmer shit like that which makes me furious.

Another one I hit often is using online payment thingies. I'll copy-paste the amount from my bill, something like "$1,234" and hit okay and I get

"invalid entry, please enter a numeric value"

IT'S A FUCKING PAYMENT ENTRY BOX. You can fucking strip the leading $ and commas you piece of shit mother fucking asshole terrible programmers.


I'm trying to login to Skype on my phone.

(side note: summary of every Skype sessions I've ever had : "I can see you, can you see me? I can't hear you. Oh, you're upside down. Let me log off and disconnect. Now you're black. Try again. It's real glitchy, let's restart it. Hey, it's working! Hi! Hi! Okay, gotta go now.")

It says login with your skype account or your microsoft account.

So I tediously enter my microsoft account login which has a password like fucking @#$ASD@!#$<:22 and is fucking awful to type (and is starred out you fucking fucks the fucking fuck).

Skype says "oh, it looks like you entered a microsoft account, redirecting..."

And it pops up a new login page WITH BLANK FUCKING ENTRIES. I WANT TO STAB YOU IN THE COLON.


I'm so fucking sick of loading web pages and seeing "connecting to blah.. connecting to blah.." and seeing shit popping in slowly and reflowing and the focus popping and all this fucking shit.

Hey, fucking remedial loading school. You put all the content needed for the page in one package. Send me the one package. BOOM it loads.

Incremental is bullshit.

Back in the 90's some time, I worked for Eclipse on streaming 3d for the web. One of the things I did was a progressive wavelet image compressor so we could do things like send the first 5k of each image, then the next 10k, and because of the truncation property of bitplane-coded wavelets those were good low quality versions of the image that could just be tacked together.

So we tried to test it and demo it.

Everything just instantly loaded and you couldn't see the progressive wavelet load at all.

Because if you're not a fucking moron and you package together your content and just have a single download bundle to get your content, hey the internet is actually really fucking fast (even back in the 90's !!).

To show it off I put in a bunch of fake delays on the downloader to simulate slow hosts, so that you could see the wavelets gradually getting better, and that's what we showed to VC's or whatever.

I guess I could have just taken all the files and scattered them on different hosts around the world, THE WAY FUCKING NORMAL WEB PAGES DO. It's like they have very carefully gone through this process of intentionally slowing down the internet for no reason.


Sometimes I wish that I was like an air-cooled Porsche mechanic or something very stable and non-computer related, so I could just work away in my shop and not have to ever touch this fucking demon box.

5/09/2015

05-09-15 - Oodle LZNA

Oodle 1.45 has a new compressor called LZNA. (LZ-nibbled-ANS)

LZNA is a high compression LZ (usually a bit more than 7z/LZMA) with better decode speed. Around 2.5X faster to decode than LZMA.

Anyone who needs LZMA-level compression and higher decode speeds should consider LZNA. Currently LZNA requires SSE2 to be fast, so it only runs full speed on modern platforms with x86 chips.

LZNA gets its speed from two primary changes. 1. It uses RANS instead of arithmetic coding. 2. It uses nibble-wise coding instead of bit-wise coding, so it can do 4x fewer coding operations in some cases. The magic sauce that makes these possible is Ryg's realization about mixing cumulative probability distributions. That lets you do the bitwise-style shift update of probabilities (keeping a power of two total), but on larger alphabets.
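To be concrete about the nibble part, my understanding of the idea is something like this (a sketch only, not the actual LZNA code; the real thing does the update with SSE2, and see Ryg's post for the proper treatment) :

struct NibbleModel
{
    enum { NSYM = 16, PROB_BITS = 15, PROB_TOTAL = 1<<PROB_BITS, RATE = 5 };
    U16 cdf[NSYM+1];   // cdf[i] = cumulative freq of symbols < i ; cdf[NSYM] = PROB_TOTAL

    void init()
    {
        for (int i=0;i<=NSYM;i++)
            cdf[i] = (U16)( (i * PROB_TOTAL) / NSYM );
    }

    // for RANS, symbol s owns the cumulative range [ cdf[s], cdf[s+1] )

    // after coding 'sym', blend the whole CDF a step toward a target
    // distribution that gives nearly all the mass to 'sym' (with a floor
    // so no frequency ever reaches zero) - the nibble-wise analog of the
    // bitwise shift update :
    void update(int sym)
    {
        for (int i=1;i<NSYM;i++)
        {
            int target = ( i <= sym ) ? i : (PROB_TOTAL - NSYM + i);
            cdf[i] = (U16)( cdf[i] + ((target - (int)cdf[i]) >> RATE) );
        }
    }
};

You code one 16-ary symbol instead of four binary ones; the 15-entry update loop is the part that wants SSE2.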

LZNA usually beats LZMA compression on binary, slightly worse on text. LZNA is closer to LZHAM decompress speeds.


Some results :


lzt99

LZNA -z6 : 24,700,820 -> 9,154,248 =  2.965 bpb =  2.698 to 1
decode only      : 0.327 seconds, 43.75 b/kc, rate= 75.65 mb/s

LZMA : 24,700,820 -> 9,329,925 =  3.021 bpb =  2.647 to 1
decode           : 0.838 seconds, 58.67 clocks, rate= 29.47 M/s

LZHAM : 24,700,820 ->10,140,761 =  3.284 bpb =  2.435 to 1
decode           : 0.264 seconds, 18.44 clocks, rate= 93.74 M/s

(note on settings : LZHAM is run at BETTER because UBER is too slow. LZHAM BETTER is comparable to Oodle's -z6 ; UBER is similar to my -z7. LZMA is run at the best compression setting I can find; -m9 and lc=0,lp=2,pb=2 for binary data; with LZHAM I don't see a way to set the context bits. This is the new LZHAM 1.0, slightly different than my previous tests of LZHAM. All 64-bit, big dictionaries.).


baby_robot_shell

LZNA -z6 : 58,788,904 ->12,933,907 =  1.760 bpb =  4.545 to 1
decode only      : 0.677 seconds, 50.22 b/kc, rate= 86.84 mb/s

LZMA : 58,788,904 ->13,525,659 =  1.840 bpb =  4.346 to 1
decode           : 1.384 seconds, 40.70 clocks, rate= 42.49 M/s

LZHAM : 58,788,904 ->15,594,877 =  2.122 bpb =  3.769 to 1
decode           : 0.582 seconds, 17.12 clocks, rate= 100.97 M/s

I'm not showing encode speeds because they're all running different amounts of threading. It would be complicated to show fairly. LZHAM is the most aggressively threaded, and also the slowest without threading.


My "game testset" total sizes, from most compression to least :


Oodle LZNA -z8 :            57,176,229
Oodle LZNA -z5 :            58,318,469

LZMA -mx9 d26:lc0:lp2:pb3 : 58,884,562
LZMA -mx9 :                 59,987,629

LZHAM -mx9 :                62,621,098

Oodle LZHLW -z6 :           68,199,739

zip -9 :                    88,436,013

raw :                       167,495,105


Here's the new Pareto chart for Oodle. See previous post on these charts

This is load+decomp speedup relative to memcpy : (lzt99)

The left-side Y-intercept is the compression ratio. The right-side Y-intercept is the decompression speed. In between you can see the zones where each compressor is the best tradeoff.

With LZMA and LZHAM : (changed colors)

lzt99 is bad for LZHAM, perhaps because it's heterogeneous and LZHAM assumes pretty stable data. (LZHAM usually beats LZHLW for compression ratio). Here's a different example :

load+decomp speedup relative to memcpy : (baby_robot_shell)
