Dw ([info]_dw) wrote,
@ 2003-09-16 17:38:00
Previous Entry  Add to memories!  Tell a Friend!  Next Entry
Entry tags:coding, computer science

Various events
I haven't got my disk back yet. The ones from which I bought it say their results are contradictory (some times reporting an error, some times not), and also that if there is a hardware problem with it on this level, it'll have to be returned to the manufacturer.

I also updated Holding. It now contains a program that improves bzip2 compression of English text by about 5%.




For those who care about consistency puzzles and NP-completeness (where are you, [info]lhexa?), here's another question.

You are given a different number of each of the puzzle pieces below.


Is there an easy way of putting them next to each other so that they all fit and you're using all of the given pieces? (Say you get 20 of A, 23 of B, 30 of C, and 32 of D)

What if the pieces can be rotated -- that is B can be rotated to become D and vice versa, or if made 2D (each of the four sides can be either red or green, and colors need to fit, instead of just two)?

Finally, for the advanced, is this NP-hard?



(Post a new comment)


[info]lhexa
2003-09-20 05:14 am UTC (link)
It can be solved recursively, and in polynomial time. First thing, define the blocks as the following letter pairs:

A = ba; B = bx; C = yx; D = ya

The puzzle is solvable if a string can be formed from those pairs, which begins with either b or y, has for its middle a number of the pairs 'ab' and 'xy' placed in any order, and ending with either a or x. Calling 'As' the number of 'a's in the available letter pairs all taken together, and similarly for the others, one of the following conditions must hold:

(Xs-- == Ys) && (As == Bs--)
(Xs == Ys--) && (As-- == Bs)
(Xs-- == Ys--) && (As == Bs)
(Xs == Ys) && (As-- == Bs--)

Yeah, I know those last two could be simpler, but they make the pattern clearer. Basically, if you take one trailing letter and one leading letter out of the mix, and end up with As and Bs equal, and Xs and Ys equal, and provided that the leading and trailing letter were taking from different pairs (if you have more than one pair), then the puzzle is solvable. Call this the "equality condition". To find the solution, given that the above condition is met, you use this algorithm:

If you have only one letter pair, it's solved.

If you have two pairs, there will be two possible combinations, and the one dictated by the equality condition will be a solution.

If you have three or more, select any leading and trailing letter that satisfy the above condition. You are then left with four possibilities for the combination of leading and trailing pairs, for that choice of leading and trailing letters. Each possibility of leading and trailing pairs will, in turn, dictate the leading and trailing letters of the segment contained within. Go through all four combinations, and consider the set of pairs you have when you take away the start and end pairs.
If, using the trailing and leading letters dictated by these bounding pairs, the equality condition is met, then that choice of bounding pairs is a possible solution. At least one of the four possibilities you have will have a remainder set of pairs that meets the equality condition. Once appropriate trailing and leading pairs are found, select them for those positions, and repeat the step with the reduced set of pairs you now have. So, for an example, say you have:

yx yx yx yx ba bx ya (C C C C A B D)

Taking one y and one x out of this meets the equality conditions, leaving you with four each of ys and xs, and two each of as and bs. The possibilities are:

yx...yx, leaving yx, yx, ba, bx, ya, dictating yxy...xyx
yx...bx, leaving yx, yx, yx, ba, ya, dictating yxy...abx
ya...yx, leaving yx, yx, yx, ba, bx, dictating yab...xyx
ya...bx, leaving yx, yx, yx, yx, ba, dictating yab...abx

The first three meet the equality condition, while the fourth one fails it (the b and a can't be taken from different pairs), so the first three have solutions, and the fourth doesn't. Taking the first, the possibilities are:

yx...yx, leaving yx, yx, ba, bx, ya,

yxyx...yxyx, leaving ba, bx, ya, dictating yxyxy...xyxyx
yxyx...bxyx, leaving yx, ba, ya, dictating yxyxy...abxyx
yxya...yxyx, leaving yx, ba, bx, dictating yxyab...xyxyx
yxya...bxyx, leaving yx, yx, ba, dictating yxyab...abxyx

Once again, the first three meet the equality condition while the fourth doesn't. Since each leaves you with only one possibility, you get:

yxyx...yxyx, leaving ba, bx, ya, dictating yxyxy...xyxyx
yxyx...bxyx, leaving yx, ba, ya, dictating yxyxy...abxyx
yxya...yxyx, leaving yx, ba, bx, dictating yxyab...xyxyx

yxyxyababxyxyx: CCDABCC
yxyxyxyababxyx: CCCDABC
yxyababxyxyxyx: CDABCCC

This procedure is doable in polynomial time, so I'm guessing the puzzle doesn't solve NP-Hard. Tell me if I've made a mistake.

With the ability to rotate one-eighty degrees, the puzzle is solvable whenever the number of Bs or Ds is higher than zero, or the number of As and Cs are not both higher than zero. You just string the As together on the left side, the Cs together on the right, and bridge them with Bs and Ds, rotated if necessary to fit.

The pieces would require changing for a 2D grid, so I can't answer for that...

(Reply to this) (Thread)


[info]lhexa
2003-09-20 05:17 am UTC (link)
I sure wish I could edit this thing. I left quite a few notes to myself stuck in the middle... the repeating lines, namely.

(Reply to this) (Parent)


[info]_dw
2003-09-20 11:45 am UTC (link)
I haven't verified it (that is, programmed a solver), but to me, your solution looks like a backtracking solver. Say for instance that you pick the first solution to every recursion, and you find at near the end that it wasn't right at all, then you have to back up all the way and try the next solution and so on.

It may be possible to do some pseudopolynomial (polynomial if one considers the number of "colors" constant) dynamic programming trick though.

Or maybe it is possible to do a divide and conquer.. that is, consider each block of two puzzle pieces, then consider these equivalent to a single puzzle piece and go on?

As for a 2D grid, imagine you have N different colors on a KxK grid, and each square (puzzle piece) is colored on each of its four faces. The objective then is to put the puzzle pieces on the grid so that two sided facing always are colored the same color.
E.g

#B# #G# #R#
R#R R#G G#B
#G# #B# #B#

#G# #B# #B#
R#G G#R R#R
#B# #R# #G#
where R is red, G is green, and B is blue. The pieces can be rotated clockwise or counterclockwise.

These problems are derived from a real puzzle I saw somewhere. It is similar, only has about 10 colors, and the playing field and puzzle pieces are hexagonal, not square.

(Reply to this) (Parent)(Thread)


[info]lhexa
2003-09-23 03:40 am UTC (link)
No, the good thing about my algorithm is that any step which fully meets that "equality criterion" is guaranteed to hold a solution; thus when you finally get to the last segment, it is certain to be a solution. There's no backtracking, just careful selection from among the possibilities.

Geez, I don't think I can start thinking about the 2D problem without risking losing many hours to it... I remember that Cullatz conjecture you gave me, which consumed me for two days (fruitlessly). Nothing obvious comes to mind beyond the brute-force approach, and the 1D algorithm doesn't apply (each step would come upon a greater number of adjoining blocks, rather than just the two endpoints). Maybe I'll try again later.

(Reply to this) (Parent)(Thread)


[info]_dw
2003-09-23 10:08 am UTC (link)
I think I may have found an iterative approach too (to the 1D problem), but I have only generally pondered. It would consider different pieces as "lines" (e.g AAAA, CCCC, BCBC) and "bridges" (turns green to red or red to green).

Something like:
First sort to generate as long lines as possible.
Then see what pieces are left over and what bridges
we need.
Then remove from the lines without destroying the
line itself, unless destroying it would allow more pieces to be used as bridges.
If that is impossible, then there is no solution.


Have you encountered that hexagonal puzzle anywhere? If so, do you know what its name is?

(Reply to this) (Parent)(Thread)


[info]lhexa
2003-10-06 03:49 am UTC (link)
Nope, I haven't encountered the hexagonal puzzle anywhere. Have you tried testing either my algorithm or one of your own?

(Reply to this) (Parent)(Thread)


[info]_dw
2003-10-06 09:11 pm UTC (link)
Nope, I haven't encountered the hexagonal puzzle anywhere.
Hm, I haven't found its name either. For a while I thought it was "Kinato", but that deals with patterns, not colors, and is triangular.

I did find a square version though. It is called Cube Brick and is on this page.

Have you tried testing either my algorithm or one of your own?
I have almost completed writing my own solver. There's a bug with its allocation of B and D tiles (it allocates AAA BD BD CCC when it should be D AAA BD B CCC instead), but other than that works well.. at least for the puzzles I have tried.

I haven't implemented your algorithm yet.

(Reply to this) (Parent)(Thread)


[info]_dw
2003-10-07 11:59 am UTC (link)
I rewrote it now, with comments. Basically, it creates a solution of the form
[zero or more As] [zero or more "BD"s] B [zero or more Cs] and puts a D at the beginning or end if necessary, reducing certain subsets so that it is possible to create such a solution, and handling exceptions (like no As, only As or Cs, etc).

For your example, the transcript goes as follows:
bash-2.05b$ ./a.out
Enter puzzle pieces:ccccabd
A: 1    B: 1    C: 4    D: 1
DABCCCC

It doesn't create all possible solutions, but if there is one, it'll find it (I think; I don't have a rigorous proof).

The source code is linked to at the bottom of Holding. Points if you figure out why I gave the solver that name.

(Reply to this) (Parent)(Thread)


[info]lhexa
2003-10-22 12:48 pm UTC (link)
I'm damned if I know what this problem has to do with Polish revolutionaries.

(Reply to this) (Parent)(Thread)


[info]_dw
2003-10-22 10:34 pm UTC (link)
The pronounciation of the first name seems similar to that of your nick, sans the final "sa" part.

Convoluted, yes.

(Reply to this) (Parent)


[info]lhexa
2003-09-20 05:37 am UTC (link)
Oh, and congratulations about the bzip thing. It's always surprising to hear that something like that is still capable of significant improvement.

(Reply to this) (Thread)


[info]_dw
2003-09-20 11:32 am UTC (link)
It's at an amortized constant dictionary cost, though if stored correctly, the dictionary should be pretty compact. I haven't implemented such storage (tries or recognizers) yet, however.

I also briefly wondered on if it would be possible to create a sort of "lossy english compression" by finding redundancies in the grammatical structure (e.g replace verb tenses with infinitive where there's only one possible solution). It would be lossy because it'd "fix" spelling errors too unless counterindicated (by some sort of "don't process these areas of the file" thing).
The result of this was me finding out I don't know enough about the defined mechanics of English grammar.

By the way, where have you been?

(Reply to this) (Parent)(Thread)


[info]lhexa
2003-09-23 03:34 am UTC (link)
Perhaps it could be done with Chomsky's generative grammar... basically, an encoding of some sentence in English would include:

1) A base sentence structure (I forget whether Chomsky's grammar can derive all sentence types from a single basic structure).
2) A list of the transformative steps to be consecutively applied to the sentence, corresponding to some preestablished table.
3) A list of the words to be consecutively placed in the available positions in the final structure (this would be lesser than the total number of words in the sentence, as some would be dictated by syntax).

Each list can be compressed as usual. However, my knowledge of Chomskyian grammar is shallow, so I don't know whether this final "encoding" would be less compressable than the original sentence.

Also, I just recalled that Chomsky famously argued that his grammar could not be used to judge whether sentences are grammatical; it could only produce grammatical sentences itself. In other words, the problem of algorithmically converting any grammatical English sentence to Chomskyian format might be unsolved, or unsolvable due to vagueness in the concept of "grammaticalness" (or whatever it really is).

If not that, it might turn to be NP-Hard...

(Reply to this) (Parent)(Thread)


[info]_dw
2003-09-23 10:03 am UTC (link)
Hmm, I was more thinking about expanding the "external dictionary" solution to a wordlist that contains nouns, verbs, etc separated.

As the grammatical structures of English are well documented, it should be possible to encode them externally and reduce any sentence that conforms to the rules to a tree, removing redundancy in the process.

Such a transform could be of use for more things than just compression -- language processing, for instance.

(Reply to this) (Parent)(Thread)


[info]lhexa
2003-10-06 03:51 am UTC (link)
Actually, I wouldn't bet on the grammatical structures of English being completely documented. Violations of every rule seem permissible in the right circumstances.

How would splitting the dictionary according to parts of speech help?

Well, the Chomskyan (am I spelling that right?) syntactical structure definitely has lended itself well to language processing... I think that it's one of the major tools in trying to solve the recognition problem.

(Reply to this) (Parent)(Thread)


[info]_dw
2003-10-06 09:25 pm UTC (link)
Actually, I wouldn't bet on the grammatical structures of English being completely documented. Violations of every rule seem permissible in the right circumstances.

It would seem so, at least for some types of text. Press releases, laws, etc are probably going to adhere to common grammar, so to speak, pretty well, while things like journal posts, message board replies (and obviously poetry) will be breaking the rules often.

However, as long as we can use the information as a sort of predictive coder, and it is right more often than not, we should be able to compress better.

How would splitting the dictionary according to parts of speech help?
It would help for the sentence classifier. For instance, if it knows something is a verb, it's probably not a subject, and so on. (Tricky sentences like He wondered what the word "observing" meant would run into trouble, but meh.)

It would also be possible to split the dictionary in the other direction and create different dictionaries dealing with subsets of the english language, and indicate which to use with an integer at the start of the coded file. Redundancy could be coded only once in the dictionary set. I don't know if this would give more compression for each dictionary word, though.

Well, the Chomskyan (am I spelling that right?) syntactical structure definitely has lended itself well to language processing... I think that it's one of the major tools in trying to solve the recognition problem.

I think you are, and if it is, then maybe someone has already done the hard work and coded some sort of library relating to the usage of such grammars / syntactical structure representations..

(Reply to this) (Parent)(Thread)


[info]lhexa
2003-10-11 04:00 am UTC (link)
Hmm, I have nothing to add, except that you might want to check with me again in a year or two, by which time I should have learned more about formal syntax... *smiles*

(Reply to this) (Parent)(Thread)


[info]_dw
2003-10-11 08:55 pm UTC (link)
.. and if you want to tinker with my star encoder to implement whatever, the source is at the page.

Though it may have changed a year from now :)

(Reply to this) (Parent)(Thread)


[info]lhexa
2003-10-22 12:49 pm UTC (link)
That may presuppose a greater knowledge of programming than I'll have.

(Reply to this) (Parent)(Thread)


[info]_dw
2003-10-22 10:37 pm UTC (link)
*shrugs* I thought you had at least as much knowledge of programming as I had.

(Reply to this) (Parent)(Thread)


[info]lhexa
2003-10-30 01:12 am UTC (link)
Perhaps when we first met that was the case, but little has been added to my knowledge of programming since then.

(Reply to this) (Parent)


[info]lhexa
2003-09-23 03:36 am UTC (link)
Is the dictionary generated from each sample to be encoded?

It often takes me quite a while to reply to all of your messages (this has been true for years); the bunch before this spanned some three hours, I think. So I intentionally put a space of time between replies... and the previous delay, along with other distractions, was enough for me to forget about the posts until I started reading the journals again.

(Reply to this) (Parent)(Thread)


[info]_dw
2003-09-23 09:56 am UTC (link)
Is the dictionary generated from each sample to be encoded?

No, it is generated (as a frequency ordered wordlist[1]) from a large corpus beforehand, and distributed with the coder. That's what I mean with constant cost (in terms of bytes encoded) -- no matter how many bytes you encode, the space loss from the dictionary is the same.

Obviously, the compression rate improves as the dictionary gets larger, so there's a tradeoff here. The rate also improves if you get a more accurate sample corpus.

The tradeoff looks somewhat like this. (I have a bad plotting program, so it commits that statistical sin of putting a nonzero value at the intersection of the two axes)



The X axis is number of words. The Y axis is compressed bits per uncompressed byte (bpb).
It seems to me to be somewhat like a 1/x curve, which would make sense, never reaching 0.


[1] Though strangely, if one swaps "and" (with 700k entries) and "that" (with 200k entries), the compression rate improves. I don't know why, and it doesn't seem to hold true for other words. Maybe related to BWT's use of larger contexts..

(Reply to this) (Parent)(Thread)


[info]lhexa
2003-09-30 03:54 am UTC (link)
Well, might as well just make it an even 32767 words... I've no guess about the swapping of those two words, either.

By the way, the long delay this time was the result of the wait 'til my computer arrived here. I'll reply to the remaining posts tomorrow.

(Reply to this) (Parent)(Thread)

Re: Various events
[info]_dw
2003-09-30 08:29 am UTC (link)
Well, might as well just make it an even 32767 words... I've no guess about the swapping of those two words, either.

Hehe, only a limited set of people would call 32767 "even" :) *grins*

Actually, I made it half of that. 16384 words. Though with alternate coding methods, it would be possible to squeeze more words in the same space (and make use of the word-part-redundancy of english words), I haven't had any luck with the recognizer implementations to do such..

(Reply to this) (Parent)(Thread)

Re: Various events
[info]lhexa
2003-10-11 04:03 am UTC (link)
*chuckles*

You know, there's a lot that can be said about how, as you learn more and more, you lose the ability to use your own language completely correctly.

(Reply to this) (Parent)


Create an Account
Forgot your login or password?
Login w/ OpenID
English • Español • Deutsch • Русский…