I make computers do things that humans hate doing. I try to make them make things
195 stories
·
95 followers

Announcement

5 Comments and 14 Shares

 

Hello. 


This is an announcement. 


Out of all the places I could have put an announcement, I will admit that it probably doesn't make the most logistical sense to put it here. 


If I'd had the opportunity to put it somewhere that makes more sense for announcing things, such as Facebook or Twitter, I promise you I would have done that. However—and please don't become too distracted by this—but the vast majority of my social media accounts were hacked sometime back by an extremely persistent individual or entity whom I have thus far been unsuccessful at defeating, so I do not currently have anywhere else to put this. 


Moving on. 


As we discussed, there is an announcement. Soon it will be upon us. But first, a warm-up announcement: 



I told you this because I thought it might lend some credibility to the actual announcement, which is that I wrote a second book. For real this time. The book is finished. It has 518 pages. There is no going back at this point. There's a super official book page and everything. 


As is tradition, a variety of ordering experiences are available.


For example, if you wish to be taken directly to the book page with no extra fanfare, please click the regular button:



If you wish to use a larger button to go directly to the book page, please click the big button: 



If you wish to have a more difficult experience, the hard button is for you: 



If you do not wish to interact with the book any further, please follow this button to safety: 



If you want to feel slightly weirder than you currently do before visiting the book page, please click here: 



If you wish for me to apologize for writing the book and/or for the bird collage, that option is also available:



If you just want to click a bunch of buttons, please go here: 



---------------------------------------------------------------------------------


Okay. The announcement is complete. We may now proceed to the bonus phase. 


Perhaps this event was not promotional enough for you. Perhaps you wish to be subjected to a truly unnecessary level of marketing both related and unrelated to my book. Perhaps you simply wish to experience the future in whatever form it takes. If this is the case, I have great news for you: 


I recently learned how to use Instagram, and over the next several days, I intend to explore the limits of its potential, possibly even discovering new ways of using it. I will be relentless, and you will regret becoming involved, but you do have the opportunity to become involved if you wish. It's also completely possible that I decide against this and just post extreme close-ups of my belly button. Or something else could happen. One can never know these things. 



Thank you for your time and patience. I hope you can find it in your hearts to still respect me after this. 


-Allie


5 public comments
iaravps (Rio de Janeiro, Brasil), 1471 days ago: aaaaaaaaaa
MaryEllenCG (Greater Bostonia), 1472 days ago: *screams*
sandge (Atlanta, GA, USA), 1472 days ago: Yay!
fancycwabs (Nashville, Tennessee), 1472 days ago: Look look look
glenn (Waterloo, Canada), 1472 days ago: Allie Brosh is back!
	fxer, 1472 days ago: And so is Glenn!
	angelchrys, 1472 days ago: Hooray!
	glenn, 1469 days ago: I didn't know anyone noticed! actually been lurking but generally felt like I need to disconnect from things for a while

Boat Puzzle

3 Comments and 12 Shares
'No, my cabbage moths have already started laying eggs in them! Send the trolley into the river!' 'No, the sailing wolf will steal the boat to rescue them!'
gms8994, 1483 days ago: "MY CABBAGES!"
2 public comments
cjheinz (Lexington, KY; Naples, FL), 1488 days ago: Nice!
alt_text_bot, 1488 days ago: 'No, my cabbage moths have already started laying eggs in them! Send the trolley into the river!' 'No, the sailing wolf will steal the boat to rescue them!'

Using Machine Learning to Explore Neural Network Architecture

1 Share


At Google, we have successfully applied deep learning models to many applications, from image recognition to speech recognition to machine translation. Typically, our machine learning models are painstakingly designed by a team of engineers and scientists. This process of manually designing machine learning models is difficult because the search space of all possible models can be combinatorially large: a typical 10-layer network can have ~10^10 candidate networks! For this reason, the process of designing networks often takes a significant amount of time and experimentation by those with significant machine learning expertise.
Our GoogleNet architecture. Design of this network required many years of careful experimentation and refinement from initial versions of convolutional architectures.
To make this process of designing machine learning models much more accessible, we’ve been exploring ways to automate the design of machine learning models. Among many algorithms we’ve studied, evolutionary algorithms [1] and reinforcement learning algorithms [2] have shown great promise. But in this blog post, we’ll focus on our reinforcement learning approach and the early results we’ve gotten so far.

In our approach (which we call "AutoML"), a controller neural net can propose a “child” model architecture, which can then be trained and evaluated for quality on a particular task. That feedback is then used to inform the controller how to improve its proposals for the next round. We repeat this process thousands of times — generating new architectures, testing them, and giving that feedback to the controller to learn from. Eventually the controller learns to assign high probability to areas of architecture space that achieve better accuracy on a held-out validation dataset, and low probability to areas of architecture space that score poorly. Here’s what the process looks like:
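As a toy illustration of the propose, evaluate, and feed-back loop described above, here is a minimal sketch in C. It is not the actual AutoML system: the "controller" is just a probability vector over four hypothetical candidate architectures, the accuracies in FAKE_ACCURACY are made up to stand in for training a child model, and the update is a simple linear reward-inaction rule rather than the reinforcement learning method used to train a neural controller. All names here are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CANDIDATES 4

/* Made-up validation accuracies standing in for "train and evaluate the child". */
static const double FAKE_ACCURACY[NUM_CANDIDATES] = {0.60, 0.72, 0.91, 0.65};

/* Small deterministic generator so runs are repeatable; returns a value in [0, 1). */
static double next_uniform(unsigned long *state) {
    *state = (*state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return (double)*state / 2147483648.0;
}

/* The controller "proposes" a candidate by sampling from its probabilities. */
size_t propose(const double *probs, unsigned long *rng) {
    double u = next_uniform(rng);
    double cumulative = 0.0;
    for (size_t i = 0; i < NUM_CANDIDATES; i++) {
        cumulative += probs[i];
        if (u < cumulative) return i;
    }
    return NUM_CANDIDATES - 1; /* guard against rounding error */
}

/* Hypothetical stand-in for training the child model on a real task. */
double evaluate_child(size_t candidate) {
    assert(candidate < NUM_CANDIDATES);
    return FAKE_ACCURACY[candidate];
}

/* Repeat propose -> evaluate -> feedback, shifting probability toward
 * candidates in proportion to the reward they earned. */
void run_search(double *probs, int rounds, double step, unsigned long seed) {
    unsigned long rng = seed;
    for (int r = 0; r < rounds; r++) {
        size_t chosen = propose(probs, &rng);
        double reward = evaluate_child(chosen);
        for (size_t j = 0; j < NUM_CANDIDATES; j++) {
            double target = (j == chosen) ? 1.0 : 0.0;
            probs[j] += step * reward * (target - probs[j]);
        }
    }
}
```

Starting from uniform probabilities and calling run_search(probs, 5000, 0.01, 42UL) should leave most of the probability mass on the candidate with the highest stand-in accuracy, which mirrors the controller learning to favor better-scoring regions of architecture space.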
We’ve applied this approach to two heavily benchmarked datasets in deep learning: image recognition with CIFAR-10 and language modeling with Penn Treebank. On both datasets, our approach can design models that achieve accuracies on par with state-of-the-art models designed by machine learning experts (including some on our own team!).

So, what kind of neural nets does it produce? Let’s take one example: a recurrent architecture that’s trained to predict the next word on the Penn Treebank dataset. On the left here is a neural net designed by human experts. On the right is a recurrent architecture created by our method:

The machine-chosen architecture does share some common features with the human design, such as using addition to combine input and previous hidden states. However, there are some notable new elements — for example, the machine-chosen architecture incorporates a multiplicative combination (the left-most blue node on the right diagram labeled “elem_mult”). This type of combination is not common for recurrent networks, perhaps because researchers see no obvious benefit for having it. Interestingly, a simpler form of this approach was recently suggested by human designers, who also argued that this multiplicative combination can actually alleviate gradient vanishing/exploding issues, suggesting that the machine-chosen architecture was able to discover a useful new neural net architecture.

This approach may also teach us something about why certain types of neural nets work so well. The architecture on the right here has many channels so that the gradient can flow backwards, which may help explain why LSTM RNNs work better than standard RNNs.

Going forward, we’ll work on careful analysis and testing of these machine-generated architectures to help refine our understanding of them. If we succeed, we think this can inspire new types of neural nets and make it possible for non-experts to create neural nets tailored to their particular needs, allowing machine learning to have a greater impact on everyone.

References

[1] Large-Scale Evolution of Image Classifiers, Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Quoc Le, Alex Kurakin. International Conference on Machine Learning, 2017.

[2] Neural Architecture Search with Reinforcement Learning, Barret Zoph, Quoc V. Le. International Conference on Learning Representations, 2017.

Automatic OCR with Hazel and PDFPen

1 Share

I have a useful scanner as part of my networked HP printer that will scan directly to a shared directory on my computer. Once there, I want the file to be renamed to the current date and the document OCR'd so that I can search it.

To do this, I use Hazel and PDFPen. This is a note to ensure that I can remember how to do it again if I ever need to!

Firstly, rename the file. My scanner names each file with the prefix scan, so the Hazel rule is quite simple:

If all the following conditions are met:
	Name starts with scan

Do the following to the matched file or folder:
	Rename with pattern: [date created][extension]

(Screenshot of this rule in Hazel.)

Having renamed the file, we can use PDFPen's AppleScript support to perform an OCR of the document:

If all the following conditions are met:
	Extension is pdf
	Date Last Modified is after Date Last Matched

Do the following to the matched file or folder:
	Run AppleScript embedded script

The embedded AppleScript is:

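-- Hazel passes the matched file to the embedded script as theFile
tell application "PDFpen"
	open theFile as alias
	tell document 1
		ocr
		-- wait until PDFpen reports that the OCR has finished
		repeat while performing ocr
			delay 1
		end repeat
		-- short pause before saving the OCR'd document
		delay 1
		close with saving
	end tell
	quit
end tell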

(Screenshot of the embedded script in Hazel.)

That's it. Scanning a document now results in a dated, OCR'd PDF file in my Scans folder.


Jeff Bezos assures employees that HR working 100 hours a week to address their complaints

1 Comment
I see what you did there!
gms8994, 3309 days ago: God damnit... I got hooked by The Onion again!

Profiling Ag. Writing My Own Scandir

1 Share

Although I benchmarked every revision of Ag, I didn’t profile them all. After looking at the graph in my previous post, I profiled some of the revisions where performance changed significantly.

This is a run of revision a87aa8f8, right before I reverted the performance regression. You can see it spends 80% of execution time in fnmatch().

This is tagged release 0.9. Much faster, and it only spends about half the time in fnmatch().

Finally, here’s a run after merging pull request #56. This fixed issue #43 and improved performance for many cases. I’m rather proud of that pull request, since it fixed a lot of issues. The rest of this post explains the specific changes I made to get everything working the way I wanted.

To start with, I should explain Ag’s old behavior. Before I merged that pull request, Ag called scandir() on each directory. Then scandir() called filename_filter() on every entry in the directory. To figure out if a file should be ignored, filename_filter() called fnmatch() on every entry in the global char *ignore_patterns[]. This set-up had several problems:

  1. scandir() didn’t let me pass any useful state to filename_filter(). The filter could only base its decision on the dirent and any globals.
  2. ignore_patterns was just an array of strings. It couldn’t keep track of a hierarchy of ignore files in subdirectories. This made some ignore entries behave incorrectly (issue #43). This also hurt performance.

Fixing these issues required rejiggering some things. First, I wrote my own scandir(). The most important difference is that my version lets you pass an extra pointer through to the filter function. This pointer could point to, say, a struct containing a hierarchy of ignore patterns.
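To make the idea concrete, here is a minimal sketch of a scandir()-like helper whose filter receives a caller-supplied pointer. This is not Ag's actual code: the names scan_dir_with_baton, baton_filter, free_names, and skip_dotfiles are hypothetical, and to keep it short it returns copies of the entry names rather than full dirent structs.

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Filter callback: same idea as scandir()'s, plus a caller-supplied pointer. */
typedef int (*baton_filter)(const struct dirent *entry, void *baton);

/* Free a name list produced by scan_dir_with_baton(). */
void free_names(char **names, int count) {
    for (int i = 0; i < count; i++) free(names[i]);
    free(names);
}

/*
 * Hypothetical scandir()-like helper (an illustration, not Ag's function).
 * Stores the names of accepted entries in *names_out and returns how many
 * there are, or -1 on error.
 */
int scan_dir_with_baton(const char *dirname, char ***names_out,
                        baton_filter filter, void *baton) {
    assert(dirname != NULL && names_out != NULL);
    DIR *dir = opendir(dirname);
    if (dir == NULL) return -1;

    char **names = NULL;
    int count = 0, cap = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* The baton goes straight through to the filter, so no globals needed. */
        if (filter != NULL && !filter(entry, baton)) continue;
        if (count == cap) {
            int new_cap = cap ? cap * 2 : 16;
            char **grown = realloc(names, (size_t)new_cap * sizeof(*names));
            if (grown == NULL) goto fail;
            names = grown;
            cap = new_cap;
        }
        size_t len = strlen(entry->d_name) + 1;
        names[count] = malloc(len);
        if (names[count] == NULL) goto fail;
        memcpy(names[count], entry->d_name, len);
        count++;
    }
    closedir(dir);
    *names_out = names;
    return count;

fail:
    closedir(dir);
    free_names(names, count);
    return -1;
}

/* Example filter: the baton is an int counting every entry inspected. */
int skip_dotfiles(const struct dirent *entry, void *baton) {
    int *inspected = baton;
    (*inspected)++;
    return entry->d_name[0] != '.';
}
```

The example filter only keeps a counter in the baton, but the same slot could just as well carry a pointer to a struct of ignore patterns for the directory being searched.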

Surprise surprise, the next thing I did was make a struct for ignore patterns:

struct ignores {
    char **names; /* Non-regex ignore lines. Sorted so we can binary search them. */
    size_t names_len;
    char **regexes; /* For patterns that need fnmatch */
    size_t regexes_len;
    struct ignores *parent;
};

This is sort of an unusual structure. Parents don’t have pointers to their children, but they don’t need to. I simply allocate the ignore struct, search the directory, then free the struct. This is done around line 340 of search.c. Searching is recursive, so children are freed before their parents.

The final change was to rewrite filename_filter(). It calls fnmatch() on every entry in the ignore struct passed to it. If none of those match and ig->parent isn’t NULL, it repeats the process with the parent ignore struct, and so on until it reaches the top.
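The walk up the hierarchy can be sketched roughly as follows. This is an illustration rather than the real filename_filter(): the function name is_ignored is hypothetical, and it assumes, based on the comments in the struct above, that names is sorted so it can be binary searched while regexes holds the patterns that need fnmatch().

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct ignores {
    char **names; /* Non-regex ignore lines. Sorted so we can binary search them. */
    size_t names_len;
    char **regexes; /* For patterns that need fnmatch */
    size_t regexes_len;
    struct ignores *parent;
};

/* bsearch() comparator: the key is a string, the element is a pointer to one. */
static int compare_names(const void *key, const void *elem) {
    return strcmp((const char *)key, *(char *const *)elem);
}

/* Returns 1 if filename matches a pattern in ig or any of its ancestors. */
int is_ignored(const char *filename, const struct ignores *ig) {
    assert(filename != NULL);
    for (; ig != NULL; ig = ig->parent) {
        /* Exact names first: cheap binary search over the sorted list. */
        if (ig->names_len > 0 &&
            bsearch(filename, ig->names, ig->names_len,
                    sizeof(char *), compare_names) != NULL) {
            return 1;
        }
        /* Then the patterns that need fnmatch(). */
        for (size_t i = 0; i < ig->regexes_len; i++) {
            if (fnmatch(ig->regexes[i], filename, 0) == 0) return 1;
        }
    }
    return 0; /* reached the top without a match */
}
```

Because each struct only points at its parent, a pattern declared in a subdirectory never affects files in the directories above it, while patterns from the top level still apply everywhere below.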

All in all, not a bad change-set. I fixed a lot of things I’d been meaning to fix for a while. I also managed to clean up quite a bit of code. If not for my re-implementation of scandir(), the pull request would have removed more lines than it added.

One last thing: I’d like to praise a piece of software and criticize another. I tip my hat to Instruments.app. I’ve found it invaluable for finding the causes of many memory leaks and performance issues. But I wag my finger at git. Git allows .gitignore files in any directory, and it allows these files to contain regular expressions. Worse, these regexes can reference sub-directories. For example, foo/*/bar is a valid ignore pattern. Regular expressions plus directory hierarchies translate to complicated implementations and confusing behavior for users. It’s no fun for anyone involved.

Go dark.
